
An approach to crawling Facebook

This is an idle experiment of my own. I don't know whether it will work; these are just notes:

1. Personal timeline:

https://m.facebook.com/profile/timeline/stream/?cursor=tmln_strm%3A1341235186%3A4123521292106084490%3A0&profile_id=100003102976600&replace_id=u_z_0
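To make the timeline URL above easier to read, here is a minimal sketch that splits it into its decoded query parameters and rebuilds it from them. It uses only the Python standard library and sends no request. The helper names `query_params` and `timeline_url` are made up for this illustration, and nothing here is a statement about how Facebook interprets these values.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

TIMELINE_BASE = "https://m.facebook.com/profile/timeline/stream/"

# The timeline URL captured in these notes.
url = (TIMELINE_BASE
       + "?cursor=tmln_strm%3A1341235186%3A4123521292106084490%3A0"
       + "&profile_id=100003102976600&replace_id=u_z_0")


def query_params(url):
    """Return the URL's query string as a dict of percent-decoded values."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def timeline_url(cursor, profile_id, replace_id):
    """Hypothetical helper: rebuild a timeline URL from its three parameters."""
    query = urlencode({
        "cursor": cursor,
        "profile_id": profile_id,
        "replace_id": replace_id,
    })
    return TIMELINE_BASE + "?" + query


params = query_params(url)
for name, value in params.items():
    print(f"{name} = {value}")

# Rebuilding from the decoded values gives back the captured URL.
rebuilt = timeline_url(**params)
```

Once decoded, `cursor` is a colon-separated string, and `replace_id` is the value that has to be supplied for the GET variant of the request.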

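Packet capture shows that the timeline is loaded through a POST request with very complex parameters. After repeated attempts, however, I found that the same content can be fetched with a GET request, provided you have the replace_id. The replace_id comes from the __req parameter of the previous request, and __req in turn appears in the heartbeat requests, where it keeps changing. The requests sent on each load are: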

GET:
https://edge-chat.facebook.com/pull?channel=p_100003102976600&seq=1&clientid=4166e2a6&profile=mobile&partition=-2&sticky_token=588&msgs_recv=1&qp=y&cb=2838170899&state=active&sticky_pool=ash4c09_chat-proxy&uid=100003102976600&viewer_uid=100003102976600&m_sess=&__dyn=1KQdAmm1gxu4U4ifGh28sBBgS5UqxKcwRwAxu3-UcodUbE6u7HzE4p0Yxm6Uhx6484G58O0PEhxm3O3q1rwxwdC2O1gCwSxu0BU7W1KxO1ZxO3W3G1uxmcG1lwf-68WUS2G2DxK18wXCwn8mw&__req=13&__ajax__=AYnvmks18JXzR0XmAgzkyTe1jE_EqXv8w1Gy89AKwm_kyMYEQzG4asGXoRwYbKNBTc6nKql4LCx3320Uy4Y66xytbvwlhkY_SE6Qzt5UTHx3XQ&__user=100003102976600

response: for (;;); {"t":"fullReload","seq":1}

POST, with empty form_data:
https://edge-chat.facebook.com/sub?cb=lfnh&sticky_token=588&uid=100003102976600&viewer_uid=100003102976600&sticky_pool=ash4c09_chat-proxy&profile=mobile&clientid=4166e2a6&cap=0

response: for (;;); {"t":"pong"}
POST, with empty form_data:
https://edge-chat.facebook.com/sub?cb=iif3&sticky_token=588&uid=100003102976600&viewer_uid=100003102976600&sticky_pool=ash4c09_chat-proxy&profile=mobile&clientid=4166e2a6&cap=0
# Differs from the previous request only in the cb parameter
response: for (;;); {"t":"pong"}
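
The responses above all start with `for (;;);`, which appears to be a guard prefix placed before the JSON payload, so it has to be stripped before decoding. The sketch below does that, and also compares the query strings of the two `sub` requests to confirm which parameters differ. The function names `parse_guarded_json` and `changed_params` are hypothetical, and the code only works on the strings recorded in these notes without contacting any server.

```python
import json
from urllib.parse import urlsplit, parse_qsl

GUARD = "for (;;);"

COMMON = ("&sticky_token=588&uid=100003102976600&viewer_uid=100003102976600"
          "&sticky_pool=ash4c09_chat-proxy&profile=mobile&clientid=4166e2a6&cap=0")
SUB_A = "https://edge-chat.facebook.com/sub?cb=lfnh" + COMMON
SUB_B = "https://edge-chat.facebook.com/sub?cb=iif3" + COMMON


def parse_guarded_json(body):
    """Drop the leading 'for (;;);' guard, if present, and decode the JSON."""
    text = body.strip()
    if text.startswith(GUARD):
        text = text[len(GUARD):]
    return json.loads(text)


def changed_params(url_a, url_b):
    """Map each query parameter that differs to its (url_a, url_b) values."""
    a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


pong = parse_guarded_json('for (;;); {"t":"pong"}')
reload_msg = parse_guarded_json('for (;;); {"t":"fullReload","seq":1}')
diff = changed_params(SUB_A, SUB_B)
print(pong, reload_msg, diff)
```

For the two `sub` URLs recorded here, the comparison reports only `cb`, which matches the observation in the notes.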