
Baidu's anti-scraping mechanism is easy to crack! Watch me beat it in three minutes!

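The main goal of this post is to crack a JS-generated request parameter. The JS flow behind Baidu Translate is not very complicated, which makes it a good exercise for beginners.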

First, open Baidu Translate, type in a few words, and click Translate.


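Keep clicking Translate with the browser's developer tools open: every click adds the same group of requests to the Network panel.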


Open the second request in that group, v2transapi. Its response contains the content we need.


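Now analyze this request. It is sent with POST, and its form data contains the following fields: from is the language of the text you typed, to is the language to translate into, query is the text to translate, and sign and token are produced by a JS file. The next step is to track down these two parameters.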
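As a rough sketch of the field list above, the form body can be assembled as a plain dictionary. The helper name `build_payload` and the placeholder values are illustrative only; `transtype` and `simple_means_flag` take the values used in the full script later in this post.

```python
def build_payload(query, src, dst, sign, token):
    """Assemble the form fields carried by the v2transapi POST request."""
    return {
        "from": src,                  # language of the input text
        "to": dst,                    # language to translate into
        "query": query,               # text to translate
        "transtype": "translang",
        "simple_means_flag": "3",
        "sign": sign,                 # computed from query and window.gtk
        "token": token,               # read from the page source
    }


# Placeholder sign/token values; real ones come from the steps described below.
payload = build_payload("翻譯", "zh", "en", sign="<sign>", token="<token>")
print(sorted(payload))
```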


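The JS file is https://fanyi.bdstatic.com/static/translation/pkg/index_9b62d56.js:formatted (the pretty-printed view in the developer tools). Setting a breakpoint and stepping through it shows that sign is produced by m('翻譯'), where '翻譯' is the query text, and that token is read from window.common.token.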


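window.common.token appears directly in the page source, here as '04a7c540f2a1e1d6be3dee208d1b7525'. A second value in the page source, window.gtk, will be needed later. The function below fetches the page and extracts both with regular expressions.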


<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re

import requests


def parame():
    """Fetch the Baidu Translate page and extract token and window.gtk."""
    url = 'https://fanyi.baidu.com/?aldtype=16047'
    headers = {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Cookie': 'BIDUPSID=A0BE57EF0645F17DEC806F36F3E38844; PSTM=1531234350; BAIDUID=EEEDF0D3A7636804D4AF070CB10CC56A:FG=1; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; BDUSS=VUTVlhSTlOdnhnN3pRRlNBdU0tT21KMnBUMUlJS3Z0ZlJRMzd5MlVVQU1zdmRiQVFBQUFBJCQAAAAAAAAAAAEAAAAl6sYTz8TM7LXEzqLQpk5WAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwl0FsMJdBbZz; MCITY=-315%3A; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_PSSID=1442_21095_28132_26350_27751_27244_27509; delPer=0; PSINO=1; locale=zh; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1545304167; to_lang_often=%5B%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%2C%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%5D; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1545305626; from_lang_often=%5B%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%2C%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%5D',
        'Host': 'fanyi.baidu.com',
        'Origin': 'https://fanyi.baidu.com',
        'Referer': 'https://fanyi.baidu.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    html = requests.get(url, headers=headers).text
    # window.gtk seeds the sign algorithm; [1:-1] strips the surrounding quotes.
    windows_gtk = re.findall(r';window.gtk = (.*?);</script>', html)[0][1:-1]
    window_bdstoken = re.findall(r'<script>window.bdstoken = (.*?);window.gtk', html)[0]
    # token is returned with its quotes; the caller strips them.
    token = re.findall(r"token: (.*?),", html)[0]
    logid = re.findall(r'logid: (.*?),', html)[0][1:-1]

    print(window_bdstoken, windows_gtk, token, logid)

    return token, windows_gtk
</pre>
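The two regular expressions that matter can be tried offline before hitting the live site. The sketch below runs them against a made-up HTML fragment; `SAMPLE_HTML`, its values, and the helper name `extract_params` are assumptions that only imitate the shape the patterns expect, not a copy of the real page.

```python
import re

# Hypothetical fragment shaped like the page source; every value is invented.
SAMPLE_HTML = (
    "<script>window.bdstoken = '';window.gtk = '123456.654321';</script>"
    "<script>window.common = { token: '0123456789abcdef0123456789abcdef',"
    " systime: '0', logid: 'demo' };</script>"
)


def extract_params(html):
    """Return (token, gtk) with the surrounding quotes stripped."""
    gtk_match = re.search(r";window.gtk = (.*?);</script>", html)
    token_match = re.search(r"token: (.*?),", html)
    if gtk_match is None or token_match is None:
        raise ValueError("page layout changed: token or window.gtk not found")
    return token_match.group(1).strip("'"), gtk_match.group(1).strip("'")


print(extract_params(SAMPLE_HTML))
```

Raising an explicit error when a pattern stops matching makes a layout change on the site easier to diagnose than an `IndexError` from `findall(...)[0]`.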

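Now look at m(). Its only argument is the text to translate. Lines 5725-5727 of the formatted file build a value u, which is equal to window.gtk, the second value taken from the page source earlier. To keep the code flexible, the adapted version turns u from a fixed value into a second function parameter; in the listing below the function appears as e(r, u).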


The adjusted JS code is as follows:

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">function a(r) {
    if (Array.isArray(r)) {
        for (var o = 0, t = Array(r.length); o < r.length; o++)
            t[o] = r[o];
        return t
    }
    return Array.from(r)
}
function n(r, o) {
    for (var t = 0; t < o.length - 2; t += 3) {
        var a = o.charAt(t + 2);
        a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
        a = "+" === o.charAt(t + 1) ? r >>> a : r << a,
        r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
    }
    return r
}
// r is the text to translate; u is window.gtk, passed in as a parameter.
function e(r, u) {
    var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
    if (null === o) {
        var t = r.length;
        t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
    } else {
        for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
            "" !== e[C] && f.push.apply(f, a(e[C].split(""))),
            C !== h - 1 && f.push(o[C]);
        var g = f.length;
        g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
    }
    // Encode r as UTF-8 bytes into S.
    for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
        var A = r.charCodeAt(v);
        128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
        S[c++] = A >> 18 | 240,
        S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
        S[c++] = A >> 6 & 63 | 128),
        S[c++] = 63 & A | 128)
    }
    // Mix every byte into p, starting from the integer part of u.
    for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
        p += S[b],
        p = n(p, F);
    return p = n(p, D),
    p ^= s,
    0 > p && (p = (2147483647 & p) + 2147483648),
    p %= 1e6,
    p.toString() + "." + (p ^ m)
}
var i = null;
</pre>

That completes the analysis of the whole flow, so let's build our own translator. As a side note, Baidu Translate can convert between 88 languages.

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re

import execjs
import requests

# Browser headers, shared by the page request and the translation request.
HEADERS = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': 'BIDUPSID=A0BE57EF0645F17DEC806F36F3E38844; PSTM=1531234350; BAIDUID=EEEDF0D3A7636804D4AF070CB10CC56A:FG=1; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; BDUSS=VUTVlhSTlOdnhnN3pRRlNBdU0tT21KMnBUMUlJS3Z0ZlJRMzd5MlVVQU1zdmRiQVFBQUFBJCQAAAAAAAAAAAEAAAAl6sYTz8TM7LXEzqLQpk5WAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwl0FsMJdBbZz; MCITY=-315%3A; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_PSSID=1442_21095_28132_26350_27751_27244_27509; delPer=0; PSINO=1; locale=zh; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1545304167; to_lang_often=%5B%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%2C%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%5D; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1545305626; from_lang_often=%5B%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%2C%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%5D',
    'Host': 'fanyi.baidu.com',
    'Origin': 'https://fanyi.baidu.com',
    'Referer': 'https://fanyi.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}


def parame():
    """Fetch the Baidu Translate page and extract token and window.gtk."""
    url = 'https://fanyi.baidu.com/?aldtype=16047'
    html = requests.get(url, headers=HEADERS).text
    # [1:-1] strips the quotes around the matched value.
    windows_gtk = re.findall(r';window.gtk = (.*?);</script>', html)[0][1:-1]
    window_bdstoken = re.findall(r'<script>window.bdstoken = (.*?);window.gtk', html)[0]
    token = re.findall(r"token: (.*?),", html)[0]
    logid = re.findall(r'logid: (.*?),', html)[0][1:-1]

    print(window_bdstoken, windows_gtk, token, logid)

    return token, windows_gtk


def translate(key, fro, to):
    """Translate key from language code fro to language code to."""
    # Compile the adjusted JS, saved locally as 百度翻譯.js.
    node = execjs.get()
    file = '百度翻譯.js'
    with open(file, encoding='utf-8') as f:
        ctx = node.compile(f.read())
    token, u = parame()
    # call() passes the arguments safely even if key contains quotes.
    sign = ctx.call('e', key, u)
    print(sign)
    data = {
        'from': fro,
        'to': to,
        'query': key,
        'transtype': 'translang',
        'simple_means_flag': '3',
        'sign': sign,
        # token was scraped with its quotes, so strip them here.
        'token': token[1:-1]
    }
    url = 'https://fanyi.baidu.com/v2transapi'
    html = requests.post(url, data=data, headers=HEADERS).json()
    html = html['trans_result']['data'][0]
    result = {
        html['src']: html['dst']
    }
    print(result)
    return result


if __name__ == '__main__':
    # Language name -> Baidu Translate language code.
    dic = {'Chinese': 'zh', 'Japanese': 'jp', 'Japanese kana': 'jpka', 'Thai': 'th', 'French': 'fra', 'English': 'en', 'Spanish': 'spa', 'Korean': 'kor',
           'Turkish': 'tr', 'Vietnamese': 'vie', 'Malay': 'ms', 'German': 'de', 'Russian': 'ru', 'Iranian': 'ir', 'Arabic': 'ara', 'Estonian': 'est',
           'Belarusian': 'be', 'Bulgarian': 'bul', 'Hindi': 'hi', 'Icelandic': 'is', 'Polish': 'pl', 'Persian': 'fa', 'Danish': 'dan',
           'Filipino': 'tl', 'Finnish': 'fin', 'Dutch': 'nl', 'Catalan': 'ca', 'Czech': 'cs', 'Croatian': 'hr', 'Latvian': 'lv',
           'Lithuanian': 'lt', 'Romanian': 'rom', 'Afrikaans': 'af', 'Norwegian': 'no', 'Brazilian Portuguese': 'pt_BR', 'Portuguese': 'pt', 'Swedish': 'swe',
           'Serbian': 'sr', 'Esperanto': 'eo', 'Slovak': 'sk', 'Slovenian': 'slo', 'Swahili': 'sw', 'Ukrainian': 'uk', 'Hebrew': 'iw',
           'Greek': 'el', 'Hungarian': 'hu', 'Armenian': 'hy', 'Italian': 'it', 'Indonesian': 'id', 'Albanian': 'sq', 'Amharic': 'am',
           'Assamese': 'as', 'Azerbaijani': 'az', 'Basque': 'eu', 'Bengali': 'bn', 'Bosnian': 'bs', 'Galician': 'gl', 'Georgian': 'ka',
           'Gujarati': 'gu', 'Hausa': 'ha', 'Igbo': 'ig', 'Inuktitut': 'iu', 'Irish': 'ga', 'Zulu': 'zu', 'Kannada': 'kn', 'Kazakh': 'kk',
           'Kyrgyz': 'ky', 'Luxembourgish': 'lb', 'Macedonian': 'mk', 'Maltese': 'mt', 'Maori': 'mi', 'Marathi': 'mr', 'Nepali': 'ne',
           'Odia': 'or', 'Punjabi': 'pa', 'Quechua': 'qu', 'Setswana': 'tn', 'Sinhala': 'si', 'Tamil': 'ta', 'Tatar': 'tt',
           'Telugu': 'te', 'Urdu': 'ur', 'Uzbek': 'uz', 'Welsh': 'cy', 'Yoruba': 'yo', 'Cantonese': 'yue', 'Classical Chinese': 'wyw',
           'Traditional Chinese': 'cht'}
    # An empty input falls back to the sample sentence.
    key = input('Enter the text to translate: ') or '為樂為魂之語與通〜'
    fro = dic['Classical Chinese']
    to = dic['English']
    translate(key, fro, to)
</pre>
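The script indexes straight into `trans_result`, which raises an exception whenever the server answers with anything else, for example after a rejected sign or token. A slightly more defensive sketch is shown below; the helper name `extract_translation` and both sample payloads are assumptions modeled only on the keys the script reads, not on a documented response format.

```python
def extract_translation(response_json):
    """Return {source: translation}, or None if the payload has another shape."""
    try:
        first = response_json["trans_result"]["data"][0]
        return {first["src"]: first["dst"]}
    except (KeyError, IndexError, TypeError):
        return None


# Hypothetical payloads: one shaped like a success, one like a rejection.
ok = {"trans_result": {"data": [{"src": "hello", "dst": "你好"}]}}
rejected = {"error": "hypothetical"}

print(extract_translation(ok))
print(extract_translation(rejected))
```

Returning `None` lets the caller decide whether to refresh the token and gtk and retry, instead of crashing on a `KeyError`.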

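Running the script prints the parameters extracted from the page, then the computed sign, and finally a dictionary that maps the source text to its translation.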

A quick preview: the next post will build a homemade translator on top of Youdao Translate.