Momo Manga (momomh.com): reverse-engineering the JS, crawling the whole site into a database, and merging the sliced images

Site analysis


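  1. Open the target site: https://www.momomh.com/

  2. Pick one comic to analyze: 《渴望:愛火難耐》

  3. The comic's detail page needs no reverse engineering; the comic's metadata can be read straight from the page. Click any chapter to open it: 渴望:愛火難耐-第1話

  4. Press F12 to open the developer tools and inspect the source in the Elements panel. The entry point is an encrypted string inside one of the script tags.

  5. Analyze this line of code from the inside out. _0x232c('0x7', 'T]C8') passes two constant strings to the function _0x232c, so it always returns the same result.

    In other words, _0x232c('0x7', 'T]C8') always evaluates to the fixed value EReVr

  6. Work outward: _0xe1f02a[_0x232c('0x7', 'T]C8')] follows the same pattern as above and also resolves to a fixed value.

  7. Click the function printed in the console to jump to its definition; it sits just above our breakpoint.

  8. A quick read shows that this function simply treats its first argument as a function and calls it with the second argument.

    Putting this together, the code at the breakpoint is: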

  9. function _0x317597(_0x1b9bc1){
    var _0x42f71a=CryptoJS[_0x232c('0x11','e*R8')]['Utf8'][_0x232c('0x12','e*R8')](_0x1b9bc1['k']);
    var _0x2ea3c6=CryptoJS[_0x232c('0x13','O3X#')][_0x232c('0x14','e*R8')](_0x1b9bc1['i'],_0x42f71a,{
    'iv':_0x42f71a,
    'padding':CryptoJS['pad'][_0x232c('0x15','oEHH')]
    });
    _0x2ea3c6=_0x2ea3c6[_0x232c('0x16','Plzz')](CryptoJS[_0x232c('0x17','fWan')][_0x232c('0x18','H8Db')]);
    if(_0xe1f02a['wmyOd'](_0x2ea3c6,'')){
    if(_0xe1f02a[_0x232c('0x19','(tyc')](_0xe1f02a[_0x232c('0x1a','RcQ4')],_0xe1f02a[_0x232c('0x1b','fWan')])){
    return'';
    }else{
    return'';
    }
    }
    imgs=_0x2ea3c6[_0x232c('0x1c','88oY')]('|');
    if(_0xe1f02a[_0x232c('0x1d','O3X#')](imgs[_0x232c('0x1e','oEHH')],0x0)){
    s='';
    len=imgs[_0x232c('0x1f','fyE)')];
    for(var _0xff1a4d=0x0;_0xe1f02a[_0x232c('0x20','e*R8')](_0xff1a4d,len);_0xff1a4d++){
    if(_0xe1f02a[_0x232c('0x21','jAwS')](imgs[_0xff1a4d][_0x232c('0x22','4of&')](_0xe1f02a[_0x232c('0x23','jAwS')]),-0x1)){
    info=_0xe1f02a[_0x232c('0x24','fyE)')](_0x2ca615,imgs[_0xff1a4d]);
    w=_0xe1f02a[_0x232c('0x25','oEHH')](info[0x1],0x96)?0x14:0x64;
    s+=_0xe1f02a[_0x232c('0x26','Plzz')](_0xe1f02a[_0x232c('0x27','zw$3')](_0xe1f02a[_0x232c('0x28','wrg$')](_0xe1f02a[_0x232c('0x29','Vsp#')](_0xe1f02a['qcRQe'](_0xe1f02a[_0x232c('0x2a','Q]pH')](_0x232c('0x2b','saz)'),w),_0xe1f02a['sSLrn']),info[0x0]),_0xe1f02a[_0x232c('0x2c','H8Db')]),_0x1b9bc1['l']),'\x22>');
    continue;
    }
    if(_0x1b9bc1['c']&&_0xe1f02a[_0x232c('0x2d','yert')](_0x1b9bc1['c'],0x0)){
    var _0x3b6771=_0x232c('0x2e','saz)')[_0x232c('0x2f','jAwS')]('|')
    ,_0x128fc0=0x0;
    while(!![]){
    switch(_0x3b6771[_0x128fc0++]){
    case'0':
    k=_0xe1f02a[_0x232c('0x30','z#4F')](_0xff1a4d,0x1);
    continue;
    case'1':
    mod=_0xe1f02a['RboUH'](k,_0x1b9bc1['c']);
    continue;
    case'2':
    if(k!=0x1&&_0xe1f02a[_0x232c('0x31','1pZZ')](mod,0x0)){}
    continue;
    case'3':
    if(_0xe1f02a['ymPPG'](_0x1b9bc1['c'],0x6)){
    if(_0xe1f02a[_0x232c('0x32','zw$3')]===_0xe1f02a[_0x232c('0x33','PgS1')]){
    return str[_0x232c('0x34','T]C8')](sp);
    }else{
    if(_0xe1f02a[_0x232c('0x35','PgS1')](k,0x1)||k!=0x1&&mod==0x1){
    w=0x64;
    }else{
    w=0x14;
    }
    }
    }
    continue;
    case'4':
    s+=_0xe1f02a['TWFcO'](_0xe1f02a['WngaM'](_0xe1f02a['mTPxd'](_0xe1f02a['osuEz'](_0xe1f02a[_0x232c('0x36','4of&')]+w+_0xe1f02a[_0x232c('0x37','I0J#')],imgs[_0xff1a4d]),_0xe1f02a[_0x232c('0x38','#5gG')]),_0x1b9bc1['l']),'\x22>');
    continue;
    case'5':
    w=_0xe1f02a['WPfTk'](0x64,_0x1b9bc1['c']);
    continue;
    }
    break;
    }
    }else{
    s+=_0xe1f02a[_0x232c('0x39','TX#a')](_0xe1f02a[_0x232c('0x3a','zw$3')](_0xe1f02a['qkGRr'](_0xe1f02a[_0x232c('0x3b','aS*w')]('<img\x20style=\x22width:100%;\x22\x20class=\x22lazy\x22\x20data-original=\x22',imgs[_0xff1a4d]),_0xe1f02a[_0x232c('0x3c','I0J#')]),_0x1b9bc1['l']),'\x22>');
    }
    }
    _0xe1f02a[_0x232c('0x3d','jAA%')]($,_0x1b9bc1['f'])[_0x232c('0x3e','u5iv')](s);
    }
    }
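  10. Set another breakpoint in this function at the key variable, imgs.

  11. Print imgs: it is exactly the result we are after.

  12. At this point the analysis is essentially complete; what remains is writing the code that crawls the images, stores them in the database, and so on.

  13. Summarizing and tidying up the above gives the following code: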

  14. var loadConf={
    i:"2fwFfyil4wHJqrgEtXgpFAfgoiD47DksIXZNdbrHtA4C+iN5hH3rK3ZohZoz/tBeXkzqlFDtVhqHdceI/Lo7jUBW2z9JRmAWORxfrfO/fCP1E8jjGI4bpLDzisIaOi1X/lA0rv+pUieoftsDVSOq9hclmcV38tsTghaxT0Tqx0Z28sXK8PX93UjdLrdnqj1ESng8x25FAz9d4B5SANBO+NqKanBZ/kyYZ7q96OygRc+Qf7k29A792SQMtu20ZpA+/1PGgC4vpOZyS8No7CN7dSkfC+0tfqDCU3I6Bhixq13uJ114ryF8Cod+0d7WO5GakDr7mIjlemugfT3jprKlSFZKoNLlDt07M6MRT73QPZPIZxGkKiZlGAgYuIIWtvGXNy8wtsI7Olwkk9YIBD7TduUmMiWhNEvSqfxeVEsk1f6/r2/U/qPYJiWGgWKLwl4M0CXLaU2NV8htYyyLAA/6bSP7dTm6+hFmF/ktcJ6ow8bHsQpoVjjlIgERtptARrUjHlg567Mqk3IZRf3zQE3hqoN4iN9DvPlpKez8a8fBuPbdPB0jUj3xpCr8yoggXW9Sb9SxTXB1/yxKG2OhoboqoOK9rjxEZucp5P+AEae+UKpiN5j7SaOW8SEZ48wl6Ln0kRBmpfbodDronlR/vIXFhWZiHTEgLAifWk1fckwUEQg9IQPd1CeTlAGwgUm+1zYqKiziAs34arPp+faLL25RYrGkhU6OldrYStQsE7TDj1p1pWbDNiZBzA3+H7en2wXIBEqSvC+FXL9ODBooB9DSaGbjIUWrQ6Q+QToUVdU8uFs5siQGFQ1cpI8GAyqUD/NQPMF4mjG2IqYJUvCyJj8ZEgXG0FUAj24H1FMAWn4W3h4D+zrmGFHR+Q5jecg+tZSrsyYG0tpTJZ59lwi/+Iw8bcXALVMD1QXfkgosN1M2gMl5sBGgkGMFl2hivRs37RmUMVic2PceW9pLQO1DQLyWMfP8YhYSdjegKM7m/wEwcu9FhN3DszNgkCGhWCqIuiJfzwRIMrSozs9RSl7CcLaekWWF+08IPFcLVWCuiOTKKNXjsOZ/4VtgLCkBMfDQwVmI1pwgYwXyeOcE37PBgGqy229hafx+KPkGPBtGXMCEE1SG/9GBEU1JdvQthGmtMkMWFQ9UZS00VvwGdYArZNPXEOgjZEQRlKwRvZ/dtVRpH6T3VSPudjNxaiLVjYvhP4lXtuCHSXQ1glWIFMM+14ZBl+7VQEOAJ11+Yggqskbv/WEu0PxpK8EvnHx4QTlo0KHXwTNzz29CpejJ5LZwDBKogCsaAlkKDflfNRkhaxpJavkqi2SOX5q3R1CU1bhsPyx00c7mRnv1LIY5fXqNLLoDjlzq91tE9FdqudOuJWR/GciSCQnaXzd+Y0OgTDBN2Szach9bjr2uzW2JuoN945vHfHvKUxdcBPy1eVSqRwjkXA8zpsgETxkRutWBeW74ZQGnlDb4QgHxsxTFJd4nHJydV2W1YZd6lOosO7C6Ryl34b1MLq8qL/zgwArt/xe1qHuY2PMKIpC+zBOX/WHjxWsZs9c4RU1akfnkcl5tCxnjl1pI4NyDpEEjE2RHhXHVAQayr84tAMtNcdLoVdl9cJWRKJ87wfXfgCED5zZLUxGbg7CXk8iQZHE+RZnEQ3m979Xipn0sbT1wtqB4y5B2oFAGzX5CfrAMj3Z8tOXMftj9EZBB+Ms8Lfz1Fr0wvcT2NUwUdvdf4ZXk99r2Z2gNrEJEG9yU6lFOLONVCwkDBGHqD3J5FL7P6xHwUXTb8mXILtB2h9+hdu4s6wrHJ1y0THBM2G42DE8DXf2Ca0sztlvFvxAOOWqYuT4ENev9ows0lkXDclchGIiQ+LVGpBwBWPpWhFiZeM16UGzC0C/nSL4irO4SXvDltdhcSEuQRAxM4mQyJB0pQs3k8WDi1fi0qN8lUjnPszkun4PImxZEiVw2KLKActzPqVW+LTT2R9KkD6SbHNRBXyMdt
5FSx9UkLZVa0urCweWQvKe73xmcp1S6Jkr5Ifmi21hxiCbMx50sOm6EkIRiHVhhzIEftTKVXH/ioDzUY43ROxeqTkmHc7fdpu9l0esNGnTMF+emucq9G9IsoiWPvLEnHURMlbNeKtHS5Y8K8G9cyGHe15+KsqZWv0OObys7WMzZuoKl+AtbaJCixzxdX/cHNuDPpEXvRbLVvicUjfPzt1sk6SYjd9pzyDjR5tcIPIRSoz87iJJUzH+yqTDREJKURmbIq8Pjfn+a8RU+LVyL3xFap3jSyCVPi0LbQzcGrg2E0d457I4RLTzj0JtjPnn7DnElzD+WAUNdnaKPfs6tgej47pczPTVf2TE6apwBk1joC1JICsCPN6QEm1CokvGWQgis+1rpPi2hEuC+FPPNqOfE4BTpbxBfyl/QsGEwu1VBvGXJmcgsuh7ogvGXUmx2Xo00TMTgoLrc+2t9cWtuMq1T9ralJ1wxnKFH29ZOVVvPDHZw/uzZ98f4u5L1wpQQ4PxzzwJp0DLOwgxV9vRbznZnQBWT2ABQ6z8786WdkJ6srjtzUNVKesplg+aOLeoiRLAE9UrjUTmICzS0B1u4FVAQAIldDGOmtLNTfIP7TALmyLzKnxKfCnQSwWf3ZXVEBrZugzbTHSh0uU+oiwHz9Nemgsw/HW6qwSoFii2hIRS+EKWvkIgsnCZfZP8CHZss7Cy6DEeL5GZA6jMNMNbnnafJxyqM4K6rNljP9dUFsDeWfvCFYpoOwG4JuiCSR/O9cYed05wl0Qk88pEhRm5VkmUoBmM1AnBgki4W3OEFZaALhhzI78EgfpG+Pj5NfvetvVCcwLzAIZZhWKX1pK/P6Z0fxF9vvVcqqNr0vMY4h6x2oCzK0FdeBN4lZrjnveXMDYti1nVT5mgfgmJt9xJmYpvWd7/we/zyoBESkkpPaNlyZMzBoBQrrXIzxQ2dUsi/dQVf/hfajXRKn4FdK9Upz9ky9BU9HYZR/1AyOI0LIlPc0ve2ZnB/2ZsOuoR/eWkSHuQukOlf7eG2fXUXV9gHwOqtyNgGvymOQk4rHi4hfUns8LExCjwcgb9nDQiZgZvqcaSqPB3quqN1zSTIhk12Was2C8QbbPo9hhe4i6cK4hec9VWm4sf9OKa8A7PCQgSG/dywoS+LTHaQ0vZjFqREE3Z6rKp/0Mbrffpfd2p25lxKDX7oVtmJBLmHwu8AEZWzijT1H495tz+2b8/r1RhKfgaeV1wyXVSb/AYSJAMvHu3RMnMQFkLoz+60ltuOM4HBXXUZyYlYARMyXV6PFTNnnfb8aJfldHB7CJyVBcUtSpbtybDL4+tLfWMXFfnILszUrOkQaM1SXw+6Rw2KFvkMwkSryeeVDI9j+4vovfL1I9iBt5wh/F2AF/phA08vxvRWcTcjM8RYowEE2uhdnk4q6ur2Ev00N8ggOKhnLG56yuRm2j+T5D8exgfbeiiN/iz4YP16wAvVcWX96PC/eezKNvnpu8NcenId6B4UkB2qwiP2/UuDJUpkxSCh2WAovk8mGYpXDLl4Ev1Aby5m51ePG9WGK57Rm4QQouJ45mYeZUZtf3pNllMBONTfLPtutp/TwTNJ2Y5/OqnLoXsNgo5L0Gpv4/ZeGu6oA0pmWy2Db/Jk/n0PjjLhLPtvI75Ew7H7jGtwvtKcwydsmhzjJL86X9p82piAtDUEtIC0014n3inh9/6HMJQyh2ItjSxlBxvicuhiGUf5xl0tRaiEKEchH+7R9k/s/w3iveUhGJmVBQbFDyxjtqLh8jbEjwkuD1rgihq0gDVEVcBJ8FAtz2jXNXXPlLe7FFhEebryiNnDswA8wMLMyZ5xJgnUyfxqsT5oeUVfEEkqWtTUjzRY5xCfDqgAWjvFZQg7pDkhlHSPlvsePreNGFjPpphxgjRWURKhZXXT0j6VT5PDAmlPy03pRS3i8k64WeWlOkIqb7Evp4aDTjZh/ZTVaKzeXnQ4iGeKcJTNTnN/LLQUO3Y8n
hLio18M8S79rR5/4sG+zP6yO10ThOyuRlTDc/9weWEoQQEt+4TbqfB4ORHAix7S/IpCGdzV2O8ifku5/v34KMjaRWLn0UgeqfwDO2y1P7W08jOr7vkYIyzcnUUpLV/5xJ76UBiXcVbGMCg/f/Uv9dz/RENKaOEnnB1J5uvrZ4tFHc6eatNLFO9622mgGiTI6MSs6Hse9zyBE76qhDZbiDH2ENEwrTGZzXZ5YKolWlCsIau7iH/a3r/LA6iOJp28QJckUM6MFY1L8kQO/qay81528M+8Bg3U0ebGDzccsg8FmLRCNH5OBoStmLkcEKVQjBvKmKKafxbxwEz3jyW8zPkli6LqItp9Oy+Pf2NYpMSEh9r67GL2GClmrz0FfLczrHoj80M68oyRqt+EF4gzFfMMjipzzOnuVFTzwhFsyjFkEKKDY7UDCzko32pAKgd2YkqmurVa4A8/cYeat+ugcheKjkzWx3KQ1ttkXZa+gqEp1wNGCyosQVxiwdN/3SNi7ra0NGvMahLMIByJmGOidoO/efc/1kUJ7fqtVYYOJab2TLPzTAaerkMBW8WLCsFWpet05drHspv+nO3heo+mN7EF3oG6COEmJ8RdWcvDVqLQ8QPY3phg75ksqGqDYExRUZoJGsbax/2tXo8bQx5WaZMNGEXPZMeQoDvSDyxLRdIRv4k4TXRWccxSg9QNR+PZqCQZsp7bYZl/4NZ/GEU=",
    c:"0",
    k:"fd946a640a65eb1d",
    d:"momomh.com",
    l:"https://ae01.alicdn.com/kf/Uf8692d06f3694b03b1881ded2b087438H.png",
    f:"#cp_img"
    };
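    // The key and the IV are the same value: the UTF-8 bytes of loadConf.k
    var a = CryptoJS.enc.Utf8.parse(loadConf['k']);
    console.log(a);
    // AES-decrypt the Base64 payload in loadConf.i (CryptoJS uses CBC mode by default)
    var b = CryptoJS.AES.decrypt(loadConf["i"], a, {
    'iv': a,
    'padding': CryptoJS.pad.Pkcs7
    });
    // Decode the decrypted bytes as UTF-8 text
    var c = b.toString(CryptoJS.enc.Utf8);
    console.log(c);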
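
The same decryption can probably be done directly in Python instead of going through Node. The following is only a sketch under a few assumptions: it relies on the third-party pycryptodome package, the helper names pkcs7_unpad and decrypt_load_conf are made up for illustration, and it assumes the plaintext is a list of image URLs separated by '|', which is what the imgs split in the obfuscated function suggests.

```python
import base64


def pkcs7_unpad(data: bytes, block_size: int = 16) -> bytes:
    """Strip PKCS#7 padding after checking that the pad bytes are consistent."""
    if not data or len(data) % block_size:
        raise ValueError("data is not a whole number of blocks")
    pad = data[-1]
    if not 1 <= pad <= block_size or data[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad]


def decrypt_load_conf(conf: dict) -> list:
    """Decrypt conf['i'] with AES-CBC, using conf['k'] as both key and IV."""
    # Imported lazily so pkcs7_unpad stays usable without pycryptodome.
    from Crypto.Cipher import AES  # third-party: pip install pycryptodome

    key = conf["k"].encode("utf-8")  # 16 characters, so AES-128
    cipher = AES.new(key, AES.MODE_CBC, iv=key)
    plain = pkcs7_unpad(cipher.decrypt(base64.b64decode(conf["i"])))
    # Assumption: the plaintext is a '|'-separated list of image URLs.
    return [url for url in plain.decode("utf-8").split("|") if url]
```

Calling decrypt_load_conf with a dict holding the i and k values of loadConf should then return the same list that the browser shows for imgs, provided the assumptions above hold.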

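Steps 5 and 6 above rest on one observation: every _0x232c(...) call with literal arguments is a constant. One way to make the obfuscated source easier to read is to substitute those constants back into the text. The sketch below is illustrative only: inline_lookups is a hypothetical helper, and the table holds just the single pair the analysis above actually resolved; any other entries would have to be read from the browser console.

```python
import re

# Matches calls such as _0x232c('0x7', 'T]C8') with two single-quoted literals.
CALL_RE = re.compile(r"_0x232c\('(0x[0-9a-fA-F]+)',\s*'((?:[^'\\]|\\.)*)'\)")


def inline_lookups(source: str, table: dict) -> str:
    """Replace each known lookup call with its constant string result."""

    def repl(match):
        key = (match.group(1), match.group(2))
        if key not in table:
            return match.group(0)  # leave unresolved calls untouched
        return "'" + table[key] + "'"

    return CALL_RE.sub(repl, source)


# The only pair resolved in the article; extend it as more values are observed.
KNOWN = {("0x7", "T]C8"): "EReVr"}

print(inline_lookups("_0xe1f02a[_0x232c('0x7', 'T]C8')]", KNOWN))
# prints: _0xe1f02a['EReVr']
```

After this pass, property accesses such as _0xe1f02a['EReVr'] are readable, and step 8's wrapper (a function that just calls its first argument with its second) becomes easy to spot.
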
Writing the crawler


  1. Create a file named _momomh.js and copy the JS code above into it after a little tidying.

  2. Create a file named momomh_com.py and write the crawler logic in it.

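  3. #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    # Time: 2020/10/28 15:35
    # Author: Amd794
    # Github: https://github.com/Amd794
    import json
    import re

    import execjs

    # get_response comes from the author's own threading_download_images module
    from threading_download_images import get_response


    class Momomh(object):
        @staticmethod
        def _momomh(detail_url):
            header = {
                'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1 Edg/85.0.4183.83',
            }
            response = get_response(detail_url, header=header)
            # Grab the loadConf object literal embedded in the chapter page
            load_conf = re.findall(r'var\s+loadConf\s*=\s*({.*?})', response.text, re.S)[0].strip('\n')
            # Quote the bare keys so the literal can be parsed as JSON
            word = ['i:', 'c:', 'k:', 'd:', 'l:', 'f:']
            for i in word:
                load_conf = load_conf.replace(i, f'"{i[0]}":')
            with open('../js/_momomh.js', encoding='utf-8') as js_file:
                ctx = execjs.get().compile(js_file.read(), cwd='../js/node_modules')
            # json.loads instead of eval, so page content is never executed
            data = ctx.call('getArr', json.loads(load_conf))
            # Remove the trailing _w_720 suffix (str.strip would strip characters, not a suffix)
            image_url = [re.sub(r'_w_720$', '', url) for url in data]
            return image_url


    if __name__ == '__main__':
        print(Momomh._momomh('https://m.momomh.com/view/ZJBBO.html'))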

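  4. Finally, integrate it into the main program and test it. Once it works, it can be deployed to a server to crawl the site and store the results in the database.

  5. Do a little configuration.

  6. Run it and check the final result.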


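The part of momomh_com.py that turns the page's loadConf literal into a Python dict can be isolated and tested without touching the network. The following is a minimal sketch using only re and json; extract_load_conf is a hypothetical name, and it assumes the literal looks like the one shown earlier (bare identifier keys, double-quoted string values, no '}' inside any value).

```python
import json
import re

# The object literal assigned to loadConf, up to the first closing brace.
LOAD_CONF_RE = re.compile(r"var\s+loadConf\s*=\s*(\{.*?\})", re.S)
# A bare key directly after '{' or ',' (for example  i:  or  k: ).
KEY_RE = re.compile(r"([{,]\s*)([A-Za-z_]\w*)\s*:")


def extract_load_conf(html: str) -> dict:
    """Find the loadConf literal in a page and parse it without eval."""
    match = LOAD_CONF_RE.search(html)
    if match is None:
        raise ValueError("loadConf not found in page")
    # Quote the keys so the JavaScript literal becomes valid JSON.
    quoted = KEY_RE.sub(r'\1"\2":', match.group(1))
    return json.loads(quoted)


sample = '''<script>var loadConf = {
    i:"QUJD",
    c:"0",
    k:"fd946a640a65eb1d",
    d:"momomh.com",
    l:"https://ae01.alicdn.com/kf/Uf8692d06f3694b03b1881ded2b087438H.png",
    f:"#cp_img"
};</script>'''

conf = extract_load_conf(sample)
print(conf["k"], conf["f"])
```

Here the i value is a short placeholder rather than the real ciphertext. Anchoring the key pattern to a preceding '{' or ',' avoids touching colons inside values such as the https:// URL.
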
Image merging


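After the download finishes, a problem shows up: the downloaded images have been cut into slices.

So the slices still have to be merged back together. The code is as follows: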

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Time: 2020/11/22 19:35
# Author: Amd794
# Github: https://github.com/Amd794
import os
import re
from shutil import copyfile

from PIL import Image


def f(s):
    """Sort key: the first number in the file name, or 999 if there is none."""
    try:
        return int(re.findall(r'\d+', s)[0])
    except IndexError:
        return 999


suffix = ['jpg', 'png', 'jpeg']
page = 5  # number of slices that make up one full image
file_list = [imgFileName for imgFileName in os.listdir('.') if
             imgFileName.endswith(tuple(suffix)) and '_w_144' in imgFileName]
file_list.sort(key=f)
file_groups = [file_list[i:i + page] for i in range(0, len(file_list), page)]
file_name = ''
for group in file_groups:
    print(f'----- processing group {group} -----')
    with Image.open(group[0]) as image:
        width, height = image.size
    to_image = Image.new('RGB', (width * page, height))  # create a new canvas
    for index, pic in enumerate(group):
        file_name = pic.replace('_w_144', '')
        # Paste the slices side by side, left to right
        with Image.open(pic) as strip:
            to_image.paste(strip, (width * index, 0))
    to_image.save(file_name)
    # Release the file promptly
    to_image.close()
for i in file_list:
    try:
        os.remove(i)
    except PermissionError:
        print(f'-----{i}PermissionError-----')

# Create (or truncate) error_urls.txt
with open('error_urls.txt', 'w') as fw:
    pass

# try_to_fix.py is a separate script that is not shown in this article.
# Note: as written, both paths point to the same file, which shutil.copyfile
# rejects with SameFileError.
copyfile('try_to_fix.py',
         os.path.join('./', 'try_to_fix.py'))
os.system("python try_to_fix.py")
os.remove(__file__)
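
The grouping and placement logic of the merge script can be checked without any image files. The sketch below reproduces only that logic in plain Python; plan_merges and numeric_key are hypothetical names, and the fixed strip_width of 144 is an assumption taken from the _w_144 marker (the script above reads the real width from the first slice instead).

```python
import re


def numeric_key(name: str) -> int:
    """Sort key: the first run of digits in the name, or 999 if there is none."""
    found = re.findall(r"\d+", name)
    return int(found[0]) if found else 999


def plan_merges(names, page=5, strip_width=144, marker="_w_144"):
    """Group slices and compute the x offset where each one is pasted."""
    strips = sorted((n for n in names if marker in n), key=numeric_key)
    plans = []
    for start in range(0, len(strips), page):
        group = strips[start:start + page]
        # As in the script above, the output takes the last slice's name.
        output = group[-1].replace(marker, "")
        offsets = [(pic, index * strip_width) for index, pic in enumerate(group)]
        plans.append((output, offsets))
    return plans


names = ["2_w_144.jpg", "10_w_144.jpg", "1_w_144.jpg", "cover.jpg"]
for output, offsets in plan_merges(names, page=2):
    print(output, offsets)
```

Sorting by the numeric key rather than alphabetically matters here: a plain string sort would place 10_w_144.jpg before 2_w_144.jpg and the slices would be pasted in the wrong order.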