Momomh (沫沫漫畫網): reverse-engineering the site's JS, crawling all resources into a database, and merging split images
阿新 • Published: 2021-07-01
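Site analysis
Open the target site: https://www.momomh.com/
Pick one comic as the analysis target: 《渴望:愛火難耐》
On the comic's detail page nothing needs to be reverse-engineered; the comic's information can be read directly from the page. Click into any chapter, for example 渴望:愛火難耐-第1話 (chapter 1).
Press F12 to open the developer tools and inspect the page source in the Elements panel. The entry point is an encrypted string inside one of the script tags.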
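Analyze this line of code from the inside out. The call _0x232c('0x7', 'T]C8') passes two constant strings to the function _0x232c, so it always produces the same return value. In other words, _0x232c('0x7', 'T]C8') is simply the constant string EReVr.
Working outward, _0xe1f02a[_0x232c('0x7', 'T]C8')] follows the same pattern: it is equivalent to _0xe1f02a['EReVr'] and also resolves to a fixed value.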
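The observation that a decoder call with constant arguments always yields the same string can be illustrated with a toy model. This is only a sketch: the table contents and the `decode` and `fold_constants` helpers are hypothetical, and the site's real `_0x232c` derives its strings differently (its second argument is not ignored there). The point is simply that each distinct call can be evaluated once and then read as a literal.

```python
# Hypothetical string table; the real one lives in the site's obfuscated script.
STRING_TABLE = ["log", "enc", "Utf8", "parse", "AES", "decrypt", "pad", "EReVr"]


def decode(index_hex: str, key: str) -> str:
    """Toy stand-in for _0x232c: constant arguments give a constant result."""
    # `key` is unused in this toy model; it only mirrors the two-argument call shape.
    return STRING_TABLE[int(index_hex, 16)]


def fold_constants(calls):
    """Evaluate each distinct (index, key) call once and map it to its literal."""
    return {call: decode(*call) for call in set(calls)}


seen_calls = [("0x7", "T]C8"), ("0x3", "e*R8"), ("0x7", "T]C8")]
literals = fold_constants(seen_calls)
print(literals[("0x7", "T]C8")])
```

In practice the same effect is obtained in the browser console by evaluating each decoder call at a breakpoint and substituting the printed string into the code being read.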
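Click the function printed in the console to jump to its definition; it sits just above our breakpoint.
A quick look shows that this function simply treats its first argument as a function and calls it with its second argument.
So the code at the breakpoint boils down to: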
function _0x317597(_0x1b9bc1) {
    var _0x42f71a = CryptoJS[_0x232c('0x11','e*R8')]['Utf8'][_0x232c('0x12','e*R8')](_0x1b9bc1['k']);
    var _0x2ea3c6 = CryptoJS[_0x232c('0x13','O3X#')][_0x232c('0x14','e*R8')](_0x1b9bc1['i'], _0x42f71a, {
        'iv': _0x42f71a,
        'padding': CryptoJS['pad'][_0x232c('0x15','oEHH')]
    });
    _0x2ea3c6 = _0x2ea3c6[_0x232c('0x16','Plzz')](CryptoJS[_0x232c('0x17','fWan')][_0x232c('0x18','H8Db')]);
    if (_0xe1f02a['wmyOd'](_0x2ea3c6, '')) {
        if (_0xe1f02a[_0x232c('0x19','(tyc')](_0xe1f02a[_0x232c('0x1a','RcQ4')], _0xe1f02a[_0x232c('0x1b','fWan')])) {
            return '';
        } else {
            return '';
        }
    }
    imgs = _0x2ea3c6[_0x232c('0x1c','88oY')]('|');
    if (_0xe1f02a[_0x232c('0x1d','O3X#')](imgs[_0x232c('0x1e','oEHH')], 0x0)) {
        s = '';
        len = imgs[_0x232c('0x1f','fyE)')];
        for (var _0xff1a4d = 0x0; _0xe1f02a[_0x232c('0x20','e*R8')](_0xff1a4d, len); _0xff1a4d++) {
            if (_0xe1f02a[_0x232c('0x21','jAwS')](imgs[_0xff1a4d][_0x232c('0x22','4of&')](_0xe1f02a[_0x232c('0x23','jAwS')]), -0x1)) {
                info = _0xe1f02a[_0x232c('0x24','fyE)')](_0x2ca615, imgs[_0xff1a4d]);
                w = _0xe1f02a[_0x232c('0x25','oEHH')](info[0x1], 0x96) ? 0x14 : 0x64;
                s += _0xe1f02a[_0x232c('0x26','Plzz')](_0xe1f02a[_0x232c('0x27','zw$3')](_0xe1f02a[_0x232c('0x28','wrg$')](_0xe1f02a[_0x232c('0x29','Vsp#')](_0xe1f02a['qcRQe'](_0xe1f02a[_0x232c('0x2a','Q]pH')](_0x232c('0x2b','saz)'), w), _0xe1f02a['sSLrn']), info[0x0]), _0xe1f02a[_0x232c('0x2c','H8Db')]), _0x1b9bc1['l']), '\x22>');
                continue;
            }
            if (_0x1b9bc1['c'] && _0xe1f02a[_0x232c('0x2d','yert')](_0x1b9bc1['c'], 0x0)) {
                var _0x3b6771 = _0x232c('0x2e','saz)')[_0x232c('0x2f','jAwS')]('|'),
                    _0x128fc0 = 0x0;
                while (!![]) {
                    switch (_0x3b6771[_0x128fc0++]) {
                    case '0':
                        k = _0xe1f02a[_0x232c('0x30','z#4F')](_0xff1a4d, 0x1);
                        continue;
                    case '1':
                        mod = _0xe1f02a['RboUH'](k, _0x1b9bc1['c']);
                        continue;
                    case '2':
                        if (k != 0x1 && _0xe1f02a[_0x232c('0x31','1pZZ')](mod, 0x0)) {}
                        continue;
                    case '3':
                        if (_0xe1f02a['ymPPG'](_0x1b9bc1['c'], 0x6)) {
                            if (_0xe1f02a[_0x232c('0x32','zw$3')] === _0xe1f02a[_0x232c('0x33','PgS1')]) {
                                return str[_0x232c('0x34','T]C8')](sp);
                            } else {
                                if (_0xe1f02a[_0x232c('0x35','PgS1')](k, 0x1) || k != 0x1 && mod == 0x1) {
                                    w = 0x64;
                                } else {
                                    w = 0x14;
                                }
                            }
                        }
                        continue;
                    case '4':
                        s += _0xe1f02a['TWFcO'](_0xe1f02a['WngaM'](_0xe1f02a['mTPxd'](_0xe1f02a['osuEz'](_0xe1f02a[_0x232c('0x36','4of&')] + w + _0xe1f02a[_0x232c('0x37','I0J#')], imgs[_0xff1a4d]), _0xe1f02a[_0x232c('0x38','#5gG')]), _0x1b9bc1['l']), '\x22>');
                        continue;
                    case '5':
                        w = _0xe1f02a['WPfTk'](0x64, _0x1b9bc1['c']);
                        continue;
                    }
                    break;
                }
            } else {
                s += _0xe1f02a[_0x232c('0x39','TX#a')](_0xe1f02a[_0x232c('0x3a','zw$3')](_0xe1f02a['qkGRr'](_0xe1f02a[_0x232c('0x3b','aS*w')]('<img\x20style=\x22width:100%;\x22\x20class=\x22lazy\x22\x20data-original=\x22', imgs[_0xff1a4d]), _0xe1f02a[_0x232c('0x3c','I0J#')]), _0x1b9bc1['l']), '\x22>');
            }
        }
        _0xe1f02a[_0x232c('0x3d','jAA%')]($, _0x1b9bc1['f'])[_0x232c('0x3e','u5iv')](s);
    }
}
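The forwarding wrappers described before the listing (a table entry that just calls its first argument with its second, or applies one operator to two arguments) can be modeled in a few lines. The sketch below is an illustration only: the table keys `call1`, `concat`, and `less_than` are made-up names, and which operator each real `_0xe1f02a[...]` entry wraps has to be checked in the debugger.

```python
import operator

# Hypothetical wrapper table modeled on _0xe1f02a; the key names are invented.
wrappers = {
    "call1": lambda fn, arg: fn(arg),  # first argument applied to the second
    "concat": operator.add,            # a + b
    "less_than": operator.lt,          # a < b
}


def build_img_tag_obfuscated(url: str) -> str:
    """Obfuscated style: every operation is routed through the wrapper table."""
    return wrappers["concat"](wrappers["concat"]('<img data-original="', url), '">')


def build_img_tag_plain(url: str) -> str:
    """The same logic after the wrappers are inlined by hand."""
    return '<img data-original="' + url + '">'


print(build_img_tag_obfuscated("a.jpg") == build_img_tag_plain("a.jpg"))
print(wrappers["call1"](len, ["a", "b"]))
```

Read this way, the long nested `s += ...` expressions in the listing are most likely plain string concatenations that build `<img>` tags, which is why the interesting value is the `imgs` array rather than the HTML assembled from it.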
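Set another breakpoint at the key point of this function, where the imgs variable is assigned.
Print imgs: it is exactly the result we want.
At this point the analysis is essentially done. What remains is writing the code that crawls the images, stores them in the database, and so on.
Putting it all together gives the following code: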
var loadConf = {
    // The Base64 ciphertext is one value; it is split into three literals only to keep the lines manageable.
    i: "2fwFfyil4wHJqrgEtXgpFAfgoiD47DksIXZNdbrHtA4C+iN5hH3rK3ZohZoz/tBeXkzqlFDtVhqHdceI/Lo7jUBW2z9JRmAWORxfrfO/fCP1E8jjGI4bpLDzisIaOi1X/lA0rv+pUieoftsDVSOq9hclmcV38tsTghaxT0Tqx0Z28sXK8PX93UjdLrdnqj1ESng8x25FAz9d4B5SANBO+NqKanBZ/kyYZ7q96OygRc+Qf7k29A792SQMtu20ZpA+/1PGgC4vpOZyS8No7CN7dSkfC+0tfqDCU3I6Bhixq13uJ114ryF8Cod+0d7WO5GakDr7mIjlemugfT3jprKlSFZKoNLlDt07M6MRT73QPZPIZxGkKiZlGAgYuIIWtvGXNy8wtsI7Olwkk9YIBD7TduUmMiWhNEvSqfxeVEsk1f6/r2/U/qPYJiWGgWKLwl4M0CXLaU2NV8htYyyLAA/6bSP7dTm6+hFmF/ktcJ6ow8bHsQpoVjjlIgERtptARrUjHlg567Mqk3IZRf3zQE3hqoN4iN9DvPlpKez8a8fBuPbdPB0jUj3xpCr8yoggXW9Sb9SxTXB1/yxKG2OhoboqoOK9rjxEZucp5P+AEae+UKpiN5j7SaOW8SEZ48wl6Ln0kRBmpfbodDronlR/vIXFhWZiHTEgLAifWk1fckwUEQg9IQPd1CeTlAGwgUm+1zYqKiziAs34arPp+faLL25RYrGkhU6OldrYStQsE7TDj1p1pWbDNiZBzA3+H7en2wXIBEqSvC+FXL9ODBooB9DSaGbjIUWrQ6Q+QToUVdU8uFs5siQGFQ1cpI8GAyqUD/NQPMF4mjG2IqYJUvCyJj8ZEgXG0FUAj24H1FMAWn4W3h4D+zrmGFHR+Q5jecg+tZSrsyYG0tpTJZ59lwi/+Iw8bcXALVMD1QXfkgosN1M2gMl5sBGgkGMFl2hivRs37RmUMVic2PceW9pLQO1DQLyWMfP8YhYSdjegKM7m/wEwcu9FhN3DszNgkCGhWCqIuiJfzwRIMrSozs9RSl7CcLaekWWF+08IPFcLVWCuiOTKKNXjsOZ/4VtgLCkBMfDQwVmI1pwgYwXyeOcE37PBgGqy229hafx+KPkGPBtGXMCEE1SG/9GBEU1JdvQthGmtMkMWFQ9UZS00VvwGdYArZNPXEOgjZEQRlKwRvZ/dtVRpH6T3VSPudjNxaiLVjYvhP4lXtuCHSXQ1glWIFMM+14ZBl+7VQEOAJ11+Yggqskbv/WEu0PxpK8EvnHx4QTlo0KHXwTNzz29CpejJ5LZwDBKogCsaAlkKDflfNRkhaxpJavkqi2SOX5q3R1CU1bhsPyx00c7mRnv1LIY5fXqNLLoDjlzq91tE9FdqudOuJWR/GciSCQnaXzd+Y0OgTDBN2Szach9bjr2uzW2JuoN945vHfHvKUxdcBPy1eVSqRwjkXA8zpsgETxkRutWBeW74ZQGnlDb4QgHxsxTFJd4nHJydV2W1YZd6lOosO7C6Ryl34b1MLq8qL/zgwArt/xe1qHuY2PMKIpC+zBOX/WHjxWsZs9c4RU1akfnkcl5tCxnjl1pI4NyDpEEjE2RHhXHVAQayr84tAMtNcdLoVdl9cJWRKJ87wfXfgCED5zZLUxGbg7CXk8iQZHE+RZnEQ3m979Xipn0sbT1wtqB4y5B2oFAGzX5CfrAMj3Z8tOXMftj9EZBB+Ms8Lfz1Fr0wvcT2NUwUdvdf4ZXk99r2Z2gNrEJEG9yU6lFOLONVCwkDBGHqD3J5FL7P6xHwUXTb8mXILtB2h9+hdu4s6wrHJ1y0THBM2G42DE8DXf2Ca0sztlvFvxAOOWqYuT4ENev9ows0lkXDclchGIiQ+LVGpBwBWPpWhFiZeM16UGzC0C/nSL4irO4SXvDltdhcSEuQRAxM4mQyJB0pQs3k8WDi1fi0qN8lUjnPszkun4PImxZEiVw2KLKActzPqVW+LTT2R9KkD6S" +
       "bHNRBXyMdt5FSx9UkLZVa0urCweWQvKe73xmcp1S6Jkr5Ifmi21hxiCbMx50sOm6EkIRiHVhhzIEftTKVXH/ioDzUY43ROxeqTkmHc7fdpu9l0esNGnTMF+emucq9G9IsoiWPvLEnHURMlbNeKtHS5Y8K8G9cyGHe15+KsqZWv0OObys7WMzZuoKl+AtbaJCixzxdX/cHNuDPpEXvRbLVvicUjfPzt1sk6SYjd9pzyDjR5tcIPIRSoz87iJJUzH+yqTDREJKURmbIq8Pjfn+a8RU+LVyL3xFap3jSyCVPi0LbQzcGrg2E0d457I4RLTzj0JtjPnn7DnElzD+WAUNdnaKPfs6tgej47pczPTVf2TE6apwBk1joC1JICsCPN6QEm1CokvGWQgis+1rpPi2hEuC+FPPNqOfE4BTpbxBfyl/QsGEwu1VBvGXJmcgsuh7ogvGXUmx2Xo00TMTgoLrc+2t9cWtuMq1T9ralJ1wxnKFH29ZOVVvPDHZw/uzZ98f4u5L1wpQQ4PxzzwJp0DLOwgxV9vRbznZnQBWT2ABQ6z8786WdkJ6srjtzUNVKesplg+aOLeoiRLAE9UrjUTmICzS0B1u4FVAQAIldDGOmtLNTfIP7TALmyLzKnxKfCnQSwWf3ZXVEBrZugzbTHSh0uU+oiwHz9Nemgsw/HW6qwSoFii2hIRS+EKWvkIgsnCZfZP8CHZss7Cy6DEeL5GZA6jMNMNbnnafJxyqM4K6rNljP9dUFsDeWfvCFYpoOwG4JuiCSR/O9cYed05wl0Qk88pEhRm5VkmUoBmM1AnBgki4W3OEFZaALhhzI78EgfpG+Pj5NfvetvVCcwLzAIZZhWKX1pK/P6Z0fxF9vvVcqqNr0vMY4h6x2oCzK0FdeBN4lZrjnveXMDYti1nVT5mgfgmJt9xJmYpvWd7/we/zyoBESkkpPaNlyZMzBoBQrrXIzxQ2dUsi/dQVf/hfajXRKn4FdK9Upz9ky9BU9HYZR/1AyOI0LIlPc0ve2ZnB/2ZsOuoR/eWkSHuQukOlf7eG2fXUXV9gHwOqtyNgGvymOQk4rHi4hfUns8LExCjwcgb9nDQiZgZvqcaSqPB3quqN1zSTIhk12Was2C8QbbPo9hhe4i6cK4hec9VWm4sf9OKa8A7PCQgSG/dywoS+LTHaQ0vZjFqREE3Z6rKp/0Mbrffpfd2p25lxKDX7oVtmJBLmHwu8AEZWzijT1H495tz+2b8/r1RhKfgaeV1wyXVSb/AYSJAMvHu3RMnMQFkLoz+60ltuOM4HBXXUZyYlYARMyXV6PFTNnnfb8aJfldHB7CJyVBcUtSpbtybDL4+tLfWMXFfnILszUrOkQaM1SXw+6Rw2KFvkMwkSryeeVDI9j+4vovfL1I9iBt5wh/F2AF/phA08vxvRWcTcjM8RYowEE2uhdnk4q6ur2Ev00N8ggOKhnLG56yuRm2j+T5D8exgfbeiiN/iz4YP16wAvVcWX96PC/eezKNvnpu8NcenId6B4UkB2qwiP2/UuDJUpkxSCh2WAovk8mGYpXDLl4Ev1Aby5m51ePG9WGK57Rm4QQouJ45mYeZUZtf3pNllMBONTfLPtutp/TwTNJ2Y5/OqnLoXsNgo5L0Gpv4/ZeGu6oA0pmWy2Db/Jk/n0PjjLhLPtvI75Ew7H7jGtwvtKcwydsmhzjJL86X9p82piAtDUEtIC0014n3inh9/6HMJQyh2ItjSxlBxvicuhiGUf5xl0tRaiEKEchH+7R9k/s/w3iveUhGJmVBQbFDyxjtqLh8jbEjwkuD1rgihq0gDVEVcBJ8FAtz2jXNXXPlLe7FFhEebryiNnDswA8wMLMyZ5xJgnUyfxqsT5oeUVfEEkqWtTUjzRY5xCfDqgAWjvFZQg7pDkhlHSPlvsePreNGFjPpphxgjRWURKhZXXT0j6VT5PDAmlPy03pRS3i8k64WeWlOkIqb7Evp4aDTjZh/ZTVaKzeXnQ4iGeKcJTNTnN" +
       "/LLQUO3Y8nhLio18M8S79rR5/4sG+zP6yO10ThOyuRlTDc/9weWEoQQEt+4TbqfB4ORHAix7S/IpCGdzV2O8ifku5/v34KMjaRWLn0UgeqfwDO2y1P7W08jOr7vkYIyzcnUUpLV/5xJ76UBiXcVbGMCg/f/Uv9dz/RENKaOEnnB1J5uvrZ4tFHc6eatNLFO9622mgGiTI6MSs6Hse9zyBE76qhDZbiDH2ENEwrTGZzXZ5YKolWlCsIau7iH/a3r/LA6iOJp28QJckUM6MFY1L8kQO/qay81528M+8Bg3U0ebGDzccsg8FmLRCNH5OBoStmLkcEKVQjBvKmKKafxbxwEz3jyW8zPkli6LqItp9Oy+Pf2NYpMSEh9r67GL2GClmrz0FfLczrHoj80M68oyRqt+EF4gzFfMMjipzzOnuVFTzwhFsyjFkEKKDY7UDCzko32pAKgd2YkqmurVa4A8/cYeat+ugcheKjkzWx3KQ1ttkXZa+gqEp1wNGCyosQVxiwdN/3SNi7ra0NGvMahLMIByJmGOidoO/efc/1kUJ7fqtVYYOJab2TLPzTAaerkMBW8WLCsFWpet05drHspv+nO3heo+mN7EF3oG6COEmJ8RdWcvDVqLQ8QPY3phg75ksqGqDYExRUZoJGsbax/2tXo8bQx5WaZMNGEXPZMeQoDvSDyxLRdIRv4k4TXRWccxSg9QNR+PZqCQZsp7bYZl/4NZ/GEU=",
    c: "0",
    k: "fd946a640a65eb1d",
    d: "momomh.com",
    l: "https://ae01.alicdn.com/kf/Uf8692d06f3694b03b1881ded2b087438H.png",
    f: "#cp_img"
};
// The key string doubles as the IV.
var a = CryptoJS.enc.Utf8.parse(loadConf['k']);
console.log(a);
var b = CryptoJS.AES.decrypt(loadConf["i"], a, {
    'iv': a,
    'padding': CryptoJS.pad.Pkcs7
});
// The decrypted text is the list of image URLs joined with '|'.
var c = b.toString(CryptoJS.enc.Utf8);
console.log(c);
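For readers who would rather not shell out to Node, the same decryption can probably be done directly in Python. The sketch below is not from the original article. It assumes CryptoJS is running with its defaults here (CBC mode, and a string ciphertext interpreted as Base64), that the `k` value is used as both key and IV as in the JS above, and that the third-party `pycryptodome` package is installed. The function names `pkcs7_unpad` and `decrypt_image_list` are hypothetical.

```python
import base64

BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_unpad(data: bytes) -> bytes:
    """Remove PKCS#7 padding, validating it first."""
    if not data or len(data) % BLOCK_SIZE:
        raise ValueError("data length is not a multiple of the block size")
    pad_len = data[-1]
    if not 1 <= pad_len <= BLOCK_SIZE or data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad_len]


def decrypt_image_list(cipher_b64: str, key_text: str) -> list:
    """Decrypt loadConf['i'] with loadConf['k'] and split the result on '|'."""
    # Imported here so the padding helper stays usable without the dependency.
    from Crypto.Cipher import AES  # third-party: pip install pycryptodome

    key = key_text.encode("utf-8")               # assumed to be 16 bytes (AES-128)
    cipher = AES.new(key, AES.MODE_CBC, iv=key)  # key reused as IV, as in the JS
    plain = pkcs7_unpad(cipher.decrypt(base64.b64decode(cipher_b64)))
    return [part for part in plain.decode("utf-8").split("|") if part]
```

Calling `decrypt_image_list(conf["i"], conf["k"])` with the values from `loadConf` should then yield the same list that the JS prints, provided the assumptions above hold for the live site.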
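Writing the crawler
Create a file named _momomh.js and copy the JS above into it after a little tidying. The Python code that follows calls a function named getArr, so the tidied script has to expose the decryption under that name.
Create a file named momomh_com.py and write the crawler logic in it: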
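#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Time: 2020/10/28 15:35
# Author: Amd794
# Email: [email protected]
# Github: https://github.com/Amd794
import ast
import re

import execjs

# get_response comes from the author's own helper module, which is not shown here.
from threading_download_images import get_response

SUFFIX = '_w_720'


class Momomh(object):
    @staticmethod
    def _momomh(detail_url):
        header = {
            'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1 Edg/85.0.4183.83',
        }
        response = get_response(detail_url, header=header)
        # Grab the loadConf object literal embedded in the chapter page.
        load_conf = re.findall(r'var\s+loadConf\s*=\s*({.*?})', response.text, re.S)[0].strip('\n')
        # Quote the bare JS keys so the literal can be parsed as a Python dict.
        for key in ['i:', 'c:', 'k:', 'd:', 'l:', 'f:']:
            load_conf = load_conf.replace(key, f'"{key[0]}":')
        with open('../js/_momomh.js', encoding='utf-8') as js_file:
            ctx = execjs.get().compile(js_file.read(), cwd='../js/node_modules')
        # literal_eval only accepts literals, so remote page content is never executed.
        data = ctx.call('getArr', ast.literal_eval(load_conf))
        # str.strip('_w_720') would strip a character set, not the suffix, so cut it explicitly.
        image_url = [url[:-len(SUFFIX)] if url.endswith(SUFFIX) else url for url in data]
        return image_url


if __name__ == '__main__':
    print(Momomh._momomh('https://m.momomh.com/view/ZJBBO.html'))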
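4. Finally, integrate it into the main program and test it. Once it works, it can be deployed to a server to crawl the site and store the results in the database.
5. Do a little configuration.
6. Run it and check the final result.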
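One fragile spot in the crawler is turning the page's `loadConf` literal into a Python dict by textual key replacement. A possible alternative, sketched here under the assumption that every value in `loadConf` is a double-quoted string as in the sample shown earlier, is to pull out the key/value pairs with a regular expression. The helper names `parse_load_conf` and `drop_suffix` are hypothetical and not part of the original project.

```python
import re


def parse_load_conf(page_source: str) -> dict:
    """Extract loadConf's key/value pairs without evaluating page content."""
    match = re.search(r"var\s+loadConf\s*=\s*\{(.*?)\}", page_source, re.S)
    if match is None:
        raise ValueError("loadConf not found in page source")
    # Each entry looks like  key:"value"  (assumed: double-quoted, no escapes).
    pairs = re.findall(r'(\w+)\s*:\s*"([^"]*)"', match.group(1))
    return dict(pairs)


def drop_suffix(url: str, suffix: str = "_w_720") -> str:
    """Remove a trailing size marker; unlike str.strip, this removes only the exact suffix."""
    return url[: -len(suffix)] if url.endswith(suffix) else url


sample = 'var loadConf = { i:"abc+/=", c:"0", k:"fd946a640a65eb1d", l:"https://x/y.png", f:"#cp_img" };'
conf = parse_load_conf(sample)
print(conf["k"], conf["l"])
print(drop_suffix("https://x/1.jpg_w_720"))
```

The `drop_suffix` helper exists because `str.strip('_w_720')` treats its argument as a set of characters, so it can eat legitimate trailing digits of a file name as well as the marker.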
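Merging the images
8. Once the download finishes, a problem shows up: the downloaded images have been cut into slices.
So the slices still have to be merged back together. The code is as follows: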
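#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Time: 2020/11/22 19:35
# Author: Amd794
# Email: [email protected]
# Github: https://github.com/Amd794
import os
import re
from shutil import copyfile

from PIL import Image


def f(s):
    # Sort key: the first number in the file name; names without digits sort last.
    try:
        return int(re.findall(r'\d+', s)[0])
    except IndexError:
        return 999


suffix = ['jpg', 'png', 'jpeg']
page = 5  # number of slices that make up one full page
file_list = [imgFileName for imgFileName in os.listdir('.')
             if imgFileName.endswith(tuple(suffix)) and '_w_144' in imgFileName]
file_list.sort(key=f)
file_groups = [file_list[i:i + page] for i in range(0, len(file_list), page)]
for group in file_groups:
    print(f'-----Processing group {group}-----')
    with Image.open(group[0]) as image:
        width, height = image.size
    # Size the canvas to the group, so a short final group leaves no blank area.
    to_image = Image.new('RGB', (width * len(group), height))
    for index, pic in enumerate(group):
        # Close each slice right after pasting so it can be deleted later.
        with Image.open(pic) as part:
            to_image.paste(part, (width * index, 0))
    # The merged page takes the name of the group's last slice, minus the marker.
    to_image.save(group[-1].replace('_w_144', ''))
    to_image.close()
for i in file_list:
    try:
        os.remove(i)
    except PermissionError:
        print(f'-----{i} PermissionError-----')
# Create (or truncate) error_urls.txt.
with open('error_urls.txt', 'w'):
    pass
# try_to_fix.py is the author's helper script and is not shown in this article.
# As written, source and destination are the same file, which makes copyfile raise
# SameFileError; point the source at wherever try_to_fix.py actually lives.
copyfile('try_to_fix.py', os.path.join('./', 'try_to_fix.py'))
os.system("python try_to_fix.py")
os.remove(__file__)  # the script deletes itself when it is done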
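The two details of the merge script that are easiest to get wrong are the ordering of the slices and the horizontal offsets at which they are pasted. The sketch below isolates both as pure functions so they can be checked without any image files. The helper names are hypothetical, and the file names in the example are invented to match the `_w_144` pattern the script filters on.

```python
import re


def numeric_key(name: str) -> int:
    """Sort by the first number in the name, so '10_...' comes after '2_...'."""
    found = re.findall(r"\d+", name)
    return int(found[0]) if found else 999


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def paste_offsets(widths):
    """Return the x offset of each slice and the total canvas width."""
    offsets, x = [], 0
    for width in widths:
        offsets.append(x)
        x += width
    return offsets, x


names = ["10_w_144.jpg", "2_w_144.jpg", "1_w_144.jpg"]
print(sorted(names, key=numeric_key))
print(chunk(list(range(7)), 5))
print(paste_offsets([144, 144, 100]))
```

Accumulating the actual slice widths, rather than multiplying the first slice's width by a fixed count, also copes with slices of unequal width, should the site ever produce them.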