
Notes on Qidian (起點中文網) Font-Based Anti-Scraping

Tags: xpath, python, web scraping

Please read this first:
Code皮皮蝦
Python爬蟲進階之起點中文網字型反扒保姆級教程!!! (Advanced Python scraping: a step-by-step tutorial on Qidian's font-based anti-scraping)
https://blog.csdn.net/llllllkkkkkooooo/article/details/108430930

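import re

import requests
from fontTools.ttLib import TTFont
from lxml import etree

url = "https://book.qidian.com/info/1025457786"
response = requests.get(url=url)
response.encoding = 'utf-8'
html_data = response.text

# Save the raw page so it can be inspected offline.
with open("d:/zhusc/反扒成功.html", "w", encoding="utf-8") as f:
    f.write(html_data)

# Regex approach: capture the 27 non-whitespace characters right before each label.
# The site serves Simplified Chinese, so the labels are matched in that form.
t1 = re.findall(r"(\S{27})</span></em><cite>万字", html_data)[0]  # word count, in units of 10,000
print(t1)
t2 = re.findall(r"(\S{27})</span></em><cite>万总推荐", html_data)[0]  # total recommendations, in units of 10,000
print(t2)
t3 = re.findall(r"(\S{27})</span></em><cite>周推荐", html_data)[0]  # weekly recommendations
print(t3)

# XPath approach: read the class attribute of the <span> that wraps each number.
selector = etree.HTML(html_data)

x1 = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[1]/span/@class'  # word count
a1 = selector.xpath(x1)[0]
print(a1)

x2 = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[2]/span/@class'  # total recommendations
a2 = selector.xpath(x2)[0]
print(a2)

x3 = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[3]/span/@class'  # weekly recommendations
a3 = selector.xpath(x3)[0]
print(a3)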

The output is different on every run: as soon as the page is refreshed, the obfuscated digits change, and the span class changes with them.
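Because both the span class and the digits change on every request, the font and the obfuscated text have to come from the same response. The sketch below illustrates that idea on a single HTML string. It assumes (this is not confirmed by the code above) that the page embeds an `@font-face` rule whose `.woff` file name is the same random name used as the span class. The helper `find_font_url` and the sample HTML are hypothetical.

```python
import re
from typing import Optional


def find_font_url(html: str, span_class: str) -> Optional[str]:
    """Return the first .woff URL in `html` whose file name equals `span_class`.

    Assumption: the random span class doubles as the font file name,
    e.g. class="AbCdEfGh" -> .../AbCdEfGh.woff. Check this against the saved page.
    """
    pattern = (
        r'''url\(['"]?(https?://[^'")]+/'''
        + re.escape(span_class)
        + r'''\.woff)['"]?\)'''
    )
    match = re.search(pattern, html)
    return match.group(1) if match else None


# Hypothetical snippet standing in for one downloaded response.
sample_html = (
    "<style>@font-face { font-family: AbCdEfGh; "
    "src: url('https://example.com/fonts/AbCdEfGh.woff') format('woff'); }</style>"
    '<em><span class="AbCdEfGh">&#100144;&#100146;</span></em><cite>万字</cite>'
)

print(find_font_url(sample_html, "AbCdEfGh"))   # URL of the matching font
print(find_font_url(sample_html, "OtherName"))  # None: no font with that name
```

In a real run, `html` would be `html_data` and `span_class` would be the value returned by the XPath query, both taken from one `requests.get` call rather than from two separate requests.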

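import requests
import time
from lxml import etree

url = "https://book.qidian.com/info/1025457786"
# XPath to the text of the first <span>, which holds the font-obfuscated word count.
x = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[1]/span/text()'

res = requests.get(url)
html = res.content  # raw bytes; lxml detects the encoding itself
selector = etree.HTML(html)
target = str(selector.xpath(x)[0])
print(target)  # still obfuscated at this point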
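The text printed here is still obfuscated: each character is a code point that only the page's custom font maps to a digit. Below is a rough sketch of the decoding step. It assumes the font's cmap maps those code points to English glyph names such as `one` or `period`, which should be verified against the font actually downloaded. The names `GLYPH_TO_CHAR`, `load_cmap`, and `decode_obfuscated`, and the sample cmap, are hypothetical.

```python
import io
from typing import Dict

# Assumed glyph names; inspect the real font's cmap before relying on them.
GLYPH_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "period": ".",
}


def load_cmap(font_bytes: bytes) -> Dict[int, str]:
    """Read the code point -> glyph name table from downloaded font bytes."""
    # Imported here so the decoder below can be tried without fontTools installed.
    from fontTools.ttLib import TTFont

    return TTFont(io.BytesIO(font_bytes)).getBestCmap()


def decode_obfuscated(text: str, cmap: Dict[int, str]) -> str:
    """Translate each obfuscated character via the cmap; keep unknown ones as-is."""
    decoded = []
    for ch in text:
        glyph_name = cmap.get(ord(ch))
        decoded.append(GLYPH_TO_CHAR.get(glyph_name, ch))
    return "".join(decoded)


# Hypothetical cmap and text, standing in for one response's font and span text.
sample_cmap = {100144: "three", 100145: "period", 100146: "one", 100147: "four"}
sample_text = "".join(chr(cp) for cp in (100144, 100145, 100146, 100147))
print(decode_obfuscated(sample_text, sample_cmap))  # 3.14
```

In a real run, `text` would be the `target` string and `cmap` would come from `load_cmap` applied to the font bytes fetched for that same response, since a font from an earlier request no longer matches.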