Notes on Font-Based Anti-Scraping at Qidian (起點中文網)
阿新 • Published: 2021-02-02
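Please read this first:
Code皮皮蝦
Python爬蟲進階之起點中文網字型反扒保姆級教程!!! (Advanced Python Scraping: A Step-by-Step Tutorial on Qidian's Font-Based Anti-Scraping)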
https://blog.csdn.net/llllllkkkkkooooo/article/details/108430930?ops_request_misc=%25257B%252522request%25255Fid%252522%25253A%252522161119264116780255297604%252522%25252C%252522scm%252522%25253A%25252220140713.130102334.pc%25255Fall.%252522%25257D&request_id=161119264116780255297604&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2all
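import requests
import re
from fontTools.ttLib import TTFont
from lxml import etree

url = "https://book.qidian.com/info/1025457786"
response = requests.get(url=url)
response.encoding = 'utf-8'
html_data = response.text

# Save the raw page so the obfuscated markup can be inspected offline.
with open("d:/zhusc/反扒成功.html", "w", encoding="utf-8") as f:
    f.write(html_data)

# Approach 1: regex. Capture the 27 non-whitespace characters that sit
# immediately before each label.
t1 = re.findall(r"(\S{27})</span></em><cite>萬字", html_data)[0]  # word count, in units of 10,000 characters
print(t1)
t2 = re.findall(r"(\S{27})</span></em><cite>萬總推薦", html_data)[0]  # total recommendations, in units of 10,000
print(t2)
t3 = re.findall(r"(\S{27})</span></em><cite>周推薦", html_data)[0]  # weekly recommendations
print(t3)

# Approach 2: XPath. Read the class attribute of the <span> that wraps each number.
selector = etree.HTML(html_data)
x1 = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[1]/span/@class'  # word count (萬字)
a1 = selector.xpath(x1)[0]
print(a1)
x2 = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[2]/span/@class'  # total recommendations (萬總推薦)
a2 = selector.xpath(x2)[0]
print(a2)
x3 = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[3]/span/@class'  # weekly recommendations (周推薦)
a3 = selector.xpath(x3)[0]
print(a3)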
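As the screenshot shows, the output is different on every run. Each time the page is refreshed, the obfuscated digits change immediately, and so does the span class.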
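Since the obfuscation changes with every response, the captured string can only be decoded with a mapping taken from that same response. The following is a minimal sketch of the decoding step only, not the method of the tutorial linked above. It assumes the captured string is a run of numeric entities such as `&#100233;`, that the mapping is a code-point-to-glyph-name dict shaped like the one returned by fontTools' `TTFont.getBestCmap()`, and that glyphs are named with English words such as `one` and `period`. The function name, `GLYPH_TO_CHAR`, and the sample values are all hypothetical.

```python
import re

# Assumed glyph names; check them against the cmap of the font actually served.
GLYPH_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "period": ".",
}


def decode_entities(entity_text, cmap):
    """Decode a string like '&#100233;&#100235;' into readable digits.

    cmap maps code point -> glyph name and must come from the font that
    was served together with entity_text.
    """
    code_points = [int(n) for n in re.findall(r"&#(\d+);", entity_text)]
    return "".join(GLYPH_TO_CHAR[cmap[cp]] for cp in code_points)


# Hypothetical cmap and captured string, for illustration only.
sample_cmap = {100233: "three", 100235: "period", 100236: "seven"}
print(decode_entities("&#100233;&#100235;&#100236;", sample_cmap))  # 3.7
```

In a real run, `cmap` would be rebuilt on every request from the font referenced by the freshly downloaded page; a mapping saved from an earlier request would most likely decode to the wrong digits.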
import requests
from lxml import etree

url = "https://book.qidian.com/info/1025457786"
# Text content of the first obfuscated <span> (the word count).
x = '/html/body/div/div[6]/div[1]/div[2]/p[3]/em[1]/span/text()'
res = requests.get(url)
html = res.content
selector = etree.HTML(html)
target = str(selector.xpath(x)[0])
print(target)
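If the XPath `text()` result holds the already-decoded characters rather than `&#...;` entities (an assumption about how the parser hands back the span text), each character can be looked up by its code point with `ord()`. A minimal sketch under that assumption; the function name, both dicts, and the sample string are hypothetical stand-ins for data read from the font served with the page.

```python
def decode_obfuscated_text(text, cmap, glyph_to_char):
    """Map each obfuscated character to a readable one via its code point."""
    return "".join(glyph_to_char[cmap[ord(ch)]] for ch in text)


# Hypothetical stand-ins for the font's code-point -> glyph-name map
# and for the glyph-name -> digit table.
sample_cmap = {100233: "one", 100234: "two"}
sample_names = {"one": "1", "two": "2"}

# Three obfuscated characters built from the sample code points.
obfuscated = chr(100233) + chr(100234) + chr(100233)
print(decode_obfuscated_text(obfuscated, sample_cmap, sample_names))  # 121
```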