Crawler 2: urllib3, scraping 30 Baidu images
阿新 • Published: 2019-01-12
import re

import urllib3

# Download all the images on the Baidu home page
# 1. Locate the target data
# page_url = 'http://image.baidu.com/search/index?tn=baiduimage&ct=201326592&lm=-1&cl=2&ie=gb18030&word=%CD%BC%C6%AC&fr=ala&ala=1&alatpl=others&pos=0'
# http = urllib3.PoolManager()
# res = http.request('GET', page_url)
# print(res.data.decode('utf-8'))

# The Ajax request URL
ajax_url = 'http://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E5%9B%BE%E7%89%87&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&word=%E5%9B%BE%E7%89%87&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=&fr=&expermode=&force=&pn=30&rn=30&gsm=1e&1546957772498='
http = urllib3.PoolManager()
res = http.request('GET', ajax_url)
# print(res.data.decode())

# Capture each thumbURL value up to its closing quote, so the URL is not cut at a comma
img_urls = re.findall(r'"thumbURL":"(.*?)"', res.data.decode())
# print(img_urls)
# print(len(img_urls))

# Send a Referer header with every image request
headers = {
    'Referer': 'https://www.baidu.com/s?ie=utf-8&wd=%E5%9B%BE%E7%89%87'
}
for i, img_url in enumerate(img_urls):
    # print(img_url)
    img = http.request('GET', img_url, headers=headers)
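The listing stops right after each image is requested. Below is a minimal sketch of how the response bytes might then be written to disk. It is an assumption about the missing step, not the author's code: the helper names `extract_thumb_urls`, `save_images` and `download_thumbnails`, the `baidu_images` output directory and the `.jpg` extension are all hypothetical.

```python
import os
import re

# Hypothetical pattern: capture each thumbURL value up to its closing quote.
THUMB_URL_PATTERN = re.compile(r'"thumbURL":"(.*?)"')


def extract_thumb_urls(response_text):
    """Return every thumbURL value found in the decoded response text."""
    return THUMB_URL_PATTERN.findall(response_text)


def save_images(img_urls, fetch, out_dir="baidu_images"):
    """Fetch each URL with fetch(url) -> bytes and write it to <out_dir>/<index>.jpg.

    The .jpg extension is an assumption; the real content type is not checked.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for i, img_url in enumerate(img_urls):
        data = fetch(img_url)
        path = os.path.join(out_dir, f"{i}.jpg")
        with open(path, "wb") as f:  # binary mode, since image data is bytes
            f.write(data)
        saved.append(path)
    return saved


def download_thumbnails(ajax_url, referer, out_dir="baidu_images"):
    """Request the Ajax URL, extract the thumbnail URLs and save each image."""
    # Third-party dependency, imported here so the helpers above work without it.
    import urllib3

    http = urllib3.PoolManager()
    res = http.request("GET", ajax_url)
    img_urls = extract_thumb_urls(res.data.decode())
    headers = {"Referer": referer}

    def fetch(url):
        # Send the Referer header with every image request, as in the post.
        return http.request("GET", url, headers=headers).data

    return save_images(img_urls, fetch, out_dir)
```

Calling `download_thumbnails(ajax_url, headers['Referer'])` with the values from the listing would run the whole flow. Passing `fetch` in as a parameter keeps `save_images` testable without network access.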