A Simple Python Crawler
阿新 • Published: 2018-03-26
Using the urllib.request and re modules

    from urllib.request import urlopen, urlretrieve  # network access
    import re

    # Search-results page to scrape
    url = 'https://image.baidu.com/search/index?tn=baiduimage&ct=201326592&lm=-1&cl=2&ie=gbk&word=%C3%C0%C5%AE%CD%BC%C6%AC&fr=ala&ala=1&alatpl=adress&pos=0&hs=2&xthttps=111111'
    # Open the page
    html = urlopen(url)
    # Read the HTML source; decode() turns the bytes into a str (UTF-8 by default)
    obj = html.read().decode()
    # Use re to find every objURL link; (.*?) is a non-greedy capture of each URL
    urls = re.findall(r'"objURL":"(.*?)"', obj)
    index = 1
    for url in urls:
        try:
            # Save as .jpg when the URL ends in .jpg, otherwise as .png
            if re.search(r'\.jpg$', url):
                print('downloading........%d' % index)
                urlretrieve(url, 'pic' + str(index) + '.jpg')
            else:
                print('downloading........%d' % index)
                urlretrieve(url, 'pic' + str(index) + '.png')
            index += 1
        except Exception:
            print('download error....%d' % index)
    else:
        # for/else: runs once the loop has finished without a break
        print('download complete')
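The extraction step can be tried offline before hitting the network. Below is a minimal sketch, assuming the page source embeds entries of the form "objURL":"..." as the regular expression above expects. The helper names extract_image_urls and pick_filename, the sample string, and the list of accepted extensions are hypothetical and only for illustration. Unlike the script above, this sketch takes the extension from the URL path, so a trailing query string does not hide it.

```python
import os
import re
from urllib.parse import urlsplit

# Same pattern as in the script above: non-greedy capture of each objURL value
OBJ_URL_PATTERN = re.compile(r'"objURL":"(.*?)"')


def extract_image_urls(page_source):
    """Return every objURL value found in the page source, in order."""
    return OBJ_URL_PATTERN.findall(page_source)


def pick_filename(url, index):
    """Build a local filename such as pic1.jpg from the URL's path extension."""
    path = urlsplit(url).path              # drops any ?query part
    ext = os.path.splitext(path)[1].lower()
    # Assumption: fall back to .png for anything unrecognized, as the script above does
    if ext not in ('.jpg', '.jpeg', '.png', '.gif'):
        ext = '.png'
    return 'pic%d%s' % (index, ext)


# Hypothetical sample standing in for the downloaded HTML
sample = ('{"objURL":"http://example.com/a.jpg","x":1},'
          '{"objURL":"http://example.com/b.png?size=large"}')

urls = extract_image_urls(sample)
names = [pick_filename(u, i) for i, u in enumerate(urls, start=1)]
print(urls)
print(names)
```

Each (url, name) pair could then be passed to urlretrieve(url, name) inside the same try/except as in the original loop.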
Downloading a single image
Using the requests module

    import requests

    image_url = 'http://www.cnblogs.com/Images/Skins/BJ2008.jpg'
    # Fetch the image
    response = requests.get(image_url)
    # Write the raw bytes to a local file
    with open('outlook.jpg', 'wb') as f:
        f.write(response.content)
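The snippet above writes whatever the server returns, including an error page. A slightly more defensive variant might look like the sketch below. It is not part of the original post: the function name save_image, the get parameter (there only so the helper can be exercised without a network connection), and the 10-second timeout are assumptions. requests.get with timeout, Response.raise_for_status() and Response.content are standard parts of the requests API.

```python
import requests


def save_image(image_url, filename, get=requests.get, timeout=10):
    """Download image_url and write it to filename; return the byte count.

    `get` defaults to requests.get; passing a stand-in is a hypothetical
    convenience for testing, not something the original post does.
    """
    response = get(image_url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of saving an error page
    response.raise_for_status()
    with open(filename, 'wb') as f:
        f.write(response.content)
    return len(response.content)
```

Calling save_image('http://www.cnblogs.com/Images/Skins/BJ2008.jpg', 'outlook.jpg') would then reproduce the original example, assuming that URL is still reachable.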