Python 2.7 image-download crawler
阿新 • Published: 2018-11-03
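Some notes from writing an image crawler:
1. Open the page that hosts the images you want to download and check which URL the page actually requests (I use Google Chrome).
2. Click the image you want and inspect where it sits in the page source, which makes the img link easier to find.
3. Once you have found it, you can start writing code.
4. The main difficulty is pinning down exactly where the img tag and its src="" attribute are; this takes a regular-expression search or BeautifulSoup. If you are not familiar with regular expressions or BeautifulSoup, these two videos may help:
   BeautifulSoup: https://www.youtube.com/watch?v=KLq0W1wUVmw&index=3&list=PLXO45tsB95cIuXEgV-mvYWRd_hVC43Akk
   Regular expressions: https://www.youtube.com/watch?v=l1MAW1z641E
5. After the search succeeds, download the images to local files.
Below is the code I wrote myself.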
Unimproved version:
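#coding=utf-8
import os

import requests
from bs4 import BeautifulSoup

url = "http://www.ngchina.com.cn/magazine/2018/10/1337.html"
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')
# Select the <a class="img_btn"> links that wrap the images.
all_img = soup.find_all('a', {'class': 'img_btn'})

root = "C://img222//"
# os.makedirs raises an error if the directory already exists, so check first.
if not os.path.isdir(root):
    os.makedirs(root, mode=0o777)

for ul in all_img:
    imgs = ul.find_all('img')
    for ull in imgs:
        imgss = ull['src']
        r = requests.get(imgss, stream=True)
        # Use the last segment of the image URL as the local file name.
        path = root + imgss.split('/')[-1]
        try:
            with open(path, 'wb') as f:
                # Write the response to disk in small chunks.
                for chunk in r.iter_content(chunk_size=100):
                    f.write(chunk)
            print(path)
        except Exception:
            print("ERROR")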
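The script above finds the images with BeautifulSoup. For the regular-expression route mentioned in step 4, here is a minimal sketch on a made-up HTML fragment. The names `sample_html`, `IMG_SRC_RE` and `find_img_srcs` are hypothetical and not part of the original code, and the pattern only handles simple quoted src attributes, so BeautifulSoup is likely the safer choice on real pages.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical HTML fragment, made up for illustration only.
sample_html = '''
<a class="img_btn" href="#"><img src="http://example.com/a/first.jpg" alt="1"></a>
<a class="img_btn" href="#"><img alt="2" src='http://example.com/b/second.png'></a>
<p>no image here</p>
'''

# Match <img ...> tags and capture the quoted value of their src attribute.
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_img_srcs(html):
    """Return the src value of every <img> tag found in html, in order."""
    return IMG_SRC_RE.findall(html)


srcs = find_img_srcs(sample_html)
for src in srcs:
    # The last path segment serves as the local file name, as in the scripts in this post.
    print(src.split('/')[-1])
```

On a real page, `find_img_srcs` would be called on the downloaded HTML text instead of `sample_html`.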
Improved version:
#coding=utf-8
import os

import requests
from bs4 import BeautifulSoup


def get_url(url):
    # Send browser-like headers so the request looks like a normal page visit.
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36",
        "referer": "http://www.ngchina.com.cn/magazine/2018/10/1337.html"
    }
    res = requests.get(url, headers=headers)
    return res


def main():
    url = "http://www.ngchina.com.cn/magazine/2018/10/1337.html"
    res = get_url(url)
    html = res.text
    soup = BeautifulSoup(html, 'lxml')
    all_imgs = soup.find_all('a', {'class': 'img_btn'})

    root = "C://img222//"
    # Make sure the target directory exists before writing into it.
    if not os.path.isdir(root):
        os.makedirs(root)

    for ul in all_imgs:
        imgs = ul.find_all('img')
        for l in imgs:
            imgss = l['src']
            r = requests.get(imgss, stream=True)
            # Use the last segment of the image URL as the local file name.
            path = root + imgss.split('/')[-1]
            try:
                with open(path, "wb") as f:
                    for chunk in r.iter_content(chunk_size=128):
                        f.write(chunk)
                print(path)
            except Exception:
                print("ERROR")


if __name__ == "__main__":
    main()
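Both versions do the same two things for each image: derive a file name from the image URL, then write the response to disk chunk by chunk. A minimal sketch of those two steps as separate helpers follows. The names `filename_from_url` and `save_chunks` are hypothetical, a temporary directory stands in for C://img222//, and plain byte strings stand in for what `r.iter_content(...)` would yield.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def filename_from_url(url):
    """Take the last path segment of url, dropping any ?query or #fragment."""
    name = url.split('#')[0].split('?')[0].rstrip('/').split('/')[-1]
    return name or "unnamed"


def save_chunks(chunks, path):
    """Write an iterable of byte strings to path and return the bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Stand-in data for illustration; a real run would pass r.iter_content(chunk_size=128).
fake_chunks = [b"abc", b"", b"defg"]
target_dir = tempfile.mkdtemp()
target = os.path.join(target_dir, filename_from_url("http://example.com/img/photo.jpg?w=800"))
total = save_chunks(fake_chunks, target)
print("%s %d" % (target, total))
```

Splitting the steps this way keeps the file-name logic testable without touching the network, and stripping the query string avoids file names such as photo.jpg?w=800, which are not valid on Windows.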