Scraping Images Related to a Keyword
阿新 • Published: 2020-12-20
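Goal
I saw someone on Douyin demo a small app: type in any keyword and it automatically saves related images from the web. Out of curiosity, I decided to try building it myself.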
Tools
Programming language: Python
IDE: PyCharm
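Approach
My first idea was to let Baidu Images do the searching: request the result page for the keyword, then use Python to save the images that the page lists.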
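The extraction half of this idea can be tried in isolation, without touching the network. The snippet below is a minimal sketch, assuming the result page carries each image address in the form "objURL":"..." (the pattern the implementation further down searches for). The sample_html string and the name extract_image_urls are made up for the illustration.

```python
import re

# Pattern assumed from the page format targeted in this post: "objURL":"<address>",
OBJ_URL_PATTERN = re.compile(r'"objURL":"(.*?)",', re.S)


def extract_image_urls(html):
    """Return every address found after an "objURL" key, in page order."""
    return OBJ_URL_PATTERN.findall(html)


# Hypothetical fragment standing in for a real result page
sample_html = (
    '{"thumbURL":"http://example.com/t1.jpg","objURL":"http://example.com/a.jpg",'
    '"fromURL":"x"},{"objURL":"http://example.com/b.png","fromURL":"y"}'
)

urls = extract_image_urls(sample_html)
print(urls)
```

Only the addresses that follow an "objURL" key are returned; the thumbURL entry in the sample is ignored because it does not match the pattern.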
Implementation
With the approach settled, it was time to write the code.
# -*- coding:utf-8 -*-
import os
import re

import requests


def download_pic(html, keyword):
    # The result page carries each image address as "objURL":"..."
    pic_url = re.findall('"objURL":"(.*?)",', html, re.S)
    i = 1
    print('Found images for keyword: ' + keyword + ', starting download...')
    file_path = 'F:/images/' + keyword
    # Create the target folder (and F:/images itself) if it does not exist yet
    os.makedirs(file_path, exist_ok=True)
    for each in pic_url:
        print('Downloading image ' + str(i) + ', URL: ' + str(each))
        try:
            pic = requests.get(each, timeout=5)
        except requests.exceptions.RequestException:
            # Covers connection errors as well as timeouts
            print('[Error] This image could not be downloaded')
            continue
        pic_dir = file_path + '/' + keyword + '_' + str(i) + '.jpg'
        with open(pic_dir, 'wb') as fp:
            fp.write(pic.content)
        i += 1


if __name__ == '__main__':
    word = input("Enter a keyword: ")
    url = 'http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + word + '&ct=201326592&v=flip'
    result = requests.get(url)
    download_pic(result.text, word)
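The search URL above is assembled by plain string concatenation, so a keyword containing characters such as & or a space could end up altering the query string. As a hedged alternative, here is a small sketch that builds the same URL with the standard library's urlencode. The parameter names and values are copied from the URL in the code above; the helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = 'http://image.baidu.com/search/flip'


def build_search_url(keyword):
    """Build the result-page URL with the keyword percent-encoded."""
    params = {
        'tn': 'baiduimage',
        'ie': 'utf-8',
        'word': keyword,
        'ct': '201326592',
        'v': 'flip',
    }
    return BASE_URL + '?' + urlencode(params)


print(build_search_url('cats & dogs'))
```

The same dictionary could also be handed to requests.get through its params argument, which performs the encoding itself.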
Problem and Solution
Problem: the console reports the following error
Traceback (most recent call last):
  File "F:\Python_Projects\WordCloudTest\GrabPics\Demo1.py", line 39, in <module>
    result = requests.get(url)
  File "F:\Python_Projects\WordCloudTest\venv\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "F:\Python_Projects\WordCloudTest\venv\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "F:\Python_Projects\WordCloudTest\venv\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "F:\Python_Projects\WordCloudTest\venv\lib\site-packages\requests\sessions.py", line 677, in send
    history = [resp for resp in gen]
  File "F:\Python_Projects\WordCloudTest\venv\lib\site-packages\requests\sessions.py", line 677, in <listcomp>
    history = [resp for resp in gen]
  File "F:\Python_Projects\WordCloudTest\venv\lib\site-packages\requests\sessions.py", line 166, in resolve_redirects
    raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
Solution: define headers containing a browser User-Agent and pass them to requests.get()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3760.400 QQBrowser/10.5.4083.400',
}
Complete code
# -*- coding:utf-8 -*-
import os
import re

import requests


def download_pic(html, keyword):
    # The result page carries each image address as "objURL":"..."
    pic_url = re.findall('"objURL":"(.*?)",', html, re.S)
    i = 1
    print('Found images for keyword: ' + keyword + ', starting download...')
    file_path = 'F:/images/' + keyword
    # Create the target folder (and F:/images itself) if it does not exist yet
    os.makedirs(file_path, exist_ok=True)
    for each in pic_url:
        print('Downloading image ' + str(i) + ', URL: ' + str(each))
        try:
            pic = requests.get(each, timeout=5)
        except requests.exceptions.RequestException:
            # Covers connection errors as well as timeouts
            print('[Error] This image could not be downloaded')
            continue
        pic_dir = file_path + '/' + keyword + '_' + str(i) + '.jpg'
        with open(pic_dir, 'wb') as fp:
            fp.write(pic.content)
        i += 1


if __name__ == '__main__':
    word = input("Enter a keyword: ")
    url = 'http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + word + '&ct=201326592&v=flip'
    # Send a browser-like User-Agent; without it the request ended in TooManyRedirects
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3760.400 QQBrowser/10.5.4083.400',
    }
    result = requests.get(url, headers=headers)
    download_pic(result.text, word)
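Result
Run the program and enter the keyword “蘇州” (Suzhou); the program saves the scraped images to the specified path.
As the figure below shows, some of the saved images cannot be displayed. My guess is that the source URLs of those images are no longer valid, but the exact cause is unclear for now.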
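One way to narrow this down would be to look at what was actually written to disk. An unconfirmed possibility is that some servers answer with something other than image data (an error page, for instance), which the script then saves under a .jpg name. The sketch below checks the leading bytes of a payload against the well-known signatures of JPEG, PNG and GIF files. The helper name sniff_image_type and the two sample payloads are hypothetical.

```python
# Leading bytes ("magic numbers") of a few common image formats
IMAGE_SIGNATURES = {
    'jpg': b'\xff\xd8\xff',
    'png': b'\x89PNG\r\n\x1a\n',
    'gif': b'GIF8',
}


def sniff_image_type(data):
    """Return 'jpg', 'png' or 'gif' if data starts like that format, else None."""
    for ext, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return ext
    return None


# Hypothetical payloads: the start of a JPEG file, and an HTML error page
print(sniff_image_type(b'\xff\xd8\xff\xe0\x00\x10JFIF'))
print(sniff_image_type(b'<html><body>403 Forbidden</body></html>'))
```

In the download loop, a check like sniff_image_type(pic.content) could be used to skip payloads that are not images, and to choose the file extension instead of always writing .jpg.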