Downloading the ImageNet Dataset
阿新 • Published: 2019-01-28
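1. Getting the URLs
Go to the official site, www.image-net.org, and SEARCH for the image category you need. This walkthrough searches for sockets as the example; the query has to be typed in English (obviously). Depending on your network you may need a VPN to reach the site, though not always; setting one up is outside the scope of this post.
Click the result you want to open it.
Click the Downloads tab; a URLs option appears.
Click it and a frighteningly long list of URLs appears. Download it and save it to a txt file for later use.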
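The saved txt file holds one URL per line, and a list like this may contain blank lines, duplicates, or stray text. Below is a minimal sketch, in Python 3, of cleaning the list before downloading. The function names `clean_urls` and `load_urls` are hypothetical helpers introduced for illustration, not part of any ImageNet tooling.

```python
from urllib.parse import urlparse


def clean_urls(lines):
    """Return the unique http(s) URLs found in lines, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        try:
            parts = urlparse(url)
        except ValueError:
            continue  # malformed line, e.g. a broken IPv6 literal
        # Keep only lines that look like absolute web URLs.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def load_urls(path):
    """Read a URL list file (one URL per line) and clean it."""
    with open(path, encoding="utf-8", errors="replace") as url_file:
        return clean_urls(url_file)
```

Calling `load_urls` on the saved file returns a list that can be fed straight into a download loop; how many entries survive depends entirely on the list you saved.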
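2. Batch download with a Python script
Use Python's urllib libraries (urllib2 here) to batch-download the URLs collected above. The environment is Python 2.7; the code is as follows: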
# Python 2.7: download every URL listed in a text file, one URL per line.
# signal.alarm is only available on Unix-like systems.
import signal
import urllib2

path = '/home/hzc/Pictures/URL-TXT/watermeter.txt'


def handler(signum, frame):
    # Called by SIGALRM when a download exceeds the time limit.
    raise AssertionError


signal.signal(signal.SIGALRM, handler)

with open(path) as url_file:
    for line in url_file:
        url = line.strip()  # drop the trailing newline
        if not url:
            continue
        print(url)
        signal.alarm(5)  # allow each download at most 5 seconds
        try:
            # Optional fake header, if a server rejects the default User-Agent:
            # headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}
            # req = urllib2.Request(url, headers=headers)
            data = urllib2.urlopen(url).read()
            # Save under the last path segment of the URL, in the current directory.
            with open(url.split('/')[-1], 'wb') as out:
                out.write(data)
        except AssertionError:
            print('%s timeout' % url)
        except Exception:
            pass  # skip dead links and other download errors
        finally:
            signal.alarm(0)  # cancel any pending alarm
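For readers on Python 3, where urllib2 no longer exists, here is a rough equivalent as a sketch rather than a drop-in port. It swaps the SIGALRM timer for the `timeout` argument of `urllib.request.urlopen`, which limits each blocking socket operation rather than the total download time, so the behavior is similar but not identical. The names `filename_from_url` and `download_all`, and the numbered fallback file name, are assumptions made for illustration. Unlike the script above, it derives the file name from the URL path only, ignoring any query string.

```python
import http.client
import os
import urllib.request
from urllib.parse import urlparse

USER_AGENT = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) "
              "Gecko/20100101 Firefox/52.0")


def filename_from_url(url, index):
    """Use the last path segment as the file name, or a numbered fallback."""
    name = os.path.basename(urlparse(url).path)
    return name or "image_%06d.jpg" % index


def download_all(urls, out_dir, timeout=5):
    """Fetch each URL into out_dir and return the list of URLs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for index, url in enumerate(urls):
        try:
            request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
            with urllib.request.urlopen(request, timeout=timeout) as response:
                data = response.read()
        except (OSError, http.client.HTTPException, ValueError):
            # Dead link, timeout, broken response, or malformed URL:
            # record it and move on to the next one.
            failed.append(url)
            continue
        with open(os.path.join(out_dir, filename_from_url(url, index)), "wb") as out:
            out.write(data)
    return failed
```

Returning the failed URLs instead of silently passing makes it easy to retry them later or simply count how many links in the list are dead.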
This leaves you with a large batch of images. For renaming and labeling them, please see my other blog post.
Done.
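As a rough idea of what the renaming step can look like, here is a minimal Python 3 sketch of my own, not the method from the other post. It copies every file from a download directory into a separate output directory under a sequential, label-prefixed name. The function name `copy_with_label`, the naming pattern, and the `.jpg` fallback for files without an extension are all assumptions; the output directory is assumed to be different from the source directory.

```python
import os
import shutil


def copy_with_label(src_dir, dst_dir, label):
    """Copy files from src_dir to dst_dir as <label>_<number><ext>.

    Returns the new file names in the order they were written.
    """
    os.makedirs(dst_dir, exist_ok=True)
    names = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    new_names = []
    for number, name in enumerate(names, start=1):
        # Normalize the extension; assume .jpg when a file has none.
        ext = os.path.splitext(name)[1].lower() or ".jpg"
        new_name = "%s_%05d%s" % (label, number, ext)
        shutil.copyfile(os.path.join(src_dir, name), os.path.join(dst_dir, new_name))
        new_names.append(new_name)
    return new_names
```

Copying rather than renaming in place keeps the original downloads untouched, so a bad label or pattern can be fixed by deleting the output directory and running again.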