
Downloading the ImageNet Dataset

1. Get the URLs

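Go to the official site, www.image-net.org, and SEARCH for the image category you need. This post uses socket as the example. The query has to be in English (obviously). You may need a VPN to reach the site, though not always; if you do, look elsewhere for how to set one up.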

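Click the entry you want to open it.

Click the Downloads tab, and a URLs option appears.

Click it and a frighteningly long list of URLs shows up. Download it and save it to a txt file for later use.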
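The saved list often contains blank lines, duplicates, or stray text, so it can help to clean it before downloading. Below is a minimal sketch, assuming Python 3 and one URL per line; the function name `parse_url_list` and the sample URLs are made up for illustration and are not part of the original workflow.

```python
from urllib.parse import urlparse


def parse_url_list(text):
    """Return the unique http(s) URLs found in `text`, one per line, in order."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()  # drop the newline and surrounding spaces
        if not url or url in seen:
            continue  # skip blank lines and duplicates
        try:
            parts = urlparse(url)
        except ValueError:
            continue  # malformed entry
        # Keep only entries that look like web URLs with a host name
        if parts.scheme in ("http", "https") and parts.netloc:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical sample standing in for the contents of the saved txt file
sample = (
    "http://example.com/a.jpg\n"
    "\n"
    "http://example.com/a.jpg\n"
    "not a url\n"
    "https://example.com/b.jpg\n"
)
print(parse_url_list(sample))
```

To use it on the real file, read the txt file into a string first, for example `parse_url_list(open(path).read())`.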

2. Batch download with a Python script

Use Python's urllib2 library to batch-download the URLs you collected. This was written for Python 2.7. The code is as follows:

import signal
import urllib2

# Text file of image URLs saved from the ImageNet download page, one per line
path = '/home/hzc/Pictures/URL-TXT/watermeter.txt'


def handler(signum, frame):
    # Called on SIGALRM when a download runs past its time limit
    raise AssertionError


signal.signal(signal.SIGALRM, handler)

with open(path) as url_file:
    for line in url_file:
        url = line.strip()  # drop the trailing newline
        if not url:
            continue
        print(url)
        signal.alarm(5)  # give each download at most 5 seconds
        try:
            # Optional fake header, in case a server rejects the default user agent:
            # headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}
            # req = urllib2.Request(url=url, headers=headers)
            # data = urllib2.urlopen(req).read()
            data = urllib2.urlopen(url).read()
            # Save under the last path component of the URL
            with open(url.split('/')[-1], 'wb') as code:
                code.write(data)
            # Alternative (needs "import urllib"):
            # urllib.urlretrieve(url, '/home/hzc/Pictures/%s.JPG' % url.split('/')[-1])
        except AssertionError:
            print('%s timeout' % url)
        except Exception:
            pass  # skip dead links and other download errors
        finally:
            signal.alarm(0)  # cancel the pending alarm
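For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only an illustration under a few assumptions: the helper names `filename_from_url` and `download` are invented here, and the `timeout` argument of `urlopen` replaces the SIGALRM approach. That timeout applies to each blocking socket operation rather than to the whole download, so it is not an exact equivalent.

```python
import http.client
import os
import urllib.request
from urllib.parse import urlparse

# Browser-like header, in case a server rejects the default Python user agent
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) "
                  "Gecko/20100101 Firefox/52.0"
}


def filename_from_url(url):
    """Use the last path component of the URL as the local file name."""
    return os.path.basename(urlparse(url.strip()).path)


def download(url, out_dir, timeout=5):
    """Fetch one URL into out_dir; return the saved path, or None on failure."""
    url = url.strip()
    name = filename_from_url(url)
    if not name:
        return None  # nothing usable to name the file after
    try:
        req = urllib.request.Request(url, headers=HEADERS)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # Dead link, timeout, malformed URL, or broken response: skip it
        return None
    save_path = os.path.join(out_dir, name)
    with open(save_path, "wb") as f:
        f.write(data)
    return save_path
```

A loop such as `for line in open(path): download(line, '/home/hzc/Pictures')` would then walk the URL file; note that this performs real network requests.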

This leaves you with a large batch of images. For renaming and labeling them, see my other blog post:

Done.