Python crawler: scraping images with XPath (Ubuntu 16.04 virtual machine)
阿新 • Published: 2019-02-04
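Background: Ubuntu 16.04 had just been installed in a virtual machine, and its environment still needed configuring. I wrote this crawler on Windows a few months ago, and today I'm running it in the VM. (Once VMware Tools is installed, you can drag files from the host machine straight into the virtual machine!)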
The XPath crawler uses the urllib2 and lxml libraries. Ubuntu 16.04 ships with Python 2.7.11, which includes urllib2, but lxml has to be installed separately.
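Before running the crawler, it is worth confirming that lxml is actually importable. Below is a minimal check that relies only on lxml's `etree.LXML_VERSION` tuple. The function name `lxml_version` is hypothetical. The install hint assumes Ubuntu's `python-lxml` package for the system Python 2.7; `pip install lxml` is the other common route. The code runs under both Python 2.7 and Python 3.

```python
def lxml_version():
    """Return lxml's version as a string, or None if lxml is not installed.

    lxml_version is a hypothetical helper, not part of the original article.
    """
    try:
        from lxml import etree
    except ImportError:
        return None
    # etree.LXML_VERSION is a tuple of ints, e.g. (4, 9, 3, 0)
    return ".".join(str(part) for part in etree.LXML_VERSION)


if __name__ == "__main__":
    version = lxml_version()
    if version is None:
        # Assumed install route for the system Python 2.7 on Ubuntu 16.04
        print("lxml is missing; try: sudo apt-get install python-lxml")
    else:
        print("lxml " + version + " is installed")
```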
Here is the code:
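```python
# -*- coding:utf-8 -*-
import urllib2
from lxml import etree


def loadPage(url):
    """Fetch one list page and download every thumbnail image on it."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0",
        "Referer": "http://www.mmonly.cc/mmtp/xgmn/175265_4.html",
    }
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    html = response.read()
    # etree.HTML tolerates malformed real-world HTML
    content = etree.HTML(html)
    # src of every <img> that is a direct child of <div class="thumb">
    link_list = content.xpath('//div[@class="thumb"]/img/@src')
    for link in link_list:
        writeImage(link)


def writeImage(link):
    """Download one image and save it in the current working directory."""
    request = urllib2.Request(link)
    image = urllib2.urlopen(request).read()
    # The last 10 characters of the image URL become the filename
    filename = link[-10:]
    with open(filename, 'wb') as f:
        f.write(image)
    print "download successful: " + filename


if __name__ == "__main__":
    url = "http://www.xiaoliaobaike.cn/qutu"
    # raw_input returns a string; Python 2's input() would eval() what is typed
    p = raw_input("please input a page number: ")
    fullurl = url + "?p=" + p
    loadPage(fullurl)
```

Run result: the script asks for a page number, then prints a "download successful" line with the filename for each image it saves.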
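To see exactly what `//div[@class="thumb"]/img/@src` selects, here is a small sketch on a made-up page. It uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without any install. ElementTree supports only a subset of XPath, needs well-formed markup, and cannot select attributes with `/@src`, so the attribute is read with `.get()`. The sample markup, URLs, and the `thumb_srcs` name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree cannot parse sloppy HTML)
SAMPLE = """<html><body>
  <div class="thumb"><img src="http://example.com/a/0001.jpg"/></div>
  <div class="thumb big"><img src="http://example.com/a/0002.jpg"/></div>
  <div class="thumb"><p><img src="http://example.com/a/0003.jpg"/></p></div>
</body></html>"""


def thumb_srcs(markup):
    """Mimic //div[@class="thumb"]/img/@src using ElementTree's XPath subset."""
    root = ET.fromstring(markup)
    # [@class='thumb'] is an exact string match; /img means direct children only
    return [img.get("src") for img in root.findall(".//div[@class='thumb']/img")]


print(thumb_srcs(SAMPLE))
# → ['http://example.com/a/0001.jpg']
```

Only the first image matches. The second `div` fails because `@class="thumb"` compares the whole attribute value, so `"thumb big"` is rejected. The third fails because `/img` selects only direct children, and that `img` sits inside a `p`. The same XPath 1.0 rules apply to the lxml expression in the script, so a site that adds a second class or wraps its images would silently return an empty list.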
Check the saved files:
The images are written to the directory the script was run from, so open that folder to view them.
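The script names each file after the last 10 characters of the image URL (`link[-10:]`). Two URLs that end the same way will therefore overwrite each other, and a relative `src` would not download at all. The hypothetical helper `image_filename` below sketches one alternative. It resolves the `src` against the page URL with `urljoin`, then uses the last path segment as the filename, falling back to a numbered name. The image path in the usage line is made up, and the import fallback lets the helper run under both the article's Python 2.7 and Python 3.

```python
import posixpath

try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2.7, as in the article


def image_filename(page_url, src, index):
    """Return (absolute image URL, local filename) for one <img> src.

    image_filename is a hypothetical helper, not part of the original script.
    """
    # Resolve relative srcs such as "/uploads/a.jpg" against the page URL
    full_url = urljoin(page_url, src)
    # Keep only the last path segment; the query string is not part of .path
    name = posixpath.basename(urlparse(full_url).path)
    if not name:
        # Hypothetical fallback for URLs whose path ends in "/"
        name = "image_%03d.jpg" % index
    return full_url, name


page = "http://www.xiaoliaobaike.cn/qutu?p=2"
print(image_filename(page, "/uploads/2017/abc123.jpg?w=200", 0))
# → ('http://www.xiaoliaobaike.cn/uploads/2017/abc123.jpg?w=200', 'abc123.jpg')
```

In `writeImage`, you would download from the returned URL and open the returned name. Files with the same basename on different pages can still collide, so prefixing the page number is a cheap extra safeguard.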