A Simple Python Image Crawler

阿新 • Published: 2019-01-07

# -*- coding: utf-8 -*-
import os
import urllib.request
from io import BytesIO    # only needed by the PIL variant below

import cv2
import numpy as np
import requests as req
from bs4 import BeautifulSoup
from PIL import Image     # only needed by the PIL variant below
from skimage import io    # only needed by the skimage variant below

url = "https://www.zhihu.com/question/37787176"
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.96 Mobile Safari/537.36'
}
response = req.get(url, headers=headers)
content = response.text
# print(content)

soup = BeautifulSoup(content, 'lxml')
images = soup.find_all('img')
print("Found %d images" % len(images))

if not os.path.exists("images"):
    os.mkdir("images")

for i, img in enumerate(images):
    print("Processing image %d..." % (i + 1))
    img_src = img.get('src')
    # Skip <img> tags without a src, and relative or data: URLs
    if img_src and img_src.startswith("http"):
        img_path = "images/" + str(i + 1) + ".jpg"

        ## use PIL
        # print(img_src)
        # response = req.get(img_src, headers=headers)
        # image = Image.open(BytesIO(response.content))
        # w, h = image.size  # PIL reports (width, height)
        # print(w, h)
        # if w >= 500 and h > 500:
        #     # image.show()
        #     image.save(img_path)

        ## use OpenCV
        resp = urllib.request.urlopen(img_src)
        data = np.asarray(bytearray(resp.read()), dtype="uint8")
        image = cv2.imdecode(data, cv2.IMREAD_COLOR)
        if image is None:
            # imdecode returns None when the bytes are not a decodable image
            continue
        h, w = image.shape[:2]  # array shape is (rows, cols) = (height, width)
        print(w, h)
        if w >= 400 and h > 400:
            cv2.imshow("Image", image)
            cv2.waitKey(3000)
            # cv2.imwrite(img_path, image)

        ## use skimage
        # image = io.imread(img_src)
        # h, w = image.shape[:2]
        # print(w, h)
        # io.imshow(image)
        # io.show()
        # if w >= 500 and h > 500:
        #     io.imsave(img_path, image)

print("Done!")
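
The script above only downloads `src` values that start with `http`, so protocol-relative (`//host/a.jpg`) and site-relative (`/a.jpg`) image links are silently skipped. Below is a minimal sketch of one way to resolve them against the page URL first, using only the standard library. The names `ImgSrcCollector` and `collect_image_urls` are hypothetical helpers introduced for illustration; they are not part of the original script, and the example swaps BeautifulSoup for `html.parser` purely to stay dependency-free.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_image_urls(html, base_url):
    """Return absolute http(s) image URLs found in html, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    urls = []
    for src in parser.sources:
        # urljoin resolves "//host/a.jpg" and "/a.jpg" against the page URL
        absolute = urljoin(base_url, src)
        # Drop anything that is still not fetchable over HTTP (e.g. data: URIs)
        if absolute.startswith(("http://", "https://")):
            urls.append(absolute)
    return urls
```

The returned list could then be fed to the download loop in place of the raw `src` values, assuming the rest of the script stays as it is.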
