Web scraping concepts: the requests module

阿新 • Published: 2018-12-05

The requests module

- We study the requests module through the following five points

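What is the requests module
- requests is a third-party Python library for sending HTTP requests. In scraping, its main job is to simulate a browser making requests. It is powerful, and its API is concise and efficient, which makes it one of the most widely used tools in web scraping.
Why use the requests module
- The urllib module is inconvenient in several ways:
  - URL encoding must be handled manually
  - POST request parameters must be handled manually
  - Cookie and proxy handling is cumbersome
  - ......
- With the requests module:
  - URL encoding is handled automatically
  - POST request parameters are handled automatically
  - Cookie and proxy handling is simplified
  - ......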
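To make the URL-encoding point concrete, the sketch below builds the same Sogou search URL both ways. It assumes requests is installed. `requests.Request(...).prepare()` assembles the request without sending it, so nothing goes over the network. The helper names and the sample search term are illustrative and do not come from the original.

```python
from urllib.parse import urlencode

import requests

# Illustrative values: the Sogou search URL used later in this post and a sample term
BASE_URL = 'https://www.sogou.com/web'
PARAMS = {'query': '爬蟲', 'ie': 'utf-8'}


def urllib_style_url(base, params):
    """With urllib, the caller encodes the query string and joins it by hand."""
    return base + '?' + urlencode(params)


def requests_style_url(base, params):
    """With requests, `params` is encoded for you; prepare() builds the request without sending it."""
    return requests.Request('GET', base, params=params).prepare().url


if __name__ == '__main__':
    print(urllib_style_url(BASE_URL, PARAMS))
    print(requests_style_url(BASE_URL, PARAMS))
```

Both calls produce the same URL, with the Chinese term percent-encoded as UTF-8. The only difference is who does the encoding.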
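How to use the requests module
- Installation:
  - pip install requests
- Workflow:
  - Specify the URL
  - Send the request with requests
  - Extract the data from the response object
  - Persist (save) the data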
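The four-step workflow above can be sketched as one small function. `fetch_and_save` is a hypothetical helper, not part of requests. The `timeout` argument and the `raise_for_status()` call go beyond the original steps. They make a slow or failing server raise an error instead of hanging or silently saving an error page.

```python
import requests


def fetch_and_save(url, params=None, headers=None, path='./page.html', timeout=10):
    """Hypothetical helper that follows the four steps listed above."""
    # Step 1: the URL and its query parameters are supplied by the caller.
    # Step 2: send the request; the timeout keeps a stalled server from blocking forever.
    response = requests.get(url, params=params, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise an exception for 4xx/5xx responses
    # Step 3: read the data out of the response object.
    page_text = response.text
    # Step 4: persist the data to disk.
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    return page_text
```

For example, `fetch_and_save('https://www.sogou.com/web', params={'query': 'python'}, path='./sogou.html')` would save one Sogou results page, assuming the site still serves it to scripted clients.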
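Learning and practicing requests through five scraping projects
- GET requests with requests
  - Task: scrape the Sogou results page for a given search term
- POST requests with requests
  - Task: log in to Douban Movie and scrape the page shown after a successful login
- Ajax GET requests with requests
  - Task: scrape movie details from the Douban Movie category rankings at https://movie.douban.com/
- Ajax POST requests with requests
  - Task: scrape restaurant data for a given location from the KFC store locator at http://www.kfc.com.cn/kfccda/index.aspx
- Comprehensive exercise
  - Task: scrape Sogou's Zhihu search results for a given term across a range of pages

- Code examples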

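Task: scrape the Sogou results page for a given search term

import requests

# Search keyword entered by the user
word = input('enter a word you want to search:')
# Custom request headers (a browser User-Agent)
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
# Target URL
url = 'https://www.sogou.com/web'
# GET query parameters; requests URL-encodes them automatically
params = {
    'query': word,
    'ie': 'utf-8',
}
# Send the GET request with the parameters and headers
response = requests.get(url=url, params=params, headers=headers)
# Get the response body as text
page_text = response.text
# Save the page to disk
with open('./sougou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)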

Task: log in to Douban Movie and scrape the page shown after a successful login

import requests

url = 'https://accounts.douban.com/login'
# Form parameters for the login POST
data = {
    "source": "movie",
    "redir": "https://movie.douban.com/",
    "form_email": "YOUR_ACCOUNT",      # placeholder: your Douban account
    "form_password": "YOUR_PASSWORD",  # placeholder: your Douban password
    "login": "登入",
}
# Custom request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
# Send the login form as a POST request
response = requests.post(url=url, data=data, headers=headers)
page_text = response.text
with open('./douban111.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
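The login script saves only the response to the login POST. To keep scraping pages that require the login, the cookies set by that response must be sent with later requests, and a `requests.Session` does this automatically. The following is a minimal sketch. `login_and_fetch` is a hypothetical helper. The login URL and form fields come from the original script and are assumptions about how Douban's login works.

```python
import requests


def login_and_fetch(login_url, form_data, page_url, headers=None, timeout=10):
    """Hypothetical helper: log in with a form POST, then fetch a page that needs the login.

    Both requests go through one requests.Session, which stores the cookies
    set by the login response and sends them with the follow-up GET.
    """
    with requests.Session() as session:
        if headers:
            session.headers.update(headers)  # e.g. the User-Agent used above
        login_response = session.post(login_url, data=form_data, timeout=timeout)
        login_response.raise_for_status()
        page_response = session.get(page_url, timeout=timeout)
        page_response.raise_for_status()
        return page_response.text
```

With the values from the script above, this would be called roughly as `login_and_fetch(url, data, 'https://movie.douban.com/', headers=headers)`, assuming the site still accepts this form.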

Task: scrape movie details from the Douban Movie category rankings at https://movie.douban.com/

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests

if __name__ == "__main__":
    # URL of the ajax GET request (found by capturing traffic)
    url = 'https://movie.douban.com/j/chart/top_list?'
    # Custom request headers; the header fields must be wrapped in a dict
    headers = {
        # Custom User-Agent; other request headers can be customized the same way
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    }
    # Query parameters carried by the GET request (taken from the packet capture)
    param = {
        'type': '5',
        'interval_id': '100:90',
        'action': '',
        'start': '0',
        'limit': '20',
    }
    # Send the GET request and get the response object
    response = requests.get(url=url, headers=headers, params=param)
    # The response body is a JSON string
    print(response.text)
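The Douban ranking endpoint returns JSON. Instead of printing `response.text`, you can let requests parse it with `response.json()`. The following is a minimal sketch. `fetch_json` is a hypothetical helper, and the `'title'` key in the usage note is an assumption about the shape of Douban's data, not something the original confirms.

```python
import requests


def fetch_json(url, params=None, headers=None, timeout=10):
    """Hypothetical helper: send a GET request and return the parsed JSON body."""
    response = requests.get(url, params=params, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise an exception for 4xx/5xx responses
    # response.json() decodes the body into Python lists/dicts and raises if it is not valid JSON
    return response.json()
```

With the values from the script above, `movies = fetch_json(url, params=param, headers=headers)` should give a list of dicts. If each entry has a `'title'` key (an assumption), `[m.get('title') for m in movies]` would list the film names.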

Task: scrape restaurant data for a given location from the KFC store locator at http://www.kfc.com.cn/kfccda/index.aspx

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests

if __name__ == "__main__":
    # URL of the ajax POST request (found by capturing traffic)
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
    # Custom request headers; the header fields must be wrapped in a dict
    headers = {
        # Custom User-Agent; other request headers can be customized the same way
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    }
    # Form parameters carried by the POST request (taken from the packet capture)
    data = {
        'cname': '',
        'pid': '',
        'keyword': '北京',
        'pageIndex': '1',
        'pageSize': '10',
    }
    # Send the POST request and get the response object
    response = requests.post(url=url, headers=headers, data=data)
    # The response body is a JSON string
    print(response.text)
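In the KFC example the store search is sent as form fields, which belong in the body of a POST request. The sketch below shows offline where requests places `params` and where it places `data`. It uses `prepare()` so nothing is sent: `params` always ends up in the URL query string, while `data` becomes a form-encoded request body. The values mirror the KFC example, and the helper names are illustrative.

```python
import requests

# Values mirror the KFC store-search example; nothing is sent over the network
URL = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx'
FORM = {'cname': '', 'pid': '', 'keyword': '北京', 'pageIndex': '1', 'pageSize': '10'}


def prepared_get(url, params):
    """GET: `params` is URL-encoded into the query string; there is no body."""
    return requests.Request('GET', url, params=params).prepare()


def prepared_post(url, params, data):
    """POST: `params` still goes into the URL, `data` becomes a form-encoded body."""
    return requests.Request('POST', url, params=params, data=data).prepare()


if __name__ == '__main__':
    get_req = prepared_get(URL, {'op': 'keyword'})
    post_req = prepared_post(URL, {'op': 'keyword'}, FORM)
    print(get_req.url, get_req.body)
    print(post_req.url)
    print(post_req.headers['Content-Type'])
    print(post_req.body)
```

The POST body comes out as `cname=&pid=&keyword=%E5%8C%97%E4%BA%AC&pageIndex=1&pageSize=10` with the `application/x-www-form-urlencoded` content type. This is the form encoding the urllib list above says you would otherwise do by hand.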

Task: scrape Sogou's Zhihu search results for a given term across a range of pages

import os

import requests

# Search keyword
word = input('enter a word you want to search:')
# Start and end page numbers (inclusive)
start_page = int(input('enter start page num:'))
end_page = int(input('enter end page num:'))
# Custom request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
# Target URL
url = 'https://zhihu.sogou.com/zhihu'
# Create the output folder
if not os.path.exists('./sougou'):
    os.mkdir('./sougou')
for page in range(start_page, end_page + 1):
    # GET query parameters for this page
    params = {
        'query': word,
        'ie': 'utf-8',
        'page': str(page),
    }
    # Send the GET request and get the response object
    response = requests.get(url=url, params=params, headers=headers)
    # Get the page body as text
    page_text = response.text
    fileName = word + '_' + str(page) + '.html'
    filePath = './sougou/' + fileName
    with open(filePath, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print('Finished scraping page ' + str(page))
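The comprehensive exercise sends one request per page in a loop. Below is a minimal variant that rests on a few assumptions not in the original. One `requests.Session` is reused for all pages, so the connection and any cookies are shared. `os.makedirs(..., exist_ok=True)` replaces the exists/mkdir check. A short pause between pages (the `delay` parameter) keeps the load on the server low. `page_params`, `page_file_path`, and `crawl_pages` are hypothetical names.

```python
import os
import time

import requests


def page_params(word, page):
    """Query parameters for one results page (same keys as the script above)."""
    return {'query': word, 'ie': 'utf-8', 'page': str(page)}


def page_file_path(folder, word, page):
    """Output file for one page, e.g. ./sougou/python_2.html."""
    return os.path.join(folder, f'{word}_{page}.html')


def crawl_pages(url, word, start_page, end_page, folder='./sougou', headers=None, delay=1.0):
    """Fetch pages start_page..end_page (inclusive) with one shared Session; return saved paths."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    saved = []
    with requests.Session() as session:
        if headers:
            session.headers.update(headers)
        for page in range(start_page, end_page + 1):
            response = session.get(url, params=page_params(word, page), timeout=10)
            response.raise_for_status()
            path = page_file_path(folder, word, page)
            with open(path, 'w', encoding='utf-8') as fp:
                fp.write(response.text)
            saved.append(path)
            if delay and page < end_page:
                time.sleep(delay)  # pause between pages
    return saved
```

For example, `crawl_pages('https://zhihu.sogou.com/zhihu', 'python', 1, 3, headers=headers)` would save pages 1 to 3 into ./sougou/ as python_1.html through python_3.html.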


