Python3 Web Scraping Study Notes (2. The Requests Library in Detail)

阿新 • Published: 2018-12-27

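The Requests library is more powerful and convenient than urllib. It is a third-party package rather than part of the standard library, although it may already be installed in your environment.

If it is missing, run pip install requests in cmd to install it.

Example: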

import requests
response = requests.get("http://www.baidu.com")
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)

Request methods:

import requests
requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")     # httpbin has no /head route; HEAD works on /get
requests.options("http://httpbin.org/get")  # likewise for OPTIONS
requests.get("http://httpbin.org/get")
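All of these helpers are thin wrappers around requests.request(method, url, **kwargs), so requests.put(url) and requests.request("PUT", url) do the same thing. A minimal offline sketch: building a requests.Request and calling .prepare() shows exactly what would be sent, without any network access. httpbin's /anything endpoint, used here for illustration, accepts every method; the describe helper is hypothetical.

```python
import requests

# HTTP methods that requests exposes as top-level helpers
METHODS = ("GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "PATCH")

def describe(method, url):
    """Build (but do not send) a request and return its method and final URL."""
    prepared = requests.Request(method, url).prepare()
    return prepared.method, prepared.url

for m in METHODS:
    print(describe(m, "http://httpbin.org/anything"))
```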

Basic GET request:

import requests
response = requests.get("http://httpbin.org/get")
print(response.text)

GET request with parameters (1):

import requests
response = requests.get("http://httpbin.org/get?name=yiqing&age=18")
print(response.text)

GET request with parameters (2):

import requests
data = {
    "name": "yiqing",
    "age": 18
}
response = requests.get("http://httpbin.org/get", params=data)
print(response.text)
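Passing a dict through params= produces the same URL as writing the query string by hand, but requests takes care of the encoding. The sketch below (build_url is a hypothetical helper) inspects the prepared URL offline; it assumes current requests behavior, where list values become repeated keys, None values are dropped, and an existing query string is kept.

```python
import requests

def build_url(base, params):
    """Return the URL requests would request for `base` plus `params` (no network I/O)."""
    return requests.Request("GET", base, params=params).prepare().url

print(build_url("http://httpbin.org/get", {"name": "yiqing", "age": 18}))
# http://httpbin.org/get?name=yiqing&age=18

# Lists become repeated keys, and an existing query string is preserved
print(build_url("http://httpbin.org/get?lang=en", {"tag": ["a", "b"]}))
# http://httpbin.org/get?lang=en&tag=a&tag=b
```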

Parsing JSON:

import requests
response = requests.get("http://httpbin.org/get")
print(response.json())
print(type(response.json()))
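response.json() decodes the response body as JSON and, for an ordinary text body, behaves roughly like json.loads(response.text). If the body is not valid JSON (for example an HTML error page), it raises an exception derived from ValueError. A small stdlib-only sketch of that behavior, using a made-up body shaped like an httpbin reply; parse_json is a hypothetical helper.

```python
import json

# A made-up response body, shaped like what httpbin.org/get returns
body = '{"args": {"name": "yiqing"}, "url": "http://httpbin.org/get?name=yiqing"}'

def parse_json(text):
    """Decode a JSON body; return None instead of raising when it is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

data = parse_json(body)
print(type(data), data["args"]["name"])
print(parse_json("<html>not json</html>"))
```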

Fetching binary data (given the URL of an image, download it and save it locally):

import requests
response = requests.get("http://p1.music.126.net/vvZLXI5EqFLsKLlvfqz0uA==/19088621370291879.jpg")
with open("pic.jpg", "wb") as f:
    f.write(response.content)
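response.content loads the whole body into memory, which is fine for a small image. For larger files, one option is stream=True together with iter_content(), which writes the body to disk chunk by chunk. A sketch under those assumptions; the download helper and its default chunk size and timeout are illustrative, not part of the original notes.

```python
import requests

def download(url, path, chunk_size=8192, timeout=10):
    """Stream `url` to `path` in chunks instead of holding the whole body in memory.
    Returns the number of bytes written."""
    written = 0
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

For example: download("http://p1.music.126.net/vvZLXI5EqFLsKLlvfqz0uA==/19088621370291879.jpg", "pic.jpg").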

Adding headers:

import requests
headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
}
response = requests.get("http://www.baidu.com", headers=headers)
print(response.text)
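The reason for setting User-Agent is that requests otherwise identifies itself as python-requests/<version>, which some sites treat differently from a browser. The sketch below compares the two without sending anything; it prepares the request through a Session because that is where requests adds its default headers. outgoing_user_agent is a hypothetical helper.

```python
import requests

UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36")

def outgoing_user_agent(headers=None):
    """Return the User-Agent a request would carry, without sending it."""
    with requests.Session() as s:
        req = requests.Request("GET", "http://www.baidu.com", headers=headers)
        return s.prepare_request(req).headers["User-Agent"]

print(outgoing_user_agent())                    # python-requests/<version>
print(outgoing_user_agent({"User-Agent": UA}))  # the browser string above
```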

Basic POST request:

import requests
data = {
    "name": "yiqing",
    "age": 18
}
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)
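With data= a dict is sent as an HTML form body (application/x-www-form-urlencoded). If the server expects JSON instead, requests also accepts json=, which serializes the dict and sets a JSON Content-Type. An offline comparison of the two prepared requests:

```python
import json

import requests

data = {"name": "yiqing", "age": 18}
URL = "http://httpbin.org/post"

# data= sends an HTML-form style body
form = requests.Request("POST", URL, data=data).prepare()
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)                     # name=yiqing&age=18

# json= serializes the dict to JSON and sets a JSON Content-Type instead
as_json = requests.Request("POST", URL, json=data).prepare()
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body))
```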

POST request with form data and custom headers:

import requests
data = {
    "name": "yiqing",
    "age": 18
}
headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
}
response = requests.post("http://httpbin.org/post", data=data, headers=headers)
print(response.json())

Checking the status code:

import requests
response = requests.get("http://www.baidu.com")
if response.status_code != requests.codes.ok:
    raise SystemExit("Request failed")
print("Request successful")

This is equivalent to:

import requests
response = requests.get("http://www.baidu.com")
if response.status_code != 200:
    raise SystemExit("Request failed")
print("Request successful")
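Comparing against requests.codes.ok works, but stopping the whole script on any non-200 code is blunt. Response objects also provide raise_for_status(), which raises requests.HTTPError for 4xx and 5xx codes so the caller can decide what to do. A sketch; the hand-built Response only stands in for a real reply so the example runs offline, and check is a hypothetical helper.

```python
import requests

# requests.codes maps readable names to numeric status codes
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404

def check(response):
    """Raise requests.HTTPError for 4xx/5xx responses; otherwise return the response."""
    response.raise_for_status()
    return response

# A hand-built Response stands in for a real reply so this runs offline
fake = requests.Response()
fake.status_code = 404
fake.url = "http://httpbin.org/status/404"
try:
    check(fake)
except requests.HTTPError as exc:
    print("HTTP error:", exc)
```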

File upload (using the image downloaded earlier):

import requests
# The with block closes the file once the upload is done
with open("pic.jpg", "rb") as f:
    files = {"file": f}
    response = requests.post("http://httpbin.org/post", files=files)
print(response.text)
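files= makes requests build a multipart/form-data body. Passing a (filename, content, content_type) tuple instead of a bare file object sets the file name and MIME type explicitly. The sketch inspects the prepared body offline; the bytes are a placeholder, not a real JPEG.

```python
import requests

# (filename, content, content_type); placeholder bytes stand in for the real image
files = {"file": ("pic.jpg", b"\xff\xd8\xff placeholder bytes", "image/jpeg")}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(prepared.body[:200])               # part headers followed by the file bytes
```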

Getting cookies:

import requests
response = requests.get("http://blog.csdn.net/")
print(response.cookies)
for key, value in response.cookies.items():
    print(key+"="+value)
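Cookies can also be sent explicitly with the cookies= argument, which accepts a plain dict. The hypothetical cookie_header helper below shows, without any network access, the Cookie header requests would attach.

```python
import requests

def cookie_header(url, cookies):
    """Return the Cookie header requests would send for `cookies` (no network I/O)."""
    return requests.Request("GET", url, cookies=cookies).prepare().headers.get("Cookie")

print(cookie_header("http://httpbin.org/cookies", {"number": "123456789"}))
# number=123456789
```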

Attempting to simulate a login:

import requests
requests.get("http://httpbin.org/cookies/set/number/123456789")
response = requests.get("http://httpbin.org/cookies")
print(response.text)

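This "login" fails. The two get calls are completely independent, as if they were made in two unrelated browsers, so the cookie set by the first request is not sent with the second and no cookie information comes back. To make both requests behave like one browser, use a Session: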

import requests
s = requests.Session()
s.get("http://httpbin.org/cookies/set/number/123456789")
response = s.get("http://httpbin.org/cookies")
print(response.text)
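A Session keeps a cookie jar and default headers that it applies to every request it sends, and it also reuses the underlying connections. The offline sketch below sets a cookie and a header on the session and inspects the prepared request; the User-Agent value my-scraper/0.1 is made up for illustration. Using the session in a with block closes its connections afterwards.

```python
import requests

with requests.Session() as s:
    # Cookies stored on the session are attached to every request it prepares
    s.cookies.set("number", "123456789")
    # Headers set here are merged into every request as well
    s.headers.update({"User-Agent": "my-scraper/0.1"})
    prepared = s.prepare_request(requests.Request("GET", "http://httpbin.org/cookies"))

print(prepared.headers["Cookie"])      # number=123456789
print(prepared.headers["User-Agent"])  # my-scraper/0.1
```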

Certificate verification:

import requests
response = requests.get("https://www.12306.cn")
print(response.status_code)

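This raised an error (requests.exceptions.SSLError) because, at the time of writing, the site did not present a valid certificate.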

The check can be skipped with verify=False:

import requests
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)

This still prints an InsecureRequestWarning. The warning can be suppressed like this:

import requests
import urllib3  # installed as a dependency of requests
urllib3.disable_warnings()
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)
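urllib3.disable_warnings() silences the warning for the whole process. A narrower alternative, sketched below as a hypothetical helper, ignores InsecureRequestWarning only around a single call by using warnings.catch_warnings(). Where possible, pointing verify at a trusted CA bundle file (verify="/path/to/ca-bundle.pem") is safer than disabling verification altogether.

```python
import warnings

import requests
import urllib3

def get_without_verification(url, **kwargs):
    """Hypothetical helper: GET url with verify=False, silencing
    InsecureRequestWarning only for this call instead of process-wide."""
    # warnings.catch_warnings() is not thread-safe; fine for a single-threaded script
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", urllib3.exceptions.InsecureRequestWarning)
        return requests.get(url, verify=False, **kwargs)
```

Usage mirrors the example above: get_without_verification("https://www.12306.cn").status_code.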

Proxy settings:

import requests
proxies = {
    "http": "http://222.222.169.60:53281",
    "https": "http://222.222.169.60:53281"
}
response = requests.get("http://www.baidu.com", proxies=proxies)
print(response.status_code)
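requests picks a proxy from this dict based on the target URL: a "scheme://host" key wins over a plain "http"/"https" key, and a URL whose scheme has no matching key is sent directly. By default requests also honors the HTTP_PROXY and HTTPS_PROXY environment variables. The sketch uses requests.utils.select_proxy, a helper requests uses internally rather than a documented public API, so treat its availability as an assumption about current versions; the 10.10.1.10:3128 proxy is hypothetical.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://222.222.169.60:53281",
    "https": "http://222.222.169.60:53281",
    # Hypothetical per-host override: applies only to http://www.baidu.com
    "http://www.baidu.com": "http://10.10.1.10:3128",
}

print(select_proxy("http://httpbin.org/get", proxies))  # the plain "http" entry
print(select_proxy("http://www.baidu.com", proxies))    # the per-host override
print(select_proxy("ftp://example.org", proxies))       # None: no matching key
```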

Proxy with username and password:

import requests
proxies = {
    "http": "http://user:password@222.222.169.60:53281/",  # user/password are placeholders
}
response = requests.get("http://www.baidu.com", proxies=proxies)
print(response.status_code)
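If the password contains characters such as @, : or /, it has to be percent-encoded, or the proxy URL will be parsed incorrectly. A sketch with a hypothetical build_proxy_url helper and a made-up password; requests decodes the credentials again (requests.utils.get_auth_from_url performs that step) before sending them to the proxy.

```python
from urllib.parse import quote

def build_proxy_url(user, password, host, port, scheme="http"):
    """Hypothetical helper: percent-encode credentials so characters such as
    '@', ':' or '/' in a password do not break the proxy URL."""
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"

url = build_proxy_url("user", "p@ss:w/rd", "222.222.169.60", 53281)
print(url)  # http://user:p%40ss%3Aw%2Frd@222.222.169.60:53281
proxies = {"http": url, "https": url}
```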

Timeout setting:

import requests
response = requests.get("http://www.baidu.com", timeout=1)
print(response.status_code)
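timeout=1 applies separately to establishing the connection and to each wait for data from the server; it is not a cap on the total download time. A (connect, read) tuple sets the two limits independently. A sketch; the values and the get_status helper are illustrative.

```python
import requests

# Illustrative values, in seconds
CONNECT_TIMEOUT = 3.05  # time allowed to establish the connection
READ_TIMEOUT = 10       # time allowed between bytes received from the server

def get_status(url):
    """Fetch url with separate connect and read limits and return the status code.
    Raises requests.exceptions.ConnectTimeout or ReadTimeout if a limit is exceeded."""
    response = requests.get(url, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT))
    return response.status_code
```

For example: get_status("http://www.baidu.com").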

Catching exceptions:

import requests
from requests.exceptions import Timeout
try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except Timeout:  # covers both ConnectTimeout and ReadTimeout
    print("TIMEOUT")
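Catching requests.exceptions.Timeout covers both connect and read timeouts. To handle every error requests can raise (timeouts, connection errors, HTTPError from raise_for_status(), and so on), catch the common base class requests.exceptions.RequestException. The sketch below also mounts an HTTPAdapter with a urllib3 Retry policy so transient failures are retried with backoff; make_session, fetch, and the retry counts and status list are assumptions to tune per site.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total=3, backoff=0.5):
    """Session that retries failed connections and 5xx responses with exponential backoff."""
    retry = Retry(
        total=total,
        backoff_factor=backoff,
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    s = requests.Session()
    s.mount("http://", adapter)
    s.mount("https://", adapter)
    return s

def fetch(url, session=None):
    """Return the response, or None if any requests error occurs."""
    session = session or make_session()
    try:
        response = session.get(url, timeout=(3.05, 10))
        response.raise_for_status()
        return response
    except requests.exceptions.RequestException as exc:  # base of Timeout, ConnectionError, HTTPError
        print("request failed:", exc)
        return None
```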
