Python Web Scraping for Beginners, Part 3: requests in Detail

阿新 • Published: 2018-11-30

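1. Download and installation

(1) Installing from the command line

Windows: open a Command Prompt window and run the package manager directly:

pip install requests or easy_install requests (the simple version)

Unix/Linux: open a shell window and run the package manager: pip install requests

(2) Offline installation

Download the offline package (a wheel file), then install it: pip install requests-2.20.0-py2.py3-none-any.whl

Official site: https://pypi.org/project/pip/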
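To confirm that the installation worked, a quick check along these lines can help. It only imports the package and prints its version, so it needs no network access.

```python
import requests

# If the import succeeds, the package is installed; __version__ reports the release.
installed_version = requests.__version__
print("requests version:", installed_version)
```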

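2. A first program

# import the dependency
import requests
# send a request and fetch data from the server
response = requests.get("http://www.sina.com.cn")
# print the data
print(response.text)

requests.get: sends a GET request to the server and returns the server's response
response.text: the text data held by the response object

P.S. Try it yourself.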
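The first program assumes the request always succeeds. A slightly more defensive variant is sketched below; the helper name fetch_text and the 10-second timeout are assumptions made for this illustration, not part of the original program.

```python
import requests

def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx answers into exceptions
    except requests.exceptions.RequestException as exc:
        print("request failed:", exc)
        return None
    return response.text

# Offline demonstration: a URL without a scheme is rejected before any
# network traffic happens, so the helper returns None.
result = fetch_text("missing-scheme.example")
print(result)
```

Calling fetch_text("http://www.sina.com.cn") would return the page text when the site is reachable, and None otherwise.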

3. Request object: request methods

⚫ requests.request(method, url, **kwargs): ◼ the low-level call that all the other request functions go through

⚫ requests.get(url, params=None, **kwargs): ◼ sends a GET request

⚫ requests.post(url, data=None, json=None, **kwargs): ◼ sends a POST request

⚫ requests.put(url, data=None, **kwargs): ◼ sends a PUT request

⚫ requests.delete(url, **kwargs): ◼ sends a DELETE request

⚫ requests.patch(url, data=None, **kwargs): ◼ sends a PATCH request

⚫ requests.options(url, **kwargs): ◼ sends an OPTIONS request

⚫ requests.head(url, **kwargs): ◼ sends a HEAD request
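Each shortcut in the list above forwards to requests.request with the method name filled in. The sketch below builds (but never sends) one request per method, to show that only the HTTP verb differs. The prepare helper and the example.com address are placeholders invented for this illustration.

```python
import requests

URL = "http://example.com/resource"  # placeholder address, not from the article

def prepare(method, url=URL, **kwargs):
    """Build, without sending, a request for the given HTTP method."""
    return requests.Request(method, url, **kwargs).prepare()

methods = ["get", "post", "put", "delete", "patch", "options", "head"]
prepared = [prepare(m) for m in methods]
for p in prepared:
    # The method name is normalised to upper case during preparation.
    print(p.method, p.url)
```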

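4. Request object: passing GET parameters

requests.get(url, params=None, **kwargs)  # send a GET request
@param url: address of the server that receives the GET request

@param params: parameters attached to the GET request

@param kwargs: other optional arguments; see the source of requests.request() for details

import requests

target_url = 'http://www.baidu.com/s'    # define the target URL

data = {'wd': '魔道祖師'}

response = requests.get(target_url, params=data)

print(response.text)

⚫ Parameters for a GET request are passed as a dictionary; simply assign it to the params argument.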
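To see what requests does with the params dictionary, the request can be prepared without being sent. This sketch reuses the target URL and search term from the example above and only inspects the resulting query string.

```python
from urllib.parse import parse_qs, urlsplit

import requests

target_url = "http://www.baidu.com/s"
data = {"wd": "魔道祖師"}

# Prepare the request without sending it, to inspect the final URL.
prepared = requests.Request("GET", target_url, params=data).prepare()
print(prepared.url)  # the non-ASCII value appears percent-encoded

# Decoding the query string gives back the original parameters.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```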

5. Request object: passing POST parameters

requests.post(url, data=None, json=None, **kwargs)  # send a POST request

@param url: URL of the server that receives the POST request

@param data: ordinary (form-encoded) parameters carried in the POST request

@param json: dict-like / JSON data carried in the POST request

# import the dependency
import requests
# define the target URL
# url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
# parameters carried in the POST body
data = {
    "i": "hello",
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "salt": "1541660576025",
    "sign": "4425d0e75778b94cf440841d47cc64fb",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTIME",
    "typoResult": "false",
}
# send the request and get the response data returned by the server
response = requests.post(url, data=data)
print(response.text)
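The difference between the data and json arguments shows up in the request body and the Content-Type header. The sketch below prepares both variants without sending them; the example.com endpoint and the as_text helper are placeholders for this illustration.

```python
import json
from urllib.parse import parse_qs

import requests

url = "http://example.com/api"  # placeholder endpoint, not a real service
payload = {"i": "hello", "doctype": "json"}

# data= produces a form-encoded body; json= produces a JSON document.
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)

def as_text(body):
    """Bodies may be str or bytes depending on how they were built."""
    return body.decode("utf-8") if isinstance(body, bytes) else body

form_fields = parse_qs(as_text(form_req.body))
json_fields = json.loads(as_text(json_req.body))
```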

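6. Request object: custom request headers

Every request function in the requests module ultimately goes through requests.request(method, url, **kwargs), and custom headers are set through its headers argument.

import requests   # import the dependencies
from fake_useragent import UserAgent


ua = UserAgent()
# define the request URL, the headers and the query parameters
url = 'http://www.baidu.com/s'
headers = {'User-Agent': ua.random}
param = {'wd': 'PYTHON 爬蟲'}
# send the request and get the response data
response = requests.get(url, params=param, headers=headers)
print(response.text)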
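When many requests share the same headers, they can be set once on a Session instead of being passed on every call. The sketch below inspects the merged headers without sending anything; the User-Agent string is an invented value used here in place of fake_useragent.

```python
import requests

FIXED_UA = "Mozilla/5.0 (X11; Linux x86_64) demo-crawler/0.1"  # illustrative value

session = requests.Session()
default_ua = session.headers["User-Agent"]  # what requests sends by default
session.headers.update({"User-Agent": FIXED_UA})

# Per-request headers are merged on top of the session defaults.
request = requests.Request(
    "GET",
    "http://www.baidu.com/s",
    params={"wd": "python"},
    headers={"Accept-Language": "en"},
)
prepared = session.prepare_request(request)

print(default_ua)
print(prepared.headers["User-Agent"], prepared.headers["Accept-Language"])
```

The default value starts with "python-requests/", which is one reason sites can easily tell a script from a browser.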

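7. Request object: cookies

In web development, cookies are commonly used to keep state on the client side, so handling cookies is one of the most important tasks in a typical crawler. You can pass a plain dictionary to the cookies argument of a request to attach cookie data. requests also provides requests.cookies.RequestsCookieJar(), which can attach cookies to a request as well and is better suited to cross-domain scenarios, because each cookie can carry its own domain and path.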
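Both ways of attaching cookies can be compared without sending anything by preparing the requests and reading the resulting Cookie header. The host, cookie names and values below are illustrative only.

```python
import requests

url = "http://httpbin.org/cookies"  # example host, used only to build the request

# 1) A plain dict: every cookie is attached to this one request.
dict_req = requests.Request("GET", url, cookies={"token": "abc123"}).prepare()
print(dict_req.headers["Cookie"])

# 2) A cookie jar: each cookie has its own domain and path, and only
#    the cookies matching the request URL are sent.
jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")
jar_req = requests.Request("GET", url, cookies=jar).prepare()
print(jar_req.headers["Cookie"])
```

Only tasty_cookie is sent in the second case, because the path of gross_cookie does not match the requested URL.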

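8. Response object
The data a crawler collects from the network falls into two main types: text data and binary data. The response object in the requests module wraps the returned data differently for each.

⚫ response.encoding: the encoding of the response data; it can be assigned directly ◼ response.encoding = 'utf-8'
⚫ response.text: the text data contained in the response
⚫ response.content: the binary data contained in the response
⚫ response.json(): the JSON data in the response; the body must parse correctly, otherwise a ValueError is raised
⚫ response.raw: in special cases, reads the underlying socket data stream directly; the request must then set stream=True to allow streaming
⚫ response.headers: the response headers
⚫ response.status_code: the status code of the response
⚫ response.cookies: the cookie data contained in the response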
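The attributes listed above can be tried out offline by building a Response by hand. This is a testing shortcut rather than normal usage: the make_response helper is invented for this illustration, and it sets the private _content attribute, which real code would never need to touch.

```python
import requests

def make_response(body, status=200, encoding="utf-8"):
    """Build a Response by hand for offline experiments (testing shortcut)."""
    response = requests.Response()
    response.status_code = status
    response.encoding = encoding
    response._content = body  # private attribute, set here only for the demo
    return response

ok = make_response('{"name": "魔道祖師"}'.encode("utf-8"))
print(ok.status_code)     # numeric status code
print(ok.content[:10])    # raw bytes
print(ok.text)            # bytes decoded using response.encoding
print(ok.json()["name"])  # parsed JSON

broken = make_response(b"<html>not json</html>")
try:
    broken.json()
    json_failed = False
except ValueError:
    # json() raises a ValueError subclass when the body is not valid JSON.
    json_failed = True
print("json() failed:", json_failed)
```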

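9. Case study

Login entry point: http://www.renren.com/PLogin.do
After logging in, collect as much Renren user profile data as possible.

Approach to extracting Renren user data
    1. Log in to Renren with a registered account [required in order to view profile information]
    2. Follow a few popular accounts, then crawl them
        Collect 1: home page links of the people they follow | the people who follow them
        Collect 2: the current user's profile [name, age, school, hometown...]

    3. Loop over each collected home page link, rewritten into that person's "following | followers" page

Try it yourself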
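Step 1 matters because a requests.Session keeps the cookies set at login and sends them with every later request to the same site. The sketch below imitates that offline: the cookie name, its value and the /profile path are invented for this illustration, and no request is actually sent.

```python
import requests

session = requests.Session()

# Stand-in for what a successful login response would do via Set-Cookie.
# The cookie name and value are made up for this illustration.
session.cookies.set("session_id", "demo-value", domain="www.renren.com", path="/")

# Any later request built through the same session carries the cookie.
profile_req = requests.Request("GET", "http://www.renren.com/profile")
prepared = session.prepare_request(profile_req)
print(prepared.headers.get("Cookie"))

# A request to a different host does not receive it.
other = session.prepare_request(requests.Request("GET", "http://example.com/"))
print(other.headers.get("Cookie"))
```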

"""
Version 1.1.0
Author lkk
Email [email protected]
date 2018-11-21 16:55
DESC Renren profile scraper
"""
from selenium import webdriver
import requests
from lxml import html
from fake_useragent import UserAgent
import time, re
# import utils1  # the author's local helper module; only the commented-out MySQL calls below use it

ua = UserAgent()

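# Renren: collecting user data
login_url = "http://www.renren.com/PLogin.do"
# account and password
authentication = {'email': '1307170****', 'password': '*******'}
# custom request headers imitating a browser
headers = {
    'User-agent': ua.random,
}
# send the request and log in to the site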
session = requests.Session()
response = session.post(login_url, data=authentication, headers=headers)
response.encoding = 'utf-8'


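# fetch the list of followed accounts (example user id)
personal_url = 'http://follow.renren.com/list/968835593/pub/v7'


# this address requires identity information: the cookies the session kept from login

response2 = session.get(personal_url, headers=headers)
response2.encoding = 'utf-8'
# print()
# process the result data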
docs = html.fromstring(response2.text)
data = docs.xpath("//div[@class='module border']/ul[@id='follow_list']/li//div[@class='info']/a[@class='name']/@href")
name = docs.xpath("//a[@class='name']/text()")
for i in range(len(data)):
    print(name[i], data[i])
    # crawl the follower information of each followed account
    response3 = session.get(data[i], headers=headers)
    response3.encoding = 'utf-8'
    # print(response3.text)
    docs1 = html.fromstring(response3.text)
    link = docs1.xpath("//div[@class='has-friend']/h5/a[@class='title']/@href")
    id = docs1.xpath("//button[@id='followAdd']/@data-id")
    print(id[0])
    school = docs1.xpath("//ul/li[@class='school']/span/text()")
    sex = docs1.xpath("//ul/li[@class='birthday']/span[1]/text()")
    born = docs1.xpath("//ul/li[@class='birthday']/span[2]/text()")
    hometown = docs1.xpath("//ul/li[@class='hometown']/text()")
    # fall back to a placeholder when a field is missing
    school = school if len(school) > 0 else "N/A"
    sex = sex if len(sex) > 0 else "N/A"
    born = born if len(born) > 0 else "N/A"
    hometown = hometown if len(hometown) > 0 else "N/A"
    print(link[0])
    time.sleep(5)
    print(school, sex, born, hometown)
    # followers of the followed account
    response4 = session.get(link[0], headers=headers)
    response4.encoding = 'utf-8'
    docs2 = html.fromstring(response4.text)
    all_link = docs2.xpath("//li/div[@class='info']/a[@class='name']/@href")
    fans_count = docs2.xpath("//ul/li[@class='select']/span/text()")
    # use a separate variable so the outer `name` list is not overwritten
    fan_names = docs2.xpath("//div[@class='info']/a[@class='name']/text()")
    fans_number = docs2.xpath("//div[@class='info']/p[@class='atten']/text()")
    for info in range(len(fan_names)):
        print(fan_names[info], fans_number[info], all_link[info])
        # utils1.mysql(fan_names[info], fans_number[info], all_link[info])
    for k in range(20, int(fans_count[0]), 10):
        long_index = 'http://follow.renren.com/list/'+id[0]+'/submore?visitId=968835593&offset='+str(k)+'&limit=10&requestToken=1198899405&_rtk=1336bf27'

        # target_url = all_link[j] + '/profile/'
        # print(target_url)
        response5 = session.get(long_index, headers=headers)
        response5.encoding = 'utf-8'
        docs3 = html.fromstring(response5.text)
        # print(response5.text)
        fans_id = re.findall(r'"id":(\d+),', response5.text)
        fans_name = re.findall(r'name":"(.*?)"', response5.text)
        fans_fans_count = re.findall(r',"subscriberCount":(.*?),"', response5.text)
        for n in range(len(fans_id)):
            if int(fans_fans_count[n]) > 0:
                print(fans_name[n], fans_fans_count[n], "http://www.renren.com/" + fans_id[n] + '/profile?v=info_timeline')
                # utils1.mysql(fans_name[n], fans_fans_count[n], "http://www.renren.com/" + fans_id[n] + '/profile/')
            else:
                pass
        # fans_school = docs3.xpath("//ul/li[@class='school']/span/text()")
        # fans_sex = docs3.xpath("//ul/li[@class='birthday']/span[1]/text()")
        # fans_bron = docs3.xpath("//ul/li[@class='birthday']/span[2]/text()")
        # fans_hometown = docs3.xpath("//ul/li[@class='hometown']/text()")
        # fans_school = fans_school if len(fans_school) > 0 else "N/A"
        # fans_sex = fans_sex if len(fans_sex) > 0 else "N/A"
        # fans_bron = fans_bron if len(fans_bron) > 0 else "N/A"
        # fans_hometown = fans_hometown if len(fans_hometown) > 0 else "N/A"
        # print(fans_school, fans_sex, fans_bron, fans_hometown)
        # time.sleep(5)
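The inner loop of the script pulls ids, names and follower counts out of the paginated response with regular expressions. The sketch below runs the same expressions on an invented sample and compares them with parsing the text as JSON. The real response format is not shown in the article, so the sample and its publisherList key are assumptions shaped only so that the expressions match.

```python
import json
import re

# Invented sample, shaped so the script's regular expressions match it.
sample = ('{"publisherList":['
          '{"id":101,"name":"Alice","subscriberCount":12,"type":1},'
          '{"id":102,"name":"Bob","subscriberCount":0,"type":1}]}')

# Regex approach, as in the script above.
fans_id = re.findall(r'"id":(\d+),', sample)
fans_name = re.findall(r'name":"(.*?)"', sample)
fans_fans_count = re.findall(r',"subscriberCount":(.*?),"', sample)
regex_rows = [
    (n, int(c), "http://www.renren.com/" + i + "/profile?v=info_timeline")
    for i, n, c in zip(fans_id, fans_name, fans_fans_count)
    if int(c) > 0  # keep only accounts that have followers
]

# JSON approach: parse once, then read fields by key.
json_rows = [
    (fan["name"], fan["subscriberCount"],
     "http://www.renren.com/%d/profile?v=info_timeline" % fan["id"])
    for fan in json.loads(sample)["publisherList"]
    if fan["subscriberCount"] > 0
]

print(regex_rows)
print(json_rows)
```

When the body really is JSON, parsing it tends to be less fragile than three independent regular expressions, which silently fall out of step if one field is missing from an entry.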
