
Commonly used external modules

1. requests

The third-party requests library handles URL resources more conveniently than Python's built-in urllib.
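As a quick illustration (a minimal sketch, fetching Baidu's front page), the same GET takes several explicit steps with urllib but a single call with requests:

import urllib.request
import requests

# urllib: open the URL, read the raw bytes, decode them yourself
with urllib.request.urlopen('https://www.baidu.com/') as resp:
    html_urllib = resp.read().decode('utf-8')

# requests: one call, decoding to text is handled for you
html_requests = requests.get('https://www.baidu.com/').text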

1. Using requests

Accessing a page with GET

  • When the fetched page comes back garbled, you can set the decoding with encoding or content
import requests
r = requests.get('https://www.baidu.com/')
# decode via the encoding attribute
r.encoding = 'utf-8'     # set the encoding
print(r.encoding)        # check the encoding in use
print(r.status_code)     # status code shows whether the request succeeded
print(r.text)            # body as text
print(r.url)             # the URL actually requested
# decode via content
print(r.content.decode())  # content gives raw bytes; decode() turns them into str
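If the correct charset is not known in advance, requests can also guess it from the body via apparent_encoding (a minimal sketch):

import requests
r = requests.get('https://www.baidu.com/')
r.encoding = r.apparent_encoding  # use the charset detected from the body
print(r.text)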

  • The status code can be used to check whether a request succeeded
assert response.status_code == 200  # e.g. expect HTTP 200
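A plain assert raises only a bare AssertionError. requests also ships a built-in check, raise_for_status(), which raises requests.HTTPError for any 4xx/5xx response (a minimal sketch):

import requests
response = requests.get('https://www.baidu.com/')
response.raise_for_status()   # raises requests.HTTPError on 4xx/5xx, else no-op
print('request succeeded')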
  • Inspect the response headers and the request/response URLs
import requests
response = requests.get('https://www.sina.com')
print(response.headers)        # response headers
print(response.request.url)    # URL of the request that produced this response
print(response.url)            # final URL (after any redirects)
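When a site redirects (www.sina.com typically forwards you to its canonical host), request.url and url both show the final address; the intermediate hops are kept in response.history. A minimal sketch:

import requests
response = requests.get('https://www.sina.com')
for hop in response.history:           # one entry per redirect, oldest first
    print(hop.status_code, hop.url)
print(response.status_code, response.url)  # the final response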

  • You can construct proper request headers so that the site returns the complete page content
import requests
headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
response = requests.get('https://www.baidu.com',headers = headers)
print(response.headers)
print(response.content.decode())
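To see what the header buys you, compare the size of the body with and without a browser User-Agent; sites such as Baidu usually serve a much fuller page to the latter (a quick check, actual sizes will vary):

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
bare = requests.get('https://www.baidu.com')
browser = requests.get('https://www.baidu.com', headers=headers)
print(len(bare.content), len(browser.content))  # the second is typically far larger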

  • The URL returned in response.request.url is percent-encoded, so it needs URL-decoding (see the sketch after this snippet)
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
p = {'wd':'耐克'}
url_tem = 'https://www.baidu.com/s?'
r = requests.get(url_tem,headers = headers, params = p)
print(r.status_code)
print(r.request.url)	# the returned URL is percent-encoded
print(r.content)
print(r.text)
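To undo the percent-encoding, the standard library's urllib.parse.unquote does the reverse mapping (a minimal sketch with the URL hard-coded for illustration):

from urllib.parse import unquote
url = 'https://www.baidu.com/s?wd=%E8%80%90%E5%85%8B'
print(unquote(url))  # https://www.baidu.com/s?wd=耐克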

  • Crawl the pages of the 耐克 (Nike) Baidu Tieba and save them locally
import requests
class TiebaSpider:
    def __init__(self,tiebaname):
        self.tiebaname = tiebaname
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
        self.url_temp = 'https://tieba.baidu.com/f?kw='+tiebaname+'&ie=utf-8&pn={}'


    def get_url_list(self):
        # each Tieba page holds 50 posts, so the pn parameter steps by 50
        url_list = []
        for i in range(1000):
            url_list.append(self.url_temp.format(i * 50))
        return url_list

    def parse_url(self,url):
        response = requests.get(url,headers = self.headers)
        return response.content.decode()

    def html_save(self, html_str, pagename):
        # file name pattern '<tieba>第<n>頁.html', i.e. '<tieba> page <n>.html'
        file_path = '{}第{}頁.html'.format(self.tiebaname, pagename)
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(html_str)

    def run(self):
        url_list = self.get_url_list()
        # enumerate avoids the O(n) list.index() lookup on every iteration
        for pagename, url in enumerate(url_list, start=1):
            html_str = self.parse_url(url)
            self.html_save(html_str, pagename)


if __name__ == '__main__':
    tieba_spider = TiebaSpider('耐克')   # '耐克' = Nike
    tieba_spider.run()
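Fetching 1000 pages in one run will sooner or later hit a timeout or a bad response. Below is a hedged sketch of a more defensive fetch; parse_url_safe, the retries parameter, and the backoff are my own additions, not part of the original spider:

import time
import requests

def parse_url_safe(url, headers, retries=3, timeout=10):
    # retry transient failures with simple exponential backoff
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, timeout=timeout)
            response.raise_for_status()   # treat 4xx/5xx as failures too
            return response.content.decode()
        except requests.RequestException:
            time.sleep(2 ** attempt)
    raise RuntimeError('failed to fetch {} after {} tries'.format(url, retries))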