
Python scraper (using requests) error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position

Tags: crawler category, python, crawler, post

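1. I'm new to web scraping. While writing a program to scrape job listings from Lagou (拉勾網), I ran into the UnicodeEncodeError shown in the title.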

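2. After looking into it, I found that this is an encoding problem that appears when a request containing Chinese characters is sent with requests.post (here, the session's post). In this program the Chinese is in the referer request header: header values are encoded as Latin-1, which cannot represent Chinese characters. The fix is to re-encode the header value: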

'referer':referer.encode("utf-8").decode("latin1")
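To see where the error actually comes from, here is a small offline sketch that uses only the standard library. requests ultimately hands header values to Python's http.client (through urllib3), and http.client's putheader() encodes str values as Latin-1. putrequest() and putheader() only buffer the request, so nothing is sent over the network; the host and path are placeholders.

```python
import http.client

# Same shape as the referer used in the scraper below (Chinese in the path)
referer = 'https://www.lagou.com/jobs/list_資料分析?labelWords=&fromSearch=true&suginput='

# putrequest()/putheader() only build the request in memory; no socket is opened
conn = http.client.HTTPConnection('www.lagou.com')
conn.putrequest('GET', '/')

try:
    # http.client encodes str header values as Latin-1, which has no CJK characters
    conn.putheader('Referer', referer)
    raw_failed = False
except UnicodeEncodeError as exc:
    raw_failed = True
    print('raw header rejected:', exc)

# The workaround: turn each UTF-8 byte into one Latin-1 character.
# Latin-1 maps every byte 0-255 to a code point, so this decode never fails,
# and http.client's Latin-1 encode turns it back into the original UTF-8 bytes.
fixed = referer.encode('utf-8').decode('latin-1')
conn.putheader('Referer', fixed)
print('fixed header accepted')

conn.close()
```

In other words, the trick does not change what goes over the wire: the server still receives the UTF-8 bytes of the URL, they just get past http.client's Latin-1 check.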

Here is the full scraper (I'm a beginner, and I typed it up by following someone else's code):

import requests

class Config:
    kd = '資料分析'  # search keyword ("data analysis")
    referer = 'https://www.lagou.com/jobs/list_資料分析?labelWords=&fromSearch=true&suginput='
    headers = {
        'Accept':'application/json,text/javascript,*/*;q=0.01',
        # Header values are encoded as Latin-1, which cannot represent Chinese,
        # so pass the URL's UTF-8 bytes disguised as Latin-1 characters
        'referer':referer.encode("utf-8").decode("latin1"),
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4315.4 Safari/537.36'
    }

class Spider:

    def __init__(self, kd=Config.kd):
        self.kd = kd
        self.url = Config.referer
        self.api = 'https://www.lagou.com/jobs/positionAjax.json'

        # The referer page must be requested first (the session keeps the cookies it sets)
        self.sess = requests.Session()
        self.sess.get(self.url, headers=Config.headers, timeout=10)

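    def get_position(self, pn):
        data = {
            'first': 'true',
            'pn': str(pn),
            'kd': self.kd
        }

        # Send a POST request to the API; the dict is form-encoded as UTF-8,
        # so the Chinese keyword in the body does not trigger the error
        r = self.sess.post(self.api, headers=Config.headers, data=data, timeout=10)
        r.raise_for_status()

        # Parse the JSON response directly with .json()
        return r.json()['content']['positionResult']['result']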

    def engine(self,total_pn):
        for pn in range(1,total_pn + 1):
            results = self.get_position(pn)
            for pos in results:
                print(pos['positionName'],pos['companyShortName'],pos['workYear'],pos['salary'])

if __name__ == '__main__':
    lagou = Spider()
    lagou.engine(2)
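One thing I noticed: the program hard-codes 資料分析 in both Config.kd and Config.referer, so Spider(kd=...) with another keyword still sends the old referer. Below is a minimal sketch of an alternative to the Latin-1 trick; build_headers, LIST_URL and USER_AGENT are hypothetical names of my own, not from the original code. It percent-encodes the keyword with urllib.parse.quote, which is how browsers normally write non-ASCII characters in URLs, so the header is pure ASCII. I have not checked whether Lagou treats a percent-encoded referer the same as the raw one.

```python
from urllib.parse import quote

# Hypothetical helper (not part of the original program): build the headers
# from the search keyword, so the referer always matches Spider(kd=...).
LIST_URL = 'https://www.lagou.com/jobs/list_{kd}?labelWords=&fromSearch=true&suginput='
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/88.0.4315.4 Safari/537.36')


def build_headers(kd: str) -> dict:
    """Return request headers whose referer is the list page for keyword kd."""
    # quote() percent-encodes the keyword's UTF-8 bytes (e.g. 資 -> %E8%B3%87),
    # so the header value is pure ASCII and Latin-1 encoding cannot fail.
    referer = LIST_URL.format(kd=quote(kd))
    return {
        'Accept': 'application/json,text/javascript,*/*;q=0.01',
        'referer': referer,
        'User-Agent': USER_AGENT,
    }


if __name__ == '__main__':
    headers = build_headers('資料分析')
    print(headers['referer'])
```

To wire it in, Spider.__init__ could store build_headers(kd) on self and use it in place of Config.headers for both the GET and the POST.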

Sample of the scraped results: