Python crawler (using requests) error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position
阿新 • Published: 2020-12-27
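1. I am new to web scraping. While writing a program to scrape job listings from Lagou (拉勾網), I ran into the error shown in the title.
2. After some searching, I found that this error appears when a request sent with requests (here, the session's post call) carries Chinese text in a header. It is an encoding problem: the referer header contains Chinese characters, and the underlying http.client encodes str header values as latin-1, which cannot represent them. Re-encoding the header value as follows fixes it: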
'referer':referer.encode("utf-8").decode("latin1")
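To see why this one-line change works, here is a minimal sketch that needs no network access. It assumes, as in CPython's http.client (which requests uses underneath), that str header values are encoded as latin-1 before being sent. The variable names `patched`, `wire_bytes`, and `quoted` are made up for illustration. The last step shows an alternative that is not in the original program: percent-encoding the URL with `urllib.parse.quote`, which yields a pure ASCII header value.

```python
from urllib.parse import quote

referer = 'https://www.lagou.com/jobs/list_資料分析?labelWords=&fromSearch=true&suginput='

# Encoding the raw string as latin-1 fails, because the Chinese
# characters are outside the latin-1 range (code points 0-255).
try:
    referer.encode('latin-1')
    raised = False
except UnicodeEncodeError:
    raised = True

# The fix from the post: every UTF-8 byte becomes one latin-1 character,
# so a later latin-1 encode reproduces the original UTF-8 bytes.
patched = referer.encode('utf-8').decode('latin1')
wire_bytes = patched.encode('latin-1')

# Alternative (an assumption, not the author's method): percent-encode
# the non-ASCII part while leaving the URL delimiters untouched.
quoted = quote(referer, safe=':/?&=')

print(raised)                                  # True
print(wire_bytes == referer.encode('utf-8'))   # True
print(quoted)
```

Either `patched` or `quoted` can be used as the header value. The percent-encoded form is what a browser would normally send, so it may be the more conventional choice.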
Here is the full crawler program (I am a beginner, and I typed it out following someone else's code):

import requests


class Config:
    kd = '資料分析'
    referer = 'https://www.lagou.com/jobs/list_資料分析?labelWords=&fromSearch=true&suginput='
    headers = {
        'Accept': 'application/json,text/javascript,*/*;q=0.01',
        # Re-encode so the Chinese characters survive the latin-1 header encoding
        'referer': referer.encode("utf-8").decode("latin1"),
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4315.4 Safari/537.36'
    }


class Spider:
    def __init__(self, kd=Config.kd):
        self.kd = kd
        self.url = Config.referer
        self.api = 'https://www.lagou.com/jobs/positionAjax.json'
        # The referer URL must be requested first
        self.sess = requests.session()
        self.sess.get(self.url, headers=Config.headers)

    def get_position(self, pn):
        data = {
            'first': 'true',
            'pn': str(pn),
            'kd': self.kd
        }
        # Send a POST request to the API
        r = self.sess.post(self.api, headers=Config.headers, data=data)
        # Parse the response directly with .json()
        return r.json()['content']['positionResult']['result']

    def engine(self, total_pn):
        for pn in range(1, total_pn + 1):
            results = self.get_position(pn)
            for pos in results:
                print(pos['positionName'], pos['companyShortName'], pos['workYear'], pos['salary'])


if __name__ == '__main__':
    lagou = Spider()
    lagou.engine(2)