[Python] AttributeError: 'NoneType' object has no attribute 'xpath' when scraping cnblogs (部落格園) content with requests

This article walks through how to resolve the 'NoneType' object has no attribute 'xpath' exception that can appear when fetching page content with requests.

Here is the code that fails:

import requests
from lxml import etree
response = requests.get('https://blog.csdn.net/it_xf?viewmode=contents')
etree_html = etree.HTML(response.text)
content = etree_html.xpath('//*[@id="mainBox"]/main/div[2]/div[1]/h4/a/text()')

for each in content:
    replace = each.replace('\n', '').replace(' ', '')
    if replace == '\n' or replace == '':
        continue
    else:
        print(replace)  

1. Error analysis

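The response body (response.text) comes back as an empty string, so etree.HTML() returns None instead of an element tree, and the .xpath() call on the next line raises AttributeError: 'NoneType' object has no attribute 'xpath'.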
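The failure mode described here can be guarded against before calling .xpath(). The following is only a minimal sketch: extract_text is a hypothetical helper name, not part of requests or lxml, and the sample HTML strings are made up for illustration. It assumes that an unparseable body should simply yield an empty result.

```python
from lxml import etree


def extract_text(html_text, xpath_expr):
    """Return the XPath matches, or an empty list when the HTML cannot be parsed."""
    if not html_text or not html_text.strip():
        # Nothing to parse: this is the empty-body case from the error analysis.
        return []
    try:
        tree = etree.HTML(html_text)
    except etree.LxmlError:
        # Some inputs make lxml raise instead of returning None.
        return []
    if tree is None:
        # etree.HTML can return None when no root element could be built.
        return []
    return tree.xpath(xpath_expr)


# Illustrative inputs only; no network request is made here.
print(extract_text('', '//h4/a/text()'))
print(extract_text('<h4><a>Hello</a></h4>', '//h4/a/text()'))
```

With a guard like this, an empty response shows up as an empty result that can be logged or retried, rather than as an AttributeError a few lines later.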

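The cause is the site's anti-scraping check: the GET request has to carry headers that make it look like a browser request before the server returns the data.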
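To see why a server can tell the script apart from a browser, it may help to look at what requests sends when no headers are passed. This small sketch only inspects the library defaults and makes no network request; the exact version number printed depends on the installed requests release.

```python
import requests

# Headers that requests attaches when the caller supplies none.
default_headers = requests.utils.default_headers()

# The default User-Agent identifies the client as python-requests,
# which is easy for a server to filter out.
print(default_headers['User-Agent'])
```

Passing a headers dict with a browser-like user-agent to requests.get overrides this default value.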

2. Solution

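First, find the headers you need. Where do they come from? The original screenshot marks them: they are the request headers that the browser's developer tools show for the page request.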

Then copy those headers and adapt them into the code.
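Copying headers by hand is easy to get wrong, so one option is to paste the raw "name: value" lines and convert them to a dict. This is only a sketch under assumptions: headers_from_devtools is a hypothetical helper name, and the sample header block below is illustrative rather than taken from the original screenshot.

```python
def headers_from_devtools(raw_text):
    """Turn 'name: value' lines copied from the browser into a headers dict."""
    headers = {}
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line.startswith(':'):
            # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
            continue
        name, sep, value = line.partition(':')
        if not sep:
            # Not a "name: value" line, so ignore it.
            continue
        headers[name.strip().lower()] = value.strip()
    return headers


# Illustrative header block (assumed values, not the article's screenshot).
raw = """
:authority: blog.csdn.net
accept: text/html
user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36
"""

parsed = headers_from_devtools(raw)
print(sorted(parsed))
```

The resulting dict can be passed as the headers argument of requests.get. In many cases the user-agent entry alone is enough, as in the article's fix.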

The modified code looks like this:

import requests
from lxml import etree

# Browser-like User-Agent so the server does not treat the request as a script.
headers = {
    'user-agent':
        'Mozilla/5.0 (Windows NT 10.0; WOW64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400'
}
response = requests.get('https://blog.csdn.net/it_xf?viewmode=contents', headers=headers)
etree_html = etree.HTML(response.text)
content = etree_html.xpath('//*[@id="mainBox"]/main/div[2]/div[1]/h4/a/text()')

for each in content:
    # Strip newlines and spaces, then skip text nodes that end up empty.
    replace = each.replace('\n', '').replace(' ', '')
    if replace == '':
        continue
    print(replace)