Python scraping: getting all the text inside a tag
阿新 • Published: 2018-12-30
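The goal is to extract the string from the page below (link personally tested).
I want to pull out the text content. How? There are many ways: bs4 can do it, regular expressions can do it, and Selenium can do it for dynamic pages. This time we start with XPath, which really is powerful. Without further ado, here is the code.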
Getting the text with text()
import requests
from lxml import etree

url = 'https://tieba.baidu.com/p/5815118868?pn=&red_tag=1075036600'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}

# Download the page and parse the HTML into an element tree
response = requests.get(url, headers=headers).text
code = etree.HTML(response)

# /text() gets the tag's own text; //text() gets the text of the tag and its child tags
info = code.xpath('//div[@class="d_post_content_main d_post_content_firstfloor"]/div/cc/div/text()')
print(info)  # the extracted text still needs to be cleaned up
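To make the difference between /text() and //text() concrete without hitting the network, here is a minimal sketch on an inline HTML snippet. The markup, the `post` class name, and the `clean` helper are all hypothetical stand-ins for illustration, not the real Tieba page structure or part of lxml.

```python
from lxml import etree

# Hypothetical sample markup (an assumption, not the real Tieba page)
SAMPLE_HTML = (
    '<div class="post">First line<br/>Second line '
    '<a href="#">a link</a> and <b>bold</b> text</div>'
)

tree = etree.HTML(SAMPLE_HTML)

# /text(): only the text nodes that are direct children of the div
direct = tree.xpath('//div[@class="post"]/text()')

# //text(): text nodes of the div and of all its descendants, in document order
everything = tree.xpath('//div[@class="post"]//text()')


def clean(fragments):
    """Hypothetical helper: strip each fragment, drop empties, join with spaces."""
    return ' '.join(part.strip() for part in fragments if part.strip())


print(clean(direct))
print(clean(everything))
```

With /text() the words inside the `<a>` and `<b>` tags are missing, while //text() keeps them. Both return a list of fragments, so some joining and stripping step like `clean` is usually needed afterwards.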
Getting the text with xpath('string(.)')
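import requests
from lxml import etree

url = 'https://tieba.baidu.com/p/5815118868?pn=&red_tag=1075036600'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}

# Download the page and parse the HTML into an element tree
response = requests.get(url, headers=headers).text
code = etree.HTML(response)

# Select the container element(s) of the first-floor post
nodes = code.xpath('//div[@class="d_post_content_main d_post_content_firstfloor"]')

# Assumed continuation, following the section title: string(.) returns
# the concatenated text of an element and all of its descendants
for node in nodes:
    print(node.xpath('string(.)'))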
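As a small offline illustration of what string(.) returns, the sketch below runs it on an inline snippet. The markup and the `post` class name are made up for the example. `normalize-space(.)` is a standard XPath 1.0 function shown here as one possible way to tidy the whitespace; the original post does not use it.

```python
from lxml import etree

# Hypothetical sample markup (an assumption, not the real Tieba page)
SAMPLE_HTML = """
<div class="post">
    First line<br/>
    Second line <a href="#">a link</a> and <b>bold</b> text
</div>
"""

tree = etree.HTML(SAMPLE_HTML)
nodes = tree.xpath('//div[@class="post"]')  # a list of matching elements

# string(.) concatenates every text node under the element into ONE string,
# keeping the original whitespace and newlines
full_text = nodes[0].xpath('string(.)')

# normalize-space(.) does the same but trims the ends and collapses
# runs of whitespace into single spaces
tidy_text = nodes[0].xpath('normalize-space(.)')

print(repr(full_text))
print(tidy_text)
```

Unlike //text(), which returns a list of fragments, string(.) returns a single string per element, so there is nothing to join afterwards. Tags such as `<br/>` contribute no text of their own, so line breaks in the rendered page survive only if the source HTML also contains whitespace at that point.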