Web Crawlers, Part 1

阿新 • Published: 2017-05-10


A First Look at Web Crawlers

#!/usr/bin/env python
# encoding: utf-8

from bs4 import BeautifulSoup
import requests


response = requests.get("http://www.autohome.com.cn/news/")
# Use the encoding detected from the response body so the text is not garbled
response.encoding = response.apparent_encoding

soup = BeautifulSoup(response.text, features="html.parser")  # build the soup object
# find() returns the first element that matches, or None if nothing matches
soup_obj = soup.find(id="auto-channel-lazyload-article")

# find_all() returns every matching element as a list
li_list = soup_obj.find_all("li") if soup_obj else []
for i in li_list:
    a = i.find("a")
    if a:
        a_attrs = a.attrs.get("href")  # attrs is the tag's attribute dictionary
        print(a_attrs)
        a_h = a.find("h3")
        print(a_h)
        img = a.find("img")
        print(img)
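To see how find, find_all, and attribute lookup behave without hitting the network, the same extraction can be tried on an inline HTML snippet. This is only a sketch: SAMPLE_HTML is made-up markup that imitates the structure the script above expects, and extract_articles is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates the structure the crawler looks for (an assumption).
SAMPLE_HTML = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li><a href="/news/1.html"><h3>First headline</h3><img src="/img/1.jpg"></a></li>
    <li><a href="/news/2.html"><h3>Second headline</h3><img src="/img/2.jpg"></a></li>
    <li>no link here</li>
  </ul>
</div>
"""


def extract_articles(html):
    """Return one dict per <li> that contains a link (hypothetical helper)."""
    soup = BeautifulSoup(html, features="html.parser")
    container = soup.find(id="auto-channel-lazyload-article")  # first match or None
    if container is None:
        return []
    articles = []
    for li in container.find_all("li"):  # every matching element, as a list
        a = li.find("a")
        if a is None:
            continue  # skip list items without a link
        h3 = a.find("h3")
        img = a.find("img")
        articles.append({
            "href": a.get("href"),
            "title": h3.get_text(strip=True) if h3 else None,
            "img": img.get("src") if img else None,
        })
    return articles


articles = extract_articles(SAMPLE_HTML)
for article in articles:
    print(article)
```

Returning plain dictionaries instead of printing tags keeps the parsing step separate from whatever is done with the results later.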

requests

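The Python standard library provides modules such as urllib, urllib2, and httplib for making HTTP requests (in Python 3 these became urllib.request and http.client), but their APIs are clumsy. They were built for a different era and a different internet, and they demand a great deal of work, sometimes even method overrides, to accomplish the simplest tasks.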
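As a rough illustration of that extra work, here is a sketch of a form-encoded POST built with only the Python 3 standard library. The URL, the field names, and the helper names build_form_post and fetch_text are assumptions made for the example. The body has to be URL-encoded and converted to bytes by hand, and the response has to be decoded by hand as well.

```python
import urllib.parse
import urllib.request


def build_form_post(url, fields):
    """Build (but do not send) a form-encoded POST request (hypothetical helper)."""
    # The body must be URL-encoded and turned into bytes manually.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_text(req, timeout=10):
    """Send the request and decode the body manually (not called below)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# example.com and the field names are placeholders for illustration only.
req = build_form_post("http://example.com/login", {"user": "alice", "page": 2})
print(req.get_method(), req.full_url)
print(req.data)
```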

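Requests is an HTTP library written in Python and released under the Apache2 license. It wraps Python's built-in modules in a much higher-level interface, which makes network requests far more pleasant for Python developers. With Requests, most HTTP operations a browser can perform take only a few lines.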
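A minimal sketch of that higher-level interface follows, assuming a hypothetical page query parameter and a made-up User-Agent string. requests.Request(...).prepare() builds the request without sending it, so the final URL and headers can be inspected offline; build_news_request and fetch_text are illustrative helper names, not part of the library.

```python
import requests

NEWS_URL = "http://www.autohome.com.cn/news/"


def build_news_request(page):
    """Prepare a GET request without sending it (hypothetical helper)."""
    request = requests.Request(
        "GET",
        NEWS_URL,
        params={"page": page},  # "page" is an assumed query parameter
        headers={"User-Agent": "crawler-demo/0.1"},  # made-up identifier
    )
    return request.prepare()


def fetch_text(prepared, timeout=10):
    """Send a prepared request and return the decoded body (not called below)."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        response.encoding = response.apparent_encoding
        return response.text


prepared = build_news_request(2)
print(prepared.method, prepared.url)
```

For one-off calls, requests.get(url, params=..., headers=..., timeout=...) does the same thing in a single step; the prepared form is used here only so the example runs without network access.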
