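A Detailed Guide to find and find_all in BeautifulSoup

阿新 • Published: 2020-12-07

How to use find and find_all in BeautifulSoup, a handy tool for web scraping

Without further ado, here is a sample HTML document: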

<html>
  <head>
    <title>
      index
    </title>
  </head>
  <body>
     <div>
        <ul>
          <li id="flask" class="item-0"><a href="link1.html">first item</a></li>
          <li class="item-1"><a href="link2.html">second item</a></li>
          <li class="item-inactie"><a href="link3.html">third item</a></li>
          <li class="item-1"><a href="link4.html">fourth item</a></li>
          <li class="item-0"><a href="link5.html">fifth item</a>
         </ul>
     </div>
    <li> hello world </li>
  </body>
</html>

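Before using BeautifulSoup, you first need to construct a BeautifulSoup instance:

# Build the BeautifulSoup instance
soup = BeautifulSoup(html, 'lxml')
# The first argument is the markup to parse
# The second argument is the parser BeautifulSoup should use

Note that the parser you name must be installed beforehand; lxml, used here, was already installed. The BeautifulSoup documentation lists the parsers that can be used.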
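The point about installing the parser first can be sketched as follows. This is a minimal illustration, not part of the original code: the shortened html string and the fallback policy (use lxml when it can be imported, otherwise Python's built-in html.parser) are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# A shortened stand-in for the sample document (assumption for this sketch).
html = "<html><head><title>index</title></head><body><li>hello world</li></body></html>"

# Prefer lxml when it is importable; otherwise fall back to the parser
# that ships with Python. The fallback policy is an illustrative choice.
try:
    import lxml  # noqa: F401  (imported only to check that it is installed)
    parser = "lxml"
except ImportError:
    parser = "html.parser"

soup = BeautifulSoup(html, parser)
print(parser, soup.title.string)
```

Different parsers can build slightly different trees from malformed markup, so it is usually safer to name one parser explicitly and keep it fixed across a project.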


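Next, an introduction to find and find_all.

1. find
Returns only the first matching object.
Syntax:

find(name, attrs, recursive, text, **kwargs)
# recursive: when True (the default), all descendants are searched; when False, only direct children
# text has been named string since Beautiful Soup 4.4.0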

Parameters of BeautifulSoup's find method:

Parameter	Purpose
name	the tag name to look for
text	the text to look for
attrs	a dictionary of attribute values to match
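The parameters listed above can be tried out on a tiny document. The sketch below is only an illustration: the html snippet and variable names are made up for the example, html.parser is used so that nothing extra has to be installed, and the string keyword assumes Beautiful Soup 4.4.0 or newer.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one <li> nested inside <ul>, one directly under <div>.
html = '<div><ul><li id="a">one</li></ul><li>two</li></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# name: search all descendants of <div> for the first <li>
first_descendant = div.find("li")

# recursive=False: look only at the direct children of <div>
direct_child = div.find("li", recursive=False)

# attrs: match on attribute values given as a dictionary
by_attrs = soup.find("li", attrs={"id": "a"})

# string: match on text; the result is the text node, not the tag
text_node = soup.find(string="two")

print(first_descendant, direct_child, by_attrs, repr(text_node))
```

Note that searching by string returns the matching text node itself; its enclosing tag is available through text_node.parent.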

Example:

# find searches once and returns the first match
li = soup.find('li')
print('find_li:', li)
print('li.text (the text inside the tag):', li.text)
print('li.attrs (the attributes of the tag):', li.attrs)
print('li.string (the tag content as a string):', li.string)

Output:

find_li: <li class="item-0" id="flask"><a href="link1.html">first item</a></li>
li.text (the text inside the tag): first item
li.attrs (the attributes of the tag): {'id': 'flask', 'class': ['item-0']}
li.string (the tag content as a string): first item
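In the example above .text and .string print the same value, but they are not interchangeable. The sketch below, built on a made-up two-item list, shows the case where they differ: .string is None as soon as a tag has more than one child, while .text joins all the text found inside the tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <li> mixes a tag and loose text.
html = (
    '<ul>'
    '<li><a href="link1.html">first item</a></li>'
    '<li id="mixed"><a href="link2.html">second</a> item</li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

first = soup.find("li")
second = soup.find("li", id="mixed")

print(first.string)   # the single nested string is passed through
print(second.string)  # None, because <li> has two children
print(second.text)    # all nested text joined together
```

When the structure of a tag is not known in advance, .text (or get_text()) is usually the safer choice.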

find can also match using the 'attribute=value' form:

li = soup.find(id='flask')
print(li, '\n')

<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
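One thing to keep in mind with attribute matching is that find returns None when nothing matches, so calling .text on the result directly can raise AttributeError. A minimal sketch, using a made-up snippet and a deliberately missing id:

```python
from bs4 import BeautifulSoup

# Hypothetical one-item list for the illustration.
html = '<ul><li id="flask"><a href="link1.html">first item</a></li></ul>'
soup = BeautifulSoup(html, "html.parser")

found = soup.find(id="flask")
missing = soup.find(id="django")  # no such id in the snippet

# Any attribute can be used as a keyword, not only id.
link = soup.find(href="link1.html")

# Guard against None before touching attributes of the result.
label = missing.text if missing is not None else "(no match)"
print(found.text, link.name, label)
```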

Note that because class is a reserved keyword in Python, matching on a tag's class attribute needs a special approach. There are two:

Pass a dictionary through the attrs parameter
Use BeautifulSoup's special keyword argument class_

# Method 1: pass a dictionary through the attrs parameter
find_class = soup.find(attrs={'class': 'item-1'})
print('findclass:', find_class, '\n')
# Method 2: BeautifulSoup's special keyword argument class_
beautifulsoup_class_ = soup.find(class_='item-1')
print('BeautifulSoup_class_:', beautifulsoup_class_, '\n')

Output:

findclass: <li class="item-1"><a href="link2.html">second item</a></li>

BeautifulSoup_class_: <li class="item-1"><a href="link2.html">second item</a></li>
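Both approaches also work when a tag carries several classes, because BeautifulSoup treats class as a multi-valued attribute and compares the search value against each class separately. The snippet below is a sketch with made-up class names (item-1 active), not markup from the sample document.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <li> has two classes.
html = (
    '<ul>'
    '<li class="item-1 active">second item</li>'
    '<li class="item-0">first item</li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

# Matching one class is enough, with either approach.
by_keyword = soup.find(class_="active")
by_attrs = soup.find(attrs={"class": "active"})

# A tag name and a class can be combined in one call.
item0 = soup.find("li", class_="item-0")

print(by_keyword["class"])  # the class attribute is a list
print(by_attrs.text, item0.text)
```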

2. find_all

Returns every match, unlike find, which returns only the first result found.

Syntax:

find_all(name, attrs, recursive, text, limit, **kwargs)

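Parameters of BeautifulSoup's find_all method:

Parameter	Purpose
name	the tag name to look for
text	the text to look for
attrs	a dictionary of attribute values to match
limit	the maximum number of results to return

These parameters are used the same way as in find; limit is the only addition.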
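A short sketch of how the result of find_all behaves, using a made-up five-item list: the return value acts like a list, limit caps its length, and an empty list (rather than None) comes back when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical list with five items.
html = "<ul>" + "".join(f"<li>item {i}</li>" for i in range(5)) + "</ul>"
soup = BeautifulSoup(html, "html.parser")

all_items = soup.find_all("li")
first_two = soup.find_all("li", limit=2)  # stop after two matches
no_tables = soup.find_all("table")        # nothing matches

print(len(all_items), len(first_two), len(no_tables))
print(all_items[0].text, first_two[-1].text)
```

Because the result is list-like, it can be indexed, sliced, and looped over without any conversion.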

Here is the code:

# find_all searches for every match
li_all = soup.find_all('li')
for li in li_all:
    print('---')
    print('matched li:', li)
    print('li text:', li.text)
    print('li attributes:', li.attrs)

Output:

---
matched li: <li class="item-0" id="flask"><a href="link1.html">first item</a></li>
li text: first item
li attributes: {'id': 'flask', 'class': ['item-0']}
---
matched li: <li class="item-1"><a href="link2.html">second item</a></li>
li text: second item
li attributes: {'class': ['item-1']}
---
matched li: <li class="item-inactie"><a href="link3.html">third item</a></li>
li text: third item
li attributes: {'class': ['item-inactie']}
---
matched li: <li class="item-1"><a href="link4.html">fourth item</a></li>
li text: fourth item
li attributes: {'class': ['item-1']}
---
matched li: <li class="item-0"><a href="link5.html">fifth item</a>
</li>
li text: fifth item
li attributes: {'class': ['item-0']}
---
matched li: <li> hello world </li>
li text:  hello world 
li attributes: {}
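A common next step after looping over the matches is to pull a value out of each one. The sketch below collects the link text and href of every li that contains a link; the shortened html string and the links list are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the sample list; the last <li> has no link.
html = (
    '<ul>'
    '<li class="item-0"><a href="link1.html">first item</a></li>'
    '<li class="item-1"><a href="link2.html">second item</a></li>'
    '<li> hello world </li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

links = []
for li in soup.find_all("li"):
    a = li.find("a")       # find can be called on any tag, not only on soup
    if a is not None:      # skip items without a link
        links.append((a.text, a["href"]))

print(links)
```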

Here is a more flexible way to search with find_all:

# The most flexible usage
li_quick = soup.find_all(attrs={'class': 'item-1'})
for li in li_quick:
    print('most flexible search:', li)

Output:

most flexible search: <li class="item-1"><a href="link2.html">second item</a></li>
most flexible search: <li class="item-1"><a href="link4.html">fourth item</a></li>
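The values passed to find_all do not have to be plain strings. As a sketch of two other filter types that BeautifulSoup accepts, the example below uses a compiled regular expression for the class and a function for arbitrary conditions; the html snippet and the pattern are made up for the illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup for the illustration.
html = (
    '<ul>'
    '<li class="item-0">first item</li>'
    '<li class="item-1">second item</li>'
    '<li class="item-inactive">third item</li>'
    '<li>hello world</li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

# Regular expression: classes of the form item-<digit>
numbered = soup.find_all("li", class_=re.compile(r"^item-\d$"))

# Function: called with each tag, keeps the tag when it returns True
no_class = soup.find_all(lambda tag: tag.name == "li" and not tag.has_attr("class"))

print([li.text for li in numbered])
print([li.text for li in no_class])
```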

Full code:

# coding=utf8
# @Author= CaiJunxuan
# @QQ=469590490
# @Wechat:15916454524

# beautifulsoup

# Import the BeautifulSoup module
from bs4 import BeautifulSoup

# Sample HTML
html = '''
<html>
  <head>
    <title>
      index
    </title>
  </head>
  <body>
     <div>
        <ul>
          <li id="flask" class="item-0"><a href="link1.html">first item</a></li>
          <li class="item-1"><a href="link2.html">second item</a></li>
          <li class="item-inactie"><a href="link3.html">third item</a></li>
          <li class="item-1"><a href="link4.html">fourth item</a></li>
          <li class="item-0"><a href="link5.html">fifth item</a>
         </ul>
     </div>
    <li> hello world </li>
  </body>
</html>
'''

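# Build the BeautifulSoup instance
soup = BeautifulSoup(html, 'lxml')
# The first argument is the markup to parse
# The second argument is the parser BeautifulSoup should use
# html.parser is Python's built-in parser; it is slower than lxml, so it is used less often
# lxml uses the lxml module
# html5lib parses the markup the way a browser does and produces valid HTML5
# Each parser needs its module to be available; for example, using lxml requires installing lxml

# bs4 offers many search methods, but two are used most often:

# find searches once and returns the first match
li = soup.find('li')
print('find_li:', li.string)
print(50 * '*', '\n')

# find can select using the 'attribute=value' form
li = soup.find(id='flask')
print(li, '\n')
# class is a reserved keyword in Python, so it cannot be used directly as a keyword argument
# There are two ways to search by the class attribute
# Method 1: pass a dictionary through the attrs parameter
find_class = soup.find(attrs={'class': 'item-1'})
print('findclass:', find_class, '\n')
# Method 2: BeautifulSoup's special keyword argument class_
beautifulsoup_class_ = soup.find(class_='item-1')
print('BeautifulSoup_class_:', beautifulsoup_class_, '\n')

# find_all searches for every match
li_all = soup.find_all('li')
for li in li_all:
    print('---')
    print('matched li:', li.attrs)

# The most flexible usage
li_quick = soup.find_all(attrs={'class': 'item-1'})
for li in li_quick:
    print('most flexible search:', li)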
