[Study Notes] Basic Usage of Beautiful Soup Syntax

阿新 • Published: 2019-02-04

1. Beautiful Soup Syntax

[Figure: Beautiful Soup syntax]
find_all searches for all nodes that match the conditions, while find returns only the first node that matches.
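A minimal sketch of the difference, using a small hand-written HTML string (the markup and variable names are illustrative, not from the original note). find_all returns a list-like result, possibly empty, while find returns the first match in document order, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# Illustrative markup with three <a> tags (hypothetical sample data)
html = """
<p>
  <a href="/one">One</a>
  <a href="/two">Two</a>
  <a href="/three">Three</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all: every matching node, in document order
all_links = soup.find_all("a")
print([a.get_text() for a in all_links])  # ['One', 'Two', 'Three']

# find: only the first matching node
first_link = soup.find("a")
print(first_link.get_text())  # One

# No match: find_all gives an empty result, find gives None
print(len(soup.find_all("table")), soup.find("table"))  # 0 None
```

Because find can return None, calling a method such as get_text() on a failed lookup raises AttributeError, so check the result when a match is not guaranteed.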

2. Extracting Information from a Web Page

[Figure: extracting web page information]

The approach is as follows:

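from bs4 import BeautifulSoup
import re

# Sample node: <a href="123.html" class="article_link">Python</a>
html_doc = '<a href="123.html" class="article_link">Python</a>'

# Create a BeautifulSoup object from an HTML string
soup = BeautifulSoup(
    html_doc,       # HTML document string
    'html.parser')  # HTML parser
# from_encoding='utf8' only applies when html_doc is bytes;
# for a str it is ignored and Beautiful Soup emits a warning.

# Find all nodes whose tag is a
soup.find_all('a')

# Find all <a> nodes whose href is exactly /view/123.html
soup.find_all('a', href='/view/123.html')
# Find all <a> nodes whose href has the form /view/<digits>.html
soup.find_all('a', href=re.compile(r'/view/\d+\.html'))

# Find all <div> nodes with class "abc" and text "Python"
# (class_ has a trailing underscore because class is a Python keyword)
soup.find_all('div', class_='abc', string='Python')

# Get a single node, here <a href="123.html" class="article_link">Python</a>
node = soup.find('a')

# Tag name of the node
node.name

# Value of the node's href attribute
node['href']

# Link text of the node
node.get_text()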
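To see these lookups return real matches, here is a small sketch on made-up markup containing links of the form /view/<digits>.html (the page content is hypothetical). It also shows node.get('href'), which returns None instead of raising KeyError when the attribute is missing.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page with links of the form /view/<digits>.html
html = """
<div class="abc">Python</div>
<a href="/view/123.html">Python</a>
<a href="/view/456.html">Java</a>
<a href="/about.html">About</a>
<a>No link</a>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the attribute value exactly
exact = soup.find_all("a", href="/view/123.html")

# A compiled regex is checked with search(), so it may match anywhere in the value
pattern = soup.find_all("a", href=re.compile(r"/view/\d+\.html"))

# <div> tags with class "abc" whose text is exactly "Python"
divs = soup.find_all("div", class_="abc", string="Python")

for node in pattern:
    print(node.name, node["href"], node.get_text())

# node["href"] would raise KeyError here; node.get("href") returns None instead
bare = soup.find("a", string="No link")
print(bare.get("href"))
```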

3. Code

3.1 Preparing the Sample Data

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p> 


<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
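html_doc above is a Python str, so the from_encoding argument used later in the note has nothing to decode. It only matters when the markup is bytes, for example a raw HTTP response body. A minimal sketch, assuming a small UTF-8 byte string made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical raw bytes, e.g. as read from a network response
html_bytes = "<html><head><title>Café</title></head></html>".encode("utf-8")

# For bytes input, from_encoding tells Beautiful Soup how to decode the markup
soup_from_bytes = BeautifulSoup(html_bytes, "html.parser", from_encoding="utf-8")
print(soup_from_bytes.title.get_text())  # Café

# For str input there is nothing to decode, so no original encoding is recorded
soup_from_str = BeautifulSoup("<title>Café</title>", "html.parser")
print(soup_from_str.original_encoding)  # None
```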

3.2 Parsing the Sample Data

Example:

soup = BeautifulSoup(html_doc, "html.parser")  # html_doc is a str, so from_encoding is not needed

print("Get all links")
links = soup.find_all('a')

for link in links:
    print(link.name, link['href'], link.get_text())

Output:

Get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie
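The loop above prints each link; in practice you usually collect the values instead. A minimal sketch (the function name extract_links and its (text, href) tuple format are illustrative choices, not from the original note), run on the same three sister links plus one anchor without an href:

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute
    return [(a.get_text(), a["href"]) for a in soup.find_all("a", href=True)]


sample = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
<a name="anchor-without-href">no link</a>
</p>
"""

for text, href in extract_links(sample):
    print(text, href)
```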

Getting a single link

print("Get the link to http://example.com/lacie")
link_node = soup.find('a', href="http://example.com/lacie")
print(link_node.name, link_node['href'], link_node.get_text())

Output:

Get the link to http://example.com/lacie
a http://example.com/lacie Lacie
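Any attribute can be used as a keyword filter in the same way, and attributes whose names are not valid Python identifiers (such as data-* attributes) can be passed through the attrs dictionary. Since find returns None when nothing matches, it is worth checking the result before indexing it. A minimal sketch on made-up markup (the data-role attribute is hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the data-role attribute is made up for illustration
html = """
<a href="http://example.com/elsie" id="link1" data-role="first">Elsie</a>
<a href="http://example.com/lacie" id="link2" data-role="second">Lacie</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword filter on the id attribute
by_id = soup.find("a", id="link2")

# data-role is not a valid Python keyword argument name, so use attrs instead
by_data = soup.find("a", attrs={"data-role": "first"})

# find returns None for a link that does not exist, so guard before using it
missing = soup.find("a", href="http://example.com/nobody")
if missing is None:
    print("link not found")
else:
    print(missing["href"])
```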

Using a regular expression

import re

print("Regular expression")
link_node = soup.find('a', href=re.compile(r'sie'))
print(link_node.name, link_node['href'], link_node.get_text())

Output:

Regular expression
a http://example.com/elsie Elsie
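Beautiful Soup checks a compiled pattern with its search() method, so r'sie' may match anywhere inside the href; among the three sisters only .../elsie contains "sie", which is why find returned Elsie. Anchoring the pattern changes what matches. A sketch on the same three links:

```python
import re
from bs4 import BeautifulSoup

# The three sister links from the sample document
html = """
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
<a href="http://example.com/tillie" id="link3">Tillie</a>
"""
soup = BeautifulSoup(html, "html.parser")

# "sie" appears only inside .../elsie
sie = soup.find_all("a", href=re.compile(r"sie"))

# "ie$" is anchored to the end of the value: all three hrefs end in "ie"
ends_ie = soup.find_all("a", href=re.compile(r"ie$"))

# "^sie" requires the href to start with "sie", so nothing matches
starts_sie = soup.find_all("a", href=re.compile(r"^sie"))

print(len(sie), len(ends_ie), len(starts_sie))  # 1 3 0
```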

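Finding a p paragraph by its class

print("Find a p paragraph by class")
# class_ needs a trailing underscore because class is a Python keyword
p_node = soup.find('p', class_="title")
print(p_node.name, p_node.get_text())

Output:

Find a p paragraph by class
p The Dormouse's story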
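In HTML, class is a multi-valued attribute, and class_ matches a tag if any one of its classes equals the given string. A short sketch with made-up markup (the "intro" class is hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical paragraphs; the second one carries two classes
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story intro">Once upon a time...</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

# class_="story" matches both story paragraphs, including class="story intro"
stories = soup.find_all("p", class_="story")

# The parsed class attribute holds the individual class names as a list
print(len(stories), list(stories[0]["class"]))
```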

3.3 Complete Example

# coding:utf8
from bs4 import BeautifulSoup
import re

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")  # html_doc is a str, so from_encoding is not needed

print("Get all links")
links = soup.find_all('a')

for link in links:
    print(link.name, link['href'], link.get_text())

print("Get the link to http://example.com/lacie")
link_node = soup.find('a', href="http://example.com/lacie")
print(link_node.name, link_node['href'], link_node.get_text())

print("Regular expression")
link_node = soup.find('a', href=re.compile(r'sie'))
print(link_node.name, link_node['href'], link_node.get_text())

print("Find a p paragraph by class")
# class_ needs a trailing underscore because class is a Python keyword
p_node = soup.find('p', class_="title")
print(p_node.name, p_node.get_text())

Output:

Get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie
Get the link to http://example.com/lacie
a http://example.com/lacie Lacie
Regular expression
a http://example.com/elsie Elsie
Find a p paragraph by class
p The Dormouse's story
