Using CSS Selectors in Web Scraping (BeautifulSoup)

阿新 • Published: 2018-12-26

Why I use CSS selectors: I find them a more convenient way to extract information from a page.

How to use them:

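from bs4 import BeautifulSoup

html = '<p class="intro">Hello</p>'  # the page's HTML as a string
soup = BeautifulSoup(html, 'html.parser')
soup.select('p')  # pass a CSS selector string; returns a list of matching tags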

Pass a CSS selector string to the .select() method of a BeautifulSoup object; the matches are returned as a list.
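To make the "always a list" behaviour concrete, here is a small sketch on a made-up two-paragraph HTML string (the snippet is purely illustrative). It also uses select_one(), which BeautifulSoup provides alongside select() and which returns only the first match, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# Made-up snippet, for illustration only
html = '<p class="a">one</p><p class="b">two</p>'
soup = BeautifulSoup(html, 'html.parser')

all_p = soup.select('p')              # every match, as a list
first_p = soup.select_one('p')        # only the first match
missing = soup.select('div')          # no match: an empty list
missing_one = soup.select_one('div')  # no match: None

print(len(all_p), first_p.string, missing, missing_one)
# prints: 2 one [] None
```

Indexing select(...)[0] on an empty list raises IndexError, so select_one() plus a None check is often the safer choice when you expect a single element.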

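Basic CSS syntax

Element selectors:
    Select document elements directly by tag name
    e.g. head, p
Class selectors:
    Match an element's class attribute, e.g. <h1 class="important">
    The class name is important
    .important selects every element that has this class
    Can be combined with an element selector, e.g. p.important
ID selectors:
    Match an element's id attribute, e.g. <h1 id="intro">
    The id is intro
    #intro selects the element with id="intro"
    Can be combined with an element selector, e.g. p#intro
Attribute selectors:
    Select elements that have a given attribute, whatever its value.
    *[title] selects every element that has a title attribute
    a[href] selects every anchor element that has an href attribute
    Several attributes can be required at once, e.g. a[href][title]; note that both must be present.
    Restrict the value: a[href="www.so.com"]
Descendant (contextual) selectors:
    Select elements nested inside another element, at any depth
    em elements inside h1 elements: h1 em
Child selectors:
    Restricted to direct children
    strong elements that are direct children of h1: h1 > strong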
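The selector types listed above can all be tried against one small document. The markup below is invented for this sketch (it is not test.html); the comments give the count each selector yields on it.

```python
from bs4 import BeautifulSoup

# Invented markup that exercises each selector type from the list above
html = """
<h1 class="important" id="intro" title="t">Hi <em>there</em> <span><strong>deep</strong></span></h1>
<p class="important">note</p>
<a href="www.so.com" title="so">so</a>
<a href="https://example.com">ex</a>
"""
soup = BeautifulSoup(html, 'html.parser')

counts = {
    'p': len(soup.select('p')),                                        # element: 1
    '.important': len(soup.select('.important')),                      # class: h1 and p, 2
    'p.important': len(soup.select('p.important')),                    # element + class: 1
    '*[title]': len(soup.select('*[title]')),                          # has title: h1 and first a, 2
    'a[href][title]': len(soup.select('a[href][title]')),              # both attributes: 1
    'a[href="www.so.com"]': len(soup.select('a[href="www.so.com"]')),  # exact value: 1
    'h1 em': len(soup.select('h1 em')),                                # descendant: 1
    'h1 > strong': len(soup.select('h1 > strong')),                    # direct child: 0, strong sits in a span
    'h1 strong': len(soup.select('h1 strong')),                        # any depth: 1
}
print(soup.select_one('#intro').name)  # id selector -> h1
print(counts)
```

The last two entries show the practical difference between the descendant and child selectors: one intermediate tag (the span) is enough to make h1 > strong match nothing.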

Example

test.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>hjk</title>
</head>

<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" title="12" class="sister" id="link1"><!-- Elsie --></a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
</html>

Parsing the page

from bs4 import BeautifulSoup

with open('test.html', encoding='utf-8') as f:  # close the file after parsing
    soup = BeautifulSoup(f, 'html.parser')

1. Searching by tag name

print(soup.select('title'))  # all title tags
print(soup.select('p'))  # all p tags
print(soup.select('p')[0])  # the first p tag

# Output:
[<title>hjk</title>]
[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>, <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>, <p class="story">...</p>]
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
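When only the first few matches are needed, select() also accepts a limit keyword argument. The sketch below uses a trimmed copy of the three p tags from test.html and also shows that the class attribute comes back as a list, because HTML allows several classes per element.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the three <p> tags from test.html
html = """
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time...</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'html.parser')

first_two = soup.select('p', limit=2)  # stop searching after two matches
print(len(first_two))                   # 2

first_class = soup.select('p')[0]['class']
print(first_class)                      # ['title']: class is multi-valued, so it is a list
```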

print(soup.select('p a'))  # a tags inside p tags
print(soup.select('body a'))  # a tags anywhere inside body

# Output
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

print(soup.select('body > a'))  # a tags that are direct children of body
print(soup.select('p > #link1'))  # direct children of p with id='link1'

# Output
[]
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>]

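body > a only matches direct children of body. Every a tag in test.html is nested inside a p, so the result is an empty list.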
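A minimal sketch of the descendant/child difference. The markup adds a hypothetical link with id="top" directly under body (not in test.html) so that body > a has something to match; chaining child steps reaches the nested link.

```python
from bs4 import BeautifulSoup

# id="top" is a hypothetical link placed directly under <body> for contrast
html = """
<body>
<p class="story">
  <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
<a href="http://example.com/top" id="top">Top</a>
</body>
"""
soup = BeautifulSoup(html, 'html.parser')

descendants = [a['id'] for a in soup.select('body a')]     # any depth
children = [a['id'] for a in soup.select('body > a')]      # direct children only
explicit = [a['id'] for a in soup.select('body > p > a')]  # spell out each level

print(descendants)  # ['link2', 'top']
print(children)     # ['top']
print(explicit)     # ['link2']
```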

print(soup.select('#link1 ~ .sister'))  # every later sibling of id='link1' that has class='sister'
print(soup.select('#link1 + .sister'))  # the sibling right after id='link1', if it has class='sister'

# Output
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

2. Searching by CSS class name

print(soup.select('.sister'))  # all tags whose class includes sister
print(soup.select('p.title'))  # p tags whose class includes title

# Output
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>]

3. Searching by the id attribute

print(soup.select('#link1'))  # the tag with id='link1'
print(soup.select('#link1,#link2'))  # tags whose id is link1 or link2 (comma = selector group)

# Output
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>]
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
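A comma-separated selector group acts as a union. In this sketch link1 matches both parts of the group; with the soupsieve-based select() used by BeautifulSoup 4.7 and later, each element is checked once, so it should appear only once, in document order.

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
  <a class="sister" id="link1">Elsie</a>
  <a class="sister" id="link2">Lacie</a>
  <a class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# link1 matches both '#link1' and '.sister' but is not duplicated
group_ids = [t['id'] for t in soup.select('#link1, .sister')]
print(group_ids)  # ['link1', 'link2', 'link3']
```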

4. Searching by whether an attribute is present

print(soup.select('a[href]'))  # a tags that have an href attribute

# Output
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
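Every link in test.html has an href, so a[href] returns all three. The title attribute is a more selective test, and the :not() pseudo-class (supported by the soupsieve engine behind select()) gives the complement.

```python
from bs4 import BeautifulSoup

# The three links from test.html, trimmed to the relevant attributes
html = """
<a href="http://example.com/elsie" title="12" id="link1"></a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
<a href="http://example.com/tillie" id="link3">Tillie</a>
"""
soup = BeautifulSoup(html, 'html.parser')

with_title = [t['id'] for t in soup.select('a[title]')]            # title present
without_title = [t['id'] for t in soup.select('a:not([title])')]   # title absent

print(with_title)     # ['link1']
print(without_title)  # ['link2', 'link3']
```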

5. Searching by attribute value

print(soup.select('a[href="http://example.com/elsie"]'))  # a tags whose href is exactly "http://example.com/elsie"
print(soup.select('a[href^="http://example.com/"]'))  # a tags whose href starts with "http://example.com/"
print(soup.select('a[href$="tillie"]'))  # a tags whose href ends with "tillie"
print(soup.select('a[href*=".com/el"]'))  # a tags whose href contains ".com/el"

# Output
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>]
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>]
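In a scraper, these value selectors are usually followed by pulling the attribute out of each match. A minimal sketch using the three links from test.html:

```python
from bs4 import BeautifulSoup

html = """
<a href="http://example.com/elsie" class="sister" id="link1"></a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
"""
soup = BeautifulSoup(html, 'html.parser')

# Prefix match, then collect the href of every hit
links = soup.select('a[href^="http://example.com/"]')
hrefs = [a['href'] for a in links]
print(hrefs)
# ['http://example.com/elsie', 'http://example.com/lacie', 'http://example.com/tillie']

# Substring match: only lacie's URL contains "example.com/l"
contains = [a['id'] for a in soup.select('a[href*="example.com/l"]')]
print(contains)  # ['link2']
```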

6. Searching level by level

Atag = soup.select('p')[1]  # the second p tag (class="story")
Btag = Atag.select('[title="12"]')  # search only inside that p tag
print(Btag)

# Output
[<a class="sister" href="http://example.com/elsie" id="link1" title="12"><!-- Elsie --></a>]
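Calling select() on a Tag instead of on the whole soup limits the search to that tag's descendants. In this sketch the footer paragraph and its about link are hypothetical, added so the two scopes give different results.

```python
from bs4 import BeautifulSoup

# p.footer and a#about are hypothetical extras outside the story paragraph
html = """
<p class="story">
  <a class="sister" id="link1" title="12">Elsie</a>
  <a class="sister" id="link2">Lacie</a>
</p>
<p class="footer"><a id="about">About</a></p>
"""
soup = BeautifulSoup(html, 'html.parser')

story = soup.select_one('p.story')
inside = [a['id'] for a in story.select('a')]     # only links inside p.story
everywhere = [a['id'] for a in soup.select('a')]  # every link in the document

print(inside)      # ['link1', 'link2']
print(everywhere)  # ['link1', 'link2', 'about']
```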

7. Getting attributes

a = soup.select('p #link2')
print(a[0].attrs['href'])

# Output
http://example.com/lacie
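Indexing a tag (or its .attrs dict) with a missing attribute name raises KeyError, while Tag.get() returns None or a default you supply. A sketch on the lacie link from test.html, which has no title attribute:

```python
from bs4 import BeautifulSoup

html = '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
soup = BeautifulSoup(html, 'html.parser')
link2 = soup.select_one('#link2')

href = link2['href']                      # same value as link2.attrs['href']
title = link2.get('title')                # missing attribute: None, no exception
title_or_default = link2.get('title', 'no title')

try:
    link2['title']                        # missing attribute: KeyError
    raised = False
except KeyError:
    raised = True

print(href, title, title_or_default, raised)
print(sorted(link2.attrs))                # ['class', 'href', 'id']
```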

8. Getting text

print(a[0].string)

# Output
Lacie
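.string only works when a tag has a single child (directly or through a chain of single children); with mixed content it returns None. get_text() concatenates all the text inside the tag instead. A sketch on a shortened version of the two paragraphs in test.html:

```python
from bs4 import BeautifulSoup

# Shortened versions of the first two <p> tags from test.html
html = """<p class="title"><b>The Dormouse's story</b></p><p class="story">Once <a id="link2">Lacie</a> and more</p>"""
soup = BeautifulSoup(html, 'html.parser')

title_text = soup.select_one('p.title').string  # one child chain: p > b > text
story = soup.select_one('p.story')
story_string = story.string                     # several children: None
story_text = story.get_text()                   # all text joined together
link_text = soup.select_one('#link2').string

print(title_text)    # The Dormouse's story
print(story_string)  # None
print(story_text)    # Once Lacie and more
print(link_text)     # Lacie
```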
