2. Basic CSS Selector Syntax
阿新 • Published: 2018-12-26
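CSS selector syntax is somewhat simpler than XPath's, but also less powerful. In fact, when we call a Selector object's css() method, it internally uses the Python library cssselect to translate the CSS selector expression into an XPath expression, and then calls the Selector object's xpath() method.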
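To make the translation step concrete, here is a toy translator written only for illustration. The function `css_to_xpath` and its regex are hypothetical: this is not cssselect's API or code. It handles just the selector forms covered in this section and emits XPath strings shaped like the ones Scrapy prints in the examples below.

```python
import re

# Toy CSS-to-XPath translator (hypothetical, for illustration only).
# One "step" is: optional tag, optional [ATTR] or [ATTR=VALUE],
# optional :first-child / :last-child / :nth-child(n), optional ::text.
STEP = re.compile(
    r"^(?P<tag>[\w-]+|\*)?"
    r"(?:\[(?P<attr>[\w-]+)(?:=(?P<value>[\w-]+))?\])?"
    r"(?::(?P<pseudo>first-child|last-child|nth-child\((?P<n>\d+)\)))?"
    r"(?P<text>::text)?$"
)


def css_to_xpath(css: str) -> str:
    parts = []
    child_next = False  # True right after a '>' combinator
    for tok in re.findall(r">|[^\s>]+", css):
        if tok == ">":
            child_next = True
            continue
        m = STEP.match(tok)
        if m is None:
            raise ValueError(f"unsupported selector step: {tok!r}")
        tag = m["tag"] or "*"
        conds = []
        if m["attr"]:
            conds.append(f"@{m['attr']}" if m["value"] is None
                         else f"@{m['attr']} = '{m['value']}'")
        if m["pseudo"]:
            # Positional pseudo-classes count ALL element siblings, so the
            # step becomes "*" and the tag is checked with name().
            pos = {"first-child": "1", "last-child": "last()"}.get(m["pseudo"], m["n"])
            if tag != "*":
                conds.insert(0, f"name() = '{tag}'")
            conds.append(f"(position() = {pos})")
            step = "*/*" if not parts else "*"
        else:
            step = tag
        if conds:
            step += "[" + " and ".join(conds) + "]"
        if m["text"]:
            step += "/text()"  # ::text -> text-node children
        if parts:
            # whitespace = descendant combinator, '>' = child combinator
            parts.append("/" if child_next else "/descendant-or-self::*/")
        parts.append(step)
        child_next = False
    return "descendant-or-self::" + "".join(parts)


print(css_to_xpath("div>a:nth-child(1)"))
# descendant-or-self::div/*[name() = 'a' and (position() = 1)]
```

If cssselect is installed, `cssselect.GenericTranslator().css_to_xpath('div img')` performs the real translation; depending on the version, its output for positional pseudo-classes may look different from this toy's while meaning the same thing.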
The table lists some basic CSS selector syntax.
First, create an HTML document and construct an HtmlResponse object from it:
>>> from scrapy.selector import Selector
>>> from scrapy.http import HtmlResponse
>>> body = '''
... <html>
... <head>
... <base href='http://example.com/' />
... <title>Example website</title>
... </head>
... <body>
... <div id='images-1' style="width: 1230px;">
... <a href='image1.html'>Name: Image 1 <br/><img src='image1.jpg' /></a>
... <a href='image2.html'>Name: Image 2 <br/><img src='image2.jpg' /></a>
... <a href='image3.html'>Name: Image 3 <br/><img src='image3.jpg' /></a>
... </div>
...
... <div id='images-2' class='small'>
... <a href='image4.html'>Name: Image 4 <br/><img src='image4.jpg' /></a>
... <a href='image5.html'>Name: Image 5 <br/><img src='image5.jpg' /></a>
... </div>
... </body>
... </html>
... '''
>>> response = HtmlResponse(url='http://www.example.com', body=body, encoding='utf8')
● E1 E2: selects E2 elements that are descendants of E1.
# img elements that are descendants of div
>>> response.css('div img')
[<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image1.jpg">'>,
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image2.jpg">'>,
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image3.jpg">'>,
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image4.jpg">'>,
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image5.jpg">'>]
● E1>E2: selects E2 elements that are children of E1.
# div elements that are children of body
>>> response.css('body>div')
[<Selector xpath='descendant-or-self::body/div' data='<div id="images-1" style="width: 1230px;'>,
<Selector xpath='descendant-or-self::body/div' data='<div id="images-2" class="small">\n '>]
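In the sample page every `<img>` sits inside an `<a>`, so `div img` matches all five images while `div>img` would match none. The sketch below makes the contrast visible with Python's standard `xml.etree.ElementTree`, whose limited XPath subset can mirror the two translated steps (`//` for the descendant combinator, `/` for the child combinator). The fragment is made up for illustration, and ElementTree only parses well-formed XML, so this is a stand-in for reasoning about Scrapy's behavior, not a replacement for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one <img> nested inside <a> (as in the sample page)
# and one <img> placed directly under <div>.
FRAGMENT = """
<body>
  <div>
    <a href="image1.html"><img src="image1.jpg"/></a>
    <img src="logo.jpg"/>
  </div>
</body>
"""
root = ET.fromstring(FRAGMENT.strip())

# CSS "div img" ~ descendant step: <img> at any depth below <div>
descendant = [img.get("src") for img in root.findall(".//div//img")]

# CSS "div>img" ~ child step: only <img> elements directly under <div>
child = [img.get("src") for img in root.findall(".//div/img")]

print(descendant)  # ['image1.jpg', 'logo.jpg']
print(child)       # ['logo.jpg']
```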
● [ATTR]: selects elements that have an ATTR attribute.
# select elements that have a style attribute
>>> response.css('[style]')
[<Selector xpath='descendant-or-self::*[@style]' data='<div id="images-1" style="width: 1230px;'>]
● [ATTR=VALUE]: selects elements that have an ATTR attribute whose value equals VALUE.
# select the element whose id attribute is images-1
>>> response.css('[id=images-1]')
[<Selector xpath="descendant-or-self::*[@id = 'images-1']" data='<div id="images-1" style="width: 1230px;'>]
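Both attribute forms map onto XPath predicates (`*[@style]`, `*[@id = 'images-1']`), and `[ATTR=VALUE]` compares the whole attribute value. A stdlib sketch, again using ElementTree's XPath subset on a hypothetical fragment, shows that `[class=small]` does not match an element whose class is `"small wide"`:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: three <div>s with different class attributes.
FRAGMENT = """
<body>
  <div id="d1" class="small"/>
  <div id="d2" class="small wide"/>
  <div id="d3"/>
</body>
"""
root = ET.fromstring(FRAGMENT.strip())

# [class] ~ *[@class]: the attribute is present, whatever its value
with_class = [e.get("id") for e in root.findall(".//*[@class]")]

# [class=small] ~ *[@class = 'small']: the whole value must equal "small"
exact = [e.get("id") for e in root.findall(".//*[@class='small']")]

print(with_class)  # ['d1', 'd2']
print(exact)       # ['d1']
```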
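● E:nth-child(n): selects an E element that is the n-th child of its parent (counting from 1, among all element children of the parent).
# select each a that is the first child of a div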
>>> response.css('div>a:nth-child(1)')
[<Selector xpath="descendant-or-self::div/*[name() = 'a' and (position() = 1)]" data='<a href="image1.html">Name: Image 1 <br>'>,
<Selector xpath="descendant-or-self::div/*[name() = 'a' and (position() = 1)]" data='<a href="image4.html">Name: Image 4 <br>'>]
# select the first a of the second div
>>> response.css('div:nth-child(2)>a:nth-child(1)')
[<Selector xpath="descendant-or-self::*/*[name() = 'div' and (position() = 2)]/*[name() = 'a' and (position() = 1)]" data='<a href="image4.html">Name: Image 4 <br>'>]
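The translated XPath `*[name() = 'a' and (position() = 1)]` explains a common surprise: `:nth-child(n)` counts every element sibling, not only those of type E. In the sketch below, `nth_child` is a hypothetical helper over ElementTree (not a Scrapy API) that applies the same rule; because the `<div>` starts with an `<h1>`, `a:nth-child(1)` matches nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the first child of <div> is an <h1>, not an <a>.
FRAGMENT = """
<body>
  <div>
    <h1>Gallery</h1>
    <a id="x" href="image1.html"/>
    <a id="y" href="image2.html"/>
  </div>
</body>
"""


def nth_child(root, tag, n):
    """Hypothetical helper mimicking CSS `tag:nth-child(n)` (n is 1-based).

    The position is counted among ALL element children of the parent,
    then the element's tag is checked, just like the translated XPath.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    matches = []
    for parent in root.iter():
        children = list(parent)  # element children only; text is ignored
        if len(children) >= n and children[n - 1].tag == tag:
            matches.append(children[n - 1])
    return matches


root = ET.fromstring(FRAGMENT.strip())
first_a = [a.get("id") for a in nth_child(root, "a", 1)]
second_a = [a.get("id") for a in nth_child(root, "a", 2)]
print(first_a)   # [] because the first child is <h1>
print(second_a)  # ['x']
```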
● E:first-child: selects an E element that is the first child of its parent.
● E:last-child: selects an E element that is the last child of its parent.
# select the last a of the first div
>>> response.css('div:first-child>a:last-child')
[<Selector xpath="descendant-or-self::*/*[name() = 'div' and (position() = 1)]/*[name() = 'a' and (position() = last())]" data='<a href="image3.html">Name: Image 3 <br>'>]
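`:first-child` and `:last-child` are the special cases `position() = 1` and `position() = last()`, so the same caveat applies: any trailing element of another type stops `a:last-child` from matching. The helpers below are hypothetical, written over ElementTree for illustration; the fragment mirrors the sample page except that the second `<div>` gains a `<p>` after its links.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample page: images-2 ends with a <p>.
FRAGMENT = """
<body>
  <div id="images-1">
    <a href="image1.html"/>
    <a href="image2.html"/>
    <a href="image3.html"/>
  </div>
  <div id="images-2">
    <a href="image4.html"/>
    <a href="image5.html"/>
    <p>caption</p>
  </div>
</body>
"""


def first_child(root, tag):
    """Hypothetical helper mimicking CSS `tag:first-child`."""
    return [p[0] for p in root.iter() if len(p) and p[0].tag == tag]


def last_child(root, tag):
    """Hypothetical helper mimicking CSS `tag:last-child`."""
    return [p[-1] for p in root.iter() if len(p) and p[-1].tag == tag]


root = ET.fromstring(FRAGMENT.strip())
first_divs = [d.get("id") for d in first_child(root, "div")]
last_links = [a.get("href") for a in last_child(root, "a")]
print(first_divs)  # ['images-1']
print(last_links)  # ['image3.html']; image5.html is followed by <p>
```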
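● E::text: selects the text nodes of E. (::text is a Scrapy extension; it is not part of standard CSS.)
# select the text of every a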
>>> sel = response.css('a::text')
>>> sel
[<Selector xpath='descendant-or-self::a/text()' data='Name: Image 1 '>,
<Selector xpath='descendant-or-self::a/text()' data='Name: Image 2 '>,
<Selector xpath='descendant-or-self::a/text()' data='Name: Image 3 '>,
<Selector xpath='descendant-or-self::a/text()' data='Name: Image 4 '>,
<Selector xpath='descendant-or-self::a/text()' data='Name: Image 5 '>]
>>> sel.extract()
['Name: Image 1 ',
'Name: Image 2 ',
'Name: Image 3 ',
'Name: Image 4 ',
'Name: Image 5 ']
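`a::text` translates to `a/text()`, which returns only the text nodes that are direct children of `<a>`; text inside nested elements is not included. As a stdlib mental model, ElementTree stores those direct text nodes as the element's `.text` plus the `.tail` of each child. The helper `own_text_nodes` and the fragment are hypothetical; `itertext()` is shown for contrast because it also walks into nested elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical <a> whose text is split around a nested <b> element.
FRAGMENT = '<a href="image1.html">Name: <b>Image 1</b> (jpg)<br/><img src="image1.jpg"/></a>'


def own_text_nodes(elem):
    """Mimic XPath `a/text()` (CSS `a::text`): only the text nodes that are
    direct children of `elem`, i.e. elem.text plus each child's .tail."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]  # drop empty (None) slots


a = ET.fromstring(FRAGMENT)
print(own_text_nodes(a))   # ['Name: ', ' (jpg)']
print(list(a.itertext()))  # ['Name: ', 'Image 1', ' (jpg)']
```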