CSS/XPath selectors: nth child / parent / sibling nodes
阿新 • Published: 2018-05-03
0. References
1. Initialization
In [325]: from scrapy import Selector

In [326]: text="""
     ...: <div>
     ...:  <a>1a</a>
     ...:  <p>2p</p>
     ...:  <p>3p</p>
     ...: </div>"""

In [327]: sel=Selector(text=text)

In [328]: print(sel.extract())
<html><body><div>
 <a>1a</a>
 <p>2p</p>
 <p>3p</p>
</div></body></html>
2. Parent node / previous and next sibling nodes
In [329]: sel.xpath('//a/parent::*/p').extract()
Out[329]: ['<p>2p</p>', '<p>3p</p>']

In [330]: sel.xpath('//p/preceding-sibling::a').extract()
Out[330]: ['<a>1a</a>']

In [331]: sel.xpath('//a/following-sibling::p').extract()
Out[331]: ['<p>2p</p>', '<p>3p</p>']
3. CSS: nth child
3.1 General
# Counts from the first child over the full list of children; the node at that position must also match the tag
In [332]: sel.css('a:nth-child(1)').extract()
Out[332]: ['<a>1a</a>']

# Counts from the last child over the full list of children; the node at that position must also match the tag
In [333]: sel.css('a:nth-last-child(1)').extract()
Out[333]: []

In [334]: sel.css('p:nth-child(1)').extract()
Out[334]: []

In [335]: sel.css('p:nth-child(2)').extract()
Out[335]: ['<p>2p</p>']

In [336]: sel.css('p:nth-child(3)').extract()
Out[336]: ['<p>3p</p>']

In [337]: sel.css('p:nth-last-child(1)').extract()
Out[337]: ['<p>3p</p>']

In [338]: sel.css('p:nth-last-child(2)').extract()
Out[338]: ['<p>2p</p>']

In [339]: sel.css('p:nth-last-child(3)').extract()
Out[339]: []
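The counting rule behind these results can be pictured with a small toy model. This is only an illustration of the semantics in plain Python, not how Scrapy or cssselect actually evaluate selectors; the helper names `nth_child` and `nth_last_child` and the `children` list are hypothetical.

```python
# Toy model of the <div>'s element children as (tag, markup) pairs
children = [('a', '<a>1a</a>'), ('p', '<p>2p</p>'), ('p', '<p>3p</p>')]


def nth_child(children, tag, n):
    """Take the n-th child (1-based) of the FULL list, then check its tag."""
    if 1 <= n <= len(children) and children[n - 1][0] == tag:
        return [children[n - 1][1]]
    return []


def nth_last_child(children, tag, n):
    """Same rule, but counting from the last child."""
    return nth_child(children[::-1], tag, n)


print(nth_child(children, 'p', 1))       # position 1 is <a>, so no match
print(nth_child(children, 'p', 2))       # position 2 is a <p>
print(nth_last_child(children, 'a', 1))  # last child is a <p>, so no match
```

The position is resolved first and the tag is only a filter on the node found there, which is why `p:nth-child(1)` and `a:nth-last-child(1)` come back empty.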
3.2 Special shorthands
In [340]: sel.css('a:first-child').extract()
Out[340]: ['<a>1a</a>']

In [341]: sel.css('a:last-child').extract()
Out[341]: []

In [342]: sel.css('p:first-child').extract()
Out[342]: []

In [343]: sel.css('p:last-child').extract()
Out[343]: ['<p>3p</p>']
3.3 Replacing -child above with -of-type counts only within the filtered list of child nodes of that type
4. XPath: nth child
In [344]: sel.xpath('//div').extract()
Out[344]: ['<div>\n <a>1a</a>\n <p>2p</p>\n <p>3p</p>\n</div>']

In [345]: sel.xpath('//div/*').extract()
Out[345]: ['<a>1a</a>', '<p>2p</p>', '<p>3p</p>']

In [346]: sel.xpath('//div/node()').extract()
Out[346]: ['\n ', '<a>1a</a>', '\n ', '<p>2p</p>', '\n ', '<p>3p</p>', '\n']

In [347]: sel.xpath('//div/a').extract()
Out[347]: ['<a>1a</a>']

In [348]: sel.xpath('//div/p').extract()
Out[348]: ['<p>2p</p>', '<p>3p</p>']

In [349]: sel.xpath('//div/a[1]').extract()
Out[349]: ['<a>1a</a>']

In [350]: sel.xpath('//div/a[last()]').extract()
Out[350]: ['<a>1a</a>']

In [351]: sel.xpath('//div/p[1]').extract() # indexes into the filtered list of child nodes
Out[351]: ['<p>2p</p>']

In [352]: sel.xpath('//div/p[last()]').extract()
Out[352]: ['<p>3p</p>']

In [353]: sel.xpath('//div/p[last()-1]').extract()
Out[353]: ['<p>2p</p>']

In [354]: sel.xpath('//div/*[1]').extract() # full list of child elements
Out[354]: ['<a>1a</a>']

In [355]: sel.xpath('//div/*[last()]').extract()
Out[355]: ['<p>3p</p>']

In [356]: sel.xpath('//div/node()[1]').extract() # includes plain text nodes
Out[356]: ['\n ']

In [357]: sel.xpath('//div/node()[last()]').extract()
Out[357]: ['\n']