Using XPath: locating elements, getting text and attribute values

Axin • Published: 2018-10-09


from lxml import etree

myPage = '''<html>
<title>TITLE</title>
<body>
<h1></h1>
<div></div>
<div id="photos">
<img src="pic1.jpeg"/><span id="pic1">*</span>
<img src="pic2.jpeg"/><span id="pic2">****</span>

<p><a href="http://www.example.com/more_pic.html">*</a></p>
<a href="http://www.baidu.com">****</a>
<a href="http://www.163.com">*****</a>
<a href="http://www.sohu.com">****</a>
</div>
<p class="myclassname">Hello,\nworld!<br/>-- by Adam</p>

<div class="foot">Some other notes placed at the bottom</div>
</body>
</html>'''

# etree.fromstring parses the string as XML, so the markup must be well-formed
html = etree.fromstring(myPage)

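# 1. Locating elements
divs1 = html.xpath('//div')                 # every div in the document
divs2 = html.xpath('//div[@id]')            # divs that have an id attribute
divs3 = html.xpath('//div[@class="foot"]')  # divs whose class is exactly "foot"
divs4 = html.xpath('//div[@*]')             # divs that have at least one attribute
divs5 = html.xpath('//div[1]')              # the first div child of each parent
divs6 = html.xpath('//div[last()-1]')       # the second-to-last div child of each parent

divs7 = html.xpath('//div[position()<3]')   # the first two div children of each parent
divs8 = html.xpath('//div|//h1')            # union: all div elements and all h1 elements
divs9 = html.xpath('//div[not(@*)]')        # divs that have no attributes at all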
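
The positional predicates above are evaluated per parent, which is easy to miss on a page like this one where every div is a sibling. The following is a minimal sketch of that behaviour, assuming a hypothetical nested document (`doc` and its ids are not part of the original page):

```python
from lxml import etree

# Hypothetical document with nested divs, used only for illustration.
doc = etree.fromstring('''<body>
<div id="a"><div id="a1"/><div id="a2"/></div>
<div id="b"/>
</body>''')

# //div[1] matches every div that is the first div child of its parent.
per_parent = doc.xpath('//div[1]/@id')

# (//div)[1] collects all divs first, then takes the first one in document order.
overall = doc.xpath('(//div)[1]/@id')

print(per_parent)
print(overall)
```

Wrapping the path in parentheses is the usual way to index into the whole result set rather than into each parent's children.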

# 2. Getting text: text(), as opposed to html.xpath('string()')
# text() returns a list of the text nodes directly under each matched element

text1 = html.xpath('//div/text()')
text2 = html.xpath('//div[@id]/text()')
text3 = html.xpath('//div[@class="foot"]/text()')
text4 = html.xpath('//div[@*]/text()')
text5 = html.xpath('//div[1]/text()')
text6 = html.xpath('//div[last()-1]/text()')
text7 = html.xpath('//div[position()<3]/text()')
text8 = html.xpath('//div/text()|//h1/text()')
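
The heading for this part contrasts text() with string() but does not show the difference. Here is a small sketch of it, assuming a hypothetical fragment (the `fragment` element and its wording are made up for illustration):

```python
from lxml import etree

# Hypothetical fragment: a div with text before and after a child element.
fragment = etree.fromstring(
    '<div id="photos">Photos: '
    '<a href="http://www.example.com/more_pic.html">more</a> here</div>'
)

# text() returns a list holding only the div's own text nodes;
# the text inside the nested <a> is not included.
pieces = fragment.xpath('text()')

# string() returns a single string: all descendant text, concatenated.
whole = fragment.xpath('string()')

print(pieces)
print(whole)
```

In short, text() is the better fit when the individual text nodes are needed, while string() is convenient when the full visible text of one element is wanted as one value.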

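# 3. Getting attribute values: @
value1 = html.xpath('//a/@href')             # href of every a element
value2 = html.xpath('//img/@src')            # src of every img element
value3 = html.xpath('//div[2]/span/@id')     # id of each span directly under the second div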
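
One detail worth knowing when collecting attributes this way: elements that lack the attribute contribute nothing to the result, so the list can be shorter than the list of elements. A minimal sketch, assuming a hypothetical fragment in which one img has no src:

```python
from lxml import etree

# Hypothetical fragment: the second img has no src attribute.
gallery = etree.fromstring('<div><img src="pic1.jpeg"/><img/></div>')

# @src yields only the attribute values that actually exist.
srcs = gallery.xpath('//img/@src')

# Element.get() is called once per element and returns None when the attribute is missing.
via_get = [img.get('src') for img in gallery.xpath('//img')]

print(srcs)
print(via_get)
```

If the results must stay aligned with the elements, selecting the elements first and calling get() on each is the safer choice.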

# 4. Locating elements (advanced)
# 1. The find and findall methods of a document (DOM) Element
divs = html.xpath('//div[position()<3]')
for div in divs:
    ass = div.findall('a')  # this only finds div->a, not div->p->a
    for a in ass:
        if a is not None:
            # print(dir(a))
            print(a.text, a.attrib.get('href'))  # properties of a document (DOM) Element: text, attrib

# 2. Equivalent to 1

a_href = html.xpath('//div[position()<3]/a/@href')
print(a_href)

# 3. Note the difference from 1 and 2: // also matches a elements nested deeper (div->p->a)
a_href = html.xpath('//div[position()<3]//a/@href')
print(a_href)
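
The contrast between items 1, 2 and 3 can be checked on a smaller input. The sketch below assumes a hypothetical, trimmed-down version of the page (`page` and `root` are illustrative names) and compares the child step `/a` with the descendant step `//a`, along with the matching findall paths:

```python
from lxml import etree

# Hypothetical, trimmed-down version of the page: one link inside a <p>, one directly in the div.
page = '''<root>
<div id="photos">
<p><a href="http://www.example.com/more_pic.html">*</a></p>
<a href="http://www.baidu.com">****</a>
</div>
</root>'''
root = etree.fromstring(page)

# /a matches only a elements that are direct children of the div.
direct = root.xpath('//div/a/@href')

# //a matches a elements at any depth below the div.
nested = root.xpath('//div//a/@href')

# findall behaves the same way: 'a' is children only, './/a' is all descendants.
div = root.find('div')
found_direct = [a.get('href') for a in div.findall('a')]
found_nested = [a.get('href') for a in div.findall('.//a')]

print(direct)
print(nested)
```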

Reference: https://www.cnblogs.com/hhh5460/p/5079465.html
