Python packages: a tutorial on parsing elements with etree in lxml

1. Create an element

from lxml import etree

root = etree.Element('root')
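A minimal sketch of two related calls. `Element()` also accepts attributes as keyword arguments, and `etree.tostring()` serializes the element so you can see the result. The `title='demo'` attribute is only an illustrative value.

```python
from lxml import etree

# Keyword arguments to Element() become attributes of the new element
root = etree.Element('root', title='demo')

# tostring() returns bytes by default; encoding='unicode' returns a str
xml_bytes = etree.tostring(root)
xml_text = etree.tostring(root, encoding='unicode')

print(root.tag)    # root
print(xml_bytes)   # b'<root title="demo"/>'
print(xml_text)    # <root title="demo"/>
```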

2. Add a child element

from lxml import etree

root = etree.Element('root')
span = etree.SubElement(root, 'span')
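`SubElement()` can be chained to build nested structure, and it accepts attributes as keyword arguments too. This sketch uses made-up tag names (`div`, `span`) and an illustrative `id` value.

```python
from lxml import etree

root = etree.Element('root')
# SubElement(parent, tag, **attributes) creates a child and appends it to parent
div = etree.SubElement(root, 'div', id='box')
span = etree.SubElement(div, 'span')
span.text = 'hello'

serialized = etree.tostring(root, encoding='unicode')
print(serialized)
# <root><div id="box"><span>hello</span></div></root>
```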

3. Remove a child element

from lxml import etree

root = etree.Element('root')

span = etree.SubElement(root, 'span')
root.remove(span)
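To remove several children that match a condition, one conservative pattern is to loop over a snapshot made with `list(root)`, so the loop does not depend on how the live child sequence behaves while it is being modified. The sample document below is illustrative.

```python
from lxml import etree

root = etree.XML('<root><span>a</span><p>b</p><span>c</span></root>')

# Iterate over a list copy of the children, then remove the matching ones
for child in list(root):
    if child.tag == 'span':
        root.remove(child)

result = etree.tostring(root, encoding='unicode')
print(result)  # <root><p>b</p></root>
```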

4. Remove all child elements

from lxml import etree

root = etree.Element('root')
root.clear()  # removes all children, and also resets attributes, text and tail
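Because `clear()` resets more than the children, it can help to compare it with slice deletion, `del root[:]`, which drops only the child elements. A small sketch with an illustrative document:

```python
from lxml import etree

SOURCE = '<root title="t">text<b/><c/></root>'
reset_root = etree.XML(SOURCE)
trimmed_root = etree.XML(SOURCE)

# clear() removes the children and also resets attributes, text and tail
reset_root.clear()
# Slice deletion removes only the children; attributes and text are kept
del trimmed_root[:]

reset_xml = etree.tostring(reset_root, encoding='unicode')
trimmed_xml = etree.tostring(trimmed_root, encoding='unicode')
print(reset_xml)    # <root/>
print(trimmed_xml)  # <root title="t">text</root>
```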

5. Work with child elements

from lxml import etree

root = etree.Element('root')

span = etree.SubElement(root, 'span')

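# Number of children
len(root)
# Position of a child; useful for telling apart several children with the same tag
root.index(span)
# Insert at a given position
root.insert(0, etree.Element('p'))
# Append at the end
root.append(etree.Element('strong'))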
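Running the same operations and then listing the children shows the resulting order. Iterating over an element yields its children, and elements also support list-style indexing such as `root[-1]`.

```python
from lxml import etree

root = etree.Element('root')
span = etree.SubElement(root, 'span')
root.insert(0, etree.Element('p'))      # p goes before span
root.append(etree.Element('strong'))    # strong goes at the end

tags = [child.tag for child in root]
print(tags)              # ['p', 'span', 'strong']
print(len(root))         # 3
print(root.index(span))  # 1, span shifted right by the insert
print(root[-1].tag)      # strong
```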

6. Get the parent element

Two ways to get an element's parent:

from lxml import etree

root = etree.Element('root')

span = etree.SubElement(root, 'span')

# Method 1: call getparent() on the element itself
span.getparent().tag
# Method 2: reach the child through list-style indexing, then call getparent()
root[0].getparent().tag
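`getparent()` moves up one level at a time; lxml also offers `iterancestors()` to walk all the way to the top. The top element has no parent, so `getparent()` returns `None` there. The nested sample document is illustrative.

```python
from lxml import etree

root = etree.XML('<root><div><span/></div></root>')
span = root.find('.//span')

parent_tag = span.getparent().tag                          # 'div'
top_parent = root.getparent()                              # None for the top element
ancestors = [el.tag for el in span.iterancestors()]        # nearest ancestor first

print(parent_tag, top_parent, ancestors)  # div None ['div', 'root']
```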

7. Set attributes

from lxml import etree

root = etree.Element('root')
root.set('title', 'this is a root element')
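Besides `set()`, the `attrib` mapping accepts dict-style item assignment. A sketch, where the `lang` attribute is an illustrative addition:

```python
from lxml import etree

root = etree.Element('root')
root.set('title', 'this is a root element')
# attrib behaves like a dict, so item assignment also creates an attribute
root.attrib['lang'] = 'en'

serialized = etree.tostring(root, encoding='unicode')
print(serialized)
# <root title="this is a root element" lang="en"/>
```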

8. Read attributes

Three ways to read attributes:

from lxml import etree

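root = etree.Element('root')
root.set('title', 'this is a root element')
# Method 1: get() returns one attribute's value (None if it is missing)
root.get('title')
# Method 2: dict-style keys(), values() and items()
root.keys(), root.values(), root.items()
# Method 3: attrib, the dict-like object that holds all attributes
root.attrib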
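A few more dict-like behaviours, sketched with an illustrative `title` attribute: `get()` takes an optional default, `dict(root.attrib)` makes a plain copy, and `del` removes an attribute.

```python
from lxml import etree

root = etree.Element('root', title='demo')

title = root.get('title')               # 'demo'
missing = root.get('missing')           # None when the attribute is absent
fallback = root.get('missing', 'n/a')   # 'n/a', a default as with dict.get
as_dict = dict(root.attrib)             # {'title': 'demo'}, a plain dict copy

# Attributes can be deleted from attrib like dict keys
del root.attrib['title']
print(title, missing, fallback, as_dict, root.get('title'))
```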

9. Set element text

Adding text inside an element and text after it:

from lxml import etree

root = etree.Element('root')

# Text inside the element
root.text = 'i am autofelix'
# Tail text, placed after the element's closing tag
root.tail = 'i am autofelix'

10. The xpath() method

from lxml import etree

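root = etree.Element('root')
root.text = 'i am autofelix'

# //text() returns text nodes as strings that remember their parent element
word = root.xpath('//text()')
word[0].getparent().tag  # 'root'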
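The type of an `xpath()` result depends on the expression: element steps give elements, `@attr` and `text()` give strings, and numeric functions such as `count()` give a float. A sketch over an illustrative document:

```python
from lxml import etree

root = etree.XML('<root><a href="x.html">first</a><a href="y.html">second</a></root>')

links = root.xpath('//a')          # list of elements
hrefs = root.xpath('//a/@href')    # list of attribute values (strings)
texts = root.xpath('//a/text()')   # list of text values (strings)
count = root.xpath('count(//a)')   # a float

print([el.tag for el in links])  # ['a', 'a']
print(hrefs)                     # ['x.html', 'y.html']
print(texts)                     # ['first', 'second']
print(count)                     # 2.0
print(texts[0].getparent().tag)  # a
```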

11. Check what kind of text a result is

from lxml import etree

root = etree.Element('root')
root.text = 'i am autofelix'

word = root.xpath('//text()')
# Is it the element's text?
word[0].is_text  # True
# Is it tail text?
word[0].is_tail  # False
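When an element has both text and child tails, `is_text` and `is_tail` tell them apart. For tail text, `getparent()` returns the element the tail follows. A sketch over an illustrative document:

```python
from lxml import etree

root = etree.XML('<root>head<b>bold</b>tail</root>')

# Classify every text result and record which element it belongs to
kinds = [
    (str(s), 'text' if s.is_text else 'tail', s.getparent().tag)
    for s in root.xpath('//text()')
]
print(kinds)
# [('head', 'text', 'root'), ('bold', 'text', 'b'), ('tail', 'tail', 'b')]
```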

12. Parse a string

from lxml import etree

html = etree.fromstring('<root>autofelix</root>')
html.tag
etree.tostring(html)
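Two details that often matter when parsing strings, sketched with an illustrative document: a document whose XML declaration names an encoding should be passed as bytes (lxml rejects it as a `str` with `ValueError`), and `tostring()` without an encoding escapes non-ASCII characters as character references.

```python
from lxml import etree

# Encode to bytes so the declared encoding can be honoured
data = '<?xml version="1.0" encoding="utf-8"?><root>é</root>'.encode('utf-8')
root = etree.fromstring(data)

ascii_bytes = etree.tostring(root)                # b'<root>&#233;</root>'
text = etree.tostring(root, encoding='unicode')   # '<root>é</root>'

# The same kind of document given as a Python str is rejected
try:
    etree.fromstring('<?xml version="1.0" encoding="utf-8"?><root/>')
    str_rejected = False
except ValueError:
    str_rejected = True

print(root.text, ascii_bytes, text, str_rejected)
```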

13. Parse XML

from lxml import etree

html = etree.XML('<root>autofelix</root>')
html.tag
etree.tostring(html)
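`etree.XML()` parses a string held in memory. For a file or file-like object, `etree.parse()` is the usual entry point; it returns an element tree, and `getroot()` gives the top element. In this sketch an in-memory `io.BytesIO` stands in for a real file.

```python
import io
from lxml import etree

# BytesIO stands in for an open file here
tree = etree.parse(io.BytesIO(b'<root><item>1</item><item>2</item></root>'))
root = tree.getroot()

items = [item.text for item in root]
print(root.tag, items)  # root ['1', '2']
```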

14. Remove blank lines from XML

from lxml import etree

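# Drop whitespace-only text between elements (e.g. blank lines in an XML file);
# whitespace that is an element's only content, as in <b>  </b>, is kept
parser = etree.XMLParser(remove_blank_text=True)
root = etree.XML('<root>  <a/>   <b>  </b>     </root>', parser)
print(etree.tostring(root))  # b'<root><a/><b>  </b></root>'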
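A common reason to drop blank text is re-indenting: with the whitespace between tags gone, `pretty_print=True` can lay the tree out cleanly. A sketch with an illustrative input that contains blank lines:

```python
from lxml import etree

raw = '<root>\n\n  <a/>\n\n  <b>x</b>\n</root>'
parser = etree.XMLParser(remove_blank_text=True)
root = etree.XML(raw, parser)

compact = etree.tostring(root, encoding='unicode')
pretty = etree.tostring(root, encoding='unicode', pretty_print=True)
print(compact)  # <root><a/><b>x</b></root>
print(pretty)
# <root>
#   <a/>
#   <b>x</b>
# </root>
```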

15. Parse HTML

etree.HTML() automatically adds the <html> and <body> tags if they are missing.

from lxml import etree

html = etree.HTML('<root>autofelix</root>')
etree.tostring(html)

16. Search and locate

from lxml import etree

root = etree.XML('<root><a class="uname">i am autofelix<b/><c/><b/></a></root>')
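# findall() returns a list of all matching elements
root.findall('a')[0].text
# find() returns the first matching element
root.find('.//a').text
# Combined with a list comprehension
[a.text for a in root.findall('.//a')]
# Search by attribute: './/a[@class]' matches <a> elements that have a class attribute
root.findall('.//a[@class]')[0].tag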
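A few more search helpers, sketched on an extended version of the example document (the second `<a>` is an illustrative addition): a predicate on an attribute value, `findtext()` with a default, and `iter()` to walk all descendants with a given tag.

```python
from lxml import etree

root = etree.XML(
    '<root><a class="uname">i am autofelix<b/><c/><b/></a>'
    '<a class="other">someone else</a></root>'
)

other = root.find('.//a[@class="other"]').text           # match on an attribute value
first_text = root.findtext('.//a')                        # text of the first match
missing = root.findtext('.//missing', default='n/a')      # default when nothing matches
b_count = len(list(root.iter('b')))                       # every <b> in the tree

print(other, first_text, missing, b_count)
```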
