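Introduction

XML (Extensible Markup Language) is a markup language designed for transporting and storing data. In Python, XML files can be handled with xml.etree.ElementTree (abbreviated ET below).
Python 2.7 ships ElementTree 1.3. xml.etree.cElementTree is a C implementation of the same API that is faster and uses less memory.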

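ElementTree and Element

XML is organized as a tree, and ET has two classes to represent it: ElementTree represents the whole XML tree, and Element represents a single node in that tree. Use the ElementTree class for operations on the whole document, such as reading and writing XML files; use the Element class for operations on an element and its children. Element is a flexible container object that stores hierarchical data structures in memory.

Each element has the following properties:

tag: a string identifying the element type.
attrib: the element's attributes, stored as a Python dictionary.
text: the string between the start tag and the first child element, or between the start tag and the end tag if there are no children, or None.
tail: the string between the element's end tag and the next tag, or None.
Child elements: stored as a Python sequence.

To create an element, use the Element() constructor or the SubElement() factory function.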
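
The difference between text and tail is easiest to see on mixed content. The following is a minimal sketch on a made-up snippet (the <p> and <b> elements are illustrative only, not part of test.xml); it uses print() calls so that it should also run on Python 3:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: <p> has text, one child <b>, and the child has a tail.
p = ET.fromstring('<p class="note">Hello <b>bold</b> world</p>')
b = p[0]

print(p.tag)          # p
print(p.attrib)       # {'class': 'note'}
print(repr(p.text))   # 'Hello '   (between <p ...> and <b>)
print(repr(b.text))   # 'bold'
print(repr(b.tail))   # ' world'   (between </b> and </p>)
print(p.tail is None) # True: nothing follows the root element
```

Note that ' world' belongs to the child's tail, not to the parent's text.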

Parsing XML

Suppose test.xml contains:

<?xml version="1.0"?>
<bookstore author="mars loo">
    <book>
        <name> 
LeaderF</name>
        <price>12.33</price>
    </book>
    <book>
        <name>YCM</name>
        <price>11.91</price>
    </book>
</bookstore>

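Call ET.parse(filename) to parse an XML document into an ElementTree object, then call ElementTree.getroot() to get the root of the tree as an Element. The root's tag, attrib, text, and tail properties can then be read directly: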

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()
print "Tag:", root.tag, "Attributes:", root.attrib, "Text:", root.text.strip(), "Tail:", root.tail

Output:

Tag: bookstore Attributes: {'author': 'mars loo'} Text:  Tail: None

Note that if line 2 of test.xml were instead:

<bookstore author="mars loo"><book>

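then root.text in the code above would be None (there is no text at all between the two start tags), and calling root.text.strip() would raise AttributeError.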
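
One way to guard against this is a small helper that treats a missing text as an empty string. This is only a sketch; stripped_text is a hypothetical helper name, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

def stripped_text(elem):
    """Return elem.text without surrounding whitespace, or '' if text is None."""
    return (elem.text or '').strip()

# No whitespace between <bookstore ...> and <book>, so root.text is None.
root = ET.fromstring(
    '<bookstore author="mars loo"><book><name>YCM</name></book></bookstore>')

print(root.text is None)          # True
print(repr(stripped_text(root)))  # ''
print(stripped_text(root[0][0]))  # YCM
```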

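tree = ET.ElementTree(file='test.xml') has the same effect as tree = ET.parse('test.xml'). ET.fromstring(), by contrast, returns the root Element directly. If an Element has child elements, you can iterate over it directly or access the children by index: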

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

xml_string = '''<?xml version="1.0"?>
<bookstore author="mars loo">
    <book id="1">
        <name>LeaderF</name>
        <price>12.33</price>
    </book>
    <book id="2">
        <name>YCM</name>
        <price>11.91</price>
    </book>
</bookstore>'''

root = ET.fromstring(xml_string)
for child in root:
    print "Tag:", child.tag, "Attributes:", child.attrib
print "Book 2's price", root[1][1].text.strip()

Output:

Tag: book Attributes: {'id': '1'}
Tag: book Attributes: {'id': '2'}
Book 2's price 11.91

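Iterating over elements

Both the Element class and the ElementTree class have an iter() method that recursively iterates over the element or tree, starting with the element itself and then visiting all of its sub-elements, for example: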

for child in root.iter():  # root is an Element object
    print "Tag:", child.tag, "Attributes:", child.attrib, "Text:", child.text.strip()

Pass the tag argument to iterate only over elements with a given tag:

for child in tree.iter(tag='price'):  # tree is an ElementTree object
    print "Tag:", child.tag, "Attributes:", child.attrib, "Text:", child.text.strip()

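Element.findall(match) finds matching sub-elements by tag name or path and returns them as a list in document order. A plain tag name matches direct children only; a path such as .//price can reach deeper levels.
Element.find(match) works the same way but returns only the first matching element, or None if nothing matches.
Element.get(key, default=None) returns the value of the element's attribute named key, or the given default (None unless specified) if the attribute is not found.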
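
As a rough illustration of these three methods, the sketch below runs them against an inline copy of the bookstore data (the 'lang' attribute is a made-up name used only to show the default value):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('''<bookstore author="mars loo">
    <book id="1"><name>LeaderF</name><price>12.33</price></book>
    <book id="2"><name>YCM</name><price>11.91</price></book>
</bookstore>''')

books = root.findall('book')            # direct children only
print(len(books))                       # 2
print(root.find('price'))               # None: price is not a direct child
print(root.find('book/price').text)     # 12.33 (first match along the path)
print(len(root.findall('.//price')))    # 2 (search at any depth)
print(books[1].get('id'))               # 2
print(books[1].get('lang', 'unknown'))  # unknown (attribute is absent)
```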

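Modifying an XML file

ElementTree.write(filename) writes an ElementTree object to an XML file.
An Element object can be modified through its attributes and methods, for example:

Element.text = value sets the text attribute directly.
Element.tail = value sets the tail attribute directly.
Element.set(key, value) sets an attribute, adding it if it does not already exist.
Element.append(subelement) adds a new child element.
Element.extend(subelements) adds several child elements at once (the argument is a sequence).
Element.remove(subelement) removes a child element.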
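
Since extend() does not appear in the file-based examples, here is a small in-memory sketch of extend(), set(), and remove() together. The element names mirror test.xml, but the tree is built from scratch purely for illustration:

```python
import xml.etree.ElementTree as ET

book = ET.Element('book')
name = ET.Element('name')
name.text = 'YCM'
price = ET.Element('price')
price.text = '11.91'

book.extend([name, price])  # add several children at once
book.set('id', '2')         # add (or overwrite) an attribute
book.remove(price)          # remove a direct child

print(ET.tostring(book).decode('ascii'))
# <book id="2"><name>YCM</name></book>
```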

Increase the price of every book by 1, add an updated="true" attribute to each price element, and give each price element a tail of "tail":

tree = ET.parse('test.xml')
for child in tree.iter(tag = 'price'):
    child.text = str(float(child.text.strip()) + 1)
    child.tail = "tail"
    child.set("updated", "true")

tree.write('new.xml')

new.xml then contains:

<bookstore author="mars loo">
    <book>
        <name>LeaderF</name>
        <price updated="true">13.33</price>tail</book>
    <book>
        <name>YCM</name>
        <price updated="true">12.91</price>tail</book>
</bookstore>

Remove the price element from every book element and add a press (publisher) element whose text is CTS:

bookstore = ET.parse('test.xml')
for book in bookstore.findall('book'):
    book.remove(book.find('price'))
    press = ET.Element('press')
    press.text = "CTS"
    book.append(press)
bookstore.write('new2.xml')

new2.xml then contains:

<bookstore author="mars loo">
    <book>
        <name>LeaderF</name>
        <press>CTS</press></book>
    <book>
        <name>YCM</name>
        <press>CTS</press></book>
</bookstore>

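ET.SubElement(parent, tag_name) is a quick way to create a child element attached to its parent, and ET.dump(elem) writes the content of elem to standard output (elem can be an Element or an ElementTree object):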

root = ET.Element('root')
a = ET.SubElement(root, 'a')
b = ET.SubElement(root, 'b')
c = ET.SubElement(root, 'c')
tree = ET.ElementTree(root)

ET.dump(tree)

Output:

<root><a /><b /><c /></root>

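Processing large files with iterparse

ET.parse(filename) loads the entire XML file into memory at once, whereas ET.iterparse(filename) parses the XML incrementally and, as long as processed elements are cleared along the way, uses far less memory. If test.xml contained a very large number of books and we only wanted to count them, iterparse() handles this efficiently: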

count = 0
for event, elem in ET.iterparse('test.xml'):
    if event == 'end':
        if elem.tag == "book":
            count += 1
    elem.clear()  # reset the element (removes all children, clears all attributes, sets text and tail to None)

print count
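
The loop above clears every element as soon as it ends, which is fine for counting but discards child data before the enclosing book is seen. When the children are needed, one option is to clear only the book itself. The sketch below assumes that variation and uses an in-memory BytesIO object as a stand-in for a large file:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b'''<bookstore>
    <book><name>LeaderF</name><price>12.33</price></book>
    <book><name>YCM</name><price>11.91</price></book>
</bookstore>'''

count = 0
total = 0.0
# iterparse() accepts a filename or a file object.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=('end',)):
    if elem.tag == 'book':
        # At the book's 'end' event its children are complete, so read
        # them first and only then release the subtree.
        total += float(elem.find('price').text)
        count += 1
        elem.clear()

print(count)            # 2
print(round(total, 2))  # 24.24
```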

Source: https://blog.csdn.net/a464057216/article/details/54915241

-----------------------------------------------------------------------------------------------------------------

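Overview

Compared with other ways of processing XML in Python, the xml.etree.ElementTree module (referred to as ET below) is relatively simple and has a friendly interface.
The official documentation describes the ET module in detail. Broadly, the module consists of three parts: the ElementTree class, the Element class, and a set of functions for working with XML.
XML can be viewed as a tree structure. ET uses the ElementTree class to represent the whole XML document and the Element class to represent a single XML node. Operations on the whole document are generally performed on an ElementTree object, while operations on a node are generally performed on an Element object.

Parsing an XML file

The ET module can build an ElementTree object from an XML file. Suppose our XML file example.xml has the following content (this document is used throughout the rest of the post):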

<?xml version="1.0" encoding="utf-8"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>

Use the ET module's parse() function to build an ElementTree object from the given XML file:

import xml.etree.ElementTree as ET

# Get the XML document object (an ElementTree)
tree = ET.parse('example.xml')
# Get the root node of the document (an Element)
root = tree.getroot()
# Print the tag name of the root node
print root.tag

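Once the ElementTree object has been built from the XML file, you can fetch its nodes and operate on them further.

Parsing an XML string

The ET module's fromstring() function builds an Element object from an XML string.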

xml_str = ET.tostring(root)
print xml_str
root = ET.fromstring(xml_str)
print root.tag

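Continuing from the code above, we use ET's tostring() function to serialize the root object built earlier into a string, then use fromstring() to construct a new Element from that string and assign it to root. At this point root is the root node of the whole XML document.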

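Building XML

To build an XML document, use the ET module's Element class together with the SubElement() function.
Create an Element object with the Element class to serve as the root node, then call ET.SubElement() to create child nodes.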

a = ET.Element('a')
b = ET.SubElement(a, 'b')
b.text = 'leehao.me'
c = ET.SubElement(a, 'c')
c.attrib['greeting'] = 'hello'
d = ET.SubElement(a, 'd')
d.text = 'www.leehao.me'
xml_str = ET.tostring(a, encoding='UTF-8')
print xml_str

Output:

<?xml version='1.0' encoding='UTF-8'?>
<a><b>leehao.me</b><c greeting="hello" /><d>www.leehao.me</d></a>

To write the result to a file, use the ElementTree.write() method:

# Wrap the root in an ElementTree first so that its write() method can be used
tree = ET.ElementTree(a)
tree.write('a.xml', encoding='UTF-8')

Running this produces the XML file a.xml:

<?xml version='1.0' encoding='UTF-8'?>
<a><b>leehao.me</b><c greeting="hello" /><d>www.leehao.me</d></a>
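
To check that nothing is lost on the way out, the serialized bytes can be parsed back. The sketch below writes to an in-memory BytesIO buffer instead of a.xml (an assumption made so the example leaves no file behind) and passes xml_declaration=True explicitly:

```python
import io
import xml.etree.ElementTree as ET

a = ET.Element('a')
ET.SubElement(a, 'b').text = 'leehao.me'
ET.SubElement(a, 'c', greeting='hello')  # attributes may be given as keyword arguments

buf = io.BytesIO()
ET.ElementTree(a).write(buf, encoding='UTF-8', xml_declaration=True)

# Parse the serialized bytes back and inspect the result.
root = ET.fromstring(buf.getvalue())
print(root.find('b').text)             # leehao.me
print(root.find('c').get('greeting'))  # hello
```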

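Finding and updating XML nodes

1. Finding XML nodes

The Element class provides the Element.iter() method for finding nodes with a given tag. Element.iter() searches all descendants recursively, so it finds every matching node in the subtree.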

# Get the XML document object (an ElementTree)
tree = ET.parse('example.xml')
# Get the root node of the document (an Element)
root = tree.getroot()
# Recursively find all neighbor nodes
for neighbor in root.iter('neighbor'):
    print neighbor.attrib

Output:

{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}

Element.findall() and Element.find(), when given a plain tag name, search only the direct children of the node and do not recurse.

for country in root.findall('country'):
    rank = country.find('rank').text
    name = country.get('name')
    print name, rank

Output:

Liechtenstein 1
Singapore 4
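
The contrast between the recursive and non-recursive lookups can be sketched on a shortened inline copy of example.xml (trimmed here to keep the example small):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('''<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>''')

print(len(root.findall('neighbor')))     # 0: neighbor is a grandchild of root
print(len(list(root.iter('neighbor'))))  # 3: iter() descends the whole tree

# Combine both: walk the direct children, then search inside each one.
neighbors = {}
for country in root.findall('country'):
    neighbors[country.get('name')] = [n.get('name') for n in country.findall('neighbor')]
print(neighbors['Liechtenstein'])        # ['Austria', 'Switzerland']
```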

2. Updating nodes

To update a node's text, assign to Element.text directly. To update a node's attributes, modify Element.attrib directly.
After updating the nodes, use ElementTree.write() to write the updated XML document to a file.

# Get the XML document object (an ElementTree)
tree = ET.parse('example.xml')
# Get the root node of the document (an Element)
root = tree.getroot()
for rank in root.iter('rank'):
    new_rank = int(rank.text) + 1
    rank.text = str(new_rank)
    rank.attrib['updated'] = 'yes'
tree.write('output.xml', encoding='UTF-8')

The newly generated output.xml looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
</data>

Comparing it with example.xml shows that output.xml contains the updates.
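
The same update can be tried without touching any files. The following sketch works on a one-country inline document (shortened for illustration) and uses Element.set(), which has the same effect as assigning into attrib:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data><country name="Singapore"><rank>4</rank></country></data>')

for rank in root.iter('rank'):
    rank.text = str(int(rank.text) + 1)
    rank.set('updated', 'yes')  # same effect as rank.attrib['updated'] = 'yes'

print(ET.tostring(root).decode('ascii'))
# <data><country name="Singapore"><rank updated="yes">5</rank></country></data>
```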

Source: https://blog.csdn.net/lihao21/article/details/72891932

===================================================================

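The Element type is a flexible container object designed to store hierarchical data structures in memory. It can be described as a cross between a list and a dictionary.

Each element has a number of properties associated with it:

a tag, which identifies what kind of data the element represents (the element type, in other words)
a number of attributes, stored in a Python dictionary
a text string
an optional tail string
a number of child elements, stored in a Python sequence

To create an element instance, use the Element constructor or the SubElement() factory function.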
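
The "cross between a list and a dictionary" description can be made concrete with a short sketch. The country element below is built by hand for illustration:

```python
import xml.etree.ElementTree as ET

country = ET.Element('country', name='Panama')
ET.SubElement(country, 'rank').text = '68'
ET.SubElement(country, 'year').text = '2011'

# List-like: length, indexing, and iteration work on the children.
print(len(country))                      # 2
print(country[1].tag)                    # year
print([child.tag for child in country])  # ['rank', 'year']

# Dictionary-like: attributes are looked up by key.
print(country.get('name'))               # Panama
print(list(country.items()))             # [('name', 'Panama')]
```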

The ElementTree class can be used to wrap an element structure and convert it from and to XML.

A C implementation of this API is available as xml.etree.cElementTree.

Changed in version 2.7: The ElementTree API is updated to 1.3. For more information, see Introducing ElementTree 1.3.

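19.7.1. Overview

This is a short overview of using xml.etree.ElementTree (ET for short). The goal is to demonstrate some of the building blocks and basic concepts of the module.

19.7.1.1. XML tree and elements

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose: ElementTree represents the whole XML document as a tree, and Element represents a single node in that tree. Interactions with the whole document (reading from and writing to files) are usually done at the ElementTree level. Interactions with a single XML element and its sub-elements are done at the Element level.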

19.7.1.2. Parsing XML

We will use the following XML document as the sample data:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

There are several ways to import the data.

Reading from a file on disk:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()

Reading from a string:

root = ET.fromstring(country_data_as_string)

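fromstring() parses XML from a string directly into an Element, which is the root node of the parsed tree. Other parsing functions may create an ElementTree instead. As an Element, root has a tag and a dictionary of attributes: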

>>> root.tag
'data'
>>> root.attrib
{}

It also has child nodes that we can iterate over:

>>> for child in root:
...   print child.tag, child.attrib
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

Child nodes are nested, and we can access a specific child node by index:

>>> root[0][1].text
'2008'
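
The difference in return types between the two loading routes can be checked directly. In this sketch the sample data is shortened, and a BytesIO object stands in for country_data.xml so that no file is needed:

```python
import io
import xml.etree.ElementTree as ET

country_data_as_string = '''<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
    </country>
</data>'''

root = ET.fromstring(country_data_as_string)  # returns an Element
tree = ET.parse(io.BytesIO(country_data_as_string.encode('utf-8')))  # returns an ElementTree

print(isinstance(root, ET.Element))      # True
print(isinstance(tree, ET.ElementTree))  # True
print(tree.getroot().tag == root.tag)    # True
print(root[0][1].text)                   # 2008 (first country, second child)
```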

19.7.1.3. Finding interesting elements

>>> for neighbor in root.iter('neighbor'):
...   print neighbor.attrib
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}

>>> for country in root
              
           
              
              
            