Parsing Simple XML Data with Python

阿新 • Published: 2020-07-28

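Problem

You want to extract data from a simple XML document.

Solution

You can use the xml.etree.ElementTree module to extract data from simple XML documents. To illustrate, suppose you want to parse the RSS feed on Planet Python. Here is the corresponding code: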

from urllib.request import urlopen
from xml.etree.ElementTree import parse

# Download the RSS feed and parse it
u = urlopen('http://planet.python.org/rss20.xml')
doc = parse(u)

# Extract and output tags of interest
for item in doc.iterfind('channel/item'):
    title = item.findtext('title')
    date = item.findtext('pubDate')
    link = item.findtext('link')

    print(title)
    print(date)
    print(link)
    print()

Running the code above produces output similar to this:

Steve Holden: Python for Data Analysis
Mon, 19 Nov 2012 02:13:51 +0000
http://holdenweb.blogspot.com/2012/11/python-for-data-analysis.html

Vasudev Ram: The Python Data model (for v2 and v3)
Sun, 18 Nov 2012 22:06:47 +0000
http://jugad2.blogspot.com/2012/11/the-python-data-model.html

Python Diary: Been playing around with Object Databases
Sun, 18 Nov 2012 20:40:29 +0000
http://www.pythondiary.com/blog/Nov.18,2012/been-...-object-databases.html

Vasudev Ram: Wakari, Scientific Python in the cloud
Sun, 18 Nov 2012 20:19:41 +0000
http://jugad2.blogspot.com/2012/11/wakari-scientific-python-in-cloud.html

Jesse Jiryu Davis: Toro: synchronization primitives for Tornado coroutines
Sun, 18 Nov 2012 20:17:49 +0000
http://feedproxy.google.com/~r/EmptysquarePython/~3/_DOZT2Kd0hQ/

Obviously, if you want to do further processing, you need to replace the print() statements with something more interesting.
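
As one possible way to replace the print() calls, the sketch below collects each item into a dictionary and converts the publication date into a datetime object. The inline RSS_SAMPLE bytes and the extract_items() helper are assumptions made for illustration, so that the example runs without network access; they are not part of the original recipe.

```python
import io
from email.utils import parsedate_to_datetime
from xml.etree.ElementTree import parse

# Hypothetical inline sample standing in for the downloaded feed
RSS_SAMPLE = b"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Planet Python</title>
    <item>
      <title>Steve Holden: Python for Data Analysis</title>
      <link>http://holdenweb.blogspot.com/2012/11/python-for-data-analysis.html</link>
      <pubDate>Mon, 19 Nov 2012 02:13:51 +0000</pubDate>
    </item>
    <item>
      <title>Vasudev Ram: The Python Data model (for v2 and v3)</title>
      <link>http://jugad2.blogspot.com/2012/11/the-python-data-model.html</link>
      <pubDate>Sun, 18 Nov 2012 22:06:47 +0000</pubDate>
    </item>
  </channel>
</rss>
"""

def extract_items(source):
    """Return a list of dicts, one per <item> element (hypothetical helper)."""
    doc = parse(source)
    return [
        {
            'title': item.findtext('title'),
            # RSS dates use the RFC 822 format, which email.utils can parse
            'date': parsedate_to_datetime(item.findtext('pubDate')),
            'link': item.findtext('link'),
        }
        for item in doc.iterfind('channel/item')
    ]

# A binary file-like object is used here, much like the object urlopen() returns
items = extract_items(io.BytesIO(RSS_SAMPLE))
for entry in sorted(items, key=lambda d: d['date']):
    print(entry['date'].isoformat(), entry['title'])
```

With the items held in ordinary Python data structures, sorting, filtering, or storing them becomes a matter of regular list and dict handling.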

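Discussion

Working with data encoded as XML is common in many applications. Not only is XML widely used for data exchange on the Internet, it is also a common format for storing application data (for example, word processing documents and music libraries). The discussion that follows assumes the reader is already familiar with XML basics.

In many cases, when XML is used simply to store data, the document structure is compact and straightforward. For example, the RSS feed from the example above looks similar to this: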

<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Planet Python</title>
    <link>http://planet.python.org/</link>
    <language>en</language>
    <description>Planet Python - http://planet.python.org/</description>
    <item>
      <title>Steve Holden: Python for Data Analysis</title>
      <guid>http://holdenweb.blogspot.com/...-data-analysis.html</guid>
      <link>http://holdenweb.blogspot.com/...-data-analysis.html</link>
      <description>...</description>
      <pubDate>Mon, 19 Nov 2012 02:13:51 +0000</pubDate>
    </item>
    <item>
      <title>Vasudev Ram: The Python Data model (for v2 and v3)</title>
      <guid>http://jugad2.blogspot.com/...-data-model.html</guid>
      <link>http://jugad2.blogspot.com/...-data-model.html</link>
      <description>...</description>
      <pubDate>Sun, 18 Nov 2012 22:06:47 +0000</pubDate>
    </item>
    <item>
      <title>Python Diary: Been playing around with Object Databases</title>
      <guid>http://www.pythondiary.com/...-object-databases.html</guid>
      <link>http://www.pythondiary.com/...-object-databases.html</link>
      <description>...</description>
      <pubDate>Sun, 18 Nov 2012 20:40:29 +0000</pubDate>
    </item>
    ...
  </channel>
</rss>

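The xml.etree.ElementTree.parse() function parses the entire XML document into a document object. You can then use methods such as find(), iterfind(), and findtext() to search for specific XML elements. The argument to these methods is the name of a specific tag, such as channel/item or title.

When specifying tags, you need to take the overall document structure into account. Each search operation starts from a starting element, and the tag name given to it is a path relative to that element. For example, doc.iterfind('channel/item') searches for all item elements under the channel element; doc represents the top of the document (the top-level rss element). The later calls to item.findtext() then search relative to the item elements that were found.

Each element represented by the ElementTree module has a few essential attributes and methods that are useful when parsing. The tag attribute contains the name of the tag, the text attribute contains the enclosed text, and the get() method retrieves attribute values (it returns None if the attribute does not exist). For example: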

>>> doc
<xml.etree.ElementTree.ElementTree object at 0x101339510>
>>> e = doc.find('channel/title')
>>> e
<Element 'title' at 0x10135b310>
>>> e.tag
'title'
>>> e.text
'Planet Python'
>>> e.get('some_attribute')
>>>
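
The relative-path behaviour described above can also be tried without downloading anything. The following is a minimal sketch that parses an in-memory string with fromstring(); the SAMPLE fragment and its lang attribute are invented for illustration and do not come from the real feed.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical fragment; the 'lang' attribute is made up for this example
SAMPLE = """
<rss version="2.0">
  <channel>
    <title lang="en">Planet Python</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>
"""

root = fromstring(SAMPLE)        # root is the top-level <rss> Element
channel = root.find('channel')

# The same elements, reached from two different starting points:
# the path is always relative to the element the search starts from
titles_from_root = [e.text for e in root.iterfind('channel/item/title')]
titles_from_channel = [e.text for e in channel.iterfind('item/title')]

feed_title = root.find('channel/title')
print(feed_title.tag, feed_title.text)
print(feed_title.get('lang'))             # value of an existing attribute
print(feed_title.get('missing'))          # None when the attribute is absent
print(feed_title.get('missing', 'n/a'))   # or a caller-supplied default

# find() returns None when nothing matches; findtext() accepts a default
missing = root.find('channel/nosuch')
fallback = root.findtext('channel/nosuch', default='')
```

Checking for None (or passing a default) is worth doing in real code, since feeds do not always contain every element you expect.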

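It should be noted that xml.etree.ElementTree is not the only option for XML parsing. For more advanced applications, you might consider lxml. It uses the same programming interface as ElementTree, so the example above works the same way with lxml; you only need to change the ElementTree import to from lxml.etree import parse. lxml is fully compliant with XML standards and is also very fast, and it supports features such as validation, XSLT, and XPath.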
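
One way to take advantage of the shared interface is to prefer lxml when it is available and fall back to the standard library otherwise. This is only a sketch: it assumes lxml, a third-party package, may or may not be installed, and the SAMPLE data is invented for illustration.

```python
import io

try:
    # Third-party package; assumed to be installed separately if wanted
    from lxml.etree import parse
except ImportError:
    # Standard-library fallback offering the same parse() interface
    from xml.etree.ElementTree import parse

# Hypothetical in-memory document standing in for a real feed
SAMPLE = b"""<rss version="2.0">
  <channel>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>
"""

doc = parse(io.BytesIO(SAMPLE))

# The calls below are the same whichever implementation was imported
titles = [item.findtext('title') for item in doc.iterfind('channel/item')]
print(titles)
```

The fallback keeps the script usable on machines without lxml, at the cost of not being able to rely on lxml-only features such as XSLT.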
