XML file parsing example, implemented in Python

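Note: the code in this post parses the contents of an XML file and prints the data it contains.

The original goal was to count the number of distinct label classes across the images in a dataset.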

Contents of the XML file

<annotation>
  <folder>images</folder>
  <filename>Czech_000022.jpg</filename>
  <size>
    <depth>3</depth>
    <width>600</width>
    <height>600</height>
  </size>
  <object>
    <name>D00</name>
    <bndbox>
      <xmin>182</xmin>
      <ymin>471</ymin>
      <xmax>229</xmax>
      <ymax>512</ymax>
    </bndbox>
  </object>
</annotation>
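
The annotation above can also be read by tag name rather than by child position, which keeps working if the order of the tags changes. The following is a minimal sketch, assuming only the standard-library `xml.etree.ElementTree` module; `SAMPLE_XML` is a name introduced here for illustration and simply holds the same annotation as a string.

```python
import xml.etree.ElementTree as ET

# The sample annotation, embedded as a string so the sketch is self-contained.
SAMPLE_XML = """<annotation>
  <folder>images</folder>
  <filename>Czech_000022.jpg</filename>
  <size>
    <depth>3</depth>
    <width>600</width>
    <height>600</height>
  </size>
  <object>
    <name>D00</name>
    <bndbox>
      <xmin>182</xmin>
      <ymin>471</ymin>
      <xmax>229</xmax>
      <ymax>512</ymax>
    </bndbox>
  </object>
</annotation>"""

root = ET.fromstring(SAMPLE_XML)

# findtext() takes a path relative to the element it is called on.
filename = root.findtext("filename")
width = int(root.findtext("size/width"))
height = int(root.findtext("size/height"))

# findall("object") returns every direct <object> child,
# so files with several labelled boxes are handled as well.
labels = [obj.findtext("name") for obj in root.findall("object")]

print(filename, width, height, labels)
```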

Python code to read and parse the XML file

import os
import xml.etree.ElementTree as ET

def fun(xml_root_path, txt_save_path):
    # Note: txt_save_path is accepted but not used in this example.
    xml_files = os.listdir(xml_root_path)
    print(type(xml_files), len(xml_files), xml_files[0])

    for xml_file in xml_files:
        xml_path = os.path.join(xml_root_path, xml_file)
        print('xml path:', xml_path)

        # Parse the XML
        tree = ET.parse(xml_path)
        root = tree.getroot()  # get the root node
        print(type(root), len(root), root, '\n')

        # Child tag: folder name
        folder_name = root[0].text
        print('folder_name: \t', folder_name)

        # Child tag: file name
        file_name = root[1].text
        print('file_name: \t', file_name)

        # Child tag: image size
        size_name = root[2]
        # print(len(size_name))  # 3
        size_depth = size_name[0].text
        print('size_depth: \t', size_depth)
        size_width = size_name[1].text
        print('size_width: \t', size_width)
        size_height = size_name[2].text
        print('size_height: \t', size_height)

        # Child tag: bounding-box information (root[3] is the first <object>)
        object_name = root[3]
        # print(len(object_name))  # 2
        label_name = object_name[0].text
        print('label_name:\t', label_name)  # label name; this is the value to be counted
        # bbox
        bndbox = object_name[1]
        # print(len(bndbox))  # 4
        bbox = [bndbox[0].text, bndbox[1].text, bndbox[2].text, bndbox[3].text]
        print(bbox)


temp_path = '../data/JanpanRoad'
txt_save_path = './temp.txt'

fun(temp_path, txt_save_path)
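
The function above prints the fields of each file and reads only the first `<object>` by position. Since the stated goal is to count label classes, one possible way to do the counting is sketched below. It assumes every annotation stores its labels under `<object>/<name>`, as in the sample file; `count_labels` is a hypothetical helper name, not part of the original code.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def count_labels(xml_root_path):
    """Count how often each <object>/<name> label occurs in a directory of XML files."""
    counts = Counter()
    for xml_file in sorted(os.listdir(xml_root_path)):
        if not xml_file.lower().endswith(".xml"):
            continue  # skip anything that is not an annotation file
        root = ET.parse(os.path.join(xml_root_path, xml_file)).getroot()
        # iter("object") walks the whole tree, so every labelled box is counted,
        # not just the first one in the file.
        for obj in root.iter("object"):
            label = obj.findtext("name")
            if label is not None:
                counts[label] += 1
    return counts
```

Calling `count_labels('../data/JanpanRoad')` would return a `Counter` that maps each label to its number of occurrences; `len()` of that result is then the number of distinct classes.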

Output

<class 'list'> 1 Czech_000022.xml
xml path: ../data/JanpanRoad\Czech_000022.xml
<class 'xml.etree.ElementTree.Element'> 4 <Element 'annotation' at 0x00000281DA735368>

folder_name:      images
file_name:      Czech_000022.jpg
size_depth:      3
size_width:      600
size_height:      600
label_name:     D00
['182', '471', '229', '512']
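
The bounding-box coordinates in the last line are strings, because the `.text` attribute of an element always holds text. If numeric values are needed, they can be converted explicitly. A small sketch, where `bbox_to_ints` is an illustrative helper and not part of the original code:

```python
def bbox_to_ints(bbox):
    """Convert [xmin, ymin, xmax, ymax] given as strings to a tuple of ints."""
    xmin, ymin, xmax, ymax = (int(v) for v in bbox)
    return xmin, ymin, xmax, ymax


# The bounding box printed in the output above.
xmin, ymin, xmax, ymax = bbox_to_ints(['182', '471', '229', '512'])
box_width = xmax - xmin
box_height = ymax - ymin
print(box_width, box_height)  # 47 41
```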