Parsing an XML file: a Python example
阿新 • Published: 2021-12-08
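Note: the code in this post parses the contents of an XML file and prints the data it contains.
The original goal was to count the number of label classes among the image annotations in a dataset.
Contents of the XML file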
<annotation>
    <folder>images</folder>
    <filename>Czech_000022.jpg</filename>
    <size>
        <depth>3</depth>
        <width>600</width>
        <height>600</height>
    </size>
    <object>
        <name>D00</name>
        <bndbox>
            <xmin>182</xmin>
            <ymin>471</ymin>
            <xmax>229</xmax>
            <ymax>512</ymax>
        </bndbox>
    </object>
</annotation>
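As a complement to position-based access such as `root[0]`, the same annotation can be read by tag name, which keeps working if the child elements appear in a different order or if a file contains several `<object>` entries. The following is only a minimal sketch under that assumption: `SAMPLE_XML` and `read_annotation` are illustrative names that do not appear in the original post, and the XML is inlined so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# The sample annotation from this post, inlined so the sketch is self-contained.
SAMPLE_XML = """<annotation>
  <folder>images</folder>
  <filename>Czech_000022.jpg</filename>
  <size><depth>3</depth><width>600</width><height>600</height></size>
  <object>
    <name>D00</name>
    <bndbox><xmin>182</xmin><ymin>471</ymin><xmax>229</xmax><ymax>512</ymax></bndbox>
  </object>
</annotation>"""


def read_annotation(xml_text):
    """Hypothetical helper: return one annotation's fields, looked up by tag name."""
    root = ET.fromstring(xml_text)
    size = root.find('size')

    objects = []
    for obj in root.findall('object'):  # handles zero, one, or many objects
        box = obj.find('bndbox')
        objects.append({
            'name': obj.findtext('name'),
            'bbox': [int(box.findtext(tag))
                     for tag in ('xmin', 'ymin', 'xmax', 'ymax')],
        })

    return {
        'folder': root.findtext('folder'),
        'filename': root.findtext('filename'),
        'width': int(size.findtext('width')),
        'height': int(size.findtext('height')),
        'depth': int(size.findtext('depth')),
        'objects': objects,
    }


info = read_annotation(SAMPLE_XML)
print(info['filename'], info['objects'])
```

Unlike the `.text` values, which are always strings, this sketch converts the numeric fields to `int`; whether that is desirable depends on how the values are used afterwards.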
Python code to read and parse the XML file
import os
import xml.etree.ElementTree as ET


def fun(xml_root_path, txt_save_path):
    # Note: txt_save_path is accepted but not used in this example.
    xml_files = os.listdir(xml_root_path)
    print(type(xml_files), len(xml_files), xml_files[0])

    for xml_file in xml_files:
        xml_path = os.path.join(xml_root_path, xml_file)
        print('xml路徑:', xml_path)

        ### Parse the XML
        tree = ET.parse(xml_path)
        root = tree.getroot()  # get the root node
        print(type(root), len(root), root, '\n')

        # Child tag: folder name
        folder_name = root[0].text
        print('folder_name: \t', folder_name)

        # Child tag: file name
        file_name = root[1].text
        print('file_name: \t', file_name)

        # Child tag: image size
        size_name = root[2]
        # print(len(size_name))  # 3
        size_depth = size_name[0].text
        print('size_depth: \t', size_depth)
        size_width = size_name[1].text
        print('size_width: \t', size_width)
        size_height = size_name[2].text
        print('size_height: \t', size_height)

        # Child tag: bounding-box annotation
        object_name = root[3]
        # print(len(object_name))  # 2
        label_name = object_name[0].text
        print('label_name:\t', label_name)  ####### label name; this is the main thing to count
        # bbox
        bndbox = object_name[1]
        # print(len(bndbox))  # 4
        bbox = [bndbox[0].text, bndbox[1].text, bndbox[2].text, bndbox[3].text]
        print(bbox)


temp_path = '../data/JanpanRoad'
txt_save_path = './temp.txt'

fun(temp_path, txt_save_path)
Output
<class 'list'> 1 Czech_000022.xml
xml路徑: ../data/JanpanRoad\Czech_000022.xml
<class 'xml.etree.ElementTree.Element'> 4 <Element 'annotation' at 0x00000281DA735368>

folder_name: images
file_name: Czech_000022.jpg
size_depth: 3
size_width: 600
size_height: 600
label_name: D00
['182', '471', '229', '512']
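The script prints each label but does not tally them, although counting label classes was the stated goal. One possible way to do the tally is sketched below with `collections.Counter`. This is an assumption about how the counting could be done, not the author's implementation: `count_labels` and `demo` are hypothetical names, and the demo writes two tiny made-up annotation files (including an invented `D10` label) into a temporary directory so that it runs without the real dataset.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_labels(xml_root_path):
    """Hypothetical helper: count <object><name> labels in every .xml file of a directory."""
    counts = Counter()
    for file_name in sorted(os.listdir(xml_root_path)):
        if not file_name.lower().endswith('.xml'):
            continue  # skip images or other non-annotation files
        root = ET.parse(os.path.join(xml_root_path, file_name)).getroot()
        for obj in root.findall('object'):  # a file may hold several objects
            label = obj.findtext('name')
            if label is not None:
                counts[label] += 1
    return counts


def demo():
    """Create two made-up annotation files and count their labels."""
    template = '<annotation><object><name>{}</name></object>{}</annotation>'
    with tempfile.TemporaryDirectory() as tmp_dir:
        with open(os.path.join(tmp_dir, 'a.xml'), 'w') as f:
            f.write(template.format('D00', '<object><name>D10</name></object>'))
        with open(os.path.join(tmp_dir, 'b.xml'), 'w') as f:
            f.write(template.format('D00', ''))
        return count_labels(tmp_dir)


demo_counts = demo()
print(dict(demo_counts))   # label -> number of annotated objects
print(len(demo_counts))    # number of distinct label classes
```

On the real data, the call would presumably be `count_labels('../data/JanpanRoad')`, and the number of classes is then `len()` of the returned counter.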