1. 程式人生 > 程式設計 >python程式碼xml轉txt例項

python程式碼xml轉txt例項

為了訓練深度學習模型,經常要整理大量的標註資料,需統一不同格式的標註資料,一般情況下習慣讀取TXT格式的資料。但實際中經常遇到XML格式的標註資料,在此舉例:1.讀取XML標註資料;2.寫入TXT檔案。

XML標註資料如下

<annotation verified="no"> 
 <folder>suE</folder> 
 <filename>Drivingrecord_001</filename> 
 <path>C:\Desktop\Drivingrecord_001.jpg</path> 
 <source> 
  <database>Unknown</database> 
 </source> 
 <size> 
  <width>1920</width> 
  <height>1080</height> 
  <depth>3</depth> 
 </size> 
 <segmented>0</segmented> 
 <object> 
  <name>蘇E*****-藍-1-白,灰-大眾-上海大眾-桑塔納-尚納</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>170</leftTopx> 
   <leftTopy>704</leftTopy> 
   <rightTopx>167</rightTopx> 
   <rightTopy>729</rightTopy> 
   <rightBottomx>242</rightBottomx> 
   <rightBottomy>735</rightBottomy> 
   <leftBottomx>243</leftBottomx> 
   <leftBottomy>710</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>蘇E*****-藍-1-黃-雷克薩斯-雷克薩斯(進口)-雷克薩斯RX</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>733</leftTopx> 
   <leftTopy>721</leftTopy> 
   <rightTopx>733</rightTopx> 
   <rightTopy>759</rightTopy> 
   <rightBottomx>881</rightBottomx> 
   <rightBottomy>760</rightBottomy> 
   <leftBottomx>882</leftBottomx> 
   <leftBottomy>722</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>蘇*****-藍-1-黑-寶馬-寶馬(進口)-寶馬7系</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>1274</leftTopx> 
 <leftTopy>657</leftTopy> 
   <rightTopx>1274</rightTopx> 
   <rightTopy>671</rightTopy> 
   <rightBottomx>1325</rightBottomx> 
   <rightBottomy>670</rightBottomy> 
   <leftBottomx>1326</leftBottomx> 
   <leftBottomy>656</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>蘇*****-藍-1-灰-標緻-東風標緻-標緻307</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>1609</leftTopx> 
   <leftTopy>658</leftTopy> 
   <rightTopx>1611</rightTopx> 
   <rightTopy>671</rightTopy> 
   <rightBottomx>1659</rightBottomx> 
   <rightBottomy>669</rightBottomy> 
   <leftBottomx>1657</leftBottomx> 
   <leftBottomy>656</leftBottomy> 
  </bndbox> 
 </object> 
</annotation> 

在此,我們只需要圖片名filename,和每個object的座標(四個點的座標)

Drivingrecord_001.jpg 170 704 167 729 242 735 243 710 733 721 733 759 881 760 882 722 1274 657 1274 671 1325 670 1326 656 1609 658 1611 671 1659 669 1657 656

利用xml.dom.*模組,檔案物件模組DOM在讀取XML檔案時,一次讀取整個檔案,將其所有資料儲存在一個樹結構中,此時,可利用DOM的各種函式來讀取目標資料。在此,利用xml.dom.minidom解析XML檔案。

並將目標資料寫入TXT文件。

# -*- coding: utf-8 -*- 
""" 
Created on Fri Mar 2 15:36:44 2018 
 
@author: gg 
""" 
 
import xml.dom.minidom 
import os 
 
save_dir = 'D:\plate_train'  
if not os.path.exists(save_dir): 
  os.mkdir(save_dir) 
f = open(os.path.join(save_dir,'landmark.txt'),'w') 
 
DOMTree = xml.dom.minidom.parse('D:\plate_train\label\Drivingrecord_001.xml') 
annotation = DOMTree.documentElement 
 
filename = annotation.getElementsByTagName("filename")[0] 
imgname = filename.childNodes[0].data+'.jpg' 
print(imgname) 
   
objects = annotation.getElementsByTagName("object") 
 
loc = [imgname] #文件儲存格式:檔名 座標 
 
for object in objects: 
  bbox = object.getElementsByTagName("bndbox")[0] 
  leftTopx = bbox.getElementsByTagName("leftTopx")[0] 
  lefttopx = leftTopx.childNodes[0].data 
  print(lefttopx) 
  leftTopy = bbox.getElementsByTagName("leftTopy")[0] 
  lefttopy = leftTopy.childNodes[0].data 
  print(lefttopy) 
  rightTopx = bbox.getElementsByTagName("rightTopx")[0] 
  righttopx = rightTopx.childNodes[0].data 
  print(righttopx) 
  rightTopy = bbox.getElementsByTagName("rightTopy")[0] 
  righttopy = rightTopy.childNodes[0].data 
  print(righttopy) 
  rightBottomx = bbox.getElementsByTagName("rightBottomx")[0] 
  rightbottomx = rightBottomx.childNodes[0].data 
  print(rightbottomx) 
  rightBottomy = bbox.getElementsByTagName("rightBottomy")[0] 
  rightbottomy = rightBottomy.childNodes[0].data 
  print(rightbottomy) 
  leftBottomx = bbox.getElementsByTagName("leftBottomx")[0] 
  leftbottomx = leftBottomx.childNodes[0].data 
  print(leftbottomx) 
  leftBottomy = bbox.getElementsByTagName("leftBottomy")[0] 
  leftbottomy = leftBottomy.childNodes[0].data  
  print(leftbottomy) 
   
  loc = loc + [lefttopx,lefttopy,righttopx,righttopy,rightbottomx,rightbottomy,leftbottomx,leftbottomy] 
   
for i in range(len(loc)): 
  f.write(str(loc[i])+' ') 
f.write('\t\n')   
f.close() 
   

以上這篇python程式碼xml轉txt例項就是小編分享給大家的全部內容了,希望能給大家一個參考,也希望大家多多支援我們。