Reading .doc and .docx files with python-docx
API: http://python-docx.readthedocs.io/en/latest/#api-documentation
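Convert .doc to .docx (python-docx only opens .docx; this step needs Windows, Microsoft Word, and pywin32):
from win32com import client as wc
word = wc.Dispatch("Word.Application")
doc = word.Documents.Open(path + name + ".doc")  # path, name: your folder and file name; use an absolute path
doc.SaveAs(path + name + ".docx", 12)  # 12 = wdFormatXMLDocument, i.e. .docx
doc.Close()
word.Quit()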
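The conversion steps above can be wrapped in a small helper. This is only a sketch under the same assumptions (Windows, Word installed, pywin32 available); the names docx_target_path and convert_doc_to_docx are made up for illustration and are not part of any library. It passes absolute paths to Word and uses try/finally so that Word is closed even if saving fails.

```python
import os

WD_FORMAT_XML_DOCUMENT = 12  # Word's WdSaveFormat value for .docx


def docx_target_path(doc_path):
    """Return the absolute .docx path that sits next to doc_path."""
    base, _ext = os.path.splitext(os.path.abspath(doc_path))
    return base + ".docx"


def convert_doc_to_docx(doc_path):
    """Convert one .doc file to .docx through Word and return the new path."""
    # Imported here so the module still loads on machines without pywin32.
    from win32com import client as wc

    source = os.path.abspath(doc_path)
    target = docx_target_path(doc_path)
    word = wc.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(source)
        try:
            doc.SaveAs(target, WD_FORMAT_XML_DOCUMENT)
        finally:
            doc.Close()
    finally:
        word.Quit()  # always release the Word process
    return target
```

Calling convert_doc_to_docx on a .doc file should leave a .docx with the same base name in the same folder, which can then be opened with python-docx.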
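Read paragraphs:
from docx import Document
docStr = Document(docName)  # open the document
for paragraph in docStr.paragraphs:
    parStr = paragraph.text
    --> paragraph.style.name == 'Heading 1'  level-1 heading
    --> paragraph.paragraph_format.alignment == 1  centered (1 is WD_ALIGN_PARAGRAPH.CENTER; the value is None when alignment is inherited from the style)
    --> paragraph.style.next_paragraph_style.paragraph_format.alignment == 1  the style that follows this paragraph's style is centered
    --> paragraph.style.font.color  font color defined by the style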
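The paragraph checks above can be combined into one pass over the document. The following is a rough sketch, not the library's own recipe: summarize_paragraphs, is_centered, and summarize_file are hypothetical names, and the fallback to the paragraph's style when the direct alignment is None is a simplification that does not follow base styles.

```python
CENTER = 1  # numeric value of WD_ALIGN_PARAGRAPH.CENTER, as used in the notes above


def is_centered(paragraph):
    """True if the paragraph, or failing that its style, is set to centered."""
    alignment = paragraph.paragraph_format.alignment
    if alignment is None:
        # No direct setting: look at the paragraph's own style (base styles are not followed).
        alignment = paragraph.style.paragraph_format.alignment
    return alignment == CENTER


def summarize_paragraphs(document):
    """Return (style name, centered?, text) for each non-empty paragraph."""
    summary = []
    for paragraph in document.paragraphs:
        text = paragraph.text.strip()
        if text:
            summary.append((paragraph.style.name, is_centered(paragraph), text))
    return summary


def summarize_file(doc_name):
    """Open a .docx file with python-docx and summarize its paragraphs."""
    from docx import Document  # third-party: pip install python-docx

    return summarize_paragraphs(Document(doc_name))
```

Filtering the result on a style name of 'Heading 1' gives the level-1 headings; filtering on the second field gives the centered paragraphs.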
Read tables:
numTables = docStr.tables
for table in numTables:
    # number of rows and columns
    row_count = len(table.rows)
    col_count = len(table.columns)
    for i in range(row_count):
        row = table.rows[i].cells
        # content at row i, column j: row[j].text
Or, inside the same loop over tables:
row_count = len(table.rows)
col_count = len(table.columns)
for i in range(row_count):
    for j in range(col_count):
        print(table.cell(i, j).text)
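Instead of printing each cell, the same traversal can collect a table into nested lists, which is often easier to process afterwards. A minimal sketch, assuming python-docx is installed; table_to_rows and read_tables are illustrative names, not library functions.

```python
def table_to_rows(table):
    """Return the table as a list of rows, each a list of cell texts."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def read_tables(doc_name):
    """Open a .docx file with python-docx and return every table as nested lists."""
    from docx import Document  # third-party: pip install python-docx

    return [table_to_rows(table) for table in Document(doc_name).tables]
```

With this shape, the value at row i, column j of the first table is read_tables(docName)[0][i][j].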