
Reading .doc and .docx files with python-docx

API documentation: http://python-docx.readthedocs.io/en/latest/#api-documentation

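Converting .doc to .docx (python-docx cannot open the old binary .doc format, so the file is first re-saved through Microsoft Word; this uses pywin32's COM automation and therefore needs Windows with Word installed):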

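        from win32com import client as wc

        word = wc.Dispatch("Word.Application")
        # path and name are placeholders; pass an absolute path to Word
        doc = word.Documents.Open(path + name + ".doc")
        # FileFormat 12 = wdFormatXMLDocument, i.e. .docx
        doc.SaveAs(path + name + ".docx", 12)
        doc.Close()
        word.Quit()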
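
The same steps can be wrapped in a reusable function. This is a sketch, assuming Windows with Microsoft Word and pywin32 installed; the helper names docx_path_for and doc_to_docx are hypothetical. Paths are made absolute because Word does not resolve relative paths against Python's working directory, and the try/finally blocks close the document and quit Word even if opening or saving fails, which otherwise can leave a hidden Word process running.

```python
import os


def docx_path_for(doc_path):
    """Return the absolute .docx path next to a .doc file (hypothetical helper)."""
    root, _ext = os.path.splitext(os.path.abspath(doc_path))
    return root + ".docx"


def doc_to_docx(doc_path):
    """Convert one .doc file to .docx through Word (Windows + Word + pywin32 only)."""
    # Imported lazily: pywin32 is only available on Windows.
    from win32com import client as wc

    src = os.path.abspath(doc_path)   # Word needs absolute paths
    dst = docx_path_for(src)
    word = wc.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(src)
        try:
            doc.SaveAs(dst, 12)       # FileFormat 12 = wdFormatXMLDocument (.docx)
        finally:
            doc.Close()
    finally:
        word.Quit()                   # always shut Word down
    return dst
```

On a Windows machine, doc_to_docx(r"C:\docs\report.doc") would write C:\docs\report.docx next to the original (the path is only an example).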

Reading paragraphs:

        import docx

        docStr = docx.Document(docName)   # open the document; docName is the .docx path

        for paragraph in docStr.paragraphs:
                parStr = paragraph.text   # plain text of the paragraph

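                --> paragraph.style.name == 'Heading 1': the paragraph is a level-1 heading

                --> paragraph.paragraph_format.alignment == 1: the paragraph is centered (1 is WD_ALIGN_PARAGRAPH.CENTER; None means the alignment is inherited from the style)

                --> paragraph.style.next_paragraph_style.paragraph_format.alignment == 1: the style Word applies to a new paragraph started after this one is centered

                --> paragraph.style.font.color: the font color defined by the paragraph's style (a ColorFormat; read .rgb for the RGB value)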

Reading tables:

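        tables = docStr.tables   # top-level tables in the document body (nested tables are not included)

        for table in tables:
                # number of rows and columns
                row_count = len(table.rows)
                col_count = len(table.columns)
                for i in range(row_count):
                        row = table.rows[i].cells
                        for j in range(col_count):
                                print(row[j].text)   # text of row i, column j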

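Or address each cell by position with table.cell(row, col):

        for table in docStr.tables:
                row_count = len(table.rows)
                col_count = len(table.columns)
                for i in range(row_count):
                        for j in range(col_count):
                                print(table.cell(i, j).text)   # text of row i, column j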