Using Python to read data from Word (.docx) and write it to Excel (.xlsx)
阿新 • Published: 2019-01-28
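On the left, each line of the Word document is one paragraph of unstructured data. The goal is to turn it into the structured Excel layout on the right.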
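To make the target shape concrete, here is a minimal sketch of the transformation on plain strings, with no Word or Excel involved. The two input lines are assumptions reconstructed from the sample row shown later in this post, and `lines_to_rows` is a hypothetical name that is not part of the original script.

```python
import re

def lines_to_rows(lines, resource):
    """Turn heading lines and book lines into
    [title, author, publisher, resource, level] rows."""
    rows = []
    level = ''
    for line in lines:
        parts = line.split()      # split on whitespace, dropping empty pieces
        if not parts:
            continue              # skip blank lines
        if len(parts) == 1:
            level = parts[0]      # a single token is treated as a section heading
            continue
        # Assumption: a book line looks like "<number>.<title> <author>"
        title = re.sub(r'^[1-9]\d*\.', '', parts[0])
        rows.append([title, parts[1], '', resource, level])
    return rows

sample = ['一年級課程書目(圖畫書書目', '1.《大衛上學去》 [美]大衛·夏農']
rows = lines_to_rows(sample, '南京親近母語2017年書目')
print(rows)
```

The heading is remembered in `level` and attached to every book row that follows it, which is the same carry-over the full script relies on.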
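Packages to import:
    import docx
    from docx import Document
    from openpyxl import Workbook
    from tools import *  # presumably the author's own helper module (key_filter, is_bookname, re_filter); not shown in this post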
Create the objects used to write the xlsx file:
    workbook = Workbook()
    booksheet = workbook.active  # the default worksheet of the new workbook
Read the docx document and store its contents in the xlsx file:
    dir = '/Users/b/'
    file = '南京親近母語2017年書目.docx'
    f = docx.Document(dir + file)
    level = ''
    # Iterate over the paragraphs in the document
    for para in f.paragraphs:
        bookname = ''
        publisher = ''
        resource = '南京親近母語2017年書目'
        text = para.text
        if len(text) == 0:
            continue
        text = key_filter(text)  # helper from tools: filters the data
        textlist = text.split(' ')
        if len(textlist) == 1:
            # A line with a single token is a section heading
            level = textlist[0]
            print('level1', level)
            continue
        print('level2', level)
        # Drop the empty strings left by consecutive spaces
        while '' in textlist:
            textlist.remove('')
        if len(textlist) < 2:
            continue  # guard: not enough fields for a title and an author
        row = []
        if is_bookname(textlist[0].strip()):
            # Strip the leading item number from the title
            bookname = re_filter(textlist[0].strip(), r'[1-9]\d*.')
            print(bookname)
        else:
            continue
        row.append(bookname.strip())
        row.append(textlist[1].strip())  # author
        row.append(publisher.strip())
        row.append(resource.strip())
        row.append(level.strip())
        booksheet.append(row)
    workbook.save(file.split('.')[0] + '.xlsx')
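The script calls `key_filter`, `is_bookname` and `re_filter`, which come from the `tools` module and are not shown in the post. The following is only a guess at what they might look like, so that the script can be tried end to end. The function bodies are assumptions, and the real helpers may behave differently.

```python
import re

# Hypothetical stand-ins for the helpers imported from `tools`.
# The real implementations are not shown in the post and may differ.

def key_filter(text):
    """Normalise separators: tabs and full-width spaces become one ASCII space."""
    return re.sub(r'[\t\u3000]+', ' ', text).strip()

def is_bookname(text):
    """Treat a token as a book title if it contains the title marks 《 and 》."""
    return '《' in text and '》' in text

def re_filter(text, pattern):
    """Remove every match of `pattern` from `text`."""
    return re.sub(pattern, '', text)

token = key_filter('1.《大衛上學去》\t[美]大衛·夏農').split(' ')[0]
print(is_bookname(token))
print(re_filter(token, r'[1-9]\d*.'))
```

Note that the `.` in the pattern `[1-9]\d*.` is unescaped, so it matches any character after the number, not only a literal dot. That tolerates separators such as `、`, but it would also strip digits that appear inside a title along with the character after them.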
That is the complete script. The individual parts are explained below.
Reading the Word document:
    f = docx.Document(dir + file)
    for para in f.paragraphs:
        text = para.text
        print(text)
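Creating a new Excel file and writing data to it. Each row is appended to the sheet as a list:
    from openpyxl import Workbook

    workbook = Workbook()
    booksheet = workbook.active
    row = ['《大衛上學去》', '[美]大衛·夏農', '', '南京親近母語2017年書目', '一年級課程書目(圖畫書書目']
    booksheet.append(row)  # appends the list as the next row of the sheet
    # `file` is the .docx filename defined in the full script above
    workbook.save(file.split('.')[0] + '.xlsx')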
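To check that the rows really end up in the workbook, the file can be loaded again and its cells inspected. This is a sketch under a few assumptions: it saves to an in-memory buffer instead of a file, and the header row and its column names are an addition that the original script does not write.

```python
import io
from openpyxl import Workbook, load_workbook

workbook = Workbook()
booksheet = workbook.active
# Header row: an addition for readability, not part of the original script
booksheet.append(['bookname', 'author', 'publisher', 'resource', 'level'])
booksheet.append(['《大衛上學去》', '[美]大衛·夏農', '',
                  '南京親近母語2017年書目', '一年級課程書目(圖畫書書目'])

# Save to memory and load the workbook back to inspect what was written
buffer = io.BytesIO()
workbook.save(buffer)
buffer.seek(0)

loaded = load_workbook(buffer)
rows = [list(r) for r in loaded.active.iter_rows(values_only=True)]
print(rows)
```

When saving to a real file, `os.path.splitext(file)[0] + '.xlsx'` is a safer way to build the output name than `file.split('.')[0]`, because it still works if the filename contains more than one dot.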