Reading the text content of a Word document in Java
Axin • Published: 2020-09-20
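This program reads the text content of a Word document. WordArt and images cannot be read.
First, create a Maven project in IDEA.
Add the following dependencies to pom.xml: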
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.17</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.17</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml-schemas -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.17</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.17</version>
</dependency>
Code:
package com.gong;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
// With POI 4.x, use these two imports instead of the org.apache.poi ones above:
//import org.apache.poi.ooxml.POIXMLDocument;
//import org.apache.poi.ooxml.extractor.POIXMLTextExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

public class Word {

    public static String readDoc(String path) throws IOException {
        String result = "";
        // First determine whether the file is .doc or .docx
        try {
            if (path.endsWith(".doc")) {
                // try-with-resources closes the stream and the extractor even if getText() throws
                try (InputStream is = new FileInputStream(new File(path));
                     WordExtractor extractor = new WordExtractor(is)) {
                    result = extractor.getText();
                }
            } else if (path.endsWith(".docx")) {
                OPCPackage opcPackage = POIXMLDocument.openPackage(path);
                try (POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage)) {
                    result = extractor.getText();
                }
            } else {
                System.out.println("This file is not a Word file");
            }
        } catch (Exception e) {
            // On failure, print the stack trace and return the empty string
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String path = "E:\\datas\\學習.docx";
        String result = readDoc(path);
        System.out.println(result);
    }
}
Running the program prints the content of the Word document to the terminal.
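One thing to watch: the check `path.endsWith(".doc")` in the code above is case-sensitive and trusts the file name, so a file named `REPORT.DOCX` or a wrongly renamed file would be rejected or sent to the wrong extractor. Below is a minimal sketch of a more tolerant check that uses only the JDK. The class `WordFormatSniffer` and its methods are hypothetical names made up for illustration and are not part of Apache POI. The header check relies on two facts: a binary .doc is an OLE2 compound file and a .docx is a ZIP archive. It therefore only identifies the container type, and other Office files or plain ZIP files will match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Apache POI.
class WordFormatSniffer {

    enum Format { DOC, DOCX, UNKNOWN }

    // Leading bytes of an OLE2 compound file (the container of binary .doc files)
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Leading bytes of a ZIP archive (the container of .docx files)
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Case-insensitive extension check, so "REPORT.DOCX" is accepted too.
    static Format fromExtension(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) return Format.DOCX;
        if (lower.endsWith(".doc")) return Format.DOC;
        return Format.UNKNOWN;
    }

    // Guess from the first bytes of the file. This identifies the container
    // only: any OLE2 file maps to DOC and any ZIP file maps to DOCX.
    static Format fromHeader(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.DOC;
        if (startsWith(header, ZIP_MAGIC)) return Format.DOCX;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare the byte as an unsigned value
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        byte[] header = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // A single read is enough for a sketch; it may return fewer than 8 bytes
            n = in.read(header);
        }
        byte[] seen = Arrays.copyOf(header, Math.max(n, 0));
        System.out.println("by extension: " + fromExtension(args[0]));
        System.out.println("by header:    " + fromHeader(seen));
    }
}
```

If the two results disagree, the header is usually the safer guide for choosing between WordExtractor and XWPFWordExtractor, since the extension can be changed freely.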
This article is based on: https://blog.csdn.net/lq18894033018/article/details/97934901