Reading the text content of a Word document in Java

This program reads the text content of a Word document. WordArt and images cannot be read.

First, create a Maven project in IDEA.

Then add the following dependencies to pom.xml:

    <!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.17</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.17</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml-schemas -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml-schemas</artifactId>
      <version>3.17</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-scratchpad</artifactId>
      <version>3.17</version>
    </dependency>

Code:

package com.gong;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
// With POI 4.0 or later, these two classes live in the packages below instead:
//import org.apache.poi.ooxml.POIXMLDocument;
//import org.apache.poi.ooxml.extractor.POIXMLTextExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

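public class Word {
    // Reads the text of a .doc or .docx file; returns an empty string if reading fails.
    public static String readDoc(String path) throws IOException {
        String result = "";
        // First decide from the extension whether the file is .doc or .docx (case-insensitive)
        String lower = path.toLowerCase(java.util.Locale.ROOT);
        try {
            if (lower.endsWith(".doc")) {
                // try-with-resources closes the stream and the extractor even if getText() throws
                try (InputStream is = new FileInputStream(new File(path));
                     WordExtractor extractor = new WordExtractor(is)) {
                    result = extractor.getText();
                }
            } else if (lower.endsWith(".docx")) {
                OPCPackage opcPackage = POIXMLDocument.openPackage(path);
                try (POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage)) {
                    result = extractor.getText();
                }
            } else {
                System.out.println("This file is not a Word file");
            }
        } catch (Exception e) {
            // Any read or parse error is printed and an empty string is returned
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String path = "E:\\datas\\學習.docx";
        String result = readDoc(path);
        System.out.println(result);
    }

}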
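
The program above chooses between the two extractors by file extension alone, so a renamed file would be misrouted. As a rough sketch (not part of the original program), the container type can also be checked from the first bytes of the file: binary .doc files are stored in an OLE2 compound file, and .docx files are ZIP packages. The class name `WordFormatSniffer` and its methods are hypothetical names chosen for illustration; the code uses only the JDK, not POI. Note that a match only identifies the container, since other Office formats and ordinary ZIP archives share the same signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses the container format from the file signature.
final class WordFormatSniffer {
    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound-file signature, the container used by binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature ("PK" 03 04), the container used by .docx files.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a buffer holding the first bytes of a file.
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static Container sniffFile(String path) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int n;
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) > 0) {
                total += n;
            }
        }
        return sniff(Arrays.copyOf(header, total));
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " -> " + sniffFile(path));
        }
    }
}
```

A result of `OLE2` would suggest trying `WordExtractor`, and `ZIP` would suggest `XWPFWordExtractor`; the extractor itself still has to confirm that the file really is a Word document.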

Running the program prints the content of the Word document to the terminal.
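
To give a feel for where that text comes from in a .docx file, here is a minimal JDK-only sketch; it is not how POI is implemented. A .docx file is a ZIP package whose main part is `word/document.xml`, where paragraphs are `w:p` elements and text runs are `w:t` elements. The class `DocxPeek` and its methods are hypothetical names for illustration, and `sample` builds a tiny in-memory package (not a complete, valid .docx) purely so the demo can run without a real file. The sketch ignores tables, headers, footers, and nested text boxes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration: pulls paragraph text out of word/document.xml with JDK classes only.
final class DocxPeek {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every w:p paragraph, one per line ("" if the part is missing).
    static String extractText(byte[] docx) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("word/document.xml")) {
                    return paragraphs(readAll(zip));
                }
            }
            return "";
        } catch (Exception ex) {
            throw new IllegalStateException("Could not read the document part", ex);
        }
    }

    // Reads the current ZIP entry to its end.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Concatenates the w:t runs of each w:p element and ends each paragraph with a newline.
    private static String paragraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        NodeList ps = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml))
                .getElementsByTagNameNS(W_NS, "p");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ps.getLength(); i++) {
            NodeList ts = ((Element) ps.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < ts.getLength(); j++) {
                sb.append(ts.item(j).getTextContent());
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Demo only: builds a ZIP with a single word/document.xml part; text is not XML-escaped.
    static byte[] sample(String... paragraphs) {
        StringBuilder xml = new StringBuilder(
                "<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Prints "Hello" and "World" on separate lines.
        System.out.print(extractText(sample("Hello", "World")));
    }
}
```

For real documents the POI extractors used above remain the practical choice, since they handle the many parts and edge cases this sketch skips.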

This article is based on: https://blog.csdn.net/lq18894033018/article/details/97934901