Java 使用 POI 3.17根據Word 模板替換、操作書籤
由於專案的需求,需要對大量的word文件進行處理。
查找了大量的文件發現很多的部落格對這個進行了介紹,主要有2種方案做處理,jacob 和poi。但是現在的伺服器基本上是部署在Linux上,所以jacob基本上是不可行的。所以呢,主要是使用poi來進行這些操作。
Apache poi的hwpf模組是專門用來對word doc檔案進行讀寫操作的。在hwpf裡面我們使用HWPFDocument來表示一個word doc文件。在HWPFDocument裡面有這麼幾個概念:
Range:它表示一個範圍,這個範圍可以是整個文件,也可以是裡面的某一小節(Section),也可以是某一個段落(Paragraph),還可以是擁有共同屬性的一段文字(CharacterRun)。
Section:word文件的一個小節,一個word文件可以由多個小節構成。
Paragraph:word文件的一個段落,一個小節可以由多個段落構成。
CharacterRun:具有相同屬性的一段文字,一個段落可以由多個CharacterRun組成。Table:一個表格。
TableRow:表格對應的行。
TableCell:表格對應的單元格。
Section、Paragraph、CharacterRun和Table都繼承自Range。
1、基本的替換方法
InputStream inputStream = new FileInputStream(modulePath); HWPFDocument document = new HWPFDocument(inputStream); Range range = document.getRange(); for (Map.Entry<String, String> entry : maps.entrySet()) { range.replaceText("@" + entry.getKey() + "@", entry.getValue()); } OutputStream outputStream = new FileOutputStream(outPath); document.write(outputStream); this.closeStream(outputStream); this.closeStream(inputStream);
這些在網上已經有很普遍的使用了,但是這些基本上是基於3.9poi進行使用的,目前poi的版本已經更新到了3.17了,而且後續的就不會對Java6的支援了,最低支援Java8的,所以我們要使用3.17來進行對word進行文字的替換,書籤的操作。
BookMarkWord 檔案中標籤的封裝類,儲存了其定義和內部的操作
BookMarks: 利用POI進行Word檔案相關的操作,針對docx形式的封裝package com; import java.util.List; import java.util.Stack; import org.apache.poi.xwpf.usermodel.XWPFParagraph; import org.apache.poi.xwpf.usermodel.XWPFRun; import org.apache.poi.xwpf.usermodel.XWPFTable; import org.apache.poi.xwpf.usermodel.XWPFTableCell; import org.apache.poi.xwpf.usermodel.XWPFTableRow; import org.apache.xmlbeans.XmlException; import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark; import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText; import org.w3c.dom.Node; import org.w3c.dom.NodeList; /** * * Word 檔案中標籤的封裝類,儲存了其定義和內部的操作 * * @author * * <p>Modification History:</p> * <p>Date Author Description</p> * <p>------------------------------------------------------------------</p> * <p> </p> * <p> </p> */ public class BookMark { //以下為定義的常量 /** 替換標籤時,設於標籤的後面 **/ public static final int INSERT_AFTER = 0; /** 替換標籤時,設於標籤的前面 **/ public static final int INSERT_BEFORE = 1; /** 替換標籤時,將內容替換書籤 **/ public static final int REPLACE = 2; /** docx中定義的部分常量引用 **/ public static final String RUN_NODE_NAME = "w:r"; public static final String TEXT_NODE_NAME = "w:t"; public static final String BOOKMARK_START_TAG = "bookmarkStart"; public static final String BOOKMARK_END_TAG = "bookmarkEnd"; public static final String BOOKMARK_ID_ATTR_NAME = "w:id"; public static final String STYLE_NODE_NAME = "w:rPr"; /** 內部的標籤定義類 **/ private CTBookmark _ctBookmark = null; /** 標籤所處的段落 **/ private XWPFParagraph _para = null; /** 標籤所在的表cell物件 **/ private XWPFTableCell _tableCell = null; /** 標籤名稱 **/ private String _bookmarkName = null; /** 該標籤是否處於表格內 **/ private boolean _isCell = false; /** * 建構函式 * @param ctBookmark * @param para */ public BookMark(CTBookmark ctBookmark, XWPFParagraph para) { this._ctBookmark = ctBookmark; this._para = para; this._bookmarkName = ctBookmark.getName(); this._tableCell = null; this._isCell = false; } /** * 建構函式,用於表格中的標籤 * @param ctBookmark * @param para * @param tableCell */ public BookMark(CTBookmark ctBookmark, XWPFParagraph para, XWPFTableCell tableCell) { this(ctBookmark, para); this._tableCell = tableCell; this._isCell = true; } public boolean isInTable() { return this._isCell; } public XWPFTable getContainerTable() { return this._tableCell.getTableRow().getTable(); } public XWPFTableRow getContainerTableRow() { return this._tableCell.getTableRow(); } public String getBookmarkName() { return this._bookmarkName; } /** * Insert text into the Word document in the location indicated by this * bookmark. * * @param bookmarkValue An instance of the String class that encapsulates * the text to insert into the document. * @param where A primitive int whose value indicates where the text ought * to be inserted. There are three options controlled by constants; insert * the text immediately in front of the bookmark (Bookmark.INSERT_BEFORE), * insert text immediately after the bookmark (Bookmark.INSERT_AFTER) and * replace any and all text that appears between the bookmark's square * brackets (Bookmark.REPLACE). */ public void insertTextAtBookMark(String bookmarkValue, int where) { //根據標籤的型別,進行不同的操作 if(this._isCell) { this.handleBookmarkedCells(bookmarkValue, where); } else { //普通標籤,直接建立一個元素 XWPFRun run = this._para.createRun(); run.setText(bookmarkValue); switch(where) { case BookMark.INSERT_AFTER: this.insertAfterBookmark(run); break; case BookMark.INSERT_BEFORE: this.insertBeforeBookmark(run); break; case BookMark.REPLACE: this.replaceBookmark(run); break; } } } /** * Inserts some text into a Word document in a position that is immediately * after a named bookmark. * * Bookmarks can take two forms, they can either simply mark a location * within a document or they can do this but contain some text. The * difference is obvious from looking at some XML markup. The simple * placeholder bookmark will look like this; * * <pre> * * <w:bookmarkStart w:name="AllAlone" w:id="0"/><w:bookmarkEnd w:id="0"/> * * </pre> * * Simply a pair of tags where one tag has the name bookmarkStart, the other * the name bookmarkEnd and both share matching id attributes. In this case, * the text will simply be inserted into the document at a point immediately * after the bookmarkEnd tag. No styling will be applied to the text, it * will simply inherit the documents defaults. * * The more complex case looks like this; * * <pre> * * <w:bookmarkStart w:name="InStyledText" w:id="3"/> * <w:r w:rsidRPr="00DA438C"> * <w:rPr> * <w:rFonts w:hAnsi="Engravers MT" w:ascii="Engravers MT" w:cs="Arimo"/> * <w:color w:val="FF0000"/> * </w:rPr> * <w:t>text</w:t> * </w:r> * <w:bookmarkEnd w:id="3"/> * * </pre> * * Here, the user has selected the word 'text' and chosen to insert a * bookmark into the document at that point. So, the bookmark tags 'contain' * a character run that is styled. Inserting any text after this bookmark, * it is important to ensure that the styling is preserved and copied over * to the newly inserted text. * * The approach taken to dealing with both cases is similar but slightly * different. In both cases, the code simply steps along the document nodes * until it finds the bookmarkEnd tag whose ID matches that of the * bookmarkStart tag. Then, it will look to see if there is one further node * following the bookmarkEnd tag. If there is, it will insert the text into * the paragraph immediately in front of this node. If, on the other hand, * there are no more nodes following the bookmarkEnd tag, then the new run * will simply be positioned at the end of the paragraph. * * Styles are dealt with by 'looking' for a 'w:rPr' element whilst iterating * through the nodes. If one is found, its details will be captured and * applied to the run before the run is inserted into the paragraph. If * there are multiple runs between the bookmarkStart and bookmarkEnd tags * and these have different styles applied to them, then the style applied * to the last run before the bookmarkEnd tag - if any - will be cloned and * applied to the newly inserted text. * * @param run An instance of the XWPFRun class that encapsulates the text * that is to be inserted into the document following the bookmark. */ private void insertAfterBookmark(XWPFRun run) { Node nextNode = null; Node insertBeforeNode = null; Node styleNode = null; int bookmarkStartID = 0; int bookmarkEndID = -1; // Capture the id of the bookmarkStart tag. The code will step through // the document nodes 'contained' within the start and end tags that have // matching id numbers. bookmarkStartID = this._ctBookmark.getId().intValue(); // Get the node for the bookmark start tag and then enter a loop that // will step from one node to the next until the bookmarkEnd tag with // a matching id is fouind. nextNode = this._ctBookmark.getDomNode(); while (bookmarkStartID != bookmarkEndID) { // Get the next node along and check to see if it is a bookmarkEnd // tag. If it is, get its id so that the containing while loop can // be terminated once the correct end tag is found. Note that the // id will be obtained as a String and must be converted into an // integer. This has been coded to fail safely so that if an error // is encuntered converting the id to an int value, the while loop // will still terminate. nextNode = nextNode.getNextSibling(); if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) { try { bookmarkEndID = Integer.parseInt( nextNode.getAttributes().getNamedItem( BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue()); } catch (NumberFormatException nfe) { bookmarkEndID = bookmarkStartID; } } // If we are not dealing with a bookmarkEnd node, are we dealing // with a run node that MAY contains styling information. If so, // then get that style information from the run. else { if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME)) { styleNode = this.getStyleNode(nextNode); } } } // After the while loop completes, it should have located the correct // bookmarkEnd tag but we cannot perform an insert after only an insert // before operation and must, therefore, get the next node. insertBeforeNode = nextNode.getNextSibling(); // Style the newly inserted text. Note that the code copies or clones // the style it found in another run, failure to do this would remove the // style from one node and apply it to another. if (styleNode != null) { run.getCTR().getDomNode().insertBefore( styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild()); } // Finally, check to see if there was a node after the bookmarkEnd // tag. If there was, then this code will insert the run in front of // that tag. If there was no node following the bookmarkEnd tag then the // run will be inserted at the end of the paragarph and this was taken // care of at the point of creation. if (insertBeforeNode != null) { this._para.getCTP().getDomNode().insertBefore( run.getCTR().getDomNode(), insertBeforeNode); } } /** * Inserts some text into a Word document immediately in front of the * location of a bookmark. * * This case is slightly more straightforward than inserting after the * bookmark. For example, it is possible only to insert a new node in front * of an existing node. When inserting after the bookmark, then end node had * to be located whereas, in this case, the node is already known, it is the * CTBookmark itself. The only information that must be discovered is * whether there is a run immediately in front of the boookmarkStart tag and * whether that run is styled. If there is and if it is, then this style * must be cloned and applied the text which will be inserted into the * paragraph. * * @param run An instance of the XWPFRun class that encapsulates the text * that is to be inserted into the document following the bookmark. */ private void insertBeforeBookmark(XWPFRun run) { Node insertBeforeNode = null; Node childNode = null; Node styleNode = null; // Get the dom node from the bookmarkStart tag and look for another // node immediately preceding it. insertBeforeNode = this._ctBookmark.getDomNode(); childNode = insertBeforeNode.getPreviousSibling(); // If a node is found, try to get the styling from it. if (childNode != null) { styleNode = this.getStyleNode(childNode); // If that previous node was styled, then apply this style to the // text which will be inserted. if (styleNode != null) { run.getCTR().getDomNode().insertBefore( styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild()); } } // Insert the text into the paragraph immediately in front of the // bookmarkStart tag. this._para.getCTP().getDomNode().insertBefore( run.getCTR().getDomNode(), insertBeforeNode); } /** * Replace the text - if any - contained between the bookmarkStart and it's * matching bookmarkEnd tag with the text specified. The technique used will * resemble that employed when inserting text after the bookmark. In short, * the code will iterate along the nodes until it encounters a matching * bookmarkEnd tag. Each node encountered will be deleted unless it is the * final node before the bookmarkEnd tag is encountered and it is a * character run. If this is the case, then it can simply be updated to * contain the text the users wishes to see inserted into the document. If * the last node is not a character run, then it will be deleted, a new run * will be created and inserted into the paragraph between the bookmarkStart * and bookmarkEnd tags. * * @param run An instance of the XWPFRun class that encapsulates the text * that is to be inserted into the document following the bookmark. */ private void replaceBookmark(XWPFRun run) { Node nextNode = null; Node styleNode = null; Node lastRunNode = null; Node toDelete = null; NodeList childNodes = null; Stack<Node> nodeStack = null; boolean textNodeFound = false; boolean foundNested = true; int bookmarkStartID = 0; int bookmarkEndID = -1; int numChildNodes = 0; nodeStack = new Stack<Node>(); bookmarkStartID = this._ctBookmark.getId().intValue(); nextNode = this._ctBookmark.getDomNode(); nodeStack.push(nextNode); // Loop through the nodes looking for a matching bookmarkEnd tag while (bookmarkStartID != bookmarkEndID) { nextNode = nextNode.getNextSibling(); nodeStack.push(nextNode); // If an end tag is found, does it match the start tag? If so, end // the while loop. if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) { try { bookmarkEndID = Integer.parseInt( nextNode.getAttributes().getNamedItem( BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue()); } catch (NumberFormatException nfe) { bookmarkEndID = bookmarkStartID; } } //else { // Place a reference to the node on the nodeStack // nodeStack.push(nextNode); //} } // If the stack of nodes found between the bookmark tags is not empty // then they have to be removed. if (!nodeStack.isEmpty()) { // Check the node at the top of the stack. If it is a run, get it's // style - if any - and apply to the run that will be replacing it. //lastRunNode = nodeStack.pop(); lastRunNode = nodeStack.peek(); if ((lastRunNode.getNodeName().equals(BookMark.RUN_NODE_NAME))) { styleNode = this.getStyleNode(lastRunNode); if (styleNode != null) { run.getCTR().getDomNode().insertBefore( styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild()); } } // Delete any and all node that were found in between the start and // end tags. This is slightly safer that trying to delete the nodes // as they are found while stepping through them in the loop above. // If we are peeking, then this line can be commented out. //this._para.getCTP().getDomNode().removeChild(lastRunNode); this.deleteChildNodes(nodeStack); } // Place the text into position, between the bookmark tags. this._para.getCTP().getDomNode().insertBefore( run.getCTR().getDomNode(), nextNode); } /** * When replacing the bookmark's text, it is necessary to delete any nodes * that are found between matching start and end tags. Complications occur * here because it is possible to have bookmarks nested within bookmarks to * almost any level and it is important to not remove any inner or nested * bookmarks when replacing the contents of an outer or containing * bookmark. This code successfully handles the simplest occurrence - where * one bookmark completely contains another - but not more complex cases * where one bookmark overlaps another in the markup. That is still to do. * * @param nodeStack An instance of the Stack class that encapsulates * references to any and all nodes found between the opening and closing * tags of a bookmark. */ private void deleteChildNodes(Stack<Node> nodeStack) { Node toDelete = null; int bookmarkStartID = 0; int bookmarkEndID = 0; boolean inNestedBookmark = false; // The first element in the list will be a bookmarkStart tag and that // must not be deleted. for(int i = 1; i < nodeStack.size(); i++) { // Get an element. If it is another bookmarkStart tag then // again, we do not want to delete it, it's matching end tag // or any nodes that fall inbetween. toDelete = nodeStack.elementAt(i); if(toDelete.getNodeName().contains(BookMark.BOOKMARK_START_TAG)) { bookmarkStartID = Integer.parseInt( toDelete.getAttributes().getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue()); inNestedBookmark = true; } else if(toDelete.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) { bookmarkEndID = Integer.parseInt( toDelete.getAttributes().getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue()); if(bookmarkEndID == bookmarkStartID) { inNestedBookmark = false; } } else { if(!inNestedBookmark) { this._para.getCTP().getDomNode().removeChild(toDelete); } } } } /** * Recover styling information - if any - from another document node. Note * that it is only possible to accomplish this if the node is a run (w:r) * and this could be tested for in the code that calls this method. However, * a check is made in the calling code as to whether a style has been found * and only if a style is found is it applied. This method always returns * null if it does not find a style making that checking process easier. * * @param parentNode An instance of the Node class that encapsulates a * reference to a document node. * @return An instance of the Node class that encapsulates the styling * information applied to a character run. Note that if no styling * information is found in the run OR if the node passed as an argument to * the parentNode parameter is NOT a run, then a null value will be * returned. */ private Node getStyleNode(Node parentNode) { Node childNode = null; Node styleNode = null; if (parentNode != null) { // If the node represents a run and it has child nodes then // it can be processed further. Note, whilst testing the code, it // was observed that although it is possible to get a list of a nodes // children, even when a node did have children, trying to obtain this // list would often return a null value. This is the reason why the // technique of stepping from one node to the next is used here. if (parentNode.getNodeName().equalsIgnoreCase(BookMark.RUN_NODE_NAME) && parentNode.hasChildNodes()) { // Get the first node and catch it's reference for return if // the first child node is a style node (w:rPr). childNode = parentNode.getFirstChild(); if (childNode.getNodeName().equals("w:rPr")) { styleNode = childNode; } else { // If the first node was not a style node and there are other // child nodes remaining to be checked, then step through // the remaining child nodes until either a style node is // found or until all child nodes have been processed. while ((childNode = childNode.getNextSibling()) != null) { if (childNode.getNodeName().equals(BookMark.STYLE_NODE_NAME)) { styleNode = childNode; // Note setting to null here if a style node is // found in order order to terminate any further // checking childNode = null; } } } } } return (styleNode); } /** * Get the text - if any - encapsulated by this bookmark. The creator of a * Word document can chose to select one or more items of text and then * insert a bookmark at that location. The highlighted text will appear * between the square brackets that denote the location of a bookmark in the * document's text and they will be returned by a call to this method. * * @return An instance of the String class encapsulating any text that * appeared between the opening and closing square bracket associated with * this bookmark. * @throws XmlException Thrown if a problem is encountered parsing the XML * markup recovered from the document in order to construct a CTText * instance which may required to obtain the bookmarks text. */ public String getBookmarkText() throws XmlException { StringBuilder builder = null; // Are we dealing with a bookmarked table cell? If so, the entire // contents of the cell - if anything - must be recovered and returned. if(this._tableCell != null) { builder = new StringBuilder(this._tableCell.getText()); } else { builder = this.getTextFromBookmark(); } return(builder == null ? null : builder.toString()); } /** * There are two types of bookmarks. One is a simple placeholder whilst the * second is still a placeholder but it 'contains' some text. In the second * instance, the creator of the document has selected some text and then * chosen to insert a bookmark there and the difference if obvious when * looking at the XML markup. * * The simple case; * * <pre> * * <w:bookmarkStart w:name="AllAlone" w:id="0"/><w:bookmarkEnd w:id="0"/> * * </pre> * * The more complex case; * * <pre> * * <w:bookmarkStart w:name="InStyledText" w:id="3"/> * <w:r w:rsidRPr="00DA438C"> * <w:rPr> * <w:rFonts w:hAnsi="Engravers MT" w:ascii="Engravers MT" w:cs="Arimo"/> * <w:color w:val="FF0000"/> * </w:rPr> * <w:t>text</w:t> * </w:r> * <w:bookmarkEnd w:id="3"/> * * </pre> * * This method assumes that the user wishes to recover the content from any * character run that appears in the markup between a matching pair of * bookmarkStart and bookmarkEnd tags; thus, using the example above again, * this method would return the String 'text' to the user. It is possible * however for a bookmark to contain more than one run and for a bookmark to * contain other bookmarks. In both of these cases, this code will return * the text contained within any and all runs that appear in the XML markup * between matching bookmarkStart and bookmarkEnd tags. The term 'matching * bookmarkStart and bookmarkEndtags' here means tags whose id attributes * have matching value. * * @return An instance of the StringBuilder class encapsulating the text * recovered from any character run elements found between the bookmark's * start and end tags. If no text is found then a null value will be * returned. * @throws XmlException Thrown if a problem is encountered parsing the XML * markup recovered from the document in order to construct a CTText * instance which may be required to obtain the bookmarks text. */ private StringBuilder getTextFromBookmark() throws XmlException { int startBookmarkID = 0; int endBookmarkID = -1; Node nextNode = null; Node childNode = null; CTText text = null; StringBuilder builder = null; String rawXML = null; // Get the ID of the bookmark from it's start tag, the DOM node from the // bookmark (to make looping easier) and initialise the StringBuilder. startBookmarkID = this._ctBookmark.getId().intValue(); nextNode = this._ctBookmark.getDomNode(); builder = new StringBuilder(); // Loop through the nodes held between the bookmark's start and end // tags. while (startBookmarkID != endBookmarkID) { // Get the next node and, if it is a bookmarkEnd tag, get it's ID // as matching ids will terminate the while loop.. nextNode = nextNode.getNextSibling(); if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) { // Get the ID attribute from the node. It is a String that must // be converted into an int. An exception could be thrown and so // the catch clause will ensure the loop ends neatly even if the // value might be incorrect. Must inform the user. try { endBookmarkID = Integer.parseInt( nextNode.getAttributes(). getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue()); } catch (NumberFormatException nfe) { endBookmarkID = startBookmarkID; } } else { // This is not a bookmarkEnd node and can processed it for any // text it may contain. Note the check for both type - it must // be a run - and contain children. Interestingly, it seems as // though the node may contain children and yet the call to // nextNode.getChildNodes() will still return an empty list, // hence the need to step through the child nodes. if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME) && nextNode.hasChildNodes()) { // Get the text from the child nodes. builder.append(this.getTextFromChildNodes(nextNode)); } } } return (builder); } /** * Iterates through all and any children of the Node whose reference will be * passed as an argument to the node parameter, and recover the contents of * any text nodes. Testing revealed that a node can be called a text node * and yet report it's type as being something different, an element node * for example. Calling the getNodeValue() method on a text node will return * the text the node encapsulates but doing the same on an element node will * not. In fact, the call will simply return a null value. As a result, this * method will test the nodes name to catch all text nodes - those whose * name is to 'w:t' and then it's type. If the type is reported to be a text * node, it is a trivial task to get at it's contents. However, if the type * is not reported as a text type, then it is necessary to parse the raw XML * markup for the node to recover it's value. * * @param node An instance of the Node class that encapsulates a reference * to a node recovered from the document being processed. It should be * passed a reference to a character run - 'w:r' - node. * @return An instance of the String class that encapsulates the text * recovered from the nodes children, if they are text nodes. * @throws XmlException Thrown if a problem is encountered parsing the XML * markup recovered from the document in order to construct the CTText * instance which may be required to obtain the bookmarks text. */ private String getTextFromChildNodes(Node node) throws XmlException { NodeList childNodes = null; Node childNode = null; CTText text = null; StringBuilder builder = new StringBuilder(); int numChildNodes = 0; // Get a list of chid nodes from the node passed to the method and // find out how many children there are in the list. childNodes = node.getChildNodes(); numChildNodes = childNodes.getLength(); // Iterate through the children one at a time - it is possible for a // run to ciontain zero, one or more text nodes - and recover the text // from an text type child nodes. for (int i = 0; i < numChildNodes; i++) { // Get a node and check it's name. If this is 'w:t' then process as // text type node. childNode = childNodes.item(i); if (childNode.getNodeName().equals(BookMark.TEXT_NODE_NAME)) { // If the node reports it's type as txet, then simply call the // getNodeValue() method to get at it's text. if (childNode.getNodeType() == Node.TEXT_NODE) { builder.append(childNode.getNodeValue()); } else { // Correct the type by parsing the node's XML markup and // creating a CTText object. Call the getStringValue() // method on that to get the text. text = CTText.Factory.parse(childNode); builder.append(text.getStringValue()); } } } return (builder.toString()); } private void handleBookmarkedCells(String bookmarkValue, int where) { List<XWPFParagraph> paraList = null; List<XWPFRun> runs = null; XWPFParagraph para = null; XWPFRun readRun = null; // Get a list if paragraphs from the table cell and remove any and all. paraList = this._tableCell.getParagraphs(); for(int i = 0; i < paraList.size(); i++) { this._tableCell.removeParagraph(i); } para = this._tableCell.addParagraph(); para.createRun().setText(bookmarkValue); } }
package com;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Collection;
import java.util.Set;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
/**
*
* 利用POI進行Word檔案相關的操作,針對docx形式的封裝
*
* @author
*
* <p>Modification History:</p>
* <p>Date Author Description</p>
* <p>------------------------------------------------------------------</p>
* <p> </p>
* <p> </p>
*/
public class BookMarks {
/** 儲存Word檔案中定義的標籤 **/
private HashMap<String, BookMark> _bookmarks = null;
/**
* 建構函式,用以分析文件,解析出所有的標籤
*
* @param document Word OOXML document instance.
*/
public BookMarks(XWPFDocument document) {
//初始化標籤快取
this._bookmarks = new HashMap<String, BookMark>();
// 首先解析文件普通段落中的標籤
this.procParaList(document.getParagraphs());
//利用繁瑣的方法,從所有的表格中得到得到標籤,處理比較原始和簡單
List<XWPFTable> tableList = document.getTables();
for (XWPFTable table : tableList) {
//得到表格的列資訊
List<XWPFTableRow> rowList = table.getRows();
for (XWPFTableRow row : rowList){
//得到行中的列資訊
List<XWPFTableCell> cellList = row.getTableCells();
for (XWPFTableCell cell : cellList) {
//逐個解析標籤資訊
//this.procParaList(cell.getParagraphs(), row);
this.procParaList(cell);
}
}
}
}
/**
* 根據標籤名稱,獲得標籤的相關定義,如果不存在,則返回空
* @param bookmarkName 標籤名稱
* @return 返回封裝好的物件
*/
public BookMark getBookmark(String bookmarkName) {
BookMark bookmark = null;
if(this._bookmarks.containsKey(bookmarkName)) {
bookmark = this._bookmarks.get(bookmarkName);
}
return bookmark;
}
/**
* 得到所有的標籤資訊集合
*
* @return 快取的標籤資訊集合
*/
public Collection<BookMark> getBookmarkList() {
return(this._bookmarks.values());
}
/**
* 返回文件中的標籤名稱迭代器
* @return 由Map KEY 轉換的迭代器
*/
public Iterator<String> getNameIterator() {
return(this._bookmarks.keySet().iterator());
}
private void procParaList(XWPFTableCell cell){
List<XWPFParagraph> paragraphList = cell.getParagraphs();
for(XWPFParagraph paragraph : paragraphList){
//得到段落中的標籤標記
List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
for (CTBookmark bookmark : bookmarkList ) {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph, cell));
}
}
}
/**
* 解析表格中的標籤
* @param paragraphList 傳入的段落列表
* @param tableRow 對應的表格行物件
*/
private void procParaList(List<XWPFParagraph> paragraphList, XWPFTableRow tableRow) {
NamedNodeMap attributes = null;
Node colFirstNode = null;
Node colLastNode = null;
int firstColIndex = 0;
int lastColIndex = 0;
//迴圈判斷,解析段落中的標籤
for (XWPFParagraph paragraph : paragraphList) {
//得到段落中的標籤標記
List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
for (CTBookmark bookmark : bookmarkList ) {
// With a bookmark in hand, test to see if the bookmarkStart tag
// has w:colFirst or w:colLast attributes. If it does, we are
// dealing with a bookmarked table cell. This will need to be
// handled differnetly - I think by an different concrete class
// that implements the Bookmark interface!!
attributes = bookmark.getDomNode().getAttributes();
if(attributes != null) {
// Get the colFirst and colLast attributes. If both - for
// now - are found, then we are dealing with a bookmarked
// cell.
colFirstNode = attributes.getNamedItem("w:colFirst");
colLastNode = attributes.getNamedItem("w:colLast");
if(colFirstNode != null && colLastNode != null) {
// Get the index of the cell (or cells later) from them.
// First convefrt the String values both return to primitive
// int value. TO DO, what happens if there is a
// NumberFormatException.
firstColIndex = Integer.parseInt(colFirstNode.getNodeValue());
lastColIndex = Integer.parseInt(colLastNode.getNodeValue());
// if the indices are equal, then we are dealing with a#
// cell and can create the bookmark for it.
if(firstColIndex == lastColIndex) {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph,
tableRow.getCell(firstColIndex)));
}
else {
System.out.println("This bookmark " + bookmark.getName() +
" identifies a number of cells in the "
+ "table. That condition is not handled yet.");
}
}
else {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph,tableRow.getCell(1)));
}
}
else {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph,tableRow.getCell(1)));
}
}
}
}
/**
* 解析普通段落中的標籤
* @param paragraphList 傳入的段落
*/
private void procParaList(List<XWPFParagraph> paragraphList) {
for (XWPFParagraph paragraph : paragraphList) {
List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
//迴圈加入標籤
for (CTBookmark bookmark : bookmarkList) {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph));
}
}
}
}
使用的工具類:MSWordTool
package com;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHeight;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTrPr;
import org.w3c.dom.Node;
/**
* 使用POI,進行Word相關的操作
*
*
* @author xuyu
*
* <p>Modification History:</p>
* <p>Date Author Description</p>
* <p>------------------------------------------------------------------</p>
* <p> </p>
* <p> </p>
*/
public class MSWordTool {
/** 內部使用的文件物件 **/
private XWPFDocument document;
private BookMarks bookMarks = null;
/**
* 為文件設定模板
* @param templatePath 模板檔名稱
*/
public void setTemplate(String templatePath) {
try {
this.document = new XWPFDocument(
POIXMLDocument.openPackage(templatePath));
bookMarks = new BookMarks(document);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
/**
* 進行標籤替換的例子,傳入的Map中,key表示標籤名稱,value是替換的資訊
* @param indicator
*/
public void replaceBookMark(Map<String,String> indicator) {
//迴圈進行替換
Iterator<String> bookMarkIter = bookMarks.getNameIterator();
while (bookMarkIter.hasNext()) {
String bookMarkName = bookMarkIter.next();
//得到標籤名稱
BookMark bookMark = bookMarks.getBookmark(bookMarkName);
//進行替換
if (indicator.get(bookMarkName)!=null) {
bookMark.insertTextAtBookMark(indicator.get(bookMarkName), BookMark.INSERT_BEFORE);
}
}
}
public void fillTableAtBookMark(String bookMarkName,List<Map<String,String>> content) {
//rowNum來比較標籤在表格的哪一行
int rowNum = 0;
//首先得到標籤
BookMark bookMark = bookMarks.getBookmark(bookMarkName);
Map<String, String> columnMap = new HashMap<String, String>();
Map<String, Node> styleNode = new HashMap<String, Node>();
//標籤是否處於表格內
if(bookMark.isInTable()){
//獲得標籤對應的Table物件和Row物件
XWPFTable table = bookMark.getContainerTable();
XWPFTableRow row = bookMark.getContainerTableRow();
CTRow ctRow = row.getCtRow();
List<XWPFTableCell> rowCell = row.getTableCells();
for(int i = 0; i < rowCell.size(); i++){
columnMap.put(i+"", rowCell.get(i).getText().trim());
//System.out.println(rowCell.get(i).getParagraphs().get(0).createRun().getFontSize());
//System.out.println(rowCell.get(i).getParagraphs().get(0).getCTP());
//System.out.println(rowCell.get(i).getParagraphs().get(0).getStyle());
//獲取該單元格段落的xml,得到根節點
Node node1 = rowCell.get(i).getParagraphs().get(0).getCTP().getDomNode();
//遍歷根節點的所有子節點
for (int x=0;x<node1.getChildNodes().getLength();x++) {
if (node1.getChildNodes().item(x).getNodeName().equals(BookMark.RUN_NODE_NAME)) {
Node node2 = node1.getChildNodes().item(x);
//遍歷所有節點為"w:r"的所有自己點,找到節點名為"w:rPr"的節點
for (int y=0;y<node2.getChildNodes().getLength();y++) {
if(node2.getChildNodes().item(y).getNodeName().endsWith(BookMark.STYLE_NODE_NAME)){
//將節點為"w:rPr"的節點(字型格式)存到HashMap中
styleNode.put(i+"", node2.getChildNodes().item(y));
}
}
} else {
continue;
}
}
}
//迴圈對比,找到該行所處的位置,刪除改行
for(int i = 0; i < table.getNumberOfRows(); i++){
if(table.getRow(i).equals(row)){
rowNum = i;
break;
}
}
table.removeRow(rowNum);
for(int i = 0; i < content.size(); i++){
//建立新的一行,單元格數是表的第一行的單元格數,
//後面新增資料時,要判斷單元格數是否一致
XWPFTableRow tableRow = table.createRow();
CTTrPr trPr = tableRow.getCtRow().addNewTrPr();
CTHeight ht = trPr.addNewTrHeight();
ht.setVal(BigInteger.valueOf(360));
}
//得到表格行數
int rcount = table.getNumberOfRows();
for(int i = rowNum; i < rcount; i++){
XWPFTableRow newRow = table.getRow(i);
//判斷newRow的單元格數是不是該書籤所在行的單元格數
if(newRow.getTableCells().size() != rowCell.size()){
//計算newRow和書籤所在行單元格數差的絕對值
//如果newRow的單元格數多於書籤所在行的單元格數,不能通過此方法來處理,可以通過表格中文字的替換來完成
//如果newRow的單元格數少於書籤所在行的單元格數,要將少的單元格補上
int sub= Math.abs(newRow.getTableCells().size() - rowCell.size());
//將缺少的單元格補上
for(int j = 0;j < sub; j++){
newRow.addNewTableCell();
}
}
List<XWPFTableCell> cells = newRow.getTableCells();
for(int j = 0; j < cells.size(); j++){
XWPFParagraph para = cells.get(j).getParagraphs().get(0);
XWPFRun run = para.createRun();
if(content.get(i-rowNum).get(columnMap.get(j+"")) != null){
//改變單元格的值,標題欄不用改變單元格的值
run.setText(content.get(i-rowNum).get(columnMap.get(j+""))+"");
//將單元格段落的字型格式設為原來單元格的字型格式
run.getCTR().getDomNode().insertBefore(styleNode.get(j+"").cloneNode(true), run.getCTR().getDomNode().getFirstChild());
}
para.setAlignment(ParagraphAlignment.CENTER);
}
}
}
}
public void replaceText(Map<String,String> bookmarkMap, String bookMarkName) {
//首先得到標籤
BookMark bookMark = bookMarks.getBookmark(bookMarkName);
//獲得書籤標記的表格
XWPFTable table = bookMark.getContainerTable();
//獲得所有的表
//Iterator<XWPFTable> it = document.getTablesIterator();
if(table != null){
//得到該表的所有行
int rcount = table.getNumberOfRows();
for(int i = 0 ;i < rcount; i++){
XWPFTableRow row = table.getRow(i);
//獲到改行的所有單元格
List<XWPFTableCell> cells = row.getTableCells();
for(XWPFTableCell c : cells){
for(Entry<String,String> e : bookmarkMap.entrySet()){
if(c.getText().equals(e.getKey())){
//刪掉單元格內容
c.removeParagraph(0);
//給單元格賦值
c.setText(e.getValue());
}
}
}
}
}
}
public void saveAs() {
File newFile = new File("e:\\test\\Word模版_REPLACE.docx");
FileOutputStream fos = null;
try {
fos = new FileOutputStream(newFile);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
try {
this.document.write(fos);
fos.flush();
fos.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
測試方法
/**
* @param args
*/
public static void main(String[] args) {
long startTime = System.currentTimeMillis();
MSWordTool changer = new MSWordTool();
changer.setTemplate("E:\\test\\Word.docx");
Map<String,String> content = new HashMap<String,String>();
content.put("Principles", "格式規範、標準統一、利於閱覽");
content.put("Purpose", "規範會議操作、提高會議質量");
content.put("Scope", "公司會議、部門之間業務協調會議");
content.put("customerName", "**有限公司");
content.put("address", "機場路2號");
content.put("userNo", "3021170207");
content.put("tradeName", "水泥製造");
content.put("price1", "1.085");
content.put("price2", "0.906");
content.put("price3", "0.433");
content.put("numPrice", "0.675");
content.put("company_name", "**有限公司");
content.put("company_address", "機場路2號");
changer.replaceBookMark(content);
//替換表格標籤
List<Map<String ,String>> content2 = new ArrayList<Map<String, String>>();
Map<String, String> table1 = new HashMap<String, String>();
table1.put("MONTH", "*月份");
table1.put("SALE_DEP", "75分");
table1.put("TECH_CENTER", "80分");
table1.put("CUSTOMER_SERVICE", "85分");
table1.put("HUMAN_RESOURCES", "90分");
table1.put("FINANCIAL", "95分");
table1.put("WORKSHOP", "80分");
table1.put("TOTAL", "85分");
for(int i = 0; i < 3; i++){
content2.add(table1);
}
changer.fillTableAtBookMark("Table" ,content2);
changer.fillTableAtBookMark("month", content2);
//表格中文字的替換
Map<String, String> table = new HashMap<String, String>();
table.put("CUSTOMER_NAME", "**有限公司");
table.put("ADDRESS", "機場路2號");
table.put("USER_NO", "3021170207");
table.put("tradeName", "水泥製造");
table.put("PRICE_1", "1.085");
table.put("PRICE_2", "0.906");
table.put("PRICE_3", "0.433");
table.put("NUM_PRICE", "0.675");
changer.replaceText(table,"Table2");
//儲存替換後的WORD
changer.saveAs();
System.out.println("time=="+(System.currentTimeMillis() - startTime));
}
這裡主要的區別就是,他使用的是poi3.9的,但是引用3.17的話就會報錯,有些方法進行了修改。
它的修改之後,我們有一些方法不能使用。需要引入新的包。我們可以在poi的官網上進行下載3.17的包
下載解壓後如下圖所示:
其中ooxml-lib就是之前沒有的或者說是修改後分離出來的。在專案中引用就可以了。
當然了,poi相關的也要新增進來。本人測試可行。