Lucene索引詳解（IndexWriter詳解、Document詳解、索引更新）

阿新 • • 發佈：2018-12-19

一、IndexWriter詳解

問題1：索引建立過程完成什麼事？

　　　　分詞、儲存到反向索引中

1. 回顧Lucene架構圖：

介紹我們編寫的應用程式要完成資料的收集，再將資料以document的形式用lucene的索引API建立索引、儲存。這裡重點要強調應用程式碼負責做什麼,lucene負責做什麼。

2. Lucene索引建立API 圖示

通過該圖介紹lucene建立索引的核心API:Document、IndexWriter

Lucene中要索引的文件、資料記錄以document表示，應用程式通過IndexWirter將Document加入到索引中。

3. Lucene索引建立程式碼示例

public static void main(String[] args) throws IOException {
// 建立使用的分詞器
Analyzer analyzer = new IKAnalyzer4Lucene7(true);
// 索引配置物件
IndexWriterConfig config = new IndexWriterConfig(analyzer);
// 設定索引庫的開啟模式：新建、追加、新建或追加
config.setOpenMode(OpenMode.CREATE_OR_APPEND);

// 索引存放目錄
// 存放到檔案系統中
Directory directory = FSDirectory
.open((new File("f:/test/indextest")).toPath());

// 存放到記憶體中
// Directory directory = new RAMDirectory();

// 建立索引寫物件
IndexWriter writer = new IndexWriter(directory, config);

// 建立document
Document doc = new Document();
// 往document中新增 商品id欄位
doc.add(new StoredField("prodId", "p0001"));

// 往document中新增 商品名稱欄位
String name = "ThinkPad X1 Carbon 20KH0009CD/25CD 超極本輕薄膝上型電腦聯想";
doc.add(new TextField("name", name, Store.YES));

// 將文件新增到索引
writer.addDocument(doc);

// .....

// 重新整理
writer.flush();

// 提交
writer.commit();

// 關閉 會提交
writer.close();
directory.close();
}

上面示例程式碼對應的類圖展示：

4. IndexWriterConfig 寫索引配置：

使用的分詞器，

如何開啟索引（是新建，還是追加）。

還可配置緩衝區大小、或快取多少個文件，再重新整理到儲存中。

還可配置合併、刪除等的策略。

注意：

用這個配置物件建立好IndexWriter物件後，再修改這個配置物件的配置資訊不會對IndexWriter物件起作用。

如要在indexWriter使用過程中修改它的配置資訊，通過 indexWriter的getConfig()方法獲得 LiveIndexWriterConfig 物件，在這個物件中可檢視該IndexWriter使用的配置資訊，可進行少量的配置修改（看它的setter方法）

5. Directory 指定索引資料存放的位置

記憶體

檔案系統

資料庫

儲存到檔案系統用法： Directory directory = FSDirectory.open(Path path); // path指定目錄

儲存到記憶體中用法：Directory directory = new RAMDirectory();

6. IndexWriter 用來建立、維護一個索引。它的API使用流程：

    // 建立索引寫物件
    IndexWriter writer = new IndexWriter(directory, config);

    // 建立document

    // 將文件新增到索引
    writer.addDocument(doc);
    
    // 刪除文件
    //writer.deleteDocuments(terms);
    
    //修改文件
    //writer.updateDocument(term, doc);

    // 重新整理
    writer.flush();

    // 提交
    writer.commit();

注意：IndexWriter是執行緒安全的。如果你的業務程式碼中有其他的同步控制，請不要使用IndexWriter作為鎖物件，以免死鎖。

IndexWriter涉及類圖示：

問題2：索引庫中會儲存反向索引資料，會儲存document嗎？

　　索引庫會儲存一下關鍵的document資訊

問：在百度、天貓上進行搜尋，展示的列表中的資料來自哪裡？源DB、FS 嗎？

　　存在索引庫裡

二、Document詳解

1. Document 文件

要索引的資料記錄、文件在lucene中的表示，是索引、搜尋的基本單元。一個Document由多個欄位Field構成。就像資料庫的記錄-欄位。

IndexWriter按加入的順序為Document指定一個遞增的id（從0開始），稱為文件id。反向索引中儲存的是這個id，文件儲存中正向索引也是這個id。業務資料的主鍵id只是文件的一個欄位。

Document API

2. Field

欄位：由欄位名name、欄位值value（fieldsData）、欄位型別 type 三部分構成。

欄位值可以是文字（String、Reader 或預分析的 TokenStream）、二進位制值（byte[]）或數值。

IndexableField Field API

3. Document—Field 資料舉例

新聞：新聞id，新聞標題、新聞內容、作者、所屬分類、發表時間

網頁搜尋的網頁：標題、內容、連結地址

商品： id、名稱、圖片連結、類別、價格、庫存、商家、品牌、月銷量、詳情…

問題1：我們收集資料建立document物件來為其建立索引，資料的所有屬性是否都需要加入到document中？如資料庫表中的資料記錄的所有欄位是否都需要放到document中？哪些欄位應加入到document中？

　　看具體的業務，只有需要被搜尋和展示的欄位才需要被加入到document中

問題2：是不是所有加入的欄位都需要進行索引？是不是所有加入的欄位都要儲存到索引庫中？什麼樣的欄位該被索引？什麼樣的欄位該被儲存？

　　看具體的業務，需要被搜尋的欄位才該被索引，需要被展示的欄位該被儲存

問題3：各種要被索引的欄位該以什麼樣的方式進行索引，全都是分詞進行索引，還是有不同區別？

　　看是模糊查詢還是精確查詢，模糊查詢的話就需要被分詞索引，精確查詢的話就不需要被分詞索引

4. IndexableFieldType

欄位型別：描述該如何索引儲存該欄位。

欄位可選擇性地儲存在索引中，這樣在搜尋結果中，這些儲存的欄位值就可獲得。

一個Document應該包含一個或多個儲存欄位來唯一標識一個文件。為什麼？

　　為從原資料中拿完整資料去展示

5. Document 類關係

IndexableFieldType API 說明

6. IndexOptions 索引選項說明：

NONE：Not indexed 不索引

DOCS: 反向索引中只儲存了包含該詞的文件id，沒有詞頻、位置

DOCS_AND_FREQS: 反向索引中會儲存文件id、詞頻

DOCS_AND_FREQS_AND_POSITIONS:反向索引中儲存文件id、詞頻、位置

DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS :反向索引中儲存文件id、詞頻、位置、偏移量

package com.study.lucene.indexdetail;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import com.study.lucene.ikanalyzer.Integrated.IKAnalyzer4Lucene7;

/**
 * 索引選項選擇
 * 
 * @author THINKPAD
 *
 */
public class IndexOptionsDemo {

    public static void main(String[] args) {
        // 建立使用的分詞器
        Analyzer analyzer = new IKAnalyzer4Lucene7(true);

        // 索引配置物件
        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        try ( // 索引存放到檔案系統中
                Directory directory = FSDirectory.open((new File("f:/test/indextest")).toPath());

                // 建立索引寫物件
                IndexWriter writer = new IndexWriter(directory, config);) {

            // 準備document
            Document doc = new Document();
            // 欄位content
            String name = "content";
            String value = "張三說的確實在理";
            FieldType type = new FieldType();
            // 設定是否儲存該欄位
            type.setStored(true); // 請試試不儲存的結果
            // 設定是否對該欄位分詞
            type.setTokenized(true); // 請試試不分詞的結果
            // 設定該欄位的索引選項
            type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS); // 請嘗試不同的選項的效果
            type.freeze(); // 使不可更改

            Field field = new Field(name, value, type);
            // 新增欄位
            doc.add(field);
            // 加入到索引中
            writer.addDocument(doc);

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

問題4：如果要在搜尋結果中做關鍵字高亮，需要什麼資訊？如果要實現短語查詢、臨近查詢（跨度查詢），需要什麼資訊？

如要搜尋包含“張三” “李四”，且兩詞之間跨度不超過5個字元。

　　需要位置和偏移量

問題5：位置、偏移資料在反向索引中佔的儲存量佔比大不大？

　　看分詞的資料量

問題6：如果某個欄位不需要進行短語查詢、臨近查詢，那麼在反向索引中就不需要儲存位置、偏移資料。這樣是不是可以降低反向索引的資料量，提升效率？但是如果該欄位要做高亮顯示支援，該怎麼辦？。

　　為了提升反向索引的效率，這樣的欄位的位置、偏移資料是不應該儲存到反向索引中的。這也你前面看到 IndexOptions為什麼有那些選項的原因。

　　一個欄位分詞器分詞後，每個詞項會得到一系列屬性資訊，如出現頻率、位置、偏移量等，這些資訊構成一個詞項向量 termVectors

7. IndexableFieldType API

storeTermVectors：

　　對於不需要在搜尋反向索引時用到，但在搜尋結果處理時需要的位置、偏移量、附加資料(payLoad) 的欄位，我們可以單獨為該欄位儲存（文件id詞項向量）的正向索引。

示例程式碼：

package com.study.lucene.indexdetail;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import com.study.lucene.ikanalyzer.Integrated.IKAnalyzer4Lucene7;

/**
 * 詞向向量
 * @author THINKPAD
 *
 */
public class IndexTermVectorsDemo {

    public static void main(String[] args) {
        // 建立使用的分詞器
        Analyzer analyzer = new IKAnalyzer4Lucene7(true);

        // 索引配置物件
        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        try ( // 索引存放到檔案系統中
                Directory directory = FSDirectory
                        .open((new File("f:/test/indextest")).toPath());

                // 建立索引寫物件
                IndexWriter writer = new IndexWriter(directory, config);) {

            // 準備document
            Document doc = new Document();
            // 欄位content
            String name = "content";
            String value = "張三說的確實在理";
            FieldType type = new FieldType();
            // 設定是否儲存該欄位
            type.setStored(true); // 請試試不儲存的結果
            // 設定是否對該欄位分詞
            type.setTokenized(true); // 請試試不分詞的結果
            // 設定該欄位的索引選項
            type.setIndexOptions(IndexOptions.DOCS); // 反向索引中只儲存詞項

            // 設定為該欄位儲存詞項向量
            type.setStoreTermVectors(true);
            type.setStoreTermVectorPositions(true);
            type.setStoreTermVectorOffsets(true);
            type.setStoreTermVectorPayloads(true);

            type.freeze(); // 使不可更改

            Field field = new Field(name, value, type);
            // 新增欄位
            doc.add(field);
            // 加入到索引中
            writer.addDocument(doc);

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

請為商品記錄建立索引，欄位資訊如下：

商品id：字串，不索引、但儲存

　　String prodId = "p0001";

商品名稱：字串，分詞索引(儲存詞頻、位置、偏移量)、儲存

　　String name = “ThinkPad X1 Carbon 20KH0009CD/25CD 超極本輕薄膝上型電腦";

圖片連結：僅儲存

　　 String imgUrl = "http://www.cnblogs.com/leeSmall/";

商品簡介：字串，分詞索引（不需要支援短語、臨近查詢）、儲存，結果中支援高亮顯示

　　String simpleIntro = "整合顯示卡英特爾酷睿 i5-8250U 14英寸";

品牌：字串，不分詞索引，儲存

　　String brand = "ThinkPad";

package com.study.lucene.indexdetail;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.NumericUtils;

import com.study.lucene.ikanalyzer.Integrated.IKAnalyzer4Lucene7;

/**
 * 為商品記錄建立索引
 * @author THINKPAD
 *
 */
public class ProductIndexExercise {

    public static void main(String[] args) {
        // 建立使用的分詞器
        Analyzer analyzer = new IKAnalyzer4Lucene7(true);

        // 索引配置物件
        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        try (
                // 索引存放目錄
                // 存放到檔案系統中
                Directory directory = FSDirectory
                        .open((new File("f:/test/indextest")).toPath());

                // 存放到記憶體中
                // Directory directory = new RAMDirectory();

                // 建立索引寫物件
                IndexWriter writer = new IndexWriter(directory, config);) {

            // 準備document
            Document doc = new Document();
            // 商品id：字串，不索引、但儲存
            String prodId = "p0001";
            FieldType onlyStoredType = new FieldType();
            onlyStoredType.setTokenized(false);
            onlyStoredType.setIndexOptions(IndexOptions.NONE);
            onlyStoredType.setStored(true);
            onlyStoredType.freeze();
            doc.add(new Field("prodId", prodId, onlyStoredType));

            // 商品名稱：字串，分詞索引(儲存詞頻、位置、偏移量)、儲存
            String name = "ThinkPad X1 Carbon 20KH0009CD/25CD 超極本輕薄膝上型電腦聯想";
            FieldType indexedAllStoredType = new FieldType();
            indexedAllStoredType.setStored(true);
            indexedAllStoredType.setTokenized(true);
            indexedAllStoredType.setIndexOptions(
                    IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            indexedAllStoredType.freeze();
            doc.add(new Field("name", name, indexedAllStoredType));

            // 圖片連結：僅儲存
            String imgUrl = "http://www.cnblogs.com/leeSmall/";
            doc.add(new Field("imgUrl", imgUrl, onlyStoredType));

            // 商品簡介：文字，分詞索引（不需要支援短語、臨近查詢）、儲存，結果中支援高亮顯示
            String simpleIntro = "整合顯示卡 英特爾 酷睿 i5-8250U 14英寸";
            FieldType indexedTermVectorsStoredType = new FieldType();
            indexedTermVectorsStoredType.setStored(true);
            indexedTermVectorsStoredType.setTokenized(true);
            indexedTermVectorsStoredType
                    .setIndexOptions(IndexOptions.DOCS_AND_FREQS);
            indexedTermVectorsStoredType.setStoreTermVectors(true);
            indexedTermVectorsStoredType.setStoreTermVectorPositions(true);
            indexedTermVectorsStoredType.setStoreTermVectorOffsets(true);
            indexedTermVectorsStoredType.freeze();

            doc.add(new Field("simpleIntro", simpleIntro,
                    indexedTermVectorsStoredType));

            // 價格，整數，單位分，不索引、儲存
            int price = 2999900;
            // Field 類有整數型別值的構造方法嗎？
            // 用位元組陣列來儲存試試，還是轉為字串？
            byte[] result = new byte[Integer.BYTES];
            NumericUtils.intToSortableBytes(price, result, 0);

            doc.add(new Field("price", result, onlyStoredType));

            writer.addDocument(doc);

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

問題7 ：我們往往需要對搜尋的結果支援按不同的欄位進行排序，如商品搜尋結果按價格排序、按銷量排序等。以及對搜尋結果進行按某欄位分組統計，如按品牌統計。

儲存的文件資料中（文件是行式儲存）就得把搜到的所有文件載入到記憶體中，來獲取價格，再按價格排序。如果搜到的文件列表量很大，會有什麼問題沒？費記憶體效率低我們往往對結果列表是分頁處理，並不需要把所有文件資料載入。

　　空間換時間：對這種需要排序、分組、聚合的欄位，為其建立獨立的文件->欄位值的正向索引、列式儲存。這樣我們要載入搜中文件的這個欄位的資料就快很多，耗記憶體少。

　　IndexableFieldType 中的 docValuesType方法就是讓你來為需要排序、分組、聚合的欄位指定如何為該欄位建立文件->欄位值的正向索引的。

DocValuesType 選項說明:

NONE 不開啟docvalue

NUMERIC 單值、數值欄位，用這個

BINARY 單值、位元組陣列欄位用

SORTED 單值、字元欄位用，會預先對值位元組進行排序、去重儲存

SORTED_NUMERIC 單值、數值陣列欄位用，會預先對數值陣列進行排序

SORTED_SET 多值欄位用，會預先對值位元組進行排序、去重儲存

具體使用選擇：

字串+單值會選擇SORTED作為docvalue儲存

字串+多值會選擇SORTED_SET作為docvalue儲存

數值或日期或列舉欄位+單值會選擇NUMERIC作為docvalue儲存

數值或日期或列舉欄位+多值會選擇SORTED_SET作為docvalue儲存

注意：需要排序、分組、聚合、分類查詢（面查詢）的欄位才建立docValues

8. 擴充套件整型Field

　　通過檢視Filed的構造方法，發現裡面沒有設定整型數值的方法，所以需要我們自己來擴充套件

擴充套件的方法如下：

1. 擴充套件Field，提供構造方法傳入數值型別值，賦給欄位值欄位；

2. 改寫binaryValue() 方法，返回數值的位元組引用。

package com.study.lucene.indexdetail.extendfield;

import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

/**
 * 
 * @Description: 擴充套件整型Field
 * @author liguangsheng
 * @date 2018年5月11日
 *
 */
public class ExtendIntField extends Field {
    public ExtendIntField(String fieldName, int value, FieldType type) {
        super(fieldName, type);
        this.fieldsData = Integer.valueOf(value);
    }

    @Override
    public BytesRef binaryValue() {
        byte[] bs = new byte[Integer.BYTES];
        NumericUtils.intToSortableBytes((Integer) this.fieldsData, bs, 0);
        return new BytesRef(bs);
    }
}

9. Lucene預定義的欄位子類

package com.study.lucene.indexdetail;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

import com.study.lucene.ikanalyzer.Integrated.IKAnalyzer4Lucene7;

/**
 * 索引的建立
 * 
 * @author THINKPAD
 *
 */
public class IndexWriteDemo {

    public static void main(String[] args) {
        // 建立使用的分詞器
        Analyzer analyzer = new IKAnalyzer4Lucene7(true);

        // 索引配置物件
        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        try (
                // 索引存放目錄
                // 存放到檔案系統中
                Directory directory = FSDirectory.open((new File("f:/test/indextest")).toPath());

                // 存放到記憶體中
                // Directory directory = new RAMDirectory();

                // 建立索引寫物件
                IndexWriter writer = new IndexWriter(directory, config);) {

            // 準備document
            Document doc = new Document();
            // 商品id：字串，不索引、但儲存
            String prodId = "p0001";
            FieldType onlyStoredType = new FieldType();
            onlyStoredType.setTokenized(false);
            onlyStoredType.setIndexOptions(IndexOptions.NONE);
            onlyStoredType.setStored(true);
            onlyStoredType.freeze();
            doc.add(new Field("prodId", prodId, onlyStoredType));

            // 等同下一行
            // doc.add(new StoredField("prodId", prodId));

            // 商品名稱：字串，分詞索引(儲存詞頻、位置、偏移量)、儲存
            String name = "ThinkPad X1 Carbon 20KH0009CD/25CD 超極本輕薄膝上型電腦聯想";
            FieldType indexedAllStoredType = new FieldType();
            indexedAllStoredType.setStored(true);
            indexedAllStoredType.setTokenized(true);
            indexedAllStoredType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            indexedAllStoredType.freeze();

            doc.add(new Field("name", name, indexedAllStoredType));

            // 圖片連結：僅儲存
            String imgUrl = "http://www.cnblogs.com/aaa";
            doc.add(new Field("imgUrl", imgUrl, onlyStoredType));

            // 商品簡介：文字，分詞索引（不需要支援短語、臨近查詢）、儲存，結果中支援高亮顯示
            String simpleIntro = "整合顯示卡 英特爾 酷睿 i5-8250U 14英寸";
            FieldType indexedTermVectorsStoredType = new FieldType();
            indexedTermVectorsStoredType.setStored(true);
            indexedTermVectorsStoredType.setTokenized(true);
            indexedTermVectorsStoredType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
            indexedTermVectorsStoredType.setStoreTermVectors(true);
            indexedTermVectorsStoredType.setStoreTermVectorPositions(true);
            indexedTermVectorsStoredType.setStoreTermVectorOffsets(true);
            indexedTermVectorsStoredType.freeze();

            doc.add(new Field("simpleIntro", simpleIntro, indexedTermVectorsStoredType));

            // 價格，整數，單位分，不索引、儲存、要支援排序
            int price = 999900;
            FieldType numericDocValuesType = new FieldType();
            numericDocValuesType.setTokenized(false);
            numericDocValuesType.setIndexOptions(IndexOptions.NONE);
            numericDocValuesType.setStored(true);
            numericDocValuesType.setDocValuesType(DocValuesType.NUMERIC);
            numericDocValuesType.setDimensions(1, Integer.BYTES);
            numericDocValuesType.freeze();

            doc.add(new MyIntField("price", price, numericDocValuesType));

            // 與下兩行等同
            // doc.add(new StoredField("price", price));
            // doc.add(new NumericDocValuesField("price", price));

            // 類別：字串，索引不分詞，不儲存、支援分類統計,多值
            FieldType indexedDocValuesType = new FieldType();
            indexedDocValuesType.setTokenized(false);
            indexedDocValuesType.setIndexOptions(IndexOptions.DOCS);
            indexedDocValuesType.setDocValuesType(DocValuesType.SORTED_SET);
            indexedDocValuesType.freeze();

            doc.add(new Field("type", "電腦", indexedDocValuesType) {
                @Override
                public BytesRef binaryValue() {
                    return new BytesRef((String) this.fieldsData);
                }
            });
            doc.add(new Field("type", "膝上型電腦", indexedDocValuesType) {
                @Override
                public BytesRef binaryValue() {
                    return new BytesRef((String) this.fieldsData);
                }
            });

            // 等同下四行
            // doc.add(new StringField("type", "電腦", Store.NO));
            // doc.add(new SortedSetDocValuesField("type", new BytesRef("電腦")));
            // doc.add(new StringField("type", "膝上型電腦", Store.NO));
            // doc.add(new SortedSetDocValuesField("type", new
            // BytesRef("膝上型電腦")));

            // 商家 索引(不分詞)，儲存、按面（分類）查詢
            String fieldName = "shop";
            String value = "聯想官方旗艦店";
            doc.add(new StringField(fieldName, value, Store.YES));
            doc.add(new SortedDocValuesField(fieldName, new BytesRef(value)));

            // 上架時間：數值，排序需要
            long upShelfTime = System.currentTimeMillis();
            doc.add(new NumericDocValuesField("upShelfTime", upShelfTime));

            writer.addDocument(doc);

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static class MyIntField extends Field {

        public MyIntField(String fieldName, int value, FieldType type) {
            super(fieldName, type);
            this.fieldsData = Integer.valueOf(value);
        }

        @Override
        public BytesRef binaryValue() {
            byte[] bs = new byte[Integer.BYTES];
            NumericUtils.intToSortableBytes((Integer) this.fieldsData, bs, 0);
            return new BytesRef(bs);
        }
    }

}

三、索引更新

IndexWriter 索引更新 API

說明：

Term 詞項指定欄位的詞項

刪除流程：根據Term、Query找到相關的文件id、同時刪除索引資訊，再根據文件id刪除對應的文件儲存。

更新流程：先刪除、再加入新的doc

注意：只可根據索引的欄位進行更新。

package com.study.lucene.indexdetail;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import com.study.lucene.ikanalyzer.Integrated.IKAnalyzer4Lucene7;

/**
 * @Description: 索引更新
 * @author liguangsheng
 * @date 2018年5月11日
 *
 */

public class IndexUpdateDemo {

    public static void main(String[] args) {
        // 建立使用的分詞器
        Analyzer analyzer = new IKAnalyzer4Lucene7(true);

        // 索引配置物件
        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        try (
                // 索引存放目錄
                // 存放到檔案系統中
                Directory directory = FSDirectory.open((new File("f:/test/indextest")).toPath());

                // 存放到記憶體中
                // Directory directory = new RAMDirectory();

                // 建立索引寫物件
                IndexWriter writer = new IndexWriter(directory, config);) {

            // Term term = new Term("prodId", "p0001");
            Term term = new Term("type", "膝上型電腦");

            // 準備document
            Document doc = new Document();
            // 商品id：字串，不索引、但儲存
            String prodId = "p0003";
            FieldType onlyStoredType = new FieldType();
            onlyStoredType.setTokenized(false);
            onlyStoredType.setIndexOptions(IndexOptions.NONE);
            onlyStoredType.setStored(true);
            onlyStoredType.freeze();
            doc.add(new Field("prodId", prodId, onlyStoredType));

            writer.updateDocument(term, doc);

            // Term term = new Term("name", "膝上型電腦");
            // writer.deleteDocuments(term);

            writer.flush();

            writer.commit();
            System.out.println("執行更新完畢。");

        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

原始碼獲取地址：

https://github.com/leeSmall/SearchEngineDemo

Lucene索引詳解（IndexWriter詳解、Document詳解、索引更新）

一、IndexWriter詳解

二、Document詳解

三、索引更新

搜索引擎系列五：Lucene索引詳解（IndexWriter詳解、Document詳解、索引更新）

Lucene索引詳解（IndexWriter詳解、Document詳解、索引更新）

Array.prototype.slice.call()方法詳解（調用方法中的參數截取出來）

python3多線程應用詳解（第四卷：圖解多線程中LOCK）

LVS原理詳解（3種工作方式8種調度算法）--老男孩

volatile詳解（轉載只是為了查閱方便，侵權，立刪）

malloc和free函式詳解（轉載只是為了查閱方便，若侵權立刪）

C/C++堆、棧及靜態資料區詳解（轉載只是為了查閱方便，若侵權立刪）

深度學習 --- 隨機神經網路詳解（玻爾茲曼機學習演算法、執行演算法）

curl -l的資訊詳解（這樣還看不懂，就沒天理了）

SIFT演算法詳解（這篇對演算法講解的還是相當清楚的）

SPARQL查詢語句詳解（不懂的可以樓下評論，看到秒回）

Jenkins+Ant+Android+Robitium 例項詳解（打包app，執行Robotium測試，生成測試結果）

prim演算法基礎詳解（無向賦權圖的最小生成樹MST）

【CV學習筆記】———— 基本圖片處理知識（此坑還未填完，不定期更新）

2018年Java面試彙總（還有最近一些面試問題未總結，下次更新）

【CV學習筆記】———— 基本圖片處理知識（此坑還未填完，不定期更新）

RGB格式詳解（二）--索引格式

elasticsearch系列三：索引詳解（分詞器、文檔管理、路由詳解）

elasticsearch 索引儲存深入詳解（Elasticsearch教程03）|MVP講堂

Lucene索引詳解（IndexWriter詳解、Document詳解、索引更新）

一、IndexWriter詳解

二、Document詳解

三、索引更新

相關推薦