Atitit Lucene usage summary

阿新 • Published: 2018-12-07

1. Basic concepts

1.1. Index: the index store; a collection of documents makes up an index.

2. Building the index

2.1. API query

2.2. DSL query

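Basic concepts

1. Index: the index store; a collection of documents makes up an index.

Unlike a typical database, Lucene does not support defining a primary key. Lucene has no class named Index: an index is written through IndexWriter and read through IndexReader. Physically, an index is usually a set of files under a single directory.

2. Analyzer: meaningful text has to be split into individual words by an Analyzer before it can be searched by keyword. StandardAnalyzer is the most commonly used analyzer in Lucene. For better search results, different languages can use different analyzers (for example, CnAnalyzer is an analyzer aimed mainly at Chinese).

3. An Analyzer returns a stream of Tokens. Each Token holds a string with the text of the word, the start and end offsets of that word in the source text, and a string that records the token type.

4. A Document represents one record in the index. Information to be searched is wrapped in a Document and written to the index through IndexWriter; a keyword search through the Searcher interface likewise returns a list of Documents.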
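To make item 3 concrete, here is a minimal sketch in plain Java of what a token carries: the word text, its start and end offsets, and a type string. ToyToken and ToyTokenizer are hypothetical names invented for this illustration. They are not Lucene classes, and a real Analyzer does considerably more (stop words, language-specific segmentation).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-in for what an analyzer emits; NOT Lucene's token API.
class ToyToken {
    final String text;      // the word itself, lowercased
    final int startOffset;  // index of the first character in the source text
    final int endOffset;    // index one past the last character
    final String type;      // a label for the kind of token

    ToyToken(String text, int startOffset, int endOffset, String type) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.type = type;
    }
}

class ToyTokenizer {
    // Splits on every character that is not a letter or digit.
    static List<ToyToken> tokenize(String input) {
        List<ToyToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (!Character.isLetterOrDigit(input.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            boolean allDigits = true;
            while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                if (!Character.isDigit(input.charAt(i))) {
                    allDigits = false;
                }
                i++;
            }
            String text = input.substring(start, i).toLowerCase(Locale.ROOT);
            tokens.add(new ToyToken(text, start, i, allDigits ? "NUM" : "WORD"));
        }
        return tokens;
    }
}
```

Calling ToyTokenizer.tokenize("Lucene 8 rocks") yields three tokens; the offsets let a caller map each word back to its position in the original text, for example to highlight hits.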

---------------------

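5. A Document can contain multiple columns, called Fields. For example, an article can have Fields such as "title", "body", and "last modified". Once these field objects are created, they are added through Document's add method. Unlike a typical database, one field of a document can hold several values: an article can belong to both the Internet category and the technology category.

6. A Term is the smallest unit of search syntax; a complex search expression is broken down into individual Term queries. A Term represents one word of a document and has two parts: the word it stands for and the Field in which that word occurs.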
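Items 5 and 6 can be illustrated together with a toy in-memory index, sketched below under simplifying assumptions. ToyIndex is a hypothetical class written for this example, not part of Lucene: a document is modelled as a map from field name to a list of values (so a field can be multi-valued), and every posting is keyed by the pair of field name and word, which is what a Term identifies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of Document, Field, and Term; NOT Lucene's classes.
class ToyIndex {
    // Postings: "field:word" -> ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Adds one document; each field may carry one or more values.
    int addDocument(Map<String, List<String>> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            for (String value : field.getValue()) {
                // Whitespace splitting stands in for a real analyzer here.
                for (String word : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (!word.isEmpty()) {
                        postings.computeIfAbsent(field.getKey() + ":" + word,
                                k -> new TreeSet<>()).add(docId);
                    }
                }
            }
        }
        return docId;
    }

    // Looks up a term, identified by its field and its word.
    SortedSet<Integer> lookup(String field, String word) {
        return postings.getOrDefault(field + ":" + word, new TreeSet<>());
    }
}
```

With a document whose "category" field holds both "internet" and "technology", lookup("category", "internet") and lookup("category", "technology") both return that document, while lookup("title", "technology") does not, because the same word in a different field is a different term.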

---------------------

D:\0workspace\FulltxtLucenePrj

/FulltxtLucenePrj/src/fulltxt/luceneUtil.java

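Building the index

// indexDir, f (a java.io.File) and t (the text of that file) come from the enclosing method.
Directory directoryIndex = FSDirectory.open(Paths.get(indexDir));

// Create the index writer; IKAnalyzer is a third-party Chinese analyzer.
IndexWriter indexWriter = new IndexWriter(directoryIndex, new IndexWriterConfig(new IKAnalyzer()));

Document doc = new Document();
// File name and full path are stored so they can be read back from search hits.
doc.add(new TextField("f", f.getName(), org.apache.lucene.document.Field.Store.YES));
doc.add(new TextField("f_fullpath", f.getAbsolutePath(), org.apache.lucene.document.Field.Store.YES));
// The body text is indexed for search but not stored.
doc.add(new TextField("txt", t, org.apache.lucene.document.Field.Store.NO));

String r = String.valueOf(indexWriter.addDocument(doc));
// Call indexWriter.commit() or indexWriter.close() when done so the changes are persisted.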

1. API query

// Open a searcher over the index directory (the body of getIndexSearcher1).
Directory dir = FSDirectory.open(Paths.get(indexDir));
IndexReader reader = DirectoryReader.open(dir);
return new IndexSearcher(reader);

// Example expression: String expressStr = "+txt:webdav +txt:ftp";
public void Search(IndexSearcher searcher, String expressStr, Consumer<Document> consumer)
        throws ParseException, IOException {
    int count = 500; // maximum number of hits to return
    String fieldName = "txt"; // default field for clauses without a field prefix
    // Note: the index above was built with IKAnalyzer; using the same analyzer
    // at query time keeps tokenization consistent, which matters for Chinese text.
    QueryParser parser = new QueryParser(fieldName, new SimpleAnalyzer());
    Query query = parser.parse(expressStr);
    TopDocs topDocs = searcher.search(query, count);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        Document document = searcher.doc(scoreDoc.doc);
        consumer.accept(document);
    }
}
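In Lucene's query syntax a leading + marks a clause as required, so "+txt:webdav +txt:ftp" only matches documents whose txt field contains both words. The plain-Java sketch below illustrates that rule on its own. RequiredClauses is a hypothetical helper written for this example; it handles only whitespace-separated +field:word clauses and is no substitute for QueryParser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration of '+field:word' semantics; NOT Lucene's QueryParser.
class RequiredClauses {
    // Parses '+field:word' clauses into {field, word} pairs.
    static List<String[]> parse(String expression) {
        List<String[]> clauses = new ArrayList<>();
        for (String part : expression.trim().split("\\s+")) {
            if (!part.startsWith("+") || !part.contains(":")) {
                throw new IllegalArgumentException("unsupported clause: " + part);
            }
            int colon = part.indexOf(':');
            clauses.add(new String[] { part.substring(1, colon), part.substring(colon + 1) });
        }
        return clauses;
    }

    // A document matches only if every required clause finds its word
    // among the words of the named field.
    static boolean matches(Map<String, Set<String>> docTerms, String expression) {
        for (String[] clause : parse(expression)) {
            Set<String> words = docTerms.getOrDefault(clause[0], Collections.emptySet());
            if (!words.contains(clause[1])) {
                return false;
            }
        }
        return true;
    }
}
```

A document whose txt words include both "webdav" and "ftp" matches the example expression; one that contains only "webdav" does not.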

2. DSL query

public void testSearch() throws Exception {
    luceneUtil util = new luceneUtil();
    System.out.println(JSON.toJSONString(util.Search()));

    String indexDir = "./articles522/";
    IndexSearcher searcher = util.getIndexSearcher1(indexDir);

    // select/from/where/exec, $ and fld are the project's own fluent helpers, not Lucene API.
    List li = util.select("f,f_fullpath,txt").from(searcher)
            .where($(fld("txt").contain("webdav")).and(fld("txt").contain("資料庫")).build())
            .exec();
    System.out.println(JSON.toJSONString(li));
}
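The source does not show how $, fld, contain, and, and build are implemented. One plausible design, sketched here purely as an assumption, is for the fluent calls to assemble the same kind of query string that the API query section passes to QueryParser. FieldRef and Where are hypothetical names, and this sketch does not escape query-syntax special characters.

```java
// Hypothetical sketch of a fluent condition builder; the project's real
// helpers in luceneUtil.java may work differently.
class FieldRef {
    private final String name;

    FieldRef(String name) {
        this.name = name;
    }

    // Renders a required clause such as "+txt:webdav".
    String contain(String word) {
        return "+" + name + ":" + word;
    }
}

class Where {
    private final StringBuilder expr = new StringBuilder();

    private Where(String firstClause) {
        expr.append(firstClause);
    }

    static FieldRef fld(String name) {
        return new FieldRef(name);
    }

    // Starts a condition chain with its first clause.
    static Where $(String firstClause) {
        return new Where(firstClause);
    }

    // Appends another required clause, separated by a space.
    Where and(String clause) {
        expr.append(' ').append(clause);
        return this;
    }

    // Returns the assembled query string.
    String build() {
        return expr.toString();
    }
}
```

Under these assumptions, Where.$(Where.fld("txt").contain("webdav")).and(Where.fld("txt").contain("ftp")).build() produces the string "+txt:webdav +txt:ftp", which could then be handed to a query parser.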
