Lucene 3.6.0: A Classic Introductory Example

阿新 • Published: 2019-01-30

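Step 1: Download and import the Lucene core JARs (mind the version):
For Lucene 3.6, copy lucene-core-3.6.0.jar into the project's libs folder.
For Lucene 4.6, copy lucene-analyzers-common-4.6.0.jar, lucene-core-4.6.0.jar, and lucene-queryparser-4.6.0.jar into the project's libs folder.
Then right-click the project name, choose Build Path > Configure Build Path… > Add External JARs…, select the JARs, and save.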
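
A quick way to confirm that the JARs really ended up on the build path is to try loading a class from each of them. The sketch below is only an illustration: the helper class ClasspathCheck and its method isOnClasspath are hypothetical names, not part of Lucene or of this tutorial. The two class names it probes are the ones imported by the indexing code.

```java
// Hypothetical helper, not part of the tutorial: reports whether a class can be loaded.
final class ClasspathCheck {

    /** Returns true if the named class is visible on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.lucene.index.IndexWriter",                  // shipped in lucene-core
            "org.apache.lucene.analysis.standard.StandardAnalyzer"  // lucene-core in 3.6, lucene-analyzers-common in 4.6
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If either line prints MISSING, the corresponding JAR has probably not been added under Add External JARs….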

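Step 2: Create the folders:
On drive C, create a folder for the files to be indexed (C:\source), and in it create two files, for example test1.txt and test2.txt.
Content of test1.txt: 歡迎來到絕對秋香的部落格。
Content of test2.txt: 絕對秋香引領你走向潮流。
Also on drive C, create a folder to hold the index (C:\index). Save the two text files in GBK encoding, because the indexing code below reads them as GBK.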
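
The folders and sample files can also be created from code, which makes the GBK encoding explicit. This is a minimal sketch under assumptions: the class name SampleFiles and the base-directory parameter are hypothetical, and it assumes the JDK in use ships the GBK charset (standard full JDKs do).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the tutorial: creates <base>/source with two
// GBK-encoded sample files and an empty <base>/index folder.
final class SampleFiles {

    static void create(Path base) throws IOException {
        Path source = Files.createDirectories(base.resolve("source"));
        Files.createDirectories(base.resolve("index"));
        Charset gbk = Charset.forName("GBK");   // must match the charset the indexer reads with
        Files.write(source.resolve("test1.txt"), "歡迎來到絕對秋香的部落格。".getBytes(gbk));
        Files.write(source.resolve("test2.txt"), "絕對秋香引領你走向潮流。".getBytes(gbk));
    }

    public static void main(String[] args) throws IOException {
        // Default to drive C as in the tutorial; pass another base directory as the first argument.
        Path base = Paths.get(args.length > 0 ? args[0] : "C:\\");
        create(base);
        System.out.println("Created sample files under " + base.toAbsolutePath());
    }
}
```

Run without arguments it targets C:\source and C:\index. Passing a different base directory is handy for trying the example without touching drive C.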

Step 3: Create the indexing class FileIndexWriter

/* Indexing class FileIndexWriter.
 *
 * Prerequisites: the Lucene JARs from step 1 are on the build path, and the
 * folders from step 2 exist (the source folder with test1.txt and test2.txt,
 * and the index folder, both on drive C).
 * The search side is the test class FileIndexReader in step 4.
 */
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class FileIndexWriter {
	public static void main(String[] args) throws Exception {
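        /* Folder containing the files to index; adjust as needed. Here it is the source folder on drive C. */
        File fileDir = new File("C:\\source");
        /* Folder where the index files are stored. */
        File indexDir = new File("C:\\index");
    // 1. Create the Directory
        Directory dir = FSDirectory.open(indexDir); // where the index is stored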
        
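   // 2. Create the Analyzer
        // Use the standard analyzer. The argument states which Lucene version's behavior to match.
        // Version.LUCENE_CURRENT works but is deprecated; a pinned constant such as
        // Version.LUCENE_36 (for the 3.6.0 JAR) is generally preferred.
        // StandardAnalyzer does the following:
        // 1. splits text into tokens at word boundaries (each Chinese character becomes its own token);
        // 2. converts all uppercase letters to lowercase;
        // 3. removes common stop words such as "is", "the", and "are", and drops punctuation.
        Analyzer luceneAnalyzer=new StandardAnalyzer(Version.LUCENE_CURRENT);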

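   // 3. Create the IndexWriter
        // IndexWriter is the core indexing component. Since Lucene 3.1 it is configured through IndexWriterConfig.
        // The first argument is the Lucene version, the second is the Analyzer.
        IndexWriterConfig iwc = new IndexWriterConfig(
        			Version.LUCENE_CURRENT, luceneAnalyzer); 
        // setOpenMode controls whether the index folder is overwritten or appended to:
        // APPEND: always append to an existing index; this fails if no index exists, and re-running
        //         the indexer adds the same documents again, so searches return duplicate hits;
        // CREATE: discard any existing index and rebuild it (recommended here);
        // CREATE_OR_APPEND (the default): create the index if it is missing, otherwise append.
        iwc.setOpenMode(OpenMode.CREATE);
        IndexWriter indexWriter = new IndexWriter(dir,iwc);  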
        
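        File[] textFiles = fileDir.listFiles();    
        if (textFiles == null) {
            // listFiles() returns null when the path is missing or is not a directory
            throw new IOException("Not a readable directory: " + fileDir);
        }
        long startTime = new Date().getTime();    // used to time the indexing run
        
        // Add each .txt file to the index as a Document
        for (int i=0; i<textFiles.length; i++) {    
            if (textFiles[i].isFile()&&textFiles[i].getName().endsWith(".txt")){    
                System.out.println("File "+textFiles[i].getCanonicalPath()+"正在被索引..");    // "... is being indexed.."
                // The sample files are assumed to be saved in GBK encoding
                String temp = FileReaderAll(textFiles[i].getCanonicalPath(), "GBK");    
                System.out.println(temp);    			// print the file's content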
   // 4. Create the Document object
                // A Document is a collection of Fields, comparable to a row in a database table.
                // It is the unit of indexing: any file to be indexed must first be converted into a Document.
                Document document = new Document();	
                // Field: one named part of a document, such as its creation time, author, or content.
                // A Field has a type, a name, a value (fieldsData), and a boost factor.
                // Field.Store.YES: store the value so it can be shown in search results, e.g. file paths and URLs.
                // Field.Index.NO: do not index the value, so it cannot be searched; for stored-only fields such as a file path.
                // Field.Index.ANALYZED: run the value through the analyzer and index it, e.g. the body and subject of an email.
                Field FieldPath = new Field("path", textFiles[i].getPath(),
                	Field.Store.YES, Field.Index.NO);   // path
                Field FieldBody = new Field("content", temp, Field.Store.YES,
                	Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS); // content
   // 5. Add the Fields to the Document
                document.add(FieldPath);    
                document.add(FieldBody);  
   // 6. Add the Document to the index through the IndexWriter
                indexWriter.addDocument(document);		// queue the document for indexing
            }    
        }    
        indexWriter.close();    // close the IndexWriter, committing the indexed content
            
        // Report how long indexing took
        long endTime = new Date().getTime();    
        System.out.println("建立目錄"+ fileDir.getPath()+" 下所有文件的索引，總共消耗 "
        		+(endTime-startTime) + " 毫秒!" );    
    }    
    
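    // Reads the whole file into a String using the given charset; lines are joined with '\n'.
    public static String FileReaderAll(String FileName, String charset)
    				throws IOException{    
        BufferedReader reader = new BufferedReader(
        	new InputStreamReader(new FileInputStream(FileName), charset));
        StringBuilder temp = new StringBuilder();
        try {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    temp.append('\n');   // keep words on adjacent lines from running together
                }
                temp.append(line);
                first = false;
            }
        } finally {
            reader.close();   // close the file even if reading fails
        }
        return temp.toString();
    }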

}

Output:

1. FileIndexWriter
File C:\source\test1.txt正在被索引....歡迎來到絕對秋香的部落格。

File C:\source\test2.txt正在被索引....絕對秋香引領你走向潮流。

建立目錄C:\source 下所有文件的索引，總共消耗 465 毫秒!
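
The comments in FileIndexWriter list what StandardAnalyzer does to text before it is indexed: tokenizing, lowercasing, and dropping stop words and punctuation. The plain-Java sketch below imitates those steps so the effect is easy to see. It is a simplified illustration, not Lucene's implementation: the class ToyAnalyzer and its short stop-word list are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy analyzer: mimics the three steps described for StandardAnalyzer.
final class ToyAnalyzer {
    // A tiny illustrative stop-word list; the list Lucene uses is longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("is", "the", "are", "a", "an", "of"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (Character.isLetterOrDigit(cp) && !han) {
                word.appendCodePoint(Character.toLowerCase(cp));  // extend the current word, lowercased
                continue;
            }
            flush(word, tokens);  // whitespace, punctuation, or a Han character ends the current word
            if (han) {
                tokens.add(new String(Character.toChars(cp)));    // each Han character is its own token
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word unless it is empty or a stop word, then resets the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            String w = word.toString();
            if (!STOP_WORDS.contains(w)) {
                tokens.add(w);
            }
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Index is READY."));
        System.out.println(analyze("絕對秋香引領你走向潮流。"));
    }
}
```

Because every Chinese character becomes a separate token, a query such as 絕對秋香 is matched character by character rather than as one word.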

Step 4: Create the test class FileIndexReader and print the search results

package com.newlearner.lucene;
/* Test class FileIndexReader: runs a search against the index and prints the results.
 * */
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
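// These two imports match Lucene 4.x (lucene-queryparser JAR). With lucene-core-3.6.0.jar, use
// org.apache.lucene.queryParser.ParseException and org.apache.lucene.queryParser.QueryParser instead.
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;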
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class FileIndexReader {
	public static void main(String[] args) throws IOException, ParseException{
        // Path of the index to search. Lucene can keep an index in two places:
        // on disk (FSDirectory) or in memory (RAMDirectory). Normally the index lives on disk.
    	String index = "C:\\index";
   // 1. Create the Directory; the IndexSearcher is built from an IndexReader opened on it
        Directory directory = FSDirectory.open(new File(index));
   // 2. Create the Analyzer; it must be the same analyzer that was used to build the index
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
   // 3. Create the IndexReader
        IndexReader reader = IndexReader.open(directory);  // open the stored index
        IndexSearcher searcher = new IndexSearcher(reader);    
        
        String queryString = "絕對秋香";   // the search keywords
        try { 
   // 4. Create the Query that defines what to search for
        	// QueryParser parses the search request:
        	// new QueryParser(version, field, analyzer); the field is the default field to search.
            QueryParser qp = new QueryParser(Version.LUCENE_CURRENT,"content",analyzer);
            // parse() interprets the keywords and returns a Query object.
            // The query String must be wrapped in a Query before the searcher can run it.
            // Lucene supports fuzzy, phrase, and combined (boolean) queries, among others, through
            // classes such as TermQuery, BooleanQuery, PhraseQuery, FuzzyQuery, TermRangeQuery, and WildcardQuery.
            Query query = qp.parse(queryString);
        
	        if (searcher != null) {
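	// 5. Search with the searcher and get back TopDocs
	        	// The Hits class from older versions was deprecated and then removed; TopDocs replaces it
                    // and holds the results that best match the query.
	            TopDocs results = searcher.search(query,10);    // the 10 highest-scoring hits
	            // Lucene scores every matching document during the search and ranks the results by score;
	            // each ScoreDoc holds one hit's document id and its score.
	// 6. Get the ScoreDoc array from the TopDocs
	            ScoreDoc[] hits = results.scoreDocs; 	// the search results
	            // Print the search results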
	            if (hits.length > 0) {    
	                System.out.println("找到:" + hits.length + " 個結果!");    
	                for (int i=0; i<hits.length; i++) {
	// 7. Use the searcher and the ScoreDoc to fetch the actual Document
		                Document hitDoc = searcher.doc(hits[i].doc);
		                System.out.println("____________________________");
	// 8. Read the stored field values from the Document
		                System.out.println(hitDoc.get("path"));
		                System.out.println(hitDoc.get("content"));
		                System.out.println("____________________________\n");
		            }
	            }
	            reader.close();
	            directory.close();
	        }   
        } catch (ParseException e) {
        	e.printStackTrace();
        }
    }    
}

2. FileIndexReader
找到:2 個結果!
____________________________
C:\source\test1.txt
歡迎來到絕對秋香的部落格。
____________________________

____________________________
C:\source\test2.txt
絕對秋香引領你走向潮流。
____________________________
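
Both files are returned because each contains every character of the query 絕對秋香. The idea behind this, an inverted index that maps each term to the documents containing it, can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions: the class ToyIndex is hypothetical, it treats every letter as one term, and it requires all query characters to be present. Lucene's real query semantics and scoring are richer and are not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index: one term per letter, no scoring.
final class ToyIndex {
    private final List<String> docs = new ArrayList<>();                  // stored content, like Field.Store.YES
    private final Map<Integer, Set<Integer>> postings = new HashMap<>();  // code point -> ids of docs containing it

    /** Stores the content and indexes each of its letters; returns the new document id. */
    int add(String content) {
        int id = docs.size();
        docs.add(content);
        content.codePoints()
               .filter(Character::isLetter)   // punctuation is not indexed
               .forEach(cp -> postings.computeIfAbsent(cp, k -> new TreeSet<>()).add(id));
        return id;
    }

    /** Returns the ids of documents that contain every letter of the query. */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (int cp : query.codePoints().filter(Character::isLetter).toArray()) {
            Set<Integer> ids = postings.getOrDefault(cp, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);   // first term: start from its posting list
            } else {
                result.retainAll(ids);         // later terms: intersect
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    String get(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("歡迎來到絕對秋香的部落格。");
        index.add("絕對秋香引領你走向潮流。");
        for (int id : index.search("絕對秋香")) {
            System.out.println(id + ": " + index.get(id));
        }
    }
}
```

Searching the toy index for 潮流 instead would return only the second document, since the first one contains neither character.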
