Building a Search Engine Based on Lucene

阿新 • Published: 2019-02-05

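I. Fundamentals
1. Index concepts
Index creation: data -> tokenization -> index creation
Search process: obtain keywords -> tokenization -> look up the index -> return results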
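
The two pipelines above can be sketched in plain Java without Lucene. This is only a toy illustration under stated assumptions: the class name TinyInvertedIndex, the lowercase-and-split tokenizer, and the AND semantics for multi-word queries are all invented here and are not how Lucene stores or scores anything.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy class, not part of Lucene.
class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Tokenization: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index creation: data -> tokenization -> postings lists
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search: keywords -> tokenization -> index lookup -> results.
    // Assumption: a document must contain every query term (AND semantics).
    List<Integer> search(String keywords) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(keywords)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene builds an inverted index");
        index.add(2, "A search engine looks up the index");
        System.out.println(index.search("index"));        // prints [1, 2]
        System.out.println(index.search("search index")); // prints [2]
    }
}
```

The same tokenizer is used for both indexing and searching, which is why the keywords can be matched against the stored terms at all.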
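2. Mathematical model of the index
Term weighting: every term in a document is assigned a weight
Vector space model: each term corresponds to one dimension of a space, so a document becomes a vector of its term weights
Retrieval: the keywords are placed in the same space as a query vector, and matching amounts to measuring the angle between the query vector and each document vector (a smaller angle means a closer match)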
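
The angle comparison can be made concrete with a small plain-Java sketch. It assumes raw term frequency as the weight and whitespace tokenization, both chosen here purely for illustration; the class name CosineSketch is hypothetical, and Lucene's real scoring formula is more involved than this.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy class, not part of Lucene.
class CosineSketch {
    // Assumption: the weight of a term is simply how often it occurs in the text.
    static Map<String, Double> weights(String text) {
        Map<String, Double> w = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                w.merge(t, 1.0, Double::sum);
            }
        }
        return w;
    }

    // Euclidean length of a sparse term-weight vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two vectors: 1.0 means same direction,
    // 0.0 means the texts share no terms.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<String, Double> query = weights("lucene index");
        Map<String, Double> doc1 = weights("lucene index lucene search");
        Map<String, Double> doc2 = weights("cooking recipes");
        System.out.println(cosine(query, doc1)); // about 0.866
        System.out.println(cosine(query, doc2)); // prints 0.0
    }
}
```

Ranking documents by descending cosine is equivalent to ranking them by ascending angle to the query vector.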
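3. Lucene's index file structure
II. Using Lucene
1. Creating an index
Define the analyzer
Decide where the index files will be stored
Create an IndexWriter to write the index files
Extract the content and store it in the index
2. Retrieving documents by keyword
Open the storage location
Create the searcher
Run the keyword query
Close the reader and other resources
The example given in the official documentation (the calls shown match the Lucene 4.x API):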

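    // Imports needed by this fragment (Lucene 4.x package names, JUnit for assertEquals):
    import static org.junit.Assert.assertEquals;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    // The statements below belong inside a test method.
    // Define the analyzer used for both indexing and query parsing:
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead:
    //Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    // Build one document with a single stored, tokenized field and index it:
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
    Query query = parser.parse("text");
    // No filter (null), return at most 1000 hits:
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    ireader.close();
    directory.close();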

3. Analyzer implementations (different tokenization strategies)
CJKAnalyzer, KeywordAnalyzer, SimpleAnalyzer, StopAnalyzer, WhitespaceAnalyzer, StandardAnalyzer, IKAnalyzer
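
To give a feel for how a few of these differ, here is a rough plain-Java approximation of four of them. It does not call Lucene: the class AnalyzerSketch and its methods are hypothetical, the stop-word set is a tiny illustrative subset, and the real analyzers handle many more cases. The sketch only mirrors the general behaviour: KeywordAnalyzer keeps the whole input as one token, WhitespaceAnalyzer splits on whitespace, SimpleAnalyzer splits on non-letters and lowercases, and StopAnalyzer additionally drops stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy class that imitates analyzer behaviour; it is not the Lucene API.
class AnalyzerSketch {
    // Assumption: a small illustrative stop-word set, not Lucene's full list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("this", "is", "the", "to", "be"));

    // Like KeywordAnalyzer: the entire input is a single token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        return nonEmpty(text.trim().split("\\s+"));
    }

    // Like SimpleAnalyzer: split at non-letters and lowercase.
    static List<String> simple(String text) {
        return nonEmpty(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"));
    }

    // Like StopAnalyzer: SimpleAnalyzer behaviour plus stop-word removal.
    static List<String> stop(String text) {
        List<String> out = new ArrayList<>(simple(text));
        out.removeAll(STOP_WORDS);
        return out;
    }

    // Drops the empty strings that String.split can produce.
    static List<String> nonEmpty(String[] parts) {
        List<String> out = new ArrayList<>();
        for (String p : parts) {
            if (!p.isEmpty()) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "This is the text to be indexed.";
        System.out.println(keyword(text));    // prints [This is the text to be indexed.]
        System.out.println(whitespace(text)); // prints [This, is, the, text, to, be, indexed.]
        System.out.println(simple(text));     // prints [this, is, the, text, to, be, indexed]
        System.out.println(stop(text));       // prints [text, indexed]
    }
}
```

Whichever analyzer is chosen, the same one should normally be used at index time and at query time, otherwise the query terms may not match the stored terms.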
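4. Query types and query parsers
QueryParser, MultiFieldQueryParser, TermQuery, PrefixQuery, PhraseQuery, WildcardQuery, TermRangeQuery, NumericRangeQuery, BooleanQuery
A few other classes that are used when searching:
Collector: mainly used to collect search results and to apply custom sorting, filtering, and so on
Filter: expresses filtering conditions that determine which documents may appear in the search results
Sort: specifies the sort order in the search call, comparable to ORDER BY in a database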
