Near-Real-Time Search in Lucene with SearcherManager
Near-real-time (NRT) search makes content searchable before the IndexWriter has committed it.
How the index refresh works:
Only a commit on the IndexWriter fully syncs the in-memory data to files on disk.
IndexWriter also provides an API for obtaining a real-time reader. Calling it triggers a flush that produces a new segment, but no commit (fsync), which greatly reduces I/O. The new segment is included in the newly opened reader, so the returned reader sees the updates.
Therefore, as long as each new search obtains a fresh reader from the IndexWriter, it sees the latest content. The cost of this operation is only a flush, which is small compared to a commit.
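A minimal sketch of this pattern (assuming the Lucene 4.10-era API used by the code later in this article, and an in-memory RAMDirectory purely for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NrtReaderSketch {
    // Adds a document and searches it through an NRT reader, without ever committing.
    static int countNrtHits() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LATEST, new StandardAnalyzer()));
        Document doc = new Document();
        doc.add(new TextField("title", "hello nrt", Store.YES));
        writer.addDocument(doc); // sits in the writer's RAM buffer; nothing committed
        // Open a reader on the writer: this flushes a new segment but does no commit/fsync
        DirectoryReader reader = DirectoryReader.open(writer, true);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("title", "nrt")), 10);
        reader.close();
        writer.close();
        return hits.totalHits; // 1: the uncommitted document is visible
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countNrtHits());
    }
}
```

This is the mechanism SearcherManager builds on; in practice you acquire searchers from a SearcherManager rather than opening readers by hand.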
Lucene organizes an index as multiple segments within one index directory. New documents go into new segments, and these small segments are merged together periodically. Thanks to merging, the total number of segments stays small and overall search remains fast.
To avoid read/write conflicts, Lucene only ever creates new segments; an old segment is deleted only after every active reader has stopped using it.
A flush writes data into the operating system's buffers; as long as the buffers are not full, no disk operation occurs.
A commit writes everything in the in-memory buffers to disk. It is entirely disk-bound and therefore a heavyweight operation. This is because the central structure of a Lucene index, the postings (inverted lists), is stored tightly packed in VInt, delta-encoded form; merging requires a merge sort over the postings of each term, i.e. a read, merge, and rewrite process.
How SearcherManager near-real-time search works:
Lucene implements near-real-time search through the NRTManager class. "Near real time" means that when the index changes, a tracking thread makes the change visible to the calling application within a very short time. NRTManager wraps an IndexWriter and exposes its modifying methods (addDocument, deleteDocument, and so on) to the client. All of these operations happen in memory, so if you never call IndexWriter's commit method, the index on disk does not change. Remember to commit after each batch of updates so that the changes are also written to disk.
After an update, the caller can obtain the latest IndexSearcher in two ways:
The first uses an NRTManagerReopenThread. This thread tracks changes to the in-memory index and calls maybeReopen on each change, keeping the current index generation open in a fresh IndexSearcher. The caller obtains a SearcherManager via NRTManager's getSearcherManager method, acquires an IndexSearcher from it, and releases the IndexSearcher with SearcherManager's release method when done. Finally, remember to close the NRTManagerReopenThread.
The second way does not use an NRTManagerReopenThread: the caller invokes NRTManager's maybeReopen method directly to open an IndexSearcher over the latest index. (In Lucene 4.4+, NRTManager and NRTManagerReopenThread were replaced by TrackingIndexWriter, SearcherManager, and ControlledRealTimeReopenThread, which the code below uses.)
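The second approach can be sketched without any reopen thread. A minimal example (Lucene 4.10-era API, using an in-memory RAMDirectory for illustration) that calls SearcherManager.maybeRefresh, the modern equivalent of NRTManager's maybeReopen:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ManualRefreshSketch {
    // Updates the index, refreshes the SearcherManager by hand, then searches.
    static int searchAfterRefresh() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LATEST, new StandardAnalyzer()));
        SearcherManager sm = new SearcherManager(writer, true, null);

        Document doc = new Document();
        doc.add(new TextField("title", "manual refresh demo", Store.YES));
        writer.addDocument(doc); // in memory only, not committed

        sm.maybeRefresh(); // open a new NRT searcher over the flushed segment
        IndexSearcher searcher = sm.acquire();
        try {
            TopDocs hits = searcher.search(new TermQuery(new Term("title", "demo")), 10);
            return hits.totalHits; // 1: visible without a commit
        } finally {
            sm.release(searcher); // always release what you acquire
            sm.close();
            writer.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(searchAfterRefresh());
    }
}
```

The caller decides when to pay the refresh cost, instead of delegating that decision to a background thread as in the first approach.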
public void testSearch() throws IOException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    SearcherManager sm = new SearcherManager(directory, null);
    IndexSearcher searcher = sm.acquire();
    // Alternative: open a reader on the directory directly (sees only committed data)
    // IndexReader reader = DirectoryReader.open(directory);
    // IndexSearcher searcher = new IndexSearcher(reader);
    Query query = new TermQuery(new Term("title", "test"));
    TopDocs results = searcher.search(query, null, 100);
    System.out.println(results.totalHits);
    ScoreDoc[] docs = results.scoreDocs;
    for (ScoreDoc doc : docs) {
        System.out.println("doc internal id:" + doc.doc + " ,doc score:" + doc.score);
        Document document = searcher.doc(doc.doc);
        System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
    }
    sm.release(searcher);
    sm.close();
}
public void testUpdateAndSearch() throws IOException, InterruptedException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    config.setOpenMode(OpenMode.CREATE_OR_APPEND);
    IndexWriter writer = new IndexWriter(directory, config);
    TrackingIndexWriter trackingWriter = new TrackingIndexWriter(writer);
    SearcherManager sm = new SearcherManager(writer, true, null);
    // Refresh at most every 60s, or within 1s when a caller is waiting on a generation
    ControlledRealTimeReopenThread<IndexSearcher> thread =
            new ControlledRealTimeReopenThread<IndexSearcher>(trackingWriter, sm, 60, 1);
    thread.setDaemon(true);
    thread.setName("NRT Index Manager Thread");
    thread.start();
    Document doc = new Document();
    Field idField = new StringField("id", "3", Store.YES);
    Field titleField = new TextField("title", "test for 3", Store.YES);
    doc.add(idField);
    doc.add(titleField);
    long generation = trackingWriter.updateDocument(new Term("id", "2"), doc);
    // Block until the reopen thread has opened a searcher covering this update
    thread.waitForGeneration(generation);
    IndexSearcher searcher = sm.acquire();
    Query query = new TermQuery(new Term("title", "test"));
    TopDocs results = searcher.search(query, null, 100);
    System.out.println(results.totalHits);
    ScoreDoc[] docs = results.scoreDocs;
    for (ScoreDoc scoreDoc : docs) {
        System.out.println("doc internal id:" + scoreDoc.doc + " ,doc score:" + scoreDoc.score);
        Document document = searcher.doc(scoreDoc.doc);
        System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
    }
    sm.release(searcher);
    sm.close();
}
Building the index:
public void testBuildIndex() throws IOException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    // Directory directory = new RAMDirectory();
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    config.setOpenMode(OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(directory, config);
    Document doc1 = new Document();
    Field idField1 = new StringField("id", "1", Store.YES);
    Field titleField1 = new TextField("title", "test for 1", Store.YES);
    doc1.add(idField1);
    doc1.add(titleField1);
    writer.addDocument(doc1);
    Document doc2 = new Document();
    Field idField2 = new StringField("id", "2", Store.YES);
    Field titleField2 = new TextField("title", "test for 2", Store.YES);
    doc2.add(idField2);
    doc2.add(titleField2);
    writer.addDocument(doc2);
    writer.commit();
    writer.close();
}