10. Querying the Index, Part 4: Advanced Search Techniques in Lucene
阿新 • Published: 2019-01-04
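The first advanced search technique to cover in Lucene is SpanTermQuery. It is used much like TermQuery; the only difference is that SpanTermQuery also provides span information for each match, that is, the start and end positions of the matched term within the field. Usage is shown in the testSpanTermQuery test, which searches the text field for the term "new".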
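To make "span information" concrete, here is a toy model in plain Java rather than Lucene's API. It assumes a hypothetical whitespace tokenizer that lowercases the text, and it records each occurrence of a term as a half-open position range [start, end), the same start/end convention Lucene's Spans use. The class TermSpans and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only, not Lucene's API: lists the spans a SpanTermQuery-style
// search would report for one term in one field value.
final class TermSpans {

    // A span covers the half-open position range [start, end).
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // Hypothetical stand-in for an Analyzer: lowercase, split on whitespace.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // Every occurrence of the term yields a one-position span [pos, pos + 1).
    static List<Span> spansOf(String text, String term) {
        String[] tokens = tokenize(text);
        List<Span> spans = new ArrayList<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) {
                spans.add(new Span(pos, pos + 1));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "there is a new QueryParser in contrib";
        // A TermQuery only says "this document matches"; spans also say where.
        System.out.println(spansOf(text, "new")); // prints [[3, 4)]
    }
}
```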
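@Test
public void testSpanTermQuery() throws Exception {
    Directory directory = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex"));
    // Create an IndexReader
    IndexReader indexReader = DirectoryReader.open(directory);
    // Create an IndexSearcher
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Same match as TermQuery on "new" in the text field, but the query also produces spans
    SpanQuery query = new SpanTermQuery(new Term("text", "new"));
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    System.out.println("Total hits: " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored Document for this hit
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("text"));
    }
    indexSearcher.getIndexReader().close();
}
SpanNearQuery matches the distance between two (or more) terms: it matches when their spans lie within a maximum distance of each other (slop), optionally in a fixed order (inOrder). The testSpanNearQuery test looks for "there" followed by "contrib" with slop 5.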
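The slop rule can be checked with a small toy model in plain Java (not Lucene's NearSpans code). It assumes each term occupies exactly one position and that the analyzer keeps every word of the sample sentence, so "there" is at position 0 and "contrib" at position 6. The class NearCheck is hypothetical.

```java
// Toy model, not Lucene's implementation: decides whether two single-term
// occurrences satisfy a SpanNearQuery-style (slop, inOrder) constraint.
final class NearCheck {

    // Positions strictly between the two terms, i.e. the gap slop must cover.
    static int gap(int firstPos, int secondPos) {
        return Math.abs(secondPos - firstPos) - 1;
    }

    static boolean near(int firstPos, int secondPos, int slop, boolean inOrder) {
        if (firstPos == secondPos) {
            return false; // toy simplification: same-position (synonym) matches are ignored
        }
        if (inOrder && secondPos < firstPos) {
            return false; // inOrder = true: the first clause must come first
        }
        return gap(firstPos, secondPos) <= slop;
    }

    public static void main(String[] args) {
        // "there is a new QueryParser in contrib": there = 0, contrib = 6, gap = 5
        for (int slop = 3; slop <= 6; slop++) {
            // prints false for slop 3 and 4, true for slop 5 and 6
            System.out.println("slop " + slop + ": " + near(0, 6, slop, true));
        }
    }
}
```

Reversing the positions shows inOrder at work: near(6, 0, 5, true) is false, while near(6, 0, 5, false) is true.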
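@Test
public void testSpanNearQuery() throws Exception {
    Directory directory = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex"));
    // Create an IndexReader
    IndexReader indexReader = DirectoryReader.open(directory);
    // Create an IndexSearcher
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    SpanQuery queryStart = new SpanTermQuery(new Term("text", "there"));
    SpanQuery queryEnd = new SpanTermQuery(new Term("text", "contrib"));
    /**
     * Source text: there is a new QueryParser in contrib, which matches the same syntax as this class, but is more modular, enabling substantial customization to how a query is created.
     * SpanNearQuery matches the distance between two terms, i.e. how many positions it takes to get from one
     * term to the other. slop is the distance factor that caps the maximum gap between them: two terms
     * thousands of positions apart should not count as "near", so slop sets a limit.
     * Note the inOrder parameter, which decides whether reversed matches are allowed. With inOrder = true the
     * order matters and the clauses must match left to right in the given order; with inOrder = false the
     * second term may also come before the first (right to left).
     * Stop words: a stop-word filter such as Lucene's StopFilter removes the words but keeps their position
     * increments, so the gaps they leave still count toward slop.
     *
     * slop, once more: its value is the size of the allowed gap. Five words ("is a new QueryParser in") lie
     * between "there" and "contrib", so slop = 4 finds no match; only 5 or greater matches.
     */
    SpanNearQuery query = new SpanNearQuery(new SpanQuery[]{queryStart, queryEnd}, 5, true);
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    System.out.println("Total hits: " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored Document for this hit
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("text"));
    }
    indexSearcher.getIndexReader().close();
}
SpanNotQuery: with SpanNearQuery there can be several ways to get from TermA to TermB, for example when TermA or TermB occurs more than once in the field. SpanNotQuery narrows this down for more precise control: it takes an include query and an exclude query and drops every include span that overlaps an exclude span, so it can, for instance, require that no TermC occurs between TermA and TermB. In testSpanNotQuery, "new" lies between "there" and "contrib" in the sample sentence, so that there ... contrib match should be excluded.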
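The exclusion rule can be sketched as a toy model in plain Java. This is not Lucene's SpanNotQuery implementation, which also supports pre/post distances that this sketch leaves out. Spans are half-open ranges [start, end), and the positions assume the analyzer keeps every word of the sample sentence. SpanNotToy is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene's code: keep the "include" spans that do not
// overlap any "exclude" span. Spans are half-open ranges [start, end).
final class SpanNotToy {

    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Two half-open ranges overlap when each one starts before the other ends.
    static boolean overlaps(Span a, Span b) {
        return a.start < b.end && b.start < a.end;
    }

    static List<Span> spanNot(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : include) {
            boolean clash = false;
            for (Span ex : exclude) {
                if (overlaps(candidate, ex)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "there is a new QueryParser in contrib": the there ... contrib near-span
        // is [0, 7) and "new" is [3, 4), so the only include span is dropped.
        List<Span> include = List.of(new Span(0, 7));
        List<Span> exclude = List.of(new Span(3, 4));
        System.out.println(spanNot(include, exclude).size()); // prints 0
    }
}
```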
@Test
public void testSpanNotQuery() throws Exception {
    Directory directory = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex2"));
    // Create an IndexReader
    IndexReader indexReader = DirectoryReader.open(directory);
    SpanQuery queryStart = new SpanTermQuery(new Term("text", "there"));
    SpanQuery queryEnd = new SpanTermQuery(new Term("text", "contrib"));
    SpanQuery excludeQuery = new SpanTermQuery(new Term("text", "new"));
    /**
     * Source text: there is a new QueryParser in contrib, which matches the same syntax as this class, but is more modular, enabling substantial customization to how a query is created.
     */
    SpanNearQuery spanquery = new SpanNearQuery(new SpanQuery[]{queryStart, queryEnd}, 5, true);
    // The first argument is the span query to include, the second the span query to exclude
    SpanQuery query = new SpanNotQuery(spanquery, excludeQuery);
    // Create an IndexSearcher
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    System.out.println("Total hits: " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored Document for this hit
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("text"));
    }
    indexSearcher.getIndexReader().close();
}
SpanOrQuery, as the name suggests, joins several SpanQuery clauses with OR. A BooleanQuery could be used instead, but SpanOrQuery also returns span (position) information, so it can in turn be nested inside other span queries such as SpanNearQuery. The testSpanOrQuery test combines a SpanNearQuery and a SpanTermQuery this way.
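What the extra span information buys can be seen in a toy model (plain Java, not Lucene's SpanOrQuery code): the OR result is still a list of positioned spans, merged in position order, rather than a bare yes/no per document. That is what allows an OR of span queries to be used as a clause of another span query. The positions assume the analyzer keeps every word of the sample sentence, and SpanOrToy is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model, not Lucene's code: the OR of span clauses is the union of their
// spans, ordered by position, so where each match occurred is preserved.
final class SpanOrToy {

    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    @SafeVarargs
    static List<Span> spanOr(List<Span>... clauses) {
        List<Span> all = new ArrayList<>();
        for (List<Span> clause : clauses) {
            all.addAll(clause);
        }
        // Sort by start, then by end, like one merged stream of positions
        all.sort(Comparator.comparingInt((Span s) -> s.start).thenComparingInt(s -> s.end));
        return all;
    }

    public static void main(String[] args) {
        // "there is a new QueryParser in contrib": there ... contrib = [0, 7), new = [3, 4)
        List<Span> nearSpans = List.of(new Span(0, 7));
        List<Span> newSpans = List.of(new Span(3, 4));
        System.out.println(spanOr(newSpans, nearSpans)); // prints [[0, 7), [3, 4)]
    }
}
```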
@Test
public void testSpanOrQuery() throws Exception {
    Directory directory = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex2"));
    // Create an IndexReader
    IndexReader indexReader = DirectoryReader.open(directory);
    SpanQuery queryStart = new SpanTermQuery(new Term("text", "there"));
    SpanQuery queryEnd = new SpanTermQuery(new Term("text", "contrib"));
    SpanQuery newQuery = new SpanTermQuery(new Term("text", "new"));
    /**
     * Source text: there is a new QueryParser in contrib, which matches the same syntax as this class, but is more modular, enabling substantial customization to how a query is created.
     * SpanOrQuery joins several SpanQuery clauses with OR. A BooleanQuery could be used instead,
     * but SpanOrQuery also returns span information.
     */
    SpanNearQuery spanquery = new SpanNearQuery(new SpanQuery[]{queryStart, queryEnd}, 5, true);
    // Matches wherever either clause matches: the there ... contrib near-span or the term "new"
    SpanOrQuery query = new SpanOrQuery(spanquery, newQuery);
    // Create an IndexSearcher
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    System.out.println("Total hits: " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored Document for this hit
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("text"));
    }
    indexSearcher.getIndexReader().close();
}
SpanPositionRangeQuery restricts matches to spans that lie within the position range from start to end, with positions counted from zero: a span qualifies when it begins at or after start and finishes by end (span end positions are exclusive, so a single term at position p qualifies when start <= p < end). Usage:
@Test
public void testSpanPositionRangeQuery() throws Exception {
    Directory directory = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex2"));
    // Create an IndexReader
    IndexReader indexReader = DirectoryReader.open(directory);
    FuzzyQuery fQuery = new FuzzyQuery(new Term("text", "conerib"));
    SpanQuery startEnd = new SpanMultiTermQueryWrapper<FuzzyQuery>(fQuery);
    /**
     * Source text: there is a new QueryParser in contrib, which matches the same syntax as this class, but is more modular, enabling substantial customization to how a query is created.
     * First, new FuzzyQuery(new Term("text", "conerib")) finds documents containing terms similar to
     * "conerib", such as "contrib".
     * Then SpanMultiTermQueryWrapper turns the FuzzyQuery into a SpanQuery, and SpanPositionRangeQuery
     * limits where those fuzzy matches may fall: words similar to "conerib" must lie between positions 3 and 10.
     */
    Query query = new SpanPositionRangeQuery(startEnd, 3, 10);
    // Create an IndexSearcher
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    System.out.println("Total hits: " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored Document for this hit
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("text"));
    }
    indexSearcher.getIndexReader().close();
}
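The two filters in testSpanPositionRangeQuery can be imitated in a toy model, plain Java rather than Lucene's code. It uses classic Levenshtein distance with a limit of 2 edits (FuzzyQuery's default maximum); FuzzyQuery additionally counts a transposition as one edit by default, which this sketch does not. The whitespace tokenizer, the shortened sample sentence, and the class FuzzyRangeToy are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model, not Lucene's code: a fuzzy term match followed by the
// SpanPositionRangeQuery(match, start, end) position filter.
final class FuzzyRangeToy {

    // Classic Levenshtein distance (insert, delete, substitute), two-row DP.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // A one-position span [pos, pos + 1) passes when start <= pos and pos + 1 <= end.
    static boolean inRange(int pos, int start, int end) {
        return pos >= start && pos + 1 <= end;
    }

    // Positions of tokens within maxEdits of the fuzzy term that also pass the range check.
    static List<Integer> match(String text, String fuzzyTerm, int maxEdits, int start, int end) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<Integer> hits = new ArrayList<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            if (editDistance(tokens[pos], fuzzyTerm) <= maxEdits && inRange(pos, start, end)) {
                hits.add(pos);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "there is a new QueryParser in contrib";
        // "conerib" -> "contrib" is one substitution; "contrib" sits at position 6
        System.out.println(match(text, "conerib", 2, 3, 10)); // prints [6]
        System.out.println(match(text, "conerib", 2, 0, 6));  // prints []
    }
}
```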
SpanFirstQuery only matches near the beginning of a field. Usage:
@Test
public void testSpanFirstQuery() throws Exception {
    Directory directory = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex2"));
    // Create an IndexReader
    IndexReader indexReader = DirectoryReader.open(directory);
    FuzzyQuery fQuery = new FuzzyQuery(new Term("text", "conerib"));
    SpanQuery startEnd = new SpanMultiTermQueryWrapper<FuzzyQuery>(fQuery);
    /**
     * Source text: there is a new QueryParser in contrib, which matches the same syntax as this class, but is more modular, enabling substantial customization to how a query is created.
     * Same principle as SpanPositionRangeQuery, just with one parameter fewer: its constructor
     * simply sets start to 0, so this matches spans that end by position 10.
     */
    Query query = new SpanFirstQuery(startEnd, 10);
    // Create an IndexSearcher
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    System.out.println("Total hits: " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored Document for this hit
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("text"));
    }
    indexSearcher.getIndexReader().close();
}
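Assuming the range rule of SpanPositionRangeQuery (a span [spanStart, spanEnd) qualifies when spanStart >= start and spanEnd <= end), SpanFirstQuery(match, end) amounts to the same check with start fixed at 0. The toy model below is plain Java, SpanFirstToy is a hypothetical name, and the positions assume the analyzer keeps every word of the sample sentence.

```java
// Toy model, not Lucene's code: SpanFirstQuery(match, end) applies the same
// test as SpanPositionRangeQuery(match, 0, end).
final class SpanFirstToy {

    // Range rule for a span [spanStart, spanEnd): it must begin at or after
    // start and finish by end (end positions are exclusive).
    static boolean inRange(int spanStart, int spanEnd, int start, int end) {
        return spanStart >= start && spanEnd <= end;
    }

    static boolean spanFirst(int spanStart, int spanEnd, int end) {
        return inRange(spanStart, spanEnd, 0, end);
    }

    public static void main(String[] args) {
        // "there is a new QueryParser in contrib": "contrib" is the span [6, 7),
        // and the there ... contrib near-span is [0, 7).
        System.out.println(spanFirst(6, 7, 10)); // prints true
        System.out.println(spanFirst(6, 7, 6));  // prints false
        // For a multi-term span the whole span has to fit, not just its start.
        System.out.println(spanFirst(0, 7, 5));  // prints false
    }
}
```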
FieldMaskingSpanQuery is used for span queries across fields: it wraps a span query on one field and reports it as if it were on another field, so the combined query looks as though it runs on a single field. This is needed because composite span queries such as SpanNearQuery and SpanOrQuery require all of their clauses to use the same field, so spans from different fields cannot be combined directly.
@Test
public void testFieldMaskingSpanQuery() throws Exception {
    Directory directory = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex2"));
    // Create an IndexReader
    IndexReader indexReader = DirectoryReader.open(directory);
    SpanQuery queryStart = new SpanTermQuery(new Term("text", "there"));
    SpanQuery queryEnd = new SpanTermQuery(new Term("text", "new"));
    // Report queryEnd as a query on "text". Both clauses already use "text" here, so the mask
    // changes nothing; in a real cross-field search queryEnd would target a different field.
    SpanQuery startEnd = new FieldMaskingSpanQuery(queryEnd, "text");
    /**
     * Source text: there is a new QueryParser in contrib, which matches the same syntax as this class, but is more modular, enabling substantial customization to how a query is created.
     * FieldMaskingSpanQuery lets a span query on one field stand in for a query on another field,
     * so span queries on different fields can be combined as if they shared one field.
     */
    Query query = new SpanNearQuery(new SpanQuery[]{queryStart, startEnd}, 5, false);
    // Create an IndexSearcher
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    System.out.println("Total hits: " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored Document for this hit
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("text"));
    }
    indexSearcher.getIndexReader().close();
}
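Because testFieldMaskingSpanQuery masks "text" as "text", it does not actually cross fields. A toy model in plain Java (not Lucene's API) shows the typical case: two parallel multi-valued fields, hypothetically named firstname and surname, where value i of each field sits at position i. The toy near check refuses clauses from different fields, as SpanNearQuery does, and masking relabels the surname span so that slop -1 (same position) pairs a first name with the surname of the same person. The field names, the one-token-per-value layout, and FieldMaskToy are assumptions.

```java
import java.util.List;

// Toy model, not Lucene's code: parallel multi-valued fields where value i
// of each field is assumed to be a single token at position i.
final class FieldMaskToy {

    // One term occurrence: the field it was indexed in and its position.
    static final class TermSpan {
        final String field;
        final int pos;

        TermSpan(String field, int pos) {
            this.field = field;
            this.pos = pos;
        }
    }

    // Hypothetical lookup: position of value in the field, or null if absent.
    static TermSpan find(String field, List<String> values, String value) {
        int pos = values.indexOf(value);
        return pos < 0 ? null : new TermSpan(field, pos);
    }

    // Unordered near check; like SpanNearQuery, clauses must share one field.
    // slop counts the positions strictly between the two terms.
    static boolean near(TermSpan a, TermSpan b, int slop) {
        if (!a.field.equals(b.field)) {
            throw new IllegalArgumentException("Clauses must have same field.");
        }
        return Math.abs(a.pos - b.pos) - 1 <= slop;
    }

    // Like FieldMaskingSpanQuery: same position, reported under another field name.
    static TermSpan mask(TermSpan span, String maskedField) {
        return new TermSpan(maskedField, span.pos);
    }

    public static void main(String[] args) {
        List<String> firstNames = List.of("james", "sally");
        List<String> surnames = List.of("smith", "jones");
        TermSpan james = find("firstname", firstNames, "james");
        TermSpan smith = mask(find("surname", surnames, "smith"), "firstname");
        TermSpan jones = mask(find("surname", surnames, "jones"), "firstname");
        // slop -1 demands the same position, i.e. the same person
        System.out.println(near(james, smith, -1)); // prints true
        System.out.println(near(james, jones, -1)); // prints false
    }
}
```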
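1.6 Disabling fuzzy and wildcard queries
To disable fuzzy queries, write a custom QueryParser subclass. The class below disables both fuzzy and wildcard queries by throwing a ParseException from the methods that would build them; likewise, to disable any other query type, just override the corresponding getXXXQuery method.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
/**
 * Created by kangz on 2016/12/15.
 */
public class CustomQueryParser extends QueryParser {
    public CustomQueryParser(String f, Analyzer a) {
        super(f, a);
    }
    // Called for terms such as contrib~ : reject them instead of building a FuzzyQuery
    @Override
    protected Query getFuzzyQuery(String field, String termStr, float minSimilarity) throws ParseException {
        throw new ParseException("Fuzzy queries not allowed!");
    }
    // Called for terms containing * or ? (a term with only a trailing * goes through getPrefixQuery instead)
    @Override
    protected Query getWildcardQuery(String field, String termStr) throws ParseException {
        throw new ParseException("Wildcard search has been disabled for performance reasons; please enter more specific search terms ^_^ ^_^");
    }
}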
1.7 Searching across multiple indexes
// Combined query over multiple indexes
@Test
public void testMultiReader() throws IOException {
    Directory directory1 = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex"));
    Directory directory2 = FSDirectory.open(Paths.get("D:\\LucentTest\\luceneIndex2"));
    IndexReader aIndexReader = DirectoryReader.open(directory1);
    IndexReader bIndexReader = DirectoryReader.open(directory2);
    // MultiReader presents both indexes as one logical index
    MultiReader multiReader = new MultiReader(aIndexReader, bIndexReader);
    IndexSearcher indexSearcher = new IndexSearcher(multiReader);
    // Match documents with a term in "text" that sorts between "a" and "z" (both bounds inclusive)
    TopDocs topDocs = indexSearcher.search(new TermRangeQuery("text", new BytesRef("a"), new BytesRef("z"), true, true), 10);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    for (ScoreDoc sd : scoreDocs) {
        System.out.println(indexSearcher.doc(sd.doc));
    }
    // Closing the MultiReader also closes both sub-readers
    multiReader.close();
}
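The sd.doc values returned through a MultiReader are global IDs: the sub-readers are laid end to end in constructor order, and each one's local IDs are shifted by the total maxDoc of the readers before it (its docBase). The toy model below, plain Java rather than Lucene's code (MultiReaderToy is a hypothetical name), computes those bases and maps a global ID back to a sub-reader and local ID.

```java
import java.util.Arrays;

// Toy model, not Lucene's code: how a MultiReader numbers documents.
final class MultiReaderToy {

    // maxDocs[i] is the document count of sub-reader i; returns each docBase.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int running = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = running;
            running += maxDocs[i];
        }
        return bases;
    }

    // Maps a global doc ID (as in ScoreDoc.doc) back to {readerIndex, localDocId}.
    static int[] resolve(int globalDoc, int[] maxDocs) {
        int[] bases = docBases(maxDocs);
        int total = 0;
        for (int n : maxDocs) {
            total += n;
        }
        if (globalDoc < 0 || globalDoc >= total) {
            throw new IllegalArgumentException("doc id out of range: " + globalDoc);
        }
        // Scan from the last reader so empty sub-readers (equal bases) are skipped.
        for (int i = bases.length - 1; i >= 0; i--) {
            if (globalDoc >= bases[i]) {
                return new int[] {i, globalDoc - bases[i]};
            }
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        int[] maxDocs = {3, 2}; // e.g. index A holds 3 docs, index B holds 2
        System.out.println(Arrays.toString(docBases(maxDocs)));   // prints [0, 3]
        System.out.println(Arrays.toString(resolve(3, maxDocs))); // prints [1, 0]
    }
}
```

indexSearcher.doc(sd.doc) already accepts the global ID, so testMultiReader needs no such mapping; the sketch only shows where the numbers come from.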