[ES] Search

Search API

This article is a translation of the official documentation.

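Building a basic search request

1. Create a SearchRequest. Without arguments, it runs against all indices.
2. Most search parameters are added to a SearchSourceBuilder, which accepts queries built with QueryBuilders.
3. Add a match_all query to the SearchSourceBuilder.
4. Add the SearchSourceBuilder to the SearchRequest.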

SearchRequest searchRequest = new SearchRequest(); 
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); 
searchSourceBuilder.query(QueryBuilders.matchAllQuery()); 
searchRequest.source(searchSourceBuilder); 

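SearchRequest also has some optional arguments:

// Restrict the request to the "posts" index
SearchRequest searchRequest = new SearchRequest("posts");
// Set a routing parameter
searchRequest.routing("routing");
// IndicesOptions controls how unavailable indices are resolved and how wildcard expressions are expanded
searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());
// Set the preference, e.g. prefer executing the search on local shards; the default is to randomize across shards
searchRequest.preference("_local");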

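Using the SearchSourceBuilder

Most options controlling the search behavior can be set on the SearchSourceBuilder, which contains more or less the equivalent of the options in the search request body of the REST API. Here are a few common options:

// Create a SearchSourceBuilder with default options
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// Set the query; it can be any type of QueryBuilder
sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy"));
// Set the offset of the first hit to return; defaults to 0
sourceBuilder.from(0);
// Set the number of hits to return; defaults to 10
sourceBuilder.size(5);
// Set an optional timeout that bounds how long the search may run
sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

After this, the SearchSourceBuilder only needs to be added to the SearchRequest: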

SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("posts");
searchRequest.source(sourceBuilder);

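Building queries

Search queries are created using QueryBuilder objects. A QueryBuilder exists for every query type supported by Elasticsearch's Query DSL.

A QueryBuilder can be created using its constructor:

// Create a full-text match query that matches the text "kimchy" against the field "user"
MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("user", "kimchy");

// Once created, the QueryBuilder object provides methods to configure the query options
// Enable fuzzy matching on the match query
matchQueryBuilder.fuzziness(Fuzziness.AUTO);
// Set the prefix length option on the match query
matchQueryBuilder.prefixLength(3);
// Set the max expansions option to control the fuzzy process of the query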
matchQueryBuilder.maxExpansions(10); 
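
As a rough illustration of what Fuzziness.AUTO means for a single term, the sketch below uses plain Java with no Elasticsearch classes. It assumes the default AUTO thresholds (terms shorter than 3 characters must match exactly, terms of 3 to 5 characters allow one edit, longer terms allow two). The class and method names are made up for this example.

```java
// Illustrative only: mirrors the default AUTO fuzziness rule, it is not Elasticsearch's own code.
class AutoFuzzinessSketch {

    // Maximum number of edits allowed for a term, assuming the default AUTO thresholds (3 and 6).
    static int maxEdits(String term) {
        int length = term.codePointCount(0, term.length());
        if (length < 3) {
            return 0; // very short terms must match exactly
        }
        return length < 6 ? 1 : 2;
    }

    public static void main(String[] args) {
        // "kimchy" has 6 characters, so up to two edits are tolerated
        System.out.println(maxEdits("kimchy"));
    }
}
```

Under this rule the six-character term "kimchy" tolerates up to two edits, while a two-character term must match exactly.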

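QueryBuilder objects can also be created with the QueryBuilders utility class, which provides helper methods for building queries in a fluent programming style: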

QueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("user", "kimchy")
                                                .fuzziness(Fuzziness.AUTO)
                                                .prefixLength(3)
                                                .maxExpansions(10);

Whichever of the two ways is used to create it, the QueryBuilder object must be added to the SearchSourceBuilder as follows:

searchSourceBuilder.query(matchQueryBuilder);

The Building Queries page lists all available query QueryBuilder objects and their corresponding QueryBuilders helper methods.

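Specifying sorting

The SearchSourceBuilder allows adding one or more SortBuilder instances. There are four concrete implementations: FieldSortBuilder, ScoreSortBuilder, GeoDistanceSortBuilder and ScriptSortBuilder.

// Sort descending by _score (the default)
sourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
// Also sort ascending by the "id" field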
sourceBuilder.sort(new FieldSortBuilder("id").order(SortOrder.ASC)); 
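
When several sorts are added, they apply in the order they were added: the first is the primary key and later ones break ties. The plain-Java sketch below mimics the two sorts above on an in-memory list. The Hit class is a hypothetical stand-in for illustration, not the Elasticsearch SearchHit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortOrderSketch {

    // Hypothetical stand-in for a search hit: just an id and a score.
    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Primary key: score, descending. Tie-breaker: id, ascending.
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        Comparator<Hit> byScoreDesc = Comparator.comparingDouble((Hit h) -> h.score).reversed();
        sorted.sort(byScoreDesc.thenComparing((Hit h) -> h.id));
        List<String> ids = new ArrayList<>();
        for (Hit hit : sorted) {
            ids.add(hit.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("b", 1.0f), new Hit("a", 1.0f), new Hit("c", 2.0f));
        System.out.println(sortedIds(hits)); // prints [c, a, b]
    }
}
```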

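Source filtering

_source field

By default, search requests return the full contents of the document _source, but this behavior can be overridden. For example, you can turn off _source retrieval for a request completely. Note that this only controls what the response returns; it is not the same as disabling the _source field in the mapping, which is not recommended for the reasons given in the _source field reference above.

sourceBuilder.fetchSource(false);

The fetchSource method also accepts arrays of one or more wildcard patterns to control in a more fine-grained way which fields are included or excluded: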

String[] includeFields = new String[] {"title", "innerObject.*"};
String[] excludeFields = new String[] {"user"};
sourceBuilder.fetchSource(includeFields, excludeFields);
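
To make the include and exclude patterns concrete, here is a simplified plain-Java sketch of the decision for one flattened field path. It is only an approximation under stated assumptions: `*` matches any run of characters, a field must match some include pattern (if any are given), and excludes win over includes. Real source filtering works on the nested document structure and has additional rules. All names here are invented for the example.

```java
import java.util.regex.Pattern;

class SourceFilterSketch {

    // Treats '*' as "any run of characters" and everything else as a literal.
    static boolean matches(String pattern, String fieldPath) {
        String regex = "\\Q" + pattern.replace("*", "\\E.*\\Q") + "\\E";
        return Pattern.matches(regex, fieldPath);
    }

    static boolean matchesAny(String[] patterns, String fieldPath) {
        for (String pattern : patterns) {
            if (matches(pattern, fieldPath)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: a field is kept when it is included (or no includes exist) and not excluded.
    static boolean isReturned(String fieldPath, String[] includes, String[] excludes) {
        boolean included = includes.length == 0 || matchesAny(includes, fieldPath);
        return included && !matchesAny(excludes, fieldPath);
    }

    public static void main(String[] args) {
        String[] includes = {"title", "innerObject.*"};
        String[] excludes = {"user"};
        System.out.println(isReturned("innerObject.name", includes, excludes));
        System.out.println(isReturned("user", includes, excludes));
    }
}
```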

Highlighting search results

Search results can be highlighted by setting a HighlightBuilder on the SearchSourceBuilder. Different highlighting behavior can be defined for each field by adding one or more HighlightBuilder.Field instances to the HighlightBuilder.

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HighlightBuilder highlightBuilder = new HighlightBuilder();
// Create a field highlighter for the title field
HighlightBuilder.Field highlightTitle = new HighlightBuilder.Field("title");
// Set the field highlighter type
highlightTitle.highlighterType("unified");
// Add the field highlighter to the HighlightBuilder
highlightBuilder.field(highlightTitle);
HighlightBuilder.Field highlightUser = new HighlightBuilder.Field("user");
highlightBuilder.field(highlightUser);
// Set the HighlightBuilder on the SearchSourceBuilder
searchSourceBuilder.highlighter(highlightBuilder);

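See the REST API documentation for a detailed description of the available options.

The highlighted text fragments can later be retrieved from the SearchResponse.

A practical example

    // Run a paginated search and highlight the matches in the title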
    public List<Map<String, Object>> searchPageHighlightBuilder(String keyword, int pageNo, int pageSize)
        throws IOException {
        if (pageNo <= 1) {
            pageNo = 1;
        }

        keyword = URLDecoder.decode(keyword, "UTF-8");

        // Build the search request
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

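        // Pagination: from() takes the zero-based offset of the first hit, so convert the 1-based page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);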
        searchSourceBuilder.size(pageSize);

        // Exact match: a term query does not analyze the keyword
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
        searchSourceBuilder.query(termQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(true); // only highlight fields that the query matched
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        searchSourceBuilder.highlighter(highlightBuilder);

        // Execute the search
        searchRequest.source(searchSourceBuilder);
        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit documentFields : search.getHits().getHits()) {

            // Read the highlighted fields of this hit
            Map<String, HighlightField> highlightFields = documentFields.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();
            if (title != null) {
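                // Concatenate the highlighted fragments and replace the original title with them
                StringBuilder highlightedTitle = new StringBuilder();
                for (Text text : title.fragments()) {
                    highlightedTitle.append(text);
                }
                sourceAsMap.put("title", highlightedTitle.toString());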
            }
            list.add(sourceAsMap);
        }
        return list;

    }
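
The sample above takes a page number, while from() expects the zero-based offset of the first hit to return. A minimal sketch of that conversion, assuming pages are numbered from 1 as the pageNo check suggests (the helper name is hypothetical):

```java
class PaginationSketch {

    // Converts a 1-based page number into the zero-based hit offset expected by from().
    static int fromOffset(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1); // treat page numbers below 1 as the first page
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        // Page 3 with 10 hits per page starts at offset 20
        System.out.println(fromOffset(3, 10));
    }
}
```

With this helper, the first page starts at offset 0, so no hits are skipped.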

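Requesting Aggregations

Aggregations can be added to a search by first creating the appropriate AggregationBuilder and then setting it on the SearchSourceBuilder.

The following example creates a terms aggregation on company names with a sub-aggregation on the average age of the employees in each company: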

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_company").field("company.keyword");
aggregation.subAggregation(AggregationBuilders.avg("average_age").field("age"));
searchSourceBuilder.aggregation(aggregation);

The Building Aggregations page lists all available aggregations with their corresponding AggregationBuilder objects and AggregationBuilders helper methods.

How to access the aggregations in the SearchResponse is covered later.
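
To show what the terms aggregation with an avg sub-aggregation computes, the following plain-Java sketch performs the same grouping on an in-memory list. It only illustrates the semantics (one bucket per company, one average per bucket) and says nothing about how Elasticsearch executes it. The Employee class and its data are invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AverageAgeByCompanySketch {

    // Hypothetical stand-in for an indexed document with "company" and "age" fields.
    static final class Employee {
        final String company;
        final int age;

        Employee(String company, int age) {
            this.company = company;
            this.age = age;
        }

        String company() {
            return company;
        }

        int age() {
            return age;
        }
    }

    // One entry per company (the "bucket"), holding the average age of its employees.
    static Map<String, Double> averageAgeByCompany(List<Employee> employees) {
        return employees.stream().collect(
                Collectors.groupingBy(Employee::company, TreeMap::new,
                        Collectors.averagingInt(Employee::age)));
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("Elastic", 30), new Employee("Elastic", 40), new Employee("Acme", 25));
        System.out.println(averageAgeByCompany(employees));
    }
}
```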

Requesting Suggestions

To add suggestions to a search request, use one of the SuggestionBuilder implementations, which are easily accessible from the SuggestBuilders factory class. Suggestion builders are added to a top-level SuggestBuilder, which is itself set on the SearchSourceBuilder.

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Create a new TermSuggestionBuilder for the user field and the text kmichy
SuggestionBuilder termSuggestionBuilder = SuggestBuilders.termSuggestion("user").text("kmichy");
SuggestBuilder suggestBuilder = new SuggestBuilder();
// Add the suggestion builder and name it suggest_user
suggestBuilder.addSuggestion("suggest_user", termSuggestionBuilder); 
searchSourceBuilder.suggest(suggestBuilder);

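How to retrieve suggestions from the SearchResponse is covered later.

Profiling Queries and Aggregations

The Profile API can be used to profile the execution of queries and aggregations for a specific search request. To use it, the profile flag must be set to true on the SearchSourceBuilder: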

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.profile(true);

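Once the SearchRequest is executed, the corresponding SearchResponse will contain the profiling results.

Synchronous execution

When executing a SearchRequest synchronously, the client waits for the SearchResponse to be returned before continuing with the code that follows: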

SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

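Synchronous calls on the high-level REST client may throw an IOException.

Asynchronous execution

A SearchRequest can also be executed asynchronously. The caller passes a listener to the asynchronous method to specify how the response or a potential failure is handled.

client.searchAsync(searchRequest, RequestOptions.DEFAULT, listener);

The ActionListener passed here is called when the execution of the SearchRequest completes.

The asynchronous method does not block. Once the call completes, the ActionListener is called back: onResponse is invoked if the execution succeeded, and onFailure if it failed. A typical listener for search looks like: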

ActionListener<SearchResponse> listener = new ActionListener<SearchResponse>() {
    @Override
    public void onResponse(SearchResponse searchResponse) {
        
    }

    @Override
    public void onFailure(Exception e) {
        
    }
};
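
One common way to consume such a callback is to bridge it to a CompletableFuture. The sketch below shows the idea in plain Java. The Listener interface is a hypothetical stand-in that only mirrors the two callbacks of ActionListener, so the block compiles without the Elasticsearch client.

```java
import java.util.concurrent.CompletableFuture;

class ListenerSketch {

    // Hypothetical stand-in for ActionListener<T>: the same two callbacks.
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    // Returns a listener that completes the given future with the response or the failure.
    static <T> Listener<T> completing(CompletableFuture<T> future) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        };
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        Listener<String> listener = completing(future);
        // In real code the client would invoke the listener; here it is called by hand.
        listener.onResponse("search finished");
        System.out.println(future.join());
    }
}
```

The caller can then chain further work on the future instead of nesting logic inside the callbacks.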

SearchResponse

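The SearchResponse returned by executing the search provides details about the search execution itself as well as access to the documents returned. First, it carries information about the request execution, such as the HTTP status code, the execution time, or whether the request terminated early or timed out: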

RestStatus status = searchResponse.status();
TimeValue took = searchResponse.getTook();
Boolean terminatedEarly = searchResponse.isTerminatedEarly();
boolean timedOut = searchResponse.isTimedOut();

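Second, the response provides information about the execution at the shard level, with statistics on the total number of shards involved in the search and on the successful and failed ones. Possible failures can be handled by iterating over the array of ShardSearchFailure objects, as in the following example: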

int totalShards = searchResponse.getTotalShards();
int successfulShards = searchResponse.getSuccessfulShards();
int failedShards = searchResponse.getFailedShards();
for (ShardSearchFailure failure : searchResponse.getShardFailures()) {
    // failures should be handled here
}

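Retrieving SearchHits

To get access to the returned documents, we first need to get the SearchHits contained in the response:

SearchHits hits = searchResponse.getHits();

The SearchHits provides global information about all hits, such as the total number of hits or the maximum score: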

TotalHits totalHits = hits.getTotalHits();
// the total number of hits, must be interpreted in the context of totalHits.relation
long numHits = totalHits.value;
// whether the number of hits is accurate (EQUAL_TO) or a lower bound of the total (GREATER_THAN_OR_EQUAL_TO)
TotalHits.Relation relation = totalHits.relation;
float maxScore = hits.getMaxScore();
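
Because the hit count may be only a lower bound, code that displays it should look at the relation as well as the value. A minimal plain-Java sketch of that check follows. The Relation enum is a stand-in that mirrors the two constants named in the comments above, and the wording of the labels is arbitrary.

```java
class TotalHitsLabelSketch {

    // Stand-in mirroring the two relation constants mentioned above.
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    // Renders the count as exact or as a lower bound, depending on the relation.
    static String describe(long value, Relation relation) {
        if (relation == Relation.EQUAL_TO) {
            return value + " hits";
        }
        return "at least " + value + " hits";
    }

    public static void main(String[] args) {
        System.out.println(describe(42, Relation.EQUAL_TO));
        System.out.println(describe(10000, Relation.GREATER_THAN_OR_EQUAL_TO));
    }
}
```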

Nested inside the SearchHits are the individual search results, which can be iterated over:

SearchHit[] searchHits = hits.getHits();
for (SearchHit hit : searchHits) {
    // do something with the SearchHit
}

The SearchHit provides access to basic information such as the index, the document ID and the score of each search hit:

String index = hit.getIndex();
String id = hit.getId();
float score = hit.getScore();

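In addition, the document source can be retrieved either as a JSON string or as a map of key/value pairs. In the map, regular fields are keyed by the field name and contain the field value. Multi-valued fields are returned as lists of objects, and nested objects as another key/value map. These values need to be cast according to the actual case: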

String sourceAsString = hit.getSourceAsString();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String documentTitle = (String) sourceAsMap.get("title");
List<Object> users = (List<Object>) sourceAsMap.get("user");
Map<String, Object> innerObject = (Map<String, Object>) sourceAsMap.get("innerObject");
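
The direct casts above throw a ClassCastException when a document stores a single value where a list is expected, since the map simply reflects the JSON that was indexed. A defensive variant might check the runtime type first. This is a sketch in plain Java, with a hand-built map standing in for the result of getSourceAsMap(); the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SourceMapSketch {

    // Returns the "user" values as a list, whether the document stored one value, many, or none.
    static List<Object> users(Map<String, Object> source) {
        Object value = source.get("user");
        if (value == null) {
            return Collections.emptyList();
        }
        if (value instanceof List) {
            return new ArrayList<>((List<?>) value);
        }
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        // Hand-built map standing in for hit.getSourceAsMap()
        Map<String, Object> source = new HashMap<>();
        source.put("title", "Search basics");
        source.put("user", "kimchy"); // a single value, not a list
        System.out.println(users(source));
    }
}
```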

Retrieving Highlighting

If highlighting was requested, the highlighted text fragments can be retrieved from each SearchHit in the result:

SearchHits hits = searchResponse.getHits();
for (SearchHit hit : hits.getHits()) {
    Map<String, HighlightField> highlightFields = hit.getHighlightFields();
    // Get the highlighting for the title field
    HighlightField highlight = highlightFields.get("title"); 
    // Get one or many fragments containing the highlighted field content
    Text[] fragments = highlight.fragments();  
    String fragmentString = fragments[0].string();
}

Retrieving Aggregations

Aggregations can be retrieved from the SearchResponse by first getting the root of the aggregation tree, the Aggregations object, and then getting an aggregation by name:

Aggregations aggregations = searchResponse.getAggregations();
// Get the by_company terms aggregation
Terms byCompanyAggregation = aggregations.get("by_company"); 
// Get the bucket that is keyed with Elastic
Bucket elasticBucket = byCompanyAggregation.getBucketByKey("Elastic"); 
// Get the average_age sub-aggregation from that bucket
Avg averageAge = elasticBucket.getAggregations().get("average_age"); 
double avg = averageAge.getValue();

Note that if you access aggregations by name, you need to specify the aggregation interface that matches the type of aggregation you requested; otherwise a ClassCastException is thrown:

// This will throw an exception because "by_company" is a terms aggregation but we try to retrieve it as a range aggregation
Range range = aggregations.get("by_company"); 
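
The compiler cannot catch this mistake because the target type is chosen by the caller. The plain-Java sketch below reproduces the effect with a tiny name-to-object lookup, assuming the lookup is a generic method with an unchecked cast. The class is invented for illustration and is not the Aggregations class.

```java
import java.util.HashMap;
import java.util.Map;

class TypedLookupSketch {

    private final Map<String, Object> byName = new HashMap<>();

    void put(String name, Object value) {
        byName.put(name, value);
    }

    // The cast to T is unchecked: it is erased here and only verified at the call site.
    @SuppressWarnings("unchecked")
    <T> T get(String name) {
        return (T) byName.get(name);
    }

    public static void main(String[] args) {
        TypedLookupSketch results = new TypedLookupSketch();
        results.put("by_company", "a terms result");

        String ok = results.get("by_company"); // matches the stored type
        System.out.println(ok);

        try {
            Integer wrong = results.get("by_company"); // compiles, but fails when assigned
            System.out.println(wrong);
        } catch (ClassCastException e) {
            System.out.println("wrong type requested");
        }
    }
}
```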

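It is also possible to access all aggregations as a map keyed by the aggregation name. In this case, the cast to the proper aggregation interface needs to be done explicitly: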

Map<String, Aggregation> aggregationMap = aggregations.getAsMap();
Terms companyAggregation = (Terms) aggregationMap.get("by_company");

There are also getters that return all top-level aggregations as a list:

List<Aggregation> aggregationList = aggregations.asList();

Finally, you can iterate over all aggregations:

for (Aggregation agg : aggregations) {
    String type = agg.getType();
    if (type.equals(TermsAggregationBuilder.NAME)) {
        Bucket elasticBucket = ((Terms) agg).getBucketByKey("Elastic");
        long numberOfDocs = elasticBucket.getDocCount();
    }
}

Retrieving Suggestions

To get the suggestions back from a SearchResponse, use the Suggest object as the entry point:

// Use the Suggest class to access suggestions
Suggest suggest = searchResponse.getSuggest(); 
// Suggestions can be retrieved by name. You need to assign them to the correct type of Suggestion class (here TermSuggestion), otherwise a ClassCastException is thrown
TermSuggestion termSuggestion = suggest.getSuggestion("suggest_user"); 
// Iterate over the suggestion entries
for (TermSuggestion.Entry entry : termSuggestion.getEntries()) { 
    // Iterate over the options in one entry
    for (TermSuggestion.Entry.Option option : entry) { 
        String suggestText = option.getText().string();
    }
}

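Retrieving Profiling Results

Profiling results are retrieved from a SearchResponse using the getProfileResults() method. This method returns a Map containing a ProfileShardResult object for every shard involved in the search execution. Each ProfileShardResult is stored under a key that uniquely identifies the shard it belongs to.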

// Retrieve the Map of ProfileShardResult from the SearchResponse
Map<String, ProfileShardResult> profilingResults = searchResponse.getProfileResults(); 
// Profiling results can be retrieved by shard’s key if the key is known, otherwise it might be simpler to iterate over all the profiling results
for (Map.Entry<String, ProfileShardResult> profilingResult : profilingResults.entrySet()) { 
    // Retrieve the key that identifies which shard the ProfileShardResult belongs to
    String key = profilingResult.getKey(); 
    // Retrieve the ProfileShardResult for the given shard
    ProfileShardResult profileShardResult = profilingResult.getValue(); 
}

The ProfileShardResult object itself contains one or more query profile results:

// Retrieve the list of QueryProfileShardResult
List<QueryProfileShardResult> queryProfileShardResults =
        profileShardResult.getQueryProfileResults(); 
// Iterate over each QueryProfileShardResult
for (QueryProfileShardResult queryProfileResult : queryProfileShardResults) { 

}

Each QueryProfileShardResult gives access to a list of ProfileResult objects:

// Iterate over the profile results
for (ProfileResult profileResult : queryProfileResult.getQueryResults()) {
    // Retrieve the name of the Lucene query
    String queryName = profileResult.getQueryName(); 
    // Retrieve the time in millis spent executing the Lucene query
    long queryTimeInMillis = profileResult.getTime(); 
    // Retrieve the profile results for the sub-queries (if any)
    List<ProfileResult> profiledChildren = profileResult.getProfiledChildren(); 
}
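
Since every profile result can carry child results for its sub-queries, the whole tree is usually walked recursively. The sketch below shows such a depth-first walk in plain Java. Node is a hypothetical stand-in holding only the three pieces of information read above (name, time, children), and the sample values are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProfileTreeSketch {

    // Hypothetical stand-in for a profile result: a name, a time, and child nodes.
    static final class Node {
        final String name;
        final long time;
        final List<Node> children;

        Node(String name, long time, Node... children) {
            this.name = name;
            this.time = time;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first walk that renders one line per node, indented by its depth in the tree.
    static List<String> flatten(Node node, int depth, List<String> out) {
        out.add("  ".repeat(depth) + node.name + " " + node.time);
        for (Node child : node.children) {
            flatten(child, depth + 1, out);
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("BooleanQuery", 30, new Node("TermQuery", 10), new Node("TermQuery", 15));
        for (String line : flatten(root, 0, new ArrayList<>())) {
            System.out.println(line);
        }
    }
}
```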

The QueryProfileShardResult also gives access to the profiling information for the Lucene collectors:

// Retrieve the profiling result of the Lucene collector
CollectorResult collectorResult = queryProfileResult.getCollectorResult();  
// Retrieve the name of the Lucene collector
String collectorName = collectorResult.getName();  
// Retrieve the time in millis spent executing the Lucene collector
Long collectorTimeInMillis = collectorResult.getTime(); 
// Retrieve the profile results for the sub-collectors (if any)
List<CollectorResult> profiledChildren = collectorResult.getProfiledChildren(); 

The ProfileShardResult also gives access to the detailed execution information of the aggregation tree:

// Retrieve the AggregationProfileShardResult
AggregationProfileShardResult aggsProfileResults =
        profileShardResult.getAggregationProfileResults(); 
// Iterate over the aggregation profile results
for (ProfileResult profileResult : aggsProfileResults.getProfileResults()) { 
    // Retrieve the type of the aggregation (corresponds to Java class used to execute the aggregation)
    String aggName = profileResult.getQueryName(); 
    // Retrieve the time in millis spent executing the Lucene collector
    long aggTimeInMillis = profileResult.getTime(); 
    // Retrieve the profile results for the sub-aggregations (if any)
    List<ProfileResult> profiledChildren = profileResult.getProfiledChildren(); 
}