ES Reference: Query DSL

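Elasticsearch (ES) was built for search; its main job is to process queries and return results.

ES provides a JSON-based query DSL made up of two kinds of clauses:
1. Leaf query clauses. Look for a particular value in a particular field, as the match, term, and range queries do.
2. Compound query clauses. Wrap other leaf or compound queries, either to combine multiple queries logically (such as the bool or dis_max query) or to alter their behavior (such as the constant_score query).

Query clauses behave differently depending on whether they are used in query context or filter context.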

Query and filter context

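Query context
Answers the question “How well does this document match this query clause?”
Besides deciding whether the document matches, the clause also calculates a relevance score (_score).

In effect whenever a query clause is passed to a query parameter, such as the query parameter of the search API.

Filter context
Answers the question “Does this document match this query clause?”
The answer is a plain yes or no; no score is calculated. Mostly used to filter structured data, for example whether a timestamp falls within a given range.

To improve performance, ES automatically caches frequently used filters.

In effect whenever a query clause is passed to a filter parameter, such as:

  1. the filter or must_not parameters in the bool query,
  2. the filter parameter in the constant_score query,
  3. the filter aggregation.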
Examples of each case (the numbered comments refer to the list above), followed by a query that mixes both contexts:

POST _search
{
  "query": {
    "bool": {
      "filter": {      // 1. filter parameter of bool
        "term": { "tag": "tech" }
      },
      "must_not": {    // 1. must_not parameter of bool
        "range": {
          "age": { "from": 10, "to": 20 }
        }
      }
    }
  }
}

GET /_search
{
  "query": {
    "constant_score": {   // 2. filter parameter of constant_score
      "filter": {
        "term": { "user": "kimchy" }
      },
      "boost": 1.2
    }
  }
}

{
  "aggs": {
    "red_products": {
      "filter": { "term": { "color": "red" } },   // 3. filter aggregation
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}

GET /_search
{
  "query": {                // query context
    "bool": {
      "must": [             // bool and its two must clauses: query context
        { "match": { "title": "Search" }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [           // filter context: term and range are used in filter context
        { "term": { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

Note: clauses in query context affect the score, while clauses in filter context do not.
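
As a rough illustration of the note above, the sketch below walks the body of a bool query and sorts its clauses by the context they run in. It is plain Python over a dict, not a call into any Elasticsearch client, and the helper name split_contexts is made up for this example.

```python
# Which bool parameters run in which context.
SCORING_PARAMS = ("must", "should")       # query context: contribute to _score
FILTER_PARAMS = ("filter", "must_not")    # filter context: yes/no only


def split_contexts(bool_body):
    """Return (scoring_clauses, filter_clauses) for the body of a bool query."""
    def clauses(param):
        value = bool_body.get(param, [])
        # A parameter may hold a single clause or a list of clauses.
        return value if isinstance(value, list) else [value]

    scoring = [c for p in SCORING_PARAMS for c in clauses(p)]
    filtering = [c for p in FILTER_PARAMS for c in clauses(p)]
    return scoring, filtering


query = {
    "bool": {
        "must": [
            {"match": {"title": "Search"}},
            {"match": {"content": "Elasticsearch"}},
        ],
        "filter": [
            {"term": {"status": "published"}},
            {"range": {"publish_date": {"gte": "2015-01-01"}}},
        ],
    }
}

scoring, filtering = split_contexts(query["bool"])
print(len(scoring), "scoring clauses,", len(filtering), "filter clauses")
```

Only the two match clauses end up in the scoring list; the term and range clauses decide which documents are returned but leave _score untouched.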

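Search types

query_then_fetch:
First, the query runs on all shards to collect the information needed to sort and rank the documents. Then the actual document content is fetched only from the relevant shards. At most size results are returned. This is the default search type.
query_and_fetch:
The query runs on all shards in parallel, and every shard returns size results, so up to size times the number of shards documents can come back.
dfs_query_and_fetch:
Like query_and_fetch, but an initial phase computes distributed term frequencies so that the returned documents are scored more accurately and the results are more relevant.
dfs_query_then_fetch:
Like query_then_fetch, but an initial phase computes distributed term frequencies so that the returned documents are scored more accurately and the results are more relevant.
count --> size=0
A special search that returns only the number of documents matching the query; superseded by setting size=0.
scan --> scroll:
The response to the first request carries a scroll identifier, similar to a cursor in a database; superseded by scroll.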
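
The two phases of query_then_fetch can be sketched with an in-memory stand-in for a sharded index. This is only a simulation of the idea described above, not how ES is implemented; the shards dict, the scores, and the function name are all invented for the example.

```python
import heapq

# Hypothetical data: each shard maps doc id -> (score, source).
shards = {
    "shard0": {"a": (0.9, {"title": "A"}), "b": (0.2, {"title": "B"})},
    "shard1": {"c": (0.7, {"title": "C"}), "d": (0.5, {"title": "D"})},
    "shard2": {"e": (0.1, {"title": "E"})},
}


def query_then_fetch(shards, size, from_=0):
    # Query phase: every shard returns only (score, shard, id) for its
    # top from_ + size documents, not the documents themselves.
    candidates = []
    for name, docs in shards.items():
        local = [(score, name, doc_id) for doc_id, (score, _) in docs.items()]
        candidates.extend(heapq.nlargest(from_ + size, local))

    # The coordinating node merges the per-shard lists and keeps one page.
    page = heapq.nlargest(from_ + size, candidates)[from_:from_ + size]

    # Fetch phase: document content is requested only from the shards
    # that own a document on the final page.
    hits = [
        {"_id": doc_id, "_score": score, "_source": shards[name][doc_id][1]}
        for score, name, doc_id in page
    ]
    shards_fetched = sorted({name for _, name, _ in page})
    return hits, shards_fetched


hits, shards_fetched = query_then_fetch(shards, size=2)
print([h["_id"] for h in hits], shards_fetched)
```

With size=2 the page holds documents a and c, so shard2 takes part in the query phase but is never asked for document content.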

Match All Query

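The simplest query: it matches all documents and gives each a _score of 1.0.

GET /_search
{
    "query": {
        "match_all": {}   // the inverse, which matches nothing, is match_none
    }
}
To change the score, set a boost inside match_all: "boost" : 1.2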

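The response contains:

  • took: how many milliseconds the search took
  • timed_out: whether the search timed out; if it did, the results are partial or empty
  • _shards: shard status
    • total: the number of shards the search ran on (one copy of each shard, primary or replica)
    • successful: the number of shards on which the search succeeded
    • failed: the number of shards on which the search failed because of an error or exception
  • hits: the search results

    • total: the total number of documents matching the query
    • max_score: the highest score of any matching document; usually 1 when no match scoring was computed
    • hits: the list of results

    The most common fields in each result document:

    • _index: the index the document is in
    • _type: the type the document is in
    • _id: the document id
    • _source: the document source; returned by default, can be disabled
    • _score: the query score of the document
    • sort: the values used to sort, if the documents are sorted
    • highlight: the highlighted segments, if highlighting was requested
    • fields: some fields can be retrieved without fetching the whole source object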

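Three ways to search, and the search parameters

http://<server>/_search
Searches all indices and types.
http://<server>/<index_name(s)>/_search
Searches one or more indices, comma-separated.
http://<server>/<index_name(s)>/<type_name(s)>/_search
Searches one or more indices and one or more types, comma-separated.

In all of the above, an alias can be used wherever an index name is expected.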

The core query is usually contained in the body of the GET/POST call, but a lot of options can
also be expressed as URI query parameters, as follows:

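  • q: performs a simple string query
    …_search?q=field_name:field_value
  • df: the default field
    …_search?df=field_name&q=field_value
  • from (default 0): the start index of the hits.
  • size (default 10): the number of hits to be returned. This is the total for the whole response, not a per-shard count.
    from + size ≤ index.max_result_window (default 10,000)

    To serve a page, every shard returns its top from + size hits, and the coordinating node merges them and keeps only the last size. With ?size=10&from=10000, each shard would have to return 10,010 hits so that 10 survive the merge; that exceeds the default index.max_result_window, so the request is rejected unless the limit is raised.

  • analyzer: the default analyzer to be used.

  • default_operator (default: or)
  • explain: returns information on how the score is calculated
  • fields: defines the fields that must be returned. If a field is stored (store=true) it is read directly; otherwise it is extracted from _source.
  • sort (default: _score, descending)
  • timeout (default: no timeout): if a timeout is fired, all the hits accumulated so far are returned.
  • track_scores (default false): when sorting on a field, scores are not computed. By setting track_scores to true, scores will still be computed and tracked.

    When sorting, the relevant sorted field values are loaded into memory. This means that per shard, there should be enough memory to contain them.

  • pretty: if true, the results will be pretty printed.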
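
The cost of from and size is easier to see with numbers. The helper below is a back-of-the-envelope sketch, assuming the default index.max_result_window of 10,000; page_cost and its return keys are names invented here, not part of any ES API.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default of index.max_result_window


def page_cost(from_, size, num_shards,
              max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Describe what one from/size page costs across num_shards shards."""
    window = from_ + size
    if window > max_result_window:
        raise ValueError(
            f"from + size = {window} exceeds max_result_window = {max_result_window}"
        )
    return {
        "hits_per_shard": window,             # each shard returns up to this many
        "hits_merged": window * num_shards,   # upper bound seen by the coordinator
        "hits_returned": size,                # what the client receives at most
    }


print(page_cost(from_=9990, size=10, num_shards=5))
```

On a five-shard index, the last page allowed by the default window makes the coordinating node merge up to 50,000 entries to return 10, which is why deep pagination is better served by scroll.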

Request body parameters:

  • query: the query clause
  • from + size: control small (shallow) pagination
  • sort
  • post_filter: filters the query results without affecting the facet counts.

    applied to the search hits at the very end of a search request, after aggs have already been calculated.

  • _source: controls the returned source. It can be disabled (false), limited to some fields (obj.*), or given multiple exclude/include patterns.

  • fielddata_fields --> docvalue_fields (ES 5.0)
  • fields --> stored_fields (ES 5.0); stored fields are no longer extracted from _source.
  • facets --> aggs (deprecated in 1.0, removed in 2.0; use aggs instead)
  • aggs
  • indices_boost: per-index boost value.
  • highlight
  • version (default false): if true, the document version is added to each hit
  • rescore: reorders the top documents using a secondary (usually more costly) algorithm, instead of applying the costly algorithm to all documents in the index.

    A rescore request is executed on each shard before it returns its results…

  • min_score: if set, documents scoring below this value are not returned

  • explain: explains how the TF/IDF score was computed.
  • script_fields
  • suggest
  • search_type
  • scroll
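
Several of the parameters above are simply sibling keys of one JSON request body. A minimal sketch, with placeholder field names (title, status, publish_date) that do not come from any real index, building such a body as a Python dict and serializing it:

```python
import json

# Field names are placeholders chosen for the example.
body = {
    "query": {"match": {"title": "Search"}},
    "from": 0,
    "size": 10,
    "sort": [{"publish_date": {"order": "desc"}}, "_score"],
    "_source": ["title", "publish_date"],
    "post_filter": {"term": {"status": "published"}},
    "version": True,
}

print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a GET or POST to one of the _search URLs listed earlier.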

Full text queries

Full-text search.
They understand how the field being queried is analyzed and will apply each field’s analyzer (or search_analyzer) to the query string before executing.

1 Match

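Supports fuzzy, phrase, and proximity matching.

Accepts text, numerics, or dates, analyzes them, and constructs a query.

{
    "match" : {
        "message" : "this is a test"
    }   // message is a field name; any field can be used, including _all
}

There are three types of match query: boolean (the default), phrase, and phrase_prefix.

1 boolean match query
Parameters:

  • operator: or (default) or and. With or, a document matches if any term matches; with and, all terms must match.
  • analyzer: the analyzer applied to the query text. Defaults to the one in the field's mapping definition, or else the default search analyzer.
  • fuzziness: enables fuzzy matching. How the value is interpreted depends on the type of the field being queried, as described below.
  • prefix_length: controls fuzzy matching; the number of leading characters that must match exactly. Default 0.
  • lenient (default false): set to true to ignore exceptions caused by data-type mismatches, such as querying a numeric field with a text query string.

  • zero_terms_query: none (default) or all. If the analyzer used removes all tokens in a query, as a stop filter does, the default behavior is to match no documents at all; all makes the query behave like match_all instead.

  • cutoff_frequency: a relative (0..1) or absolute (≥ 1) term frequency, evaluated per shard. Allows handling stopwords dynamically at runtime.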
  • minimum_should_match,

TODO

prefix_length max_expansions

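numeric, date and IPv4 fields

fuzziness is interpreted as a +/- margin around the query value:
query value - fuzziness <= field value <= query value + fuzziness
numeric: 2 or 2.0
date: a number of milliseconds, or a duration string such as "2h"
ip: a long, or another IPv4 address (which will be converted into a long).

string fields

fuzziness is interpreted as a Levenshtein edit distance: the number of one-character changes that need to be made to one string to make it the same as another string.
Values 0, 1, 2: the maximum number of edits allowed.
Value AUTO: depends on the length of the term.
0..2 characters: must match exactly;
3..5 characters: at most one edit allowed;
more than 5 characters: at most two edits allowed.

An example that sets the operator parameter:

{
    "match" : {
        "message" : {   // message is the field name
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}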
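
To make the fuzziness rules for string fields concrete, here is a small sketch of the edit distance and of the AUTO thresholds listed above. It implements plain Levenshtein distance as described in the text and gives transpositions no special treatment, so it should be read as an illustration of the rule rather than as a copy of what ES does internally. All function names are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Maximum edits allowed for a term under AUTO, by term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    """True if indexed_term is within the AUTO edit budget of query_term."""
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


print(edit_distance("kimchy", "kimchi"))   # 1
print(fuzzy_match("test", "text"))         # True: 4 characters, 1 edit allowed
print(fuzzy_match("is", "it"))             # False: 2 characters, exact only
```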

2 phrase

{
    "match_phrase" : {
        "message" : "this is a test"
    }
}

Because phrase is only a type of match query, the same query can also be written as:

{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase"
        }
    }
}

Parameters:

  • slop: how many positions apart the query terms may be and still count as a match.
  • analyzer

3 match_phrase_prefix
Same as match_phrase, except that it allows prefix matches on the last term in the text.

{
    "match_phrase_prefix" : {
        "message" : "this is a test"
    }
}
Equivalent to:
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase_prefix"
        }
    }
}

It accepts the same parameters as the phrase type. In addition, it accepts a max_expansions parameter that controls how many prefixes the last term will be expanded to.

{
    "match_phrase_prefix" : {
        "message" : {
            "query" : "this is a test",
            "max_expansions" : 10
        }
    }
}

Comparison to query_string / field
The match family of queries does not go through a “query parsing” process. It does not support field name prefixes, wildcard characters, or other “advanced” features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great “as you type” behavior to automatically load search results.

2 Match Phrase

Like Match, but for matching exact phrases or word proximity matches.

3 Match Phrase Prefix

Like Match Phrase, but does a wildcard search on the final word.

4 Multi Match

The multi-field version of the match query.

5 Common Terms

6 Query String

7 Simple Query String

Term level queries

1 term && terms

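Note: for string fields, the term must be either a single token as produced by analysis, or the whole value of a field that is not analyzed.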

PUT myindex
{
    "mappings": {
        "mytype": {
            "properties": {
                "no": {
                    "type": "integer"
                }
            }
        }
    }
}
POST myindex/mytype
{
    "no": 2   // index three documents, with no set to 8, 2, and 6
}
GET myindex/mytype/_search

GET myindex/mytype/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "term": {
                  "no": {
                     "value": "2"
                  }
               }
            },
            {
               "term": {
                  "no": {
                     "value": "6"
                  }
               }
            }
         ]
      }
   }
}
GET myindex/mytype/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "terms": {
                  "no": [2,6]   // matches documents whose field holds any of these values
               }
            }
         ]
      }
   }
}
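
The two searches above select the same documents. A small sketch of that correspondence, expanding a terms query into a bool query with one term clause per value; terms_as_should is a made-up helper, and the equivalence is about which documents match, not about the exact scores each form produces.

```python
def terms_as_should(field, values):
    """Expand a terms query into a bool/should of single term queries."""
    return {
        "bool": {
            "should": [{"term": {field: {"value": v}}} for v in values]
        }
    }


expanded = terms_as_should("no", [2, 6])
print(expanded)
```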

The values to match can also be looked up from a document in another index:

PUT /users/user/2
{
    "followers" : ["1", "3"]
}

PUT /tweets/tweet/1
{
    "user" : "1"
}

GET /tweets/_search
{
    "query" : {
        "terms" : {
            "user" : {   // values come from the followers field of the document with id 2, type user, in the users index
                "index" : "users",
                "type" : "user",
                "id" : "2",
                "path" : "followers"
            }
        }
    }
}
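
What the lookup does can be mimicked with nested dicts standing in for the two indices. This is a toy model under that assumption; the store layout and both function names are invented, and nothing here talks to ES.

```python
# Hypothetical in-memory stand-in: index -> type -> id -> document source.
store = {
    "users": {"user": {"2": {"followers": ["1", "3"]}}},
    "tweets": {"tweet": {"1": {"user": "1"}, "2": {"user": "7"}}},
}


def resolve_lookup(store, lookup):
    """Fetch the list of terms named by index/type/id/path."""
    value = store[lookup["index"]][lookup["type"]][lookup["id"]]
    for key in lookup["path"].split("."):
        value = value[key]
    return value if isinstance(value, list) else [value]


def terms_lookup_search(store, index, field, lookup):
    """Return ids of documents in index whose field is one of the looked-up terms."""
    wanted = set(resolve_lookup(store, lookup))
    return [
        doc_id
        for docs in store[index].values()
        for doc_id, doc in docs.items()
        if doc.get(field) in wanted
    ]


lookup = {"index": "users", "type": "user", "id": "2", "path": "followers"}
print(resolve_lookup(store, lookup))
print(terms_lookup_search(store, "tweets", "user", lookup))
```

The extra tweet with user "7" is there only to show a document that the lookup excludes.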

3 Range

"range" : {   // accepts gte, lte, and boost
    "date" : {"gte" : "now-1d/d"}
}

"range" : {   // range on a date field
    "born" : {
        "gte": "01/01/2012",
        "lte": "2013",
        "format": "dd/MM/yyyy||yyyy"   // || means "or": two accepted date formats
    }
}

Taking the time zone into account in a range query:

"range" : {
    "timestamp" : {
        "gte": "2015-01-01 00:00:00",   // given in UTC+01:00, so actually 2014-12-31T23:00:00 UTC
        "lte": "now",                   // now is not affected by time_zone; dates must be stored as UTC
        "time_zone": "+01:00"           // UTC+01:00
    }
}
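
The conversion noted in the comment on gte can be checked with the standard library. A minimal sketch, assuming the time_zone value is a whole-hour offset; to_utc is a name chosen for this example.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text, offset_hours, fmt="%Y-%m-%d %H:%M:%S"):
    """Interpret local_text in the given UTC offset and convert it to UTC."""
    zone = timezone(timedelta(hours=offset_hours))
    local = datetime.strptime(local_text, fmt).replace(tzinfo=zone)
    return local.astimezone(timezone.utc)


gte_utc = to_utc("2015-01-01 00:00:00", offset_hours=1)
print(gte_utc.isoformat())   # 2014-12-31T23:00:00+00:00
```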

4 Exists

    "query": {
        "exists" : { "field" : "user" }
    }

Cases that count as non-null:

{ "user": "" }   an empty string is not null
{ "user": "-" }   not null even though analysis yields nothing: the standard analyzer would emit zero tokens, but the original field is non-null.
{ "user": ["jane", null ] }  at least one element is non-null, which is all that is required.

Cases that count as null:

{ "user": null }
{ "user": [] }     no values
{ "user": [null] }  at least one non-null value is required.
{ "foo":  "bar" }  the user field does not exist at all
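
The seven cases above follow from one rule: the field must hold at least one non-null value. A sketch of that rule over plain Python dicts, with None playing the role of JSON null; it ignores any null_value mapping, and the function name is made up.

```python
def exists(doc, field):
    """True if doc has at least one non-null value for field."""
    if field not in doc:
        return False
    value = doc[field]
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


non_null = [{"user": ""}, {"user": "-"}, {"user": ["jane", None]}]
null = [{"user": None}, {"user": []}, {"user": [None]}, {"foo": "bar"}]

print([exists(d, "user") for d in non_null])   # [True, True, True]
print([exists(d, "user") for d in null])       # [False, False, False, False]
```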

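null_value mapping:

"user": {
    "type": "keyword",
    "null_value": "_null_"
}

With this mapping, explicit null values are indexed as the string _null_, so the following documents, which declare null explicitly, match the exists query:

{ "user": null }
{ "user": [null] }

Documents without an explicit null get no replacement value and still do not match the exists query:

{ "user": [] }
{ "foo": "bar" }

missing query

To find documents that are missing a field, wrap an exists query in a must_not clause.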
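
Spelled out, the wrapped form looks like the body built below. The helper name missing is only a label for this sketch; the query itself is an ordinary bool query.

```python
import json


def missing(field):
    """Build a query matching documents with no non-null value for field."""
    return {"bool": {"must_not": {"exists": {"field": field}}}}


body = {"query": missing("user")}
print(json.dumps(body, indent=2))
```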

5 Prefix Query

fields contain terms with a specified prefix (not analyzed).

"query": {
    "prefix" : { "user" : "ki" }
}

Optional: "boost" : 2.0

6 Wildcard

Matches fields that contain terms matching a wildcard expression (not analyzed).
Wildcards:

  • * matches any character sequence, including the empty one
  • ? matches any single character

Note: this query can be slow, as it needs to iterate over many terms. To keep it from becoming extremely slow, a wildcard term should not start with one of the wildcards * or ?.
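
The meaning of the two wildcards can be illustrated by translating a pattern into a regular expression and testing it against a list of terms. This is a sketch of the semantics only; the linear scan over terms is a simplification that also hints at why a leading wildcard is costly, and the term list is invented.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into a compiled regex; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")     # any character sequence, possibly empty
        elif ch == "?":
            parts.append(".")      # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, terms):
    """Return the terms that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]


terms = ["kimchy", "kiwi", "kitchen", "ki", "king"]
print(wildcard_match("ki*y", terms))   # ['kimchy']
print(wildcard_match("ki??", terms))   # ['kiwi', 'king']
```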

7 Regexp

8 Fuzzy

Warning: deprecated in 5.0 and slated for removal in 6.0. Use match queries with fuzziness instead.

Simple form:

"query": {
   "fuzzy" : { "user" : "ki" }
}

Advanced form:

"query": {
    "fuzzy" : {
        "user" : {
            "value" :         "ki",
            "boost" :         1.0,
            "fuzziness" :     2, //1. 
            "prefix_length" : 0, //2.
            "max_expansions": 100 //3.
        }
    }
}
  • fuzziness. The maximum edit distance.
    • For text or keyword fields, fuzziness is the number of one-character changes that need to be made to one string to make it the same as another string. Allowed values are 0, 1, and 2.
    • The default is AUTO, which is derived from the length of the term:
      • length 0..2: must match exactly
      • length 3..5: one edit allowed
      • length above 5: two edits allowed
  • prefix_length. The number of initial characters which will not be “fuzzified”. Default 0.
  • max_expansions. The maximum number of terms that the fuzzy query will expand to. Default 50.

Note: combining prefix_length=0 with a very high max_expansions can cause serious performance problems.

9 Type (rarely useful)

GET /_search
{
    "query": {
        "type" : {
            "value" : "my_type"
        }
    }
}
Searching the type directly returns the same documents:
GET _all/my_type/_search

10 Ids (_uid)

Filters documents by id. This query uses the _uid field ({type}#{id}).

GET /_search
{
    "query": {
        "ids" : {
            "type" : "my_type",   // optional; a single value or an array
            "values" : ["1", "4", "100"]
        }
    }
}

5 Compound queries

Wrap other compound or leaf queries, either to combine their results and scores, to change their behaviour, or to switch from query to filter context.

1 constant_score (filter context, constant score)

Wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.

2 bool

Combines multiple leaf or compound query clauses.
The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context.

3 dis_max (best match)

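A bool query combines the scores of all matching clauses, whereas a dis_max query matches documents that satisfy any of its clauses and scores each document by its best-matching clause only.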
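
The difference shows up in the ranking. The sketch below compares the two scoring rules on made-up per-clause scores; it leaves out normalization details such as the coordination factor, and it includes the dis_max tie_breaker parameter, which adds a fraction of the other matching clauses' scores to the best one. Function names and numbers are illustrative only.

```python
def bool_should_score(clause_scores):
    """Sum the scores of the clauses that matched (None means no match)."""
    return sum(s for s in clause_scores if s is not None)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score, plus tie_breaker times the other matching scores."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return None
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


doc_a = [0.4, 0.4]    # matches both clauses moderately
doc_b = [0.7, None]   # matches one clause well

print(bool_should_score(doc_a) > bool_should_score(doc_b))   # True: bool ranks A first
print(dis_max_score(doc_a) < dis_max_score(doc_b))           # True: dis_max ranks B first
```

Raising tie_breaker toward 1.0 moves dis_max back toward the bool behaviour, since the other matching clauses then count almost fully.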

4 function_score

Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.

5 boosting

6 indices query

6 Joining queries

1 Nested

2 Has Child

3 Has Parent

4 Parent Id

7 Geo queries

1 GeoShape Query

2 Geo Bounding Box

3 Geo Distance

4 Geo Distance Range

5 Geo Polygon

8 Specialized queries

1 More Like This

2 Template

3 Script

4 Percolate

9 Span queries

1 Span Term

2 Span Multi Term

3 Span First

4 Span Near

5 Span Or

6 Span Not

7 Span Containing

8 Span Within

9 Span Field Masking

10 Minimum Should Match

11 Multi Term Query Rewrite