"Elasticsearch: The Definitive Guide" Example Collection: Search in Depth
阿新 • Published: 2020-12-15
Exact-value queries:
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "price": 20 }
      }
    }
  }
}

### Whether the following query returns results depends on how the document was indexed
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "productID": "XHDK-A-1293-#fJ3" }
      }
    }
  }
}

### The field must be mapped as not_analyzed (no analysis) for the term to match
DELETE /my_store

PUT /my_store
{
  "mappings": {
    "products": {
      "properties": {
        "productID": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Combining filters:
### bool filter
{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}

### Example
GET /my_store/products/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "price": 20 }},
            { "term": { "productID": "XHDK-A-1293-#fJ3" }}
          ],
          "must_not": {
            "term": { "price": 30 }
          }
        }
      }
    }
  }
}

### Nested bool filter
GET /my_store/products/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "productID": "KDKE-B-9947-#kL5" }},
            { "bool": {
              "must": [
                { "term": { "productID": "JODL-X-1937-#pV7" }},
                { "term": { "price": 30 }}
              ]
            }}
          ]
        }
      }
    }
  }
}
Finding multiple exact values:
### terms
{
  "terms": { "price": [20, 30] }
}

#### Example
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": { "price": [20, 30] }
      }
    }
  }
}

### It is essential to understand that term and terms are "contains" operations, not "equals"
### Exact equality
### The best approach is to add and index a second field that stores the number of terms the field contains
{ "tags": ["search"], "tag_count": 1 }
{ "tags": ["search", "open_source"], "tag_count": 2 }

GET /my_index/my_type/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "tags": "search" }},
            { "term": { "tag_count": 1 }}
          ]
        }
      }
    }
  }
}
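The difference between "contains" and "equals" can be illustrated outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: the in-memory `docs` dictionary and the `term_filter` helper are hypothetical stand-ins for an index and a `term` filter.

```python
# Hypothetical in-memory "index" holding the two sample documents.
docs = {
    1: {"tags": ["search"], "tag_count": 1},
    2: {"tags": ["search", "open_source"], "tag_count": 2},
}


def term_filter(docs, field, value):
    """Return ids of documents whose field CONTAINS value."""
    hits = set()
    for doc_id, doc in docs.items():
        raw = doc[field]
        # A scalar is treated as a single-value list.
        values = raw if isinstance(raw, list) else [raw]
        if value in values:
            hits.add(doc_id)
    return hits


# "contains": both documents have the term "search" in tags.
contains_search = term_filter(docs, "tags", "search")

# "equals": additionally require that tags holds exactly one term.
only_search = contains_search & term_filter(docs, "tag_count", 1)

print(sorted(contains_search), sorted(only_search))
```

The intersection plays the role of the `bool` / `must` filter: only the document whose `tags` field holds the single term `search` survives.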
Ranges:
### gt:  >  greater than
### lt:  <  less than
### gte: >= greater than or equal to
### lte: <= less than or equal to
"range": {
  "price": {
    "gte": 20,
    "lte": 40
  }
}

#### Example
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 20,
            "lt": 40
          }
        }
      }
    }
  }
}

### Date ranges
"range": {
  "timestamp": {
    "gt": "2014-01-01 00:00:00",
    "lt": "2014-01-07 00:00:00"
  }
}

### The past hour
"range": {
  "timestamp": {
    "gt": "now-1h"
  }
}

### Before January 1, 2014 plus one month
"range": {
  "timestamp": {
    "gt": "2014-01-01 00:00:00",
    "lt": "2014-01-01 00:00:00||+1M"
  }
}

### String ranges
### Find strings from a up to, but not including, b
"range": {
  "title": {
    "gte": "a",
    "lt": "b"
  }
}
Dealing with null values:
### exists query
GET /my_index/posts/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": { "field": "tags" }
      }
    }
  }
}

### missing query
GET /my_index/posts/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "missing": { "field": "tags" }
      }
    }
  }
}

### exists and missing on objects
##### Sample object
{
  "name": {
    "first": "John",
    "last": "Smith"
  }
}

### The filter
{
  "exists": { "field": "name" }
}

### What is actually executed
{
  "bool": {
    "should": [
      { "exists": { "field": "name.first" }},
      { "exists": { "field": "name.last" }}
    ]
  }
}
The match query:
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": "QUICK!"
    }
  }
}
Multi-word queries:
### Multi-word query
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": "BROWN DOG!"
    }
  }
}

### Improving precision
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": {
        "query": "BROWN DOG!",
        "operator": "and"
      }
    }
  }
}

### Controlling precision
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown dog",
        "minimum_should_match": "75%"
      }
    }
  }
}
Combining queries:
### Combined query
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy" }},
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "dog" }}
      ]
    }
  }
}

### Controlling precision
### minimum_should_match can be set to a specific number, but it is more commonly set to a percentage
### This query returns every document whose title field contains "brown" AND "fox", "brown" AND "dog", or "fox" AND "dog". A document that matches all three clauses is considered more relevant than one that matches only two.
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox" }},
        { "match": { "title": "dog" }}
      ],
      "minimum_should_match": 2
    }
  }
}
How match uses bool:
### The following two queries are equivalent
## Query 1
{
  "match": { "title": "brown fox" }
}
## Query 2
{
  "bool": {
    "should": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox" }}
    ]
  }
}

### The following two queries are equivalent
## Query 3
{
  "match": {
    "title": {
      "query": "brown fox",
      "operator": "and"
    }
  }
}
## Query 4
{
  "bool": {
    "must": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox" }}
    ]
  }
}

### The following two queries are equivalent
## Query 5
{
  "match": {
    "title": {
      "query": "quick brown fox",
      "minimum_should_match": "75%"
    }
  }
}
## Query 6
### Because there are only three clauses, the match query's minimum_should_match value of 75% is rounded down to 2: at least two of the three should clauses must match.
{
  "bool": {
    "should": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox" }},
      { "term": { "title": "quick" }}
    ],
    "minimum_should_match": 2
  }
}
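The rounding of a percentage `minimum_should_match` down to a clause count can be checked with a few lines of arithmetic. This is a minimal Python sketch of that calculation only; the function name `required_clauses` is hypothetical, and negative values and the combined forms of `minimum_should_match` are deliberately not handled.

```python
def required_clauses(optional_clauses, minimum_should_match):
    """Translate minimum_should_match into a number of clauses.

    Only non-negative integers and percentages such as "75%" are handled;
    percentages are rounded down.
    """
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer division rounds the result down: 3 * 75 // 100 == 2.
        return optional_clauses * percent // 100
    return int(minimum_should_match)


# "quick brown fox" produces three optional term clauses.
print(required_clauses(3, "75%"))
```

With three clauses, 75% works out to 2.25, which rounds down to 2, the value used in Query 6.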
Boosting query clauses:
GET /_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": {
            "query": "full text search",
            "operator": "and"
          }
        }
      },
      "should": [
        { "match": {
          "content": {
            "query": "Elasticsearch",
            "boost": 3
          }
        }},
        { "match": {
          "content": {
            "query": "Lucene",
            "boost": 2
          }
        }}
      ]
    }
  }
}
Controlling analysis:
GET /my_index/_analyze
{
  "field": "my_type.title",
  "text": "Foxes"
}

GET /my_index/my_type/_validate/query?explain
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Foxes" }},
        { "match": { "english_title": "Foxes" }}
      ]
    }
  }
}
Multiple query strings:
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "War and Peace" }},
        { "match": { "author": "Leo Tolstoy" }}
      ]
    }
  }
}

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "War and Peace" }},
        { "match": { "author": "Leo Tolstoy" }},
        { "bool": {
          "should": [
            { "match": { "translator": "Constance Garnett" }},
            { "match": { "translator": "Louise Maude" }}
          ]
        }}
      ]
    }
  }
}

### Prioritizing clauses
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":  { "query": "War and Peace", "boost": 2 }}},
        { "match": { "author": { "query": "Leo Tolstoy", "boost": 2 }}},
        { "bool": {
          "should": [
            { "match": { "translator": "Constance Garnett" }},
            { "match": { "translator": "Louise Maude" }}
          ]
        }}
      ]
    }
  }
}
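Best fields:
### dis_max query: returns every document that matches any of the queries, but uses only the score of the best-matching query as the document's score
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}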
Tuning best-fields queries:
### Specifying tie_breaker takes the scores of the other matching clauses into account as well
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ],
      "tie_breaker": 0.3
    }
  }
}
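How `tie_breaker` changes the final score is easiest to see with numbers. Below is a minimal Python sketch of the `dis_max` combination rule (best score plus `tie_breaker` times the sum of the other matching scores); the function name and the per-field scores are made up for illustration and do not come from a real index.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine the scores of the matching clauses the way dis_max does.

    The best clause counts in full; every other matching clause is
    multiplied by tie_breaker before being added.
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Hypothetical scores for the title and body clauses of one document.
scores = [0.8, 0.5]
print(dis_max_score(scores))                   # plain dis_max: best clause only
print(dis_max_score(scores, tie_breaker=0.3))  # other clauses contribute 30%
```

A `tie_breaker` of 0 gives pure best-field behaviour, while 1 turns `dis_max` into a plain sum of the clause scores; values in between let the best field dominate without ignoring the rest.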
multi_match query:
### best_fields, most_fields, and cross_fields
### The following two queries are equivalent
## Query 1
{
  "dis_max": {
    "queries": [
      {
        "match": {
          "title": {
            "query": "Quick brown fox",
            "minimum_should_match": "30%"
          }
        }
      },
      {
        "match": {
          "body": {
            "query": "Quick brown fox",
            "minimum_should_match": "30%"
          }
        }
      }
    ],
    "tie_breaker": 0.3
  }
}
## Query 2
{
  "multi_match": {
    "query": "Quick brown fox",
    "type": "best_fields",
    "fields": [ "title", "body" ],
    "tie_breaker": 0.3,
    "minimum_should_match": "30%"
  }
}

### Using wildcards in field names
{
  "multi_match": {
    "query": "Quick brown fox",
    "fields": "*_title"
  }
}

### Boosting an individual field
{
  "multi_match": {
    "query": "Quick brown fox",
    "fields": [ "*_title", "chapter_title^2" ]
  }
}
Most fields:
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "jumping rabbits",
      "type": "most_fields",
      "fields": [ "title", "title.std" ]
    }
  }
}

### Controlling the weight of each field
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "jumping rabbits",
      "type": "most_fields",
      "fields": [ "title^10", "title.std" ]
    }
  }
}
Cross-fields entity search:
### Query every field and add up the match scores of the individual fields
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "most_fields",
      "fields": [ "street", "city", "country", "postcode" ]
    }
  }
}
Custom _all fields:
### copy_to
PUT /my_index
{
  "mappings": {
    "person": {
      "properties": {
        "first_name": {
          "type": "string",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "string",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "string"
        }
      }
    }
  }
}
cross-fields queries:
GET /books/_search
{
  "query": {
    "multi_match": {
      "query": "peter smith",
      "type": "cross_fields",
      "fields": [ "title^2", "description" ]
    }
  }
}
Phrase matching:
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": "quick brown fox"
    }
  }
}
Mixing it up:
### The slop parameter tells match_phrase how far apart the terms may be while the document still counts as a match
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "quick fox",
        "slop": 1
      }
    }
  }
}
Multivalue fields:
### Sample multivalue field
PUT /my_index/groups/1
{
  "names": [ "John Abraham", "Lincoln Smith" ]
}

### The position_increment_gap setting tells Elasticsearch to increase the current term position by the specified value for each new element in the array
PUT /my_index/_mapping/groups
{
  "properties": {
    "names": {
      "type": "string",
      "position_increment_gap": 100
    }
  }
}
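The effect of `position_increment_gap` on term positions can be traced by hand. The following Python sketch is only an approximation: it lowercases and splits on whitespace instead of running a real analyzer, it numbers positions from 1, and the function name `term_positions` is hypothetical.

```python
def term_positions(values, position_increment_gap=0):
    """Assign a position to every term of a multivalue field.

    Each term advances the position by one; after every array element
    the position is pushed forward by position_increment_gap.
    """
    positions = []
    position = 0
    for value in values:
        for term in value.lower().split():
            position += 1
            positions.append((term, position))
        position += position_increment_gap
    return positions


names = ["John Abraham", "Lincoln Smith"]
print(term_positions(names))                              # no gap
print(term_positions(names, position_increment_gap=100))  # gap of 100
```

Without a gap, `abraham` and `lincoln` sit at adjacent positions, so a phrase query for "Abraham Lincoln" would match across the two names. With a gap of 100 they end up about a hundred positions apart, and only a phrase query with a very large `slop` could bridge them.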
Using proximity to improve relevance:
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": {
            "query": "quick brown fox",
            "minimum_should_match": "30%"
          }
        }
      },
      "should": {
        "match_phrase": {
          "title": {
            "query": "quick brown fox",
            "slop": 50
          }
        }
      }
    }
  }
}
Improving performance:
### Use rescoring to limit the phrase query to a small window of top results (an optimization of "Using proximity to improve relevance")
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "minimum_should_match": "30%"
      }
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "title": {
            "query": "quick brown fox",
            "slop": 50
          }
        }
      }
    }
  }
}
Finding associated words:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "my_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_shingle_filter" ]
        }
      }
    }
  }
}

### Testing the analyzer
GET /my_index/_analyze?analyzer=my_shingle_analyzer
Sue ate the alligator

### Using it in a multifield
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "shingles": {
            "type": "string",
            "analyzer": "my_shingle_analyzer"
          }
        }
      }
    }
  }
}
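What the shingle filter configured above emits for the test sentence can be approximated in a few lines. This Python sketch stands in for `my_shingle_analyzer` under simplifying assumptions: a whitespace split replaces the standard tokenizer, and the `shingles` helper is hypothetical.

```python
def shingles(text, size=2):
    """Return word n-grams (shingles) of the given size, lowercased.

    Mirrors min_shingle_size == max_shingle_size == size with
    output_unigrams disabled, so single words are not emitted.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


print(shingles("Sue ate the alligator"))
```

Each bigram is indexed as a single term, so a query that contains the same word pair in the same order matches that term directly, which is what lets the `title.shingles` field reward word order.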
Postcodes and structured data:
PUT /my_index
{
  "mappings": {
    "address": {
      "properties": {
        "postcode": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
prefix query:
GET /my_index/address/_search
{
  "query": {
    "prefix": {
      "postcode": "W1"
    }
  }
}
wildcard and regexp queries:
GET /my_index/address/_search
{
  "query": {
    "wildcard": {
      "postcode": "W?F*HW"
    }
  }
}

GET /my_index/address/_search
{
  "query": {
    "regexp": {
      "postcode": "W[0-9].+"
    }
  }
}
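To see which terms the two patterns above would select, they can be tried against a handful of postcodes in plain Python. This is a sketch under assumptions: the `POSTCODES` list is invented sample data, `fnmatch` and `re` merely imitate the wildcard and regexp queries, and Lucene's regular-expression syntax differs from Python's in general (this particular pattern happens to be valid in both).

```python
import fnmatch
import re

# Hypothetical not_analyzed postcode terms.
POSTCODES = ["W1V 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "SW5 0BE"]


def wildcard(terms, pattern):
    """? matches exactly one character, * matches zero or more."""
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)]


def regexp(terms, pattern):
    """The pattern has to match the whole term, not just a part of it."""
    return [t for t in terms if re.fullmatch(pattern, t)]


print(wildcard(POSTCODES, "W?F*HW"))
print(regexp(POSTCODES, "W[0-9].+"))
```

Both queries operate on whole terms, which is why the postcode field is mapped as `not_analyzed`: an analyzed field would have been split into separate tokens at the space.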
Query-time search-as-you-type:
{
  "match_phrase_prefix": {
    "brand": "johnnie walker bl"
  }
}

{
  "match_phrase_prefix": {
    "brand": {
      "query": "walker johnnie bl",
      "slop": 10
    }
  }
}

{
  "match_phrase_prefix": {
    "brand": {
      "query": "johnnie walker bl",
      "max_expansions": 50
    }
  }
}
Index-time search-as-you-type:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  }
}

### Applying the analyzer
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "autocomplete"
      }
    }
  }
}

### Querying
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "brown fo"
    }
  }
}

### Setting the analyzer at query time
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "brown fo",
        "analyzer": "standard"
      }
    }
  }
}

### Setting separate index and search analyzers in the mapping
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "properties": {
      "name": {
        "type": "string",
        "index_analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}

### Edge n-grams and postcodes
{
  "analysis": {
    "filter": {
      "postcode_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 8
      }
    },
    "analyzer": {
      "postcode_index": {
        "tokenizer": "keyword",
        "filter": [ "postcode_filter" ]
      },
      "postcode_search": {
        "tokenizer": "keyword"
      }
    }
  }
}
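The terms produced by the `autocomplete` analyzer, and the reason the search side should use the `standard` analyzer, can be illustrated with a small Python sketch. It is an approximation: a whitespace split stands in for the standard tokenizer, and the helper names `edge_ngrams` and `autocomplete_analyze` are hypothetical.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of token whose length is min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def autocomplete_analyze(text):
    """Lowercase, split on whitespace, then expand each token into edge n-grams."""
    return [gram for token in text.lower().split() for gram in edge_ngrams(token)]


# Index time: every prefix of every word becomes a term.
print(autocomplete_analyze("Brown foxes"))

# Query time with the same analyzer: the query also expands to "b", "br", "f", ...
# so any document containing a word that starts with "b" or "f" would match.
print(autocomplete_analyze("brown fo"))

# Query time with a plain (standard-like) analysis: only "brown" and "fo" are looked up.
print("brown fo".lower().split())
```

Because the prefixes are already in the index, the query only needs to look up the terms exactly as the user typed them.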
Theory behind relevance scoring:
### Disabling term frequencies
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "string",
          "index_options": "docs"
        }
      }
    }
  }
}

### Disabling norms
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "string",
          "norms": { "enabled": false }
        }
      }
    }
  }
}
Lucene's practical scoring function:
### Disabling the coordination factor
GET /_search
{
  "query": {
    "bool": {
      "disable_coord": true,
      "should": [
        { "term": { "text": "jump" }},
        { "term": { "text": "hop" }},
        { "term": { "text": "leap" }}
      ]
    }
  }
}
Query-time boosting:
GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "quick brown fox",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "content": "quick brown fox"
          }
        }
      ]
    }
  }
}

### Boosting an index
GET /docs_2014_*/_search
{
  "indices_boost": {
    "docs_2014_10": 3,
    "docs_2014_09": 2
  },
  "query": {
    "match": {
      "text": "quick brown fox"
    }
  }
}
Manipulating relevance with query structure:
### quick OR brown OR red OR fox
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" }},
        { "term": { "text": "brown" }},
        { "term": { "text": "red" }},
        { "term": { "text": "fox" }}
      ]
    }
  }
}

### quick OR (brown OR red) OR fox
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" }},
        { "term": { "text": "fox" }},
        {
          "bool": {
            "should": [
              { "term": { "text": "brown" }},
              { "term": { "text": "red" }}
            ]
          }
        }
      ]
    }
  }
}
Not Quite Not:
### boosting query
GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "text": "apple" }
      },
      "negative": {
        "match": { "text": "pie tart fruit crumble tree" }
      },
      "negative_boost": 0.5
    }
  }
}
Ignoring TF/IDF:
### constant_score query
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "constant_score": {
          "query": { "match": { "description": "wifi" }}
        }},
        { "constant_score": {
          "query": { "match": { "description": "garden" }}
        }},
        { "constant_score": {
          "boost": 2,
          "query": { "match": { "description": "pool" }}
        }}
      ]
    }
  }
}
Boosting by popularity:
### Combine the number of votes with the full-text relevance score
### new_score = old_score * number_of_votes
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}

### modifier
### new_score = old_score * log(1 + number_of_votes)
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}

### factor
### new_score = old_score * log(1 + factor * number_of_votes)
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 2
      }
    }
  }
}

### boost_mode
### multiply: multiply _score by the function value (default)
### sum:      add the function value to _score
### min:      the smaller of _score and the function value
### max:      the larger of _score and the function value
### replace:  replace _score with the function value
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum"
    }
  }
}

### max_boost
### Whatever field_value_factor returns, the function result never exceeds 1.5
### max_boost caps only the function result; it does not directly limit the final _score
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 1.5
    }
  }
}
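The formulas in the comments above can be worked through numerically. The Python sketch below mirrors `field_value_factor`, `boost_mode`, and `max_boost` under stated assumptions: the logarithm for `log1p` is taken to be base 10, the scores and vote counts are made up, and both function names are hypothetical.

```python
import math


def field_value_factor(votes, factor=1.0, modifier="none"):
    """Function value derived from the votes field."""
    value = factor * votes
    if modifier == "log1p":
        # Assumption: base-10 logarithm of (1 + factor * votes).
        return math.log10(1 + value)
    return value


def apply_function(old_score, function_value, boost_mode="multiply",
                   max_boost=float("inf")):
    """Combine the query score with the (capped) function value."""
    capped = min(function_value, max_boost)  # max_boost limits the function only
    if boost_mode == "multiply":
        return old_score * capped
    if boost_mode == "sum":
        return old_score + capped
    if boost_mode == "min":
        return min(old_score, capped)
    if boost_mode == "max":
        return max(old_score, capped)
    if boost_mode == "replace":
        return capped
    raise ValueError(f"unknown boost_mode: {boost_mode}")


old_score = 2.0  # hypothetical full-text score
for votes in (0, 9, 99, 999):
    value = field_value_factor(votes, modifier="log1p")
    print(votes, apply_function(old_score, value, boost_mode="sum", max_boost=1.5))
```

Two things stand out in the output: a document with zero votes gets a function value of 0, which would wipe out its score under `multiply` but leaves it untouched under `sum`, and once the function value exceeds `max_boost` additional votes no longer change the result.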
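Boosting filtered subsets:
### score_mode
### multiply: function results are multiplied together (default)
### sum:      function results are added up
### avg:      the average of the function results
### max:      the highest function result
### min:      the lowest function result
### first:    use only the result of the first function that applies (one without a filter, or whose filter matches the document)
GET /_search
{
  "query": {
    "function_score": {
      "filter": {
        "term": { "city": "Barcelona" }
      },
      "functions": [
        {
          "filter": { "term": { "features": "wifi" }},
          "weight": 1
        },
        {
          "filter": { "term": { "features": "garden" }},
          "weight": 1
        },
        {
          "filter": { "term": { "features": "pool" }},
          "weight": 2
        }
      ],
      "score_mode": "sum"
    }
  }
}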
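How `score_mode` combines the weights of the matching filters can be shown with a short Python sketch. It is illustrative only: `FUNCTIONS` mirrors the three filter/weight pairs from the query, `combined_weight` is a hypothetical helper, `avg` is left out, and returning a neutral 1.0 when no filter matches is an assumption of this sketch.

```python
import math

# (feature that the filter tests for, weight) pairs from the query above.
FUNCTIONS = [("wifi", 1), ("garden", 1), ("pool", 2)]


def combined_weight(features, score_mode="sum"):
    """Combine the weights of all functions whose filter matches the document."""
    values = [weight for feature, weight in FUNCTIONS if feature in features]
    if not values:
        return 1.0  # assumption: neutral value when nothing matches
    if score_mode == "sum":
        return sum(values)
    if score_mode == "multiply":
        return math.prod(values)
    if score_mode == "max":
        return max(values)
    if score_mode == "min":
        return min(values)
    if score_mode == "first":
        return values[0]
    raise ValueError(f"unsupported score_mode in this sketch: {score_mode}")


print(combined_weight({"wifi", "pool"}))            # summed weights
print(combined_weight({"wifi", "garden", "pool"}))  # all three features
print(combined_weight({"wifi", "pool"}, "multiply"))
```

With weights of 1 and 2, `multiply` cannot tell a hotel with a pool from a hotel with wifi and a pool, whereas `sum` rewards every additional feature. That is why the query sets `score_mode` to `sum`.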
Random scoring:
### The random_score function outputs a number between 0 and 1; with the same seed value, the random results are consistent
### Of course, if new documents that match the query are added, the order of results changes whether or not consistent randomization is used
GET /_search
{
  "query": {
    "function_score": {
      "filter": {
        "term": { "city": "Barcelona" }
      },
      "functions": [
        {
          "filter": { "term": { "features": "wifi" }},
          "weight": 1
        },
        {
          "filter": { "term": { "features": "garden" }},
          "weight": 1
        },
        {
          "filter": { "term": { "features": "pool" }},
          "weight": 2
        },
        {
          "random_score": {
            "seed": "the users session id"
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}
The closer, the better:
### Three decay functions are supported: linear, exp, and gauss
### origin: the central point, or the best possible value for the field; documents that fall on the origin get the full _score of 1.0
### scale:  the rate of decay, that is, how quickly the _score drops as a document moves away from the origin (for example, every £10 or every 100 meters)
### decay:  the _score a document receives at a distance of scale from the origin; defaults to 0.5
### offset: a nonzero offset widens the central point into a range instead of a single point; every value in the range origin - offset <= value <= origin + offset gets the full _score of 1.0
GET /_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": { "lat": 51.5, "lon": 0.12 },
              "offset": "2km",
              "scale": "3km"
            }
          }
        },
        {
          "gauss": {
            "price": {
              "origin": "50",
              "offset": "50",
              "scale": "20"
            }
          },
          "weight": 2
        }
      ]
    }
  }
}
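The interaction of `origin`, `offset`, `scale`, and `decay` is easier to follow with concrete numbers. The Python sketch below evaluates a Gaussian decay for the price clause (origin 50, offset 50, scale 20). The function name `gauss_decay` is hypothetical, and the formula is written so that it satisfies the parameter descriptions above (full score inside the offset, `decay` at a distance of `scale` beyond it); it is not taken from the Elasticsearch source.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 within offset of origin, decay at offset + scale."""
    # Distance beyond the flat region around the origin (never negative).
    distance = max(0.0, abs(value - origin) - offset)
    # Choose the variance so that the curve equals `decay` at distance == scale.
    sigma_squared = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_squared))


# Price clause from the query above: origin 50, offset 50, scale 20.
for price in (50, 100, 120, 140):
    print(price, round(gauss_decay(price, origin=50, scale=20, offset=50), 4))
```

Prices from 0 to 100 all receive the full score because they lie within the offset; at 120, one `scale` further out, the score has dropped to the default `decay` of 0.5, and it falls off increasingly fast after that.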
Scoring with scripts:
GET /_search
{
  "query": {
    "function_score": {
      "functions": [
        { ...location clause... },
        { ...price clause... },
        {
          "script_score": {
            "params": {
              "threshold": 80,
              "discount": 0.1,
              "target": 10
            },
            "script": "price = doc['price'].value; margin = doc['margin'].value; if (price < threshold) { return price * margin / target }; return price * (1 - discount) * margin / target;"
          }
        }
      ]
    }
  }
}
Changing similarities:
## The similarity algorithm can be set per field; simply choose it for each field in the mapping
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "BM25"
        },
        "body": {
          "type": "string",
          "similarity": "default"
        }
      }
    }
  }
}

### Configuring BM25
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b": 0
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "my_bm25"
        },
        "body": {
          "type": "string",
          "similarity": "BM25"
        }
      }
    }
  }
}