elasticsearch(14) The best_fields and most_fields strategies

阿新 • Published: 2018-12-17

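Switching from the best-fields strategy to the most-fields strategy.

best-fields strategy: docs in which a single field matches as many of the keywords as possible are returned first.

most-fields strategy: docs in which as many fields as possible match the keywords are returned first.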

POST /forum/_mapping/article
{
  "properties": {
    "sub_title": {
      "type": "string",
      "analyzer": "english",
      "fields": {
        "std": {
          "type": "string",
          "analyzer": "standard"
        }
      }
    }
  }
}

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"sub_title" : "learning more courses"} }
{ "update": { "_id": "2"} }
{ "doc" : {"sub_title" : "learned a lot of course"} }
{ "update": { "_id": "3"} }
{ "doc" : {"sub_title" : "we have a lot of fun"} }
{ "update": { "_id": "4"} }
{ "doc" : {"sub_title" : "both of them are good"} }
{ "update": { "_id": "5"} }
{ "doc" : {"sub_title" : "haha, hello world"} }

GET /forum/article/_search
{
  "query": {
    "match": {
      "sub_title": "learning courses"
    }
  }
}

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.219939,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1.219939,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [ "java" ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.5063205,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [ "java", "hadoop" ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses"
        }
      }
    ]
  }
}

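sub_title uses the english analyzer, so each word is reduced to its stem.

Why does this matter? An analyzer such as the english analyzer applies a stemmer that reduces every word to its most basic form:

learning --> learn
learned --> learn
courses --> course

So the query on sub_title becomes: learning courses --> learn course

As a result, both documents match both stemmed terms, and { "doc" : {"sub_title" : "learned a lot of course"} } ends up ranked ahead of { "doc" : {"sub_title" : "learning more courses"} }, even though the latter contains the exact query words.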
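The stemming described above can be sketched in a few lines of Python. This is only a toy: the function names, the suffix list, and the two-word stopword list are assumptions made for illustration, and the real english analyzer uses a proper stemming algorithm rather than this crude suffix stripper. It just shows why the query and both sub_title values collapse to the same terms.

```python
# Toy illustration only: a crude suffix stripper, NOT the real stemmer
# behind Elasticsearch's english analyzer.
TOY_STOPWORDS = {"a", "of"}  # assumed minimal stopword list


def toy_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_english_analyze(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    return [toy_stem(t) for t in text.lower().split() if t not in TOY_STOPWORDS]


if __name__ == "__main__":
    for text in ("learning courses",
                 "learning more courses",
                 "learned a lot of course"):
        print(text, "-->", toy_english_analyze(text))
```

After this kind of analysis, the two query terms appear in both documents, so the exact wording of the original text no longer separates them.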

GET /forum/article/_search
{
  "query": {
    "match": {
      "sub_title": "learning courses"
    }
  }
}

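This is convoluted. Even I find it convoluted.

Many things feel convoluted when you read them as text, and they are still convoluted when explained out loud, but I think explaining them verbally works slightly better.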

GET /forum/article/_search
{
  "query": {
    "multi_match": {
      "query": "learning courses",
      "type": "most_fields",
      "fields": [ "sub_title", "sub_title.std" ]
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.219939,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1.219939,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [ "java" ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1.012641,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [ "java", "hadoop" ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses"
        }
      }
    ]
  }
}

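If you ask me exactly how these scores are computed, it is hard to say. The calculation is complex and goes beyond the TF/IDF algorithm alone, because different queries and different syntax each have their own scoring details.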
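Without reproducing the exact scores, it is still possible to see which field contributes for which document. The sketch below is a toy model, not Elasticsearch itself: the stem table is copied from the examples in this post, the stopword list and the whitespace tokenization are assumptions, and all function names are made up for illustration. It lists the query terms that each of the two fields matches in docs 1 and 2.

```python
# Toy model of the two sub-fields; the real analyzers do much more.
STEMS = {"learning": "learn", "learned": "learn", "courses": "course"}
STOPWORDS = {"a", "of"}  # assumed minimal stopword list

DOCS = {
    "1": "learning more courses",
    "2": "learned a lot of course",
}


def analyze_english(text):
    """Stand-in for sub_title: lowercase, drop stopwords, apply stem table."""
    return [STEMS.get(t, t) for t in text.lower().split() if t not in STOPWORDS]


def analyze_standard(text):
    """Stand-in for sub_title.std: lowercase and split, no stemming."""
    return text.lower().split()


ANALYZERS = {"sub_title": analyze_english, "sub_title.std": analyze_standard}


def matched_terms(query, text, analyzer):
    """Query terms (after analysis) that also occur in the analyzed text."""
    doc_tokens = set(analyzer(text))
    return [t for t in analyzer(query) if t in doc_tokens]


def match_table(query):
    """Map doc id -> field name -> list of matched query terms."""
    return {
        doc_id: {
            field: matched_terms(query, text, analyzer)
            for field, analyzer in ANALYZERS.items()
        }
        for doc_id, text in DOCS.items()
    }


if __name__ == "__main__":
    for doc_id, fields in match_table("learning courses").items():
        print(doc_id, fields)
```

In this model doc 1 matches on both sub_title and sub_title.std, while doc 2 matches on sub_title only. That is consistent with the responses above, where doc 2 keeps its score of 1.219939 and doc 1 rises from 0.5063205 to 1.012641 once sub_title.std is included.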

Differences from best_fields

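(1) best_fields searches across multiple fields and takes the score of the best-matching field; when several queries tie for the highest score, the scores of the other queries are also taken into account to some degree. Put simply, when you search multiple fields, you want the data in which one single field contains as many of the keywords as possible.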

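Pros: with the best_fields strategy, plus some consideration of the other fields and support for minimum_should_match, the most precisely matching results can be pushed to the very top.

Cons: apart from those precise matches, the remaining results with similar scores are not evenly ordered and are hard to tell apart.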

Real-world example: search engines such as Baidu put the best match first, but the remaining results are hardly differentiated.

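(2) most_fields searches across multiple fields together and lets the query on as many fields as possible contribute to the total score. The result is a hodgepodge, like the outcome at the very beginning of the best_fields case: the results are not necessarily precise, because a document whose single field contains more of the keywords can be outranked by other documents that match on more fields. That is why a field such as sub_title.std is needed: it lets one field match the query string as exactly as possible and contribute a higher score, so that the more precisely matching data is ranked higher.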

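Pros: results that match on as many fields as possible are pushed to the top, and the overall ordering is fairly even.

Cons: the most precisely matching results may not make it to the very top.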

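Real-world example: wiki, an obvious case of the most_fields strategy. The search results are fairly even, but you really do have to page through several pages to find the best match.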
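The difference between the two strategies can be summarized with a simplified scoring sketch. It assumes that best_fields behaves like a dis_max query (best field score plus tie_breaker times the other field scores) and that most_fields simply adds the field scores; any further normalization Elasticsearch may apply is ignored. The function names and the per-field scores are hypothetical and are not taken from the responses in this post.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best single field wins; other fields count only via tie_breaker."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


def most_fields_score(field_scores):
    """Every matching field adds its score to the total."""
    return sum(field_scores)


if __name__ == "__main__":
    # Hypothetical per-field scores for two documents.
    doc_a = [0.9, 0.0, 0.0]  # one field matches very well
    doc_b = [0.5, 0.5, 0.5]  # three fields match moderately

    print("best_fields:", best_fields_score(doc_a), best_fields_score(doc_b))
    print("most_fields:", most_fields_score(doc_a), most_fields_score(doc_b))
```

Under best_fields, doc_a ranks first because its single strongest field dominates. Under most_fields, doc_b ranks first because three moderate matches add up. This is the trade-off described in (1) and (2).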
