Elasticsearch Forum in Practice: Multi-Field Search with the best fields Strategy Using dis_max

Prepare the data

PUT /forum/post/_bulk
{"index":{"_id":1}}
{"title":"java php", "content":" kibana forum open MIjMReACTGaN564AnCZuHg"}
{"index":{"_id":2}}
{"title":"elasticsearch php", "content":"post open 4508327"}
{"index":{"_id":3}}
{"title":"elasticsearch hadoop", "content":"java kibana green open"}

Run the following query and observe the results

GET /forum/post/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "java kibana"
          }
        },
        {
          "match": {
            "content": "java kibana"
          }
        }
      ]
    }
  }
}

Analyzing the results

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.43398,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "post",
        "_id" : "1",
        "_score" : 1.43398,
        "_source" : {
          "title" : "java php",
          "content" : " kibana forum open MIjMReACTGaN564AnCZuHg"
        }
      },
      {
        "_index" : "forum",
        "_type" : "post",
        "_id" : "3",
        "_score" : 1.3988109,
        "_source" : {
          "title" : "elasticsearch hadoop",
          "content" : "java kibana green open"
        }
      }
    ]
  }
}

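We expected doc3 to rank first (its content matches both java and kibana), but doc1 ended up ahead of it.

How the relevance score of each document is calculated for a bool query with should clauses: add up the scores of the queries that match, multiply by the number of matched queries, and divide by the total number of queries.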

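Work out the score of doc1

"match": {"title": "java kibana"}: doc1 gets a score here (its title contains java)
"match": {"content": "java kibana"}: doc1 gets a score here as well (its content contains kibana)

So the two scores are added together, for example 1.1 + 1.2 = 2.3
Number of matched queries = 2
Total number of queries = 2

2.3 * 2 / 2 = 2.3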

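Work out the score of doc3

"match": {"title": "java kibana"}: doc3 gets no score here (its title, "elasticsearch hadoop", contains neither term)
"match": {"content": "java kibana"}: doc3 gets a score here (its content contains both java and kibana)

So only one query produces a score, for example 2.3
Number of matched queries = 1
Total number of queries = 2

2.3 * 1 / 2 = 1.15

doc3's score = 1.15 < doc1's score = 2.3, so doc1 ranks first. These numbers are only illustrative; the actual scores in the response above are 1.43398 for doc1 and 1.3988109 for doc3.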
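The arithmetic above can be replayed in a few lines of Python. This is only a sketch of the formula as the article describes it, not of Elasticsearch internals: the function name `bool_should_score` is hypothetical, and the clause scores 1.1, 1.2 and 2.3 are the illustrative values used in the text.

```python
def bool_should_score(clause_scores, total_clauses):
    """Sum of matching clause scores * matched count / total clause count.

    A clause that does not match the document is passed as None.
    This mirrors the formula described in the article; it is not
    Elasticsearch's actual implementation.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / total_clauses


# doc1: both the title clause and the content clause match (illustrative scores).
doc1_score = bool_should_score([1.1, 1.2], total_clauses=2)

# doc3: only the content clause matches, but it matches both terms.
doc3_score = bool_should_score([None, 2.3], total_clauses=2)

print(round(doc1_score, 2), round(doc3_score, 2))
```

Under this formula doc1 outranks doc3 even though no single field of doc1 matches as well as doc3's content field does.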

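Solution

The best fields strategy means that a document in which a single field matches as many of the keywords as possible should rank first, rather than a document in which many fields each match only a few of the keywords.

The dis_max query simply takes the score of the highest-scoring query among its queries as the document's score.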
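As a rough illustration of "take the best clause", here is a minimal Python sketch. The function name `dis_max_score` is hypothetical, the clause scores are the same illustrative values as in the analysis above, and the sketch assumes dis_max's optional tie_breaker parameter is left at its default of 0, so non-best clauses contribute nothing.

```python
def dis_max_score(clause_scores):
    """Return the highest score among the matching clauses.

    A clause that does not match the document is passed as None.
    Assumes tie_breaker is 0 (the default), so only the best clause counts.
    """
    matched = [s for s in clause_scores if s is not None]
    return max(matched) if matched else 0.0


scores = {
    "doc1": dis_max_score([1.1, 1.2]),   # title and content each match one term
    "doc3": dis_max_score([None, 2.3]),  # content alone matches both terms
}

# Rank documents from highest to lowest score.
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # [('doc3', 2.3), ('doc1', 1.2)]
```

With the best single clause deciding the score, doc3 moves ahead of doc1, which is the ordering the best fields strategy is after.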

GET /forum/post/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "java kibana"
          }
        },
        {
          "match": {
            "content": "java kibana"
          }
        }
      ]
    }
  }
}

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.3988109,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "post",
        "_id" : "3",
        "_score" : 1.3988109,
        "_source" : {
          "title" : "elasticsearch hadoop",
          "content" : "java kibana green open"
        }
      },
      {
        "_index" : "forum",
        "_type" : "post",
        "_id" : "1",
        "_score" : 0.9808291,
        "_source" : {
          "title" : "java php",
          "content" : " kibana forum open MIjMReACTGaN564AnCZuHg"
        }
      }
    ]
  }
}
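The two responses can be cross-checked against each other. The sketch below copies the `_score` values from the bool/should response and the dis_max response, and assumes (this is an inference, not something the responses state) that dis_max reports each document's best clause score while bool/should reports the sum of its matching clause scores. Under that assumption the difference between the two is the score of the weaker clause.

```python
import math

# _score values copied from the two responses in this article, keyed by _id.
bool_scores = {"1": 1.43398, "3": 1.3988109}
dis_max_scores = {"3": 1.3988109, "1": 0.9808291}

# Assumption: bool/should = best clause + other clause, dis_max = best clause.
# The difference is then the implied score of the non-best clause.
implied_other = {
    doc_id: bool_scores[doc_id] - dis_max_scores[doc_id]
    for doc_id in bool_scores
}

for doc_id, other in sorted(implied_other.items()):
    print(doc_id, round(other, 7))

# dis_max ordering: highest score first.
dis_max_ranking = sorted(dis_max_scores, key=dis_max_scores.get, reverse=True)
print(dis_max_ranking)
```

For doc3 the implied second clause is 0, which fits the fact that only its content field matches. Its score is also identical in both responses, which suggests that the Elasticsearch version used here does not apply the matched/total scaling from the formula described earlier; either way doc1 wins under bool/should and doc3 wins under dis_max.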

You are welcome to visit my personal blog: 小馬部落格

If you have any questions, feel free to ask via the official account 《小馬JAVA》