ElasticSearch match and match_phrase queries

阿新 • Published: 2018-12-25

Problem:

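The index contains a field with the value 『第十人民醫院』 ("Tenth People's Hospital"). Analyzing it with the IK analyzer gives the following tokens: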

POST http://localhost:9200/development_hospitals/_analyze?pretty&field=hospital.names&analyzer=ik

{
  "tokens": [
    {
      "token": "第十",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "十人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "十",
      "start_offset": 1,
      "end_offset": 2,
      "type": "TYPE_CNUM",
      "position": 2
    },
    {
      "token": "人民醫院",
      "start_offset": 2,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "人",
      "start_offset": 2,
      "end_offset": 3,
      "type": "COUNT",
      "position": 5
    },
    {
      "token": "民醫院",
      "start_offset": 3,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "醫院",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 7
    }
  ]
}

A match query for 『第十』, built in Postman, returns results, but a match_phrase query for the same 『第十』 returns nothing at all.
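The Postman request bodies were not preserved in this repost. Assuming they targeted the hospital.names field used in the _analyze call above, they would look roughly like the sketch below. The helper names match_body and match_phrase_body are hypothetical; the bodies would be POSTed to the index's _search endpoint.

```python
import json

# Assumption: the field searched is the one used in the _analyze call above.
FIELD = "hospital.names"


def match_body(text: str) -> dict:
    # match: the text is analyzed, and with the default OR operator a
    # document matches if ANY of the resulting terms is found.
    return {"query": {"match": {FIELD: text}}}


def match_phrase_body(text: str) -> dict:
    # match_phrase: every analyzed term must be found, at the same
    # relative positions it has in the analyzed query.
    return {"query": {"match_phrase": {FIELD: text}}}


if __name__ == "__main__":
    # e.g. POST http://localhost:9200/development_hospitals/_search
    print(json.dumps(match_body("第十"), ensure_ascii=False))
    print(json.dumps(match_phrase_body("第十"), ensure_ascii=False))
```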

Analysis:

Reference: The Definitive Guide [2.x] | Elastic

A phrase search depends on the positions of the terms. Analyzing 『第十』 with ik_max_word gives:

{
  "tokens": [
    {
      "token": "第十",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "十",
      "start_offset": 1,
      "end_offset": 2,
      "type": "TYPE_CNUM",
      "position": 1
    }
  ]
}

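Although both 『第十』 and 『十』 are found in the document, match_phrase additionally requires the analyzed terms to keep exactly the same relative positions. In the query, 『十』 is at position 1, immediately after 『第十』 at position 0. When 『第十人民醫院』 is analyzed with ik_max_word, however, 『十人』 (position 1) sits between 『第十』 (position 0) and 『十』 (position 2), so the phrase does not match.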
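The position argument can be checked with a toy model. The sketch below is a deliberate simplification, not Lucene's implementation: it ignores scoring and slop, treats match as "any term present" and match_phrase as "all terms present at the same relative offsets". The token positions are copied from the ik_max_word outputs above; the function names are hypothetical.

```python
# (term, position) pairs copied from the ik_max_word _analyze output for 第十人民醫院.
DOC_TOKENS = [
    ("第十", 0), ("十人", 1), ("十", 2), ("人民醫院", 3),
    ("人民", 4), ("人", 5), ("民醫院", 6), ("醫院", 7),
]
# ik_max_word output for the query text 第十.
QUERY_TOKENS = [("第十", 0), ("十", 1)]


def positions(tokens):
    """Map each term to the set of positions where it occurs."""
    index = {}
    for term, pos in tokens:
        index.setdefault(term, set()).add(pos)
    return index


def match_hits(doc, query):
    """Simplified match (OR operator): true if any query term occurs in doc."""
    index = positions(doc)
    return any(term in index for term, _ in query)


def phrase_hits(doc, query):
    """Simplified match_phrase with slop 0: for some start position of the
    first query term, every query term must sit at the same offset from it
    in the document as it does in the query."""
    index = positions(doc)
    first_term, first_pos = query[0]
    for start in index.get(first_term, set()):
        if all(start + (pos - first_pos) in index.get(term, set())
               for term, pos in query):
            return True
    return False


print(match_hits(DOC_TOKENS, QUERY_TOKENS))   # True
print(phrase_hits(DOC_TOKENS, QUERY_TOKENS))  # False: 十 is at 2, not 0 + 1
```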

Solution:

Analyzing with ik_smart avoids this problem. The ik_smart output for 『第十人民醫院』 and for 『第十』 is, respectively:

{
  "tokens": [
    {
      "token": "第十",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "人民醫院",
      "start_offset": 2,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}

{
  "tokens": [
    {
      "token": "第十",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    }
  ]
}

Now 『第十』 is at position 0 in both outputs, and the match_phrase query matches reliably.
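Running the ik_smart token lists above through a simplified slop-0 phrase check (a toy model, not Lucene's code; the function name is hypothetical) shows why the match now succeeds. QUERY_FULL assumes the full name is searched, in which case ik_smart produces the same two tokens as for the indexed value.

```python
def phrase_hits(doc, query):
    """Simplified match_phrase, slop 0, no scoring: every query term must
    appear at the same offset from the first term as in the query."""
    index = {}
    for term, pos in doc:
        index.setdefault(term, set()).add(pos)
    first_term, first_pos = query[0]
    return any(
        all(start + (pos - first_pos) in index.get(term, set())
            for term, pos in query)
        for start in index.get(first_term, set())
    )


# ik_smart output for the indexed value 第十人民醫院, copied from above.
DOC_SMART = [("第十", 0), ("人民醫院", 1)]
# ik_smart output for the query 第十.
QUERY_SHORT = [("第十", 0)]
# Assumed query for the full name; same analyzer, same tokens as the document.
QUERY_FULL = [("第十", 0), ("人民醫院", 1)]

print(phrase_hits(DOC_SMART, QUERY_SHORT))  # True
print(phrase_hits(DOC_SMART, QUERY_FULL))   # True
```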

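Best practice:

match_phrase results are very strict, but they can also miss relevant documents. In my view it works better to combine both approaches in a bool query and give the match_phrase clause a boost. For example, index name twice with two different analyzers, then query the ik_smart version (name1 below) with match_phrase and the ik_max_word version (name2) with match: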

{
  "query": {
    "bool": {
      "should": [
        {"match_phrase": {"name1": {"query": "第十", "boost": 2}}},
        {"match": {"name2": "第十"}}
      ]
    }
  },
  "explain": true
}
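The query above assumes name1 and name2 hold the same text with different analyzers. A minimal sketch of such an index body follows, assuming the IK plugin is installed and a typeless mapping (Elasticsearch 7.x style; 6.x and earlier nest properties under a type name). The helper names hospital_mapping and hospital_doc are hypothetical.

```python
import json


def hospital_mapping() -> dict:
    """Hypothetical index body: the same name stored in two text fields.
    No search_analyzer is set, so each field's query text is analyzed with
    the same analyzer used at index time."""
    return {
        "mappings": {
            "properties": {
                # Coarse segmentation; queried with match_phrase.
                "name1": {"type": "text", "analyzer": "ik_smart"},
                # Fine-grained segmentation; queried with match.
                "name2": {"type": "text", "analyzer": "ik_max_word"},
            }
        }
    }


def hospital_doc(name: str) -> dict:
    # The application writes the same value into both fields.
    return {"name1": name, "name2": name}


if __name__ == "__main__":
    print(json.dumps(hospital_mapping(), ensure_ascii=False, indent=2))
    print(json.dumps(hospital_doc("第十人民醫院"), ensure_ascii=False))
```

Writing the value twice keeps the sketch close to the name1/name2 query above; multi-fields (the fields mapping parameter) are an alternative that indexes one source value under several analyzers without duplicating it in the document.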

Reposted from: https://zhuanlan.zhihu.com/p/25970549