1. 程式人生 > 其它 >docker下的es安裝分詞器

docker下的es安裝分詞器

技術標籤:docker

所有的語言分詞,預設使用的都是“Standard Analyzer”,但是這些分詞器針對於中文的分詞,並不友好。為此需要安裝中文的分詞器。

注意:不能用預設elasticsearch-plugin install xxx.zip 進行自動安裝
https://github.com/medcl/elasticsearch-analysis-ik/releases/download 對應es版本安裝

第一種方式:

在前面安裝的elasticsearch時,我們已經將elasticsearch容器的“/usr/share/elasticsearch/plugins”目錄,對映到宿主機的“ /mydata/elasticsearch/plugins”目錄下,

所以比較方便的做法就是下載“/elasticsearch-analysis-ik-7.6.2.zip”檔案,將下載分詞器壓縮包解壓到新建的ik目錄下,改變ik目錄許可權,chmod -R 777 ik,
並將ik資料夾上傳到plugins目錄,重啟elasticsearch容器。

如果不嫌麻煩,還可以採用如下的方式。

(1)檢視elasticsearch版本號:

[[email protected] ~]# curl http://localhost:9200
{
  "name" : "0adeb7852e00",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "9gglpP0HTfyOTRAaSe2rIg",
  "version" : {
    "number" : "7.6.2",      #版本號為7.6.2
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
[
[email protected]
~]#
  • 1

(2)進入es容器內部plugin目錄

  • docker exec -it 容器id /bin/bash
[[email protected] ~]# docker exec -it elasticsearch /bin/bash
[[email protected] elasticsearch]# 
  • 1
  • 2
  • wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
[[email protected]
elasticsearch]# pwd /usr/share/elasticsearch #下載ik7.6.2 [[email protected] elasticsearch]# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
  • unzip 下載的檔案
[[email protected] elasticsearch]# unzip elasticsearch-analysis-ik-7.6.2.zip -d ink
Archive:  elasticsearch-analysis-ik-7.6.2.zip
   creating: ik/config/
  inflating: ik/config/main.dic      
  inflating: ik/config/quantifier.dic  
  inflating: ik/config/extra_single_word_full.dic  
  inflating: ik/config/IKAnalyzer.cfg.xml  
  inflating: ik/config/surname.dic   
  inflating: ik/config/suffix.dic    
  inflating: ik/config/stopword.dic  
  inflating: ik/config/extra_main.dic  
  inflating: ik/config/extra_stopword.dic  
  inflating: ik/config/preposition.dic  
  inflating: ik/config/extra_single_word_low_freq.dic  
  inflating: ik/config/extra_single_word.dic  
  inflating: ik/elasticsearch-analysis-ik-7.6.2.jar  
  inflating: ik/httpclient-4.5.2.jar  
  inflating: ik/httpcore-4.4.4.jar   
  inflating: ik/commons-logging-1.2.jar  
  inflating: ik/commons-codec-1.9.jar  
  inflating: ik/plugin-descriptor.properties  
  inflating: ik/plugin-security.policy  
[[email protected] elasticsearch]#
#移動到plugins目錄下
[[email protected] elasticsearch]# mv ik plugins/
  • rm -rf *.zip
[[email protected] elasticsearch]# rm -rf elasticsearch-analysis-ik-7.6.2.zip 
  • 1

確認是否安裝好了分詞器

(2)測試分詞器

使用預設

GET my_index/_analyze
{
   "text":"我是中國人"
}
  • 1

請觀察執行結果:

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "中",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "國",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    }
  ]
}
  • 1
GET my_index/_analyze
{
   "analyzer": "ik_smart", 
   "text":"我是中國人"
}
  • 1

輸出結果:

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中國人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

  • 1
GET my_index/_analyze
{
   "analyzer": "ik_max_word", 
   "text":"我是中國人"
}
  • 1

輸出結果:

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中國人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中國",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "國人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}