Installing a Chinese analyzer for Elasticsearch under Docker
阿新 · Published: 2021-01-31
Tags: docker
By default, text in every language is tokenized with the "Standard Analyzer", but this analyzer does not handle Chinese well, so we need to install a Chinese analyzer.
Note: you cannot use the default elasticsearch-plugin install xxx.zip command to install it automatically.
Download the release that matches your ES version from https://github.com/medcl/elasticsearch-analysis-ik/releases/download
Method 1:
When we installed Elasticsearch earlier, we already mapped the container's "/usr/share/elasticsearch/plugins" directory to "/mydata/elasticsearch/plugins" on the host.
The convenient approach is therefore to download the "elasticsearch-analysis-ik-7.6.2.zip" file, extract the analyzer package into a newly created ik directory, change the directory's permissions with chmod -R 777 ik,
and place the ik folder in the host's plugins directory, then restart the elasticsearch container. A sketch of these steps follows below.
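A minimal sketch of this first method, run on the host (assuming the host directory /mydata/elasticsearch/plugins is mapped as described above and the container is named elasticsearch):

cd /mydata/elasticsearch/plugins
# download the release that matches ES 7.6.2
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
# extract into a new ik directory and relax its permissions
unzip elasticsearch-analysis-ik-7.6.2.zip -d ik
chmod -R 777 ik
rm -f elasticsearch-analysis-ik-7.6.2.zip
# restart the container so the plugin is loaded on startup
docker restart elasticsearch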
Method 2: if you don't mind a few extra steps, you can also install the plugin from inside the container, as follows.
First, check the running Elasticsearch version so you can download the matching plugin release:
[root@host ~]# curl http://localhost:9200
{
"name" : "0adeb7852e00",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "9gglpP0HTfyOTRAaSe2rIg",
"version" : {
"number" : "7.6.2", #版本號為7.6.2
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
"build_date" : "2020-03-26T06:34:37.794943Z",
"build_snapshot" : false,
"lucene_version" : "8.4.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
[root@host ~]#
- Enter the container: docker exec -it <container id> /bin/bash
[root@host ~]# docker exec -it elasticsearch /bin/bash
[root@0adeb7852e00 elasticsearch]#
- wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
[root@0adeb7852e00 elasticsearch]# pwd
/usr/share/elasticsearch
# download ik 7.6.2
[root@0adeb7852e00 elasticsearch]# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
- Unzip the downloaded file
[root@0adeb7852e00 elasticsearch]# unzip elasticsearch-analysis-ik-7.6.2.zip -d ik
Archive: elasticsearch-analysis-ik-7.6.2.zip
creating: ik/config/
inflating: ik/config/main.dic
inflating: ik/config/quantifier.dic
inflating: ik/config/extra_single_word_full.dic
inflating: ik/config/IKAnalyzer.cfg.xml
inflating: ik/config/surname.dic
inflating: ik/config/suffix.dic
inflating: ik/config/stopword.dic
inflating: ik/config/extra_main.dic
inflating: ik/config/extra_stopword.dic
inflating: ik/config/preposition.dic
inflating: ik/config/extra_single_word_low_freq.dic
inflating: ik/config/extra_single_word.dic
inflating: ik/elasticsearch-analysis-ik-7.6.2.jar
inflating: ik/httpclient-4.5.2.jar
inflating: ik/httpcore-4.4.4.jar
inflating: ik/commons-logging-1.2.jar
inflating: ik/commons-codec-1.9.jar
inflating: ik/plugin-descriptor.properties
inflating: ik/plugin-security.policy
[root@0adeb7852e00 elasticsearch]#
# move ik into the plugins directory
[root@0adeb7852e00 elasticsearch]# mv ik plugins/
- rm -rf *.zip
[root@0adeb7852e00 elasticsearch]# rm -rf elasticsearch-analysis-ik-7.6.2.zip
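The plugin is only loaded when Elasticsearch starts, so exit the container and restart it. As a sketch, you can also check that the plugin directory is recognized with elasticsearch-plugin list (the container name elasticsearch is the one used above):

exit
docker restart elasticsearch
# after the container is back up, "ik" should appear in the plugin list
docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-plugin list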
Confirm that the analyzer was installed successfully.
Using the default analyzer:
GET my_index/_analyze
{
"text":"我是中國人"
}
Observe the result:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "中",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "國",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "人",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 4
}
]
}
GET my_index/_analyze
{
"analyzer": "ik_smart",
"text":"我是中國人"
}
Output:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中國人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
}
]
}
GET my_index/_analyze
{
"analyzer": "ik_max_word",
"text":"我是中國人"
}
Output:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中國人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "中國",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "國人",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 4
}
]
}
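The _analyze requests above are written in console (Kibana Dev Tools) syntax. As a sketch, the same check can also be run with curl directly against the REST API, assuming Elasticsearch is reachable on localhost:9200 and the index my_index exists as in the examples above:

curl -X POST "http://localhost:9200/my_index/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"analyzer": "ik_max_word", "text": "我是中國人"}'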