A Summary of Elasticsearch Analyzers
阿新 • Published: 2018-11-02
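I. The ik and pinyin analyzers
Today I used an address book to demo Elasticsearch search. When searching by name, I wanted both Chinese characters and pinyin to match, so in addition to ik, the Chinese analyzer I normally use, I downloaded the pinyin analyzer. Here is a summary of how I used them:
1. Download
ik: https://github.com/medcl/elasticsearch-analysis-ik
pinyin: https://github.com/medcl/elasticsearch-analysis-pinyin
2. Installation
Unzip the downloaded files into the plugins directory under the Elasticsearch installation directory; you can create ik and pinyin subdirectories for them. Restart Elasticsearch afterwards so that the plugins are loaded.
For some reason I could not download the zip file for the pinyin analyzer directly, so I downloaded the source code, built the package myself, and unzipped the result into plugins/pinyin.
3. Testing the pinyin analyzer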
GET _analyze?pretty
{
  "analyzer": "pinyin",
  "text": "劉德華"
}
Result:
{
  "tokens": [
    {
      "token": "liu",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "de",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 1
    },
    {
      "token": "hua",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    },
    {
      "token": "ldh",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    }
  ]
}
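To make the result easier to read, here is a minimal Python sketch that groups the tokens by position. It works on an abbreviated copy of the response shown above (offsets and type left out) and does not contact Elasticsearch; the function name tokens_by_position is my own and is not part of any client library.

```python
import json

# Abbreviated copy of the _analyze response above: only token and position.
sample_response = json.loads("""
{
  "tokens": [
    {"token": "liu", "position": 0},
    {"token": "de",  "position": 1},
    {"token": "hua", "position": 2},
    {"token": "ldh", "position": 2}
  ]
}
""")


def tokens_by_position(response):
    """Group token strings by position, keeping the order of the response."""
    grouped = {}
    for entry in response["tokens"]:
        grouped.setdefault(entry["position"], []).append(entry["token"])
    return grouped


print(tokens_by_position(sample_response))
```

The grouping shows that the analyzer emits one full-pinyin token per character plus a first-letter token (ldh) for the whole name, which is what allows a search by initials to match.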
4. Analyzer configuration in the index template
The analyzers are configured in the settings section of the template:
"analysis" : {
  "analyzer" : {
    "ik" : {
      "tokenizer" : "ik_max_word"
    },
    "pinyin_analyzer" : {
      "tokenizer" : "my_pinyin"
    }
  },
  "tokenizer" : {
    "my_pinyin" : {
      "keep_separate_first_letter" : "false",
      "lowercase" : "true",
      "type" : "pinyin",
      "limit_first_letter_length" : "16",
      "remove_duplicated_term" : "true",
      "keep_original" : "true",
      "keep_full_pinyin" : "true",
      "keep_joined_full_pinyin" : "true",
      "keep_none_chinese_in_joined_full_pinyin" : "true"
    }
  }
}
The options under my_pinyin are described in the documentation at https://github.com/medcl/elasticsearch-analysis-pinyin; configure them to suit your own needs.
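For reference, the analysis fragment above has to sit inside a complete template body before it can be sent to Elasticsearch. The sketch below builds such a body in Python. It assumes Elasticsearch 6.x, where the legacy _template API uses the index_patterns key, and the pattern sim* is a guess of mine based on the index name used in the search example; the build_template helper is likewise hypothetical.

```python
import json

# The analysis fragment from the settings above, as a Python dict.
analysis = {
    "analyzer": {
        "ik": {"tokenizer": "ik_max_word"},
        "pinyin_analyzer": {"tokenizer": "my_pinyin"},
    },
    "tokenizer": {
        "my_pinyin": {
            "type": "pinyin",
            "keep_separate_first_letter": "false",
            "lowercase": "true",
            "limit_first_letter_length": "16",
            "remove_duplicated_term": "true",
            "keep_original": "true",
            "keep_full_pinyin": "true",
            "keep_joined_full_pinyin": "true",
            "keep_none_chinese_in_joined_full_pinyin": "true",
        }
    },
}


def build_template(index_patterns, analysis):
    """Wrap an analysis fragment in a template body (6.x-style, assumed)."""
    return {
        "index_patterns": list(index_patterns),
        "settings": {"analysis": analysis},
    }


# "sim*" is an assumed pattern, not taken from the original template.
template_body = build_template(["sim*"], analysis)
print(json.dumps(template_body, indent=2))
```

The printed JSON could then be used as the body of a PUT _template request, with the mappings from the next step added alongside settings.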
5. Creating the type in the mapping
A single property can be indexed with several analyzers by declaring them under fields:
"mappings": {
  "doc": {
    "properties": {
      "PERSON_ENAME": {
        "type" : "text",
        "fields" : {
          "ik" : {"type" : "text", "analyzer" : "ik"},
          "english" : {"type" : "text", "analyzer" : "english"},
          "standard" : {"type" : "text"}
        }
      },
      "CONTACTER_NAME": {
        "type" : "text",
        "fields" : {
          "ik" : {"type" : "text", "analyzer" : "ik"},
          "pinyin" : {"type" : "text", "analyzer" : "pinyin_analyzer"},
          "standard" : {"type" : "text"}
        }
      }
    }
  }
}
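A mapping like this only works if every analyzer it names is either built in or defined in the template settings. As a rough sanity check, the Python sketch below scans the sub-fields and reports any analyzer name it cannot account for. The function undefined_analyzers and the BUILT_IN_ANALYZERS set are my own; the set is a partial list of Elasticsearch's built-in analyzers, not an exhaustive one.

```python
# Partial list of built-in analyzers (assumption: enough for this mapping).
BUILT_IN_ANALYZERS = {
    "standard", "simple", "whitespace", "stop",
    "keyword", "pattern", "fingerprint", "english",
}

# The "properties" section of the mapping above.
properties = {
    "PERSON_ENAME": {
        "type": "text",
        "fields": {
            "ik": {"type": "text", "analyzer": "ik"},
            "english": {"type": "text", "analyzer": "english"},
            "standard": {"type": "text"},
        },
    },
    "CONTACTER_NAME": {
        "type": "text",
        "fields": {
            "ik": {"type": "text", "analyzer": "ik"},
            "pinyin": {"type": "text", "analyzer": "pinyin_analyzer"},
            "standard": {"type": "text"},
        },
    },
}


def undefined_analyzers(properties, custom_analyzers):
    """Return (sub-field path, analyzer) pairs whose analyzer is unknown."""
    missing = []
    for field, spec in properties.items():
        for sub, sub_spec in spec.get("fields", {}).items():
            # A text field without an explicit analyzer uses "standard".
            name = sub_spec.get("analyzer", "standard")
            if name not in custom_analyzers and name not in BUILT_IN_ANALYZERS:
                missing.append((f"{field}.{sub}", name))
    return missing


# Custom analyzers defined in the template settings: ik and pinyin_analyzer.
print(undefined_analyzers(properties, {"ik", "pinyin_analyzer"}))
```

With both custom analyzers defined, the list is empty; leaving pinyin_analyzer out of the settings would flag CONTACTER_NAME.pinyin.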
6. Testing
Search across multiple fields:
POST sim/doc/_search
{
  "query": {
    "multi_match" : {
      "query" : "dfbb",
      "fields" : [
        "PERSON_ENAME.ik",
        "PERSON_ENAME.standard",
        "PERSON_ENAME.english",
        "CONTACTER_NAME.ik",
        "CONTACTER_NAME.standard",
        "CONTACTER_NAME.pinyin"
      ]
    }
  }
}
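Writing out every property.sub-field pair by hand gets tedious as more fields are added. Here is a small Python sketch that generates the same multi_match body from a dict of properties and their sub-fields. The helper multi_match_body is hypothetical, and the code only builds the request body; sending it to Elasticsearch is left out.

```python
import json


def multi_match_body(query, subfields):
    """Build a multi_match search body over every property.sub-field pair."""
    fields = [
        f"{prop}.{sub}"
        for prop, subs in subfields.items()
        for sub in subs
    ]
    return {"query": {"multi_match": {"query": query, "fields": fields}}}


# Properties and sub-fields as declared in the mapping above.
body = multi_match_body(
    "dfbb",
    {
        "PERSON_ENAME": ["ik", "standard", "english"],
        "CONTACTER_NAME": ["ik", "standard", "pinyin"],
    },
)
print(json.dumps(body, indent=2))
```

The printed JSON matches the body of the POST sim/doc/_search request above.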