ES: Modifying the Analyzer and Building a Custom Analyzer
By 阿新 • Published 2018-12-19
1. The default analyzer

The default analyzer is `standard`. It is built from the following components:

- standard tokenizer: splits text on word boundaries
- standard token filter: does nothing
- lowercase token filter: converts all letters to lowercase
- stop token filter (disabled by default): removes stopwords such as "a", "the", "it", etc.
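The behavior above can be checked with the `_analyze` API (this works against any running cluster; no index is required). Note that "The" is kept, since the stop filter is disabled by default:

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "The Quick Brown Foxes"
}
```

This returns the tokens `the`, `quick`, `brown`, `foxes`: split on word boundaries and lowercased, with stopwords retained.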
2. Modifying an analyzer's settings
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}
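The `es_std` analyzer defined above is a standard analyzer with the English stopword list enabled. A quick way to verify it (the sample text here is just an illustration) is to run `_analyze` against the index:

```
GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
```

This returns only `dog` and `house`; the English stopwords "a", "is", "in", and "the" are removed.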
3. Building a custom analyzer
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [ "&=> and" ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [ "the", "a" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip", "&_to_and" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_stopwords" ]
        }
      }
    }
  }
}
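Each stage of `my_analyzer` can be observed with `_analyze` (the sample text is an illustration chosen to exercise every component: an HTML tag for `html_strip`, an `&` for the mapping char filter, and the stopwords "a" and "the"):

```
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!"
}
```

The result is the tokens `tom`, `and`, `jerry`, `are`, `friend`, `in`, `house`, `haha`: the `<a>` tag is stripped, `&` becomes "and", everything is lowercased, and "a" and "the" are removed by `my_stopwords`.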
Then apply the custom analyzer to a field in the mapping:

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
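With the mapping in place, documents indexed into `content` are analyzed with `my_analyzer` at index time. A small sketch (the document and query text here are illustrative assumptions, not from the original post): because the char filter maps `&` to "and" before tokenization, a search for "and" matches a document that only contains "&":

```
PUT /my_index/my_type/1
{
  "content": "tom&jerry are good friends"
}

GET /my_index/my_type/_search
{
  "query": {
    "match": { "content": "and" }
  }
}
```

The document is returned because the `match` query applies the same analyzer to the query string that was used to index the field.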