Elasticsearch: Mixed Queries over Chinese and Pinyin Tokenization, plus CompletionSuggestion
阿新 • Published: 2019-01-10
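Introduction
Earlier posts covered how to set up an Elasticsearch server, create a simple index, and enable Chinese tokenization. This post shows how to support Chinese tokenization and pinyin tokenization in the same Elasticsearch index, and how to implement search suggestions similar to those in the Baidu search box.
Mixed queries
There are many ways to implement mixed queries. The one described here is what I consider the lazy approach: for each field you want to search by pinyin, add two extra fields, one holding the full pinyin and one holding the pinyin initials. I use the Employee example from the official documentation: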
import java.io.Serializable;
import java.util.List;

public class Employee implements Serializable {
    private String firstName;
    private String lastName;
    private String pinyin;  // full pinyin of firstName
    private String header;  // pinyin initials of firstName
    private int age;
    private String about;
    private List<String> interests;
    // ... getters and setters omitted
}
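The article fills in the pinyin and header fields by hand. As a minimal sketch of how the two values relate, the helper below derives both from a list of pinyin syllables. It assumes the syllables have already been produced by some pinyin conversion step that is not part of the article; the class and method names (PinyinFields, fullPinyin, initials) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Sketch: derive the two helper field values from per-syllable pinyin. */
final class PinyinFields {

    /** Concatenates the syllables into the full-pinyin form. */
    static String fullPinyin(List<String> syllables) {
        StringBuilder sb = new StringBuilder();
        for (String s : syllables) {
            sb.append(s.trim().toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    /** Takes the first letter of each non-empty syllable. */
    static String initials(List<String> syllables) {
        StringBuilder sb = new StringBuilder();
        for (String s : syllables) {
            String t = s.trim();
            if (!t.isEmpty()) {
                sb.append(Character.toLowerCase(t.charAt(0)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed input: the syllables of 告白氣球, supplied by the caller.
        List<String> syllables = Arrays.asList("gao", "bai", "qi", "qiu");
        System.out.println(fullPinyin(syllables)); // gaobaiqiqiu
        System.out.println(initials(syllables));   // gbqq
    }
}
```

The two results correspond to the values that the test data later passes to setPinyin and setHeader.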
Next, add the settings and mapping for the index:
try {
    // Index settings: register an analyzer that uses the ik_smart tokenizer.
    XContentBuilder settings = XContentFactory.jsonBuilder();
    settings.startObject()
            .startObject("analysis")
                .startObject("analyzer")
                    .startObject("ik_analyzer").field("tokenizer", "ik_smart")
                    .endObject()
                .endObject()
            .endObject()
            .endObject();
    CreateIndexRequest createIndexRequest = new CreateIndexRequest(index).settings(settings);
    CreateIndexResponse createIndexResponse = esClient.admin().indices().create(createIndexRequest).get();
    logger.info("Index:{} created,response:{}", index, JSON.toJSON(createIndexResponse));

    // Mapping: Chinese text fields use ik_smart; the two helper fields use the pinyin analyzer.
    XContentBuilder builder = XContentFactory.jsonBuilder();
    builder.startObject()
            .startObject(type)
                .startObject("properties")
                    .startObject("firstName").field("type", "string").field("analyzer", "ik_smart")
                    /* .field("search_analyzer","ik_smart").field("preserve_separators",false)
                       .field("preserve_position_increments",false) */
                    .endObject()
                    .startObject("lastName").field("type", "string").field("analyzer", "ik_smart")
                    .endObject()
                    .startObject("pinyin").field("type", "string").field("analyzer", "pinyin")
                    .endObject()
                    .startObject("header").field("type", "string").field("analyzer", "pinyin")
                    .endObject()
                    .startObject("about").field("type", "string").field("analyzer", "ik_smart")
                    .endObject()
                    .startObject("interests").field("type", "string").field("analyzer", "ik_smart")
                    .endObject()
                .endObject()
            .endObject()
            .endObject();
    PutMappingRequest putMappingRequest = new PutMappingRequest(index);
    putMappingRequest.type(type);
    putMappingRequest.source(builder);
    PutMappingResponse putMappingResponse = esClient.admin().indices().putMapping(putMappingRequest).get();
    logger.info("Mapping for `{}.{}` put, response:{}", index, type, JSON.toJSON(putMappingResponse));
    return true;
} catch (Exception e) {
    logger.error("doCreateIndex", e);
    return false;
}
Add some test data. Here I simply use a bulk-indexing method:
public Boolean bulkIndex(List<String> jsonList) {
    // Check the mapping once per index and cache the result.
    if (esIndexTypes.get(index) == null) {
        if (getMapping(index, indexType)) esIndexTypes.put(index, true);
    }
    BulkRequestBuilder bulkBuilder = esClient.prepareBulk();
    for (String s : jsonList) {
        IndexRequestBuilder requestBuilder = esClient.prepareIndex(index, indexType)
                .setSource(s);
        bulkBuilder.add(requestBuilder);
    }
    BulkResponse bulkResponse = bulkBuilder.execute().actionGet();
    logger.info("index:{} bulk request, response:{}", index, JSON.toJSON(bulkResponse));
    // Report failure if any item in the bulk request failed.
    return !bulkResponse.hasFailures();
}
@org.junit.Test
public void test() {
    List<String> list1 = new ArrayList<>(10000);
    for (int i = 0; i < 10000; i++) {
        Employee employee = new Employee();
        employee.setFirstName("告白氣球" + i);
        employee.setPinyin("gaobaiqiqiu" + i);
        employee.setHeader("gbqq");
        employee.setLastName("周杰倫,日記");
        employee.setAbout("嗚啦啦啦火車笛\n" +
                "\n" +
                "隨著奔騰的馬蹄\n" +
                "\n" +
                "小妹妹吹著口琴\n" +
                "\n" +
                "夕陽下美了剪影\n" +
                "\n" +
                "我用子彈寫日記,我泡妞看電影");
        employee.setAge(18);
        List<String> list = new ArrayList<String>();
        list.add("喜歡打籃球");
        list.add("在大晴天晒太陽");
        list.add("泡妞看電影");
        employee.setInterests(list);
        list1.add(JSON.toJSONString(employee));
    }
    boolean index = esProxy.bulkIndex(list1);
}
Finally, search directly for gaobaiqiqiu or gbqq. The returned data looks like this:
[{"firstName":"告白氣球","lastName":"周杰倫,日記","pinyin":"gaobaiqiqiu","about":"嗚啦啦啦火車笛\n\n隨著奔騰的馬蹄\n\n小妹妹吹著口琴\n\n夕陽下美了剪影\n\n我用子彈寫日記,我泡妞看電影","header":"gbqq","interests":["喜歡打籃球","在大晴天晒太陽","泡妞看電影"],"age":18}]
If you search directly for 告白, the returned data looks like this:
[{"firstName":"<span style=\"color:red\">告白</span>氣球","lastName":"周杰倫,日記","pinyin":"gaobaiqiqiu","about":"嗚啦啦啦火車笛\n\n隨著奔騰的馬蹄\n\n小妹妹吹著口琴\n\n夕陽下美了剪影\n\n我用子彈寫日記,我泡妞看電影","header":"gbqq","interests":["喜歡打籃球","在大晴天晒太陽","泡妞看電影"],"age":18}]
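The article does not show the query that produced these results. One plausible way to get them, assumed here rather than taken from the author's code, is a multi_match query across firstName, pinyin and header, with highlighting on firstName. The sketch below only builds the request body as a string for the _search endpoint; the class name MixedQueryBody and its methods are hypothetical, and the highlight tags are copied from the sample output above.

```java
/** Sketch: build a _search body that queries the Chinese and pinyin fields together. */
final class MixedQueryBody {

    /** Escapes backslashes, double quotes and control characters for a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** multi_match over firstName, pinyin and header, highlighting matches in firstName. */
    static String build(String keyword) {
        String q = jsonEscape(keyword);
        String preTag = jsonEscape("<span style=\"color:red\">");
        return "{"
                + "\"query\":{\"multi_match\":{"
                + "\"query\":\"" + q + "\","
                + "\"fields\":[\"firstName\",\"pinyin\",\"header\"]}},"
                + "\"highlight\":{"
                + "\"pre_tags\":[\"" + preTag + "\"],"
                + "\"post_tags\":[\"</span>\"],"
                + "\"fields\":{\"firstName\":{}}}"
                + "}";
    }

    public static void main(String[] args) {
        // The same body shape serves Chinese input, full pinyin, or initials.
        System.out.println(build("gbqq"));
    }
}
```

Because all three fields sit in one query, the caller does not need to detect whether the input is Chinese, full pinyin, or initials; whichever field matches contributes the hit.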
CompletionSuggestion query suggestions
To use CompletionSuggestion, the mapping has to change: the field that provides real-time suggestions must use the completion type.
XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject()
        .startObject(type)
            .startObject("properties")
                // firstName is now a completion field so it can serve suggestions.
                .startObject("firstName").field("type", "completion").field("analyzer", "ik_smart")
                    .field("search_analyzer", "ik_smart").field("preserve_separators", false)
                    .field("preserve_position_increments", false)
                .endObject()
                .startObject("lastName").field("type", "string").field("analyzer", "ik_smart")
                .endObject()
                .startObject("pinyin").field("type", "string").field("analyzer", "pinyin")
                .endObject()
                .startObject("header").field("type", "string").field("analyzer", "pinyin")
                .endObject()
                .startObject("about").field("type", "string").field("analyzer", "ik_smart")
                .endObject()
                .startObject("interests").field("type", "string").field("analyzer", "ik_smart")
                .endObject()
            .endObject()
        .endObject()
        .endObject();
At query time, use CompletionSuggestionBuilder.
public void searchSuggest(String str) {
    CompletionSuggestionBuilder suggestionBuilder = new CompletionSuggestionBuilder("firstName");
    suggestionBuilder.analyzer("ik_smart");
    suggestionBuilder.text(str);
    SearchResponse response = esClient.prepareSearch(index).setTypes(indexType)
            .setQuery(QueryBuilders.matchAllQuery())
            .suggest(new SuggestBuilder().addSuggestion("my-suggest-1", suggestionBuilder))
            .get();
    Suggest suggest = response.getSuggest();
    CompletionSuggestion suggestion = suggest.getSuggestion("my-suggest-1");
    // Print the score and text of every option returned for the input.
    for (CompletionSuggestion.Entry entry : suggestion.getEntries()) {
        for (CompletionSuggestion.Entry.Option option : entry.getOptions()) {
            System.out.println(option.getScore() + "--" + option.getText());
        }
    }
}
You can also use the REST API: http://192.168.10.xxx:9200/megacorp/_search?pretty
Here megacorp is the index name, and the request body is:
{
  "size": 0,
  "suggest": {
    "my-suggest-1": {
      "prefix": "someone li",
      "completion": {
        "field": "firstName"
      }
    }
  }
}
The query returns:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"suggest": {
"my-suggest-1": [
{
"text": "someone li",
"offset": 0,
"length": 10,
"options": [
{
"text": "someone like you",
"_index": "megacorp",
"_type": "employee",
"_id": "AV_doqcXKY206Vs3lcCO",
"_score": 1,
"_source": {
"about": "嗚啦啦啦火車笛\n\n隨著奔騰的馬蹄\n\n小妹妹吹著口琴\n\n夕陽下美了剪影\n\n我用子彈寫日記,我泡妞看電影",
"age": 18,
"firstName": "someone like you",
"interests": [
"喜歡打籃球",
"在大晴天晒太陽",
"泡妞看電影"
],
"lastName": "周杰倫,日記"
}
}
]
}
]
}
}
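The REST request above can also be issued from plain Java without the Elasticsearch client. The following is a minimal sketch under stated assumptions: the class name SuggestRequest and its methods are hypothetical, the base URL is supplied by the caller (the host in the article is elided and is not filled in here), and error handling is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Sketch: send a completion-suggest request over HTTP using only the JDK. */
final class SuggestRequest {

    /** Minimal JSON escaping for the values used here (backslash and double quote only). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the same body shape as the REST example: size 0 plus one completion suggestion. */
    static String buildBody(String suggestName, String field, String prefix) {
        return "{\"size\":0,\"suggest\":{\"" + escape(suggestName) + "\":{"
                + "\"prefix\":\"" + escape(prefix) + "\","
                + "\"completion\":{\"field\":\"" + escape(field) + "\"}}}}";
    }

    /** POSTs the body to {baseUrl}/{index}/_search?pretty and returns the raw response text. */
    static String send(String baseUrl, String index, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/" + index + "/_search?pretty").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Only prints the body; calling send() needs a reachable Elasticsearch node.
        System.out.println(buildBody("my-suggest-1", "firstName", "someone li"));
    }
}
```

The suggestion name passed to buildBody is the key under which the options appear in the "suggest" section of the response, so the caller should look up results with the same name.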