Mixed Chinese and Pinyin Analysis Queries in Elasticsearch, plus CompletionSuggestion

Introduction

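Earlier posts covered how to set up an Elasticsearch server, create a simple index, and add support for Chinese analysis. This post shows how to make Elasticsearch handle Chinese analysis and pinyin analysis at the same time, and how to implement search suggestions similar to those in the Baidu search box.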

Mixed queries

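There are many ways to implement mixed queries. The one described here is admittedly a lazy one: for each field that should be searchable by pinyin, add two extra fields, one holding the full pinyin and one holding the pinyin initials. The example uses the Employee class from the official documentation: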

public class Employee implements Serializable {

    private String firstName;
    private String lastName;
    private String pinyin;  // full pinyin of firstName
    private String header;  // pinyin initials of firstName
    private int age;
    private String about;
    private List<String> interests;

    // ... getters and setters omitted
}
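The article fills in pinyin and header by hand in its test data. As a minimal sketch of how the two values relate, the hypothetical helper below (PinyinFields and its method names are not from the original) derives both from a list of per-character pinyin syllables supplied by the caller. Converting Chinese characters into syllables is assumed to happen elsewhere, for example with a pinyin library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original article: derives the two
// extra fields from pinyin syllables that the caller already has.
final class PinyinFields {

    // Full pinyin: the syllables joined without separators.
    static String fullPinyin(List<String> syllables) {
        return String.join("", syllables);
    }

    // Initials: the first letter of every non-empty syllable.
    static String initials(List<String> syllables) {
        StringBuilder sb = new StringBuilder();
        for (String syllable : syllables) {
            if (!syllable.isEmpty()) {
                sb.append(syllable.charAt(0));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Syllables for 告白氣球, written out by hand for this example.
        List<String> syllables = Arrays.asList("gao", "bai", "qi", "qiu");
        System.out.println(fullPinyin(syllables)); // gaobaiqiqiu
        System.out.println(initials(syllables));   // gbqq
    }
}
```

The results could then be passed to setPinyin and setHeader before the document is indexed.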

Next, add the settings and the mapping to the index:

    // Body of doCreateIndex (the name comes from the error log below); index and type are supplied by the caller.
    try {
        // Index settings: define an analyzer named ik_analyzer that uses the ik_smart tokenizer.
        XContentBuilder settings = XContentFactory.jsonBuilder();
        settings.startObject()
                .startObject("analysis")
                .startObject("analyzer")
                .startObject("ik_analyzer").field("tokenizer", "ik_smart")
                .endObject()
                .endObject()
                .endObject()
                .endObject();
        CreateIndexRequest createIndexRequest = new CreateIndexRequest(index).settings(settings);
        CreateIndexResponse createIndexResponse = esClient.admin().indices().create(createIndexRequest).get();
        logger.info("Index:{} created,response:{}", index, JSON.toJSON(createIndexResponse));

        // Mapping: Chinese text fields use ik_smart, the two pinyin fields use the pinyin analyzer.
        XContentBuilder builder = XContentFactory.jsonBuilder();
        builder.startObject()
                .startObject(type)
                .startObject("properties")
                .startObject("firstName").field("type", "string").field("analyzer", "ik_smart")
                /* .field("search_analyzer","ik_smart").field("preserve_separators",false)
                   .field("preserve_position_increments",false) */
                .endObject()
                .startObject("lastName").field("type", "string").field("analyzer", "ik_smart")
                .endObject()
                .startObject("pinyin").field("type", "string").field("analyzer", "pinyin")
                .endObject()
                .startObject("header").field("type", "string").field("analyzer", "pinyin")
                .endObject()
                .startObject("about").field("type", "string").field("analyzer", "ik_smart")
                .endObject()
                .startObject("interests").field("type", "string").field("analyzer", "ik_smart")
                .endObject()
                .endObject()   // properties
                .endObject()   // type
                .endObject();  // root
        PutMappingRequest putMappingRequest = new PutMappingRequest(index);
        putMappingRequest.type(type);
        putMappingRequest.source(builder);
        PutMappingResponse putMappingResponse = esClient.admin().indices().putMapping(putMappingRequest).get();
        logger.info("Mapping for `{}.{}` putted, response:{}", index, type, JSON.toJSON(putMappingResponse));
        return true;
    } catch (Exception e) {
        logger.error("doCreateIndex", e);
        return false;
    }
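Long startObject/endObject chains make it easy to lose track of nesting. As an illustration only, the sketch below rebuilds the intended mapping shape with plain java.util maps, so that each field is visibly one level below properties. The class EmployeeMappingSketch and its method names are assumptions for this example and are not part of the article's code; the type name passed in main is just a sample value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the mapping as nested maps instead of a builder chain.
final class EmployeeMappingSketch {

    // One field of type "string" analyzed by the given analyzer.
    static Map<String, Object> textField(String analyzer) {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "string");
        field.put("analyzer", analyzer);
        return field;
    }

    // Shape: { typeName: { "properties": { fieldName: {...}, ... } } }
    static Map<String, Object> mapping(String typeName) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("firstName", textField("ik_smart"));
        properties.put("lastName", textField("ik_smart"));
        properties.put("pinyin", textField("pinyin"));
        properties.put("header", textField("pinyin"));
        properties.put("about", textField("ik_smart"));
        properties.put("interests", textField("ik_smart"));

        Map<String, Object> type = new LinkedHashMap<>();
        type.put("properties", properties);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put(typeName, type);
        return root;
    }

    public static void main(String[] args) {
        // Prints the nested structure using Map.toString().
        System.out.println(mapping("employee"));
    }
}
```

Comparing this structure with the builder chain is a quick way to check that every field has exactly one opening and one closing call.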

Add a few test documents. Here I index them in bulk:

    public Boolean bulkIndex(List<String> jsonList){

        if(esIndexTypes.get(index)==null) {
            if(getMapping(index, indexType)) esIndexTypes.put(index,true);
        }

        BulkRequestBuilder bulkBuilder= esClient.prepareBulk();
        for (String s : jsonList) {
            IndexRequestBuilder requestBuilder = esClient.prepareIndex(index, indexType)
                    .setSource(s);
           bulkBuilder.add(requestBuilder);
        }

        BulkResponse bulkResponse = bulkBuilder.execute().actionGet();
        logger.info("index:{} bulk request,:response:{}",index,JSON.toJSON(bulkResponse));
        // Report failure if any item in the bulk request failed.
        return !bulkResponse.hasFailures();
    }

    @org.junit.Test
    public void test(){
        List<String> list1 = new ArrayList<>(10000);
        for (int i=0;i<10000;i++) {
            Employee employee = new Employee();
            employee.setFirstName("告白氣球"+i);
            employee.setPinyin("gaobaiqiqiu"+i);
            employee.setHeader("gbqq");
            employee.setLastName("周杰倫,日記");
            employee.setAbout("嗚啦啦啦火車笛\n" +
                    "\n" +
                    "隨著奔騰的馬蹄\n" +
                    "\n" +
                    "小妹妹吹著口琴\n" +
                    "\n" +
                    "夕陽下美了剪影\n" +
                    "\n" +
                    "我用子彈寫日記,我泡妞看電影");
            employee.setAge(18);
            List<String> list = new ArrayList<String>();
            list.add("喜歡打籃球");
            list.add("在大晴天晒太陽");
            list.add("泡妞看電影");
            employee.setInterests(list);
            list1.add(JSON.toJSONString(employee));
        }

        // The bulk request should report success.
        org.junit.Assert.assertTrue(esProxy.bulkIndex(list1));
    }

Finally, searching directly for gaobaiqiqiu or gbqq returns data like this:

[{"firstName":"告白氣球","lastName":"周杰倫,日記","pinyin":"gaobaiqiqiu","about":"嗚啦啦啦火車笛\n\n隨著奔騰的馬蹄\n\n小妹妹吹著口琴\n\n夕陽下美了剪影\n\n我用子彈寫日記,我泡妞看電影","header":"gbqq","interests":["喜歡打籃球","在大晴天晒太陽","泡妞看電影"],"age":18}]

Searching directly for 告白 returns data like this:

[{"firstName":"<span style=\"color:red\">告白</span>氣球","lastName":"周杰倫,日記","pinyin":"gaobaiqiqiu","about":"嗚啦啦啦火車笛\n\n隨著奔騰的馬蹄\n\n小妹妹吹著口琴\n\n夕陽下美了剪影\n\n我用子彈寫日記,我泡妞看電影","header":"gbqq","interests":["喜歡打籃球","在大晴天晒太陽","泡妞看電影"],"age":18}]
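The article does not show the query that produced these results. One plausible way to get them, assuming a multi_match query over firstName, pinyin and header plus a highlight section on firstName with a red span as the pre-tag, is sketched below. MixedQuerySketch and its methods are hypothetical names, and the request body is built with plain string concatenation so the example has no dependencies.

```java
// Hypothetical sketch: builds a search body that looks for one keyword in
// the Chinese field and in both pinyin fields, highlighting firstName hits.
final class MixedQuerySketch {

    // Minimal JSON string escaping: quotes, backslashes, control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // The field list and the highlight tags are assumptions based on the
    // mapping and on the highlighted result shown in the article.
    static String buildBody(String keyword) {
        return "{"
                + "\"query\":{\"multi_match\":{"
                + "\"query\":\"" + escape(keyword) + "\","
                + "\"fields\":[\"firstName\",\"pinyin\",\"header\"]}},"
                + "\"highlight\":{"
                + "\"pre_tags\":[\"<span style=\\\"color:red\\\">\"],"
                + "\"post_tags\":[\"</span>\"],"
                + "\"fields\":{\"firstName\":{}}}"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(buildBody("gbqq"));
        System.out.println(buildBody("告白"));
    }
}
```

Because only firstName is highlighted in this sketch, a pinyin keyword would match without any highlight markup, while a Chinese keyword would come back wrapped in the span.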

Query suggestions with CompletionSuggestion

To use CompletionSuggestion, the mapping has to change: the field that drives live suggestions must have the type completion.

    XContentBuilder builder = XContentFactory.jsonBuilder();
    builder.startObject()
            .startObject(type)
            .startObject("properties")
            // The suggestion field must be of type completion.
            .startObject("firstName").field("type", "completion").field("analyzer", "ik_smart")
            .field("search_analyzer", "ik_smart").field("preserve_separators", false)
            .field("preserve_position_increments", false)
            .endObject()
            .startObject("lastName").field("type", "string").field("analyzer", "ik_smart")
            .endObject()
            .startObject("pinyin").field("type", "string").field("analyzer", "pinyin")
            .endObject()
            .startObject("header").field("type", "string").field("analyzer", "pinyin")
            .endObject()
            .startObject("about").field("type", "string").field("analyzer", "ik_smart")
            .endObject()
            .startObject("interests").field("type", "string").field("analyzer", "ik_smart")
            .endObject()
            .endObject()   // properties
            .endObject()   // type
            .endObject();  // root

Queries then need to use CompletionSuggestionBuilder:

public void searchSuggest(String str) {
    // Suggest on the completion-typed firstName field.
    CompletionSuggestionBuilder suggestionBuilder = new CompletionSuggestionBuilder("firstName");
    suggestionBuilder.analyzer("ik_smart");
    suggestionBuilder.text(str);
    SearchResponse response = esClient.prepareSearch(index).setTypes(indexType).setQuery(QueryBuilders.matchAllQuery())
            .suggest(new SuggestBuilder().addSuggestion("my-suggest-1", suggestionBuilder)).get();

    // Print the score and text of every suggested option.
    Suggest suggest = response.getSuggest();
    CompletionSuggestion suggestion = suggest.getSuggestion("my-suggest-1");
    for (CompletionSuggestion.Entry entry : suggestion.getEntries()) {
        for (CompletionSuggestion.Entry.Option option : entry.getOptions()) {
            System.out.println(option.getScore() + "--" + option.getText());
        }
    }
}

You can also use the REST API at http://192.168.10.xxx:9200/megacorp/_search?pretty, where megacorp is the index name, with the following request body:

{
  "size": 0,
  "suggest": {
    "my-suggest-1": {
      "prefix": "someone li",
      "completion": {
        "field": "firstName"
      }
    }
  }
}
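To send this body from Java without the Elasticsearch client, a sketch using only java.net.HttpURLConnection could look like the following. The class SuggestRequestSketch, its method names and the baseUrl parameter are assumptions for illustration; the host is left to the caller, and main only contacts a server when a base URL is passed as the first argument.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: posts a completion-suggest request over plain HTTP.
final class SuggestRequestSketch {

    // Escapes backslashes and double quotes for use inside a JSON string.
    static String quote(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the same body as the REST example, with the values as parameters.
    static String suggestBody(String name, String field, String prefix) {
        return "{\"size\":0,\"suggest\":{\"" + quote(name) + "\":{"
                + "\"prefix\":\"" + quote(prefix) + "\","
                + "\"completion\":{\"field\":\"" + quote(field) + "\"}}}}";
    }

    // Search endpoint for one index, e.g. <baseUrl>/megacorp/_search?pretty.
    static String searchUrl(String baseUrl, String index) {
        return baseUrl + "/" + index + "/_search?pretty";
    }

    // Sends the body as a JSON POST and returns the raw response text.
    static String post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String body = suggestBody("my-suggest-1", "firstName", "someone li");
        System.out.println(body);
        if (args.length > 0) {
            // args[0] is the base URL of the cluster, supplied by the caller.
            System.out.println(post(searchUrl(args[0], "megacorp"), body));
        }
    }
}
```

Keeping the body builder separate from the HTTP call makes it possible to check the request text without a running cluster.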

The query returns:

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": 0,
        "hits": []
    },
    "suggest": {
        "my-suggest-1": [
            {
                "text": "someone li",
                "offset": 0,
                "length": 10,
                "options": [
                    {
                        "text": "someone like you",
                        "_index": "megacorp",
                        "_type": "employee",
                        "_id": "AV_doqcXKY206Vs3lcCO",
                        "_score": 1,
                        "_source": {
                            "about": "嗚啦啦啦火車笛\n\n隨著奔騰的馬蹄\n\n小妹妹吹著口琴\n\n夕陽下美了剪影\n\n我用子彈寫日記,我泡妞看電影",
                            "age": 18,
                            "firstName": "someone like you",
                            "interests": [
                                "喜歡打籃球",
                                "在大晴天晒太陽",
                                "泡妞看電影"
                            ],
                            "lastName": "周杰倫,日記"
                        }
                    }
                ]
            }
        ]
    }
}