Adding Chinese word segmentation to Elasticsearch
阿新 • Published: 2019-02-15
Adding Chinese word segmentation
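You can integrate a Chinese word segmentation component into es yourself. medcl has written three Chinese analysis plugins for es: one based on ik, one on mmseg, and one on pinyin4j.
The sections below describe how to integrate each of these three plugins with es.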
1. Integrating ik with es
1.1 Download
1.2 Compile
Unzip the downloaded elasticsearch-analysis-ik-1.2.6.zip, then build it from a Windows command prompt (Start menu -> Run -> cmd -> Enter):
    e:
    cd E:\j2ee\search\中文分詞器\for_es\elasticsearch-analysis-ik-1.2.6
    E:\j2ee\maven\apache-maven-3.1.1-bin\apache-maven-3.1.1\bin\mvn package
1.3 Configuration
1.3.1 Create the directory plugins/analysis-ik under %ES_HOME%
    mkdir -p /usr/local/search/elasticsearch-1.3.1/plugins/analysis-ik
1.3.2 Copy elasticsearch-analysis-ik-1.2.6.jar into /usr/local/search/elasticsearch-1.3.1/plugins/analysis-ik
1.3.3 Copy the config/ik directory from the unzipped elasticsearch-analysis-ik-1.2.6.zip into /usr/local/search/elasticsearch-1.3.1/config/
1.3.4 Edit elasticsearch.yml
    vi /usr/local/search/elasticsearch-1.3.1/config/elasticsearch.yml

    index:
      analysis:
        analyzer:
          ik:
            alias: [news_analyzer_ik,ik_analyzer]
            type: org.elasticsearch.index.analysis.IkAnalyzerProvider
    index.analysis.analyzer.default.type : "ik"
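Before restarting es it can be worth confirming that steps 1.3.1 to 1.3.3 left the files where es expects them. The following is only a minimal sketch: the class InstallCheck and its method are hypothetical helpers written for this article, not part of es or of the ik plugin, and the default path is simply the install location used in the steps above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which files from steps 1.3.1-1.3.3 are absent.
final class InstallCheck {

    /**
     * Returns the expected paths that do not exist under esHome.
     * Pass null for configSubdir when the plugin ships no config directory.
     */
    static List<Path> missing(Path esHome, String pluginName,
                              String jarName, String configSubdir) {
        List<Path> absent = new ArrayList<>();

        // Step 1.3.1: the plugin directory itself
        Path pluginDir = esHome.resolve("plugins").resolve(pluginName);
        if (!Files.isDirectory(pluginDir)) {
            absent.add(pluginDir);
        }

        // Step 1.3.2: the plugin jar inside that directory
        Path jar = pluginDir.resolve(jarName);
        if (!Files.isRegularFile(jar)) {
            absent.add(jar);
        }

        // Step 1.3.3: the dictionary/config directory under config/
        if (configSubdir != null) {
            Path cfg = esHome.resolve("config").resolve(configSubdir);
            if (!Files.isDirectory(cfg)) {
                absent.add(cfg);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Assumed default: the install path used in this article.
        Path esHome = Paths.get(args.length > 0
                ? args[0] : "/usr/local/search/elasticsearch-1.3.1");
        List<Path> absent = missing(esHome, "analysis-ik",
                "elasticsearch-analysis-ik-1.2.6.jar", "ik");
        if (absent.isEmpty()) {
            System.out.println("ik plugin files are in place");
        } else {
            for (Path p : absent) {
                System.out.println("missing: " + p);
            }
        }
    }
}
```

The same check can be reused for the other plugins by changing the plugin name and jar name.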
1.3.5 IKAnalyzer.cfg.xml
Extension dictionaries and stop-word dictionaries can be configured in /usr/local/search/elasticsearch-1.3.1/config/ik/IKAnalyzer.cfg.xml:
    vi /usr/local/search/elasticsearch-1.3.1/config/ik/IKAnalyzer.cfg.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Users can configure their own extension dictionaries here -->
        <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
        <!-- Users can configure their own extension stop-word dictionaries here -->
        <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    </properties>
1.3.6 Restart es
    /usr/local/search/elasticsearch-1.3.1/bin/service/elasticsearch stop
    /usr/local/search/elasticsearch-1.3.1/bin/service/elasticsearch start
1.4 Test
The Java snippets in the test sections use the es 1.x Java client and need these imports:
    import java.io.IOException;
    import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
    import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequestBuilder;
    import org.elasticsearch.action.admin.indices.mapping.put.PutMappingResponse;
    import org.elasticsearch.action.bulk.BulkRequestBuilder;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequestBuilder;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.xcontent.XContentBuilder;
    import org.elasticsearch.common.xcontent.XContentFactory;
    import org.elasticsearch.index.query.QueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.SearchHits;
1.4.1 Create the mapping and specify the Chinese analyzer
    /**
     * Create the type mapping, using the Chinese analyzer.
     * Note: the index must be created before the mapping is defined.
     * @param client
     * @throws IOException
     */
    public static void mapping4CN(Client client) throws IOException {
        XContentBuilder mapping = XContentFactory.jsonBuilder()
            .startObject()
              .startObject("fulltext")
                .startObject("_all")
                  .field("indexAnalyzer", "ik").field("searchAnalyzer", "ik")
                  .field("term_vector", "no").field("store", "false")
                .endObject()
                .startObject("properties")
                  .startObject("content")
                    .field("type", "string").field("store", "no")
                    .field("term_vector", "with_positions_offsets")
                    .field("indexAnalyzer", "ik").field("searchAnalyzer", "ik")
                    .field("include_in_all", "true").field("boost", 8)
                  .endObject()
                .endObject()
              .endObject()
            .endObject();
        System.out.println(mapping.string());

        // Create the index first if it does not exist yet.
        // indexExist is a helper that is not shown in this article.
        if (!indexExist(client, "cnindex")) {
            CreateIndexResponse ciresponse = client.admin().indices()
                .prepareCreate("cnindex").execute().actionGet();
            System.out.println("CreateIndexResponse---->" + ciresponse.isAcknowledged());
        }

        // Put the mapping (the index name must be given).
        // Note: the mapping is registered for type "fulltext", while 1.4.2 indexes
        // into type "cnindextype"; a mapping only applies to documents of its own type.
        PutMappingRequestBuilder pmrbuilder = client.admin().indices()
            .preparePutMapping("cnindex").setType("fulltext").setSource(mapping);
        PutMappingResponse pmResponse = pmrbuilder.execute().actionGet();
        System.out.println("PutMappingResponse---->" + pmResponse.isAcknowledged());
    }
1.4.2 Index Chinese content
    /**
     * Index some Chinese documents.
     * @param client
     * @throws IOException
     */
    public static void createIndex4CN(Client client) throws IOException {
        XContentBuilder doc1 = XContentFactory.jsonBuilder().startObject()
            .field("content", "中韓漁警衝突調查:韓警平均每天扣1艘中國漁船")
            .endObject();
        XContentBuilder doc2 = XContentFactory.jsonBuilder().startObject()
            .field("content", "美國留給伊拉克的是個爛攤子嗎")
            .endObject();
        XContentBuilder doc3 = XContentFactory.jsonBuilder().startObject()
            .field("content", "公安部:各地校車將享最高路權")
            .endObject();
        XContentBuilder doc4 = XContentFactory.jsonBuilder().startObject()
            .field("content", "中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首")
            .endObject();

        /*
         * prepareIndex arguments:
         * 1st: the index name; one es cluster can hold many indices.
         * 2nd: the type, which separates different kinds of data within one index;
         *      one index can hold many types.
         * 3rd: the document id.
         */
        IndexRequestBuilder irbuilder1 = client.prepareIndex("cnindex", "cnindextype", "cnindexid1").setRefresh(true).setSource(doc1);
        IndexRequestBuilder irbuilder2 = client.prepareIndex("cnindex", "cnindextype", "cnindexid2").setRefresh(true).setSource(doc2);
        IndexRequestBuilder irbuilder3 = client.prepareIndex("cnindex", "cnindextype", "cnindexid3").setRefresh(true).setSource(doc3);
        IndexRequestBuilder irbuilder4 = client.prepareIndex("cnindex", "cnindextype", "cnindexid4").setRefresh(true).setSource(doc4);

        // Send all four documents in one bulk request.
        BulkRequestBuilder brbuilder = client.prepareBulk();
        brbuilder.add(irbuilder1);
        brbuilder.add(irbuilder2);
        brbuilder.add(irbuilder3);
        brbuilder.add(irbuilder4);
        BulkResponse response = brbuilder.execute().actionGet();
        System.out.println(response);
    }
1.4.3 Run a Chinese search
    /**
     * Run a Chinese search.
     * @param client
     */
    public static void search4CN(Client client) {
        // Build the query: a TermQuery on the content field.
        QueryBuilder qb1 = QueryBuilders.termQuery("content", "伊拉克");
        /*
        Other query forms, kept for reference (db3 also needs org.elasticsearch.index.query.FilterBuilders):
        QueryBuilder qb2 = QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery("content", "中國"))
            .must(QueryBuilders.termQuery("content", "中國"))
            .mustNot(QueryBuilders.termQuery("onSale", false))
            .should(QueryBuilders.termQuery("type", 1));
        QueryBuilder db3 = QueryBuilders.filteredQuery(
            QueryBuilders.termQuery("content", "中國"),
            FilterBuilders.rangeFilter("price").from(30.0).to(500.0).includeLower(true).includeUpper(false));
        */
        SearchResponse response = client.prepareSearch("cnindex").setTypes("cnindextype")
            .setQuery(qb1).setFrom(0).setSize(15)
            .addHighlightedField("content")
            .setHighlighterPreTags("<span style=\"color:red\">")
            .setHighlighterPostTags("</span>")
            .setExplain(true).execute().actionGet();
        SearchHits shits = response.getHits();
        SearchHit[] shs = shits.hits();
        // Print the stored content of every hit.
        for (SearchHit sh : shs) {
            String content = (String) sh.getSource().get("content");
            System.out.println("content=" + content);
        }
    }
2. Integrating mmseg with es
2.1 Download
2.2 Compile
Unzip the downloaded elasticsearch-analysis-mmseg-1.2.0.zip, then build it from a Windows command prompt (Start menu -> Run -> cmd -> Enter):
    e:
    cd E:\j2ee\search\中文分詞器\for_es\elasticsearch-analysis-mmseg-1.2.0
    E:\j2ee\maven\apache-maven-3.1.1-bin\apache-maven-3.1.1\bin\mvn package
2.3 Configuration
2.3.1 Create the directory plugins/analysis-mmseg under %ES_HOME%
    mkdir -p /usr/local/search/elasticsearch-1.3.1/plugins/analysis-mmseg
2.3.2 Copy elasticsearch-analysis-mmseg-1.2.0.jar into /usr/local/search/elasticsearch-1.3.1/plugins/analysis-mmseg
2.3.3 Copy the config/mmseg directory from the unzipped elasticsearch-analysis-mmseg-1.2.0.zip into /usr/local/search/elasticsearch-1.3.1/config/
2.3.4 Edit elasticsearch.yml
    vi /usr/local/search/elasticsearch-1.3.1/config/elasticsearch.yml

    index:
      analysis:
        analyzer:
          ik:
            alias: [news_analyzer_ik,ik_analyzer]
            type: org.elasticsearch.index.analysis.IkAnalyzerProvider
          mmseg:
            alias: [news_analyzer, mmseg_analyzer]
            type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider
2.3.5 Restart es
    /usr/local/search/elasticsearch-1.3.1/bin/service/elasticsearch stop
    /usr/local/search/elasticsearch-1.3.1/bin/service/elasticsearch start
2.4 Test
These snippets use the same es Java client classes as the ik test in section 1.4.
2.4.1 Create the mapping and specify the Chinese analyzer
    /**
     * Create the type mapping, using the mmseg Chinese analyzer.
     * Note: the index must be created before the mapping is defined.
     * @param client
     * @throws IOException
     */
    public static void mapping4CN_MMSEG(Client client) throws IOException {
        XContentBuilder mapping = XContentFactory.jsonBuilder()
            .startObject()
              .startObject("fulltext_mmseg")
                .startObject("_all")
                  .field("indexAnalyzer", "mmseg").field("searchAnalyzer", "mmseg")
                  .field("term_vector", "no").field("store", "true")
                .endObject()
                .startObject("properties")
                  .startObject("content")
                    .field("type", "string").field("store", "yes")
                    .field("term_vector", "with_positions_offsets")
                    .field("indexAnalyzer", "mmseg").field("searchAnalyzer", "mmseg")
                    .field("include_in_all", "true").field("boost", 8)
                  .endObject()
                .endObject()
              .endObject()
            .endObject();
        System.out.println(mapping.string());

        // Create the index first if it does not exist yet.
        // indexExist is a helper that is not shown in this article.
        if (!indexExist(client, "cnindex_mmseg")) {
            CreateIndexResponse ciresponse = client.admin().indices()
                .prepareCreate("cnindex_mmseg").execute().actionGet();
            System.out.println("CreateIndexResponse---->" + ciresponse.isAcknowledged());
        }

        // Put the mapping (the index name must be given).
        // Note: the mapping is registered for type "fulltext_mmseg", while 2.4.2 indexes
        // into type "cnindextype_mmseg"; the mmseg analyzer set here is not applied to
        // those documents unless the two type names are made to match.
        PutMappingRequestBuilder pmrbuilder = client.admin().indices()
            .preparePutMapping("cnindex_mmseg").setType("fulltext_mmseg").setSource(mapping);
        PutMappingResponse pmResponse = pmrbuilder.execute().actionGet();
        System.out.println("PutMappingResponse---->" + pmResponse.isAcknowledged());
    }
2.4.2 Index Chinese content
    /**
     * Index some Chinese documents.
     * @param client
     * @throws IOException
     */
    public static void createIndex4CN_MMSEG(Client client) throws IOException {
        XContentBuilder doc1 = XContentFactory.jsonBuilder().startObject()
            .field("content", "中韓漁警衝突調查:韓警平均每天扣1艘中國漁船")
            .endObject();
        XContentBuilder doc2 = XContentFactory.jsonBuilder().startObject()
            .field("content", "美國留給伊拉克的是個爛攤子嗎")
            .endObject();
        XContentBuilder doc3 = XContentFactory.jsonBuilder().startObject()
            .field("content", "公安部:各地校車將享最高路權")
            .endObject();
        XContentBuilder doc4 = XContentFactory.jsonBuilder().startObject()
            .field("content", "中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首")
            .endObject();

        /*
         * prepareIndex arguments:
         * 1st: the index name; one es cluster can hold many indices.
         * 2nd: the type, which separates different kinds of data within one index;
         *      one index can hold many types.
         * 3rd: the document id.
         */
        IndexRequestBuilder irbuilder1 = client.prepareIndex("cnindex_mmseg", "cnindextype_mmseg", "cnindexid_mmseg1").setRefresh(true).setSource(doc1);
        IndexRequestBuilder irbuilder2 = client.prepareIndex("cnindex_mmseg", "cnindextype_mmseg", "cnindexid_mmseg2").setRefresh(true).setSource(doc2);
        IndexRequestBuilder irbuilder3 = client.prepareIndex("cnindex_mmseg", "cnindextype_mmseg", "cnindexid_mmseg3").setRefresh(true).setSource(doc3);
        IndexRequestBuilder irbuilder4 = client.prepareIndex("cnindex_mmseg", "cnindextype_mmseg", "cnindexid_mmseg4").setRefresh(true).setSource(doc4);

        // Send all four documents in one bulk request.
        BulkRequestBuilder brbuilder = client.prepareBulk();
        brbuilder.add(irbuilder1);
        brbuilder.add(irbuilder2);
        brbuilder.add(irbuilder3);
        brbuilder.add(irbuilder4);
        BulkResponse response = brbuilder.execute().actionGet();
        System.out.println(response);
    }
2.4.3 Run a Chinese search
    /**
     * Run a Chinese search.
     * @param client
     */
    public static void search4CN_MMSEG(Client client) {
        // Build the query: a TermQuery on the content field.
        QueryBuilder qb1 = QueryBuilders.termQuery("content", "校車");
        /*
        Other query forms, kept for reference (db3 also needs org.elasticsearch.index.query.FilterBuilders):
        QueryBuilder qb2 = QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery("content", "中國"))
            .must(QueryBuilders.termQuery("content", "中國"))
            .mustNot(QueryBuilders.termQuery("onSale", false))
            .should(QueryBuilders.termQuery("type", 1));
        QueryBuilder db3 = QueryBuilders.filteredQuery(
            QueryBuilders.termQuery("content", "中國"),
            FilterBuilders.rangeFilter("price").from(30.0).to(500.0).includeLower(true).includeUpper(false));
        */
        SearchResponse response = client.prepareSearch("cnindex_mmseg").setTypes("cnindextype_mmseg")
            .setQuery(qb1).setFrom(0).setSize(15)
            .addHighlightedField("content")
            .setHighlighterPreTags("<span style=\"color:red\">")
            .setHighlighterPostTags("</span>")
            .setExplain(true).execute().actionGet();
        SearchHits shits = response.getHits();
        SearchHit[] shs = shits.hits();
        // Print the stored content of every hit.
        for (SearchHit sh : shs) {
            String content = (String) sh.getSource().get("content");
            System.out.println("content=" + content);
        }
    }
3. Integrating pinyin4j with es
3.1 Download
3.2 Compile
Unzip the downloaded archive, then build it from a Windows command prompt (Start menu -> Run -> cmd -> Enter):
    e:
    cd E:\j2ee\search\中文分詞器\for_es\elasticsearch-analysis-pinyin-1.2.2
    E:\j2ee\maven\apache-maven-3.1.1-bin\apache-maven-3.1.1\bin\mvn package
3.3 Configuration
3.3.1 Create the directory plugins/analysis-pinyin under %ES_HOME%
    mkdir -p /usr/local/search/elasticsearch-1.3.1/plugins/analysis-pinyin
3.3.2 Copy lib/pinyin4j-2.5.0.jar and target/elasticsearch-analysis-pinyin-1.2.2.jar into /usr/local/search/elasticsearch-1.3.1/plugins/analysis-pinyin
3.3.3 Edit elasticsearch.yml
    vi /usr/local/search/elasticsearch-1.3.1/config/elasticsearch.yml

    index:
      analysis:
        analyzer:
          ik:
            alias: [news_analyzer_ik,ik_analyzer]
            type: org.elasticsearch.index.analysis.IkAnalyzerProvider
          mmseg:
            alias: [news_analyzer_mmseg, mmseg_analyzer]
            type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider
          pinyin:
            alias: [news_analyzer_pinyin, pinyin_analyzer]
            type: org.elasticsearch.index.analysis.PinyinAnalyzerProvider
    index.analysis.analyzer.default.type : "ik"
3.3.4 Restart es
    /usr/local/search/elasticsearch-1.3.1/bin/service/elasticsearch stop
    /usr/local/search/elasticsearch-1.3.1/bin/service/elasticsearch start
3.4 Test
3.4.1 Create the mapping and specify the Chinese analyzer
3.4.2 Index Chinese content
3.4.3 Run a Chinese search
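A note on why the choice of analyzer matters for the searches in 1.4.3 and 2.4.3: both use termQuery, and es does not analyze the text of a term query, so termQuery("content", "伊拉克") can only hit a document if the analyzer produced 伊拉克 as a single token at index time. The toy sketch below illustrates the idea with forward maximum matching, a classic dictionary-based strategy. It is not the actual algorithm of ik or mmseg, and the class name ToySegmenter, the three-word dictionary and the maxLen parameter are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only; not how the ik or mmseg plugins are implemented.
final class ToySegmenter {

    /**
     * Forward maximum matching: at each position take the longest dictionary
     * word (up to maxLen characters) that starts there, and fall back to a
     * single character when no dictionary word matches.
     */
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Shrink the window until it is a dictionary word or one character.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up mini dictionary; real plugins ship large dictionary files.
        Set<String> dict = new HashSet<>(Arrays.asList("美國", "伊拉克", "爛攤子"));
        List<String> tokens = segment("美國留給伊拉克的是個爛攤子嗎", dict, 3);
        System.out.println(tokens);
        // A term query for "伊拉克" corresponds to asking whether this exact token exists.
        System.out.println(tokens.contains("伊拉克"));
    }
}
```

With a dictionary that lacks 伊拉克, the same text falls apart into single characters and the exact-token lookup fails, which is roughly what happens when Chinese text is indexed without a word-level analyzer.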