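Learning Elasticsearch and Using Its Java API
This article collects search techniques I have run into while using Elasticsearch in practice, together with the corresponding Java API calls. It will be extended over time.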
-
Simple Search
A search request body in JSON looks like this:
{
"query": {
...
}
}
To paginate, specify the starting offset (from) and the number of results to return (size):
{
"from": 0,
"size": 10,
"query": {
"match_all": {}
}
}
If no paging values are given, from defaults to 0 and size defaults to 10.
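As a small illustration of the from/size paging described above, here is a minimal sketch that converts a 1-based page number into a request body. The class PageRequest and its methods are hypothetical helpers written for this note, not part of the Elasticsearch API, and the body is assembled as a plain string only to keep the example dependency-free.

```java
import java.util.Locale;

/** Hypothetical helper: turns a 1-based page number into a from/size search body. */
final class PageRequest {

    /** Offset of the first hit on the given 1-based page. */
    static int fromFor(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    /** Builds a match_all request body with an explicit from and size. */
    static String matchAllBody(int page, int size) {
        return String.format(Locale.ROOT,
                "{\"from\": %d, \"size\": %d, \"query\": {\"match_all\": {}}}",
                fromFor(page, size), size);
    }

    public static void main(String[] args) {
        // Third page with 10 hits per page starts at offset 20.
        System.out.println(matchAllBody(3, 10));
    }
}
```

Page 1 with size 10 reproduces the defaults mentioned above (from 0, size 10).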
You can also load only a subset of fields:
{
"fields": [
"userId"
],
"query": {
"match_all": {}
}
}
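This makes each hit load only the userId field. If the fields array is empty, or the listed fields do not exist in the document, a hit carries only "_index", "_type", "_id" and "_score".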
-
Match All Query
{ "query": { "match_all": {} } }
matchAllQuery matches every document. The corresponding Java class is MatchAllQueryBuilder.
-
Term Query
{
"query": {
"term" : { "user" : "Kimchy" }
}
}
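termQuery performs an exact match: the search value is not analyzed. The example above finds documents whose user field contains exactly the term Kimchy. The corresponding Java class is TermQueryBuilder. It has several constructors; in each, the first argument is the field to match and the second is the value to match.
For example: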
QueryBuilders.termQuery("name", "你的名字。")
-
Match Query
{
"query": {
"match": {
"name": "甜心格格 第二季"
}
}
}
matchQuery searches a single field; the example above finds documents whose name field matches "甜心格格 第二季". The corresponding Java class is MatchQueryBuilder.
{
"query": {
"match": {
"_all": "你神"
}
}
}
If the field is "_all", all fields are searched. matchQuery has three types: boolean, phrase, and phrase_prefix.
-
Boolean
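boolean is the default type. According to the official documentation, it means the supplied text is analyzed and a boolean query is built from the analyzed terms. The operator setting controls how the terms are combined (or or and); the default is or, so a document that matches any one of the terms is a hit. minimum_should_match sets the minimum number of terms that must match.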
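To make the effect of operator and minimum_should_match concrete, here is a rough model of the matching rule, written outside Elasticsearch. Everything in it is an assumption made for illustration: the class BooleanMatchSketch is hypothetical, and the "analyzer" simply lowercases and splits on whitespace, which is far simpler than a real analyzer (especially for Chinese text). Scoring is ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative model of a boolean-type match query; not Elasticsearch code. */
final class BooleanMatchSketch {

    /** Stand-in analyzer (assumption): lowercase, then split on whitespace. */
    static Set<String> analyze(String text) {
        return new LinkedHashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    /** Counts how many distinct query terms occur in the field text. */
    static int matchedTerms(String fieldText, String queryText) {
        Set<String> fieldTerms = analyze(fieldText);
        int matched = 0;
        for (String term : analyze(queryText)) {
            if (fieldTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    /** operator = or: at least minimumShouldMatch query terms must occur. */
    static boolean matchesOr(String fieldText, String queryText, int minimumShouldMatch) {
        return matchedTerms(fieldText, queryText) >= Math.max(1, minimumShouldMatch);
    }

    /** operator = and: every query term must occur. */
    static boolean matchesAnd(String fieldText, String queryText) {
        return matchedTerms(fieldText, queryText) == analyze(queryText).size();
    }

    public static void main(String[] args) {
        String field = "the quick brown fox";
        System.out.println(matchesOr(field, "quick red fox", 1));  // true: "quick" and "fox" occur
        System.out.println(matchesOr(field, "quick red fox", 3));  // false: only 2 of 3 terms occur
        System.out.println(matchesAnd(field, "quick red fox"));    // false: "red" is missing
    }
}
```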
-
Phrase and Phrase_prefix
Both phrase and phrase_prefix search for a phrase. The difference is that phrase_prefix treats the last term as a prefix.
For example:
{
"query": {
"match_phrase_prefix": {
"name": "quick brown f"
}
}
}
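The difference between the two types can be modelled in a few lines. The sketch below is only an approximation under stated assumptions: PhraseSketch is a hypothetical class, text is tokenized by lowercasing and splitting on whitespace, and slop and scoring are ignored. It checks whether the query terms occur consecutively, optionally treating the last one as a prefix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative model of phrase vs. phrase_prefix matching; not Elasticsearch code. */
final class PhraseSketch {

    /** Stand-in analyzer (assumption): lowercase, then split on whitespace. */
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** True if the query terms occur consecutively; the last may be a prefix if allowed. */
    static boolean matches(String text, String query, boolean lastIsPrefix) {
        List<String> doc = tokens(text);
        List<String> terms = tokens(query);
        for (int start = 0; start + terms.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < terms.size() && ok; i++) {
                String docTerm = doc.get(start + i);
                String queryTerm = terms.get(i);
                boolean last = (i == terms.size() - 1);
                ok = (last && lastIsPrefix)
                        ? docTerm.startsWith(queryTerm)
                        : docTerm.equals(queryTerm);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    static boolean matchPhrase(String text, String query) {
        return matches(text, query, false);
    }

    static boolean matchPhrasePrefix(String text, String query) {
        return matches(text, query, true);
    }

    public static void main(String[] args) {
        String name = "the quick brown fox";
        System.out.println(matchPhrasePrefix(name, "quick brown f")); // true: "f" is a prefix of "fox"
        System.out.println(matchPhrase(name, "quick brown f"));       // false: no term equals "f"
    }
}
```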
-
MultiMatch Query
{
"query": {
"multi_match": {
"query": "你的名字(花絮預告)",
"fields": [
"name",
"awards"
]
}
}
}
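multiMatchQuery matches a value against several fields. Entries in fields may use wildcards: *_name matches fields such as first_name and last_name. A caret (^) boosts a field's importance, for example name^3.
Its type attribute can be set to best_fields, most_fields, cross_fields, phrase, or phrase_prefix. Their exact behavior is left for a later update.
The corresponding Java class is MultiMatchQueryBuilder.
PS: there is one more way to search across fields: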
{
"query": {
"term": {
"all_worlds": "日本"
}
}
}
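This finds documents that contain "日本" in any field, assuming all_worlds is a custom field in the mapping that gathers the content of the other fields; it is not a built-in Elasticsearch field.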
-
Wildcard Query
{
"query": {
"wildcard": {
"name": "*的*"
}
}
}
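wildcardQuery matches terms against a wildcard pattern: ? matches any single character and * matches any sequence of characters, including an empty one. The corresponding Java class is WildcardQueryBuilder.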
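The meaning of the two wildcard characters can be tried out without a cluster. The sketch below is an illustration only: WildcardSketch is a hypothetical class that translates a wildcard pattern into a java.util.regex pattern and tests it against a single term. The real query runs against the terms stored in the index, so this models only the pattern semantics.

```java
import java.util.regex.Pattern;

/** Illustrative model of wildcard semantics on a single term; not Elasticsearch code. */
final class WildcardSketch {

    /** Converts a wildcard pattern (? and *) into an equivalent regular expression. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < wildcard.length()) {
            int codePoint = wildcard.codePointAt(i);
            if (codePoint == '?') {
                regex.append('.');            // exactly one character
            } else if (codePoint == '*') {
                regex.append(".*");           // zero or more characters
            } else {
                // Quote everything else so regex metacharacters stay literal.
                regex.append(Pattern.quote(new String(Character.toChars(codePoint))));
            }
            i += Character.charCount(codePoint);
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** True if the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("ki*y", "kimchy"));   // true: * spans "mch"
        System.out.println(matches("k?mchy", "kimchy")); // true: ? stands for "i"
        System.out.println(matches("ki?y", "kimchy"));   // false: ? covers one character only
    }
}
```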
-
Query String Query
{ "query": { "query_string" : { "query" : "(new york city) OR (big apple)" } } }
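Parameter | Description |
---|---|
query | The actual query to be parsed. See Query string syntax. |
default_field | The default field for query terms if no prefix field is specified. Defaults to the index.query.default_field index setting, which in turn defaults to _all. |
default_operator | The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with a default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR. |
analyzer | The analyzer name used to analyze the query string. |
allow_leading_wildcard | When set, * or ? are allowed as the first character. Defaults to true. |
lowercase_expanded_terms | Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to true. |
enable_position_increments | Set to true to enable position increments in result queries. Defaults to true. |
fuzzy_max_expansions | Controls the number of terms fuzzy queries will expand to. Defaults to 50. |
fuzziness | Set the fuzziness for fuzzy queries. Defaults to AUTO. |
fuzzy_prefix_length | Set the prefix length for fuzzy queries. Default is 0. |
phrase_slop | Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0. |
boost | Sets the boost value of the query. Defaults to 1.0. |
analyze_wildcard | By default, wildcards terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well. |
auto_generate_phrase_queries | Defaults to false. |
max_determinized_states | Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000. |
minimum_should_match | A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (2), a percentage (30%) or a combination of both. |
lenient | If set to true will cause format based failures (like providing text to a numeric field) to be ignored. |
locale | Locale that should be used for string conversions. Defaults to ROOT. |
time_zone | Time Zone to be applied to any range query related to dates. See also JODA timezone. |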
-
Compound Queries
-
Bool Query
{
"query": {
"bool": {
"should": [
{
"term": {
"releaseYear": "2014"
}
},
{
"match_phrase_prefix": {
"name": "你的名字"
}
}
]
}
}
}
boolQuery is a compound query that combines other queries. Each clause has one of the following occurrence types:
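Occur | Description |
---|---|
must | The clause (query) must appear in matching documents and will contribute to the score. |
filter | The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. |
should | The clause (query) should appear in the matching document. In a boolean query with no must or filter clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter. |
must_not | The clause (query) must not appear in the matching documents. |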
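How must, should, and must_not clauses decide whether a document matches can be sketched with plain predicates. This is a simplified model under stated assumptions: BoolSketch is a hypothetical class, documents are flat string maps, scoring is ignored, and a filter clause would behave like must for matching purposes, so it is not modelled separately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative model of bool clause matching (no scoring); not Elasticsearch code. */
final class BoolSketch {
    final List<Predicate<Map<String, String>>> must = new ArrayList<>();
    final List<Predicate<Map<String, String>>> should = new ArrayList<>();
    final List<Predicate<Map<String, String>>> mustNot = new ArrayList<>();

    boolean matches(Map<String, String> doc) {
        for (Predicate<Map<String, String>> clause : mustNot) {
            if (clause.test(doc)) {
                return false;               // any must_not hit excludes the document
            }
        }
        for (Predicate<Map<String, String>> clause : must) {
            if (!clause.test(doc)) {
                return false;               // every must clause is required
            }
        }
        if (must.isEmpty() && !should.isEmpty()) {
            for (Predicate<Map<String, String>> clause : should) {
                if (clause.test(doc)) {
                    return true;            // without must, one should clause has to match
                }
            }
            return false;
        }
        return true;                        // with must present, should is optional
    }

    public static void main(String[] args) {
        // Mirrors the shape of the JSON example: two should clauses, no must.
        BoolSketch query = new BoolSketch();
        query.should.add(d -> "2014".equals(d.get("releaseYear")));
        query.should.add(d -> d.getOrDefault("name", "").startsWith("your name"));

        Map<String, String> film = new HashMap<>();
        film.put("name", "your name.");
        film.put("releaseYear", "2016");
        System.out.println(query.matches(film)); // true: the name clause matches
    }
}
```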
-
JAVA API
-
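Connecting to an ES Cluster
TransportClient
The TransportClient connects remotely to an Elasticsearch cluster through the transport module. It does not join the cluster; it simply takes one or more initial transport addresses and communicates with them in round-robin fashion.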
// on startup
Client client = new TransportClient()
.addTransportAddress(new InetSocketTransportAddress("host1", 9300))
.addTransportAddress(new InetSocketTransportAddress("host2", 9300));
// on shutdown
client.close();
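The round-robin behavior mentioned above can be pictured with a tiny selector. This is only a sketch of the idea and not how TransportClient is implemented internally: RoundRobinAddresses is a hypothetical class, and addresses are plain host:port strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative round-robin selector over a fixed address list; not Elasticsearch code. */
final class RoundRobinAddresses {
    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinAddresses(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.addresses = new ArrayList<>(addresses);
    }

    /** Returns the next address, wrapping around at the end of the list. */
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }

    public static void main(String[] args) {
        RoundRobinAddresses picker =
                new RoundRobinAddresses(Arrays.asList("host1:9300", "host2:9300"));
        for (int i = 0; i < 4; i++) {
            System.out.println(picker.next()); // host1, host2, host1, host2
        }
    }
}
```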
Note that if your cluster name is something other than the default, elasticsearch, you have to set the cluster name:
Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
//Add transport addresses and do something with the client...
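You can also set it in the elasticsearch.yml file.
The client can sniff the rest of the cluster and add the nodes it discovers to its list of machines. To enable this, set client.transport.sniff to true: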
Settings settings = ImmutableSettings.settingsBuilder()
.put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);
Other transport client settings include:
Parameter | Description |
---|---|
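client.transport.ignore_cluster_name | Set to true to skip cluster name validation of connected nodes |
client.transport.ping_timeout | Time to wait for a ping response from a node. Defaults to 5s |
client.transport.nodes_sampler_interval | How often to sample/ping the nodes. Defaults to 5s |
PS: close the client when you are finished with it. In my tests, repeatedly obtaining connections without closing them could eventually produce connection errors.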
-
Getting a Document
The get API retrieves a typed JSON document from an index by its id, as in this example:
GetResponse response = client.prepareGet("twitter", "tweet", "1")
.execute()
.actionGet();
By default, operationThreaded is set to true, which means the operation runs on a different thread. The following example sets it to false:
GetResponse response = client.prepareGet("twitter", "tweet", "1")
.setOperationThreaded(false)
.execute()
.actionGet();
-
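Deleting a Document
The delete API removes a typed JSON document from a specific index by its id.
By default, operationThreaded is set to true, which means the operation runs on a different thread. The following example sets it to false: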
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
.setOperationThreaded(false)
.execute()
.actionGet();
-
Adding or Updating a Document
You can create an UpdateRequest and send it to the client:
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
    .startObject()
    .field("gender", "male")
    .endObject());
client.update(updateRequest).get();
Alternatively, use the prepareUpdate method:
client.prepareUpdate("ttl", "doc", "1")
    .setScript("ctx._source.gender = \"male\"", ScriptService.ScriptType.INLINE)
    .get();

client.prepareUpdate("ttl", "doc", "1")
    .setDoc(jsonBuilder()
        .startObject()
        .field("gender", "male")
        .endObject())
    .get();
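The first call updates the document with a script; the second merges a partial document (doc) into it.
The Java API also supports upsert. If the document does not exist yet, the content of upsert is indexed as a new document: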
IndexRequest indexRequest = new IndexRequest("index", "type", "1")
    .source(jsonBuilder()
        .startObject()
        .field("name", "Joe Smith")
        .field("gender", "male")
        .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
    .doc(jsonBuilder()
        .startObject()
        .field("gender", "male")
        .endObject())
    .upsert(indexRequest);
client.update(updateRequest).get();
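If the document index/type/1 already exists (say, with the name Joe Dalton), only the doc part is merged in, and after the update the document is: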
{ "name" : "Joe Dalton", "gender": "male" }
Otherwise, the upsert content is indexed and the document is:
{ "name" : "Joe Smith", "gender": "male" }
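The two outcomes above follow from a simple rule: merge doc into an existing document, otherwise store the upsert content as it is. Here is a minimal in-memory model of that rule, assuming the existing document was stored with the name Joe Dalton. UpsertSketch and its methods are hypothetical names for illustration and have nothing to do with the actual client classes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative in-memory model of update-with-upsert; not Elasticsearch code. */
final class UpsertSketch {
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    /** Stores a full document under the given id. */
    void index(String id, Map<String, Object> source) {
        store.put(id, new LinkedHashMap<>(source));
    }

    /** Merges doc into an existing document, or stores upsert if the id is absent. */
    Map<String, Object> update(String id, Map<String, Object> doc, Map<String, Object> upsert) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            existing = new LinkedHashMap<>(upsert); // doc is not applied on insert
            store.put(id, existing);
        } else {
            existing.putAll(doc);                   // partial update of the stored fields
        }
        return existing;
    }

    /** Small convenience for building a document from key/value pairs. */
    static Map<String, Object> fields(String... keyValues) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = fields("gender", "male");
        Map<String, Object> upsert = fields("name", "Joe Smith", "gender", "male");

        UpsertSketch empty = new UpsertSketch();
        System.out.println(empty.update("1", doc, upsert));  // {name=Joe Smith, gender=male}

        UpsertSketch seeded = new UpsertSketch();
        seeded.index("1", fields("name", "Joe Dalton"));
        System.out.println(seeded.update("1", doc, upsert)); // {name=Joe Dalton, gender=male}
    }
}
```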
-
Bulk
The bulk API lets you index and delete multiple documents in a single request. Here is a usage example:
import static org.elasticsearch.common.xcontent.XContentFactory.*;
BulkRequestBuilder bulkRequest = client.prepareBulk();
// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "trying out Elasticsearch")
.endObject()
)
);
bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "another post")
.endObject()
)
);
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
// process failures by iterating through each bulk response item
}
-
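Search
The search API executes a search query and returns the hits that match it. It can run across multiple indices and multiple types. The query can be built with either the Java query API or the Java filter API. The request body is built with SearchSourceBuilder.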
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;
SearchResponse response = client.prepareSearch("index1", "index2")
.setTypes("type1", "type2")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termQuery("multi", "test")) // Query
.setPostFilter(FilterBuilders.rangeFilter("age").from(12).to(18)) // Filter
.setFrom(0).setSize(60).setExplain(true)
.execute()
.actionGet();
Note that all parameters are optional. Here is the shortest possible form:
// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();
Using Scrolls in Java
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
QueryBuilder qb = termQuery("multi", "test");
SearchResponse scrollResp = client.prepareSearch("test") // "test" is the index name
.setSearchType(SearchType.SCAN)
.setScroll(new TimeValue(60000))
.setQuery(qb)
.setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
while (true) {
for (SearchHit hit : scrollResp.getHits()) {
//Handle the hit...
}
scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
//Break condition: No hits are returned
if (scrollResp.getHits().getHits().length == 0) {
break;
}
}
MultiSearch API
SearchRequestBuilder srb1 = node.client()
.prepareSearch().setQuery(QueryBuilders.queryString("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = node.client()
.prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);
MultiSearchResponse sr = node.client().prepareMultiSearch()
.add(srb1)
.add(srb2)
.execute().actionGet();
// You will get all individual responses from MultiSearchResponse#getResponses()
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.getResponses()) {
SearchResponse response = item.getResponse();
nbHits += response.getHits().getTotalHits();
}
Using Aggregations
The following example shows how to add two aggregations to a search:
SearchResponse sr = node.client().prepareSearch()
.setQuery(QueryBuilders.matchAllQuery())
.addAggregation(
AggregationBuilders.terms("agg1").field("field")
)
.addAggregation(
AggregationBuilders.dateHistogram("agg2")
.field("birth")
.interval(DateHistogram.Interval.YEAR)
)
.execute().actionGet();
// Get your aggregation results
Terms agg1 = sr.getAggregations().get("agg1");
DateHistogram agg2 = sr.getAggregations().get("agg2");
Using Search Templates
Define your template parameters as a Map<String,String>:
Map<String, String> template_params = new HashMap<>();
template_params.put("param_gender", "male");
You can use a template stored in the config/scripts directory. For example, suppose you have the following file, config/scripts/template_gender.mustache:
{
"template" : {
"query" : {
"match" : {
"gender" : "{{param_gender}}"
}
}
}
}
You can then execute it like this:
SearchResponse sr = client.prepareSearch()
.setTemplateName("template_gender")
.setTemplateType(ScriptService.ScriptType.FILE)
.setTemplateParams(template_params)
.get();
You can also store the template in a dedicated index named .scripts:
client.preparePutIndexedScript("mustache", "template_gender",
"{\n" +
" \"template\" : {\n" +
" \"query\" : {\n" +
" \"match\" : {\n" +
" \"gender\" : \"{{param_gender}}\"\n" +
" }\n" +
" }\n" +
" }\n" +
"}").get();
To use the indexed template, pass ScriptService.ScriptType.INDEXED:
SearchResponse sr = client.prepareSearch()
.setTemplateName("template_gender")
.setTemplateType(ScriptService.ScriptType.INDEXED)
.setTemplateParams(template_params)
.get();
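In both variants, the {{param_gender}} placeholder is filled from the parameter map before the query runs. The sketch below imitates just that substitution step for simple {{name}} placeholders. It is not the Mustache engine: TemplateSketch is a hypothetical class, it does no JSON escaping, and it supports no sections or other Mustache features.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative {{name}} substitution; a stand-in for real Mustache rendering. */
final class TemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    /** Replaces each {{name}} with its value; unknown names render as empty text. */
    static String render(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = params.getOrDefault(matcher.group(1), "");
            // quoteReplacement keeps $ and \ in values from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> templateParams = new HashMap<>();
        templateParams.put("param_gender", "male");
        String template = "{\"query\": {\"match\": {\"gender\": \"{{param_gender}}\"}}}";
        System.out.println(render(template, templateParams));
        // {"query": {"match": {"gender": "male"}}}
    }
}
```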
-
Delete By Query
The delete by query API deletes documents from one or more indices and one or more types based on a query. Here is an example:
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
DeleteByQueryResponse response = client.prepareDeleteByQuery("test")
.setQuery(termQuery("_type", "type1"))
.execute()
.actionGet();