Learning Elasticsearch and Using Its Java API

阿新 • Published: 2018-11-22

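This article collects search techniques I have come across while using Elasticsearch in practice, together with how to call its Java API. I will keep adding to it. Note that the Java examples use the Elasticsearch 1.x client API (for example ImmutableSettings, FilterBuilders and SearchType.SCAN), parts of which were removed or replaced in later major versions.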

Simple search

A search request body in JSON has the following shape:

{
  "query": {
    ... 
  }
}

You can paginate by specifying the offset of the first hit (from) and the number of hits to return (size):

{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  }
}

If from and size are omitted, from defaults to 0 and size defaults to 10.
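As a minimal sketch of the arithmetic, assuming 1-based page numbers (a convention chosen here, not something Elasticsearch defines), page n with page size s starts at from = (n - 1) * s. The hypothetical PageRequest helper below computes that and renders the request body with plain string concatenation rather than any Elasticsearch client call.

```java
// Hypothetical helper (not an Elasticsearch API): 1-based page number -> from/size
final class PageRequest {
    // values Elasticsearch uses when the body omits from/size
    static final int DEFAULT_FROM = 0;
    static final int DEFAULT_SIZE = 10;

    private PageRequest() {}

    // from = number of hits to skip before the requested page starts
    static int fromForPage(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // renders e.g. {"from":20,"size":10,"query":{"match_all":{}}} with plain strings
    static String matchAllBody(int page, int size) {
        return "{\"from\":" + fromForPage(page, size)
                + ",\"size\":" + size
                + ",\"query\":{\"match_all\":{}}}";
    }
}
```

For example, page 3 with 10 hits per page gives from = 20.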

You can also load only some of the fields:

{
  "fields": [
    "userId"
  ],
  "query": {
    "match_all": {}
  }
}

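This loads only the userId field into each hit. If fields is an empty array, each hit contains only metadata such as "_index", "_type", "_id" and "_score"; if the fields parameter is omitted entirely, each hit returns its full _source.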

Match All Query

{
    "query": {
        "match_all": {}
    }
}
matchAllQuery matches every document. The corresponding Java class is MatchAllQueryBuilder.

Term Query

{
  "query": {
    "term" : { "user" : "Kimchy" } 
  }
}

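termQuery performs an exact-match search: the search value is not analyzed (tokenized). The example finds documents whose user field contains exactly the term Kimchy. The corresponding Java class is TermQueryBuilder. It has several constructors; the first argument is the field to match and the second is the value to match.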

Example:

QueryBuilders.termQuery("name", "你的名字。")
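A toy model helps explain why a term query can miss documents that a match query finds. The sketch below assumes a simplified analyzer that lowercases text and splits it on whitespace (a stand-in for an analyzer such as standard, not its real implementation). Because the indexed terms are lowercased, an exact term lookup for Kimchy fails, whereas a match-style lookup analyzes the query value first and finds kimchy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of analyzed indexing; not Elasticsearch's real analyzer
final class TermVsMatch {
    private TermVsMatch() {}

    // simplified analyzer: lowercase, then split on runs of whitespace
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // term query: the value is compared verbatim against the indexed terms
    static boolean termMatches(String fieldText, String value) {
        return analyze(fieldText).contains(value);
    }

    // match query (OR operator): the value is analyzed first, any term may hit
    static boolean matchMatches(String fieldText, String value) {
        List<String> indexed = analyze(fieldText);
        for (String term : analyze(value)) {
            if (indexed.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```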

Match Query

{
  "query": {
    "match": {
      "name": "甜心格格第二季"
    }
  }
}

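matchQuery queries a single field. The query text is analyzed first, so this example finds documents whose name field matches the analyzed terms of "甜心格格第二季" (by default, any of them), not only documents whose name is exactly that string. The corresponding Java class is MatchQueryBuilder.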

{
  "query": {
    "match": {
      "_all": "你神"
    }
  }
}

If the field is "_all", the query searches across all fields. matchQuery has three types: boolean, phrase and phrase_prefix.

Boolean

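boolean is the default type. According to the official documentation, the boolean type analyzes the provided text and builds a boolean query from the resulting terms; in other words, the given value is tokenized. The operator parameter controls how those terms are combined (or/and) and defaults to or. minimum_should_match sets the minimum number of terms that must match.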
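The sketch below models how operator and an absolute minimum_should_match decide whether a document matches once the query text has been split into terms. It assumes the terms are already analyzed and accepts only an integer minimum_should_match (Elasticsearch also accepts percentages and combinations), so treat it as an illustration of the matching rule rather than Elasticsearch's implementation; scoring is ignored.

```java
import java.util.List;
import java.util.Set;

// Toy model of a boolean-type match query over pre-analyzed terms
final class BooleanMatch {
    enum Operator { OR, AND }

    private BooleanMatch() {}

    static boolean matches(List<String> queryTerms, Set<String> docTerms,
                           Operator operator, int minimumShouldMatch) {
        int hits = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        if (operator == Operator.AND) {
            // every query term must be present in the document
            return hits == queryTerms.size();
        }
        // OR: at least one term, or at least minimumShouldMatch terms if that is higher
        return hits >= Math.max(1, minimumShouldMatch);
    }
}
```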

Phrase and Phrase_prefix

Both phrase and phrase_prefix search for phrases. The difference is that phrase_prefix performs a prefix match on the last term.

Example:

{
  "query": {
    "match_phrase_prefix": {
        "name": "quick brown f"
    }
  }
}
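To make the difference concrete, the following sketch checks term positions in a pre-tokenized field: phrase requires the query terms to appear consecutively and in order, and phrase_prefix additionally lets the last query term match as a prefix (f matching fox). It assumes a slop of 0 and ignores max_expansions, so it is a simplified model rather than the real query.

```java
import java.util.List;

// Toy model of match_phrase / match_phrase_prefix with slop 0
final class PhraseMatch {
    private PhraseMatch() {}

    static boolean phrase(List<String> field, List<String> query) {
        return find(field, query, false);
    }

    static boolean phrasePrefix(List<String> field, List<String> query) {
        return find(field, query, true);
    }

    private static boolean find(List<String> field, List<String> query, boolean prefixLast) {
        if (query.isEmpty()) {
            return false;
        }
        // try every start position where the whole phrase still fits
        for (int start = 0; start + query.size() <= field.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                String actual = field.get(start + i);
                String wanted = query.get(i);
                boolean last = i == query.size() - 1;
                // only the last term may match as a prefix, and only for phrase_prefix
                ok = (prefixLast && last) ? actual.startsWith(wanted) : actual.equals(wanted);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```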

MultiMatch Query

{
  "query": {
    "multi_match": {
      "query": "你的名字（花絮預告）",
      "fields": [
        "name",
        "awards"
      ]
    }
  }
}

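multiMatchQuery matches a value against multiple fields. Field names in fields can use wildcards; for example, *_name matches fields such as first_name and last_name. A ^ suffix boosts a field's importance, e.g. name^3.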

Its type attribute can be set to best_fields, most_fields, cross_fields, phrase or phrase_prefix. I will look into the details of each later.

The corresponding Java class is MultiMatchQueryBuilder.
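One way to picture how a fields list such as ["*_name", "name^3"] is interpreted: each entry is split into a field pattern and an optional ^boost, and * patterns are expanded against the field names present in the mapping. In the sketch below the mapped field list is passed in by hand and the FieldResolver name is hypothetical; Elasticsearch performs this resolution itself against the index mapping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical resolver for multi_match "fields" entries such as "*_name" or "name^3"
final class FieldResolver {
    private FieldResolver() {}

    static Map<String, Float> resolve(List<String> entries, List<String> mappedFields) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String entry : entries) {
            // split "pattern^boost"; a missing boost means 1.0
            int caret = entry.indexOf('^');
            String pattern = caret < 0 ? entry : entry.substring(0, caret);
            float boost = caret < 0 ? 1.0f : Float.parseFloat(entry.substring(caret + 1));
            Pattern regex = toRegex(pattern);
            for (String field : mappedFields) {
                if (regex.matcher(field).matches()) {
                    boosts.put(field, boost); // in this sketch a later entry overrides an earlier one
                }
            }
        }
        return boosts;
    }

    // '*' matches any run of characters; every other character is literal
    private static Pattern toRegex(String pattern) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                re.append(".*");
            }
            re.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(re.toString());
    }
}
```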

PS: there is another usage:

{
  "query": {
    "term": {
      "all_worlds": "日本"
    }
  }
}

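Here all_worlds appears to be a custom catch-all field in the author's mapping (one that other fields are copied into, for example via copy_to), so this term query finds documents in which any of those copied fields contains "日本".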

Wildcard Query

{
  "query": {
    "wildcard": {
      "name": "*的*"
    }
  }
}

wildcardQuery performs wildcard matching: ? matches a single character and * matches zero or more characters. The Java class is WildcardQueryBuilder.
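As a rough model of the two metacharacters, the sketch below translates a wildcard pattern into a java.util.regex pattern, turning ? into exactly one character and * into any run of characters, including none. Elasticsearch applies the pattern to individual indexed terms rather than to raw field text, so this is only an approximation. Also keep in mind that the official documentation warns that wildcard queries can be slow and advises against patterns that begin with * or ?, such as the *的* example above.

```java
import java.util.regex.Pattern;

// Approximates wildcard-query semantics with java.util.regex
final class WildcardPattern {
    private WildcardPattern() {}

    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                // flush the literal run, quoted so regex metacharacters stay literal
                re.append(Pattern.quote(literal.toString()));
                literal.setLength(0);
                re.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        re.append(Pattern.quote(literal.toString()));
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```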

Query String Query

{
    "query": {
        "query_string" : {
            "query" : "(new york city) OR (big apple)"
        }
    }
}

Parameter	Description
`query`	The actual query to be parsed. See Query string syntax.
`default_field`	The default field for query terms if no prefix field is specified. Defaults to the `index.query.default_field` index settings, which in turn defaults to `_all`.
`default_operator`	The default operator used if no explicit operator is specified. For example, with a default operator of `OR`, the query `capital of Hungary` is translated to `capital OR of OR Hungary`, and with default operator of `AND`, the same query is translated to `capital AND of AND Hungary`. The default value is `OR`.
`analyzer`	The analyzer name used to analyze the query string.
`allow_leading_wildcard`	When set, `*` or `?` are allowed as the first character. Defaults to `true`.
`lowercase_expanded_terms`	Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to `true`.
`enable_position_increments`	Set to `true` to enable position increments in result queries. Defaults to `true`.
`fuzzy_max_expansions`	Controls the number of terms fuzzy queries will expand to. Defaults to `50`
`fuzziness`	Set the fuzziness for fuzzy queries. Defaults to `AUTO`. See Fuzziness for allowed settings.
`fuzzy_prefix_length`	Set the prefix length for fuzzy queries. Default is `0`.
`phrase_slop`	Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is `0`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyze_wildcard`	By default, wildcards terms in a query string are not analyzed. By setting this value to `true`, a best effort will be made to analyze those as well.
`auto_generate_phrase_queries`	Defaults to `false`.
`max_determinized_states`	Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000.
`minimum_should_match`	A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (`2`), a percentage (`30%`) or a combination of both.
`lenient`	If set to `true` will cause format based failures (like providing text to a numeric field) to be ignored.
`locale`	Locale that should be used for string conversions. Defaults to `ROOT`.
`time_zone`	Time Zone to be applied to any range query related to dates. See also JODA timezone.
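The default_operator row can be illustrated with a small sketch that joins whitespace-separated terms with the chosen operator, reproducing the capital of Hungary example from the table. It deliberately ignores the rest of the query string syntax (quotes, parentheses, field prefixes and explicit operators), so it is not a query_string parser.

```java
import java.util.Locale;

// Illustrates default_operator only; not a query_string parser
final class DefaultOperator {
    private DefaultOperator() {}

    static String expand(String query, String operator) {
        String op = operator.trim().toUpperCase(Locale.ROOT);
        if (!op.equals("OR") && !op.equals("AND")) {
            throw new IllegalArgumentException("default_operator must be OR or AND");
        }
        // split on runs of whitespace and rejoin with the explicit operator
        return String.join(" " + op + " ", query.trim().split("\\s+"));
    }
}
```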

Compound queries
Bool Query

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "releaseYear": "2014"
          }
        },
        {
          "match_phrase_prefix": {
            "name": "你的名字"
          }
        }
      ]
    }
  }
}

boolQuery is a compound query that combines other queries. Its clauses are grouped by occurrence type:

Occur	Description
`must`	The clause (query) must appear in matching documents and will contribute to the score.
`filter`	The clause (query) must appear in matching documents. However unlike `must` the score of the query will be ignored.
`should`	The clause (query) should appear in the matching document. In a boolean query with no `must` or `filter` clauses, one or more `should` clauses must match a document. The minimum number of should clauses to match can be set using the `minimum_should_match` parameter.
`must_not`	The clause (query) must not appear in the matching documents.
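Ignoring scoring, the matching rules in the table can be written down in a few lines. In the sketch below each clause is a predicate over a document represented as a Map, which is an assumption made for illustration; the BoolQuery and Clause names are hypothetical. When there is no must or filter clause, at least one should clause (or minimum_should_match clauses, if that is higher) has to match, following the should row above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy evaluator for bool-query matching rules; relevance scoring is not modelled
final class BoolQuery {
    // a clause is any yes/no test over a document's fields (hypothetical representation)
    interface Clause extends Predicate<Map<String, Object>> {}

    private BoolQuery() {}

    static boolean matches(Map<String, Object> doc, List<Clause> must, List<Clause> filter,
                           List<Clause> should, List<Clause> mustNot, int minimumShouldMatch) {
        // must and filter: every clause has to match
        for (Clause c : must) {
            if (!c.test(doc)) {
                return false;
            }
        }
        for (Clause c : filter) {
            if (!c.test(doc)) {
                return false;
            }
        }
        // must_not: no clause may match
        for (Clause c : mustNot) {
            if (c.test(doc)) {
                return false;
            }
        }
        long shouldHits = should.stream().filter(c -> c.test(doc)).count();
        // without must/filter clauses, at least one should clause is required
        boolean onlyShould = must.isEmpty() && filter.isEmpty() && !should.isEmpty();
        int required = onlyShould ? Math.max(1, minimumShouldMatch) : minimumShouldMatch;
        return shouldHits >= required;
    }
}
```

With the bool example above, a document released in 2016 whose name starts with the searched phrase still matches, because only one of the two should clauses has to hold.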

JAVA API
Connecting to an ES cluster

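TransportClient uses the transport module to connect remotely to an Elasticsearch cluster. It does not join the cluster; it simply takes one or more initial transport addresses and communicates with them in a round-robin fashion.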

// on startup
Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));

// on shutdown
client.close();
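The round-robin behaviour can be pictured with a tiny stdlib sketch: a counter that rotates through the configured addresses. This only illustrates the idea; the TransportClient's real node selection, failure handling and sniffing happen inside Elasticsearch, and the RoundRobin class here is hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin selection over configured addresses
final class RoundRobin<T> {
    private final List<T> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(List<T> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // thread-safe: each call moves on to the next address, wrapping around
    T next() {
        int i = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}
```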

Note that you must set the cluster name if your cluster uses a name other than the default, elasticsearch:

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
//Add transport addresses and do something with the client...

You can also set it in the elasticsearch.yml file instead.

The client can sniff the rest of the cluster and add those nodes to its list of machines to use. To enable this, set client.transport.sniff to true:

Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);

Other transport client settings include:

Parameter	Description
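client.transport.ignore_cluster_name	Set to true to skip cluster-name validation of connected nodes
client.transport.ping_timeout	Time to wait for a ping response from a node; defaults to 5s
client.transport.nodes_sampler_interval	How often to sample/ping the listed and connected nodes; defaults to 5s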

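PS: close the client when you are finished with it. In my tests, repeatedly obtaining connections without ever closing them could eventually lead to connection errors.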

Getting a document

The get API lets you fetch a typed JSON document from an index by its id, for example:

GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .execute()
        .actionGet();

By default, operationThreaded is true, meaning the operation is executed on a separate thread. Here is an example that sets it to false:

GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();

Deleting a document

The delete API lets you delete a typed JSON document from a specific index by its id.

By default, operationThreaded is true, meaning the operation is executed on a separate thread. Here is an example that sets it to false:

DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();

Adding or updating a document

You can create an UpdateRequest and send it to the client:

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

Or you can use the prepareUpdate method:

client.prepareUpdate("ttl", "doc", "1")
        .setScript("ctx._source.gender = \"male\"", ScriptService.ScriptType.INLINE)
        .get();

client.prepareUpdate("ttl", "doc", "1")
        .setDoc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .get();

The first call updates the document with a script; the second updates it with a partial doc.

The Java API also supports upsert: if the document does not exist yet, a new document is created from the upsert content.

IndexRequest indexRequest = new IndexRequest("index", "type", "1")
        .source(jsonBuilder()
            .startObject()
                .field("name", "Joe Smith")
                .field("gender", "male")
            .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .upsert(indexRequest);
client.update(updateRequest).get();

If the document index/type/1 already exists (assuming its name field currently holds "Joe Dalton"), it looks like this after the update:

{
    "name"  : "Joe Dalton",
    "gender": "male"
}

Otherwise, it will be:

{
    "name" : "Joe Smith",
    "gender": "male"
}
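The two outcomes above follow a simple rule, modelled in the sketch below with plain maps: if the document exists, the partial doc's fields are merged into it; otherwise the upsert document is stored as-is. The UpsertModel class is hypothetical, a HashMap stands in for the index, and only top-level fields are merged (Elasticsearch also merges inner objects), so this is a simplified illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified in-memory model of update-with-upsert; a HashMap stands in for the index
final class UpsertModel {
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    void put(String id, Map<String, Object> source) {
        index.put(id, new HashMap<>(source));
    }

    Map<String, Object> get(String id) {
        return index.get(id);
    }

    // existing doc: merge the partial doc's top-level fields;
    // missing doc: index the upsert document instead
    void update(String id, Map<String, Object> partialDoc, Map<String, Object> upsertDoc) {
        Map<String, Object> existing = index.get(id);
        if (existing == null) {
            index.put(id, new HashMap<>(upsertDoc));
        } else {
            existing.putAll(partialDoc);
        }
    }
}
```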

Bulk

The bulk API lets you index and delete multiple documents in a single request. Here is an example:

import static org.elasticsearch.common.xcontent.XContentFactory.*;
import java.util.Date;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        );

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "another post")
                    .endObject()
                  )
        );

BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
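For reference, the REST form of a bulk request is newline-delimited JSON: an action line such as {"index":{...}} followed, for index operations, by the document source on the next line, with the body ending in a newline. The hypothetical BulkBody helper below assembles that format from pre-serialized sources; it does no JSON escaping of the index, type or id values, and the postDate field from the example is left out for brevity. The Java transport client sends the same operations over its own binary protocol, so this sketch only shows the equivalent REST _bulk body.

```java
import java.util.List;

// Builds a REST _bulk body (newline-delimited JSON); values are assumed to need no escaping
final class BulkBody {
    // one index operation: target coordinates plus the document as a JSON string
    static final class IndexOp {
        final String index, type, id, sourceJson;

        IndexOp(String index, String type, String id, String sourceJson) {
            this.index = index;
            this.type = type;
            this.id = id;
            this.sourceJson = sourceJson;
        }
    }

    private BulkBody() {}

    static String build(List<IndexOp> ops) {
        StringBuilder sb = new StringBuilder();
        for (IndexOp op : ops) {
            // action/metadata line, then the source line; each ends with '\n'
            sb.append("{\"index\":{\"_index\":\"").append(op.index)
              .append("\",\"_type\":\"").append(op.type)
              .append("\",\"_id\":\"").append(op.id).append("\"}}\n");
            sb.append(op.sourceJson).append('\n');
        }
        return sb.toString();
    }
}
```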

Search

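The search API lets you execute a search query and get back the hits that match it. It can run across indices and across types. The query can be expressed with either the Java query API or the Java filter API. The body of the search request is built with SearchSourceBuilder.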

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("multi", "test"))             // Query
        .setPostFilter(FilterBuilders.rangeFilter("age").from(12).to(18))   // Filter
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();

Note that all parameters are optional. Here is the smallest possible search call:

// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();

Using scrolls in Java

import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch("test")
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
while (true) {
    for (SearchHit hit : scrollResp.getHits()) {
        //Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    //Break condition: No hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}
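The control flow of that loop (handle a batch, request the next one, stop at the first empty batch) does not depend on Elasticsearch itself. The sketch below reproduces it, slightly restructured, against a hypothetical ScrollSource interface backed by an in-memory list, which makes the termination condition easy to check; ScrollSource, ScrollDrain and their methods are invented for illustration. With SearchType.SCAN the initial response carries only a scroll id and no hits, which is why the loop above can iterate over its empty hits before scrolling for the first time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scrolling search; not an Elasticsearch type
interface ScrollSource<T> {
    List<T> next(); // returns the next batch, or an empty list when exhausted
}

final class ScrollDrain {
    private ScrollDrain() {}

    // keep requesting batches until an empty one arrives
    static <T> List<T> drain(ScrollSource<T> source) {
        List<T> all = new ArrayList<>();
        while (true) {
            List<T> batch = source.next();
            if (batch.isEmpty()) {
                break; // no hits returned: the scroll is finished
            }
            all.addAll(batch); // "handle the hit" step
        }
        return all;
    }

    // in-memory source that hands out fixed-size slices of a list
    static <T> ScrollSource<T> fromList(List<T> data, int batchSize) {
        int[] pos = {0};
        return () -> {
            int end = Math.min(pos[0] + batchSize, data.size());
            List<T> batch = new ArrayList<>(data.subList(pos[0], end));
            pos[0] = end;
            return batch;
        };
    }
}
```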

Multi search API

SearchRequestBuilder srb1 = node.client()
    .prepareSearch().setQuery(QueryBuilders.queryString("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = node.client()
    .prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);

MultiSearchResponse sr = node.client().prepareMultiSearch()
        .add(srb1)
        .add(srb2)
        .execute().actionGet();

// You will get all individual responses from MultiSearchResponse#getResponses()
long nbHits = 0;
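for (MultiSearchResponse.Item item : sr.getResponses()) {
    if (item.isFailure()) {
        continue; // a failed sub-search has a failure message instead of a response
    }
    SearchResponse response = item.getResponse();
    nbHits += response.getHits().getTotalHits();
}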

Using aggregations

The following example shows how to add two aggregations to your search:

SearchResponse sr = node.client().prepareSearch()
    .setQuery(QueryBuilders.matchAllQuery())
    .addAggregation(
            AggregationBuilders.terms("agg1").field("field")
    )
    .addAggregation(
            AggregationBuilders.dateHistogram("agg2")
                    .field("birth")
                    .interval(DateHistogram.Interval.YEAR)
    )
    .execute().actionGet();

// Get your aggregation results
Terms agg1 = sr.getAggregations().get("agg1");
DateHistogram agg2 = sr.getAggregations().get("agg2");
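Conceptually, the terms aggregation agg1 counts documents per distinct value of field, and the yearly dateHistogram agg2 counts documents per calendar year of birth. The sketch below computes the same two groupings over an in-memory list with java.util.stream; the Person class and its fields are hypothetical, and the buckets are sorted by key only for readability. Real terms aggregations return only the top buckets (10 by default), ordered by document count, and their counts can be approximate across shards.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory analogue of a terms aggregation and a yearly date histogram
final class AggSketch {
    // hypothetical document shape: one keyword-like field and one date field
    static final class Person {
        final String field;
        final LocalDate birth;

        Person(String field, LocalDate birth) {
            this.field = field;
            this.birth = birth;
        }
    }

    private AggSketch() {}

    // like terms("agg1").field("field"): document count per distinct value
    static Map<String, Long> terms(List<Person> docs) {
        return docs.stream().collect(
                Collectors.groupingBy(p -> p.field, TreeMap::new, Collectors.counting()));
    }

    // like dateHistogram("agg2").field("birth") with a YEAR interval: count per year
    static Map<Integer, Long> yearlyHistogram(List<Person> docs) {
        return docs.stream().collect(
                Collectors.groupingBy(p -> p.birth.getYear(), TreeMap::new, Collectors.counting()));
    }
}
```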

Using search templates

Define your template parameters as a Map<String,String>:

Map<String, String> template_params = new HashMap<>();
template_params.put("param_gender", "male");

You can use a template stored in the config/scripts directory. For example, suppose you have the following file, config/scripts/template_gender.mustache:

{
    "template" : {
        "query" : {
            "match" : {
                "gender" : "{{param_gender}}"
            }
        }
    }
}

You can execute it like this:

SearchResponse sr = client.prepareSearch()
        .setTemplateName("template_gender")
        .setTemplateType(ScriptService.ScriptType.FILE)
        .setTemplateParams(template_params)
        .get();
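What the template engine does with template_params can be approximated by replacing each {{name}} placeholder with the matching value. The sketch below supports only that simple variable syntax (no sections and no escaping) and is neither Mustache nor the Elasticsearch script service; the TemplateRender helper is hypothetical. As in Mustache, an unknown variable renders as an empty string.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, minimal {{variable}} substitution; not a full Mustache implementation
final class TemplateRender {
    private static final Pattern VAR = Pattern.compile("\\{\\{\\s*([\\w.]+)\\s*\\}\\}");

    private TemplateRender() {}

    static String render(String template, Map<String, String> params) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // unknown variables render as empty strings
            String value = params.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```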

You can also store the template in a dedicated index named .scripts:

client.preparePutIndexedScript("mustache", "template_gender",
        "{\n" +
        "    \"template\" : {\n" +
        "        \"query\" : {\n" +
        "            \"match\" : {\n" +
        "                \"gender\" : \"{{param_gender}}\"\n" +
        "            }\n" +
        "        }\n" +
        "    }\n" +
        "}").get();

To use this indexed template, use ScriptService.ScriptType.INDEXED:

SearchResponse sr = client.prepareSearch()
        .setTemplateName("template_gender")
        .setTemplateType(ScriptService.ScriptType.INDEXED)
        .setTemplateParams(template_params)
        .get();

Delete by query

The delete-by-query API lets you delete the documents that match a query from one or more indices and one or more types. Here is an example:

import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

DeleteByQueryResponse response = client.prepareDeleteByQuery("test")
        .setQuery(termQuery("_type", "type1"))
        .execute()
        .actionGet();
