queries and filters

Although we refer to the query DSL, in reality there are two DSLs: the query DSL and the filter DSL. Query clauses and filter clauses are similar in nature, but have slightly different purposes.



most important queries and filters

term filter
{"term":{"field":"value"}}

terms filter
{"terms":{"field":["a","b"]}}

range filter

exists and missing filter

The exists and missing filters are used to find documents in which the specified field either has one or more values (exists) or doesn’t have any values (missing). It is similar in nature to IS NULL (missing) and IS NOT NULL (exists) in SQL.

bool filter


clauses: must (every clause must match), should (at least one clause must match), must_not (no clause may match)
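
A minimal sketch of how the three bool clauses combine, modelled in plain Python rather than run against Elasticsearch. The sample documents, the field names, and the `matches_bool` helper are hypothetical, and every clause is reduced to a simple field/value term check.

```python
# Toy model of a bool filter: each clause is a (field, value) term check.
docs = {
    1: {"folder": "inbox", "starred": True, "spam": False},
    2: {"folder": "inbox", "starred": False, "spam": True},
    3: {"folder": "sent", "starred": True, "spam": False},
}

def term(doc, clause):
    field, value = clause
    return doc.get(field) == value

def matches_bool(doc, must=(), should=(), must_not=()):
    # must: every clause has to match (AND)
    if not all(term(doc, c) for c in must):
        return False
    # must_not: no clause may match (NOT)
    if any(term(doc, c) for c in must_not):
        return False
    # should: if present, at least one clause has to match (OR)
    return not should or any(term(doc, c) for c in should)

hits = [i for i, d in docs.items()
        if matches_bool(d, must=[("folder", "inbox")], must_not=[("spam", True)])]
print(hits)
```

Only document 1 is in the inbox and not flagged as spam, so it is the only hit.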

match query

The match query should be the standard query that you reach for whenever you want to query for a full-text or exact value in almost any field.

If you run a match query against a full-text field, it will analyze the query string by using the correct analyzer for that field before executing the search:

{"match":{"tweet":"About Search"}}

If you use it on a field containing an exact value, such as a number, a date, a Boolean, or a not_analyzed string field, then it will search for that exact value:
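
The example for this case is missing from these notes. As a sketch, match clauses on exact-value fields might look like the following, written as Python dicts and printed as JSON. The field names (age, date, public, tag) and their values are hypothetical.

```python
import json

# Hypothetical fields: age (number), date (date), public (Boolean),
# tag (not_analyzed string). Each body is a match query on an exact value.
exact_value_queries = [
    {"match": {"age": 26}},
    {"match": {"date": "2014-09-01"}},
    {"match": {"public": True}},
    {"match": {"tag": "full_text"}},
]

for query in exact_value_queries:
    print(json.dumps(query))
```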

For exact-value searches, you probably want to use a filter instead of a query, as a filter will be cached.


bool query

combining queries with filters

GET /_search
{"query":{"filtered":{"query":{"match":{"email":"business opportunity"}},"filter":{"term":{"folder":"inbox"}}}}}

just a filter

While in query context, if you need to use a filter without a query (for instance, to match all emails in the inbox), you can just omit the query:

GET /_search
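
The request body is not shown in these notes. A sketch of what a filter-only body could look like, assuming the same filtered query and folder/inbox term used in the earlier example, built as a Python dict and printed as JSON:

```python
import json

# filtered query with the "query" part omitted; only the filter remains.
# The folder/inbox term mirrors the earlier combined example.
body = {"query": {"filtered": {"filter": {"term": {"folder": "inbox"}}}}}
print(json.dumps(body))
```

With no query given, every document that passes the filter is returned.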

You seldom need to use a query as a filter, but we have included it for completeness' sake. The only time you may need it is when you need to use full-text matching while in filter context.

finding multiple exact values

GET /my_store/products/_search
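
The body for this request is missing here. As a sketch, a terms filter accepts a list of values and matches a document if the field equals any one of them. The `price` field, the sample prices, and the wanted values below are hypothetical.

```python
import json

# Hypothetical product prices, keyed by document ID.
prices = {1: 10, 2: 20, 3: 30, 4: 30}

# terms filter: the field may equal any one of the listed values.
wanted = [20, 30]
body = {"query": {"filtered": {"filter": {"terms": {"price": wanted}}}}}

# Plain-Python model of which documents such a filter would keep.
matching_ids = [doc_id for doc_id, price in prices.items() if price in wanted]
print(json.dumps(body))
print(matching_ids)
```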

contains, but does not equal

GET /my_index/my_type/_search
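
The point of this heading: a term filter on a multi-value field matches documents that contain the term, not only documents whose field equals it. One way to get "equals" behaviour is to store the number of values in a second field and filter on that too. The sketch below models this in plain Python; the `tags` and `tag_count` fields and the `term_filter` helper are hypothetical.

```python
# Hypothetical documents; tag_count is stored alongside tags at index time.
docs = {
    1: {"tags": ["search"], "tag_count": 1},
    2: {"tags": ["search", "open_source"], "tag_count": 2},
}

def term_filter(field, value):
    """IDs of docs whose field contains the value (scalars count as one value)."""
    ids = []
    for doc_id, doc in docs.items():
        values = doc[field] if isinstance(doc[field], list) else [doc[field]]
        if value in values:
            ids.append(doc_id)
    return ids

# "contains search" matches both documents.
contains_search = term_filter("tags", "search")
# Adding tag_count == 1 keeps only the document whose sole tag is "search".
only_search = [i for i in contains_search if i in term_filter("tag_count", 1)]
print(contains_search, only_search)
```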

When used on date fields, the range filter supports date math operations. For example, if we want to find all documents that have a timestamp sometime in the last hour:
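
The example itself is missing from these notes. A sketch of such a range filter, assuming a field named `timestamp` (hypothetical), written as a Python dict and printed as JSON:

```python
import json

# Hypothetical field name "timestamp"; "now-1h" is date math for one hour ago.
last_hour = {"range": {"timestamp": {"gt": "now-1h"}}}
print(json.dumps(last_hour))
```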

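Date math can also be applied to an actual date rather than to now, by appending a double pipe (||) to the date followed by a date math expression. For example, an upper bound of 2014-01-01 00:00:00||+1M means less than January 1, 2014 plus one month.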
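
A sketch of a range filter whose upper bound is January 1, 2014 plus one month, assuming a hypothetical `timestamp` field. The Python below also works out by hand what that bound resolves to; it is not a general date math implementation.

```python
from datetime import datetime

# Date math anchored on a fixed date: "||" separates the date from the expression.
one_month_window = {
    "range": {
        "timestamp": {
            "gt": "2014-01-01 00:00:00",
            "lt": "2014-01-01 00:00:00||+1M",
        }
    }
}

# What the upper bound resolves to, computed by hand for this one case
# (a plain month increment like this would not work for December).
start = datetime(2014, 1, 1)
upper = start.replace(month=start.month + 1)
print(upper)
```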

dealing with null values

GET /my_index/posts/_search
GET /my_index/posts/_search
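
The two request bodies are missing here. Presumably one uses the exists filter and the other the missing filter. The sketch below shows both bodies and models in plain Python which documents each would select. The `tags` field, the sample posts, and the `has_value` helper are hypothetical.

```python
# Hypothetical posts; None stands for a JSON null.
posts = {
    1: {"tags": ["search"]},
    2: {"tags": ["search", "open_source"]},
    3: {"other_field": "some data"},
    4: {"tags": None},
    5: {"tags": ["search", None]},
}

exists_body = {"query": {"filtered": {"filter": {"exists": {"field": "tags"}}}}}
missing_body = {"query": {"filtered": {"filter": {"missing": {"field": "tags"}}}}}

def has_value(doc, field):
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    # A field counts as existing only if at least one non-null value is present.
    return any(v is not None for v in values)

exists_ids = [i for i, d in posts.items() if has_value(d, "tags")]
missing_ids = [i for i, d in posts.items() if not has_value(d, "tags")]
print(exists_ids, missing_ids)
```

Document 5 still counts as having a value because one of its tags is not null.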

all about caching

The cache is updated in real time, so there is no need to worry about cached entries expiring. Leaf filters have to consult the inverted index on disk, so it makes sense to cache them. Compound filters, on the other hand, use fast bit logic to combine the bitsets resulting from their inner clauses, so it is efficient to recalculate them every time.
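
A rough illustration of why compound filters are cheap to recompute: once each leaf filter has produced a bitset, combining them is a handful of bitwise operations. Python ints stand in for bitsets here; the bit patterns and names are made up, and this is not how Elasticsearch stores them internally.

```python
# Each leaf filter yields a bitset: bit i is 1 if document i matches.
# Python ints stand in for bitsets here; the values are made up.
NUM_DOCS = 8
ALL_DOCS = (1 << NUM_DOCS) - 1

in_inbox = 0b10110110   # cached leaf filter result
is_spam = 0b00100100    # cached leaf filter result

# bool filter with must in_inbox and must_not is_spam: AND plus AND NOT.
combined = in_inbox & (ALL_DOCS & ~is_spam)

matching_docs = [i for i in range(NUM_DOCS) if combined >> i & 1]
print(format(combined, "08b"), matching_docs)
```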
Certain leaf filters, however, are not cached by default, because it doesn’t make sense to do so:

  • Script filters: The results from script filters cannot be cached because the meaning of the script is opaque to Elasticsearch.
  • Geo-filters: The geolocation filters, which we cover in more detail in Geolocation, are usually used to filter results based on the geolocation of a specific user. Since each user has a unique geolocation, it is unlikely that geo-filters will be reused, so it makes no sense to cache them.
  • Date ranges: Date ranges that use the now function (for example "now-1h") result in values accurate to the millisecond. Every time the filter is run, now returns a new time. Older filters will never be reused, so caching is disabled by default. However, when using now with rounding (for example, now/d rounds to the nearest day), caching is enabled by default.

Sometimes the default caching strategy is not correct. Perhaps you have a complicated bool expression that is reused several times in the same query. Or you have a filter on a date field that will never be reused. The default caching strategy can be overridden on almost any filter by setting the _cache flag:
{"range":{"timestamp":{"gt":"2014-01-02 16:15:14"},"_cache":false}}

filter order

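More specific filters should be placed before less specific ones. For example, if filter A returns 10,000 results and filter B returns 10, B should be placed before A. Cached filters are very fast, so they should be placed before filters that are not cacheable.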
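
A sketch of the ordering advice applied to two date ranges inside a bool filter, assuming a hypothetical `timestamp` field. The rounded range is cacheable and goes first; the unrounded one then only has to refine what is left.

```python
import json

# Clause order inside a bool filter, following the ordering advice above.
# "timestamp" is a hypothetical field name.
bool_filter = {
    "bool": {
        "must": [
            # Rounded to the day, so cacheable: goes first.
            {"range": {"timestamp": {"gt": "now-1h/d"}}},
            # Millisecond-accurate, not cached: runs on what is left.
            {"range": {"timestamp": {"gt": "now-1h"}}},
        ]
    }
}
print(json.dumps(bool_filter, indent=2))
```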

full-text search

Term-based queries

Queries like the term or fuzzy queries are low-level queries that have no analysis phase. They operate on a single term. A term query for the term Foo looks for that exact term in the inverted index and calculates the TF/IDF relevance _score for each document that contains the term.

It is important to remember that the term query looks in the inverted index for the exact term only; it won’t match any variants like foo or FOO. It doesn’t matter how the term came to be in the index, just that it is. If you were to index ["Foo","Bar"] into an exact value not_analyzed field, or Foo Bar into an analyzed field with the whitespace analyzer, both would result in having the two terms Foo and Bar in the inverted index.
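
A toy model of the point above, using a Python dict as the inverted index. The `index_terms` and `term_query` helpers are hypothetical, and `str.split()` is only a stand-in for the whitespace analyzer.

```python
# Toy inverted index: term -> IDs of documents containing that exact term.
inverted_index = {}

def index_terms(doc_id, terms):
    for t in terms:
        inverted_index.setdefault(t, set()).add(doc_id)

# Doc 1: ["Foo", "Bar"] in a not_analyzed field (each value is one term).
index_terms(1, ["Foo", "Bar"])
# Doc 2: "Foo Bar" in a field using the whitespace analyzer.
index_terms(2, "Foo Bar".split())

def term_query(term):
    # No analysis: the term is looked up exactly as given.
    return sorted(inverted_index.get(term, set()))

print(term_query("Foo"), term_query("foo"), term_query("FOO"))
```

Both documents end up under the terms Foo and Bar, and the lowercase and uppercase variants find nothing.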

Full-text queries

Queries like the match or query_string queries are high-level queries that understand the mapping of a field:

  • If you use them to query a date or integer field, they will treat the query string as a date or integer, respectively.
  • If you query an exact value (not_analyzed) string field, they will treat the whole query string as a single term.
  • But if you query a full-text (analyzed) field, they will first pass the query string through the appropriate analyzer to produce the list of terms to be queried.

a single-word query

Our first example explains what happens when we use the match query to search within a full-text field for a single word:

GET /my_index/my_type/_search
{"query":{"match":{"title":"QUICK!"}}}

Elasticsearch executes the preceding match query as follows:

  1. Check the field type.

    The title field is a full-text (analyzed) string field, which means that the query string should be analyzed too.

  2. Analyze the query string.

    The query string QUICK! is passed through the standard analyzer, which results in the single term quick. Because we have just a single term, the match query can be executed as a single low-level term query.

  3. Find matching docs.

    The term query looks up quick in the inverted index and retrieves the list of documents that contain that term—in this case, documents 1, 2, and 3.

  4. Score each doc.

    The term query calculates the relevance _score for each matching document by combining the term frequency (how often quick appears in the title field of each document) with the inverse document frequency (how often quick appears in the title field in all documents in the index) and the length of each field (shorter fields are considered more relevant). See What Is Relevance?.
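
Steps 2 and 3 above can be sketched in plain Python. The sample titles are hypothetical (chosen so that documents 1, 2, and 3 contain quick), and `analyze` is only a rough stand-in for the standard analyzer. Scoring (step 4) is not modelled.

```python
import re

# Hypothetical sample titles, keyed by document ID.
titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the quick dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
}

def analyze(text):
    # Rough stand-in for the standard analyzer: split on non-word
    # characters and lowercase. The real analyzer does more than this.
    return [t.lower() for t in re.findall(r"\w+", text)]

# Index time: build the inverted index from the analyzed titles.
index = {}
for doc_id, title in titles.items():
    for term in analyze(title):
        index.setdefault(term, set()).add(doc_id)

# Query time: analyze the query string, then look up the single term.
query_terms = analyze("QUICK!")
matching = sorted(index.get(query_terms[0], set()))
print(query_terms, matching)
```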

multiword queries

GET /my_index/my_type/_search
{"query":{"match":{"title":{"query":"BROWN DOG!","operator":"and"}}}}
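
A sketch of what the operator setting changes. The query string is analyzed into the terms brown and dog; with the default operator (or) a document needs any one of them, with and it needs both. The sample titles and the `match` helper are hypothetical, and `analyze` only approximates the standard analyzer.

```python
import re

# Hypothetical sample titles, keyed by document ID.
titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the quick dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
}

def analyze(text):
    # Rough stand-in for the standard analyzer.
    return [t.lower() for t in re.findall(r"\w+", text)]

def match(query_string, operator="or"):
    terms = analyze(query_string)
    # "and": every term must be present; "or": any term is enough.
    combine = all if operator == "and" else any
    return [doc_id for doc_id, title in titles.items()
            if combine(t in analyze(title) for t in terms)]

print(match("BROWN DOG!"))
print(match("BROWN DOG!", operator="and"))
```

Document 1 contains brown but not dog, so it drops out once the operator is and.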

controlling precision

GET /my_index/my_type/_search
{"query":{"match":{"title":{"query":"quick brown dog","minimum_should_match":"75%"}}}}
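
A sketch of how a percentage could translate into a required number of terms. It assumes the percentage is converted to a term count and rounded down, so 75% of three terms requires two of them. The sample titles and both helpers are hypothetical.

```python
import math
import re

# Hypothetical sample titles, keyed by document ID.
titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the quick dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
    5: "A lazy dog",
}

def analyze(text):
    # Rough stand-in for the standard analyzer.
    return [t.lower() for t in re.findall(r"\w+", text)]

def required_matches(num_terms, minimum_should_match):
    # Assumption: a positive percentage is turned into a count and rounded down.
    percent = int(minimum_should_match.rstrip("%"))
    return math.floor(num_terms * percent / 100)

def match(query_string, minimum_should_match):
    terms = analyze(query_string)
    needed = required_matches(len(terms), minimum_should_match)
    return [doc_id for doc_id, title in titles.items()
            if sum(t in analyze(title) for t in terms) >= needed]

print(required_matches(3, "75%"))
print(match("quick brown dog", "75%"))
```

Document 5 contains only one of the three terms, so it falls below the threshold of two.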
