Elasticsearch Query Commands

阿新 • Published: 2021-08-13

Query methods
Inserting data
Viewing an index
term query
Token case
terms query
match query
match_phrase query
multi_match
match_all query
bool query
Limiting the number of results
Selecting returned fields
Sorting
Range query
Wildcard query
Fuzzy query
Score
aggregation query
metrics aggregation : avg
metrics aggregation : avg & histogram
metrics aggregation : max/min/sum
metrics aggregation : boxplot
metrics aggregation : cardinality
metrics aggregation : extended_stats
metrics aggregation : geo
metrics aggregation : matrix_stats
bucket aggregation : adjacency_matrix
bucket aggregation : composite
bucket aggregation : date_histogram/auto_date_histogram
bucket aggregation : term/filter/filters
bucket aggregation : range/date range
Grouping (count) combined with a metric (e.g. avg)
Pipeline aggregations : avg_bucket
Pipeline aggregations : cumulative_sum
Pipeline aggregations : max_bucket
Script execution

Query methods

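Elasticsearch (ES) has its own query syntax, which differs from SQL. It also provides packages such as a JDBC driver for running the equivalent SQL.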

The examples here use ES's own query syntax.

Inserting data

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Wang",
  "title": "software designer",
  "age": 35,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some AI machine learning works"
}'

This automatically creates the my_index index, the _doc mapping type, and a mapping for each field.

Viewing an index

curl localhost:9200/my_index?pretty


{
  "my_index" : {
    "aliases" : { },
    "mappings" : {
      "_doc" : {
        "properties" : {
          "address" : {
            "properties" : {
              "city" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "district" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "content" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1628737053188",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "-NHgaqt4R_SQs2KHd0aJwQ",
        "version" : {
          "created" : "6050199"
        },
        "provided_name" : "my_index"
      }
    }
  }
}

This lists the index's settings and mappings.

term query

Requires an exact match: the query value is not analyzed (whereas text data is, by default, indexed as analyzed tokens).

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content":"machine"
     }
   }
}'

This returns a result.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content":"machine learning"
     }
   }
}'

This returns no result.

The query value "machine learning" must match a token exactly, but the data is indexed token by token and there is no token "machine learning".

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content.keyword":"machine"
     }
   }
}'

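This returns no result.

The keyword sub-field holds the original, unanalyzed value instead of the tokens, and the original value is not exactly "machine".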

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content.keyword":"I want to do some AI machine learning works"
     }
   }
}'

This returns a result.

The query value matches the original value exactly.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "address.city":"guangzhou"
     }
   }
}'

This returns a result for a field inside an inner object (address.city).

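Token case

A term query against a text field needs a lowercase value: the default (standard) analyzer lowercases tokens at index time, while a term query does not analyze its input.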
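The behaviour of term on a text field versus its keyword sub-field can be mimicked with a few lines of Python. This is only a sketch: analyze is a rough stand-in for the standard analyzer (split on non-alphanumerics, lowercase), and the helper names are made up for illustration, not ES APIs.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def term_matches_text(field_value, term):
    """term on a text field: the raw term is compared against indexed tokens."""
    return term in analyze(field_value)

def term_matches_keyword(field_value, term):
    """term on a keyword sub-field: the term must equal the whole original value."""
    return field_value == term

content = "I want to do some AI machine learning works"
print(term_matches_text(content, "machine"))           # True
print(term_matches_text(content, "machine learning"))  # False: no such token
print(term_matches_text(content, "AI"))                # False: token is stored as "ai"
print(term_matches_text(content, "ai"))                # True
print(term_matches_keyword(content, "machine"))        # False
print(term_matches_keyword(content, content))          # True
```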

terms query

Requires that at least one of several values match exactly.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "terms":{
        "content": ["machine", "learning"]
     }
   }
}'

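This returns a result; the tokens machine and learning both match.

match query

Analyzed matching: the query text is itself split into tokens, and by default a document matches if any one of those tokens matches.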

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match":{
        "content": "machine learning"
     }
   }
}'

This returns a result; both tokens match.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match":{
        "content": "works machine AI"
     }
   }
}'

This returns a result; all tokens match, regardless of order.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match":{
        "content": "machine factory"
     }
   }
}'

This returns a result; one matching token (machine) is enough.
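The three match examples above can be reproduced with a small sketch. It assumes a simplified analyzer (word split plus lowercase); the match helper is hypothetical, and its operator argument mirrors the operator option of the match query, where "or" is the default.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def match(field_value, query_text, operator="or"):
    """Analyze the query text, then require any ("or") or all ("and") tokens."""
    doc_tokens = set(analyze(field_value))
    hits = [tok in doc_tokens for tok in analyze(query_text)]
    return all(hits) if operator == "and" else any(hits)

content = "I want to do some AI machine learning works"
print(match(content, "machine learning"))        # True
print(match(content, "works machine AI"))        # True, order does not matter
print(match(content, "machine factory"))         # True, "machine" alone suffices
print(match(content, "machine factory", "and"))  # False, "factory" is missing
```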

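match_phrase query

The query text is treated as one phrase: a document matches only if the field contains the phrase's tokens adjacent to each other and in the same order.
(By contrast, a term query on the keyword sub-field matches only when the whole original value equals the query value; match_phrase only requires that the phrase be contained.)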

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": "machine factory"
     }
   }
}'

This returns no result, because the phrase machine factory does not occur.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": "works machine AI"
     }
   }
}'

This returns no result: the original text contains all three tokens, but match_phrase treats works machine AI as one contiguous phrase.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": "AI machine learning"
     }
   }
}'

This returns a result, because AI machine learning occurs as a complete, contiguous phrase.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": {
           "query": "some AI works",
           "slop" : 2
        }
     }
   }
}'

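This returns a result.

Although some AI works does not occur as a contiguous phrase, slop 2 means the phrase still counts as a match if skipping at most two tokens makes it fit; here machine and learning are skipped.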

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": {
           "query": "some AI works",
           "slop" : 1
        }
     }
   }
}'

This returns no result: skipping only one token is still not enough to match.
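A rough way to reason about slop is to compute how far the query terms must shift relative to each other. The sketch below is a simplified model, not Lucene's actual matcher: it assumes slop can be approximated by the spread of (document position - phrase position) across the terms, and required_slop is a hypothetical helper.

```python
import re
from itertools import product

def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def required_slop(field_value, phrase):
    """Smallest slop under which the phrase would match in this simplified
    model, or None when some phrase term is absent from the field."""
    doc_tokens = analyze(field_value)
    positions = []
    for term in analyze(phrase):
        found = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not found:
            return None
        positions.append(found)
    if not positions:
        return None
    best = None
    for combo in product(*positions):
        # An exact phrase has the same offset for every term, so the spread
        # of offsets measures how many positions the terms had to move.
        offsets = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        spread = max(offsets) - min(offsets)
        best = spread if best is None else min(best, spread)
    return best

content = "I want to do some AI machine learning works"
print(required_slop(content, "AI machine learning"))  # 0: exact phrase
print(required_slop(content, "some AI works"))        # 2: matches slop 2, not slop 1
print(required_slop(content, "machine factory"))      # None: "factory" is absent
```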

multi_match

Runs a match query against several fields; the document matches if any one field matches.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "multi_match":{
        "query": "machine learning",
        "fields" : ["title", "content"]
     }
   }
}'

This returns a result; content satisfies the query.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "multi_match":{
        "query": "designer",
        "fields" : ["title", "content"]
     }
   }
}'

This returns a result; title satisfies the query.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "multi_match":{
        "query": "manager",
        "fields" : ["title", "content"]
     }
   }
}'

This returns no result; neither title nor content satisfies the query.

match_all query

Returns all documents.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_all":{}
   }
}'

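No condition is specified.

bool query

A compound query: a document matches only when all the combined clauses are satisfied.

Each clause can be must, filter, should, or must_not:

must: the must clauses have to match, and they contribute to the score
filter: the filter clauses have to match, but they do not contribute to the score
should: at least one or more of the should clauses have to match (the number is set by the minimum_should_match parameter), and they contribute to the score
must_not: the must_not clauses must not match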

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "bool":{
        "must": [
            {
                "term": {"address.city": "guangzhou"}
            },
            {
                "match": {"content": "machine learning"}
            }
        ],
        "must_not": {
            "range": {"age": {"gt": 35}}
        },
        "filter": {
            "match": {"title": "designer"}
        },
        "should": [
            {
                "term": {"name": "Li"}
            },
            {
                "match_phrase": {"content": "AI machine"}
            }
        ],
        "minimum_should_match" : 1
     }
   }
}'

This returns a result, because every clause under bool is satisfied.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "bool":{
        "must": [
            {
                "term": {"address.city": "guangzhou"}
            },
            {
                "match": {"content": "machine learning"}
            }
        ],
        "must_not": {
            "range": {"age": {"from": 35, "to": 40}}
        },
        "filter": {
            "match": {"title": "designer"}
        },
        "should": [
            {
                "term": {"name": "Li"}
            },
            {
                "match_phrase": {"content": "AI machine"}
            }
        ],
        "minimum_should_match" : 1
     }
   }
}'

This returns no result, because the must_not clause is violated (age 35 falls inside the excluded range).

bool queries can be nested.
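How the four clause types combine can be sketched in Python, with each clause modelled as a plain predicate on a document. This only covers matching, not scoring. bool_matches and the lambdas are illustrative names; the default for minimum_should_match is my understanding of the documented behaviour (1 when there are should clauses but no must or filter, otherwise 0).

```python
def bool_matches(doc, must=(), filter=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Return True when doc satisfies the combined bool clauses."""
    if minimum_should_match is None:
        # Assumed default: should is mandatory only when it stands alone.
        minimum_should_match = 1 if should and not (must or filter) else 0
    should_hits = sum(1 for clause in should if clause(doc))
    return (
        all(clause(doc) for clause in must)
        and all(clause(doc) for clause in filter)
        and not any(clause(doc) for clause in must_not)
        and should_hits >= minimum_should_match
    )

doc = {"name": "Wang", "age": 35, "city": "guangzhou", "title": "software designer"}

in_guangzhou = lambda d: d["city"] == "guangzhou"
is_designer = lambda d: "designer" in d["title"].split()
named_li = lambda d: d["name"] == "Li"
older_than_35 = lambda d: d["age"] > 35
aged_35_to_40 = lambda d: 35 <= d["age"] <= 40

# First query above: age 35 is not > 35, so must_not does not exclude the doc.
print(bool_matches(doc, must=[in_guangzhou], filter=[is_designer],
                   should=[named_li, is_designer], must_not=[older_than_35],
                   minimum_should_match=1))  # True
# Second query above: age 35 lies in [35, 40], so must_not excludes the doc.
print(bool_matches(doc, must=[in_guangzhou], filter=[is_designer],
                   should=[named_li, is_designer], must_not=[aged_35_to_40],
                   minimum_should_match=1))  # False
```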

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "bool":{
        "must": [
            {
                "bool": {
                    "should": [
                        {
                            "term": {"name": "Li"}
                        },
                        {
                            "match_phrase": {"content": "AI machine"}
                        }
                    ]
                }
            },
            {
                "bool": {
                    "filter": {
                        "match": {"title": "designer"}
                    }
                }
            }
        ]
     }
   }
}'

Here must contains several bool queries.

Limiting the number of results

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "from":0,
  "size":2,
  "query": {
     "term":{
        "title":"designer"
     }
   }
}'

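Starts at the first hit (offset 0) and returns at most 2 hits.

Selecting returned fields

This is like selecting specific columns with select in SQL.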

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "_source":["name","age"],
  "query": {
     "term":{
        "title":"designer"
     }
   }
}'

Only the name and age fields of the matching documents are returned.

Sorting

Specify the field to sort by.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "_source":["name","age"],
  "query": {
     "term":{
        "title":"designer"
     }
  },
  "sort": [
     {
         "age": {"order": "desc"}
     }
  ]
}'

The results are sorted by age in descending order.

Range query

Supports from, to, gte, lte, gt, lt, and so on. For example:

{
    "query": {
        "range": {
            "date": {
                "gte": "2021-08-01",
                "lte": "2021-08-02",
                "relation": "within",
                "format": "yyyy-MM-dd"
            }
        }
    }
}

relation can also be CONTAINS or INTERSECTS (the default).

relation applies because the date field can itself hold a range (a range field type), for example:

"date": {"gte":"2021-08-01","lte":"2021-08-03"}

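within means the field's range lies inside the query's range
contains means the field's range contains the query's range
intersects means the field's range and the query's range overlap

The date format can be specified with format.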
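The three relations are ordinary interval comparisons. A minimal sketch, assuming inclusive bounds on both sides (as with gte/lte); range_relation_matches is a hypothetical helper name:

```python
from datetime import date

def range_relation_matches(field_range, query_range, relation="intersects"):
    """Compare a stored (low, high) range with a query (low, high) range."""
    f_lo, f_hi = field_range
    q_lo, q_hi = query_range
    if relation == "within":
        return q_lo <= f_lo and f_hi <= q_hi
    if relation == "contains":
        return f_lo <= q_lo and q_hi <= f_hi
    if relation == "intersects":
        return f_lo <= q_hi and q_lo <= f_hi
    raise ValueError(f"unknown relation: {relation}")

stored = (date(2021, 8, 1), date(2021, 8, 3))  # the example field value
query = (date(2021, 8, 1), date(2021, 8, 2))   # the example query range

print(range_relation_matches(stored, query, "within"))      # False
print(range_relation_matches(stored, query, "contains"))    # True
print(range_relation_matches(stored, query, "intersects"))  # True
```

With the example values, the within query shown earlier would therefore not match this document, while contains and intersects would.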

Wildcard query

Supports * and ?:

* stands for zero or more characters
? stands for exactly one character
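The two wildcard characters map directly onto a regular expression. The sketch below translates a pattern and tests it against a whole term, on the assumption that the pattern has to cover the entire indexed term; the helper names are made up for illustration.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * and ? into a compiled regex; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_matches(term, pattern):
    """True when the pattern covers the whole term."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

print(wildcard_matches("designer", "des*"))      # True
print(wildcard_matches("designer", "d?signer"))  # True
print(wildcard_matches("designer", "des?"))      # False: ? is one character only
print(wildcard_matches("designer", "*sign*"))    # True
```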

Fuzzy query

Finds terms that are similar to the query term.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "fuzzy":{
        "title":"desinger"
     }
   }
}'

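Although desinger is misspelled, the document is still found.

Score

Each hit has a _score field that expresses how relevant the document is to the query.

Factors that determine the score include:

Term frequency: the more often the term occurs in the document, the higher the weight
Inverse document frequency: the more documents contain the term, the lower the weight, as with words like and/the
Field length: the longer the field, the lower the weight (a match in a short field counts for more)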
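Similarity here is measured as an edit distance. The sketch below computes a distance in which swapping two adjacent characters counts as one edit, and applies the length-based thresholds that, as far as I know, correspond to the AUTO fuzziness setting (0 edits up to 2 characters, 1 edit up to 5, otherwise 2). The function names are illustrative.

```python
def edit_distance(a, b):
    """Edit distance with insert, delete, substitute and adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete
                             dist[i][j - 1] + 1,         # insert
                             dist[i - 1][j - 1] + cost)  # substitute
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transpose
    return dist[-1][-1]

def auto_fuzziness(term):
    """Assumed AUTO thresholds: allowed edits grow with term length."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2

def fuzzy_matches(query_term, indexed_term):
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)

print(edit_distance("desinger", "designer"))  # 1: "ng" and "gn" are swapped
print(fuzzy_matches("desinger", "designer"))  # True
print(fuzzy_matches("manager", "designer"))   # False
```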
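To see how the three factors interact, here is a sketch of a BM25-style score for a single term. BM25 is, to my knowledge, the default similarity in recent ES versions; the formula is the commonly published one with k1 = 1.2 and b = 0.75, and the function and argument names are my own. Under this formula a longer field lowers the score, all else being equal.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """BM25 score of one term in one document.

    tf: occurrences of the term in the field
    doc_len / avg_doc_len: field length and average field length, in tokens
    doc_count / doc_freq: total documents and documents containing the term
    """
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

base = bm25_term_score(tf=1, doc_len=9, avg_doc_len=9, doc_count=100, doc_freq=5)
more_tf = bm25_term_score(tf=3, doc_len=9, avg_doc_len=9, doc_count=100, doc_freq=5)
common = bm25_term_score(tf=1, doc_len=9, avg_doc_len=9, doc_count=100, doc_freq=90)
longer = bm25_term_score(tf=1, doc_len=90, avg_doc_len=9, doc_count=100, doc_freq=5)

print(more_tf > base)  # True: higher term frequency raises the score
print(common < base)   # True: a term found in most documents weighs less
print(longer < base)   # True: the same hit in a longer field weighs less
```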

aggregation query

Insert more data

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Wang",
  "title": "software designer",
  "age": 35,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some AI machine learning works",
  "kpi": 3.2,
  "date": "2021-01-01T08:00:00Z"
}'

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Li",
  "title": "senior software designer",
  "age": 30,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some K8S works",
  "kpi": 4.0,
  "date": "2021-01-01T10:00:00Z"
}'

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Zhang",
  "title": "Test Engineer",
  "age": 25,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some auto-test works",
  "kpi": 4.5,
  "date": "2021-06-01T09:00:00Z"
}'

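Aggregation queries fall into the following categories:

Bucket aggregations: group documents into buckets and count the documents in each bucket
Metrics aggregations: compute a metric (average, maximum, and so on) over the documents in each group
Pipeline aggregations: run further computations on the results of other aggs

Below are examples of some of the agg operations.

For all agg operations, see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html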

metrics aggregation : avg

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_avg": {
        "avg": { 
           "field": "kpi",
           "missing": 3.5
        } 
     }
  }
}'

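aggs is a keyword; it can also be written as aggregations
kpi_avg is a user-chosen name that appears in the result
avg is a keyword meaning an avg operation: field names the field to average, and missing gives the default value to use when a document lacks the field

If no query is given, the agg runs over all data.

If size is not set to 0, the matching documents are printed in addition to the agg result.
With size set to 0, only the agg result is printed.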

  "aggregations" : {
    "kpi_avg" : {
      "value" : 3.900000015894572
    }
  }

Several aggs can be specified in one request.

metrics aggregation : avg & histogram

If the data is of type histogram (this must be specified when creating the index):

curl -X PUT 'http://localhost:9200/my_index_histogram' -H 'Content-Type: application/json' -d '{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      }
    }
  }
}'

curl -X POST 'http://localhost:9200/my_index_histogram/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Zhao",
  "title": "manager",
  "age": 30,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "my_histogram": {
      "values" : [3.5, 4.0, 4.5], 
      "counts" : [1, 2, 3] 
  }
}'

avg is then computed as (3.5*1 + 4.0*2 + 4.5*3) / (1+2+3)
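This is a count-weighted mean. A minimal check in Python, using the values and counts of the document above (histogram_avg is just an illustrative name):

```python
def histogram_avg(values, counts):
    """Weighted mean: each value contributes as many times as its count."""
    total = sum(v * c for v, c in zip(values, counts))
    return total / sum(counts)

# (3.5*1 + 4.0*2 + 4.5*3) / (1+2+3) = 25 / 6, about 4.1667
print(histogram_avg([3.5, 4.0, 4.5], [1, 2, 3]))
```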

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index_histogram/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "score_avg": {
        "avg": { 
           "field": "my_histogram"
        } 
     }
  }
}'

Result

  "aggregations" : {
    "score_avg" : {
      "value" : 4.166666666666667
    }
  }

Fields of an automatically created index are not of type histogram by default.

metrics aggregation : max/min/sum

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "kpi_max": {
        "max": { 
           "field": "kpi"
        } 
     }
  }
}'

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "kpi_min": {
        "min": { 
           "field": "kpi"
        } 
     }
  }
}'

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "kpi_sum": {
        "sum": { 
           "field": "kpi"
        } 
     }
  }
}'

Computes the max, min, and sum of a field.

metrics aggregation : boxplot

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_avg": {
        "boxplot": { 
           "field": "kpi",
           "missing": 3.5
        }
     }
  }
}'

Result

  "aggregations" : {
    "kpi_avg" : {
      "min" : 3.200000047683716,
      "max" : 4.5,
      "q1" : 3.400000035762787,
      "q2" : 4.0,
      "q3" : 4.375,
      "lower" : 3.200000047683716,
      "upper" : 4.5
    }
  }

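Box plot:

q1 : lower quartile (25%)
q2 : median (50%)
q3 : upper quartile (75%)
lower : the smallest value that is not less than q1 - 1.5 * (q3 - q1)
upper : the largest value that is not greater than q3 + 1.5 * (q3 - q1)
min : minimum value
max : maximum value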
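The lower and upper whiskers follow from q1 and q3 as defined above. The sketch below takes the quartiles from the result shown earlier instead of computing them (ES estimates them approximately, which this sketch does not try to reproduce); whiskers is a hypothetical helper.

```python
def whiskers(values, q1, q3):
    """Return (lower, upper): the extreme values still inside the 1.5*IQR fences."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    lower = min(v for v in values if v >= low_fence)
    upper = max(v for v in values if v <= high_fence)
    return lower, upper

kpis = [3.2, 4.0, 4.5]
# q1 and q3 copied from the boxplot result above
print(whiskers(kpis, q1=3.400000035762787, q3=4.375))  # (3.2, 4.5)
```

No value lies outside the fences here, so lower and upper coincide with min and max, as in the result.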

metrics aggregation : cardinality

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_count": {
        "cardinality": { 
           "field": "kpi"
        }
     }
  }
}'

Result

  "aggregations" : {
    "kpi_count" : {
      "value" : 3
    }
  }

Equivalent to count(distinct field), except that the count is approximate.

metrics aggregation : extended_stats

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_stats": {
        "extended_stats": { 
           "field": "kpi"
        }
     }
  }
}'

Result

  "aggregations" : {
    "kpi_stats" : {
      "count" : 3,
      "min" : 3.200000047683716,
      "max" : 4.5,
      "avg" : 3.900000015894572,
      "sum" : 11.700000047683716,
      "sum_of_squares" : 46.49000030517578,
      "variance" : 0.28666664441426565,
      "variance_population" : 0.28666664441426565,
      "variance_sampling" : 0.4299999666213985,
      "std_deviation" : 0.5354125926930237,
      "std_deviation_population" : 0.5354125926930237,
      "std_deviation_sampling" : 0.6557438269792545,
      "std_deviation_bounds" : {
        "upper" : 4.970825201280619,
        "lower" : 2.8291748305085243,
        "upper_population" : 4.970825201280619,
        "lower_population" : 2.8291748305085243,
        "upper_sampling" : 5.2114876698530805,
        "lower_sampling" : 2.588512361936063
      }
    }
  }

A range of statistics in one result.

metrics aggregation : geo

Create data of a geo type
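The difference between the population and sampling figures, and how the bounds arise, can be checked with the standard library. This sketch assumes the bounds are avg ± 2 standard deviations (sigma = 2, which I believe is the default); small differences from the result above come from the kpi values being stored as 32-bit floats.

```python
import math
import statistics

def extended_stats(values, sigma=2.0):
    """Recompute a few extended_stats figures; needs at least two values."""
    avg = statistics.mean(values)
    variance_population = statistics.pvariance(values)  # divide by n
    variance_sampling = statistics.variance(values)     # divide by n - 1
    std_population = math.sqrt(variance_population)
    return {
        "count": len(values),
        "avg": avg,
        "variance_population": variance_population,
        "variance_sampling": variance_sampling,
        "std_deviation_population": std_population,
        "std_deviation_sampling": math.sqrt(variance_sampling),
        "upper_population": avg + sigma * std_population,
        "lower_population": avg - sigma * std_population,
    }

stats = extended_stats([3.2, 4.0, 4.5])
for name, value in stats.items():
    print(name, round(value, 6))
```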

curl -X PUT 'http://localhost:9200/my_index_geo' -H 'Content-Type: application/json' -d '{
  "mappings" : {
    "properties" : {
      "location" : {
        "type" : "geo_point"
      }
    }
  }
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "52.374081,4.912350"
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "52.369219,4.901618"
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "52.371667,4.914722"
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "51.222900,4.405200"
}'

The centroid can be computed:

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index_geo/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "centroid": {
        "geo_centroid": { 
           "field": "location"
        }
     }
  }
}'

Result

  "aggregations" : {
    "centroid" : {
      "location" : {
        "lat" : 52.08446673466824,
        "lon" : 4.783472470007837
      },
      "count" : 4
    }
  }

The bounds and so on can also be obtained.

metrics aggregation : matrix_stats

Computes the mean, variance, covariance, correlation, and so on.
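For these four points, a plain arithmetic mean of the latitudes and of the longitudes reproduces the centroid shown above to about seven decimal places. This is an observation about this data set, not a description of how ES computes it internally; centroid is an illustrative helper.

```python
def centroid(points):
    """Arithmetic mean of (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)

points = [
    (52.374081, 4.912350),
    (52.369219, 4.901618),
    (52.371667, 4.914722),
    (51.222900, 4.405200),
]
lat, lon = centroid(points)
print(round(lat, 6), round(lon, 6))
```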

bucket aggregation : adjacency_matrix

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "people_group": {
        "adjacency_matrix": { 
           "filters": {
              "grpA" : { "terms" : { "title" : ["designer", "engineer"] }},
              "grpB" : { "terms" : { "title" : ["senior", "software"] }},
              "grpC" : { "terms" : { "content" : ["test", "pv"] }}
           }
        }
     }
  }
}'

Result

  "aggregations" : {
    "people_group" : {
      "buckets" : [
        {
          "key" : "grpA",
          "doc_count" : 3
        },
        {
          "key" : "grpA&grpB",
          "doc_count" : 2
        },
        {
          "key" : "grpA&grpC",
          "doc_count" : 1
        },
        {
          "key" : "grpB",
          "doc_count" : 2
        },
        {
          "key" : "grpC",
          "doc_count" : 1
        }
      ]
    }
  }

Counts the documents in each group defined by the filters, and in each non-empty intersection of two groups.
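The buckets in this result are set intersections. The sketch below hard-codes which of the three documents each filter matches (1 = Wang, 2 = Li, 3 = Zhang, based on the tokens of their title and content) and derives the same buckets; adjacency_matrix here is an illustrative function, not the ES implementation.

```python
from itertools import combinations

def adjacency_matrix(groups):
    """Count each non-empty group and each non-empty pairwise intersection."""
    buckets = {}
    for name, ids in groups.items():
        if ids:
            buckets[name] = len(ids)
    for (a, ids_a), (b, ids_b) in combinations(sorted(groups.items()), 2):
        both = ids_a & ids_b
        if both:
            buckets[f"{a}&{b}"] = len(both)
    return dict(sorted(buckets.items()))

groups = {
    "grpA": {1, 2, 3},  # title contains designer or engineer
    "grpB": {1, 2},     # title contains senior or software
    "grpC": {3},        # content contains test or pv
}
print(adjacency_matrix(groups))
# {'grpA': 3, 'grpA&grpB': 2, 'grpA&grpC': 1, 'grpB': 2, 'grpC': 1}
```

grpB&grpC is absent because no document falls into both groups.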

bucket aggregation : composite

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "composite_group": {
        "composite": { 
           "sources": [
              {"title" : { "terms" : { "field" : "title.keyword"}}},
              {"age" : { "terms" : { "field" : "age"}}}
           ]
        }
     }
  }
}'

The keyword sub-field must be used so that the value is not tokenized; otherwise the aggregation does not work.

Result

  "aggregations" : {
    "composite_group" : {
      "after_key" : {
        "title" : "software designer",
        "age" : 35
      },
      "buckets" : [
        {
          "key" : {
            "title" : "Test Engineer",
            "age" : 25
          },
          "doc_count" : 1
        },
        {
          "key" : {
            "title" : "senior software designer",
            "age" : 30
          },
          "doc_count" : 1
        },
        {
          "key" : {
            "title" : "software designer",
            "age" : 35
          },
          "doc_count" : 1
        }
      ]
    }
  }

As you can see, the result resembles

select
  field_a, field_b, count(*)
from
  table
group by
  field_a, field_b

Besides terms, a source can also be a Histogram, Date histogram, or GeoTile grid.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          {
            "date": {
              "date_histogram": {
                "field": "date",
                "calendar_interval": "1d"
              }
            }
          }
        ]
      }
    }
  }
}'

For example, this Date histogram source truncates the date field to the day and then groups by day.

  "aggregations" : {
    "my_buckets" : {
      "after_key" : {
        "date" : 1622505600000
      },
      "buckets" : [
        {
          "key" : {
            "date" : 1609459200000
          },
          "doc_count" : 2
        },
        {
          "key" : {
            "date" : 1622505600000
          },
          "doc_count" : 1
        }
      ]
    }
  }

The date format can be specified with the format field.

bucket aggregation : date_histogram/auto_date_histogram

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "date_histogram": {
         "field": "date",
         "calendar_interval": "1d"
      }
    }
  }
}'

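Counts per day. Note that this spans from the smallest to the largest value: in this example there is one bucket per day from 2021-01-01 to 2021-06-01, even when its count is 0.

calendar_interval accepts only a single unit such as 1d; to group by several days, fixed_interval must be used.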
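The effect of the daily histogram, including the empty buckets, can be sketched as follows. The function is illustrative only: it floors each timestamp to its day and then emits every day between the first and the last, assuming empty buckets are kept as described above.

```python
from collections import Counter
from datetime import datetime, timedelta

def daily_histogram(timestamps):
    """One (day, count) bucket per calendar day from the first to the last day."""
    counts = Counter(ts.date() for ts in timestamps)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    buckets = []
    while day <= last:
        buckets.append((day.isoformat(), counts.get(day, 0)))
        day += timedelta(days=1)
    return buckets

dates = [
    datetime(2021, 1, 1, 8, 0),
    datetime(2021, 1, 1, 10, 0),
    datetime(2021, 6, 1, 9, 0),
]
buckets = daily_histogram(dates)
print(len(buckets))   # 152 buckets for 3 documents
print(buckets[0])     # ('2021-01-01', 2)
print(buckets[-1])    # ('2021-06-01', 1)
```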

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "date_histogram": {
         "field": "date",
         "fixed_interval": "2d"
      }
    }
  }
}'

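auto_date_histogram is similar to date_histogram, except that you specify a target number of buckets and the system picks the interval automatically to get as close to that target as it can.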

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "auto_date_histogram": {
         "field": "date",
         "buckets": 3
      }
    }
  }
}'

Result

  "aggregations" : {
    "my_buckets" : {
      "buckets" : [
        {
          "key_as_string" : "2021-01-01T00:00:00.000Z",
          "key" : 1609459200000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2021-04-01T00:00:00.000Z",
          "key" : 1617235200000,
          "doc_count" : 1
        }
      ],
      "interval" : "3M"
    }
  }

The system automatically chose 3M as the interval.

bucket aggregation : term/filter/filters

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "terms": {
         "field": "age"
      }
    }
  }
}'

Counts documents per value of a field, equivalent to select age, count(*) from table group by age

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "filter": { "term": {"title": "designer"}}
    }
  }
}'

Counts documents with a given value of a field, equivalent to select count(*) from table where title like '%designer%'

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "filters": {
         "filters": {
            "title": { "match": {"title": "designer"}},
            "age": { "match": {"age": 35}}
        }
      }
    }
  }
}'

Counts the two fields separately, equivalent to running two filter queries.

bucket aggregation : range/date range

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "age_range": {
      "range": {
         "field": "age",
         "ranges": [
           { "to": 35 },
           { "from": 30, "to": 35 },
           { "from": 35 }
         ]
      }
    }
  }
}'

Result

  "aggregations" : {
    "age_range" : {
      "buckets" : [
        {
          "key" : "*-35.0",
          "to" : 35.0,
          "doc_count" : 2
        },
        {
          "key" : "30.0-35.0",
          "from" : 30.0,
          "to" : 35.0,
          "doc_count" : 1
        },
        {
          "key" : "35.0-*",
          "from" : 35.0,
          "doc_count" : 1
        }
      ]
    }
  }

Counts the documents in each age range (from is inclusive, to is exclusive).
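The counts above follow from treating each range as half-open: from is included and to is excluded, so age 35 lands only in the last bucket. A minimal sketch with a made-up helper, using None for an open end:

```python
def range_buckets(values, ranges):
    """Count values per (from, to) range; from is inclusive, to is exclusive."""
    counts = []
    for lo, hi in ranges:
        counts.append(sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        ))
    return counts

ages = [35, 30, 25]
print(range_buckets(ages, [(None, 35), (30, 35), (35, None)]))  # [2, 1, 1]
```

Ranges may overlap, which is why age 30 is counted in both of the first two buckets.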

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_range": {
      "date_range": {
         "field": "date",
         "ranges": [
           { "to": "now-3M/M" },
           { "from": "now-3M/M" }
         ]
      }
    }
  }
}'

Result

  "aggregations" : {
    "my_range" : {
      "buckets" : [
        {
          "key" : "*-2021-05-01T00:00:00.000Z",
          "to" : 1.6198272E12,
          "to_as_string" : "2021-05-01T00:00:00.000Z",
          "doc_count" : 2
        },
        {
          "key" : "2021-05-01T00:00:00.000Z-*",
          "from" : 1.6198272E12,
          "from_as_string" : "2021-05-01T00:00:00.000Z",
          "doc_count" : 1
        }
      ]
    }
  }

Counts the documents in each time period.

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_range": {
      "date_range": {
         "field": "date",
         "ranges": [
           { "key": "Newer", "from":"2021-05-01" },
           { "key": "Older", "to":"2021-05-01" }
         ]
      }
    }
  }
}'

Concrete dates can be given, along with a custom key for each range.

Grouping (count) combined with a metric (e.g. avg)

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "filter": { "term": {"title": "designer"}},
      "aggs": {
         "avg_age": { "avg": { "field": "age" } }
      }
    }
  }
}'

Result

  "aggregations" : {
    "my_buckets" : {
      "doc_count" : 2,
      "avg_age" : {
        "value" : 32.5
      }
    }
  }

As you can see, this both counts the documents in the group and computes an average over the group.

Pipeline aggregations : avg_bucket

POST _search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    },
    "avg_monthly_sales": {
      "avg_bucket": {
        "buckets_path": "sales_per_month>sales",
        "gap_policy": "skip",
        "format": "#,##0.00;(#,##0.00)"
      }
    }
  }
}

Result

  "aggregations": {
    "sales_per_month": {
      "buckets": [
        {
          "key_as_string": "2015/01/01 00:00:00",
          "key": 1420070400000,
          "doc_count": 3,
          "sales": {
            "value": 550.0
          }
        },
        {
          "key_as_string": "2015/02/01 00:00:00",
          "key": 1422748800000,
          "doc_count": 2,
          "sales": {
            "value": 60.0
          }
        },
        {
          "key_as_string": "2015/03/01 00:00:00",
          "key": 1425168000000,
          "doc_count": 2,
          "sales": {
            "value": 375.0
          }
        }
      ]
    },
    "avg_monthly_sales": {
      "value": 328.33333333333333,
      "value_as_string": "328.33"
    }
  }

Computes the sum for each bucket, then the average of those bucket sums.
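The final step is a plain mean over the per-bucket values. The sketch below also models gap_policy under the assumption that "skip" ignores buckets without a value and "insert_zeros" treats them as 0; avg_bucket here is an illustrative function, with None standing for an empty bucket.

```python
def avg_bucket(bucket_values, gap_policy="skip"):
    """Average the per-bucket metric values; None marks a bucket with no value."""
    if gap_policy == "skip":
        present = [v for v in bucket_values if v is not None]
    elif gap_policy == "insert_zeros":
        present = [0.0 if v is None else v for v in bucket_values]
    else:
        raise ValueError(f"unknown gap_policy: {gap_policy}")
    return sum(present) / len(present) if present else None

monthly_sales = [550.0, 60.0, 375.0]  # the three bucket sums from the result
value = avg_bucket(monthly_sales)
print(f"{value:,.2f}")                # 328.33
print(avg_bucket([550.0, None, 375.0]))                  # 462.5
print(avg_bucket([550.0, None, 375.0], "insert_zeros"))  # about 308.33
```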

Pipeline aggregations : cumulative_sum

POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        },
        "cumulative_sales": {
          "cumulative_sum": {
            "buckets_path": "sales" 
          }
        }
      }
    }
  }
}

Result

   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               },
               "cumulative_sales": {
                  "value": 550.0
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               },
               "cumulative_sales": {
                  "value": 610.0
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               },
               "cumulative_sales": {
                  "value": 985.0
               }
            }
         ]
      }
   }

Computes the sum for each bucket, then the running total of those sums up to and including each bucket.
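The running total is what itertools.accumulate produces. A quick check against the bucket sums in the result above:

```python
from itertools import accumulate

monthly_sales = [550.0, 60.0, 375.0]  # per-bucket sums from the result
cumulative_sales = list(accumulate(monthly_sales))
print(cumulative_sales)  # [550.0, 610.0, 985.0]
```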

Pipeline aggregations : max_bucket

POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    },
    "max_monthly_sales": {
      "max_bucket": {
        "buckets_path": "sales_per_month>sales" 
      }
    }
  }
}

Result

   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               }
            }
         ]
      },
      "max_monthly_sales": {
          "keys": ["2015/01/01 00:00:00"], 
          "value": 550.0
      }
   }

Computes the sum for each bucket, then picks the bucket with the largest sum.

Script execution

Script queries are supported; details omitted.
