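Elasticsearch Tutorial: Nested Aggregations, Drill-Down Analysis, and Aggregation Analysis in Kibana
Two core concepts: bucket and metric
city name
Beijing 小李
Beijing 小王
Shanghai 小張
Shanghai 小麗
Shanghai 小陳
Dividing into buckets by city
This yields two buckets: a Beijing bucket and a Shanghai bucket
Beijing bucket: contains 2 people, 小李 and 小王
Shanghai bucket: contains 3 people, 小張, 小麗 and 小陳
When you bucket by a field, all documents that share the same value for that field are placed in the same bucket
If you know some MySQL SQL: the first step of an aggregation is grouping (GROUP BY), and the aggregate analysis then runs over the data inside each group. A group is what Elasticsearch calls a bucket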
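The bucketing step can be pictured with a few lines of plain Python. This is only a sketch of the idea, not how Elasticsearch implements it; the helper name `bucket_by` is an assumption made for illustration, and the rows are the sample table from this section.

```python
from collections import defaultdict

# Rows from the sample table in this section: (city, name).
rows = [
    ("Beijing", "小李"),
    ("Beijing", "小王"),
    ("Shanghai", "小張"),
    ("Shanghai", "小麗"),
    ("Shanghai", "小陳"),
]

def bucket_by(rows, key_index):
    """Group rows into buckets keyed by the value found at key_index."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key_index]].append(row)
    return dict(buckets)

city_buckets = bucket_by(rows, 0)
for city, members in city_buckets.items():
    # Print each bucket's key, its size, and the names it holds.
    print(city, len(members), [name for _, name in members])
```

Every row with the same city value lands in the same list, which is the whole meaning of a bucket.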
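metric: a statistic computed over one group of data
Once we have a set of buckets, we can run aggregation analysis on the data in each bucket, for example counting the documents in a bucket, or computing the average, maximum, or minimum of a field across the bucket
bucket: group by user_id --> documents with the same user_id are placed in the same bucket
metric: an aggregation operation applied to one bucket, such as average, maximum, or minimum
Counting: compute the number of products under each tag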
GET /ecommerce/product/_search
{
"size" : 0,
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}
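size: set to 0 to return only the aggregation results, without the original documents the aggregation ran over
aggs: fixed syntax, indicating that a grouping/aggregation operation is to be run over the data
group_by_tags: every aggregation must be given a name; the name is arbitrary, so pick whatever you like
terms: group by the values of a field
field: the field whose values are used for grouping
Set the fielddata property of the text field to true (fielddata is the forward index used for aggregation queries on a text field; see the article "fielddata原理初探", a first look at how fielddata works, for details)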
PUT /ecommerce/_mapping/product
{
"properties": {
"tags": {
"type": "text",
"fielddata": true
}
}
}
GET /ecommerce/product/_search
{
"size": 0,
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fangzhu",
"doc_count": 2
},
{
"key": "meibai",
"doc_count": 2
},
{
"key": "qingxin",
"doc_count": 1
}
]
}
}
}
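hits.hits: we set size to 0, so hits.hits is empty; otherwise the original documents the aggregation ran over would be returned here
aggregations: the aggregation results
group_by_tags: the name we gave the aggregation
buckets: the buckets produced from the field we specified
key: the field value that the bucket corresponds to
doc_count: the number of documents in this bucket, that is, the number of documents carrying that tag
Default sort order: descending by doc_count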
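As a rough illustration of what the terms aggregation returns, the sketch below counts documents per tag in plain Python and sorts the buckets by doc_count in descending order. The documents are hypothetical, chosen only so that the counts match the response above; `terms_agg` is an illustrative helper, not an Elasticsearch API, and the tie-break by key is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical documents; only the "tags" field matters here.
docs = [
    {"tags": ["meibai", "fangzhu"]},
    {"tags": ["fangzhu"]},
    {"tags": ["qingxin"]},
    {"tags": ["meibai"]},
]

def terms_agg(docs, field):
    """Count documents per distinct field value, highest doc_count first."""
    counts = Counter()
    for doc in docs:
        # A multi-valued field puts the document into one bucket per distinct value.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    # Sort by doc_count descending; equal counts are ordered by key (sketch assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]

buckets = terms_agg(docs, "tags")
for bucket in buckets:
    print(bucket)
```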
Aggregating over search results
For products whose name contains yagao, compute the number of products under each tag
GET /ecommerce/product/_search
{
"size": 0,
"query": {
"match": {
"name": "yagao"
}
},
"aggs": {
"all_tags": {
"terms": {
"field": "tags"
}
}
}
}
{
"took": 35,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"all_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fangzhu",
"doc_count": 2
},
{
"key": "meibai",
"doc_count": 1
},
{
"key": "qingxin",
"doc_count": 1
}
]
}
}
}
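The key point of the request above is ordering: the query narrows the document set first, and the aggregation only sees what the query matched. A minimal Python sketch of that two-step flow follows. The product names are hypothetical, and `match` is a crude whole-word stand-in for the real match query, which analyzes text.

```python
from collections import Counter

# Hypothetical products; names and tags are invented for illustration.
products = [
    {"name": "brand_a yagao", "tags": ["meibai", "fangzhu"]},
    {"name": "brand_b yagao", "tags": ["fangzhu"]},
    {"name": "brand_c yagao", "tags": ["qingxin"]},
    {"name": "brand_d yashua", "tags": ["meibai"]},
]

def match(doc, field, term):
    """Crude stand-in for a match query: whole-word containment."""
    return term in doc[field].split()

def count_by_tag(docs):
    """Count documents per tag, highest count first, ties ordered by tag."""
    counts = Counter(tag for doc in docs for tag in set(doc["tags"]))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Step 1: the query selects the hits. Step 2: the aggregation runs on the hits only.
hits = [p for p in products if match(p, "name", "yagao")]
tag_counts = count_by_tag(hits)
print(len(hits), tag_counts)
```

The fourth product carries the meibai tag but does not match the query, so it does not contribute to the meibai bucket.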
top_hits: fetch the top few documents in each bucket
_source: return only the specified fields
GET /ecommerce/product/_search
{
"size": 0,
"aggs" : {
"group_by_tags" : {
"terms" : { "field" : "tags" },
"aggs" : {
"top_tags": {
"top_hits": {
"_source": {
"include": "name"
},
"size": 1
}
}
}
}
}
}
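Compute the average / minimum / maximum / total price of the products under each tag
count: a bucket aggregation such as terms automatically reports doc_count, which is the equivalent of count
avg: the avg aggregation, computes the average
max: the largest value of the specified field within a bucket
min: the smallest value of the specified field within a bucket
sum: the sum of the specified field's values within a bucket
Group first, then compute the average (and the other metrics) for each group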
GET /ecommerce/product/_search
{
"size": 0,
"aggs" : {
"group_by_tags" : {
"terms" : { "field" : "tags" },
"aggs" : {
"avg_price": { "avg": { "field": "price" } },
"min_price" : { "min": { "field": "price"} },
"max_price" : { "max": { "field": "price"} },
"sum_price" : { "sum": { "field": "price" } }
}
}
}
}
avg_price: the name we chose for the metric aggregation
value: the result of the metric, here the average of the price field across the documents in each bucket
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fangzhu",
"doc_count": 2,
"max_price": {
"value": 30
},
"min_price": {
"value": 25
},
"avg_price": {
"value": 27.5
},
"sum_price": {
"value": 55
}
},
{
"key": "meibai",
"doc_count": 1,
"max_price": {
"value": 30
},
"min_price": {
"value": 30
},
"avg_price": {
"value": 30
},
"sum_price": {
"value": 30
}
},
{
"key": "qingxin",
"doc_count": 1,
"max_price": {
"value": 40
},
"min_price": {
"value": 40
},
"avg_price": {
"value": 40
},
"sum_price": {
"value": 40
}
}
]
}
}
}
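To make the bucket-then-metric flow concrete, here is a small Python sketch that groups by tag and then computes the same four metrics per bucket. The three products are hypothetical, with prices and tags chosen to be consistent with the response above; `price_stats_by_tag` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical products; prices and tags are chosen to agree with the response above.
products = [
    {"price": 30, "tags": ["meibai", "fangzhu"]},
    {"price": 25, "tags": ["fangzhu"]},
    {"price": 40, "tags": ["qingxin"]},
]

def price_stats_by_tag(docs):
    """Bucket documents by tag, then compute count/avg/min/max/sum of price per bucket."""
    prices = defaultdict(list)
    for doc in docs:
        for tag in set(doc["tags"]):
            prices[tag].append(doc["price"])
    return {
        tag: {
            "doc_count": len(values),
            "avg_price": sum(values) / len(values),
            "min_price": min(values),
            "max_price": max(values),
            "sum_price": sum(values),
        }
        for tag, values in prices.items()
    }

stats = price_stats_by_tag(products)
for tag in sorted(stats):
    print(tag, stats[tag])
```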
collect_mode
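There are two ways to compute sub-aggregations:
- depth_first: compute the sub-aggregations directly, for every bucket
- breadth_first: compute the current aggregation's result first, then compute the sub-aggregations only for that result.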
"order": { "avg_price": "desc" }
Compute the average price of the products under each tag, sorted by average price in descending order
GET /ecommerce/product/_search
{
"size": 0,
"aggs" : {
"all_tags" : {
"terms" : { "field" : "tags", "collect_mode" : "breadth_first", "order": { "avg_price": "desc" } },
"aggs" : {
"avg_price" : {
"avg" : { "field" : "price" }
}
}
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"all_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "qingxin",
"doc_count": 1,
"avg_price": {
"value": 40
}
},
{
"key": "meibai",
"doc_count": 1,
"avg_price": {
"value": 30
}
},
{
"key": "fangzhu",
"doc_count": 2,
"avg_price": {
"value": 27.5
}
}
]
}
}
}
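Ordering buckets by a sub-aggregation amounts to sorting on a per-bucket metric instead of on doc_count. The sketch below takes the per-bucket averages from the responses in this section and sorts them; `order_buckets` is an illustrative helper, not part of any Elasticsearch client.

```python
# Per-bucket averages as reported in the responses in this section.
buckets = [
    {"key": "fangzhu", "doc_count": 2, "avg_price": 27.5},
    {"key": "meibai", "doc_count": 1, "avg_price": 30.0},
    {"key": "qingxin", "doc_count": 1, "avg_price": 40.0},
]

def order_buckets(buckets, metric, direction="desc"):
    """Sort buckets by a named metric, mirroring "order": {metric: direction}."""
    return sorted(buckets, key=lambda b: b[metric], reverse=(direction == "desc"))

ordered = order_buckets(buckets, "avg_price", "desc")
keys = [b["key"] for b in ordered]
print(keys)
```

Note that fangzhu has the most documents but sorts last, because the order is now driven by avg_price.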
"ranges": [{},{}]
Group by the specified price ranges, then group by tag within each range, and finally compute the average price of each group
GET /ecommerce/product/_search
{
"size": 0,
"aggs": {
"group_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 20
},
{
"from": 20,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
},
"aggs": {
"average_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
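The drill-down above has three levels: range bucket, tag bucket, then the avg metric. A plain-Python sketch of that nesting follows, using hypothetical products. In the range aggregation each range includes its from value and excludes its to value, which the comparison below mirrors; the "lo-hi" key format is a simplification made for this sketch.

```python
from collections import defaultdict

# Hypothetical products, consistent with the prices used elsewhere in this section.
products = [
    {"price": 30, "tags": ["meibai", "fangzhu"]},
    {"price": 25, "tags": ["fangzhu"]},
    {"price": 40, "tags": ["qingxin"]},
]
ranges = [(0, 20), (20, 40), (40, 50)]

def range_then_tags_avg(docs, ranges):
    """Level 1: price range. Level 2: tag. Metric: average price per tag."""
    result = {}
    for lo, hi in ranges:
        # "from" is inclusive and "to" is exclusive.
        in_range = [d for d in docs if lo <= d["price"] < hi]
        by_tag = defaultdict(list)
        for doc in in_range:
            for tag in set(doc["tags"]):
                by_tag[tag].append(doc["price"])
        result[f"{lo}-{hi}"] = {tag: sum(v) / len(v) for tag, v in by_tag.items()}
    return result

drill = range_then_tags_avg(products, ranges)
for key, tags in drill.items():
    print(key, tags)
```

A price of exactly 40 falls into the 40-50 range, not 20-40, because of the exclusive upper bound.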
histogram
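Like terms, this is a bucketing operation: it takes a field and groups documents into buckets by fixed-width ranges of that field's value
interval: 10 divides the values into the ranges 0~10, 10~20, 20~30, and so on; each bucket's key is the lower bound of its range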
GET /ecommerce/product/_search
{
"size" : 0,
"aggs":{
"price":{
"histogram":{
"field": "price",
"interval": 10
},
"aggs":{
"revenue": {
"sum": {
"field" : "price"
}
}
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"price": {
"buckets": [
{
"key": 20,
"doc_count": 1,
"revenue": {
"value": 25
}
},
{
"key": 30,
"doc_count": 1,
"revenue": {
"value": 30
}
},
{
"key": 40,
"doc_count": 1,
"revenue": {
"value": 40
}
}
]
}
}
}
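The bucket key of a histogram is the value rounded down to a multiple of the interval, that is, floor(value / interval) * interval when no offset is set. The sketch below applies that rule to three hypothetical prices and sums the price per bucket, mirroring the revenue sub-aggregation; `histogram_sum` is an illustrative helper.

```python
import math
from collections import defaultdict

# Hypothetical prices, consistent with the response above.
prices = [30, 25, 40]

def histogram_sum(values, interval):
    """Bucket values by floor(value / interval) * interval and sum each bucket."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value in values:
        key = math.floor(value / interval) * interval
        sums[key] += value
        counts[key] += 1
    # Buckets are returned in ascending key order.
    return [
        {"key": key, "doc_count": counts[key], "revenue": sums[key]}
        for key in sorted(counts)
    ]

histogram = histogram_sum(prices, 10)
for bucket in histogram:
    print(bucket)
```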
date histogram
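Divides buckets at a fixed date interval, given a date-typed field and a date interval
date interval = 1 month (written month in the request below; note that in Elasticsearch interval notation a lowercase 1m means one minute, while one month is 1M)
2017-01-01~2017-01-31 is one bucket
2017-02-01~2017-02-28 is one bucket
Each document's date field is then examined to determine which bucket the date falls into, and the document is placed in that bucket
min_doc_count: with a value of 0, an interval such as 2017-01-01~2017-01-31 is returned even if it contains no documents at all; with a value of 1 or more, empty intervals are left out
extended_bounds, min, max: forces buckets to be generated from this start date through this end date, even where no documents exist; it widens the bucket range but does not filter out documents that fall outside it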
GET /tvs/sales/_search
{
"size" : 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "2016-01-01",
"max" : "2017-12-31"
}
}
}
}
}
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"sales": {
"buckets": [
{
"key_as_string": "2016-01-01",
"key": 1451606400000,
"doc_count": 0
},
{
"key_as_string": "2016-02-01",
"key": 1454284800000,
"doc_count": 0
},
{
"key_as_string": "2016-03-01",
"key": 1456790400000,
"doc_count": 0
},
{
"key_as_string": "2016-04-01",
"key": 1459468800000,
"doc_count": 0
},
{
"key_as_string": "2016-05-01",
"key": 1462060800000,
"doc_count": 1
},
.....
]
}
}
}
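How min_doc_count and extended_bounds interact can be sketched in plain Python: walk month by month across the bounds (widened by the data if needed) and emit a bucket for every month whose count reaches min_doc_count. The sale dates are hypothetical, and `monthly_histogram` is an illustrative helper rather than anything Elasticsearch exposes.

```python
from collections import Counter
from datetime import date

def month_start(d):
    """First day of the month containing d; this is the bucket key."""
    return date(d.year, d.month, 1)

def next_month(d):
    """First day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

def monthly_histogram(dates, bounds_min, bounds_max, min_doc_count=0):
    counts = Counter(month_start(d) for d in dates)
    # The bounds only widen the range; data outside them is still bucketed.
    start = min([month_start(bounds_min)] + list(counts))
    end = max([month_start(bounds_max)] + list(counts))
    buckets = []
    current = start
    while current <= end:
        n = counts.get(current, 0)
        if n >= min_doc_count:
            buckets.append({"key_as_string": current.isoformat(), "doc_count": n})
        current = next_month(current)
    return buckets

# Hypothetical sale dates.
sold_dates = [date(2016, 5, 18), date(2016, 7, 2), date(2017, 2, 12)]
buckets = monthly_histogram(sold_dates, date(2016, 1, 1), date(2017, 12, 31))
non_empty = monthly_histogram(sold_dates, date(2016, 1, 1), date(2017, 12, 31), min_doc_count=1)
print(len(buckets), len(non_empty))
```

With min_doc_count set to 0 every month in the two-year span gets a bucket; raising it to 1 leaves only the months that contain sales.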
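Aggregation scope: by default, an aggregation runs within the scope of the query's search results
The request below produces two results: one aggregated over the query's search results, and one aggregated over all documents
global
The global bucket brings all documents into the aggregation's scope, regardless of the query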
GET /tvs/sales/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "長虹"
}
}
},
"aggs": {
"single_brand_avg_price": {
"avg": {
"field": "price"
}
},
"all": {
"global": {},
"aggs": {
"all_brand_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"all": {
"doc_count": 8,
"all_brand_avg_price": {
"value": 2650
}
},
"single_brand_avg_price": {
"value": 1666.6666666666667
}
}
}
single_brand_avg_price: runs against the query's search results, so it is the average price of the 長虹 (Changhong) brand
all.all_brand_avg_price: the average price across all brands
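The difference in scope can be shown with a short Python sketch: one average is computed over the query hits, the other over every document. The sales records are hypothetical, with prices chosen so that both averages agree with the response above; brand_x and brand_y are placeholder brand names.

```python
# Hypothetical sales, chosen so that both averages agree with the response above.
sales = [
    {"brand": "長虹", "price": 2000},
    {"brand": "長虹", "price": 2000},
    {"brand": "長虹", "price": 1000},
    {"brand": "brand_x", "price": 1200},
    {"brand": "brand_x", "price": 1500},
    {"brand": "brand_y", "price": 3000},
    {"brand": "brand_y", "price": 8000},
    {"brand": "brand_y", "price": 2500},
]

def avg_price(docs):
    return sum(d["price"] for d in docs) / len(docs)

def search(docs, brand):
    """Query-scoped average next to a global average that ignores the query."""
    hits = [d for d in docs if d["brand"] == brand]
    return {
        "single_brand_avg_price": avg_price(hits),  # scope: query hits only
        "all": {
            "doc_count": len(docs),                 # scope: every document
            "all_brand_avg_price": avg_price(docs),
        },
    }

result = search(sales, "長虹")
print(result)
```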
Compute the average price of a brand over the last 30 days
GET /tvs/sales/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "長虹"
}
}
},
"aggs": {
"recent_30d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-30d"
}
}
},
"aggs": {
"recent_30d_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
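aggs.filter applies to the aggregation only
A filter placed inside the query is global: it affects all the data, every aggregation included
But suppose you want several statistics at once for 長虹 (Changhong) TVs: the average over the last 1 month, the last 3 months, and the last 6 months. A single query-level filter cannot express that
bucket filter: apply a different filter to the aggs under each bucket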
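One way to get the 1-month, 3-month, and 6-month averages in a single request is to put one filter aggregation per window side by side, each with its own avg sub-aggregation. The sketch below only builds and prints such a request body as a Python dict; it does not call Elasticsearch. The aggregation names and the helper `recent_avg_price_aggs` are assumptions made for illustration, while the index fields (brand, price, sold_date) come from the requests in this section.

```python
import json

def recent_avg_price_aggs(price_field, date_field, windows):
    """Build one filter aggregation per time window, each with an avg sub-aggregation."""
    aggs = {}
    for label, since in windows.items():
        aggs[f"recent_{label}"] = {
            # The filter narrows the documents for this bucket only.
            "filter": {"range": {date_field: {"gte": since}}},
            "aggs": {f"recent_{label}_avg_price": {"avg": {"field": price_field}}},
        }
    return aggs

body = {
    "size": 0,
    # The query is shared by every bucket: restrict to one brand.
    "query": {"term": {"brand": {"value": "長虹"}}},
    "aggs": recent_avg_price_aggs(
        "price",
        "sold_date",
        # Date math: an uppercase M means months.
        {"1m": "now-1M", "3m": "now-3M", "6m": "now-6M"},
    ),
}

print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON could be sent as the body of GET /tvs/sales/_search; each window then comes back as its own bucket with a doc_count and an average.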