Elasticsearch Super-Powerful Aggregation Queries (Part 1)
Elasticsearch Aggregation Queries, Part 1
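Aggregations versus search
Put simply, a search retrieves specific documents, while an aggregation computes statistics over the documents a search matches. For example
(suppose your Elasticsearch data consists entirely of records about needles):
- What is the average length of the needles?
- Grouped by manufacturer, what is the median needle length?
- How many needles were added to a given region each month?
All of these questions are aggregations over the data. Aggregations can also answer subtler questions:
- Which needle manufacturers are the most popular?
- Are there any anomalous needles in the data?
Aggregations can compute much of the data we need. In a relational database these statistics could take a long time to compute. In Elasticsearch, aggregations are a different feature from search but use the same data structures, so they run quickly, at nearly the speed of a search, and the results are near real-time.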
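High-level concepts
- Buckets: collections of documents that meet a particular criterion.
- Metrics: statistics computed over the documents in a bucket (for example min, sum, max).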
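The two concepts can be illustrated without Elasticsearch at all. The following is a minimal plain-Java sketch, not Elasticsearch code: the Needle class, the sample data, and the method names are all hypothetical and exist only for this illustration. A bucket is modeled as the documents that pass a criterion, and a metric as statistics computed over one bucket.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java illustration of buckets and metrics; nothing here calls Elasticsearch.
class BucketMetricSketch {

    // Hypothetical document: a needle with a manufacturer and a length in millimetres.
    static final class Needle {
        final String maker;
        final int lengthMm;

        Needle(String maker, int lengthMm) {
            this.maker = maker;
            this.lengthMm = lengthMm;
        }
    }

    // Made-up sample data for the illustration.
    static List<Needle> sampleNeedles() {
        return Arrays.asList(
                new Needle("acme", 30),
                new Needle("acme", 50),
                new Needle("sharpco", 40));
    }

    // Bucket: the subset of documents that satisfies a criterion.
    static List<Needle> bucket(List<Needle> docs, Predicate<Needle> criterion) {
        return docs.stream().filter(criterion).collect(Collectors.toList());
    }

    // Metric: statistics (count, min, max, sum, average) computed over one bucket.
    static IntSummaryStatistics lengthStats(List<Needle> bucket) {
        return bucket.stream().mapToInt(n -> n.lengthMm).summaryStatistics();
    }

    public static void main(String[] args) {
        List<Needle> acme = bucket(sampleNeedles(), n -> n.maker.equals("acme"));
        IntSummaryStatistics stats = lengthStats(acme);
        System.out.println("docs in bucket: " + stats.getCount());
        System.out.println("min=" + stats.getMin() + " max=" + stats.getMax()
                + " avg=" + stats.getAverage());
    }
}
```

In Elasticsearch the same split applies: a bucket aggregation decides which documents belong together, and a metric aggregation nested inside it computes a number for each bucket.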
Example: aggregations over car sales data (index = cars; type = transactions)
- Step 1: index the sample data
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
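Note: on Elasticsearch 5.x and later, aggregating on a text field such as color can fail with the error below. The official documentation explains how to enable fielddata.
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
How to set it: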
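A minimal sketch of one way to do this, under two assumptions that the original does not state: the cluster runs Elasticsearch 5.x or 6.x (where mapping types such as transactions still exist), and the color and make fields were dynamically mapped as text. Under those assumptions, a mapping update along these lines turns on fielddata for both fields:

```
PUT /cars/_mapping/transactions
{
  "properties": {
    "color": { "type": "text", "fielddata": true },
    "make":  { "type": "text", "fielddata": true }
  }
}
```

As the error message warns, fielddata can use significant memory. If dynamic mapping also created keyword sub-fields, aggregating on color.keyword and make.keyword is an alternative that avoids loading fielddata.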
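Hands-on: which car color sells best?
Querying over the RESTful HTTP API
GET /cars/transactions/_search
{
  "size" : 0, // no documents are needed in the response, so set this to 0; this can speed up the query
  "aggs" : { // short for "aggregations"; either the short or the full name works
    "popular_colors" : { // a name for this aggregation, like naming a Java method; underscores are suggested to separate words
      "terms" : { // the bucket type is terms
        "field" : "color" // bucket by the color field, similar to GROUP BY color in SQL
      }
    }
  }
}
Querying with the Java API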
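import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;

// transportClient is assumed to be an already-connected TransportClient field of the enclosing class
public void aggsTermsQuery() {
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    // terms bucket named "popular_colors", grouped by the color field
                    AggregationBuilders.terms("popular_colors")
                            .field("color"))
            .setSize(0) // return no hits, only the aggregation
            .get();
    // look the aggregation up by the name given above
    Aggregation popular_colors = response.getAggregations().get("popular_colors");
}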
Response
{
...
  "hits": {
    "hits": [] // size was set to 0, so no documents are returned here
  },
  "aggregations": {
    "popular_colors": {
      "buckets": [
        {
          "key": "red",
          "doc_count": 4 // number of documents in the red bucket
        },
        {
          "key": "blue",
          "doc_count": 2
        },
        {
          "key": "green",
          "doc_count": 2
        }
      ]
    }
  }
}
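To make the doc_count values above easier to verify, here is a minimal plain-Java sketch that reproduces them from the color values of the eight sample documents. It does not call Elasticsearch; the class and method names are hypothetical, and breaking ties by key in ascending order is an assumption chosen to match the bucket order shown in the response.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java stand-in for a terms aggregation on "color"; no Elasticsearch involved.
class ColorTermsSketch {

    // The "color" values of the eight sample documents, in the order they were indexed.
    static List<String> sampleColors() {
        return Arrays.asList("red", "red", "green", "blue", "green", "red", "red", "blue");
    }

    // One bucket per distinct value with its document count, largest bucket first.
    // Ties are broken by key in ascending order (an assumption, see the lead-in).
    static Map<String, Long> docCountByTerm(List<String> values) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Prints each bucket key with its document count.
        docCountByTerm(sampleColors())
                .forEach((key, docCount) -> System.out.println(key + ": " + docCount));
    }
}
```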
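Hands-on: adding a metric to the aggregation above, the average price (avg)
- HTTP request
GET /cars/transactions/_search
{
  "size" : 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      },
      "aggs": { // an aggs level that holds the metrics for each bucket
        "avg_price": { // the metric's name; the response reports the value under this name
          "avg": { // metric type: average
            "field": "price" // average the 'price' field
          }
        }
      }
    }
  }
}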
- Java API query
@Test
public void setMetricsQuery() {
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    AggregationBuilders.terms("colors")
                            .field("color")
                            // add the metric
                            .subAggregation(AggregationBuilders
                                    .avg("avg_price")
                                    .field("price")
                            )
            )
            .setSize(0)
            .get();
    Aggregation colors = response.getAggregations().get("colors");
}
- Response
{
...
  "aggregations": {
    "colors": {
      "buckets": [
        {
          "key": "red",
          "doc_count": 4, // doc_count is always included; it needs no configuration
          "avg_price": { // the metric name we defined in the request
            "value": 32500
          }
        },
        {
          "key": "blue",
          "doc_count": 2,
          "avg_price": {
            "value": 20000
          }
        },
        {
          "key": "green",
          "doc_count": 2,
          "avg_price": {
            "value": 21000
          }
        }
      ]
    }
  }
...
}
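The avg_price values above follow directly from the sample data: red is (10000 + 20000 + 20000 + 80000) / 4 = 32500, blue is (15000 + 25000) / 2 = 20000, and green is (30000 + 12000) / 2 = 21000. As a cross-check, here is a minimal plain-Java sketch of the same computation; it does not call Elasticsearch, and the Sale class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java stand-in for "terms on color, then avg on price"; no Elasticsearch involved.
class AvgPriceByColorSketch {

    // Hypothetical holder for the two fields this example needs.
    static final class Sale {
        final String color;
        final int price;

        Sale(String color, int price) {
            this.color = color;
            this.price = price;
        }
    }

    // Color and price of the eight sample documents from the bulk request.
    static List<Sale> sampleSales() {
        return Arrays.asList(
                new Sale("red", 10000), new Sale("red", 20000),
                new Sale("green", 30000), new Sale("blue", 15000),
                new Sale("green", 12000), new Sale("red", 20000),
                new Sale("red", 80000), new Sale("blue", 25000));
    }

    // Bucket by color, then compute the average price inside each bucket.
    static Map<String, Double> avgPriceByColor(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                (Sale s) -> s.color,
                TreeMap::new,
                Collectors.averagingInt((Sale s) -> s.price)));
    }

    public static void main(String[] args) {
        // Prints each color with the average price of its bucket.
        avgPriceByColor(sampleSales())
                .forEach((color, avg) -> System.out.println(color + ": " + avg));
    }
}
```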
Hands-on: nesting buckets. Building on the above, split by color first, then split the cars of each color by make
- HTTP request
GET /cars/transactions/_search
{
  "size" : 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "make": { // name of the nested bucket aggregation
          "terms": {
            "field": "make" // bucket again, this time by the 'make' field
          }
        }
      }
    }
  }
}
- Java API request
@Test
public void subMetricsQuery() {
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    AggregationBuilders.terms("colors")
                            .field("color")
                            .subAggregation(AggregationBuilders
                                    .avg("avg_price")
                                    .field("price")
                            )
                            .subAggregation(AggregationBuilders
                                    .terms("make") // name of the nested bucket aggregation
                                    .field("make") // field to bucket by
                            )
            )
            .setSize(0)
            .get();
    Aggregation colors = response.getAggregations().get("colors");
}
- Response
{
...
  "aggregations": {
    "colors": {
      "buckets": [
        {
          "key": "red",
          "doc_count": 4,
          "make": { // name of the nested bucket aggregation
            "buckets": [
              {
                "key": "honda",
                "doc_count": 3
              },
              {
                "key": "bmw",
                "doc_count": 1
              }
            ]
          },
          "avg_price": {
            "value": 32500
          }
        },
...
}
Hands-on: extending the result above with the lowest and highest car price for each make
- HTTP request
GET /cars/transactions/_search
{
  "size" : 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "make" : {
          "terms" : {
            "field" : "make"
          },
          "aggs" : {
            "min_price" : { // a name of our choosing
              "min": { // metric type: minimum
                "field": "price"
              }
            },
            "max_price" : {
              "max": { // metric type: maximum
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
- Java API request
@Test
public void minMaxMetricsQuery() {
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    AggregationBuilders.terms("colors")
                            .field("color")
                            .subAggregation(AggregationBuilders
                                    .avg("avg_price")
                                    .field("price")
                            )
                            .subAggregation(AggregationBuilders
                                    .terms("make")
                                    .field("make")
                                    // metrics computed inside each make bucket
                                    .subAggregation(AggregationBuilders
                                            .max("max_price")
                                            .field("price")
                                    )
                                    .subAggregation(AggregationBuilders
                                            .min("min_price")
                                            .field("price")
                                    )
                            )
            )
            .setSize(0)
            .get();
    Aggregation colors = response.getAggregations().get("colors");
}
- Response
{
...
  "aggregations": {
    "colors": {
      "buckets": [
        {
          "key": "red",
          "doc_count": 4,
          "make": {
            "buckets": [
              {
                "key": "honda",
                "doc_count": 3,
                "min_price": {
                  "value": 10000
                },
                "max_price": {
                  "value": 20000
                }
              },
              {
                "key": "bmw",
                "doc_count": 1,
                "min_price": {
                  "value": 80000
                },
                "max_price": {
                  "value": 80000
                }
              }
            ]
          },
          "avg_price": {
            "value": 32500
          }
        },
...
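The nesting in this response (color bucket, then make bucket, then min and max metrics) can be mirrored with nested grouping in plain Java. The sketch below is only an illustration over the sample data and does not call Elasticsearch; the Sale class and the method name are hypothetical.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java stand-in for "terms on color > terms on make > min/max on price".
class NestedBucketsSketch {

    // Hypothetical holder for the three fields this example needs.
    static final class Sale {
        final String color;
        final String make;
        final int price;

        Sale(String color, String make, int price) {
            this.color = color;
            this.make = make;
            this.price = price;
        }
    }

    // Color, make, and price of the eight sample documents from the bulk request.
    static List<Sale> sampleSales() {
        return Arrays.asList(
                new Sale("red", "honda", 10000), new Sale("red", "honda", 20000),
                new Sale("green", "ford", 30000), new Sale("blue", "toyota", 15000),
                new Sale("green", "toyota", 12000), new Sale("red", "honda", 20000),
                new Sale("red", "bmw", 80000), new Sale("blue", "ford", 25000));
    }

    // Outer bucket: color. Inner bucket: make. Metric: price statistics (min, max, count).
    static Map<String, Map<String, IntSummaryStatistics>> priceStatsByColorAndMake(
            List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                (Sale s) -> s.color,
                Collectors.groupingBy(
                        (Sale s) -> s.make,
                        Collectors.summarizingInt((Sale s) -> s.price))));
    }

    public static void main(String[] args) {
        // Prints one line per (color, make) pair with its count, min, and max price.
        priceStatsByColorAndMake(sampleSales()).forEach((color, byMake) ->
                byMake.forEach((make, stats) -> System.out.println(
                        color + "/" + make + ": count=" + stats.getCount()
                                + " min=" + stats.getMin() + " max=" + stats.getMax())));
    }
}
```

For the red bucket this gives honda with three cars priced between 10000 and 20000 and bmw with a single car at 80000, which agrees with the response shown above.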
--------------------- This article comes from the CSDN blog of ydw_武漢. Full text: https://blog.csdn.net/ydwyyy/article/details/79487995?utm_source=copy