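Elasticsearch, Part 6: Aggregation and Statistics Queries

阿新 • Published: 2020-11-06

Earlier posts in this series did not cover Elasticsearch aggregations or other complex queries, so this post takes notes on them. To make testing easy, the data is still the test index generated in Part 5, db_student_test, whose alias is student_test.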

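Part 1: Basic aggregations

1. Maximum (max), minimum (min), average (avg), and sum (sum)

Scenario: find the maximum, minimum, and average scores for Chinese, math, and English.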

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "max_chinese" : { "max" : { "field" : "chinese" } },
        "min_chinese" : { "min" : { "field" : "chinese" } },
        "avg_chinese" : { "avg" : { "field" : "chinese" } },
        "max_math": { "max" : { "field" : "math" } },
        "min_math": { "min" : { "field" : "math" } },
        "avg_math": { "avg" : { "field" : "math" } },
        "max_english": { "max" : { "field" : "english" } },
        "min_english": { "min" : { "field" : "english" } },
        "avg_english": { "avg" : { "field" : "english" } }
    }
}

The result is:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "avg_english": {
            "value": 57.78366490546798
        },
        "max_chinese": {
            "value": 98
        },
        "min_chinese": {
            "value": 25
        },
        "min_math": {
            "value": 15
        },
        "max_english": {
            "value": 98
        },
        "avg_chinese": {
            "value": 59.353859695794505
        },
        "avg_math": {
            "value": 56.92907568735187
        },
        "min_english": {
            "value": 21
        },
        "max_math": {
            "value": 99
        }
    }
}
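
The nine aggregations above differ only in the metric and the field, so the request body can also be generated instead of typed by hand. The following is a minimal sketch in Python; the helper name build_metric_aggs is made up for illustration, and the field names are the ones used in the query above. It only builds and prints the JSON body and does not talk to Elasticsearch.

```python
import json

SUBJECTS = ["chinese", "math", "english"]  # field names taken from the query above
METRICS = ["max", "min", "avg"]            # single-value metric aggregations


def build_metric_aggs(subjects, metrics):
    """Hypothetical helper: one named aggregation per (metric, field) pair."""
    aggs = {}
    for field in subjects:
        for metric in metrics:
            # e.g. "max_chinese": {"max": {"field": "chinese"}}
            aggs[f"{metric}_{field}"] = {metric: {"field": field}}
    return {"aggs": aggs}


body = build_metric_aggs(SUBJECTS, METRICS)
print(json.dumps(body, indent=4))
```

The printed body can be sent with the same POST request as above. Adding "sum" to METRICS would produce the sum aggregations as well.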

You can also query the total of all Chinese scores, which is equivalent to SQL's sum, although the number is not meaningful here:

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "sum_chinese" : { "sum" : { "field" : "chinese" } }
    }
}

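2. Counting values, equivalent to SQL's count

Scenario: count all students. value_count counts the values of a field, so as long as every document has the field, any field will do, for example math.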

POST  http://localhost:9200/student_test1/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "math"
      }
    }
  }
}

The result is:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_count": {
            "value": 50084828
        }
    }
}

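The total is 50084828, which matches the amount of data generated in Part 5.

3. Distinct count, equivalent to SQL's count(distinct)

Scenario: count how many distinct values the Chinese score takes.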

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "type_count" : {
            "cardinality" : {
                "field" : "chinese"
            }
        }
    }
}

The result is:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "type_count": {
            "value": 74
        }
    }
}

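The result shows only 74 distinct scores. This matches the rules used to generate the random data in Part 5, and it agrees with the minimum of 25 and maximum of 98 seen earlier (25 through 98 inclusive is 74 values).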
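
As a rough illustration of what the cardinality aggregation estimates, here is a small Python sketch that counts distinct values exactly. Elasticsearch itself uses an approximate algorithm for cardinality, so this is a different technique that only agrees with it when the number of distinct values is small. The function name distinct_count and the assumption that scores are integers from 25 to 98 are mine.

```python
def distinct_count(values):
    """Exact number of distinct values, the quantity that cardinality estimates."""
    return len(set(values))


# Assumption: scores are integers between the reported min (25) and max (98).
possible_scores = range(25, 99)
print(distinct_count(possible_scores))
```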

4. Stats aggregation

Scenario: get the count, maximum, minimum, average, and sum of the Chinese scores in a single request.

POST  http://localhost:9200/student_test1/_search?size=0
{
  "aggs": {
    "chinese_stats": {
      "stats": {
        "field": "chinese"
      }
    }
  }
}

The result is:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "chinese_stats": {
            "count": 50084828,
            "min": 25,
            "max": 98,
            "avg": 59.353859695794505,
            "sum": 2972727854
        }
    }
}
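
The five numbers returned by stats are related: avg is simply sum divided by count. The sketch below, with a made-up helper named stats, computes the same five values for a small list in plain Python and then checks that relationship against the figures reported above.

```python
def stats(values):
    """Plain-Python counterpart of the five values returned by the stats aggregation."""
    values = list(values)
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }


# Figures copied from the response above.
reported = {"count": 50084828, "sum": 2972727854, "avg": 59.353859695794505}

# Difference between sum / count and the reported avg; it should be negligible.
print(abs(reported["sum"] / reported["count"] - reported["avg"]))
print(stats([25, 60, 98]))
```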

5. Extended stats aggregation: in addition to the values above, the result includes statistical measures such as variance.

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "chinese_stats": {
      "extended_stats": {
        "field": "chinese"
      }
    }
  }
}
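
No response is shown for this query, so here is a hedged Python sketch of the extra values it adds. It assumes the population variance (dividing by n) and the default bounds of two standard deviations around the mean; the function name extended_stats_sketch is hypothetical, and the sample data is invented.

```python
import math


def extended_stats_sketch(values):
    """Extra measures in the spirit of extended_stats (population variance assumed)."""
    values = list(values)
    n = len(values)
    mean = sum(values) / n
    # Population variance: average squared deviation from the mean.
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    return {
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        # Assumed default: bounds at mean +/- 2 standard deviations.
        "std_deviation_bounds": {"upper": mean + 2 * std, "lower": mean - 2 * std},
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # invented data, mean 5
print(extended_stats_sketch(sample))
```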

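6. Percentiles aggregation

The default percentiles are 1%, 5%, 25%, 50%, 75%, 95%, and 99%. Each one is a "less than or equal to" (<=) threshold.

What a percentile means: if the 25th percentile is 54, then the samples less than or equal to 54 make up 25% of all samples. In other words, 54 is the value that cuts off the lowest quarter of the data.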

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "chinese_percents": {
      "percentiles": {
        "field": "chinese"
      }
    }
  }
}

You can also specify your own percentiles:

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "chinese_percents": {
      "percentiles": {
        "field": "chinese",
        "percents" : [10,20,30,40,50,60,70,80,90] 
      }
    }
  }
}
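
To make the definition of a percentile concrete, the sketch below computes percentiles exactly with the nearest-rank method. This is not how Elasticsearch does it (the percentiles aggregation returns approximations), so treat it only as an illustration of the "less than or equal to" idea; the function name and the sample data are invented.

```python
import math


def percentile_nearest_rank(values, percent):
    """Smallest value such that at least `percent` % of the samples are <= it."""
    ordered = sorted(values)
    # Nearest-rank: 1-based rank = ceil(percent / 100 * n), at least 1.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


scores = list(range(1, 101))  # invented data: the integers 1..100
for p in [10, 20, 30, 40, 50, 60, 70, 80, 90]:
    print(p, percentile_nearest_rank(scores, p))
```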

7. Percentile ranks aggregation

Scenario: find the proportion of Chinese scores below 40, below 50, and below 60.

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "gge_perc_rank": {
      "percentile_ranks": {
        "field": "chinese",
        "values": [40,50,60]
      }
    }
  }
}

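The query above returns the share of scores below 40, below 50, and below 60. The values are 21.29%, 36.09%, and 51.12%, which is close to an arithmetic sequence (steps of about 15 percentage points), so the test data appears to be spread quite evenly.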
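
A percentile rank is the inverse of a percentile: given a value, it returns the share of samples at or below it. The sketch below shows an exact version in Python and checks the step sizes between the three reported figures. Treating the boundary as "at or below" is an assumption that follows the <= description in the previous section; Elasticsearch's own result is approximate. The function name and the sample data are invented.

```python
def percentile_rank(values, threshold):
    """Percentage of samples that are less than or equal to `threshold`."""
    values = list(values)
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


# Figures reported above for the thresholds 40, 50, and 60.
reported = [21.29, 36.09, 51.12]
steps = [round(b - a, 2) for a, b in zip(reported, reported[1:])]
print(steps)  # differences between neighbouring ranks

print(percentile_rank(range(1, 101), 40))  # invented data: the integers 1..100
```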

Part 2: Other aggregations

1. Terms aggregation

Scenario: find out how many students got each Chinese score.

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "genres" : {
            "terms" : { 
                "field" : "chinese"
            }
        }
    }
}

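This query groups documents by the chinese field: all documents with a score of 87 form one bucket, all with 88 form another, and so on.

By default, however, the buckets are sorted by document count in descending order and only the top 10 are returned, so the smaller buckets are left out; their documents are summed up in sum_other_doc_count. The result is: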

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "genres": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 42560269,
            "buckets": [
                {
                    "key": 61,
                    "doc_count": 752863
                },
                {
                    "key": 68,
                    "doc_count": 752835
                },
                {
                    "key": 55,
                    "doc_count": 752749
                },
                {
                    "key": 59,
                    "doc_count": 752444
                },
                {
                    "key": 76,
                    "doc_count": 752405
                },
                {
                    "key": 74,
                    "doc_count": 752309
                },
                {
                    "key": 56,
                    "doc_count": 752283
                },
                {
                    "key": 49,
                    "doc_count": 752273
                },
                {
                    "key": 52,
                    "doc_count": 752201
                },
                {
                    "key": 50,
                    "doc_count": 752197
                }
            ]
        }
    }
}
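
The numbers in this response can be cross-checked: the ten returned buckets plus sum_other_doc_count should add up to the total number of documents. The short Python check below assumes that every document has exactly one chinese value, so that the total equals the 50084828 reported by value_count earlier.

```python
# doc_count of the ten buckets in the response above
top_buckets = [752863, 752835, 752749, 752444, 752405,
               752309, 752283, 752273, 752201, 752197]

total_docs = 50084828  # total reported by value_count earlier

# Documents that fall outside the returned buckets.
sum_other = total_docs - sum(top_buckets)
print(sum_other)
```

The difference equals the sum_other_doc_count of 42560269 in the response.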

To control which buckets are returned and in what order, the terms aggregation also accepts the following settings:

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "genres" : {
            "terms" : {
                "field" : "chinese",
                "size" : 100,                      // there may be up to 100 distinct scores; return all of them
                "order" : { "_count" : "asc" },    // sort buckets by document count, ascending
                "min_doc_count": 752200            // filter: keep only buckets with at least 752200 documents
            }
        }
    }
}

The result is:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "genres": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 52,
                    "doc_count": 752201
                },
                {
                    "key": 49,
                    "doc_count": 752273
                },
                {
                    "key": 56,
                    "doc_count": 752283
                },
                {
                    "key": 74,
                    "doc_count": 752309
                },
                {
                    "key": 76,
                    "doc_count": 752405
                },
                {
                    "key": 59,
                    "doc_count": 752444
                },
                {
                    "key": 55,
                    "doc_count": 752749
                },
                {
                    "key": 68,
                    "doc_count": 752835
                },
                {
                    "key": 61,
                    "doc_count": 752863
                }
            ]
        }
    }
}
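
The effect of size, order, and min_doc_count can be mimicked on a small list with collections.Counter. This is only a sketch of the bucket logic, not of how Elasticsearch executes the aggregation across shards. The function name terms_agg is hypothetical, and breaking ties by ascending key is an assumption.

```python
from collections import Counter


def terms_agg(values, size=10, ascending=False, min_doc_count=1):
    """Group equal values into buckets, then filter, sort, and truncate them."""
    counts = Counter(values)
    buckets = [
        {"key": key, "doc_count": count}
        for key, count in counts.items()
        if count >= min_doc_count          # drop buckets that are too small
    ]
    if ascending:
        buckets.sort(key=lambda b: (b["doc_count"], b["key"]))
    else:
        buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]                  # keep at most `size` buckets


scores = [61, 61, 61, 68, 68, 55]          # invented data
print(terms_agg(scores))
print(terms_agg(scores, size=100, ascending=True, min_doc_count=2))
```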

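2. Filter aggregation

The filter aggregation first narrows the documents with a filter and then aggregates what is left.

Scenario: get the average math score of the students at 華南理工大學 (South China University of Technology): filter by school first, then aggregate the scores.

POST  http://localhost:9200/student_test1/_search?size=0
{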
    "aggs" : {
        "scut_math_avg" : {
            "filter" : { "term": { "school": "華南理工大學" } },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "math" } }
            }
        }
    }
}

The result is:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "scut_math_avg": {
            "doc_count": 1854993,
            "avg_price": {
                "value": 56.93080027795253
            }
        }
    }
}
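
The two steps, filter first and aggregate second, look like this on a handful of in-memory records. The records and the function name filter_avg are invented for illustration; the shape of the return value loosely mirrors the doc_count and average in the response above.

```python
students = [  # invented sample records
    {"school": "華南理工大學", "math": 60},
    {"school": "華南理工大學", "math": 50},
    {"school": "中山大學", "math": 90},
]


def filter_avg(docs, school, field):
    """Keep only the documents of one school, then average one field."""
    matched = [d for d in docs if d["school"] == school]   # step 1: filter
    if matched:
        avg = sum(d[field] for d in matched) / len(matched)  # step 2: aggregate
    else:
        avg = None  # nothing matched, so there is no average
    return {"doc_count": len(matched), "avg": avg}


print(filter_avg(students, "華南理工大學", "math"))
```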

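3. Filters aggregation (multiple named filters)

Scenario: get the average scores for each school. The query below uses a filters aggregation with one named filter per school and computes the Chinese and math averages (English can be added in the same way). It may be a little slow: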

POST  http://localhost:9200/student_test1/_search?size=0
{
  "aggs" : {
    "messages" : {
      "filters" : {
        "filters" : {
          "school_1" :   { "term" : { "school" : "華南理工大學" }},
          "school_2" : { "term" : { "school" : "中山大學" }},
          "school_3" : { "match" : { "school" : "暨南大學" }}
        }
      },
      "aggs" : {
           "avg_chinese" : { "avg" : { "field" : "chinese" } },
           "avg_math" : { "avg" : { "field" : "math" } }
      }
    }
  }
}

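This gives the result below. Note that school_3, which uses match instead of term, reports a doc_count of 44519876, far more than the other two schools, so it probably matched documents from other schools as well; use term when an exact match is intended.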

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "messages": {
            "buckets": {
                "school_1": {
                    "doc_count": 1854993,
                    "avg_chinese": {
                        "value": 59.353236912484306
                    },
                    "avg_math": {
                        "value": 56.93080027795253
                    }
                },
                "school_2": {
                    "doc_count": 1855016,
                    "avg_chinese": {
                        "value": 59.349129064115886
                    },
                    "avg_math": {
                        "value": 56.93540918245449
                    }
                },
                "school_3": {
                    "doc_count": 44519876,
                    "avg_chinese": {
                        "value": 59.35397212247402
                    },
                    "avg_math": {
                        "value": 56.92948502372289
                    }
                }
            }
        }
    }
}
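
Each named filter produces its own bucket, and the buckets are independent, so a document can be counted in more than one of them. The Python sketch below imitates this with named predicates; one of them is deliberately loose to show how a loose condition inflates a bucket. It is an illustration only and makes no claim about how the match query above was analyzed. All names and records are invented.

```python
def filters_agg(docs, named_filters, fields):
    """One bucket per named predicate, each with a doc_count and per-field averages."""
    result = {}
    for name, predicate in named_filters.items():
        matched = [d for d in docs if predicate(d)]
        bucket = {"doc_count": len(matched)}
        for field in fields:
            bucket[f"avg_{field}"] = (
                sum(d[field] for d in matched) / len(matched) if matched else None
            )
        result[name] = bucket
    return result


docs = [  # invented sample records
    {"school": "華南理工大學", "chinese": 60, "math": 50},
    {"school": "中山大學", "chinese": 70, "math": 80},
    {"school": "暨南大學", "chinese": 80, "math": 90},
]

named_filters = {
    "exact": lambda d: d["school"] == "暨南大學",   # exact comparison
    "loose": lambda d: "大學" in d["school"],        # loose comparison, matches every record
}

print(filters_agg(docs, named_filters, ["chinese", "math"]))
```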

4. Range aggregation

Scenario: count the students in each Chinese score band:

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "chinese_ranges" : {
            "range" : {
                "field" : "chinese",
                "ranges" : [
                    { "to" : 60 },
                    { "from" : 60, "to" : 75 },
                    { "from" : 75, "to" : 85 },
                    { "from" : 85 }
                ]
            }
        }
    }
}

The result is:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "chinese_ranges": {
            "buckets": [
                {
                    "key": "*-60.0",
                    "to": 60,
                    "doc_count": 25096839
                },
                {
                    "key": "60.0-75.0",
                    "from": 60,
                    "to": 75,
                    "doc_count": 11278543
                },
                {
                    "key": "75.0-85.0",
                    "from": 75,
                    "to": 85,
                    "doc_count": 7424634
                },
                {
                    "key": "85.0-*",
                    "from": 85,
                    "doc_count": 6284812
                }
            ]
        }
    }
}
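
In a range aggregation each range includes its from value and excludes its to value, so a score of exactly 60 lands in the 60-75 bucket and not in the one below it. The Python sketch below reproduces that bucketing on a few invented scores and also checks that the four counts in the response add up to the total number of documents. The function name range_agg is hypothetical.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    values = list(values)
    buckets = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({**r, "doc_count": count})
    return buckets


ranges = [{"to": 60}, {"from": 60, "to": 75}, {"from": 75, "to": 85}, {"from": 85}]
print(range_agg([59, 60, 74, 75, 84, 85, 98], ranges))  # invented scores

# doc_count values copied from the response above
reported_counts = [25096839, 11278543, 7424634, 6284812]
print(sum(reported_counts))
```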

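The bucket keys in this result are *-60.0, 60.0-75.0, 75.0-85.0, and 85.0-*. Each range includes its from value and excludes its to value.
If you do not want these keys, you can name the buckets yourself, for example: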

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "chinese_ranges" : {
            "range" : {
                "field" : "chinese",
                "keyed" : true,
                "ranges" : [
                    { "key" : "不及格", "to" : 60 },
                    { "key" : "及格", "from" : 60, "to" : 75 },
                    { "key" : "良好", "from" : 75, "to" : 85 },
                    { "key" : "優秀", "from" : 85 }
                ]
            }
        }
    }
}

The result will be as follows (the keys mean fail, pass, good, and excellent):

{
    "took": 1675,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "chinese_ranges": {
            "buckets": {
                "不及格": {
                    "to": 60,
                    "doc_count": 25096839
                },
                "及格": {
                    "from": 60,
                    "to": 75,
                    "doc_count": 11278543
                },
                "良好": {
                    "from": 75,
                    "to": 85,
                    "doc_count": 7424634
                },
                "優秀": {
                    "from": 85,
                    "doc_count": 6284812
                }
            }
        }
    }
}
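
With keyed buckets the response is a dictionary, which makes follow-up calculations straightforward. As a small example, the Python sketch below takes the doc_count values from the response above and turns them into percentage shares per band. The variable names are my own.

```python
# Buckets copied from the keyed response above (only the fields needed here).
buckets = {
    "不及格": {"doc_count": 25096839},
    "及格": {"doc_count": 11278543},
    "良好": {"doc_count": 7424634},
    "優秀": {"doc_count": 6284812},
}

total = sum(b["doc_count"] for b in buckets.values())

# Share of each band as a percentage of all documents.
shares = {key: 100 * b["doc_count"] / total for key, b in buckets.items()}

for key, share in shares.items():
    print(f"{key}: {share:.2f}%")
```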

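There are many other, more complex aggregations, all of which are easy to look up online. Elasticsearch even supports some of the calculations used in recommender systems, such as matrix-based aggregations.

See also https://blog.csdn.net/alex_xfboy/article/details/8610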
