Elasticsearch Study Notes: Aggregations

阿新 • Published: 2018-12-31

Elasticsearch has a feature called aggregations, which lets us generate fine-grained analytics over our data. Aggregations are similar to GROUP BY in SQL, but more powerful.

First, let's look at the data currently stored in the employee type of my megacorp index by running the following request:
Statement 1:

GET /megacorp/employee/_search
{
  "query": {
    "match_all": {}
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "3",
        "_score": 1,
        "_source": {
          "first_name": "Douglas",
          "last_name": "Fir",
          "age": 35,
          "about": "I like to build cabinets",
          "interests": [
            "forestry"
          ]
        }
      }
    ]
  }
}

Main text:
As an example, let's find the most popular interests among the employees in the data above:
Statement 2:

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

The query returns the following:

{
   ...
   "hits": { ... },
   "aggregations": {
      "all_interests": {
         "buckets": [
            {
               "key":       "music",
               "doc_count": 2
            },
            {
               "key":       "forestry",
               "doc_count": 1
            },
            {
               "key":       "sports",
               "doc_count": 1
            }
         ]
      }
   }
}

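Conclusion: the request lists every distinct value of interests across all documents, together with the number of documents that contain each value.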
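
To make the bucket counts concrete, here is a minimal Python sketch that reproduces them locally. It is not how Elasticsearch computes the result internally; the helper name terms_buckets and the trimmed-down employees list are my own assumptions, with the data copied from the result of statement 1.

```python
from collections import Counter

# Sample documents copied from the result of statement 1 (only the fields used here).
employees = [
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Hypothetical helper: count how many documents contain each value of `field`."""
    counts = Counter()
    for doc in docs:
        # set() so a value repeated inside one document is counted once,
        # because doc_count counts documents, not occurrences.
        counts.update(set(doc[field]))
    # Largest buckets first; ties broken by key, matching the order in the response above.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


buckets = terms_buckets(employees, "interests")
print(buckets)
```

Each dictionary in buckets corresponds to one entry of the "buckets" array in the response.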
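Note that before running statement 2 you first need to run the request below. As for why, see my other blog post; in short, fielddata is disabled on text fields by default, so a terms aggregation on such a field is rejected until it is enabled: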

PUT megacorp/_mapping/employee/
{
  "properties": {
    "interests": { 
      "type":     "text",
      "fielddata": true
    }
  }
}

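The purpose of this request is to make the interests field of the employee type in the megacorp index usable in aggregations (such as all_interests). Likewise, any other text field needs the same setting before it can be aggregated on. For example, to aggregate on last_name you must first run: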

PUT megacorp/_mapping/employee/
{
  "properties": {
    "last_name": { 
      "type":     "text",
      "fielddata": true
    }
  }
}
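
One consequence worth keeping in mind: once fielddata is enabled on a text field, the aggregation works on the analyzed terms, not on the original strings. The sketch below is only a rough Python stand-in for that behavior, assuming the field uses the default analyzer (which splits on word boundaries and lowercases). The function rough_analyze is hypothetical and is not an Elasticsearch API.

```python
import re
from collections import Counter

# last_name values copied from the three sample documents.
last_names = ["Smith", "Smith", "Fir"]


def rough_analyze(text):
    """Very rough approximation of the default analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


counts = Counter()
for name in last_names:
    # Count each analyzed term once per document.
    counts.update(set(rough_analyze(name)))

# Largest buckets first, ties broken by key.
name_buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(name_buckets)
```

Under that assumption the bucket keys come out lowercased ("smith", "fir"), and a multi-word value would be split into several buckets.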

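There are many kinds of aggregations; besides terms there is also avg, for example. Names such as all_interests are just labels chosen by the user.
In addition, to find the most popular interests among employees whose last name is Smith, simply add the appropriate query to the same request: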

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "music",
          "doc_count": 2
        },
        {
          "key": "sports",
          "doc_count": 1
        }
      ]
    }
  }
}
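
The combination of query and aggregation can be pictured as "filter first, then bucket". The following Python sketch mimics that on the sample data; the case-insensitive comparison is my simplified stand-in for the match query, and the variable names are assumptions.

```python
from collections import Counter

# Sample documents copied from the result of statement 1 (only the fields used here).
employees = [
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]

# Step 1: stand-in for the match query on last_name (compared case-insensitively).
smiths = [doc for doc in employees if doc["last_name"].lower() == "smith"]

# Step 2: bucket the interests of the matching documents only.
counts = Counter()
for doc in smiths:
    counts.update(set(doc["interests"]))

# Largest buckets first, ties broken by key.
smith_buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(smith_buckets)
```

The aggregation only sees the two Smith documents, which is why forestry no longer appears in the buckets.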

Aggregations also support hierarchical rollups. For example, to find the average age of the employees who share each interest:

GET /megacorp/employee/_search
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "3",
        "_score": 1,
        "_source": {
          "first_name": "Douglas",
          "last_name": "Fir",
          "age": 35,
          "about": "I like to build cabinets",
          "interests": [
            "forestry"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "music",
          "doc_count": 2,
          "avg_age": {
            "value": 28.5
          }
        },
        {
          "key": "forestry",
          "doc_count": 1,
          "avg_age": {
            "value": 35
          }
        },
        {
          "key": "sports",
          "doc_count": 1,
          "avg_age": {
            "value": 25
          }
        }
      ]
    }
  }
}

The request above counts, for each interest, how many employees have it, and computes the average age of those employees.
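
As a sanity check on the numbers, the nested aggregation can be sketched in Python as "group by interest, then average the ages in each group". This is only an illustration on the sample data, with variable names of my own choosing, not the way Elasticsearch executes it.

```python
from collections import defaultdict

# Sample documents copied from the result of statement 1 (only the fields used here).
employees = [
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]

# Outer level: one bucket per interest, collecting the ages of its members.
ages_by_interest = defaultdict(list)
for doc in employees:
    for interest in set(doc["interests"]):
        ages_by_interest[interest].append(doc["age"])

# Inner level: average age per bucket; largest buckets first, ties broken by key.
nested = [
    {"key": key, "doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
    for key, ages in sorted(
        ages_by_interest.items(), key=lambda kv: (-len(kv[1]), kv[0])
    )
]
for bucket in nested:
    print(bucket)
```

For music this gives (32 + 25) / 2 = 28.5, which agrees with the avg_age value in the response.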
