
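elasticsearch-dsl Aggregations, Part 2

Continuing from the previous post, this article covers Elasticsearch aggregation queries and shows how to run them with the Python library elasticsearch-dsl.

Bar Charts

One exciting feature of aggregations is how easily they turn data into charts and graphs.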

    • A histogram needs an interval. To build a histogram of sale prices, we can set the interval to 20,000. This creates a new bucket for every $20,000 band, and each document is placed in the bucket it falls into.
      GET cars/transactions/_search
      {
        "size": 0,
        "aggs": {
          "price": {
            "histogram": {
              "field": "price",
              "interval": 20000
            },
            "aggs": {
              "revenue": {
                "sum": {
                  "field": "price"
                }
              }
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # Bucket by price in steps of 20,000 and sum the price inside each bucket.
      s = Search(index='cars')
      s.aggs.bucket("price", "histogram", field="price", interval=20000).metric("revenue", "sum", field="price")
      response = s.execute()

      [Figure: graphical representation of the histogram]
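To make the bucketing rule concrete, here is a minimal pure-Python sketch of what a histogram aggregation with a nested sum computes. It assumes each bucket key is the value rounded down to a multiple of the interval. The prices are invented for illustration, and `histogram_with_sum` is a hypothetical helper, not part of elasticsearch-dsl.

```python
from collections import defaultdict


def histogram_with_sum(prices, interval):
    """Group prices into fixed-width buckets and sum each bucket.

    Assumption: a value lands in the bucket whose key is the value
    rounded down to the nearest multiple of `interval`.
    """
    buckets = defaultdict(lambda: {"doc_count": 0, "revenue": 0})
    for price in prices:
        key = (price // interval) * interval
        buckets[key]["doc_count"] += 1
        buckets[key]["revenue"] += price
    # Order buckets by key; empty buckets are simply absent in this sketch.
    return dict(sorted(buckets.items()))


# Invented sample prices, for illustration only.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]
result = histogram_with_sum(prices, 20000)
for key, bucket in result.items():
    print(key, bucket["doc_count"], bucket["revenue"])
```

Each printed row corresponds to one bar of the chart: the key is the lower edge of the price band, and `revenue` plays the role of the nested sum metric.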

    • More powerful statistics (extended_stats on price for each make)
      GET /cars/transactions/_search
      {
        "size": 0,
        "aggs": {
          "makes": {
            "terms": {
              "field": "make",
              "size": 10
            },
            "aggs": {
              "stats": {
                "extended_stats": {
                  "field": "price"
                }
              }
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # Top 10 makes, with extended statistics on price for each make.
      s = Search(index='cars')
      s.aggs.bucket("makes", "terms", field="make", size=10).metric("stats", "extended_stats", field="price")
      response = s.execute()
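As a rough picture of what extended_stats reports for one make bucket, the following pure-Python sketch computes a subset of its fields with the standard library. It assumes `variance` and `std_deviation` are the population versions (dividing by n). The prices are invented and `extended_stats` here is a hypothetical local function, not a library call.

```python
import statistics


def extended_stats(values):
    """Compute a subset of the fields an extended_stats metric returns.

    Assumption: variance and std_deviation are population statistics
    (divide by n, not n - 1).
    """
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
        "variance": statistics.pvariance(values),
        "std_deviation": statistics.pstdev(values),
    }


# Invented prices for a single make bucket.
honda_prices = [10000, 20000, 20000]
stats = extended_stats(honda_prices)
print(stats)
```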
    • Statistics over time (date_histogram): how many cars were sold each month?
      GET cars/transactions/_search
      {
        "size": 0,
        "aggs": {
          "sales": {
            "date_histogram": {
              "field": "sold",
              "interval": "month",
              "format": "yyyy-MM-dd",
              "extended_bounds": {
                "min": "2014-01-01",
                "max": "2014-12-31"
              }
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # One bucket per month; extended_bounds stretches the range to all of 2014.
      s = Search(index='cars')
      s.aggs.bucket("sales", "date_histogram", field="sold", interval="month",
                    format="yyyy-MM-dd", extended_bounds={"min": "2014-01-01", "max": "2014-12-31"})
      response = s.execute()
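The effect of extended_bounds can be pictured with a small pure-Python sketch: months inside the bounds get a bucket even when no car was sold in them. This assumes empty buckets are returned (a min_doc_count of 0). The dates are invented and `monthly_counts` is a hypothetical helper.

```python
from collections import Counter
from datetime import date


def monthly_counts(sold_dates, start, end):
    """Count documents per calendar month from start to end, inclusive,
    keeping months that contain no documents."""
    counts = Counter(d.replace(day=1) for d in sold_dates)
    buckets = {}
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        key = date(year, month, 1)
        buckets[key.isoformat()] = counts.get(key, 0)
        # Advance one calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


# Invented sale dates, for illustration only.
sold = [
    date(2014, 1, 1),
    date(2014, 2, 12),
    date(2014, 5, 18),
    date(2014, 11, 5),
    date(2014, 11, 5),
]
buckets = monthly_counts(sold, date(2014, 1, 1), date(2014, 12, 31))
for month, doc_count in buckets.items():
    print(month, doc_count)
```

Without the bounds, the first and last buckets would be determined by the earliest and latest sale in the data, so a chart for the year could start late or end early.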
    • Compute, for each quarter, the total sales across all makes as well as the total sales of each individual make
      GET cars/transactions/_search
      {
        "size": 0,
        "aggs": {
          "sales": {
            "date_histogram": {
              "field": "sold",
              "interval": "quarter",
              "format": "yyyy-MM-dd",
              "extended_bounds": {
                "min": "2014-01-01",
                "max": "2014-12-31"
              }
            },
            "aggs": {
              "per_make_sum": {
                "terms": {
                  "field": "make"
                },
                "aggs": {
                  "sum_price": {
                    "sum": {
                      "field": "price"
                    }
                  }
                }
              },
              "total_sum": {
                "sum": {
                  "field": "price"
                }
              }
            }
          }
        }
      }

      from elasticsearch_dsl import A, Search

      s = Search(index='cars')
      a1 = A("date_histogram", field="sold", interval="quarter", format="yyyy-MM-dd",
             extended_bounds={"min": "2014-01-01", "max": "2014-12-31"})
      a2 = A("terms", field="make")
      # Per quarter: a terms bucket per make with its own sum ...
      s.aggs.bucket("sales", a1).bucket("per_make_sum", a2).metric("sum_price", "sum", field="price")
      # ... plus a sibling sum over all makes in the same quarter.
      s.aggs["sales"].metric("total_sum", "sum", field="price")
      response = s.execute()
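Nested aggregations come back as nested buckets, so reading the result means walking two levels. The sketch below uses a hand-written dict as a stand-in for the aggregations section of a response. Its shape follows the request above, every number is invented, and `summarize` is a hypothetical helper.

```python
# Hand-written stand-in for the "aggregations" section of a response.
# The shape follows the request above; every number is invented.
aggregations = {
    "sales": {
        "buckets": [
            {
                "key_as_string": "2014-01-01",
                "doc_count": 2,
                "total_sum": {"value": 105000.0},
                "per_make_sum": {
                    "buckets": [
                        {"key": "bmw", "doc_count": 1, "sum_price": {"value": 80000.0}},
                        {"key": "ford", "doc_count": 1, "sum_price": {"value": 25000.0}},
                    ]
                },
            },
            {
                "key_as_string": "2014-04-01",
                "doc_count": 1,
                "total_sum": {"value": 30000.0},
                "per_make_sum": {
                    "buckets": [
                        {"key": "ford", "doc_count": 1, "sum_price": {"value": 30000.0}},
                    ]
                },
            },
        ]
    }
}


def summarize(aggs):
    """Return (quarter, total, {make: sum}) for each quarterly bucket."""
    rows = []
    for quarter in aggs["sales"]["buckets"]:
        per_make = {
            bucket["key"]: bucket["sum_price"]["value"]
            for bucket in quarter["per_make_sum"]["buckets"]
        }
        rows.append((quarter["key_as_string"], quarter["total_sum"]["value"], per_make))
    return rows


rows = summarize(aggregations)
for quarter, total, per_make in rows:
    print(quarter, total, per_make)
```

With elasticsearch-dsl the same fields should be reachable through attribute access on the response, for example `response.aggregations.sales.buckets`.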
    • Scoped aggregations: how many colors do the Ford cars on sale come in?
      GET cars/transactions/_search
      {
        "query": {
          "match": {
            "make": "ford"
          }
        },
        "aggs": {
          "colors": {
            "terms": {
              "field": "color"
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # The aggregation only sees documents matched by the query (Ford cars).
      s = Search(index="cars").query("match", make="ford")
      s.aggs.bucket("colors", "terms", field="color")
      response = s.execute()
    • The global bucket contains all documents and ignores the scope of the query. Say we want to compare the average price of Ford cars with the average price of all cars
      GET cars/transactions/_search
      {
        "query": {
          "match": {
            "make": "ford"
          }
        },
        "aggs": {
          "single_avg_price": {
            "avg": {
              "field": "price"
            }
          },
          "all": {
            "global": {},
            "aggs": {
              "avg_price": {
                "avg": {
                  "field": "price"
                }
              }
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      s = Search(index="cars").query("match", make="ford")
      # Average over the query results (Ford only).
      s.aggs.metric("single_avg_price", "avg", field="price")
      # Average inside the global bucket, which ignores the query.
      s.aggs.bucket("all", "global").metric("avg_price", "avg", field="price")
      response = s.execute()
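The difference between the two averages can be sketched in plain Python: one metric runs over the documents the query matched, the other over every document. The records are invented and `avg_price` is a hypothetical helper.

```python
# Invented sample documents.
cars = [
    {"make": "ford", "price": 30000},
    {"make": "ford", "price": 25000},
    {"make": "honda", "price": 10000},
    {"make": "bmw", "price": 80000},
]


def avg_price(docs):
    """Average of the price field over a list of documents."""
    return sum(doc["price"] for doc in docs) / len(docs)


# Scope of the query: Ford cars only.
matched = [car for car in cars if car["make"] == "ford"]
single_avg_price = avg_price(matched)

# Scope of the global bucket: every document, regardless of the query.
all_avg_price = avg_price(cars)

print(single_avg_price, all_avg_price)
```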
    • Filtering: find all cars priced at $10,000 or more and compute the average price of those cars
      GET cars/transactions/_search
      {
        "query": {
          "constant_score": {
            "filter": {
              "range": {
                "price": {
                  "gte": 10000
                }
              }
            }
          }
        },
        "aggs": {
          "single_avg_price": {
            "avg": {
              "field": "price"
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # A plain range query matches the same documents as the constant_score filter, so the average is the same.
      s = Search(index="cars").query("range", price={"gte": 10000})
      s.aggs.metric("single_avg_price", "avg", field="price")
      response = s.execute()
    • The filter bucket (a special kind of bucket): find the average price of Ford cars sold in the first half of 2014
      GET /cars/transactions/_search
      {
        "size": 0,
        "query": {
          "match": {
            "make": "ford"
          }
        },
        "aggs": {
          "recent_sales": {
            "filter": {
              "range": {
                "sold": {
                  "from": "2014-01-01",
                  "to": "2014-06-30"
                }
              }
            },
            "aggs": {
              "average_price": {
                "avg": {
                  "field": "price"
                }
              }
            }
          }
        }
      }

      from elasticsearch_dsl import Q, Search

      s = Search(index="cars").query("match", make="ford")
      # The filter applies only inside the recent_sales bucket, not to the search hits.
      q = Q("range", sold={"from": "2014-01-01", "to": "2014-06-30"})
      s.aggs.bucket("recent_sales", "filter", q).metric("average_price", "avg", field="price")
      response = s.execute()
    • The post filter (post_filter) filters only the search hits, not the aggregation results, so it has no effect on aggregations
      GET cars/transactions/_search
      {
        "query": {
          "match": {
            "make": "ford"
          }
        },
        "post_filter": {
          "term": {
            "color": "green"
          }
        },
        "aggs": {
          "all_colors": {
            "terms": {
              "field": "color"
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # Hits are limited to green Fords; all_colors still counts every Ford color.
      s = Search(index="cars").query("match", make="ford").post_filter("term", color="green")
      s.aggs.bucket("all_colors", "terms", field="color")
      response = s.execute()
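The filter bucket and the post filter act at different stages, which a plain-Python walk-through may make easier to see. This is only a sketch of the semantics described above, with invented records. It is not how Elasticsearch executes the request.

```python
from collections import Counter
from datetime import date

# Invented sample documents.
cars = [
    {"make": "ford", "color": "green", "price": 30000, "sold": date(2014, 5, 18)},
    {"make": "ford", "color": "blue", "price": 25000, "sold": date(2014, 2, 12)},
    {"make": "ford", "color": "green", "price": 28000, "sold": date(2014, 9, 3)},
    {"make": "honda", "color": "red", "price": 10000, "sold": date(2014, 10, 28)},
]

# 1. The query decides which documents both hits and aggregations start from.
matched = [c for c in cars if c["make"] == "ford"]

# 2. A filter bucket narrows the documents for one aggregation only.
recent = [c for c in matched if date(2014, 1, 1) <= c["sold"] <= date(2014, 6, 30)]
average_price = sum(c["price"] for c in recent) / len(recent)

# 3. Other aggregations still see everything the query matched.
all_colors = Counter(c["color"] for c in matched)

# 4. post_filter trims the returned hits after aggregations are computed.
hits = [c for c in matched if c["color"] == "green"]

print(average_price, dict(all_colors), len(hits))
```

The blue Ford is absent from `hits` but still counted in `all_colors`, which is the behavior that lets a result list be narrowed while the color facet keeps showing every option.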

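Built-in Sorts

  • _count: sort by document count. Valid for terms, histogram, and date_histogram
  • _term: sort alphabetically by the string value of the term. Works only within terms
  • _key: sort by the numeric value of each bucket's key (conceptually similar to _term). Works only within histogram and date_histogram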
    • Let's run a terms aggregation, but sort the buckets by doc_count in ascending order
      GET cars/transactions/_search
      {
        "size": 0,
        "aggs": {
          "colors": {
            "terms": {
              "field": "color",
              "order": {
                "_count": "asc"
              }
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # Least frequent colors first.
      s = Search(index="cars")
      s.aggs.bucket("colors", "terms", field="color", order={"_count": "asc"})
      response = s.execute()
    • Sorting by a metric: group cars by color, then order the buckets by average price, ascending
      GET cars/transactions/_search
      {
        "size": 0,
        "aggs": {
          "colors": {
            "terms": {
              "field": "color",
              "order": {
                "avg_price": "asc"
              }
            },
            "aggs": {
              "avg_price": {
                "avg": {
                  "field": "price"
                }
              }
            }
          }
        }
      }

      from elasticsearch_dsl import Search

      # The order key refers to the avg_price metric defined inside the bucket.
      s = Search(index="cars")
      s.aggs.bucket("colors", "terms", field="color", order={"avg_price": "asc"}).metric("avg_price", "avg", field="price")
      response = s.execute()
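The two orderings above amount to sorting the same buckets by different keys. A minimal pure-Python sketch, using invented (color, price) pairs and a hypothetical `avg` helper:

```python
from collections import defaultdict

# Invented (color, price) pairs.
cars = [
    ("red", 10000), ("red", 20000), ("red", 80000),
    ("green", 30000), ("green", 12000),
    ("blue", 50000),
]

# Build one bucket of prices per color, as a terms aggregation would group them.
prices_by_color = defaultdict(list)
for color, price in cars:
    prices_by_color[color].append(price)


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Equivalent of order={"_count": "asc"}: fewest documents first.
by_count = sorted(prices_by_color, key=lambda c: len(prices_by_color[c]))

# Equivalent of order={"avg_price": "asc"}: lowest average price first.
by_avg_price = sorted(prices_by_color, key=lambda c: avg(prices_by_color[c]))

print(by_count)
print(by_avg_price)
```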
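    • Sorting based on "deep" metrics

We can define deeper paths by chaining buckets and the metric with the angle bracket (>), like this: my_bucket>another_bucket>metric

Note that every bucket on the nested path must be single-valued. A filter bucket produces a single bucket: all documents that match the filter go into it. Multi-value buckets (such as terms) generate many buckets dynamically, so no single fixed path can identify one of them.

Currently there are only three single-value buckets: filter, global, and reverse_nested.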

    • A quick example: build a histogram of car sale prices, but order the buckets by the price variance of the red and green (not blue) cars in each bucket
      GET /cars/transactions/_search
      {
        "size": 0,
        "aggs": {
          "colors": {
            "histogram": {
              "field": "price",
              "interval": 20000,
              "order": {
                "red_green_cars>stats.variance": "asc"
              }
            },
            "aggs": {
              "red_green_cars": {
                "filter": { "terms": { "color": ["red", "green"] } },
                "aggs": {
                  "stats": { "extended_stats": { "field": "price" } }
                }
              }
            }
          }
        }
      }

      from elasticsearch_dsl import A, Search

      s = Search(index="cars")
      # Order histogram buckets by a metric nested inside the single-value filter bucket.
      a = A("histogram", field="price", interval=20000, order={"red_green_cars>stats.variance": "asc"})
      # Filter bucket that keeps only red and green cars.
      q = A("filter", filter={"terms": {"color": ["red", "green"]}})
      s.aggs.bucket("colors", a).bucket("red_green_cars", q).metric("stats", "extended_stats", field="price")
      response = s.execute()
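What the path red_green_cars>stats.variance selects can be sketched in plain Python: for each histogram bucket, keep only the red and green cars, compute the variance of their prices, and sort the buckets by that number. The sketch assumes population variance, uses invented records, and leaves out buckets that contain no red or green car.

```python
import statistics
from collections import defaultdict

# Invented sample documents.
cars = [
    {"price": 10000, "color": "red"},
    {"price": 12000, "color": "green"},
    {"price": 15000, "color": "blue"},
    {"price": 20000, "color": "red"},
    {"price": 30000, "color": "green"},
    {"price": 25000, "color": "blue"},
    {"price": 80000, "color": "red"},
]
interval = 20000

# Histogram bucket key -> prices of the red and green cars in that bucket.
red_green = defaultdict(list)
for car in cars:
    if car["color"] in ("red", "green"):
        key = (car["price"] // interval) * interval
        red_green[key].append(car["price"])

# Population variance per bucket (assumption: matches the "variance" field).
variances = {key: statistics.pvariance(prices) for key, prices in red_green.items()}

# Ascending order by the nested metric, like "asc" in the request.
ordered = sorted(variances, key=variances.get)
for key in ordered:
    print(key, variances[key])
```

The blue cars fall into the same price bands but never influence the ordering, because the filter bucket sits between the histogram and the metric on the path.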