Hive Window Functions: Computing Cumulative Sums, Averages, and First/Last Values

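Hive window functions compute values such as cumulative sums and moving averages over a window defined by a number of rows, a range of values, or a span of time. They can be combined with aggregate functions such as SUM() and AVG(), and with FIRST_VALUE() and LAST_VALUE(), which return the first and the last value of the window.
- With only a PARTITION BY clause and no ORDER BY, the aggregate is computed over the whole partition.
- With an ORDER BY clause but no window clause, the window defaults to everything from the start of the partition up to the current row. Strictly, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie with the current row on the ORDER BY key are included as well.
Window clause keywords:
- PRECEDING: rows before the current row
- FOLLOWING: rows after the current row
- CURRENT ROW: the current row
- UNBOUNDED: no limit. UNBOUNDED PRECEDING means starting from the first row of the partition; UNBOUNDED FOLLOWING means running to the last row of the partition.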
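The two default-frame rules above can be checked on a toy table. The sketch below uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later, which supports window functions) as a stand-in for Hive; the table t and its rows are hypothetical and not part of the article's data. The results were worked out against SQLite only; Hive documents the same defaults, but the query was not run there.

```python
import sqlite3

# Hypothetical toy data: (group key, sort key, value). Group "a" has a tie on k = 2.
rows = [("a", 1, 10), ("a", 2, 20), ("a", 2, 30), ("b", 1, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (g TEXT, k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT g, k, v,
       -- PARTITION BY only: the frame is the whole partition
       SUM(v) OVER (PARTITION BY g) AS part_total,
       -- ORDER BY without a window clause: start of partition up to the
       -- current row, including rows that tie with it on k
       SUM(v) OVER (PARTITION BY g ORDER BY k) AS default_frame,
       -- Explicit ROWS frame: counts physical rows; v is added to the
       -- ORDER BY only as a tie-breaker so the result is deterministic
       SUM(v) OVER (PARTITION BY g ORDER BY k, v ROWS UNBOUNDED PRECEDING) AS rows_frame
FROM t
ORDER BY g, k, v
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For the two tied rows in group "a", default_frame is 60 on both, while rows_frame is 30 and then 60. When the ORDER BY key is unique, as pt_month is after GROUP BY in the queries that follow, the two frames give the same result.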


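1. Computing cumulative sums
Compute the cumulative sum for months 1 to 12: the value for January is January alone, February is the sum of January and February, March is the sum of January, February, and March, and December is the sum of all twelve months.
Keyword notes:
SUM(SUM(amount)): the inner SUM(amount) is the per-month total produced by GROUP BY, that is, the value to be accumulated; the outer SUM(...) OVER is the window function that accumulates it.
ORDER BY month: sorts the rows read by the query by month; this is the ordering inside the window.
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: defines the start and the end of the window. UNBOUNDED PRECEDING is the start and means "from the first row"; CURRENT ROW is the default end, so the clause is equivalent to:
ROWS UNBOUNDED PRECEDING
N PRECEDING: the N rows before the current row.
N FOLLOWING: the N rows after the current row.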
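To make the two layers of SUM(SUM(amount)) visible, the sketch below splits them apart: a subquery performs the inner GROUP BY aggregation, and the outer query applies the window function to its result. It also checks that the long and the short frame clauses agree. It runs on Python's sqlite3 module (assuming SQLite 3.25 or later) rather than on Hive. The table and column names are borrowed from the article's queries, but every row is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Names follow the article's queries; the rows are hypothetical.
conn.execute(
    "CREATE TABLE data_chushou_pay_info (pt_month TEXT, amount INTEGER, state INTEGER)"
)
conn.executemany(
    "INSERT INTO data_chushou_pay_info VALUES (?, ?, ?)",
    [
        ("2017-01", 100, 0), ("2017-01", 50, 0),
        ("2017-02", 200, 0), ("2017-02", 999, 1),  # state = 1 is filtered out
        ("2017-03", 25, 0), ("2017-03", 75, 0),
    ],
)

# Inner query = the inner SUM(amount): one row per month.
# Outer query = the outer SUM(...) OVER: running total across those rows.
query = """
SELECT pt_month,
       pay_amount,
       SUM(pay_amount) OVER (ORDER BY pt_month
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_frame,
       SUM(pay_amount) OVER (ORDER BY pt_month ROWS UNBOUNDED PRECEDING) AS short_frame
FROM (
    SELECT pt_month, SUM(amount) AS pay_amount
    FROM data_chushou_pay_info
    WHERE state = 0
    GROUP BY pt_month
)
ORDER BY pt_month
"""
cumulative = conn.execute(query).fetchall()
for row in cumulative:
    print(row)
```

The monthly totals are 150, 200, and 100, and both frame spellings produce the running totals 150, 350, and 450.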


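1.1 Cumulative sum across all months
The two queries below return the same result: the first relies on the default window, the second states the frame explicitly. They agree because pt_month is unique after GROUP BY.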
select pt_month,sum(amount) pay_amount,sum(sum(amount))over(order by pt_month) cumulative_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;

select pt_month,sum(amount) pay_amount,sum(sum(amount))over(order by pt_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) cumulative_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;

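1.2 Sum over the previous 3 months plus the current month (4 months in total)
The two queries below are equivalent; the second uses the short form of the frame clause.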
select pt_month,sum(amount) pay_amount,sum(sum(amount))over(order by pt_month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) cumulative_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;

select pt_month,sum(amount) pay_amount,sum(sum(amount))over(order by pt_month ROWS 3 PRECEDING) cumulative_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;
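One detail of the 3 PRECEDING frame is easy to miss: the first three rows have fewer than four rows available, so their sums cover fewer months. The sketch below shows this on a hypothetical pre-aggregated table named monthly, using Python's sqlite3 module (assuming SQLite 3.25 or later) instead of Hive; the table name and its values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly totals, standing in for the GROUP BY result.
conn.execute("CREATE TABLE monthly (pt_month TEXT, pay_amount INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [("2017-01", 10), ("2017-02", 20), ("2017-03", 30),
     ("2017-04", 40), ("2017-05", 50), ("2017-06", 60)],
)

query = """
SELECT pt_month,
       -- current row plus up to 3 rows before it
       SUM(pay_amount) OVER (ORDER BY pt_month
                             ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS last4_sum,
       -- same frame in short form; COUNT(*) shows how many rows it holds
       COUNT(*) OVER (ORDER BY pt_month ROWS 3 PRECEDING) AS rows_in_frame
FROM monthly
ORDER BY pt_month
"""
trailing = conn.execute(query).fetchall()
for row in trailing:
    print(row)
```

The frame holds 1, 2, and 3 rows for the first three months and 4 rows from April onward. ROWS counts rows, not calendar months, so if a month had no records at all the frame would reach one month further back.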

1.3 Sum over the previous month, the current month, and the following month (3 months in total)
select pt_month,sum(amount) pay_amount,sum(sum(amount))over(order by pt_month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) cumulative_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;

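2. Computing averages
2.1 Average of the monthly totals over the previous month, the current month, and the following month (3 months in total)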
select pt_month,sum(amount) pay_amount,avg(sum(amount))over(order by pt_month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) average_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;
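With a centered frame, the first and the last row have only one neighbor, and AVG divides by the number of rows actually in the frame rather than by a fixed 3. The sketch below illustrates this on a hypothetical monthly table, run through Python's sqlite3 module (assuming SQLite 3.25 or later) as a stand-in for Hive; the table name and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly totals, standing in for the GROUP BY result.
conn.execute("CREATE TABLE monthly (pt_month TEXT, pay_amount INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [("2017-01", 10), ("2017-02", 20), ("2017-03", 60), ("2017-04", 40)],
)

query = """
SELECT pt_month,
       SUM(pay_amount) OVER w AS window_sum,
       COUNT(*)        OVER w AS window_rows,
       AVG(pay_amount) OVER w AS average_amount
FROM monthly
WINDOW w AS (ORDER BY pt_month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
ORDER BY pt_month
"""
centered = conn.execute(query).fetchall()
for month, window_sum, window_rows, average_amount in centered:
    # average_amount equals window_sum / window_rows on every row
    print(month, window_sum, window_rows, average_amount)
```

January averages two months (30 / 2), February and March average three, and April averages two again (100 / 2). The named WINDOW clause is used here only to avoid repeating the frame; Hive also accepts a WINDOW clause, though that spelling was not verified against Hive for this example.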

2.2 Average of the monthly totals over the previous 3 months plus the current month (4 months in total)
select pt_month,sum(amount) pay_amount,avg(sum(amount))over(order by pt_month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) average_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;

3. First and last values of the window frame
select pt_month,sum(amount) pay_amount,first_value(sum(amount))over(order by pt_month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) first_amount,last_value(sum(amount))over(order by pt_month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) last_amount
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;
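FIRST_VALUE and LAST_VALUE are evaluated against the frame, not the whole partition, which is why the query above spells the frame out. The sketch below compares the explicit frame with the default one on a hypothetical monthly table, using Python's sqlite3 module (assuming SQLite 3.25 or later) in place of Hive; the table and its values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly totals, standing in for the GROUP BY result.
conn.execute("CREATE TABLE monthly (pt_month TEXT, pay_amount INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [("2017-01", 10), ("2017-02", 20), ("2017-03", 30), ("2017-04", 40)],
)

query = """
SELECT pt_month,
       FIRST_VALUE(pay_amount) OVER (ORDER BY pt_month
                                     ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS first_amount,
       LAST_VALUE(pay_amount)  OVER (ORDER BY pt_month
                                     ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS last_amount,
       -- No frame given: the default frame ends at the current row,
       -- so LAST_VALUE simply returns the current row's value
       LAST_VALUE(pay_amount)  OVER (ORDER BY pt_month) AS last_default
FROM monthly
ORDER BY pt_month
"""
first_last = conn.execute(query).fetchall()
for row in first_last:
    print(row)
```

With the explicit frame, first_amount is the previous month's total (or the current one in January) and last_amount is the next month's total (or the current one in April). With the default frame, last_default just echoes the current row, which is a common surprise when the window clause is left out.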