Hive Window Functions in Detail (1)
阿新 • Published: 2020-07-26
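In Hive, window functions (also called windowing functions) are powerful, and knowing them well makes many problems much easier to solve. First we need to understand what a window function is. Put simply, it is a function that Hive evaluates over a specified window of rows: for example the aggregate functions sum(), avg(), min() and max(), as well as rank() and row_number(), which are used for ranking. Each is introduced below.
First, here are the functions and keywords that commonly appear with window functions and control the size of the window.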
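over(): specifies the window that the function operates on; the window can change as the current row changes.
current row: the current row
n preceding: the n rows before the current row
n following: the n rows after the current row
unbounded: the boundary of the partition; unbounded preceding means starting from the first row, and unbounded following means running to the last row.
If no frame is specified but order by is present, the window defaults to the rows from the start of the partition through the current row. If neither order by nor a frame is given, the window is the whole partition.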
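The keywords above combine inside over() roughly as follows. This is only a minimal sketch: the table `some_table` and the columns `grp`, `ts` and `val` are hypothetical placeholders, not part of this article's data.

```sql
select grp, ts, val,
       -- no frame clause: with order by, the window runs from the start of the partition to the current row
       sum(val) over (partition by grp order by ts) as running_total,
       -- explicit frame: the 2 previous rows plus the current row
       sum(val) over (partition by grp order by ts
                      rows between 2 preceding and current row) as last_three,
       -- the current row through the end of the partition
       sum(val) over (partition by grp order by ts
                      rows between current row and unbounded following) as remaining
from some_table;  -- hypothetical table name
```

Note that the frame boundary is written as two words, `current row`, inside the rows between clause.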
1. sum(), avg(), min(), max()
Data preparation
create table if not exists buy_info (name string, buy_date string, buy_num int) row format delimited fields terminated by '|';
select * from buy_info; returns the following rows:
liulei 2015-04-11 5
liulei 2015-04-12 7
liulei 2015-04-13 3
liulei 2015-04-14 2
liulei 2015-04-15 4
liulei 2015-04-16 4
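The article does not show how the rows were loaded. One possible way to reproduce them without a data file, assuming Hive 0.14 or later (which supports insert ... values), is sketched here:

```sql
-- Sketch: populate buy_info with the six sample rows shown above.
-- Assumes Hive 0.14+; the original data may have been loaded from a '|'-delimited file instead.
insert into table buy_info values
  ('liulei', '2015-04-11', 5),
  ('liulei', '2015-04-12', 7),
  ('liulei', '2015-04-13', 3),
  ('liulei', '2015-04-14', 2),
  ('liulei', '2015-04-15', 4),
  ('liulei', '2015-04-16', 4);
```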
select name, buy_date, buy_num,
       sum(buy_num) over(partition by name order by buy_date asc) as info1, -- partition by name, order by purchase date ascending, then sum; the default frame runs from the first row to the current row
       sum(buy_num) over(partition by name order by buy_date rows between unbounded preceding and current row) as info2, -- from the first row to the current row; same result as info1
       sum(buy_num) over(partition by name order by buy_date rows between 3 preceding and current row) as info3, -- from 3 rows before the current row through the current row
       sum(buy_num) over(partition by name) as info4, -- all rows in the partition
       sum(buy_num) over(partition by name order by buy_date rows between 1 preceding and 1 following) as info5, -- the previous row + the current row + the next row
       sum(buy_num) over(partition by name order by buy_date rows between 1 preceding and unbounded following) as info6 -- from 1 row before the current row through the last row
from buy_info;
The query returns the following:
name | buy_date | buy_num | info1 | info2 | info3 | info4 | info5 | info6 |
liulei | 2015-04-11 | 5 | 5 | 5 | 5 | 25 | 12 | 25 |
liulei | 2015-04-12 | 7 | 12 | 12 | 12 | 25 | 15 | 25 |
liulei | 2015-04-13 | 3 | 15 | 15 | 15 | 25 | 12 | 20 |
liulei | 2015-04-14 | 2 | 17 | 17 | 17 | 25 | 9 | 13 |
liulei | 2015-04-15 | 4 | 21 | 21 | 16 | 25 | 10 | 10 |
liulei | 2015-04-16 | 4 | 25 | 25 | 13 | 25 | 8 | 8 |
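The example above only uses sum(), but avg(), min() and max() accept the same over() clauses. The following is a hedged sketch against the same buy_info table; the aliases `avg3`, `min_all` and `max_so_far` are made up for illustration.

```sql
select name, buy_date, buy_num,
       -- moving average over the previous row, the current row and the next row
       avg(buy_num) over (partition by name order by buy_date
                          rows between 1 preceding and 1 following) as avg3,
       -- smallest buy_num in the whole partition (no order by, so the frame is the whole partition)
       min(buy_num) over (partition by name) as min_all,
       -- largest buy_num seen from the first row up to the current row
       max(buy_num) over (partition by name order by buy_date) as max_so_far
from buy_info;
```

On the sample data, min_all is 2 on every row, max_so_far is 5 on the first row and 7 from the second row onward, and avg3 on the first row is 6.0 (the average of 5 and 7, since the first row has no preceding row).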