Hive Window Functions in Detail, Part 3: lag, lead, first_value, last_value
阿新 • Published: 2020-07-28
This article continues the series with four more window functions.
lag
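lag(column_name, n, default): returns the value from the n-th row above the current row within the window. The first argument is the column name. The second argument is the number of rows to look back (optional, defaults to 1). The third argument is the default value, returned when there is no row n rows above the current one, for example on the first row of a partition (optional; if omitted, null is returned).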
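As a minimal sketch of the optional arguments, the query below calls lag once with only the column name and once with all three arguments. It assumes the buy_info table that is defined later in this article; the aliases prev_num and prev2_num are made up for illustration.

```sql
-- Assumes the buy_info table created later in this article.
select name, buy_date, buy_num,
       -- n omitted: looks back 1 row; no default given: NULL when there is no previous row
       lag(buy_num) over (partition by name order by buy_date) as prev_num,
       -- looks back 2 rows; returns 0 when the partition has no row that far back
       lag(buy_num, 2, 0) over (partition by name order by buy_date) as prev2_num
from buy_info;
```

For zhangsan, whose buy_num values in date order are 21, 34, 12, 51, prev_num should come out as NULL, 21, 34, 12 and prev2_num as 0, 0, 21, 34.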
lead
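lead is the opposite of lag: lead(column_name, n, default) returns the value from the n-th row below the current row within the window. The arguments have the same meaning as for lag.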
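A short sketch of lead in the same style, again assuming the buy_info table defined later in this article; the alias next_num is hypothetical.

```sql
-- Assumes the buy_info table created later in this article.
select name, buy_date, buy_num,
       -- value of buy_num on the next row of the same partition; 0 on the last row
       lead(buy_num, 1, 0) over (partition by name order by buy_date) as next_num
from buy_info;
```

For lisi, whose buy_num values in date order are 12, 24, 41, 32, next_num should come out as 24, 41, 32, 0.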
first_value
first_value(column_name): after partitioning and sorting, returns the first value in the window up to the current row.
last_value
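last_value(column_name): after partitioning and sorting, returns the last value in the window up to the current row. Because the default window frame ends at the current row, this is normally the current row's own value; to get the last value of the whole partition, specify the frame rows between unbounded preceding and unbounded following.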
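The effect of the window frame on last_value is easy to miss, so here is a hedged sketch that puts both variants side by side. It assumes the buy_info table defined later in this article; the aliases last_so_far and last_in_partition are made up for illustration.

```sql
-- Assumes the buy_info table created later in this article.
select name, buy_date,
       first_value(buy_date) over (partition by name order by buy_date) as first_date,
       -- default frame (unbounded preceding to current row): the last value seen so far
       last_value(buy_date) over (partition by name order by buy_date) as last_so_far,
       -- explicit frame covering the whole partition: the partition's final value
       last_value(buy_date) over (partition by name order by buy_date
                                  rows between unbounded preceding and unbounded following) as last_in_partition
from buy_info;
```

On zhangsan's first row (2020-02-23), last_so_far should be 2020-02-23 while last_in_partition should be 2020-05-12. first_value is not affected, because the default frame always starts at the first row of the partition.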
The following concrete example shows how they are used.
create table if not exists buy_info (
  name string,
  buy_date string,
  buy_num int
) row format delimited fields terminated by '|';

select * from buy_info;
name | buy_date | buy_num |
zhangsan | 2020-02-23 | 21 |
zhangsan | 2020-03-12 | 34 |
zhangsan | 2020-04-15 | 12 |
zhangsan | 2020-05-12 | 51 |
lisi | 2020-03-16 | 12 |
lisi | 2020-03-21 | 24 |
lisi | 2020-07-12 | 41 |
lisi | 2020-07-27 | 32 |
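The article does not show how the rows were loaded. One way to reproduce the table above, assuming Hive 0.14 or later (where insert ... values is available), might be the following; the original data may well have been loaded from a '|'-delimited file instead.

```sql
-- Populate buy_info with the eight rows listed above.
-- Assumption: Hive 0.14+; the original article does not say how the data was loaded.
insert into table buy_info values
  ('zhangsan', '2020-02-23', 21),
  ('zhangsan', '2020-03-12', 34),
  ('zhangsan', '2020-04-15', 12),
  ('zhangsan', '2020-05-12', 51),
  ('lisi', '2020-03-16', 12),
  ('lisi', '2020-03-21', 24),
  ('lisi', '2020-07-12', 41),
  ('lisi', '2020-07-27', 32);
```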
select name, buy_date, buy_num,
  lag(buy_date, 1, '1970-01-01') over (partition by name order by buy_date) as prev_date,
  lead(buy_date, 1, '2020-12-31') over (partition by name order by buy_date) as next_date,
  first_value(buy_date) over (partition by name order by buy_date) as first_date,
  -- explicit frame so that last_value sees the whole partition, not only the rows up to the current one
  last_value(buy_date) over (partition by name order by buy_date rows between unbounded preceding and unbounded following) as last_date
from buy_info;
The query returns:
name | buy_date | buy_num | prev_date | next_date | first_date | last_date |
zhangsan | 2020-02-23 | 21 | 1970-01-01 | 2020-03-12 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-03-12 | 34 | 2020-02-23 | 2020-04-15 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-04-15 | 12 | 2020-03-12 | 2020-05-12 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-05-12 | 51 | 2020-04-15 | 2020-12-31 | 2020-02-23 | 2020-05-12 |
lisi | 2020-03-16 | 12 | 1970-01-01 | 2020-03-21 | 2020-03-16 | 2020-07-27 |
lisi | 2020-03-21 | 24 | 2020-03-16 | 2020-07-12 | 2020-03-16 | 2020-07-27 |
lisi | 2020-07-12 | 41 | 2020-03-21 | 2020-07-27 | 2020-03-16 | 2020-07-27 |
lisi | 2020-07-27 | 32 | 2020-07-12 | 2020-12-31 | 2020-03-16 | 2020-07-27 |
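As one possible application of lag on this data, the sketch below computes the number of days since each customer's previous purchase. It is an illustration rather than part of the original example: the aliases are hypothetical, and the window function is placed in a subquery so that the built-in datediff is applied to plain columns.

```sql
-- Days between consecutive purchases per customer.
select name, buy_date, prev_date,
       -- datediff(end, start) returns the number of days from start to end;
       -- it yields NULL on each customer's first purchase, where prev_date is NULL
       datediff(buy_date, prev_date) as days_since_prev
from (
  select name, buy_date,
         lag(buy_date) over (partition by name order by buy_date) as prev_date
  from buy_info
) t;
```

For zhangsan's second purchase (2020-03-12, previous purchase 2020-02-23), days_since_prev should be 18.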