Hive Window Functions in Detail, Part 3: lag, lead, first_value, last_value

This article continues the series with four more window functions.

lag

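lag(column_name, n, default): returns the value of column_name from the row n rows before the current row within the window. The first argument is the column name. The second is the offset n (optional, defaults to 1). The third is the default value, returned when there is no row n rows back in the window; if it is omitted, NULL is returned.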
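The three call forms can be sketched as follows, assuming the buy_info table defined later in this article. The aliases prev_1, prev_2 and prev_2_or_default are made up for illustration.

```sql
select name, buy_date,
  -- offset omitted: looks one row back; NULL on the first row of each partition
  lag(buy_date) over (partition by name order by buy_date) as prev_1,
  -- offset 2, no default: NULL on the first two rows of each partition
  lag(buy_date, 2) over (partition by name order by buy_date) as prev_2,
  -- offset 2 with a default: '1970-01-01' takes the place of those NULLs
  lag(buy_date, 2, '1970-01-01') over (partition by name order by buy_date) as prev_2_or_default
from buy_info;
```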

lead

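lead is the opposite of lag: lead(column_name, n, default) returns the value from the row n rows after the current row within the window. The arguments have the same meaning as for lag.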
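A minimal sketch of lead with and without the optional arguments, again assuming the buy_info table from later in this article; the aliases next_1 and next_2_or_default are illustrative only.

```sql
select name, buy_date,
  -- offset omitted: looks one row ahead; NULL on the last row of each partition
  lead(buy_date) over (partition by name order by buy_date) as next_1,
  -- offset 2 with a default: the last two rows of each partition get '2020-12-31'
  lead(buy_date, 2, '2020-12-31') over (partition by name order by buy_date) as next_2_or_default
from buy_info;
```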

first_value

first_value(column_name): after partitioning and ordering, returns the first value in the window up to the current row.
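As a sketch, first_value can return any column of the first row, not only the ordering column. The query below assumes the buy_info table from later in this article; the aliases are illustrative. The last expression uses Hive's optional second boolean argument, which skips NULL values when set to true.

```sql
select name, buy_date, buy_num,
  -- earliest buy_date for each name
  first_value(buy_date) over (partition by name order by buy_date) as first_date,
  -- buy_num recorded on that earliest date
  first_value(buy_num) over (partition by name order by buy_date) as first_num,
  -- same, but skipping rows where buy_num is NULL
  first_value(buy_num, true) over (partition by name order by buy_date) as first_non_null_num
from buy_info;
```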

last_value

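last_value(column_name): after partitioning and ordering, returns the last value in the window up to the current row. With an order by and no explicit frame, the window ends at the current row, so the result is the current row's own value; to get the last value of the whole partition, extend the frame with rows between unbounded preceding and unbounded following.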
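The effect of the window frame on last_value can be sketched by placing the two variants side by side. This assumes the buy_info table from later in this article; the aliases last_so_far and last_in_partition are illustrative.

```sql
select name, buy_date,
  -- default frame ends at the current row, so this simply repeats buy_date
  last_value(buy_date) over (partition by name order by buy_date) as last_so_far,
  -- frame extended to the end of the partition: the latest buy_date for that name
  last_value(buy_date) over (partition by name order by buy_date
    rows between unbounded preceding and unbounded following) as last_in_partition
from buy_info;
```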

The example below shows how they are used.

create table if not exists buy_info (
  name string,
  buy_date string,
  buy_num int
) row format delimited fields terminated by '|';

select * from buy_info;

name buy_date buy_num
zhangsan 2020-02-23 21
zhangsan 2020-03-12 34
zhangsan 2020-04-15 12
zhangsan 2020-05-12 51
lisi 2020-03-16 12
lisi 2020-03-21 24
lisi 2020-07-12 41
lisi 2020-07-27 32
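The article does not show how these rows were loaded. One possible way to reproduce them, assuming a Hive version that supports insert ... values (0.14 or later), is sketched below; loading a '|'-delimited text file would work equally well.

```sql
-- Populate buy_info with the eight sample rows listed above
insert into table buy_info values
  ('zhangsan', '2020-02-23', 21),
  ('zhangsan', '2020-03-12', 34),
  ('zhangsan', '2020-04-15', 12),
  ('zhangsan', '2020-05-12', 51),
  ('lisi', '2020-03-16', 12),
  ('lisi', '2020-03-21', 24),
  ('lisi', '2020-07-12', 41),
  ('lisi', '2020-07-27', 32);
```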


select name, buy_date, buy_num,
  lag(buy_date, 1, '1970-01-01') over (partition by name order by buy_date) as prev_date,
  lead(buy_date, 1, '2020-12-31') over (partition by name order by buy_date) as next_date,
  first_value(buy_date) over (partition by name order by buy_date) as first_date,
  -- frame extended to the whole partition; the default frame would stop at the current row
  last_value(buy_date) over (partition by name order by buy_date
    rows between unbounded preceding and unbounded following) as last_date
from buy_info;

The query returns the following:

name buy_date buy_num prev_date next_date first_date last_date
zhangsan 2020-02-23 21 1970-01-01 2020-03-12 2020-02-23 2020-05-12
zhangsan 2020-03-12 34 2020-02-23 2020-04-15 2020-02-23 2020-05-12
zhangsan 2020-04-15 12 2020-03-12 2020-05-12 2020-02-23 2020-05-12
zhangsan 2020-05-12 51 2020-04-15 2020-12-31 2020-02-23 2020-05-12
lisi 2020-03-16 12 1970-01-01 2020-03-21 2020-03-16 2020-07-27
lisi 2020-03-21 24 2020-03-16 2020-07-12 2020-03-16 2020-07-27
lisi 2020-07-12 41 2020-03-21 2020-07-27 2020-03-16 2020-07-27
lisi 2020-07-27 32 2020-07-12 2020-12-31 2020-03-16 2020-07-27
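A common use of lag is measuring the gap between consecutive rows. The sketch below computes the number of days since each customer's previous purchase with Hive's built-in datediff; the alias days_since_prev and the subquery alias t are illustrative. The lag default is left out on purpose, so the first purchase of each name yields NULL rather than a gap measured from a placeholder date.

```sql
select name, buy_date, buy_num,
       -- days between this purchase and the previous one; NULL for the first purchase
       datediff(buy_date, prev_date) as days_since_prev
from (
  select name, buy_date, buy_num,
         lag(buy_date) over (partition by name order by buy_date) as prev_date
  from buy_info
) t;
```

For zhangsan's second purchase (2020-03-12, following 2020-02-23), days_since_prev is 18.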