根據uid計算使用者活躍日期及進行使用者日活分類
阿新 • • 發佈:2018-11-17
1、建表
drop table if exists oss_bi_all_type_of_dau101; CREATE EXTERNAL TABLE `oss_bi_all_type_of_dau101`( `uid` bigint, `newer_day` string, `remain_day_num` int, `parms` map<string,string>) PARTITIONED BY ( `pt_month` string, `pt_day` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'colelction.delim'='\u0002', 'mapkey.delim'='\u0003') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs:///emr-cluster/user/hive/warehouse/oss_bi_all_type_of_dau101';
2、新增分割槽及資料
alter table oss_bi_all_type_of_dau101 drop if exists partition (pt_month='2018-08',pt_day='2018-08-01'); alter table oss_bi_all_type_of_dau101 add partition (pt_month='2018-08',pt_day='2018-08-01') location 'oss://[ossKey]:[ossKeyVal]@chushou-hz.oss-cn-hangzhou-internal.aliyuncs.com/hive/oss_bi_all_type_of_dau101/2018-08/2018-08-01'; with tab_curr_dau as( select pt_day,parms['uid'] uid from oss_bi_all_jellyfish_log where pt_day ='2018-08-01' and log_type=2 and parms['roomId']=101 group by pt_day,parms['uid']), tab_his_dau as ( select uid,min(pt_day) newer_day from oss_bi_all_type_of_dau101 where pt_day <'2018-08-01' group by uid) insert into oss_bi_all_type_of_dau101 partition (pt_month='2018-08',pt_day='2018-08-01')(uid,newer_day,remain_day_num,parms) select a1.uid,case when a2.uid is null then a1.pt_day else a2.newer_day end newer_day,datediff(a1.pt_day,case when a2.uid is null then a1.pt_day else a2.newer_day end) remain_day_num,str_to_map('') parms from tab_curr_dau a1 left join tab_his_dau a2 on a1.uid=a2.uid ;
3、說明
建表的時候可以設定儲存在hdfs上,在新增分割槽的時候可以將資料指定在oss上。
map空資料的insert into的方法,通過str_to_map('')方法來實現。
可以寫成python指令碼,進行順序排程;因為當天的資料依賴於昨天的資料跑出,所以必須序列排程。