[Hive_12] Hive User-Defined Functions
阿新 · Published: 2019-02-08
0. Overview
UDF   // user-defined function
      // one row in, one row out, e.g. format_number(age, '000')
UDTF  // user-defined table-generating function
      // one row in, multiple rows out, e.g. explode(array)
UDAF  // user-defined aggregate function
      // multiple rows in, one row out, e.g. sum(xxx)
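The following plain-Java analogy makes the three shapes concrete. It is not Hive API code. Each method only mirrors the input/output cardinality of one kind of function, and the class and method names are hypothetical. `DecimalFormat("000")` is used as a stand-in for the zero-padding that `format_number(age, '000')` performs.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java analogy for the three Hive function shapes; not Hive API code.
public class FunctionShapes {

    // UDF shape: one input value -> one output value.
    // Stand-in for format_number(age, '000'): zero-pad to at least 3 digits.
    static String udf(int age) {
        return new DecimalFormat("000", DecimalFormatSymbols.getInstance(Locale.ROOT)).format(age);
    }

    // UDTF shape: one input row -> many output rows.
    // Stand-in for explode(array): every element becomes its own row.
    static List<String> udtf(List<String> array) {
        List<String> rows = new ArrayList<>();
        for (String element : array) {
            rows.add(element);
        }
        return rows;
    }

    // UDAF shape: many input rows -> one output value.
    // Stand-in for sum(col): fold a whole column into a single number.
    static long udaf(List<Integer> column) {
        long total = 0;
        for (int value : column) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(udf(7));                       // 007
        System.out.println(udtf(List.of("a", "b", "c"))); // [a, b, c]
        System.out.println(udaf(List.of(1, 2, 3)));       // 6
    }
}
```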
This article uses a UDF to parse the temptags table in Hive.
1. UDF
1.1 Code example
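Below is a minimal, hypothetical sketch of a classic Hive UDF class named `MyUDF`, the name registered in 1.2. Its behaviour is an assumption chosen for illustration: it trims and upper-cases a string. In a real build the class would sit in package `com.share.udf` and extend `org.apache.hadoop.hive.ql.exec.UDF`. Both of those are left out here so the sketch compiles with the JDK alone. Hive's classic UDF API locates `evaluate()` by reflection and calls it once per row.

```java
import java.util.Locale;

// Hypothetical MyUDF sketch. In a real Hive build this would be:
//   package com.share.udf;
//   import org.apache.hadoop.hive.ql.exec.UDF;
//   public class MyUDF extends UDF { ... }
// The package line and "extends UDF" are omitted so this compiles with the JDK alone.
public class MyUDF {

    // Hive's classic UDF API finds evaluate() by reflection and calls it once per row.
    // Hypothetical behaviour: trim the string and upper-case it.
    public String evaluate(String input) {
        if (input == null) {
            return null; // SQL NULL in -> SQL NULL out
        }
        return input.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        MyUDF udf = new MyUDF();
        System.out.println(udf.evaluate("  hive udf ")); // HIVE UDF
    }
}
```

Returning `null` for a `null` input follows the usual SQL convention. A UDF that threw on NULL would fail the whole query on the first NULL value.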
1.2 Using a user-defined function
1. Package the custom function into a jar and copy it to /soft/hive/lib
2. Restart Hive
3. Register the function
-- Permanent function
create function myudf as 'com.share.udf.MyUDF';
-- Temporary function (valid only for the current session)
create temporary function myudf as 'com.share.udf.MyUDF';
1.3 Demo
Parsing temptags with a UDF in Hive
0. Prepare the data
1. Create the table
create table temptags(id int, json string) row format delimited fields terminated by '\t';
2. Load the data
load data local inpath '/home/centos/files/temptags.txt' into table temptags;
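3. Write the code: a UDF class com.share.udf.ParseJson that takes the json column, parses it with fastjson, and returns the tags as an array (its result is passed to explode in the queries below).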
4. Build the jar
5. Copy fastjson-1.2.47.jar and myhive-1.0-SNAPSHOT.jar to /soft/hive/lib
6. Restart Hive
7. Register a temporary function
create temporary function parsejson as 'com.share.udf.ParseJson';
8. Test
select id, parsejson(json) as tags from temptags;
-- Explode: one (id, tag) row per tag
select id, tag from temptags lateral view explode(parsejson(json)) xx as tag;

-- Count each tag for each merchant
select id, tag, count(*) as count
from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
group by id, tag;

-- Rank the tags within each merchant by count
select id, tag, count, row_number() over (partition by id order by count desc) as rank
from (select id, tag, count(*) as count
      from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
      group by id, tag) b;

-- Concatenate each tag with its count and keep the top 10 tags per merchant
select id, concat(tag, '_', count)
from (select id, tag, count, row_number() over (partition by id order by count desc) as rank
      from (select id, tag, count(*) as count
            from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
            group by id, tag) b) c
where rank <= 10;

-- Aggregate into one string per merchant
--   concat_ws(',', list)  joins the elements of a list with ','
--   collect_set(col)      collects the values into an array, removing duplicates
--   collect_list(col)     collects the values into an array, keeping duplicates
select id, concat_ws(',', collect_set(concat(tag, '_', count))) as tags
from (select id, tag, count, row_number() over (partition by id order by count desc) as rank
      from (select id, tag, count(*) as count
            from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
            group by id, tag) b) c
where rank <= 10
group by id;
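The HiveQL pipeline above can be mirrored in plain Java for intuition. The steps are: explode the tag arrays, count per (id, tag), order each merchant's tags by count descending, keep the first n, and join them as `tag_count` with commas. This is an illustrative sketch. The class, the `Row` type, and the sample rows are hypothetical, and each `Row` stands for one id plus the array that `parsejson(json)` would return. One difference: this version keeps the joined tags in count order, whereas Hive does not guarantee the element order produced by `collect_set`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java mirror of the HiveQL top-N tag pipeline; illustration only.
public class TagTopN {

    // One source row: merchant id plus the tag array parsejson(json) would return.
    static final class Row {
        final int id;
        final List<String> tags;

        Row(int id, List<String> tags) {
            this.id = id;
            this.tags = tags;
        }
    }

    // Returns id -> "tag_count,tag_count,..." with at most n tags per id.
    static Map<Integer, String> topTags(List<Row> rows, int n) {
        // explode + group by (id, tag) + count(*)
        Map<Integer, Map<String, Long>> counts = new TreeMap<>();
        for (Row row : rows) {
            Map<String, Long> perId = counts.computeIfAbsent(row.id, k -> new HashMap<>());
            for (String tag : row.tags) {
                perId.merge(tag, 1L, Long::sum);
            }
        }

        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Long>> entry : counts.entrySet()) {
            String joined = entry.getValue().entrySet().stream()
                    // row_number() over (partition by id order by count desc)
                    .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                    .limit(n)                                  // where rank <= n
                    .map(t -> t.getKey() + "_" + t.getValue()) // concat(tag, '_', count)
                    .collect(Collectors.joining(","));         // concat_ws(',', ...)
            result.put(entry.getKey(), joined);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(123456, List.of("味道好", "環境衛生")),
                new Row(123456, List.of("味道好")));
        System.out.println(topTags(rows, 10));
    }
}
```

As with `row_number()` in Hive, the order among tags with equal counts is arbitrary here.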
1.4 Virtual columns: lateral view
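Example of the final result from 1.3, one row per merchant (味道好 = "tastes good", 環境衛生 = "clean environment"):
123456    味道好_10,環境衛生_9

How lateral view explode expands a row:
id  tags                   id  tag
1   [味道好,環境衛生]  =>   1   味道好
                           1   環境衛生

The same pattern applied to an employee table with an array column work_place: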
select name, workplace from employee lateral view explode(work_place) xx as workplace;
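The following plain-Java illustration shows what `LATERAL VIEW explode(...)` does to each row: the non-array columns are repeated once per array element. The `Employee` type and the sample values are hypothetical. Rows whose array is empty or null produce no output. This matches Hive's behaviour for a lateral view without `OUTER`; `LATERAL VIEW OUTER` keeps such rows, with NULL in the generated column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of LATERAL VIEW explode(array_col); not Hive code.
public class LateralViewDemo {

    // Hypothetical source row, like employee(name, work_place array<string>).
    static final class Employee {
        final String name;
        final List<String> workPlace;

        Employee(String name, List<String> workPlace) {
            this.name = name;
            this.workPlace = workPlace;
        }
    }

    // Each output row pairs the scalar column with one element of the array, like
    // "select name, workplace from employee lateral view explode(work_place) xx as workplace".
    static List<List<String>> lateralViewExplode(List<Employee> employees) {
        List<List<String>> out = new ArrayList<>();
        for (Employee e : employees) {
            if (e.workPlace == null) {
                continue; // explode(NULL) yields no rows, so this source row disappears
            }
            for (String place : e.workPlace) {
                out.add(Arrays.asList(e.name, place)); // Arrays.asList tolerates null values
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("tom", List.of("beijing", "shanghai")),
                new Employee("amy", List.of()));
        for (List<String> row : lateralViewExplode(employees)) {
            System.out.println(row.get(0) + "\t" + row.get(1));
        }
    }
}
```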