hive lateral view 與 explode詳解
1.explode
Then running the query:
SELECT explode(myCol) AS myNewCol FROM myTable;
- 1
- 1
will produce:
The usage with Maps is similar:
SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;
- 1
- 1
總結起來一句話:explode就是將hive一行中複雜的array或者map結構拆分成多行。
使用例項:
xxx表中有一個欄位mvt為string型別,資料格式如下:
[{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”
用explode小試牛刀一下:
select explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) from ods_mvt_hourly where day=20160710 limit 10;
- 1
- 1
最後出來的結果如下:
{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1”
“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”
“eid”:”40”,”ex”:”new_rpname_Android”,”val”:”1”,”vid”:”1”,”vr”:”var1”
“eid”:”19”,”ex”:”hotellistlpage_Android”,”val”:”1”,”vid”:”1”,”vr”:”var01”
“eid”:”29”,”ex”:”bookhotelpage_Android”,”val”:”0”,”vid”:”1”,”vr”:”var01”
“eid”:”17”,”ex”:”trainMode_Android”,”val”:”1”,”vid”:”1”,”vr”:”mode_Android”
“eid”:”44”,”ex”:”ihotelList_Android”,”val”:”1”,”vid”:”36”,”vr”:”var1”
“eid”:”47”,”ex”:”ihotelDetail_Android”,”val”:”0”,”vid”:”38”,”vr”:”var1”}
{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1”
“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”
2.lateral view
hive wiki 上的解釋如下:
Example
Consider the following base table named pageAds. It has two columns: pageid (name of the page) and adid_list (an array of ads appearing on the page)
An example table with two rows:
and the user would like to count the total number of times an ad appears across all pages.
A lateral view with explode() can be used to convert adid_list into separate rows using the query:
SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
- 1
- 2
- 1
- 2
The resulting output will be
Then in order to count the number of times a particular ad appears, count/group by can be used:
SELECT adid, count(1)
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;
- 1
- 2
- 3
- 1
- 2
- 3
The resulting output will be
由此可見,lateral view與explode等udtf就是天生好搭檔,explode將複雜結構一行拆成多行,然後再用lateral view做各種聚合。
3.例項
還是第一部分的例子,上面我們explode出來以後的資料,不是標準的json格式,我們通過lateral view與explode組合解析出標準的json格式資料:
SELECT ecrd, CASE WHEN instr(mvtstr,'{')=0
AND instr(mvtstr,'}')=0 THEN concat('{',mvtstr,'}') WHEN instr(mvtstr,'{')=0
AND instr(mvtstr,'}')>0 THEN concat('{',mvtstr) WHEN instr(mvtstr,'}')=0
AND instr(mvtstr,'{')>0 THEN concat(mvtstr,'}') ELSE mvtstr END AS mvt
FROM ods.ods_mvt_hourly LATERAL VIEW explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) addTable AS mvtstr
WHERE DAY='20160710' and ecrd is not null limit 10
- 1
- 2
- 3
- 4
- 5
- 6
- 1
- 2
- 3
- 4
- 5
- 6
查詢出來的結果:
xxx
{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1”}
xxx
{“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”}
xxx
{“eid”:”40”,”ex”:”new_rpname_Android”,”val”:”1”,”vid”:”1”,”vr”:”var1”}
xxx
{“eid”:”19”,”ex”:”hotellistlpage_Android”,”val”:”1”,”vid”:”1”,”vr”:”var01”}
xxx
{“eid”:”29”,”ex”:”bookhotelpage_Android”,”val”:”0”,”vid”:”1”,”vr”:”var01”
xxx
{“eid”:”17”,”ex”:”trainMode_Android”,”val”:”1”,”vid”:”1”,”vr”:”mode_Android”}
xxx
{“eid”:”44”,”ex”:”ihotelList_Android”,”val”:”1”,”vid”:”36”,”vr”:”var1”}
xxx
{“eid”:”47”,”ex”:”ihotelDetail_Android”,”val”:”1”,”vid”:”38”,”vr”:”var1”}
xxx
{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1”}
xxx
{“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”}
4.Ending
Lateral View通常和UDTF一起出現,為了解決UDTF不允許在select欄位的問題。
Multiple Lateral View可以實現類似笛卡爾乘積。
Outer關鍵字可以把不輸出的UDTF的空結果,輸出成NULL,防止丟失資料。