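Hive: Converting Columns to Rows and Rows to Columns
阿新 • Published: 2019-02-10
1. Suppose we have two tables in Hive: one stores basic user information and the other stores user addresses. The data looks like this: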
user_basic_info:
id | name |
1 | a |
2 | b |
3 | c |
4 | d |
user_address:
name | address |
a | add1 |
a | add2 |
b | add3 |
c | add4 |
d | add5 |
We want to merge them into this result:
id | name | address |
1 | a | add1,add2 |
2 | b | add3 |
3 | c | add4 |
4 | d | add5 |
Create the tables:
create table user_basic_info(id string, name string);
create table user_address(name string, address string);
Load the data:
load data local inpath '/home/jthink/work/workspace/hive/row_col_tran/data1' into table user_basic_info;
load data local inpath '/home/jthink/work/workspace/hive/row_col_tran/data2' into table user_address;
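The create table statements above do not declare a field delimiter, so Hive falls back to its default separator, Ctrl-A (\001), and data1 and data2 must use it. The original does not show the file contents. If the files were tab-separated instead (an assumption), the tables could be declared as in this sketch:

```sql
-- Assumption: data1 and data2 are tab-separated text files.
-- Without a "row format" clause, Hive expects Ctrl-A (\001) between fields.
create table user_basic_info(id string, name string)
row format delimited fields terminated by '\t';

create table user_address(name string, address string)
row format delimited fields terminated by '\t';
```

If the delimiter in the table definition does not match the files, the load still succeeds, but the whole line is read into the first column and the remaining columns come back as NULL.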
Run the merge:
select max(ubi.id), ubi.name, concat_ws(',', collect_set(ua.address)) as address from user_basic_info ubi join user_address ua on ubi.name=ua.name group by ubi.name;
Result:
1 a add1,add2
2 b add3
3 c add4
4 d add5
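Two details of the merge query are worth noting. collect_set() removes duplicate values, and Hive does not guarantee the order of the collected elements, so the merged string may come out as add2,add1 on another run. A minimal sketch of a deterministic variant, using the same tables and wrapping the set in sort_array():

```sql
-- Sort the collected addresses so the joined string is stable across runs.
-- Use collect_list() instead of collect_set() if duplicates must be kept.
select max(ubi.id) as id,
       ubi.name,
       concat_ws(',', sort_array(collect_set(ua.address))) as address
from user_basic_info ubi
join user_address ua on ubi.name = ua.name
group by ubi.name;
```

sort_array() sorts in ascending natural order, which for these strings means lexicographic order.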
2. Suppose we have a table:
user_info:
id | name | address |
1 | a | add1,add2 |
2 | b | add3 |
3 | c | add4 |
4 | d | add5 |
We want to split it into this result:
id | name | address |
1 | a | add1 |
1 | a | add2 |
2 | b | add3 |
3 | c | add4 |
4 | d | add5 |
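The obvious approach is the UDTF explode(). It accepts only an array or a map, so the queries in this section assume address is declared as array<string>; for a comma-separated string column, use explode(split(address, ',')) instead: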
select explode(address) as address from user_info;
This returns only the address column, but we need the complete rows:
select id, name, explode(address) as address from user_info;
This does not work. Hive rejects the query with the error: UDTF's are not supported outside the SELECT clause, nor nested in expressions
So we need to use lateral view instead:
select id, name, add from user_info ui lateral view explode(ui.address) adtable as add;
Result:
1 a add1
1 a add2
2 b add3
3 c add4
4 d add5
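The lateral view query above only works if explode() receives an array or a map. As a hedged sketch, assuming address is stored as a plain comma-separated string (the original does not show how user_info was created), the string can be turned into an array with split() first. The alias addr is illustrative:

```sql
-- Assumption: user_info.address is a string such as 'add1,add2'.
-- split() converts it to array<string>, which explode() can expand.
select ui.id, ui.name, adtable.addr
from user_info ui
lateral view explode(split(ui.address, ',')) adtable as addr;

-- Variant: "lateral view outer" keeps users whose address is NULL,
-- emitting one row with addr = NULL instead of dropping the user.
select ui.id, ui.name, adtable.addr
from user_info ui
lateral view outer explode(split(ui.address, ',')) adtable as addr;
```

With a plain lateral view, a row whose exploded array is empty or NULL produces no output rows at all, so the outer form is the safer choice when some users may have no address.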