Join operations in Hive
阿新 • Published: 2018-12-23
Multi-table joins are unavoidable when writing SQL. This post walks through the join types available in Hive: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and LEFT SEMI JOIN.

Prepare the data

a.txt:
1,a
2,b
3,c
4,d
7,y
8,u

b.txt:
2,bb
3,cc
7,yy
9,pp

Create the tables:
create table a(id int, name string) row format delimited fields terminated by ',';
create table b(id int, name string) row format delimited fields terminated by ',';

Load the data:
load data local inpath '/home/hadoop/a.txt' into table a;
load data local inpath '/home/hadoop/b.txt' into table b;

1. inner join
select * from a inner join b on a.id = b.id;
+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 7     | y       | 7     | yy      |
+-------+---------+-------+---------+--+
This is the intersection: only ids present in both tables are returned.

2. left join
select * from a left join b on a.id = b.id;
+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
| 1     | a       | NULL  | NULL    |
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 4     | d       | NULL  | NULL    |
| 7     | y       | 7     | yy      |
| 8     | u       | NULL  | NULL    |
+-------+---------+-------+---------+--+
Every row of a is kept. Where b has no matching id, the b columns are filled with NULL.

3. right join
select * from a right join b on a.id = b.id;
This is the mirror image of the left join: every row of b is kept, and the row with id 9 has no match in a, so its a columns come back as NULL.

4. full outer join
select * from a full outer join b on a.id = b.id;
+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
| 1     | a       | NULL  | NULL    |
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 4     | d       | NULL  | NULL    |
| 7     | y       | 7     | yy      |
| 8     | u       | NULL  | NULL    |
| NULL  | NULL    | 9     | pp      |
+-------+---------+-------+---------+--+
Rows from both sides are shown, with NULL on whichever side has no match.

5. left semi join
select * from a left semi join b on a.id = b.id;
+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
| 2     | b       |
| 3     | c       |
| 7     | y       |
+-------+---------+--+
Only the left half is returned, that is, the columns of a, and only for rows of a that have a match in b. It is usually a little more efficient than an inner join, because Hive only needs to know whether a match exists and does not have to produce the columns of b.
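The left semi join in section 5 behaves like an IN or EXISTS filter on table a. As a rough sketch, the two queries below should return the same rows as that left semi join on the sample tables. This assumes Hive 0.13 or later, where IN and EXISTS subqueries are allowed in the WHERE clause; the explicit column lists are a choice made here for illustration and do not appear in the original queries.

```sql
-- Sketch only: assumes Hive 0.13+ (IN/EXISTS subqueries in WHERE)
-- and the tables a and b created above.

-- IN form: keep the rows of a whose id appears in b.
select a.id, a.name
from a
where a.id in (select b.id from b);

-- EXISTS form: the subquery is correlated on b.id = a.id.
select a.id, a.name
from a
where exists (select 1 from b where b.id = a.id);
```

One difference from an inner join is worth keeping in mind: if b contained the same id more than once, an inner join would repeat the matching row of a for each duplicate, whereas the semi join (and the IN/EXISTS forms) returns each row of a at most once.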