Summary of common Hive statements
Common Hive statements and UDFs
Basic statements
-> Selecting columns
-> where, limit, distinct
-> Find the employees in department 30
select empno,ename,deptno from emp where deptno='30';
-> View the first 3 records
select * from emp limit 3;
-> List the departments that currently exist
select distinct deptno from emp;
-> between and, >, <, =, is null, is not null, in, not in
-> Find employees whose employee number is greater than 7500
select * from emp where empno > 7500;
-> Find employees whose salary is between 2000 and 3000 (between is inclusive at both ends)
select * from emp where sal between 2000 and 3000;
-> Find employees whose commission is not null
select * from emp where comm is not null;
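The operator list above also names in, not in and is null, which have no example yet. A minimal sketch against the same emp table follows; the department numbers 10 and 20 are illustrative assumptions, and deptno is assumed to be stored as a numeric column.

```sql
-- Employees in department 10 or 20 (in)
select empno, ename, deptno from emp where deptno in (10, 20);

-- Employees outside departments 10 and 20 (not in);
-- rows whose deptno is NULL are not returned by this predicate
select empno, ename, deptno from emp where deptno not in (10, 20);

-- Employees with no commission recorded (is null)
select empno, ename, comm from emp where comm is null;
```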
-> Aggregate functions: max, min, avg, count, sum
select count(1) cnt from emp;
select max(sal) max_sal from emp;
select avg(sal) avg_sal from emp;
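For completeness, min and sum can be used the same way. The sketch below also contrasts count(1) with count on a column; the aliases are arbitrary names chosen for illustration.

```sql
-- Lowest salary and total salary over the whole table
select min(sal) min_sal from emp;
select sum(sal) sum_sal from emp;

-- count(1) counts every row, while count(comm) skips rows where comm is NULL
select count(1) row_cnt, count(comm) comm_cnt from emp;
```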
-> group by, having
-> Compute the average salary of each department
select deptno,avg(sal) from emp group by deptno;
-> Find the departments whose average salary is greater than 2000
select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;
-> join
- equi-join (inner join): only rows whose join key exists in both tables are joined
select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a inner join dept b on a.deptno = b.deptno;
- left join: the left table is the base; every row of the left table is kept
select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a left join dept b on a.deptno = b.deptno;
- right join: the right table is the base; every row of the right table is kept
select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a right join dept b on a.deptno = b.deptno;
- full join: all rows from both tables are kept
select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a full join dept b on a.deptno = b.deptno;
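One way to see what "the left table is the base" means in practice: in a left join, rows of emp with no matching dept row still appear, with the dept columns filled with NULL. A small sketch that keeps only those unmatched rows, using the same emp and dept tables:

```sql
-- Employees whose deptno has no matching row in dept:
-- the left join keeps them, and b.deptno is NULL for exactly these rows
select a.empno, a.ename, a.deptno
from emp a
left join dept b on a.deptno = b.deptno
where b.deptno is null;
```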
4. The four kinds of sorting in Hive
-> order by: globally sorts the whole result set by a column
select empno,ename,deptno,sal from emp order by sal desc;
-> sort by: sorts the data within each reducer; with only one reducer it is equivalent to order by
set mapreduce.job.reduces = 2;
insert overwrite local directory '/opt/datas/sort' select empno,ename,deptno,sal from emp sort by sal desc;
-> distribute by: partitions the data by a column and hands each partition to a different reducer; usually combined with sort by, and it must come before sort by
insert overwrite local directory '/opt/datas/distribute' select empno,ename,deptno,sal from emp distribute by empno sort by sal desc;
-> cluster by: when distribute by and sort by use the same column, cluster by can be used instead
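A minimal cluster by sketch in the same style as the statements above. The output directory '/opt/datas/cluster' is a hypothetical path chosen to match the earlier examples, and deptno is picked as the clustering column only for illustration.

```sql
set mapreduce.job.reduces = 2;

-- Shorthand for: distribute by deptno sort by deptno
insert overwrite local directory '/opt/datas/cluster'
select empno, ename, deptno, sal from emp cluster by deptno;
```

Note that cluster by sorts in ascending order only; when a descending sort is needed, use distribute by together with sort by ... desc.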