Hive: Functions
阿新 • Published: 2018-11-21
Aggregate functions
max, min, sum, avg, count (a query that uses one of these aggregate functions runs as a MapReduce job)
hive (default)> select count(1) from ruoze_emp where deptno=10; (counts the employees in department 10)
hive (default)> select max(sal), min(sal), avg(sal), sum(sal) from ruoze_emp;
Grouping with group by
Every column in the select list must appear either in the group by clause or inside an aggregate function.
hive (default)> select deptno,avg(sal) from ruoze_emp group by deptno; (average salary per department)
hive (default)> select deptno,job,max(sal) from ruoze_emp group by deptno,job; (highest salary for each department and job)
10 CLERK 1300.0
10 MANAGER 2450.0
10 PRESIDENT 5000.0
20 ANALYST 3000.0
20 CLERK 1100.0
20 MANAGER 2975.0
30 CLERK 950.0
30 MANAGER 2850.0
30 SALESMAN 1600.0
hive (default)> select deptno,avg(sal) from ruoze_emp group by deptno having avg(sal)>2000; (departments whose average salary is above 2000)
(Replacing having with where in this query raises an error: a condition on an aggregated value applies to the groups, so it must go in having, which is used together with group by.)
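To see why the last condition belongs in having rather than where, it can help to model the evaluation order outside Hive. The following Python sketch is only an illustration: the rows are made-up sample data (not the contents of ruoze_emp) and the helper name avg_sal_by_dept is hypothetical. A where condition would filter individual rows before grouping, whereas having filters the groups after avg has been computed.

```python
from collections import defaultdict

# Hypothetical sample rows (deptno, sal); not the real contents of ruoze_emp.
rows = [(10, 1300.0), (10, 5000.0), (20, 800.0), (20, 3000.0), (30, 950.0), (30, 1600.0)]

def avg_sal_by_dept(rows):
    """Model of: select deptno, avg(sal) ... group by deptno."""
    groups = defaultdict(list)
    for deptno, sal in rows:
        groups[deptno].append(sal)
    return {deptno: sum(sals) / len(sals) for deptno, sals in groups.items()}

averages = avg_sal_by_dept(rows)

# Model of: having avg(sal) > 2000. The filter runs on the aggregated groups,
# which is why it cannot be expressed as a row-level where condition.
high_paid = {deptno: avg for deptno, avg in averages.items() if avg > 2000}
print(high_paid)
```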
case when ... then (if-else)
hive (default)> select ename, sal, case when sal>1 and sal<=1000 then 'LOWER' when sal>1000 and sal<=2000 then 'MIDDLE' when sal>2000 and sal<=4000 then 'HIGH' else 'HIGHEST' end from ruoze_emp;
SMITH 800.0 LOWER
ALLEN 1600.0 MIDDLE
WARD 1250.0 MIDDLE
JONES 2975.0 HIGH
HIVE 10300.0 HIGHEST
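The case when expression behaves like an if/else chain: branches are tested in order and the first one that matches wins. A minimal Python sketch of the same banding logic follows; the function name salary_band is hypothetical, and NULL salaries are not modeled.

```python
def salary_band(sal):
    """Branches are tested in order and the first match wins, like case when."""
    if 1 < sal <= 1000:
        return "LOWER"
    elif 1000 < sal <= 2000:
        return "MIDDLE"
    elif 2000 < sal <= 4000:
        return "HIGH"
    else:
        return "HIGHEST"

for ename, sal in [("SMITH", 800.0), ("ALLEN", 1600.0), ("JONES", 2975.0), ("HIVE", 10300.0)]:
    print(ename, sal, salary_band(sal))
```

Note that a salary of 1 or less matches none of the when branches and therefore also ends up in the else branch.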
Viewing Hive's built-in functions
hive (default)> show functions; (lists Hive's built-in functions)
hive (default)> desc function upper; (shows the usage of one specific function)
upper(str) - Returns str with all characters changed to uppercase (upper takes a string and converts it to uppercase)
hive (default)> desc function extended upper; (shows more detail)
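Data skew
union all
select count(1) from ruoze_emp where deptno=10 union all select count(1) from ruoze_emp where deptno=20;
a = a1 union all a2 (suppose table a is skewed: split it into a skewed part a1 and a non-skewed part a2, process each part separately, then combine the two results with union all)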
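The a = a1 union all a2 idea can be illustrated with a small Python model. This is a sketch under assumptions: the key list, the choice of hot key, and the function name count_by_key are invented for illustration. In Hive each part would be a separate query, and union all would concatenate their results.

```python
from collections import Counter

# Hypothetical skewed key column: key 10 dominates the data.
keys = [10] * 8 + [20, 20, 30]
hot_key = 10  # assumed to be known in advance

# a1: the skewed part, a2: the rest (a = a1 union all a2).
a1 = [k for k in keys if k == hot_key]
a2 = [k for k in keys if k != hot_key]

def count_by_key(part):
    """Model of: select key, count(1) ... group by key, run on one part."""
    return sorted(Counter(part).items())

# union all simply concatenates the two result sets.
combined = count_by_key(a1) + count_by_key(a2)
print(combined)
```

Because every key lands in exactly one of the two parts, the combined result equals what a single query over the whole table would return.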
Type conversion function
cast(value as TYPE)
Example:
hive (default)> select empno,ename,sal,comm,cast(comm as int) from ruoze_emp; (converts the values in the comm column to integers)
7369 SMITH 800.0 NULL NULL
7499 ALLEN 1600.0 300.0 300
7521 WARD 1250.0 500.0 500
7566 JONES 2975.0 NULL NULL
(Note: if a conversion fails, the result is NULL.)
hive (default)> select cast('5' as int); (converts the string '5' to an int)
hive (default)> select current_timestamp;
2018-11-08 22:11:19.285
hive (default)> select cast(current_timestamp as date);
2018-11-08
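The cast behavior shown above (NULL stays NULL, a failed conversion yields NULL rather than an error, a timestamp cast to date keeps only the date part) can be approximated in Python. This is a rough model with a hypothetical helper cast_to_int; it covers only the cases in this section and not every Hive conversion rule, for example strings that contain decimals.

```python
from datetime import datetime

def cast_to_int(value):
    """Rough model of cast(value as int) for the cases shown above."""
    if value is None:          # NULL in, NULL out
        return None
    try:
        return int(value)      # 300.0 becomes 300, '5' becomes 5
    except ValueError:
        return None            # a failed conversion yields NULL instead of an error

print(cast_to_int(300.0), cast_to_int("5"), cast_to_int("abc"), cast_to_int(None))

# Model of cast(current_timestamp as date): keep only the date part.
ts = datetime.strptime("2018-11-08 22:11:19.285", "%Y-%m-%d %H:%M:%S.%f")
print(ts.date())
```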
String functions:
1.substr:
hive (default)> select substr('abcdefg',2,3); (takes three characters starting at the second character; positions are counted from 1)
bcd
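The main thing to remember about substr is that the position is 1-based. A small Python sketch of that index arithmetic, for positive positions only (zero and negative positions are not modeled here):

```python
def substr(s, start, length):
    """Model of substr(str, pos, len) for pos >= 1: pos counts from 1, not 0."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    return s[begin:begin + length]

print(substr("abcdefg", 2, 3))  # bcd
```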
2.concat_ws:
hive (default)> desc function extended concat_ws;
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.
hive (default)> select concat_ws('.','www',array('facebook','com'));
www.facebook.com
hive (default)> select concat_ws('.','192','168','2','65');
192.168.2.65 (compare with the previous call: the parts can be passed as an array or as separate strings)
hive (default)> select length('192.168.2.65');
12
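The signature concat_ws(separator, [string | array(string)]+) means that each argument after the separator may be a single string or an array of strings, and arrays are flattened before joining. A Python sketch of that behavior, with lists standing in for Hive arrays and NULL handling left out:

```python
def concat_ws(separator, *args):
    """Model of concat_ws: each argument is a string or a list of strings."""
    parts = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            parts.extend(arg)   # array(...) arguments are flattened
        else:
            parts.append(arg)
    return separator.join(parts)

print(concat_ws(".", "www", ["facebook", "com"]))
print(concat_ws(".", "192", "168", "2", "65"))
print(len(concat_ws(".", "192", "168", "2", "65")))
```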
3.split:
hive (default)> select split ("192.168.2.65",'.');
["","","","","","","","","","","","",""]
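hive (default)> select split("192.168.2.65",'\\.'); (split treats its second argument as a regular expression, in which an unescaped . matches any character, so the dot has to be escaped)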
["192","168","2","65"]
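The two results above follow directly from regular-expression semantics. Hive uses Java regular expressions; the sketch below uses Python's re module as a stand-in, which behaves the same way for these two patterns. In Hive the backslash is doubled ('\\.') because it must also be escaped inside the string literal.

```python
import re

ip = "192.168.2.65"

# An unescaped dot matches every character, so each character is a delimiter
# and only the empty strings between delimiters remain.
print(re.split(r".", ip))

# An escaped dot matches only a literal '.'.
print(re.split(r"\.", ip))
```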
4.explode:
hive (default)> desc function extended explode;
explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
Example:
$ vi student.txt (create the data file in the data directory; the third field is a colon-separated list of school subjects)
1,doudou,化學:物理:數學:語文
2,dasheng,化學:數學:生物:生理:衛生
3,rachel,化學:語文:英語:體育:生物
hive (default)> create table ruoze_student(id int, name string, subjects array<string>) row format delimited fields terminated by ',' collection items terminated by ':';
hive (default)> load data local inpath '/home/hadoop/data/student.txt' into table ruoze_student;
hive (default)> select explode(subjects) from ruoze_student;
化學
物理
數學
語文
化學
數學
生物
生理
衛生
化學
語文
英語
體育
生物
hive (default)> select distinct s.sub from (select explode(subjects) as sub from ruoze_student) s; (removes the duplicate subjects from the list above)
體育
化學
衛生
數學
物理
生物
生理
英語
語文
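As a cross-check of the two queries above, explode followed by distinct can be modeled in Python as flattening the arrays and then removing duplicates. The data is the content of student.txt; the variable names are illustrative, and the output order of distinct is not modeled.

```python
# The three rows of student.txt: (id, name, subjects).
students = [
    (1, "doudou", "化學:物理:數學:語文".split(":")),
    (2, "dasheng", "化學:數學:生物:生理:衛生".split(":")),
    (3, "rachel", "化學:語文:英語:體育:生物".split(":")),
]

# Model of explode(subjects): one output row per array element.
exploded = [subject for _id, _name, subjects in students for subject in subjects]
print(len(exploded))

# Model of select distinct: duplicates removed (the order is not modeled).
distinct_subjects = set(exploded)
print(len(distinct_subjects))
```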
Interview question: implement word count in Hive
hive (default)> create table ruoze_wc(sentence string);
hive (default)> load data local inpath "/home/hadoop/data/wc.txt" into table ruoze_wc;
hive (default)> select * from ruoze_wc;
hello,world,welcome
hello,welcome
Step 1: split each string into an array of words
hive (default)> select split(sentence,',') from ruoze_wc;
["hello","world","welcome"]
["hello","welcome"]
Step 2: explode the array so that each word gets its own row
hive (default)> select explode(split(sentence,',')) from ruoze_wc;
hello
world
welcome
hello
welcome
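Step 3: count the occurrences of each word
hive (default)> select word, count(1) as c from (select explode(split(sentence,",")) as word from ruoze_wc) t group by word order by c desc; (t is an alias for the subquery; although it is never referenced, the query fails with an error if the alias is left out. desc sorts in descending order)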
welcome 2
hello 2
world 1
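The three steps map onto a familiar split, flatten, count pipeline. A Python sketch of the same logic on the two lines of wc.txt is shown here for comparison only; it is a model of the query, not a replacement for it.

```python
from collections import Counter

# The two lines of wc.txt as loaded into ruoze_wc.
sentences = ["hello,world,welcome", "hello,welcome"]

# Steps 1 and 2: split each sentence, then flatten to one word per row.
words = [word for sentence in sentences for word in sentence.split(",")]

# Step 3: group by word and count, then sort by count in descending order.
counts = sorted(Counter(words).items(), key=lambda item: item[1], reverse=True)
for word, c in counts:
    print(word, c)
```

Since hello and welcome both occur twice, order by c desc alone does not determine which of the two is listed first.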