Hive: Common Operations

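Prerequisites:

Hadoop 2.7.3 installed (on Linux)

Hive 2.3.3 installed (on Linux)

XAMPP installed (on Windows), with Navicat successfully connected to the XAMPP MySQL server. Reference: Connecting Navicat to the XAMPP database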

 

Prepare the source data:

1. Open a terminal and create the file emp.csv:

$ nano emp.csv

Enter the following content, then save and exit.

7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10

2. Create the file dept.csv:

$ nano dept.csv

Enter the following content, then save and exit.

10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON

 

Lab steps:

(1) Upload the two files above to a directory in HDFS, such as /001/hive.

Run the following commands in the Linux terminal:

hdfs dfs -mkdir -p /001/hive
hdfs dfs -put dept.csv /001/hive
hdfs dfs -put emp.csv /001/hive

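(2) Create the employee table, named emp plus your student ID (e.g. emp001). Note: the statements from here on are entered at the Hive command line.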

     Start the Hive command line:

$ hive

Create a Hive table named emp001:

create table emp001(empno int,ename string,job string,mgr int,hiredate string,sal int,comm int,deptno int) row format delimited fields terminated by ',';

(3) Create the department table, named dept plus your student ID (e.g. dept001):

create table dept001(deptno int,dname string,loc string) row format delimited fields terminated by ',';
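Both tables use Hive's default delimited-text row format. Each line of the CSV is split on ',' and each field is converted to its column's type. The Python sketch below imitates that step for emp001. The helper names are hypothetical. It assumes that a field which cannot be parsed as int (such as the empty mgr and comm values for KING) is read as NULL, and that missing trailing fields are also NULL.

```python
# Hypothetical helper that mimics reading one emp.csv line into the emp001 schema.
# Assumption: an empty/unparseable field in an int column becomes NULL (None),
# while string columns keep the raw text.
EMP_SCHEMA = [
    ("empno", int), ("ename", str), ("job", str), ("mgr", int),
    ("hiredate", str), ("sal", int), ("comm", int), ("deptno", int),
]

def parse_row(line, schema=EMP_SCHEMA, delim=","):
    """Split a line on the delimiter and convert each field to its column type."""
    fields = line.rstrip("\n").split(delim)
    row = {}
    for i, (name, typ) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None  # missing trailing field -> NULL
        if raw is None:
            row[name] = None
        elif typ is int:
            try:
                row[name] = int(raw)
            except ValueError:
                row[name] = None  # e.g. "" in the mgr or comm column
        else:
            row[name] = raw  # string columns keep the text as-is
    return row

king = parse_row("7839,KING,PRESIDENT,,1981/11/17,5000,,10")
print(king["mgr"], king["comm"], king["sal"])  # None None 5000
```

Because hiredate is declared as string, a value like 1981/11/17 is stored as plain text rather than as a date.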

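(4) Load the data. Note that LOAD DATA INPATH moves (rather than copies) the files from /001/hive into the table's storage directory, so they will no longer be in /001/hive afterwards.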

load data inpath '/001/hive/emp.csv' into table emp001;  
load data inpath '/001/hive/dept.csv' into table dept001;

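(5) Create a table partitioned by the employees' department number, named emp_part plus your student ID (e.g. emp_part001). deptno is declared only in PARTITIONED BY, not in the regular column list.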

create table emp_part001(empno int,ename string,job string,mgr int,hiredate string,sal int,comm int) partitioned by (deptno int) row format delimited fields terminated by ',';

Insert data into the partitioned table. Each statement names its target partition explicitly and fills it from a subquery on emp001.

insert into table emp_part001 partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp001 where deptno=10;
insert into table emp_part001 partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp001 where deptno=20;
insert into table emp_part001 partition(deptno=30) select empno,ename,job,mgr,hiredate,sal,comm from emp001 where deptno=30;
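Each partition of emp_part001 is stored as its own subdirectory named after the partition value. A query that filters on deptno can therefore read only the matching directory. The Python sketch below groups a few emp.csv rows the same way. It assumes the table sits under Hive's default warehouse directory (/user/hive/warehouse), and the function name is hypothetical.

```python
from collections import defaultdict

# A few (empno, ename, deptno) rows taken from emp.csv.
ROWS = [
    (7369, "SMITH", 20), (7499, "ALLEN", 30), (7782, "CLARK", 10),
    (7839, "KING", 10), (7900, "JAMES", 30), (7902, "FORD", 20),
]

# Assumption: default warehouse location for a table in the default database.
TABLE_DIR = "/user/hive/warehouse/emp_part001"

def partition_rows(rows):
    """Group rows into partition directories named deptno=<value>.

    The deptno value moves out of the row and into the directory name,
    which is why it is not a regular data column of emp_part001.
    """
    parts = defaultdict(list)
    for empno, ename, deptno in rows:
        parts[f"{TABLE_DIR}/deptno={deptno}"].append((empno, ename))
    return dict(parts)

for path, members in sorted(partition_rows(ROWS).items()):
    print(path, members)
```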

(6) Create a bucketed table named emp_bucket plus your student ID (e.g. emp_bucket001), bucketed by the employee's job:

create table emp_bucket001(empno int,ename string,job string,mgr int,hiredate string,sal int,comm int,deptno int) clustered by (job) into 4 buckets row format delimited fields terminated by ',';

Insert the data from a query on emp001:

insert into emp_bucket001 select * from emp001;
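With CLUSTERED BY (job) INTO 4 BUCKETS, each row goes to a bucket chosen by hashing its job value modulo 4. All rows with the same job therefore land in the same bucket file. The Python sketch below assumes the legacy Java-style string hash, combined as (hash & Integer.MAX_VALUE) % numBuckets. This is an assumption about Hive 2.x internals, and newer Hive versions may hash differently. Both function names are hypothetical.

```python
def java_string_hash(s):
    """Java-style 32-bit hash over the bytes: h = 31*h + byte, as a signed int."""
    h = 0
    for b in s.encode("utf-8"):
        if b > 127:
            b -= 256  # Java bytes are signed
        h = (31 * h + b) & 0xFFFFFFFF  # wrap to 32 bits like Java int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed

def bucket_for(value, num_buckets=4):
    """Assumed bucket rule: clear the sign bit, then take the remainder."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets

JOBS = ["CLERK", "SALESMAN", "MANAGER", "ANALYST", "PRESIDENT"]
for job in JOBS:
    print(job, bucket_for(job))
```

The data has five distinct jobs but only four buckets, so at least two jobs must share a bucket. Depending on the hash values, some bucket may also end up empty.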

(7) Query all employee information:

select * from emp001;

(8) Query selected employee information: employee number, name, and salary:

select empno,ename,sal from emp001;

(9) Multi-table query: list each employee's name together with their department name:

select dept001.dname,emp001.ename from emp001,dept001 where emp001.deptno=dept001.deptno;
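The comma-separated FROM list with a WHERE condition above is an inner join on deptno. The sketch below runs the same query against Python's built-in sqlite3 as a stand-in for Hive. It is not Hive itself, and emp001 is trimmed to the columns the query needs. Department 40 (OPERATIONS) has no employees, so the inner join drops it. A LEFT OUTER JOIN starting from dept001 would keep it, with a NULL employee name.

```python
import sqlite3

# (empno, ename, deptno) from emp.csv, and all of dept.csv.
EMP = [
    (7369, "SMITH", 20), (7499, "ALLEN", 30), (7521, "WARD", 30),
    (7566, "JONES", 20), (7654, "MARTIN", 30), (7698, "BLAKE", 30),
    (7782, "CLARK", 10), (7788, "SCOTT", 20), (7839, "KING", 10),
    (7844, "TURNER", 30), (7876, "ADAMS", 20), (7900, "JAMES", 30),
    (7902, "FORD", 20), (7934, "MILLER", 10),
]
DEPT = [(10, "ACCOUNTING", "NEW YORK"), (20, "RESEARCH", "DALLAS"),
        (30, "SALES", "CHICAGO"), (40, "OPERATIONS", "BOSTON")]

conn = sqlite3.connect(":memory:")  # in-memory stand-in database
conn.execute("create table emp001(empno int, ename text, deptno int)")
conn.execute("create table dept001(deptno int, dname text, loc text)")
conn.executemany("insert into emp001 values (?, ?, ?)", EMP)
conn.executemany("insert into dept001 values (?, ?, ?)", DEPT)

# Same implicit inner join as the Hive query, ordered for stable output.
rows = conn.execute(
    "select dept001.dname, emp001.ename from emp001, dept001 "
    "where emp001.deptno = dept001.deptno order by emp001.empno"
).fetchall()
for dname, ename in rows:
    print(dname, ename)
```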

(10) Build a report that gives employees a raise according to their job, showing the salary both before and after the raise.

Raise rules: PRESIDENT gets +1000, MANAGER gets +800, and everyone else gets +400.

select empno,ename,job,sal,
case job when 'PRESIDENT' then sal+1000
 when 'MANAGER' then sal+800
 else sal+400
end as new_sal
from emp001;
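The CASE expression compares job against each WHEN value in turn and falls through to ELSE for every other job. The sketch below checks the report logic with Python's sqlite3 as a stand-in SQL engine (not Hive). The CASE expression is copied from the query above. Here the computed column is named new_sal, because without an alias Hive gives it an auto-generated name.

```python
import sqlite3

# (empno, ename, job, sal) from emp.csv.
EMP = [
    (7369, "SMITH", "CLERK", 800), (7499, "ALLEN", "SALESMAN", 1600),
    (7521, "WARD", "SALESMAN", 1250), (7566, "JONES", "MANAGER", 2975),
    (7654, "MARTIN", "SALESMAN", 1250), (7698, "BLAKE", "MANAGER", 2850),
    (7782, "CLARK", "MANAGER", 2450), (7788, "SCOTT", "ANALYST", 3000),
    (7839, "KING", "PRESIDENT", 5000), (7844, "TURNER", "SALESMAN", 1500),
    (7876, "ADAMS", "CLERK", 1100), (7900, "JAMES", "CLERK", 950),
    (7902, "FORD", "ANALYST", 3000), (7934, "MILLER", "CLERK", 1300),
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in database
conn.execute("create table emp001(empno int, ename text, job text, sal int)")
conn.executemany("insert into emp001 values (?, ?, ?, ?)", EMP)

REPORT_SQL = """
select empno, ename, job, sal,
       case job when 'PRESIDENT' then sal + 1000
                when 'MANAGER'   then sal + 800
                else sal + 400
       end as new_sal
from emp001
order by empno
"""
report = conn.execute(REPORT_SQL).fetchall()
for empno, ename, job, sal, new_sal in report:
    print(f"{empno} {ename:<7} {job:<10} {sal:>5} -> {new_sal:>5}")
```

With one PRESIDENT, three MANAGERs, and ten other employees, the raises add up to 1000 + 3 × 800 + 10 × 400 = 7400.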

 

Done!