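DDL Data Definition (Part 2): Table Partitioning
Partitioned Tables
How it works:
Each partition of a partitioned table corresponds to its own directory on HDFS, and that directory holds all of the partition's data files. In other words, a Hive partition is simply a subdirectory: a large dataset is split into smaller ones according to business needs. At query time, an expression in the WHERE clause selects only the partitions that are needed. Hive then reads only those directories, so the query runs much faster.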
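Here is a minimal Python sketch of the directory-per-partition idea. It is not Hive's actual code. It treats partition directory names of the form `column=value` (the naming Hive uses on HDFS) as the only metadata, and it keeps just the directories a WHERE-style filter asks for. The helper names `parse_partition_dir` and `prune` are hypothetical.

```python
# Minimal sketch of partition pruning, assuming partition directories are
# named "column=value" as Hive names them on HDFS. It only illustrates the
# idea; it is not how Hive's query planner is implemented.


def parse_partition_dir(dirname):
    """Split a directory name such as 'month=201709' into ('month', '201709')."""
    key, _, value = dirname.partition("=")
    return key, value


def prune(partition_dirs, column, wanted_values):
    """Keep only the directories whose value for `column` is in wanted_values.

    Directories that are filtered out are never opened, which is where the
    speed-up of a partition filter in the WHERE clause comes from.
    """
    selected = []
    for dirname in partition_dirs:
        key, value = parse_partition_dir(dirname)
        if key == column and value in wanted_values:
            selected.append(dirname)
    return selected


dirs = ["month=201707", "month=201708", "month=201709"]
# Equivalent in spirit to: ... WHERE month='201709'
print(prune(dirs, "month", {"201709"}))  # ['month=201709']
```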
Basic operations on partitioned tables
- Motivation: managing logs by date with a partitioned table
/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log
- Syntax for creating a partitioned table
hive (default)> create table dept_partition( deptno int, dname string, loc string ) partitioned by (month string) row format delimited fields terminated by '\t';
- Loading data into a partitioned table
hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201709');
hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201708');
hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201707');
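Each of the three LOAD statements above places a copy of dept.txt in its own `month=...` subdirectory of the table. The Python sketch below reproduces that layout on the local filesystem. It assumes a plain local folder stands in for HDFS, and it does not model registering the partition in the metastore, which Hive also does. The function name `load_into_partition` and the sample row are hypothetical.

```python
# Rough local-filesystem analogue of
#   LOAD DATA LOCAL INPATH '<file>' INTO TABLE t PARTITION(month='...')
# Assumption: a local folder stands in for the HDFS table directory, and the
# metastore registration that Hive also performs is not modeled.
import shutil
import tempfile
from pathlib import Path


def load_into_partition(local_file, table_dir, column, value):
    """Copy local_file into table_dir/<column>=<value>/ and return the new path.

    With LOCAL, Hive copies the source file; without LOCAL, it moves a file
    that is already on HDFS instead.
    """
    part_dir = Path(table_dir) / f"{column}={value}"
    part_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(local_file, part_dir))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dept.txt"
    src.write_text("10\tACCOUNTING\t1700\n")  # hypothetical sample row
    table = Path(tmp) / "dept_partition"
    for month in ("201709", "201708", "201707"):
        load_into_partition(src, table, "month", month)
    # One subdirectory per partition, each holding its own copy of dept.txt
    layout = sorted(p.relative_to(table).as_posix() for p in table.rglob("*.txt"))

print(layout)
# ['month=201707/dept.txt', 'month=201708/dept.txt', 'month=201709/dept.txt']
```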
- Querying data in a partitioned table
Single-partition query:
select * from dept_partition where month='201709';
Multi-partition query (combine several SELECT statements with UNION):
select * from dept_partition where month='201709' union select * from dept_partition where month='201708';
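The partition column is part of every result row. That means rows coming from two different partitions can never be exact duplicates, even though the same dept.txt was loaded into every month. So `UNION` (distinct) and `UNION ALL` return the same rows here, and `UNION ALL` simply skips the de-duplication work. According to the Hive language manual, versions before 1.2.0 support only `UNION ALL`. The Python sketch below checks this reasoning on made-up rows. The sample data and function names are hypothetical.

```python
# Hypothetical rows standing in for dept.txt. SELECT * on a partitioned table
# also returns the partition column (month), modeled as the last field.
dept_rows = [(10, "ACCOUNTING", 1700), (20, "RESEARCH", 1800)]


def select_partition(month):
    """Rows returned by: select * from dept_partition where month='<month>'."""
    return [row + (month,) for row in dept_rows]


def union_all(left, right):
    """UNION ALL: concatenate, keeping duplicates."""
    return left + right


def union_distinct(left, right):
    """UNION (DISTINCT): concatenate, then drop repeated rows.

    Hive does not guarantee output order; first-seen order is kept here only
    so the two results are easy to compare.
    """
    seen = set()
    result = []
    for row in left + right:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


sep = select_partition("201709")
aug = select_partition("201708")
print(union_distinct(sep, aug) == union_all(sep, aug))  # True
```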
- Adding partitions
Add a single partition:
hive (hive)> alter table dept_partition add partition(month='201705');
Add several partitions at once (the partition clauses are separated by spaces):
hive (hive)> alter table dept_partition add partition(month='201704') partition(month='201703');
- Dropping partitions
Drop a single partition:
hive (hive)> alter table dept_partition drop partition(month='201705');
Drop several partitions (note the comma between the partition clauses):
hive (hive)> alter table dept_partition drop partition(month='201704') ,partition(month='201703');
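When adding several partitions, the clauses are separated by whitespace. When dropping several, they are separated by commas. Mixing these up is a common syntax error. A small hypothetical Python helper that generates both statements makes the contrast explicit. The function names are made up for illustration. Values are not escaped, so the helper assumes plain values that contain no quotes.

```python
def partition_clause(spec):
    """Render {'month': '201704'} as partition(month='201704')."""
    inner = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return f"partition({inner})"


def add_partitions_sql(table, specs):
    # ADD PARTITION: clauses are separated by whitespace only.
    return f"alter table {table} add " + " ".join(partition_clause(s) for s in specs) + ";"


def drop_partitions_sql(table, specs):
    # DROP PARTITION: clauses must be separated by commas.
    return f"alter table {table} drop " + ", ".join(partition_clause(s) for s in specs) + ";"


specs = [{"month": "201704"}, {"month": "201703"}]
print(add_partitions_sql("dept_partition", specs))
# alter table dept_partition add partition(month='201704') partition(month='201703');
print(drop_partitions_sql("dept_partition", specs))
# alter table dept_partition drop partition(month='201704'), partition(month='201703');
```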
- Listing the partitions of a table
hive (hive)> show partitions dept_partition;
OK
partition
month=201706
month=201707
month=201708
month=201709
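A script may need to consume the partition list, for example to decide which months still need loading. In that case, each line of SHOW PARTITIONS output can be split back into column/value pairs. The Python helper below is hypothetical. It assumes the `OK` line and the `partition` header line have already been removed.

```python
def parse_show_partitions(lines):
    """Turn SHOW PARTITIONS output lines into a list of {column: value} dicts.

    Multi-level partitions are printed as 'month=201709/day=13'. Hive escapes
    special characters in partition values; that escaping is not undone here.
    """
    partitions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        spec = {}
        for part in line.split("/"):
            col, _, val = part.partition("=")
            spec[col] = val
        partitions.append(spec)
    return partitions


output = ["month=201706", "month=201707", "month=201708", "month=201709"]
print([p["month"] for p in parse_show_partitions(output)])
# ['201706', '201707', '201708', '201709']
```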
- Viewing the structure of a partitioned table
hive (hive)> desc formatted dept_partition;
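Dynamic partitions
1. Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
2. Set the dynamic partition mode
set hive.exec.dynamic.partition.mode=nonstrict;
The default is strict, which requires at least one partition column to be given a static value.
nonstrict mode allows every partition column to be dynamic.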
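The strict/nonstrict rule can be stated as a small check. The Python sketch below is a hypothetical model of that check. It also includes one more rule Hive enforces in both modes: a static partition column may not come after a dynamic one. A partition spec is modeled as an ordered dict in PARTITIONED BY order, where `None` marks a dynamic column. The function name is made up, and the code does not call Hive.

```python
from typing import Optional


def check_partition_spec(spec, mode="strict") -> Optional[str]:
    """Return an error message, or None if the spec is acceptable.

    spec maps each partition column, in PARTITIONED BY order, to a static
    value, or to None when the value is dynamic (taken from the SELECT).
    """
    if mode not in ("strict", "nonstrict"):
        return f"unknown mode: {mode}"
    values = list(spec.values())
    if mode == "strict" and all(v is None for v in values):
        return "strict mode: at least one partition column must be static"
    # A static column may not follow a dynamic one,
    # e.g. partition(month, day='13') is rejected.
    seen_dynamic = False
    for v in values:
        if v is None:
            seen_dynamic = True
        elif seen_dynamic:
            return "a dynamic partition cannot be the parent of a static partition"
    return None


print(check_partition_spec({"dt": None}))               # strict mode error message
print(check_partition_spec({"dt": None}, "nonstrict"))  # None
```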
3. Source data
1,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai,2018-08-08
2,lishi,16,shop-boy-book,stu_addr:hunan-work_addr:shanghai,2018-08-09
3,wang2mazi,20,fangniu-eat,stu_addr:shanghai-work_addr:tianjing,2018-08-10
4,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai,2018-08-08
5,lishi,16,shop-boy-book,stu_addr:hunan-work_addr:shanghai,2018-08-09
6,wang2mazi,20,fangniu-eat,stu_addr:shanghai-work_addr:tianjing,2018-08-10
4. Create a regular (staging) table
create table person1(
id int,
name string,
age int,
likes array<string>,
address map<string,string>,
dt string
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';
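The three delimiters in this ROW FORMAT work at different levels:

- ',' splits a line into columns.
- '-' splits the items of the array and the entries of the map.
- ':' splits each map entry into key and value.

Because columns are split first, the '-' characters inside the dt string are left alone. The Python sketch below applies that parsing to one line of the source data. It is a hypothetical function, not Hive's SerDe. It assumes every line is well formed.

```python
def parse_person_line(line):
    """Split one line of person.txt the way the person1 row format describes.

    Assumes every line is well formed with exactly six fields; Hive's
    LazySimpleSerDe would instead return NULL for missing or unparsable fields.
    """
    id_, name, age, likes, address, dt = line.rstrip("\n").split(",")  # fields: ','
    return {
        "id": int(id_),
        "name": name,
        "age": int(age),
        "likes": likes.split("-"),  # array<string>: collection items '-'
        # map<string,string>: entries separated by '-', key/value by ':'
        "address": dict(entry.split(":", 1) for entry in address.split("-")),
        # dt is a plain string column, so its own '-' characters are kept
        "dt": dt,
    }


row = parse_person_line("1,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai,2018-08-08")
print(row["likes"])    # ['game', 'girl', 'book']
print(row["address"])  # {'stu_addr': 'beijing', 'work_addr': 'shanghai'}
print(row["dt"])       # 2018-08-08
```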
5. Load the data into the table
load data local inpath '/test/person.txt' into table person1;
6. Create the partitioned table
create table datap(
id int,
name string,
age int,
likes array<string>,
address map<string,string>
)
partitioned by (dt string)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';
7. Insert the data (the dynamic partition value dt comes from the last column of the SELECT)
insert into datap partition(dt) select id,name,age,likes,address,dt from person1 distribute by dt;
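In this INSERT, each row goes to the partition named by its last SELECT column. Hive matches that value by position, not by name. DISTRIBUTE BY dt additionally sends all rows with the same dt to the same reducer, so each partition tends to be written by one reducer instead of many. The Python sketch below models only the routing step on the (id, name, age, dt) columns of the source data. It is a hypothetical illustration. The likes and address columns are left out for brevity.

```python
from collections import defaultdict

# (id, name, age, dt) from the source data; likes and address are omitted.
rows = [
    (1, "zshang", 18, "2018-08-08"),
    (2, "lishi", 16, "2018-08-09"),
    (3, "wang2mazi", 20, "2018-08-10"),
    (4, "zshang", 18, "2018-08-08"),
    (5, "lishi", 16, "2018-08-09"),
    (6, "wang2mazi", 20, "2018-08-10"),
]


def route_to_partitions(rows):
    """Group rows by their last column, the dynamic partition column dt.

    The partition value is taken by position (last column), and it is not
    stored in the data files; it becomes the directory name dt=<value>.
    """
    partitions = defaultdict(list)
    for *data, dt in rows:
        partitions[f"dt={dt}"].append(tuple(data))
    return dict(partitions)


for name, part_rows in sorted(route_to_partitions(rows).items()):
    print(name, [r[0] for r in part_rows])
# dt=2018-08-08 [1, 4]
# dt=2018-08-09 [2, 5]
# dt=2018-08-10 [3, 6]
```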
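8. View the partition information
show partitions datap;
Two-level partitioned tables
Create a two-level partitioned table (the only change is the extra column in PARTITIONED BY)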
create table dept_partition2(
deptno int,
dname string,
loc string
)
partitioned by (month string,day string)
row format delimited fields terminated by '\t';
Load data the usual way
load data local inpath '/opt/datas/dept.txt'
into table dept_partition2 partition(month='201709',day='13');
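With two partition columns, each partition becomes a nested directory. The month level comes first and the day level sits under it, following the order in PARTITIONED BY. The Python helper below is hypothetical and builds that path. The warehouse prefix matches the `hive` database path used later in this section. Values are not escaped, so they are assumed to contain no '/' or '='.

```python
from pathlib import PurePosixPath


def partition_path(table_dir, spec):
    """Build the directory for a partition spec, one level per column.

    spec must list columns in PARTITIONED BY order (dicts keep insertion
    order). Values are not escaped, so they must not contain '/' or '='.
    """
    path = PurePosixPath(table_dir)
    for column, value in spec.items():
        path = path / f"{column}={value}"
    return str(path)


print(partition_path("/user/hive/warehouse/hive.db/dept_partition2",
                     {"month": "201709", "day": "13"}))
# /user/hive/warehouse/hive.db/dept_partition2/month=201709/day=13
```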
Query the partitioned data
hive (hive)> select * from dept_partition2;
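Three ways to associate data uploaded directly into a partition directory with the partitioned table
Method 1: upload the data, then repair
Upload the data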
hive (hive)> dfs -mkdir -p /user/hive/warehouse/hive.db/dept_partition2/month=201709/day=12;
hive (hive)> dfs -put /opt/datas/dept.txt /user/hive/warehouse/hive.db/dept_partition2/month=201709/day=12;
Query the data (no rows come back: the files are on HDFS, but the partition is not yet registered in the metastore)
hive (hive)> select * from dept_partition2 where month='201709' and day='12';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.month	dept_partition2.day
Time taken: 2.766 seconds
Run the repair command
hive (hive)> msck repair table dept_partition2;
Query again
hive (hive)> select * from dept_partition2 where month='201709' and day='12';
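Conceptually, MSCK REPAIR TABLE does two things. It compares the partition directories under the table's location with the partitions recorded in the metastore. It then registers the ones that are missing. The Python sketch below models only that comparison, for a local directory tree. It is a hypothetical illustration: the function name is made up, a Python set stands in for the metastore, and no real metastore is touched.

```python
import tempfile
from pathlib import Path


def find_unregistered(table_dir, partition_cols, registered):
    """Return partition names found on disk but missing from `registered`.

    Only directories nested exactly len(partition_cols) levels deep whose
    names follow column=value in PARTITIONED BY order are considered.
    """
    table_dir = Path(table_dir)
    pattern = "/".join(["*"] * len(partition_cols))
    missing = []
    for leaf in table_dir.glob(pattern):
        if not leaf.is_dir():
            continue
        parts = leaf.relative_to(table_dir).parts
        if any("=" not in p for p in parts):
            continue
        if [p.partition("=")[0] for p in parts] != list(partition_cols):
            continue
        name = "/".join(parts)
        if name not in registered:
            missing.append(name)
    return sorted(missing)


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "dept_partition2"
    (table / "month=201709" / "day=13").mkdir(parents=True)  # created by LOAD
    (table / "month=201709" / "day=12").mkdir(parents=True)  # created by dfs -mkdir
    metastore = {"month=201709/day=13"}  # partitions the metastore already knows
    missing = find_unregistered(table, ["month", "day"], metastore)

print(missing)  # ['month=201709/day=12']
```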
Method 2: upload the data, then add the partition
Upload the data
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11;
Add the partition
hive (default)> alter table dept_partition2 add partition(month='201709', day='11');
Query the data
hive (default)> select * from dept_partition2 where month='201709' and day='11';
Method 3: create the directory, then LOAD the data into the partition
Create the directory
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;
Upload the data with LOAD
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');
Query the data
hive (default)> select * from dept_partition2 where month='201709' and day='10';