
Partitioned Tables, Managed Tables

Create a partitioned table:

create table if not exists china_partition(
ProvinceID int,
ProvinceName string,
CityID int,
CityName string,
ZipCode int,
DistrictID int,
DistrictName string)
partitioned by ( Province string,City string )
row format delimited fields terminated by ','
;
Note: a partition column name must not be the same as a data column name, otherwise the following error is reported:
hive> create table if not exists china_partition(
    > ProvinceID int,
    > ProvinceName string,
    > CityID int,
    > CityName string,
    > ZipCode int,
    > DistrictID int,
    > DistrictName string)
    > partitioned by ( ProvinceName string,CityName string )
    > row format delimited fields terminated by ','
    > ;
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
Load data into the partitioned table:
load data local inpath '/home/hadoop/china_data/beijing.txt' into table china_partition partition ( Province='beijing',city='beijing');
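Each province/city combination becomes its own partition, so further files are loaded the same way with a different partition spec. A sketch only; /home/hadoop/china_data/shanghai.txt is a hypothetical file, and only the beijing partition is used in the rest of this walkthrough:

load data local inpath '/home/hadoop/china_data/shanghai.txt' into table china_partition partition ( Province='shanghai',City='shanghai');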

The resulting directory structure on the HDFS file system:
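Hive creates one subdirectory per partition column value under the table's warehouse directory, and each loaded file lands in the matching leaf directory. A sketch of the expected layout (the base path is the Location reported by describe formatted further down):

/user/hive/warehouse/china_partition
└── province=beijing
    └── city=beijing
        └── beijing.txt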


You can also use show partitions to view the partitions:

hive> show partitions china_partition;
Using hive.mapred.mode (valid values: strict, nonstrict)

If hive.mapred.mode is set to strict, an HQL query against a partitioned table must include a WHERE clause with a partition filter, otherwise an error is reported:

hive> select * from china_partition;
FAILED: SemanticException Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.large.query is set to false and that hive.mapred.mode is not set to 'strict' to enable them. No partition predicate for Alias "china_partition" Table "china_partition"
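Adding a filter on the partition columns in the WHERE clause satisfies the strict check; a quick sketch:

hive> set hive.mapred.mode=strict;
hive> select * from china_partition where province='beijing' and city='beijing';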

If hive.mapred.mode is set to nonstrict, a partitioned table can be queried without a WHERE clause:

hive> set hive.mapred.mode=nonstrict;
hive> select * from china_partition;
OK
china_partition.provinceid	china_partition.provincename	china_partition.cityid	china_partition.cityname	china_partition.zipcode	china_partition.districtid	china_partition.districtname	china_partition.province	china_partition.city
1	北京市	1	北京市	100000	1	東城區	beijing	beijing
1	北京市	1	北京市	100000	2	西城區	beijing	beijing
1	北京市	1	北京市	100000	3	崇文區	beijing	beijing
1	北京市	1	北京市	100000	4	宣武區	beijing	beijing
1	北京市	1	北京市	100000	5	朝陽區	beijing	beijing
1	北京市	1	北京市	100000	6	豐臺區	beijing	beijing
1	北京市	1	北京市	100000	7	石景山區	beijing	beijing
1	北京市	1	北京市	100000	8	海淀區	beijing	beijing
1	北京市	1	北京市	100000	9	門頭溝區	beijing	beijing
1	北京市	1	北京市	100000	10	房山區	beijing	beijing
1	北京市	1	北京市	100000	11	通州區	beijing	beijing
1	北京市	1	北京市	100000	12	順義區	beijing	beijing
1	北京市	1	北京市	100000	13	昌平區	beijing	beijing
1	北京市	1	北京市	100000	14	大興區	beijing	beijing
1	北京市	1	北京市	100000	15	懷柔區	beijing	beijing
1	北京市	1	北京市	100000	16	平谷區	beijing	beijing
1	北京市	1	北京市	100000	17	密雲縣	beijing	beijing
1	北京市	1	北京市	100000	18	延慶縣	beijing	beijing
Time taken: 0.125 seconds, Fetched: 18 row(s)

If there are a great many partitions, you can also list just part of them by adding a partition spec:
hive> show partitions china_partition partition (province='beijing');
OK
partition
province=beijing/city=beijing
Time taken: 0.18 seconds, Fetched: 1 row(s)
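Partitions can also be created explicitly, without loading any data, using alter table ... add partition; a sketch with a hypothetical tianjin partition:

hive> alter table china_partition add partition (province='tianjin', city='tianjin');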
Using describe formatted table_name also shows the partition information:
hive> describe formatted china_partition;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
	 	 
provinceid          	int                 	                    
provincename        	string              	                    
cityid              	int                 	                    
cityname            	string              	                    
zipcode             	int                 	                    
districtid          	int                 	                    
districtname        	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
province            	string              	                    
city                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	hadoop              	 
CreateTime:         	Tue Apr 25 16:05:55 CST 2017	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://localhost:9000/user/hive/warehouse/china_partition	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1493107555          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.065 seconds, Fetched: 38 row(s)
hive> 
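describe formatted can also be given a partition spec to show the details (location, parameters) of a single partition; a sketch:

hive> describe formatted china_partition partition (province='beijing', city='beijing');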
Hive partitioned tables also support many other operations, such as archive, touch, enable no_drop, enable offline, and so on.
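These take the form of alter table statements against a table or partition; a few sketches of what they look like (archive additionally requires hive.archive.enabled=true):

hive> alter table china_partition touch partition (province='beijing', city='beijing');
hive> alter table china_partition archive partition (province='beijing', city='beijing');
hive> alter table china_partition partition (province='beijing', city='beijing') enable no_drop;
hive> alter table china_partition partition (province='beijing', city='beijing') enable offline;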