
Hive Partitions


Simple partitions

hive> CREATE TABLE psn_2(
    > id int,
    > name string,
    > likes array<string>,
    > address map<string,string>
    > )
    > PARTITIONED BY (age int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':';
OK
Time taken: 20.107 seconds
hive> desc formatted psn_2;
OK
# col_name            	data_type           	comment             
	 	 
id                  	int                 	                    
name                	string              	                    
likes               	array<string>       	                    
address             	map<string,string>  	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
age                 	int                 	                    
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Sat Apr 25 23:45:22 CST 2020	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://mycluster/user/hive/warehouse/psn_2	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1587829522          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	colelction.delim    	-                   
	field.delim         	,                   
	mapkey.delim        	:                   
	serialization.format	,                   
Time taken: 0.682 seconds, Fetched: 37 row(s)
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=10);
Loading data to table default.psn_2 partition (age=10)
Partition default.psn_2{age=10} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
OK
Time taken: 16.219 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=20);
Loading data to table default.psn_2 partition (age=20)
Partition default.psn_2{age=20} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
OK
Time taken: 1.096 seconds
hive> select * from psn_2;
OK
1	小明1	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
2	小明2	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
3	小明3	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
4	小明4	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
5	小明5	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
6	小明6	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
7	小明7	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
1	小明1	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
2	小明2	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
3	小明3	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
4	小明4	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
5	小明5	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
6	小明6	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
7	小明7	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
Time taken: 0.656 seconds, Fetched: 14 row(s)

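Note that the partition column (age) is not declared in the table's regular column list. Hive in fact rejects a column that appears both in the column list and in PARTITIONED BY.

Each partition is stored as a subdirectory of the table's directory. The subdirectory is named after the partition key and its value (for example psn_2/age=10), and the partition's data files sit inside it.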
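The directory naming can be modelled in a few lines. This is a minimal sketch built on a hypothetical helper, partition_path, that writes one column=value level per partition column. It assumes lowercase column names and values that need no escaping. Hive escapes special characters in partition values, and this sketch does not model that.

```python
def partition_path(table_dir: str, spec: list) -> str:
    """Build a partition directory: one "column=value" level per (column, value) pair."""
    # Assumption: names are already lowercase and values contain no characters
    # that Hive would escape; no escaping is done here.
    levels = [f"{col}={value}" for col, value in spec]
    return "/".join([table_dir.rstrip("/")] + levels)


warehouse = "hdfs://mycluster/user/hive/warehouse"
print(partition_path(f"{warehouse}/psn_2", [("age", 10)]))
# hdfs://mycluster/user/hive/warehouse/psn_2/age=10
```

With several partition columns, the same helper simply adds one directory level per column.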

Multiple partition columns

hive> CREATE TABLE psn_3(
    > id int,
    > name string,
    > likes array<string>,
    > address map<string,string>
    > )
    > PARTITIONED BY (age int,sex string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':';
OK
Time taken: 20.107 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_3 PARTITION(age=10,sex='man');

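Likewise, HDFS gets nested directories of the form psn_3/age=<value>/sex=<value>/, for example psn_3/age=10/sex=man. The quotes around 'man' in the partition spec are not part of the directory name.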
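Going the other way, a nested partition path can be read back into a partition spec. Below is a minimal sketch using a hypothetical helper, parse_partition_path. It assumes unescaped values and keeps each value as the string that appears in the directory name.

```python
def parse_partition_path(table_dir: str, path: str) -> dict:
    """Recover {partition column: value} from a path like <table>/age=10/sex=man."""
    prefix = table_dir.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {table_dir!r}")
    spec = {}
    for level in path[len(prefix):].strip("/").split("/"):
        # Each directory level is "column=value"; split on the first "=" only.
        col, sep, value = level.partition("=")
        if not sep:
            raise ValueError(f"not a partition directory: {level!r}")
        spec[col] = value
    return spec


print(parse_partition_path("/user/hive/warehouse/psn_3",
                           "/user/hive/warehouse/psn_3/age=10/sex=man"))
# {'age': '10', 'sex': 'man'}
```

The order of the returned keys follows the directory nesting, which is also the order of the PARTITION BY columns.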

Looking at the raw data file:

[root@hadoopNode02 mgs]# hdfs dfs -cat /user/hive/warehouse/psn_2/age=10/data
1,小明1,lol-book-movie,beijing:shangxuetang-shanghai:pudong
2,小明2,lol-book-movie,beijing:shangxuetang-shanghai:pudong
3,小明3,lol-book-movie,beijing:shangxuetang-shanghai:pudong
4,小明4,lol-book-movie,beijing:shangxuetang-shanghai:pudong
5,小明5,lol-movie,beijing:shangxuetang-shanghai:pudong
6,小明6,lol-movie,beijing:shangxuetang-shanghai:pudong
7,小明7,lol-movie,beijing:shangxuetang-shanghai:pudong

The file itself has no age column. The value age=10 is taken from the directory name when Hive reads the data.
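Combining the ROW FORMAT delimiters from the CREATE TABLE with the partition value from the directory reproduces one row of the select output. This is a minimal sketch with a hypothetical helper, read_row. It assumes every line has exactly the four declared fields and that the partition value has already been converted to the column's type (int for age).

```python
FIELD_DELIM = ","       # FIELDS TERMINATED BY ','
COLLECTION_DELIM = "-"  # COLLECTION ITEMS TERMINATED BY '-' (also separates map entries)
MAPKEY_DELIM = ":"      # MAP KEYS TERMINATED BY ':'


def read_row(line: str, partition_spec: dict) -> list:
    """Split one text line into psn_2's columns, then append the partition values."""
    # Assumption: exactly four fields per line (id, name, likes, address).
    id_, name, likes, address = line.rstrip("\n").split(FIELD_DELIM)
    likes_list = likes.split(COLLECTION_DELIM)
    # Map entries are separated like collection items; key and value by MAPKEY_DELIM.
    address_map = dict(entry.split(MAPKEY_DELIM, 1) for entry in address.split(COLLECTION_DELIM))
    # Partition columns are not stored in the file; they come from the directory name.
    return [int(id_), name, likes_list, address_map, *partition_spec.values()]


print(read_row("1,小明1,lol-book-movie,beijing:shangxuetang-shanghai:pudong", {"age": 10}))
# [1, '小明1', ['lol', 'book', 'movie'], {'beijing': 'shangxuetang', 'shanghai': 'pudong'}, 10]
```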

Adding partitions

Partitions do not have to be added in any particular order.

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
 
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

For example:

ALTER TABLE table_name ADD PARTITION (partition_col1=value1, partition_col2=value2, ...);  -- a value for every partition column defined on the table
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808'
                          PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';
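An ADD PARTITION spec has to give a value for every partition column of the table, such as dt and country for page_view. Below is a minimal Python sketch of that check, using a hypothetical helper, check_add_spec. Accepting the keys in any order and returning them in the table's column order is a choice made by this sketch, not a claim about Hive's internals.

```python
def check_add_spec(partition_cols: list, spec: dict) -> list:
    """Return the spec as (column, value) pairs in table order, or raise if it is incomplete."""
    missing = [c for c in partition_cols if c not in spec]
    unknown = [c for c in spec if c not in partition_cols]
    if missing or unknown:
        raise ValueError(f"spec must cover exactly {partition_cols}; "
                         f"missing={missing}, unknown={unknown}")
    # This sketch accepts keys in any order and reorders them to match the table definition.
    return [(c, spec[c]) for c in partition_cols]


print(check_add_spec(["dt", "country"], {"country": "us", "dt": "2008-08-08"}))
# [('dt', '2008-08-08'), ('country', 'us')]
```

A spec such as {"dt": "2008-08-08"} alone raises ValueError because country is missing.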

Dropping partitions

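When adding a partition, you must give a value for every partition column in a single spec. When dropping, you may specify only some of the columns, and every partition that matches those values is dropped.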

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...] [PURGE];
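A partial spec in DROP PARTITION, for example ALTER TABLE psn_3 DROP PARTITION (sex='man'), selects every partition whose values agree with the columns that are given. Below is a minimal sketch of that matching rule, using a hypothetical helper, partitions_to_drop. The list of existing partitions is invented for illustration.

```python
def partitions_to_drop(existing: list, spec: dict) -> list:
    """Select every existing partition whose values agree with the (possibly partial) spec."""
    return [p for p in existing if all(p.get(col) == val for col, val in spec.items())]


# Hypothetical partitions of psn_3, for illustration only.
existing = [
    {"age": 10, "sex": "man"},
    {"age": 10, "sex": "woman"},
    {"age": 20, "sex": "man"},
]
print(partitions_to_drop(existing, {"sex": "man"}))
# [{'age': 10, 'sex': 'man'}, {'age': 20, 'sex': 'man'}]
```

Passing a full spec such as {"age": 10, "sex": "man"} narrows the match to a single partition.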