Hive Partitions
阿新 • Published: 2020-08-04
Single-level partitioning
hive> CREATE TABLE psn_2(
    > id int,
    > name string,
    > likes array<string>,
    > address map<string,string>
    > )
    > PARTITIONED BY (age int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':';
OK
Time taken: 20.107 seconds
hive> desc formtted psn_2;
FAILED: SemanticException [Error 10001]: Table not found formtted
hive> desc formatted psn_2;
OK
# col_name              data_type               comment

id                      int
name                    string
likes                   array<string>
address                 map<string,string>

# Partition Information
# col_name              data_type               comment

age                     int

# Detailed Table Information
Database:               default
Owner:                  root
CreateTime:             Sat Apr 25 23:45:22 CST 2020
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://mycluster/user/hive/warehouse/psn_2
Table Type:             MANAGED_TABLE
Table Parameters:
        transient_lastDdlTime   1587829522

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        colelction.delim        -
        field.delim             ,
        mapkey.delim            :
        serialization.format    ,
Time taken: 0.682 seconds, Fetched: 37 row(s)
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=10);
Loading data to table default.psn_2 partition (age=10)
Partition default.psn_2{age=10} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
OK
Time taken: 16.219 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=20);
Loading data to table default.psn_2 partition (age=20)
Partition default.psn_2{age=20} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
OK
Time taken: 1.096 seconds
hive> select * from psn_2;
OK
1 小明1 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 10
2 小明2 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 10
3 小明3 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 10
4 小明4 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 10
5 小明5 ["lol","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 10
6 小明6 ["lol","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 10
7 小明7 ["lol","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 10
1 小明1 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 20
2 小明2 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 20
3 小明3 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 20
4 小明4 ["lol","book","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 20
5 小明5 ["lol","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 20
6 小明6 ["lol","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 20
7 小明7 ["lol","movie"] {"beijing":"shangxuetang","shanghai":"pudong"} 20
Time taken: 0.656 seconds, Fetched: 14 row(s)
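Note that the partition column is declared only in PARTITIONED BY; it does not need to be (and must not be) repeated in the table's own column list.
With this kind of partitioning, Hive creates one partition directory under the table's directory. Each directory is named after the partition key and its value (for example age=10), and the data files sit inside it.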
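As a minimal sketch of how this layout is used, the statements below list the partitions of psn_2 and then filter on the partition column. After the two loads shown earlier, SHOW PARTITIONS would be expected to report age=10 and age=20, and a filter on age lets Hive read only the matching directory. The columns picked in the SELECT are chosen purely for illustration.

-- List the partitions registered for the table
SHOW PARTITIONS psn_2;
-- Filter on the partition column so that only the age=10 directory is scanned
SELECT id, name FROM psn_2 WHERE age = 10;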
Multi-level partitioning
hive> CREATE TABLE psn_3(
    > id int,
    > name string,
    > likes array<string>,
    > address map<string,string>
    > )
    > PARTITIONED BY (age int,sex string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':';
OK
Time taken: 20.107 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_3 PARTITION(age=10,sex='man');
In the same way, the partitions appear on HDFS as nested directories of the form psn_3/age=?/sex=?/.
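A short sketch of querying such a table, assuming data has been loaded into the psn_3 partition (age=10, sex='man'): filtering on both partition columns narrows the scan to a single nested directory, while filtering on age alone would cover every sex subdirectory under it.

-- Each partition is listed as a path-like spec such as age=10/sex=man
SHOW PARTITIONS psn_3;
-- Both partition columns in the filter: only psn_3/age=10/sex=man is read
SELECT id, name FROM psn_3 WHERE age = 10 AND sex = 'man';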
Looking at the contents of the data file:
[root@hadoopNode02 mgs]# hdfs dfs -cat /user/hive/warehouse/psn_2/age=10/data
1,小明1,lol-book-movie,beijing:shangxuetang-shanghai:pudong
2,小明2,lol-book-movie,beijing:shangxuetang-shanghai:pudong
3,小明3,lol-book-movie,beijing:shangxuetang-shanghai:pudong
4,小明4,lol-book-movie,beijing:shangxuetang-shanghai:pudong
5,小明5,lol-movie,beijing:shangxuetang-shanghai:pudong
6,小明6,lol-movie,beijing:shangxuetang-shanghai:pudong
7,小明7,lol-movie,beijing:shangxuetang-shanghai:pudong
The age=10 value is not stored in the file itself; Hive reads it from the name of the partition directory.
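One way to observe this, sketched here with Hive's INPUT__FILE__NAME virtual column: the query returns, next to each row, the file the row was read from. For the rows of the age=10 partition the path should contain the age=10 directory, which is where the value of the age column comes from.

-- Show each row together with the file it was read from
SELECT id, name, age, INPUT__FILE__NAME FROM psn_2 WHERE age = 10;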
Adding partitions
Partitions do not have to be added in any particular order.
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)
For example:
ALTER TABLE table_name ADD PARTITION ([the partition being added, with a value for every partition column defined on the table])
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808'
PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';
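Applied to the psn_3 table from this post, a sketch might look like the following. The values age=30 and sex='woman' are hypothetical and chosen only for illustration. Because psn_3 is partitioned by (age, sex), both columns have to appear in the spec; a spec that gives only age would be rejected.

-- Register an empty partition; IF NOT EXISTS makes the statement safe to re-run
ALTER TABLE psn_3 ADD IF NOT EXISTS PARTITION (age=30, sex='woman');
-- Confirm that the new partition is listed
SHOW PARTITIONS psn_3;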
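Dropping partitions
When adding a partition, you must specify every partition column in a single spec. When dropping, you may specify only some of the partition columns, in which case every partition matching the given values is dropped:
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...];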
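A sketch of both forms against psn_3, assuming the partition (age=10, sex='man') exists:

-- Full spec: drops exactly one partition
ALTER TABLE psn_3 DROP IF EXISTS PARTITION (age=10, sex='man');
-- Partial spec: drops every partition under age=10, whatever its sex value
ALTER TABLE psn_3 DROP IF EXISTS PARTITION (age=10);

Since psn_3 is created without the EXTERNAL keyword it is a managed table, so dropping a partition removes its data files as well as its metadata. It is worth checking SHOW PARTITIONS before running a partial-spec drop.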