Partitioned Tables and Managed Tables
阿新 · Published: 2019-02-10
Create a partitioned table:
create table if not exists china_partition(
ProvinceID int,
ProvinceName string,
CityID int,
CityName string,
ZipCode int,
DistrictID int,
DistrictName string)
partitioned by ( Province string,City string )
row format delimited fields terminated by ','
;
Note: a partition column name must not be the same as a data column name, otherwise the statement fails with the following error:
hive> create table if not exists china_partition(
> ProvinceID int,
> ProvinceName string,
> CityID int,
> CityName string,
> ZipCode int,
> DistrictID int,
> DistrictName string)
> partitioned by ( ProvinceName string,CityName string )
> row format delimited fields terminated by ','
> ;
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
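Although partition columns are stored in directory names rather than in the data files, they behave like ordinary columns in queries. A minimal sketch, using the partition values loaded later in this post:

```sql
-- partition columns can be selected and filtered just like data columns
SELECT DistrictName, Province, City
FROM china_partition
WHERE Province = 'beijing';
```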
Load data into the partitioned table:
load data local inpath '/home/hadoop/china_data/beijing.txt' into table china_partition partition ( Province='beijing',city='beijing');
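Each additional partition is loaded the same way with its own partition spec. A sketch for a second partition (the file path and the shanghai values are illustrative, not from this post):

```sql
-- hypothetical: load a second partition from another local data file
LOAD DATA LOCAL INPATH '/home/hadoop/china_data/shanghai.txt'
INTO TABLE china_partition
PARTITION (Province = 'shanghai', City = 'shanghai');
```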
The resulting directory structure in the HDFS file system:
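Each partition maps to its own subdirectory under the table's warehouse location, named after the partition columns and values. A sketch, assuming the default warehouse path shown later by `describe formatted`; the listing can be run from inside the Hive CLI with the built-in `dfs` command:

```sql
hive> dfs -ls -R /user/hive/warehouse/china_partition;
```

The output should show nested directories of the form `province=beijing/city=beijing/` containing the loaded data file.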
You can also list the partitions with show partitions:
hive> show partitions china_partition;
The setting hive.mapred.mode (valid values: strict, nonstrict) controls how partitioned tables may be queried.
If hive.mapred.mode is set to strict, an HQL query against a partitioned table must include a WHERE clause that filters on a partition column, otherwise it fails:
hive> select * from china_partition;
FAILED: SemanticException Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.large.query is set to false and that hive.mapred.mode is not set to 'strict' to enable them. No partition predicate for Alias "china_partition" Table "china_partition"
If hive.mapred.mode is set to nonstrict, a partitioned table can be queried without a WHERE clause:
hive> set hive.mapred.mode=nonstrict;
hive> select * from china_partition;
OK
china_partition.provinceid china_partition.provincename china_partition.cityid china_partition.cityname china_partition.zipcode china_partition.districtid china_partition.districtname china_partition.province china_partition.city
1 北京市 1 北京市 100000 1 東城區 beijing beijing
1 北京市 1 北京市 100000 2 西城區 beijing beijing
1 北京市 1 北京市 100000 3 崇文區 beijing beijing
1 北京市 1 北京市 100000 4 宣武區 beijing beijing
1 北京市 1 北京市 100000 5 朝陽區 beijing beijing
1 北京市 1 北京市 100000 6 豐臺區 beijing beijing
1 北京市 1 北京市 100000 7 石景山區 beijing beijing
1 北京市 1 北京市 100000 8 海淀區 beijing beijing
1 北京市 1 北京市 100000 9 門頭溝區 beijing beijing
1 北京市 1 北京市 100000 10 房山區 beijing beijing
1 北京市 1 北京市 100000 11 通州區 beijing beijing
1 北京市 1 北京市 100000 12 順義區 beijing beijing
1 北京市 1 北京市 100000 13 昌平區 beijing beijing
1 北京市 1 北京市 100000 14 大興區 beijing beijing
1 北京市 1 北京市 100000 15 懷柔區 beijing beijing
1 北京市 1 北京市 100000 16 平谷區 beijing beijing
1 北京市 1 北京市 100000 17 密雲縣 beijing beijing
1 北京市 1 北京市 100000 18 延慶縣 beijing beijing
Time taken: 0.125 seconds, Fetched: 18 row(s)
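Under strict mode the same data can still be read by adding a partition filter, which also prunes the scan down to the matching partition directory. A minimal sketch using the partition loaded above:

```sql
-- strict-mode friendly: filter on the partition columns
SET hive.mapred.mode=strict;
SELECT DistrictName
FROM china_partition
WHERE Province = 'beijing' AND City = 'beijing';
```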
If a table has a great many partitions, you can list only the ones you care about by adding a partition filter to show partitions:
hive> show partitions china_partition partition (province='beijing');
OK
partition
province=beijing/city=beijing
Time taken: 0.18 seconds, Fetched: 1 row(s)
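Partitions can also be added or dropped explicitly with ALTER TABLE, without loading data. A sketch (the shanghai values are illustrative, not from this post):

```sql
-- create an empty partition up front
ALTER TABLE china_partition ADD PARTITION (Province='shanghai', City='shanghai');
-- remove a partition; for a managed table this also deletes its data directory
ALTER TABLE china_partition DROP PARTITION (Province='shanghai', City='shanghai');
```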
Partition information is also displayed by describe formatted table_name:
hive> describe formatted china_partition;
OK
col_name data_type comment
# col_name data_type comment
provinceid int
provincename string
cityid int
cityname string
zipcode int
districtid int
districtname string
# Partition Information
# col_name data_type comment
province string
city string
# Detailed Table Information
Database: default
Owner: hadoop
CreateTime: Tue Apr 25 16:05:55 CST 2017
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://localhost:9000/user/hive/warehouse/china_partition
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1493107555
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.065 seconds, Fetched: 38 row(s)
Hive partitions also support a number of maintenance operations, such as archive, touch, enable no_drop, and enable offline.
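These maintenance operations are all issued through ALTER TABLE ... PARTITION. A brief sketch of their syntax, using the partition loaded earlier:

```sql
-- pack the partition's files into a Hadoop archive (requires hive.archive.enabled=true)
ALTER TABLE china_partition ARCHIVE PARTITION (Province='beijing', City='beijing');
ALTER TABLE china_partition UNARCHIVE PARTITION (Province='beijing', City='beijing');
-- touch: update partition metadata and fire pre/post hooks without changing data
ALTER TABLE china_partition TOUCH PARTITION (Province='beijing', City='beijing');
-- protect the partition from being dropped, or take it offline so it cannot be queried
ALTER TABLE china_partition PARTITION (Province='beijing', City='beijing') ENABLE NO_DROP;
ALTER TABLE china_partition PARTITION (Province='beijing', City='beijing') ENABLE OFFLINE;
```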