Introduction to Hive: The Data Model

阿新 • Published: 2019-01-24

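Internal Tables

Internal table (Table)
-Conceptually similar to a table in a relational database
-Each table has a corresponding directory in HDFS where Hive stores its data
-All table data (External Tables excepted) is stored in this HDFS directory
-The table's metadata is stored in the metastore database (MySQL in this setup)
-Dropping the table deletes both the metadata and the data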

Creating a table -- examples:

create table t1
(tid int, tname string, age int);

If you create a table in Hive without specifying a storage location, the table is created under /user/hive/warehouse in HDFS, the default warehouse directory.

create table t2
(tid int, tname string, age int)
location '/mytable/hive/t2';

This sets the table's location to /mytable/hive/t2 in HDFS.
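To check where Hive actually placed a table, one option is to inspect its metadata. The following is a minimal sketch, assuming the tables t1 and t2 above were created in the default database and that the warehouse directory has not been reconfigured:

```sql
-- Print detailed metadata for each table; the "Location" row shows the HDFS directory.
describe formatted t1;  -- expected under the warehouse directory, e.g. /user/hive/warehouse/t1
describe formatted t2;  -- expected to show the explicit location /mytable/hive/t2
```

The same output also includes a "Table Type" row, which helps tell internal (managed) tables from external ones.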

create table t3
(tid int, tname string, age int)
row format delimited fields terminated by ',';

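The table's data is stored as comma-delimited text, the same layout as a CSV file, which uses a comma as its separator.
//row format specifies how each row is laid out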
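To illustrate how the delimiter is used, the sketch below loads a comma-separated text file into t3. The file path /tmp/students.csv and its contents are hypothetical and do not come from the original notes.

```sql
-- Hypothetical local file /tmp/students.csv with lines such as:
--   1,Tom,23
--   2,Mary,24
load data local inpath '/tmp/students.csv' into table t3;

-- Each line is split on ',' into tid, tname, and age.
select tid, tname, age from t3;
```

A plain delimited row format splits on every comma, so it does not handle quoted fields that themselves contain commas.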

Loading data -- examples:

create table t4
as
select * from sample_data;

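//Creates table t4 from the result set of the query on sample_data
//Inspecting the file in HDFS, the fields in t4 appear to have no delimiter between them; Hive's default field delimiter is the non-printing character Ctrl-A (\001)
We can specify a delimiter here as well: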

create table t5
row format delimited fields terminated by ','
as
select * from sample_data;

//Creates table t5 from the result set of the query on sample_data, using ',' as the delimiter

Adding a new column to a table -- example:

alter table t1 add columns(english int);
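A quick way to confirm the change is to look at the schema again. This is a small sketch that assumes t1 already exists as created earlier:

```sql
-- List the columns of t1; the new english column should appear last.
describe t1;

-- Rows written before the column was added return NULL for english.
select * from t1;
```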

Dropping a table -- example:

drop table t1;

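//When a table is dropped, its files are moved to the HDFS trash (provided the trash feature is enabled), so
//the table's data can still be recovered for a while after the drop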

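Partitioned Tables

Partitioned table (Partition):
(can improve query efficiency)
-A partition plays a role similar to a dense index on the partition column in a database
-In Hive, each partition of a table corresponds to a directory under the table's directory, and all of a partition's data is stored in that directory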

Creating a table -- example

create table partition_table
(sid int, sname string)
partitioned by (gender string)
row format delimited fields terminated by ',';

//Creates a partitioned table partition_table, delimited by ',' and partitioned by gender

insert into table partition_table partition(gender = 'M') select sid,sname from sample_data where gender = 'M';

//Inserts the rows of sample_data whose gender is 'M' into the gender = 'M' partition of partition_table

insert into table partition_table partition(gender = 'F') select sid,sname from sample_data where gender = 'F';

//Inserts the rows of sample_data whose gender is 'F' into the gender = 'F' partition of partition_table
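The partition-to-directory mapping described above can be checked once both inserts have run. The sketch below assumes partition_table lives under the default warehouse directory; adjust the path if your setup differs.

```sql
-- List the partitions Hive has registered for the table.
show partitions partition_table;

-- Filtering on the partition column lets Hive read only the matching directory.
select sid, sname from partition_table where gender = 'M';

-- Directory listing from within the Hive CLI; each partition appears as a
-- subdirectory such as gender=M or gender=F.
dfs -ls /user/hive/warehouse/partition_table;
```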

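External Tables

External table (External Table)
-Points to data that already exists in HDFS; it can also be partitioned
-Its metadata is organized the same way as an internal table's, but the actual storage differs greatly
-Creating an external table is a single step: creating the table and loading the data happen together. The data is not moved into the warehouse directory; Hive only records a link to the external data. Dropping the table removes only that link, not the data itself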

Creating an external table -- example

create external table external_student
(sid int, sname string, age int)
row format delimited fields terminated by ','
location '/input';

//Creates an external table delimited by ',' that is associated with the files under /input in HDFS
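The drop behavior of an external table can be observed directly. The following is a sketch under the assumption that /input already contains data files and that the statements are run from the Hive CLI:

```sql
-- Check that the table is registered as external ("Table Type" shows EXTERNAL_TABLE).
describe formatted external_student;

-- Dropping the table removes only its metadata.
drop table external_student;

-- The files under /input should still be listed afterwards.
dfs -ls /input;
```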

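Bucketed Tables

Bucketed table (Bucket Table)
A bucketed table hashes the data and stores it in different files according to the hash value. In other words, the rows are scattered by a hash function before being written to files, which avoids hot spots and can therefore speed up queries.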

Creating a bucketed table -- example

create table bucket_table
(sid int, sname string, age int)
clustered by (sname) into 5 buckets;

//Creates a bucketed table that hashes on sname and distributes the rows across 5 buckets
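The notes stop at table creation, so here is one possible way to populate and sample the bucketed table. It assumes sample_data has sid, sname, and age columns, which the original does not confirm.

```sql
-- On Hive versions before 2.0, run this first so that inserts honor the bucket
-- definition. Omit it on Hive 2.0 and later, where bucketing is always enforced.
set hive.enforce.bucketing = true;

-- Assumes sample_data has sid, sname, and age columns.
insert into table bucket_table
select sid, sname, age from sample_data;

-- Read only the first of the 5 buckets.
select * from bucket_table tablesample(bucket 1 out of 5 on sname);
```

Sampling on the same column the table is clustered by lets Hive read a single bucket file instead of scanning the whole table.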

Postscript: notes from an online course
