
Hive: Basic SQL


1. Creating tables

(1) Internal vs. external tables

By default Hive creates an internal (managed) table. You can specify its directory, or let Hive create a default one; either way, DROP TABLE deletes both the directory and the data in it.

When creating an external table you specify the storage directory yourself, and DROP TABLE removes only the table's metadata — the directory and the data under it are left untouched.


# Create an external table

0: jdbc:hive2://192.168.163.102:10000> create external table t10(c1 int,c2 string) row format delimited fields terminated by ',' stored as textfile location "/dir1";


[root@Darren2 tmp]# hdfs dfs -put file1 /dir1

[root@Darren2 tmp]# hdfs dfs -ls -R /dir1

-rw-r--r-- 1 root supergroup 24 2017-11-25 20:53 /dir1/file1


0: jdbc:hive2://192.168.163.102:10000> drop table t10;

No rows affected (0.41 seconds)


[root@Darren2 tmp]# hdfs dfs -ls -R /dir1

17/11/25 20:56:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

-rw-r--r-- 1 root supergroup 24 2017-11-25 20:53 /dir1/file1


# Create a default internal table

0: jdbc:hive2://192.168.163.102:10000> create table t2(c1 int,c2 string) row format delimited fields terminated by ',' stored as textfile;


(2) Storage file formats supported by Hive

textfile, sequencefile, orc, parquet, avro

0: jdbc:hive2://192.168.163.102:10000> create table t5(c1 int,c2 string) row format delimited fields terminated by ',' stored as sequencefile ;

0: jdbc:hive2://192.168.163.102:10000> insert into t5 select * from t4;


# A file stored in sequencefile format cannot be viewed directly

[root@Darren2 tmp]# hdfs dfs -ls /user/hive/warehouse/testdb1.db/t5/

-rwxr-xr-x 1 root supergroup 146 2017-11-26 03:03 /user/hive/warehouse/testdb1.db/t5/000000_0

0: jdbc:hive2://192.168.163.102:10000> desc formatted t5;


2. Loading data into Hive

Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
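With LOCAL, the file is read from the machine running the Hive client; the input only has to match the table's declared row format. As a minimal sketch (the exact contents of /tmp/file1 are an assumption inferred from the SELECT output below), a comma-delimited input file for a table declared with fields terminated by ',' could be generated like this:

```python
# Hypothetical reconstruction of /tmp/file1: one row per line, fields
# joined by ',' to match "row format delimited fields terminated by ','".
rows = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "ddd")]
with open("/tmp/file1", "w") as f:
    for c1, c2 in rows:
        f.write(f"{c1},{c2}\n")
```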


(1) Load a local file directly into a Hive table

0: jdbc:hive2://192.168.163.102:10000> load data local inpath '/tmp/file1' into table t1;

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+--------+--+
| t1.c1  | t1.c2  |
+--------+--------+--+
| 1      | aaa    |
| 2      | bbb    |
| 3      | ccc    |
| 4      | ddd    |
+--------+--------+--+


(2) Load data into the table with OVERWRITE, which replaces all existing data — in effect it overwrites every file under the t1 directory

0: jdbc:hive2://192.168.163.102:10000> load data local inpath '/tmp/file3' overwrite into table t1;

No rows affected (0.597 seconds)

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+---------+--+
| t1.c1  | t1.c2   |
+--------+---------+--+
| 1      | yiyi    |
| 2      | erer    |
| 3      | sansan  |
| 4      | sisi    |
+--------+---------+--+

4 rows selected (0.073 seconds)


(3) Load a file on HDFS into a Hive table

[root@Darren2 tmp]# cat /tmp/file2

5,eee


[root@Darren2 tmp]# hdfs dfs -put /tmp/file2 /user/hive/warehouse/testdb1.db/t1

0: jdbc:hive2://192.168.163.102:10000> load data inpath '/user/hive/warehouse/testdb1.db/t1/file2' into table t1;

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+--------+--+
| t1.c1  | t1.c2  |
+--------+--------+--+
| 1      | aaa    |
| 2      | bbb    |
| 3      | ccc    |
| 4      | ddd    |
| 5      | eee    |
+--------+--------+--+


(4) Create a table from another table and insert its data in one step (CTAS)

0: jdbc:hive2://192.168.163.102:10000> create table t2 as select * from t1;


(5) Copy only the table structure first, then insert the data

0: jdbc:hive2://192.168.163.102:10000> create table t3 like t1;

0: jdbc:hive2://192.168.163.102:10000> insert into t3 select * from t1;


3. Exporting query results to the file system

(1) Export query results to HDFS

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+---------+--+
| t1.c1  | t1.c2   |
+--------+---------+--+
| 1      | yiyi    |
| 2      | erer    |
| 3      | sansan  |
| 4      | sisi    |
+--------+---------+--+


0: jdbc:hive2://192.168.163.102:10000> insert overwrite directory '/user/hive/warehouse/tmp' select * from testdb1.t1;

[root@Darren2 tmp]# hdfs dfs -ls -R /user/hive/warehouse/tmp

-rwxr-xr-x 1 root supergroup 30 2017-11-26 00:25 /user/hive/warehouse/tmp/000000_0

[root@Darren2 tmp]# hdfs dfs -get /user/hive/warehouse/tmp/000000_0 /tmp/


The field delimiter in the exported file is Ctrl+A, i.e. ASCII \001

[root@Darren2 tmp]# vim /tmp/000000_0
1^Ayiyi
2^Aerer
3^Asansan
4^Asisi
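The ^A that vim displays is the non-printing \x01 character. A minimal sketch of splitting such an exported file back into rows (the sample string mirrors the file contents shown above):

```python
# Hive's default export delimiter is \x01 (Ctrl+A), rendered as ^A in vim.
# Splitting each line on that byte recovers the columns.
sample = "1\x01yiyi\n2\x01erer\n3\x01sansan\n4\x01sisi\n"
rows = [line.split("\x01") for line in sample.splitlines()]
print(rows)
```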


Use this file to create an external table, with \001 as the delimiter

0: jdbc:hive2://192.168.163.102:10000> create external table t5(c1 int,c2 string) row format delimited fields terminated by '\001' location '/user/hive/warehouse/tmp/';

0: jdbc:hive2://192.168.163.102:10000> select * from t5;

+--------+---------+--+
| t5.c1  | t5.c2   |
+--------+---------+--+
| 1      | yiyi    |
| 2      | erer    |
| 3      | sansan  |
| 4      | sisi    |
+--------+---------+--+


(2) Export query results to the local file system

0: jdbc:hive2://192.168.163.102:10000> insert overwrite local directory '/tmp' select * from testdb1.t1;

[root@Darren2 tmp]# ls /tmp/000000_0

/tmp/000000_0


4. insert

(1) An insert actually creates a new file under the table's directory

0: jdbc:hive2://192.168.163.102:10000> insert into t5 values(4,'sisi');

No rows affected (17.987 seconds)

0: jdbc:hive2://192.168.163.102:10000> dfs -ls /user/hive/warehouse/testdb1.db/t5 ;

+----------------------------------------------------------------------------------------------------------------+--+
|                                                   DFS Output                                                   |
+----------------------------------------------------------------------------------------------------------------+--+
| Found 2 items                                                                                                  |
| -rwxr-xr-x   1 root supergroup        146 2017-11-26 03:03 /user/hive/warehouse/testdb1.db/t5/000000_0         |
| -rwxr-xr-x   1 root supergroup        106 2017-11-26 04:22 /user/hive/warehouse/testdb1.db/t5/000000_0_copy_1  |
+----------------------------------------------------------------------------------------------------------------+--+

