Importing Data into Hive Tables (Five Ways)
阿新 • Published: 2018-12-04
Summary:
Hive offers five ways to import data into a table:
①: load data: with the local keyword the source file is copied from the local filesystem; without it the file is moved from its HDFS location into the table directory. The overwrite keyword replaces the data already in the table; otherwise the rows are appended.
②: insert: either insert ... values(...) or insert ... select
③: as select / like: as select copies the data as well; like copies only the table structure
④: location: first upload the data to HDFS, then point the table at the directory containing the file with the location clause
⑤: import: loads data that was previously produced by export
load:
> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1, …)];
(1) load data: loads data into the table
(2) local: load from the local filesystem; otherwise load from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the data already in the table; otherwise append
(5) into table: which table to load into
(6) student: the specific table name
(7) partition: load into the specified partition
1. Create a table and load data into it from the local filesystem:

> create table if not exists stu4(id int, name string)
row format delimited fields terminated by '\t';
> select * from stu4;
+----------+------------+--+
| stu4.id  | stu4.name  |
+----------+------------+--+
+----------+------------+--+
> load data local inpath '/opt/module/hive/stu.txt' into table stu4;
> select * from stu4;
+----------+------------+--+
| stu4.id  | stu4.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+

2. Create a table and load data into it from HDFS:

> create table if not exists stu5(id int, name string)
row format delimited fields terminated by '\t';
> !sh hadoop fs -put /opt/module/hive/stu.txt /stu.txt
> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
+----------+------------+--+
> load data inpath '/stu.txt' into table stu5;
> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+

3. Load data, overwriting the data already in the table:

> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+
> load data local inpath '/opt/module/hive/stu2.txt' overwrite into table stu5;
> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
+----------+------------+--+
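The load syntax also accepts a partition clause, which none of the demos above use. A minimal sketch of loading a file straight into one partition (the table name stu_p and the paths are illustrative, reusing the data file from the demos):

```sql
-- hypothetical partitioned table; names and paths are illustrative
create table if not exists stu_p(id int, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- load the local file directly into the month='12' partition
load data local inpath '/opt/module/hive/stu.txt'
into table stu_p partition (month = '12');
```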
insert:
1. Create a partitioned table and insert some rows into it:
> create table stu6(id int,name string)
partitioned by (month string)
row format delimited
fields terminated by '\t';
> insert into table stu6 partition(month = '12') values(1001,'zhangfei'),(1002,'liubei');
0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id | stu6.name | stu6.month |
+----------+------------+-------------+--+
| 1001 | zhangfei | 12 |
| 1002 | liubei | 12 |
+----------+------------+-------------+--+
2. Insert data based on the result of a select:
0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id | stu6.name | stu6.month |
+----------+------------+-------------+--+
| 1001 | zhangfei | 12 |
| 1002 | liubei | 12 |
+----------+------------+-------------+--+
> insert overwrite table stu6 partition(month = '12') select id,name from stu_par1 where month = '12';
0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id | stu6.name | stu6.month |
+----------+------------+-------------+--+
| 1001 | zhangfei | 12 |
| 1002 | liubei | 12 |
| 1003 | guanyu | 12 |
| 1004 | zhaoyun | 12 |
| 1005 | caocao | 12 |
| 1006 | zhouyu | 12 |
+----------+------------+-------------+--+
The overwrite keyword replaced the data previously in the month='12' partition:
3. Multi-insert mode (one scan of the source table feeds several inserts):
from stu_par1
insert overwrite table stu6 partition(month = '11')
select id,name where month = '11'
insert overwrite table stu6 partition(month = '10')
select id,name where month = '10';
0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id | stu6.name | stu6.month |
+----------+------------+-------------+--+
| 1001 | zhangfei | 10 |
| 1002 | liubei | 10 |
| 1003 | guanyu | 10 |
| 1004 | zhaoyun | 10 |
| 1005 | caocao | 10 |
| 1006 | zhouyu | 10 |
| 1001 | zhangfei | 11 |
| 1002 | liubei | 11 |
| 1003 | guanyu | 11 |
| 1004 | zhaoyun | 11 |
| 1005 | caocao | 11 |
| 1006 | zhouyu | 11 |
| 1001 | zhangfei | 12 |
| 1002 | liubei | 12 |
| 1003 | guanyu | 12 |
| 1004 | zhaoyun | 12 |
| 1005 | caocao | 12 |
| 1006 | zhouyu | 12 |
+----------+------------+-------------+--+
Create a table and load data in one step (as select):
create table if not exists stu7
as select id,name from stu1;
0: jdbc:hive2://hadoop108:10000> select * from stu7;
+----------+------------+--+
| stu7.id | stu7.name |
+----------+------------+--+
| 1001 | zhangfei |
| 1002 | liubei |
| 1003 | guanyu |
| 1004 | zhaoyun |
| 1005 | caocao |
| 1006 | zhouyu |
+----------+------------+--+
6 rows selected (0.149 seconds)
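The summary also mentions like, which, unlike as select, copies only the table structure and no data. A minimal sketch (the table name stu7_like is illustrative):

```sql
-- copy only the schema of stu7; the new table starts out empty
create table if not exists stu7_like like stu7;
```

A select * from stu7_like afterwards would return no rows, since like carries over the column definitions but none of the data.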
location:
1. HDFS already contains the following: the directory /ex, which holds the file stu.txt
create external table stu_ex2(id int,name string)
row format delimited
fields terminated by '\t'
location '/ex';
0: jdbc:hive2://hadoop108:10000> select * from stu_ex2;
+-------------+---------------+--+
| stu_ex2.id | stu_ex2.name |
+-------------+---------------+--+
| 1001 | zhangfei |
| 1002 | liubei |
| 1003 | guanyu |
| 1004 | zhaoyun |
| 1005 | caocao |
| 1006 | zhouyu |
+-------------+---------------+--+
6 rows selected (0.093 seconds)
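For completeness, the upload step that places stu.txt under /ex before the table is created might look like this, using the same !sh beeline escape as the load section (paths are illustrative):

```sql
-- create the target directory on HDFS and upload the data file into it
!sh hadoop fs -mkdir -p /ex
!sh hadoop fs -put /opt/module/hive/stu.txt /ex
```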
import:
import can only load data that was previously exported with export:
1. Export the table's data to HDFS:
export table stu1 to '/export/data/stu1';
0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -ls /export/data/stu1
Found 2 items
-rwxr-xr-x 3 isea supergroup 1329 2018-12-01 19:38 /export/data/stu1/_metadata
drwxr-xr-x - isea supergroup 0 2018-12-01 19:38 /export/data/stu1/data
The stu1 directory now contains two entries: the metadata file _metadata, and the actual data stored under the data directory.
2. Import the data on HDFS into stu8:
0: jdbc:hive2://hadoop108:10000> show tables;
+------------------------+--+
| tab_name |
+------------------------+--+
| stu1 |
| stu2 |
| stu3 |
| stu4 |
| stu5 |
| stu6 |
| stu7 |
| stu_ex1 |
| stu_ex2 |
| stu_par1 |
| stu_par2 |
| values__tmp__table__1 |
+------------------------+--+
0: jdbc:hive2://hadoop108:10000> import table stu8 from '/export/data/stu1';
0: jdbc:hive2://hadoop108:10000> select * from stu8;
+----------+------------+--+
| stu8.id | stu8.name |
+----------+------------+--+
| 1001 | zhangfei |
| 1002 | liubei |
| 1003 | guanyu |
| 1004 | zhaoyun |
| 1005 | caocao |
| 1006 | zhouyu |
+----------+------------+--+