Hive delimiters: viewing, setting, and changing Hive's default field delimiter (a special character on Linux)
#Change the delimiter to a comma ,
ALTER TABLE table_name SET SERDEPROPERTIES ('field.delim' = ',', 'serialization.format' = ',');
#Change the delimiter to \001, which shows up as ^A in vim on Linux; this is Hive's default delimiter
ALTER TABLE table_name SET SERDEPROPERTIES ('field.delim' = '\001', 'serialization.format' = '\001');
#Change the delimiter to the tab character \t
ALTER TABLE table_name SET SERDEPROPERTIES ('field.delim' = '\t', 'serialization.format' = '\t');
Key points:
field.delim specifies the field delimiter written between two column values in the table's data files.
serialization.format specifies the field delimiter used between two column values when rows are serialized to the data files.
For a partitioned table, each partition can carry its own delimiter properties.
Changing a partitioned table's delimiter with ALTER does not affect reads or writes on existing partitions; it only takes effect for data written afterwards. This behaviour is very convenient.
For example, take a partitioned table with two partitions, day=2020-05-01 and day=2020-05-02, written with the delimiter \t. After ALTER changes the delimiter to \001, a new partition day=2020-05-03 is written.
Each partition's delimiter can be checked with the desc formatted tablename partition(key=value) syntax: the 2020-05-01 and 2020-05-02 partitions still show \t, while the 2020-05-03 partition shows \001. Hive can read and write all three partitions normally without any problem.
Running desc formatted on the table itself shows that its delimiter has changed to \001.
When the delimiter passed to sqoop's --fields-terminated-by parameter changes, you must also update both field.delim and serialization.format on the corresponding table, using the ALTER syntax from the conclusions above.
Passing either \01 or \001 to sqoop's --fields-terminated-by has the same effect: both map to \001 for Hive's field.delim and serialization.format.
Hive's default delimiter is \001, which desc formatted displays as \u0001. Write it exactly as \001, not \01 or \0001.
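The difference between \001, \01, and \0001 comes down to how C-style octal escapes are parsed, which can be sanity-checked in any language that supports them. A quick illustration in Python (independent of Hive itself):

```python
# In C-style octal escapes, '\01' and '\001' both denote the single
# byte 0x01 -- the ^A control character Hive uses by default.
a = "\01"
b = "\001"
print(a == b)          # True: both are chr(1)
print(a == "\u0001")   # True: \u0001 (as shown by desc formatted) is the same character

# '\0001' is NOT one character: an octal escape consumes at most three
# digits, so it parses as '\000' followed by the literal digit '1'.
c = "\0001"
print(len(c))          # 2
```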
Walkthrough:
1. Create a partitioned table, specifying \t as the field delimiter
CREATE TABLE `tmp.test0506_sqoop`(
`id` bigint,
`seq_no` string,
`name` string,
`e_type` string,
`status` string)
PARTITIONED BY (`day` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
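What the textfile behind this table actually contains can be sketched without Hive: each row is the five column values joined by the field delimiter, one row per line. A minimal illustration (the sample values are invented):

```python
# Hypothetical rows matching the schema (id, seq_no, name, e_type, status).
rows = [
    (1, "S001", "alice", "A", "ok"),
    (2, "S002", "bob", "B", "ok"),
]

# With FIELDS TERMINATED BY '\t' and LINES TERMINATED BY '\n', the
# partition's data file is just tab-joined values, newline-separated.
payload = "\n".join("\t".join(str(v) for v in row) for row in rows) + "\n"

# Reading it back: splitting on the same delimiter recovers the columns.
parsed = [line.split("\t") for line in payload.splitlines()]
print(parsed[0])   # ['1', 'S001', 'alice', 'A', 'ok']
```

This is exactly why the delimiter recorded in the table's metadata must match the delimiter the data was written with: split on the wrong byte and every row collapses into a single malformed column.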
2. Import data with sqoop, specifying \t as the delimiter
sqoop import \
--mapreduce-job-name sqoop_table_xxx \
--hive-drop-import-delims \
--connect "${datasource_connect}" \
--username ${datasource_username} \
--password "${datasource_password}" \
--hive-overwrite \
--hive-import \
--split-by id \
--boundary-query 'select min(id),max(id) from xxx' \
--hive-table tmp.test0506_sqoop \
--query 'select id,seq_no,name,e_type,status from xxx where $CONDITIONS' \
--target-dir /tmp/sqoop_test0506_sqoop_`date +%s` \
--fields-terminated-by '\t' \
--hive-partition-key day \
--hive-partition-value '2020-05-01'
3. Change the table's delimiter to \001 with ALTER
ALTER TABLE tmp.test0506_sqoop SET SERDEPROPERTIES ('field.delim' = '\001' , 'serialization.format'='\001');
4. Import data with sqoop again, this time specifying \001 as the delimiter
sqoop import \
--mapreduce-job-name sqoop_table_xxx \
--hive-drop-import-delims \
--connect "${datasource_connect}" \
--username ${datasource_username} \
--password "${datasource_password}" \
--hive-overwrite \
--hive-import \
--split-by id \
--boundary-query 'select min(id),max(id) from xxx' \
--hive-table tmp.test0506_sqoop \
--query 'select id,seq_no,name,e_type,status from xxx where $CONDITIONS' \
--target-dir /tmp/sqoop_test0506_sqoop_`date +%s` \
--fields-terminated-by '\001' \
--hive-partition-key day \
--hive-partition-value '2020-05-02'
5. Check the delimiters of the table and of each partition
desc formatted tmp.test0506_sqoop;
| Storage Desc Params:
| field.delim | \u0001
| line.delim | \n
| serialization.format | \u0001
desc formatted tmp.test0506_sqoop partition(day='2020-05-01');
| Storage Desc Params:
| field.delim | \t
| line.delim | \n
| serialization.format | \t
desc formatted tmp.test0506_sqoop partition(day='2020-05-02');
| Storage Desc Params:
| field.delim | \u0001
| line.delim | \n
| serialization.format | \u0001
6. Query the table data; rows from both partitions display correctly
select * from tmp.test0506_sqoop where day='2020-05-01' limit 2;
select * from tmp.test0506_sqoop where day='2020-05-02' limit 2;
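The per-partition behaviour demonstrated above can be simulated outside Hive: conceptually, Hive stores delimiter properties per partition and consults the right one when scanning each partition's files. A toy sketch (partition names and data are invented, and this is not how Hive is implemented internally):

```python
# Per-partition SerDe properties, as desc formatted would report them.
partition_delims = {
    "day=2020-05-01": "\t",      # written before the ALTER
    "day=2020-05-02": "\t",
    "day=2020-05-03": "\001",    # written after the ALTER
}

# One raw data line per partition, encoded with that partition's delimiter.
partition_files = {
    part: delim.join(["1", "S001", "alice", "A", "ok"])
    for part, delim in partition_delims.items()
}

# A scan uses each partition's own field.delim, so every partition parses
# back into the same five columns despite the mixed delimiters.
for part, line in partition_files.items():
    fields = line.split(partition_delims[part])
    print(part, fields)   # each prints the same five fields
```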
Original article: https://blog.csdn.net/zbz1006572352/article/details/105976059