
0010 - Hive Multi-Character Delimiter Support Example

1. Problem Description

How do you load a data file whose fields are separated by a multi-character delimiter into a Hive table? Sample data:

The field delimiter is "@#$".

test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value

The goal is to load the sample data above into the Hive table multi_delimiter_test, which has the following schema:

Column name    Column type
s1             String
s2             String
s3             String
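To make the target parse concrete, here is a minimal Python sketch. It splits each sample line on the literal string "@#$" and maps the pieces to s1, s2 and s3. It only mimics the intended result and is not how MultiDelimitSerDe is implemented. The helper `parse_line` is hypothetical, and the NULL-for-missing-columns behaviour is an assumption noted in the code.

```python
# Hypothetical illustration of the intended parse; this is not Hive's implementation.
DELIM = "@#$"                 # multi-character field delimiter used by the sample data
COLUMNS = ("s1", "s2", "s3")  # columns of multi_delimiter_test

SAMPLE = [
    "test1@#$test1name@#$test2value",
    "test2@#$test2name@#$test2value",
    "test3@#$test3name@#$test4value",
]


def parse_line(line, delim=DELIM, columns=COLUMNS):
    """Split one text line on a literal multi-character delimiter and name the fields."""
    fields = line.rstrip("\n").split(delim)  # str.split matches the delimiter literally, not as a regex
    # Assumption: missing trailing columns become None (Hive would show NULL); extra fields are dropped by zip().
    fields += [None] * (len(columns) - len(fields))
    return dict(zip(columns, fields))


rows = [parse_line(line) for line in SAMPLE]
for row in rows:
    print(row)

# Splitting on a single character such as "@" leaves "#$" glued to the following fields:
print(SAMPLE[0].split("@"))  # -> ['test1', '#$test1name', '#$test2value']
```

The last line shows why a single-character delimiter cannot express this format. A multi-character delimiter has to be matched as a whole.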

2. Hive Multi-Delimiter Support

Hive 0.14 and later support multi-character field delimiters; see https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe

3. Implementation

  • Test environment

    CDH 5.11.1, Hive 1.1.0, operating system RedHat 6.5

  • Steps

1. Prepare the multi-delimiter data file and upload it to the target HDFS directory

$ cat multi_delimiter_test.dat
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value

$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test

$ hadoop dfs -ls /fayson/multi_delimiter_test
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 1 items
-rw-r--r--   3 user_r supergroup         93 2017-08-23 03:24 /fayson/multi_delimiter_test/multi_delimiter_test.dat
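A later query may return NULLs or shifted columns. If it does, a quick local check that every line splits into exactly three fields can rule out malformed input, and it is cheapest to run before the upload. Below is a minimal sketch, assuming a UTF-8 file with one record per line. `check_field_counts` is a hypothetical helper, not part of Hive or Hadoop.

```python
import os
import tempfile


def check_field_counts(path, delim="@#$", expected=3):
    """Return (line_number, field_count) pairs for lines that do not split into `expected` fields."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            n = len(line.rstrip("\n").split(delim))  # literal, not regex, split
            if n != expected:
                bad.append((lineno, n))
    return bad


# Usage: write a throwaway file whose second line is missing a field.
sample = "test1@#$test1name@#$test2value\ntest2@#$test2name\n"
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
print(check_field_counts(tmp.name))  # -> [(2, 2)]
os.remove(tmp.name)
```

Blank lines are reported too, as a single field. This is deliberate: in a text table they would most likely surface as an extra row rather than be skipped.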

2. Create a table over the prepared multi-delimiter file

create external table multi_delimiter_test(
  s1 string,
  s2 string,
  s3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="@#$")
stored as textfile location '/fayson/multi_delimiter_test';

3. Test

2: jdbc:hive2://localhost:10000/default>  select * from multi_delimiter_test;
+--------------------------+--------------------------+--------------------------+--+
|  multi_delimiter_test.s1  |  multi_delimiter_test.s2  |  multi_delimiter_test.s3  |
+--------------------------+--------------------------+--------------------------+--+
| test1                    | test1name                | test2value               |
| test2                    | test2name                | test2value               |
| test3                    | test3name                | test4value               |
+--------------------------+--------------------------+--------------------------+--+

2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;
INFO  : Ended Job = job_1503469952834_0006
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.25 sec   HDFS Read: 6755 HDFS Write: 2 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 3 seconds 250 msec
INFO  : Completed executing command(queryId=hive_20170823041818_ce58aae2-e6db-4eed-b6af-652235a6e66a); Time taken: 33.286 seconds
INFO  : OK
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
1 row selected (33.679 seconds)

4. Common Problems

1. Error when running a count query

  • Error log

Running a count query through beeline fails:

2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;
INFO  : Compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 0.291 seconds
INFO  : Executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_test
INFO  : Query ID = hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1503469952834_0002
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/<host>@<REALM>, renewer=yarn, realUser=, issueDate=1503475160778, maxDate=1504079960778, sequenceNumber=27, masterKeyId=9)
INFO  : The url to track the job: http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/
INFO  : Starting Job = job_1503469952834_0002, Tracking URL = http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/hadoop/bin/hadoop job -kill job_1503469952834_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2017-08-23 03:59:32,039 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-08-23 04:00:08,106 Stage-1 map = 100%,  reduce = 100%
ERROR : Ended Job = job_1503469952834_0002 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 48.737 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

Running the same query from the Hive shell fails with:

Error:  java.lang.RuntimeException: Error in configuring object
        at  org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at  org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at  org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at  org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at  java.security.AccessController.doPrivileged(Native Method)
        at  javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at  org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by:  java.lang.reflect.InvocationTargetException
        at  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at  sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at  java.lang.reflect.Method.invoke(Method.java:606)
        at  org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by:  java.lang.RuntimeException: Error in configuring object
        at  org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at  org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at  org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at  org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by:  java.lang.reflect.InvocationTargetException
        at  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at  sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at  java.lang.reflect.Method.invoke(Method.java:606)
        at  org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by:  java.lang.RuntimeException: Map operator initialization failed
        at  org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
        ... 22 more
Caused by:  org.apache.hadoop.hive.ql.metadata.HiveException:  java.lang.ClassNotFoundException: Class  org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at  org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:323)
        at  org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333)
        at  org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:116)
        ... 22 more
Caused by:  java.lang.ClassNotFoundException: Class  org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at  org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at  org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:140)
        at  org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
        ... 24 more
FAILED: Execution Error,  return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map:  1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time  Spent: 0 ms
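
  • Root cause analysis

The class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe is packaged in hive-contrib.jar.

Non-aggregate queries such as select * run normally, but queries that use aggregate functions fail. A simple select * is typically answered directly by the Hive server process without launching a MapReduce job. count(*) instead launches a MapReduce job whose map tasks run in YARN containers, and the stack trace above shows the ClassNotFoundException being thrown inside the map task. The jar dependency is therefore missing only when the MapReduce job runs: hive-contrib.jar is not on the YARN runtime classpath.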
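A jar is a zip archive, so you can confirm which jar provides the SerDe class by looking for the compiled class file inside it. The same check verifies that the copy placed in YARN's lib directory really contains the class. Below is a minimal sketch using Python's standard zipfile module. The helper name `jar_has_class` is hypothetical, and the path in the usage comment is the one from this article, which may differ on your cluster.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the compiled class file."""
    entry = class_name.replace(".", "/") + ".class"  # e.g. org/apache/.../MultiDelimitSerDe.class
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example (path from this article; adjust for your parcel version):
# jar_has_class("/opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.10.2.jar",
#               "org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe")
```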

  • Solution

On every node of the CDH cluster, copy hive-contrib-1.1.0-cdh5.10.2.jar into YARN's lib directory:

sudo cp /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.10.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/

Rerun the count query; it now succeeds.
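The copy has to be repeated on every node, so a small script can generate the per-node commands. This is a minimal dry-run sketch that builds and prints the commands without running them. It assumes passwordless SSH to each node and passwordless sudo that does not require a TTY; RHEL 6's default `requiretty` would otherwise need `ssh -t`. The host names are hypothetical placeholders, and `copy_commands` is a hypothetical helper.

```python
import shlex

# Paths taken from the solution above; adjust to your parcel version.
SRC = "/opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.10.2.jar"
DST_DIR = "/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/"


def copy_commands(hosts, src=SRC, dst_dir=DST_DIR):
    """Build one ssh command (as an argv list) per host that copies the jar into YARN's lib dir."""
    remote = f"sudo cp {shlex.quote(src)} {shlex.quote(dst_dir)}"
    return [["ssh", host, remote] for host in hosts]


# Hypothetical host names; replace with the real node list of your cluster.
for cmd in copy_commands(["node1.example.com", "node2.example.com"]):
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Printing first lets you review the exact commands before touching the cluster. Each argv list can then be passed unchanged to `subprocess.run`.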
