An error when analyzing LZO data with Hive
I had previously created a map job that used CombineInputFormat to merge small text files and compress them into LZO files, with these job settings:

    conf.setInt("mapred.min.split.size", 1);
    conf.setLong("mapred.max.split.size", 600000000); // 600MB, so each compressed output file is about 120MB
    conf.set("mapred.output.compression.codec", "com.hadoop.compression.lzo.LzopCodec");
    conf.set("mapred.output.compression.type", "BLOCK");
    conf.setBoolean("mapred.output.compress", true);

Running a Hive query against the LZO directory then failed with:

    2014-03-03 17:00:01,494 WARN com.hadoop.compression.lzo.LzopInputStream: IOException in getCompressedData; likely LZO corruption.
    java.io.IOException: Compressed length 2004251197 exceeds max block size 67108864 (probably corrupt file)
        at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:286)
        at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:256)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
        at java.io.InputStream.read(InputStream.java:82)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:308)
        at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:64)
        at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:158)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:355)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:316)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:430)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    2014-03-03 17:00:01,501 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
    2014-03-03 17:00:01,503 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: java.lang.reflect.InvocationTargetException
    2014-03-03 17:00:01,503 WARN org.apache.hadoop.mapred.Child: Error running child
    java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:369)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:316)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:430)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:355)
        ... 10 more
    Caused by: java.io.IOException: Compressed length 2004251197 exceeds max block size 67108864 (probably corrupt file)
        at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:286)
        at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:256)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
        at java.io.InputStream.read(InputStream.java:82)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:308)
        at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:64)
        at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:158)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        ... 15 more

After reading through many articles, I finally noticed that job.xml contained:

    mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
    hive.hadoop.supports.splittable.combineinputformat=true

Setting hive.hadoop.supports.splittable.combineinputformat to false made the query run normally.
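For reference, this is how the switch can be applied; a minimal sketch, assuming a Hive CLI session (the table name below is hypothetical):

    -- per session, before running the query over the LZO directory
    SET hive.hadoop.supports.splittable.combineinputformat=false;
    SELECT COUNT(*) FROM my_lzo_table;  -- my_lzo_table is a hypothetical table over the .lzo files

To make the change permanent, the same property can instead be set to false in hive-site.xml.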
The root cause: LZO output is not natively splittable, and making an .lzo file splittable requires building an index for it first. With the splittable-combine setting enabled, CombineHiveInputFormat split the unindexed .lzo files anyway, so a record reader could start decompressing at an arbitrary byte offset inside a file, misinterpret whatever bytes it found there as an LZO block header, and fail with the bogus "Compressed length ... exceeds max block size" error even though the files themselves were fine. Since each LZO file here is only about 120MB, there is no need to build indexes; leaving the files unsplittable is good enough.
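If the files were large enough that splitting them mattered, the alternative would be to keep the setting and build LZO indexes with the indexer shipped in the hadoop-lzo project (the same library that provides the LzopCodec used above). A sketch of that route; the jar path and HDFS directory are placeholders:

    # writes a .index file next to each .lzo file, making it splittable
    hadoop jar /path/to/hadoop-lzo.jar \
        com.hadoop.compression.lzo.DistributedLzoIndexer \
        /user/hive/warehouse/my_lzo_dir

With the indexes in place, the LZO-aware input formats seen in the stack trace (e.g. DeprecatedLzoTextInputFormat) can split each file at indexed block boundaries instead of reading it as one unsplittable stream.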