[Solved!] Spark job error: java.lang.IndexOutOfBoundsException: toIndex = 9
This post is meant to record the error and to give anyone who hits the same problem some troubleshooting ideas! I don't have a great fix myself yet; if I find one, I'll update this post.
The problem has since been solved; please scroll all the way to the bottom ↓↓↓↓↓
First, the error. I wrote a bit of Spark code and it blew up with the following:
2018-07-30 17:19:28,854 WARN [task-result-getter-2] scheduler.TaskSetManager (Logging.scala:logWarning(66)) - Lost task 83.0 in stage 2.0 (TID 237, machine-30, executor 100): TaskKilled (stage cancelled)
2018-07-30 17:19:28,855 ERROR [dag-scheduler-event-loop] scheduler.LiveListenerBus (Logging.scala:logError(70)) - SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(2,0,ShuffleMapTask,TaskKilled(stage cancelled),[email protected],null)
2018-07-30 17:19:28,855 ERROR [dag-scheduler-event-loop] scheduler.LiveListenerBus (Logging.scala:logError(70)) - SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(2,0,ShuffleMapTask,TaskKilled(stage cancelled),[email protected],null)
2018-07-30 17:19:28,855 WARN [task-result-getter-0] scheduler.TaskSetManager (Logging.scala:logWarning(66)) - Lost task 175.0 in stage 2.0 (TID 295, machine-43, executor 1): TaskKilled (stage cancelled)
2018-07-30 17:19:28,856 WARN [task-result-getter-1] scheduler.TaskSetManager (Logging.scala:logWarning(66)) - Lost task 2.0 in stage 2.0 (TID 131, machine-43, executor 1): TaskKilled (stage cancelled)
2018-07-30 17:19:28,856 ERROR [dag-scheduler-event-loop] scheduler.LiveListenerBus (Logging.scala:logError(70)) - SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(2,0,ShuffleMapTask,ExceptionFailure(java.lang.IndexOutOfBoundsException,toIndex = 9,[Ljava.lang.StackTraceElement;@a6236c8,java.lang.IndexOutOfBoundsException: toIndex = 9
    at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004)
    at java.util.ArrayList.subList(ArrayList.java:996)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.getSchemaOnRead(RecordReaderFactory.java:161)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.createTreeReader(RecordReaderFactory.java:66)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:202)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1215)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:246)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:245)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:203)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
In the code I read three tables separately and then join them, but the job blew up almost as soon as it started running. At first I assumed the problem was in my Spark code and spent half a day fiddling with it. Then someone suggested the data itself might be bad!!!
So I added a count() right after reading each table (to force the read to actually execute), and it turned out the count on table A was the one that failed.
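Roughly, the localization step looked like the sketch below. This is a minimal sketch with hypothetical table names and join key, not my actual job; the point is simply that count() is an action, so it forces each table to be read on its own instead of everything failing later inside the join.

// Minimal sketch (hypothetical table names db.table_a/b/c and join key "id"):
// count() forces Spark to materialize each read, so the broken table shows up immediately.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("locate-bad-table")
  .enableHiveSupport()
  .getOrCreate()

val a = spark.table("db.table_a")   // in my job these were Hive SQL reads
val b = spark.table("db.table_b")
val c = spark.table("db.table_c")

// Force each read; the table whose count throws is the one with the bad data.
println(s"table_a rows: ${a.count()}")   // in my case, this is where the exception surfaced
println(s"table_b rows: ${b.count()}")
println(s"table_c rows: ${c.count()}")

// Only once every table reads cleanly is it worth debugging the join itself.
val joined = a.join(b, Seq("id")).join(c, Seq("id"))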
So I took the SQL that reads table A and ran it directly in Hive. Sure enough:
2018-07-30 17:32:59,516 Stage-1 map = 69%, reduce = 100%, Cumulative CPU 4831.3 sec
2018-07-30 17:33:00,583 Stage-1 map = 95%, reduce = 100%, Cumulative CPU 4831.3 sec
2018-07-30 17:33:01,633 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4831.3 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 20 minutes 31 seconds 300 msec
Ended Job = job_1525840870040_80806 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1525840870040_80806_m_000011 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000085 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000312 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000133 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000041 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000445 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000298 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000291 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000108 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000065 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000058 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000182 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000389 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000438 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000488 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000238 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000343 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000149 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000451 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000141 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000484 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000245 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000377 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000360 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000000 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000059 (and more) from job job_1525840870040_80806
Examining task ID: task_1525840870040_80806_m_000426 (and more) from job job_1525840870040_80806
Task with the most failures(4):
-----
Task ID: task_1525840870040_80806_m_000011
URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1525840870040_80806&tipid=task_1525840870040_80806_m_000011
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
    ... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 10
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:117)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:118)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleColumnBetween.evaluate(FilterDoubleColumnBetween.java:55)
    at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:100)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:98)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
    ... 9 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 908  Reduce: 1099  Cumulative CPU: 4831.3 sec  HDFS Read: 3284591273  HDFS Write: 0  FAIL
Total MapReduce CPU Time Spent: 0 days 1 hours 20 minutes 31 seconds 300 msec
So Hive fails too, also with an array index out of bounds (the table is stored as ORC). My guess was that some of the data in the affected partitions of this table was corrupted, leading to missing columns or something along those lines.
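In hindsight, one quick way to poke at that hypothesis is to read the files under a suspect partition directly, bypassing the Hive metastore, and see what schema the ORC files themselves carry. A sketch, with a placeholder path:

// Sketch: read the ORC files of one suspect partition directly (no metastore involved)
// and compare the file-level schema with what the Hive table declares. The path is a placeholder.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orc-schema-check").getOrCreate()

val suspectPartition = "hdfs:///warehouse/db.db/table_a/date_id=20180730"
val raw = spark.read.orc(suspectPartition)

raw.printSchema()      // do the columns in the files match the table definition?
println(raw.count())   // does a plain file-level read even go through?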
I haven't thought of a fix yet; hopefully a good way turns up to fill this hole.
========================================== The gorgeous divider ==========================================
Quite some time later, I've finally solved this problem. Rejoice, celebrate, and spread the word!
Honestly, the error just sat there for a while; only recently, with a bit more free time, did I come back to it. In the end it comes down to no more than the following approaches:
1. Look at why the failing code fails: debug it, and see which variables are in play right before the index goes out of bounds and what their values are (though our environment doesn't really allow debugging, or at least makes it a hassle).
2. Search Baidu/Google/Bing for errors involving the classes in the stack trace (don't just search the exception type; searching for the exception type only tells you the index exceeded the collection's length). So I searched for the RecordReaderFactory class.
3. If searching turns up nothing relevant after a while, then guess boldly and verify carefully; reproducing the bug is important (searching is really just there to point your guesses in the right direction, so you're not guessing wildly all over the place)! As long as you can reproduce it and know where the problem lies, you can fix it precisely~
The related issues I found:
https://issues.apache.org/jira/browse/HIVE-14650
https://issues.apache.org/jira/browse/HIVE-13432
https://issues.apache.org/jira/browse/HIVE-14004
https://issues.apache.org/jira/browse/HIVE-13974
None of them looked quite like my error, but in HIVE-14650 I spotted a comment.
Roughly, it says: the error is easy to reproduce. It can show up when a user points multiple tables at the same path, for example a managed (internal) table and an external table sharing one location, and new columns are then added to the managed table. After that, any SELECT against the external table fails with this error!
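To make the scenario concrete, here is a sketch of the setup that comment describes, with made-up database/table names, columns, and path. It only illustrates the kind of drift involved; the same DDL can be run in the Hive CLI instead of through spark.sql.

// Sketch of the HIVE-14650 scenario with hypothetical names: a managed table and an
// external table end up sharing one path, then the managed table gains a column.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-14650-scenario-sketch")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS demo")

// Managed table that owns the data.
spark.sql("CREATE TABLE demo.managed_t (id INT, name STRING) STORED AS ORC")
spark.sql("INSERT INTO demo.managed_t VALUES (1, 'a')")

// External table pointed at the managed table's directory
// (substitute the real path, e.g. the Location shown by DESCRIBE FORMATTED demo.managed_t).
spark.sql("""
  CREATE EXTERNAL TABLE demo.external_t (id INT, name STRING)
  STORED AS ORC
  LOCATION '/user/hive/warehouse/demo.db/managed_t'
""")

// Schema drift: the managed table gets a new column while the external one keeps the old
// definition, so ORC files under the shared path now carry more columns than external_t declares.
spark.sql("ALTER TABLE demo.managed_t ADD COLUMNS (age INT)")
spark.sql("INSERT INTO demo.managed_t VALUES (2, 'b', 20)")

// In an affected environment, reading through the out-of-date external table is where the error appears.
spark.sql("SELECT * FROM demo.external_t").show()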
That felt like seeing daylight, so I actually tested it. Sure enough, the table I was querying really is an external table, so I went looking for the managed table pointing at the same path (there are two ways to find it: if you have access to the Hive metastore database you can query it there, though I'd have to confirm the exact statement, roughly you look up table names by the location field; the other way is simply to ask the developers at your company, who will certainly know). And indeed, the managed table has one more column than the external table! Queries like this don't fail in Hive, but in Spark they break in all sorts of ways, so once the cause was confirmed the rest was straightforward~
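For reference, once both table names are known, the drift is easy to double-check; the sketch below uses hypothetical names. DESCRIBE FORMATTED also shows the Location the external table points at, which is the field you would match against when digging through the metastore.

// Sketch (hypothetical names db.ext_table / db.managed_table): confirm the external
// table's location and diff the two column sets.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// The "Location" row tells you which HDFS path the external table points at.
spark.sql("DESCRIBE FORMATTED db.ext_table").show(200, truncate = false)

// Diff the schemas: in my case the managed table had exactly one extra column.
val extCols = spark.table("db.ext_table").schema.fieldNames.toSet
val mgdCols = spark.table("db.managed_table").schema.fieldNames.toSet
println(s"In the managed table but missing from the external one: ${mgdCols -- extCols}")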
The fix:
Create a new table in Hive with the full set of columns and use that one from now on, or simply drop the old external table (dropping an external table does not delete the underlying data, so no need to worry) and create a new table with the same name but the full set of columns.
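A sketch of what that can look like, with placeholder names, columns, partition column, and path; the same DDL can just as well be run in the Hive CLI:

// Sketch of the fix (all names/columns/paths are placeholders): drop the stale external
// table definition (the HDFS data is untouched) and recreate it with the full column list.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

spark.sql("DROP TABLE IF EXISTS db.ext_table")

// "age" stands in for the column that was missing from the old definition;
// list the partition columns exactly as on the original table.
spark.sql("""
  CREATE EXTERNAL TABLE db.ext_table (
    id   BIGINT,
    name STRING,
    age  INT
  )
  PARTITIONED BY (date_id STRING)
  STORED AS ORC
  LOCATION 'hdfs:///path/to/the/shared/data'
""")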
One more thing: if the original table was partitioned, you still need to load the partitions after creating the new table. There are two ways to do that: add the partitions one at a time, or run the partition repair command. The commands are as follows:
-- Partition repair (obviously the simpler way, but do check that it actually worked)
MSCK REPAIR TABLE table_name;
-- On success you'll see messages like:
Partitions not in metastore: table_name:date_id=20180905/XXX=xxx
Repair: Added partition to metastore db_name.table_name:date_id=20180905/XXX=xxx
-- Add partitions by hand, one at a time (fine when there are only a few partitions or for a quick test; forget it when there are many)
alter table table_name add partition (date_id='20180905', XXX=xxx);
-- Check the partitions
show partitions db_name.table_name;
And that's the bug completely solved. Of course, you could also patch the source code, rebuild, repackage and all the rest, but keep future maintenance in mind: always solve the problem in the simplest way you can, and I think this fix is about as simple as it gets!
Here's hoping every pit I fall into from now on gets filled just like this one!!! Amen~