
CDH 5.3.0: why a small job ran for 12 hours

2015-09-13 00:02:51,433 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2015-09-13 00:02:51,433 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1 due to lack of space for maps
2015-09-13 00:02:51,434 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:180224, vCores:0>
2015-09-13 00:02:51,434 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1 due to lack of space for maps
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:180224, vCores:0>
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2015-09-13 00:02:53,441 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
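My first guess was that there were not enough containers, so the job's tasks could not be scheduled. I checked how vcores and memory were allocated during that period: memory usage was normal, but the vcores were completely used up. The log shows the same thing: the headroom line reports memory:180224 but vCores:0, and the application master keeps trying to preempt "due to lack of space for maps".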
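The diagnosis above can be checked mechanically by scanning the ApplicationMaster log for the headroom field. This is a minimal sketch, assuming the log format shown above (`headroom=<memory:N, vCores:M>`, memory in MB); the function names are hypothetical helpers, not part of Hadoop.

```python
import re
from typing import List, Tuple

# Matches the headroom field logged by RMContainerAllocator, e.g.
# "Recalculating schedule, headroom=<memory:180224, vCores:0>"
HEADROOM_RE = re.compile(r"headroom=<memory:(\d+), vCores:(\d+)>")


def parse_headroom(log_lines: List[str]) -> List[Tuple[str, int, int]]:
    """Return (timestamp, memory_mb, vcores) for every headroom line."""
    samples = []
    for line in log_lines:
        m = HEADROOM_RE.search(line)
        if m:
            # The first two whitespace-separated fields are date and time.
            timestamp = " ".join(line.split()[:2])
            samples.append((timestamp, int(m.group(1)), int(m.group(2))))
    return samples


def vcore_starved(samples: List[Tuple[str, int, int]]) -> bool:
    """True if every sample shows free memory but zero free vcores."""
    return bool(samples) and all(mem > 0 and vc == 0 for _, mem, vc in samples)


if __name__ == "__main__":
    log = [
        "2015-09-13 00:02:51,434 INFO [RMCommunicator Allocator] "
        "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: "
        "Recalculating schedule, headroom=<memory:180224, vCores:0>",
    ]
    print(vcore_starved(parse_headroom(log)))  # prints True
```

A job that keeps reporting free memory but zero vcores is waiting on CPU slots, not memory, which is the pattern in the log above.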
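Looking at the jobs that were running, none of them needed that much memory. However, several spark-shell sessions had been started with --num-executors and --executor-cores set too high, so they held on to their vcores the whole time.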
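To see why a few idle shells can starve the cluster, it helps to count the vcores they pin. This is a rough sketch, assuming the scheduler accounts for vcores (as the vCores:0 headroom suggests), one vcore for each session's YARN application master, and hypothetical cluster and session sizes; `SparkShellSession` and `free_vcores` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparkShellSession:
    num_executors: int   # value passed as --num-executors
    executor_cores: int  # value passed as --executor-cores
    am_cores: int = 1    # assumed: one vcore for the YARN application master

    def vcores_held(self) -> int:
        # With a fixed executor count, every executor keeps its cores for
        # the whole life of the session, even while the shell sits idle.
        return self.num_executors * self.executor_cores + self.am_cores


def free_vcores(cluster_vcores: int, sessions: List[SparkShellSession]) -> int:
    """vcores left over for other YARN applications, e.g. MapReduce jobs."""
    held = sum(s.vcores_held() for s in sessions)
    return max(cluster_vcores - held, 0)


if __name__ == "__main__":
    # Hypothetical: a 96-vcore cluster and shells started with
    # --num-executors 20 --executor-cores 2.
    shell = SparkShellSession(num_executors=20, executor_cores=2)
    print(shell.vcores_held())                      # prints 41
    print(free_vcores(96, [shell, shell]))          # prints 14
    print(free_vcores(96, [shell, shell, shell]))   # prints 0
```

With three such shells open, nothing is left for the MapReduce job's containers, no matter how much memory is free.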
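The --num-executors command-line flag, or the spark.executor.instances configuration property, controls how many executors an application requests. Starting with CDH 5.4/Spark 1.3, you can avoid setting this parameter by enabling dynamic allocation through spark.dynamicAllocation.enabled. Dynamic allocation lets a Spark application request executors when it has a backlog of pending tasks and release those executors when they become idle.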
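The sketch below contrasts the two ways of launching spark-shell and gives a simplified model of how many executors dynamic allocation aims for. It assumes Spark on YARN, where dynamic allocation also needs spark.shuffle.service.enabled and the Spark shuffle service running on each NodeManager. The helper names and bound values are hypothetical, and `target_executors` is a toy model: Spark's own allocation manager also ramps requests up gradually and only releases executors after an idle timeout.

```python
import math
from typing import List, Optional


def spark_shell_args(executor_cores: int,
                     num_executors: Optional[int] = None,
                     min_executors: int = 0,
                     max_executors: int = 10) -> List[str]:
    """Build a spark-shell command line (hypothetical helper).

    With num_executors set, use a fixed executor count; otherwise leave
    --num-executors out and turn on dynamic allocation within bounds.
    """
    args = ["spark-shell", "--executor-cores", str(executor_cores)]
    if num_executors is not None:
        # Fixed allocation: these executors live as long as the shell.
        return args + ["--num-executors", str(num_executors)]
    confs = [
        ("spark.dynamicAllocation.enabled", "true"),
        # Keeps shuffle files available after an idle executor is released.
        ("spark.shuffle.service.enabled", "true"),
        ("spark.dynamicAllocation.minExecutors", str(min_executors)),
        ("spark.dynamicAllocation.maxExecutors", str(max_executors)),
    ]
    for key, value in confs:
        args += ["--conf", f"{key}={value}"]
    return args


def target_executors(pending_tasks: int, running_tasks: int,
                     executor_cores: int, min_executors: int,
                     max_executors: int) -> int:
    """Toy model: enough executors for the backlog, clamped to the bounds.

    Assumes one core per task, so each executor runs executor_cores tasks.
    """
    needed = math.ceil((pending_tasks + running_tasks) / executor_cores)
    return max(min_executors, min(max_executors, needed))


if __name__ == "__main__":
    print(" ".join(spark_shell_args(2, max_executors=10)))
    print(target_executors(100, 0, 2, 0, 10))  # prints 10 (backlog, capped)
    print(target_executors(0, 0, 2, 0, 10))    # prints 0 (idle shell)
```

The key difference from the situation above is the last case: an idle shell under dynamic allocation shrinks toward minExecutors instead of pinning its vcores indefinitely.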