
SparkSQL fails when reading a Parquet table

Cluster memory: 1024 GB (data volume: 400 GB)
(1) Error message:
Job aborted due to stage failure: Serialized task 2231:2304 was 637417604 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.

(2) Cause:
The serialized tasks the driver ships to the executors exceed Spark's default RPC message size limit: spark.rpc.message.maxSize defaults to 128 MiB (134217728 bytes), while the serialized task here is 637417604 bytes, roughly 608 MiB.
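As the error message itself suggests, an alternative to raising the limit is to broadcast large driver-side values instead of capturing them in task closures, so each executor fetches them once rather than receiving a copy inside every serialized task. A minimal spark-shell-style sketch of that approach (lookupMap, the input path, and the key column are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("broadcast-example").getOrCreate()
import spark.implicits._

// Hypothetical large driver-side lookup table. Captured directly in a
// closure, it would be serialized into every task and could push the
// task size past spark.rpc.message.maxSize.
val lookupMap: Map[String, String] = Map("k1" -> "v1")

// Broadcast it once; executors fetch it through the block manager
// instead of receiving it inside each serialized task.
val lookupBc = spark.sparkContext.broadcast(lookupMap)

val df = spark.read.parquet("/path/to/table")    // hypothetical path
val enriched = df.map { row =>
  val key = row.getAs[String]("key")             // hypothetical column
  (key, lookupBc.value.getOrElse(key, "unknown"))
}.toDF("key", "value")

enriched.show()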

(3) Solution:
Raise the limit by adding the configuration spark.rpc.message.maxSize=1024. The value is in MiB, so this lifts the cap from the 128 MiB default to 1 GiB. For example, pass it with --conf at submit time:

spark2-submit \
--class com.lhx.test \
--master yarn \
--deploy-mode cluster \
--conf spark.rpc.message.maxSize=1024 \
--driver-memory 30g \
--executor-memory 12g \
--num-executors 12 \
--executor-cores 3 \
--conf spark.yarn.driver.memoryOverhead=4096m \
--conf spark.yarn.executor.memoryOverhead=4096m \
./test.jar
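
The same limit can also be set programmatically when the session is built. A sketch (the app name and path are placeholders); note that spark.rpc.message.maxSize is read when the RPC environment starts, so it has to be set before the SparkContext is created rather than changed on an already-running session:

import org.apache.spark.sql.SparkSession

// Value is in MiB: 1024 raises the cap from the 128 MiB default to 1 GiB.
val spark = SparkSession.builder()
  .appName("read-parquet")
  .config("spark.rpc.message.maxSize", "1024")
  .getOrCreate()

// Sanity check that the setting took effect.
println(spark.conf.get("spark.rpc.message.maxSize"))

val df = spark.read.parquet("/path/to/table")    // placeholder path
df.show()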