Fixing the Java heap space error when running Spark

阿新 • Published: 2019-02-07

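Problem description:

While running a Spark program, I needed to load 2 million records to use as a cache. When I broadcast this data with .broadcast, the job failed with Exception in thread "main" java.lang.OutOfMemoryError: Java heap space.

The error output was as follows: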

15/09/15 05:26:09 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-10-136.ec2.internal:34472 in memory (size: 2.0 KB, free: 397.3 MB)
15/09/15 05:26:09 INFO spark.ContextCleaner: Cleaned broadcast 3
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.io.ObjectOutputStream$HandleTable.growEntries(ObjectOutputStream.java:2351)
        at java.io.ObjectOutputStream$HandleTable.assign(ObjectOutputStream.java:2276)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1428)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at java.util.ArrayList.writeObject(ArrayList.java:762)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
        at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:202)
        at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:101)
        at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
        at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
        at org.apache.spark.api.java.JavaSparkContext.broadcast(JavaSparkContext.scala:648)
        at com.myspark.spark.task.Spark_task.main(Spark_task.java:77)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)

Looking further at the log lines just before the error:

15/09/15 05:26:09 INFO storage.MemoryStore: Block broadcast_3 of size 3488 dropped from memory (free 280236528)
15/09/15 05:26:09 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-10-135.ec2.internal:51942 in memory (size: 2.0 KB, free: 398.1 MB)
15/09/15 05:26:09 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-10-136.ec2.internal:34472 in memory (size: 2.0 KB, free: 397.3 MB)
15/09/15 05:26:09 INFO spark.ContextCleaner: Cleaned broadcast 3

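This shows that the JVM ran out of heap memory. Because the exception is thrown in the "main" thread inside SparkContext.broadcast, the heap that overflowed belongs to the driver, not to an executor.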
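The stack trace shows the driver serializing an ArrayList through ObjectOutputStream before the broadcast is split into blocks, so the driver heap has to hold the original list and its serialized form at the same time. As a rough way to gauge that cost before calling broadcast, the sketch below measures the Java-serialized size of a sample list using only the JDK. The class name BroadcastSizeEstimate, the method serializedSize, and the sample data are hypothetical and not part of the original program; Spark also compresses and chunks the bytes, so the figure should be read as an approximation only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical helper, not from the original program.
class BroadcastSizeEstimate {

    // Serializes the value with plain Java serialization (the serializer
    // that appears in the stack trace) and returns the number of bytes written.
    static long serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Sample data standing in for a slice of the real records (assumption).
        ArrayList<String> sample = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            sample.add("record-" + i);
        }
        long sampleBytes = serializedSize(sample);
        // Scale the sample linearly to 2 million records for a rough estimate.
        long estimatedBytes = sampleBytes * (2_000_000 / sample.size());
        System.out.println("sample bytes: " + sampleBytes);
        System.out.println("estimated bytes for 2,000,000 records: " + estimatedBytes);
    }
}
```

If the estimate comes close to the driver's heap limit, the broadcast is likely to fail in the same way as above, since the live objects, the serialization bookkeeping, and the output buffer all compete for the same heap.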

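Solution:

A Spark application's memory cannot be increased by launching it in the form java -Xms32m -Xmx800m className. Spark does not support that form, and it does not appear in the output of ./bin/spark-submit --help either. The fix therefore has to come from Spark's own options.

Looking through ./bin/spark-submit --help, I found: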

 --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).

So I changed the submit command to the following, and the job ran successfully:

./bin/spark-submit \
 --class com.myspark.spark.task.Spark_task \
 --master yarn-client --driver-memory 1g \
 /home/hadoop/myspark/spark-example-test-0.0.1-SNAPSHOT.jar \
 s3://********** \
 s3://*********** \
 /test/myspark/spark35
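To confirm that --driver-memory actually reached the driver JVM, one option is to print the heap limit near the start of main, before the broadcast call. The sketch below relies only on java.lang.Runtime from the JDK; the class name DriverHeapCheck and the helpers maxHeapMb and usedHeapMb are hypothetical and not part of the original program. The reported maximum can be somewhat lower than the configured value, depending on the JVM and garbage collector.

```java
// Hypothetical helper, not from the original program.
class DriverHeapCheck {

    // Maximum heap the current JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently in use (allocated minus free), in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("used heap (MB): " + usedHeapMb());
    }
}
```

Logging these two values right before the broadcast would show how much headroom the driver has left once the records are loaded.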

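As for --executor-memory: setting it made no difference here, which is consistent with the stack trace, since the failure happened in the driver rather than in an executor. I was running Spark on YARN, so YARN may also be managing executor memory itself. Whether the option behaves differently in local mode is something I have not tested yet; the details still need experimentation.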

--executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G)

[Appendix]

The full output of ./bin/spark-submit --help is as follows:

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --packages                  Comma-separated list of maven coordinates of jars to include
                              on the driver and executor classpaths. Will search the local
                              maven repo, then maven central and any additional remote
                              repositories given by --repositories. The format for the
                              coordinates should be groupId:artifactId:version.
  --repositories              Comma-separated list of additional remote repositories to
                              search for the maven coordinates given with --packages.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.

  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --proxy-user NAME           User to impersonate when submitting the application.

  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output
  --version,                  Print the version of current Spark

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.
  --kill SUBMISSION_ID        If given, kills the driver specified.
  --status SUBMISSION_ID      If given, requests the status of the driver specified.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --driver-cores NUM          Number of cores used by the driver, only in cluster mode
                              (Default: 1).
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.
