The problem of temporary directories created by Spark running in Standalone mode
阿新 · Published 2019-02-18
While running, Spark jobs create a large number of temporary directories, which ended up filling one partition's disk. The root cause is that Spark's temporary directories default to /tmp/spark*.
Versions used in this project:
Hadoop: 2.7.1
Spark: 1.6.0
JDK: 1.8.0
1. Operations requirement
On the production Spark cluster, the /tmp/spark-* files keep filling up the / partition; the recommendation is to optimize the applications or move these files to a subdirectory under /home/.
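Before changing anything, it helps to quantify how much space these directories are taking on an affected node. A minimal check (a sketch; /tmp/spark-* is the default location mentioned above) might be:

df -h /                                            # how full the / partition is
du -sh /tmp/spark-* 2>/dev/null | sort -h | tail   # the largest Spark temp directories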
2. Solutions
2.1 Option 1 (not recommended)
A crontab entry could periodically run rm -rf /tmp/spark*. The drawback: if a Spark job is running when the cron job fires, the job's freshly created /tmp/spark* files are deleted out from under it, the files can no longer be found, and the job fails.
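For completeness, the discouraged cron-based cleanup would look roughly like this (a sketch only; it suffers from exactly the race condition described above):

# /etc/cron.d/spark-tmp-clean (sketch, not recommended):
# wipes Spark temp dirs nightly; any job still running at 03:00 loses its shuffle/RDD files.
0 3 * * * root rm -rf /tmp/spark-*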
2.2 Option 2 (recommended: configure the parameter in spark-env.sh, not in spark-defaults.conf)
Configure spark.local.dir in the Spark environment; the corresponding environment variable is SPARK_LOCAL_DIRS: storage directories to use on this node for shuffle and RDD data.
Either conf/spark-defaults.conf or conf/spark-env.sh can be modified; below we verify which of the two actually works.
(1) In spark-defaults.conf, change Spark's runtime temporary-directory setting by adding the following line:
spark.local.dir /diskb/sparktmp,/diskc/sparktmp,/diskd/sparktmp,/diske/sparktmp,/diskf/sparktmp,/diskg/sparktmp
Note: multiple directories can be configured, separated by ",".
(2) In spark-env.sh, add:
export SPARK_LOCAL_DIRS=/diskb/sparktmp,/diskc/sparktmp,/diskd/sparktmp,/diske/sparktmp,/diskf/sparktmp,/diskg/sparktmp
If both spark-env.sh and spark-defaults.conf are configured, SPARK_LOCAL_DIRS overrides spark.local.dir.
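A simple way to confirm which setting actually takes effect is to run a test job and see where the spark-*/blockmgr-* directories appear, or to grep an executor log for the directory that DiskBlockManager reports (a sketch; the log path is a placeholder):

# While a test job runs, check which configured location is actually written to.
ls -d /diskb/sparktmp/spark-* /tmp/spark-* 2>/dev/null
# Or look for the "Created local directory" line in an executor's stderr:
grep "Created local directory" /path/to/executor/stderr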
In production we proceeded along these lines, starting by adding the following line to spark-defaults.conf:
spark.local.dir /home/hadoop/data/sparktmp
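One operational detail worth stating (our assumption, not something spelled out in the original write-up): the chosen directory must already exist and be writable by the user that runs Spark on every node, for example:

# Run on every node; 'hadoop' is the user used throughout this post.
mkdir -p /home/hadoop/data/sparktmp
chown -R hadoop:hadoop /home/hadoop/data/sparktmp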
(2) The relevant method in SparkConf.scala
This method tells us that setting spark.local.dir in spark-defaults.conf has been deprecated since Spark 1.0.
/** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
* idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
private[spark] def validateSettings() {
if (contains("spark.local.dir")) {
val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set b y " +
"the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
logWarning(msg)
}
val executorOptsKey = "spark.executor.extraJavaOptions"
val executorClasspathKey = "spark.executor.extr
...
}
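The warning above goes to the driver's log output, so a quick way to confirm that this code path fired is to search the captured spark-submit output for it (a sketch; the log file name is a placeholder):

# Search the captured driver/spark-submit output for the deprecation warning
# emitted by SparkConf.validateSettings.
grep "spark.local.dir will be overridden" /path/to/driver.log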
(3) The relevant method in Utils.scala
Analyzing the code below shows that when SPARK_LOCAL_DIRS is not set in spark-env.sh, the local directories fall back to conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")).split(","), and the directories are then created from whatever that resolves to, which is what produces the error shown in the executor logs.
Setting SPARK_LOCAL_DIRS directly in spark-env.sh therefore resolves the problem, so that is what we configure in spark-env.sh:
export SPARK_LOCAL_DIRS=/home/hadoop/data/sparktmp
/**
* Return the configured local directories where Spark can write files. This
* method does not create any directories on its own, it only encapsulates the
* logic of locating the local directories according to deployment mode.
*/
def getConfiguredLocalDirs(conf: SparkConf): Array[String] = {
val shuffleServiceEnabled = conf.getBoolean("spark.shuffle.service.enabled", false)
if (isRunningInYarnContainer(conf)) {
// If we are in yarn mode, systems can have different disk layouts so we must set it
// to what Yarn on this system said was available. Note this assumes that Yarn has
// created the directories already, and that they are secured so that only the
// user has access to them.
getYarnLocalDirs(conf).split(",")
} else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) {
conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator)
} else if (conf.getenv("SPARK_LOCAL_DIRS") != null) {
conf.getenv("SPARK_LOCAL_DIRS").split(",")
} else if (conf.getenv("MESOS_DIRECTORY") != null && !shuffleServiceEnabled) {
// Mesos already creates a directory per Mesos task. Spark should use that directory
// instead so all temporary files are automatically cleaned up when the Mesos task ends.
// Note that we don't want this if the shuffle service is enabled because we want to
// continue to serve shuffle files after the executors that wrote them have already exited.
Array(conf.getenv("MESOS_DIRECTORY"))
} else {
if (conf.getenv("MESOS_DIRECTORY") != null && shuffleServiceEnabled) {
logInfo("MESOS_DIRECTORY available but not using provided Mesos sandbox because " +
"spark.shuffle.service.enabled is enabled.")
}
// In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
// configuration to point to a secure directory. So create a subdirectory with restricted
// permissions under each listed directory.
conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")).split(",")
}
}
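Because the branch we rely on reads the SPARK_LOCAL_DIRS environment variable, it is worth confirming after restarting the cluster that the variable really is present in the Worker process's environment (a sketch; assumes Linux and that spark-env.sh was edited before the restart):

# On a worker node: check that the running Worker process has SPARK_LOCAL_DIRS set.
WORKER_PID=$(pgrep -f org.apache.spark.deploy.worker.Worker | head -n 1)
tr '\0' '\n' < /proc/$WORKER_PID/environ | grep SPARK_LOCAL_DIRS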
Watching the log output in the terminal, and in particular the 'Deleting directory' lines, we can see that the path has indeed changed; it finally works:
16/09/08 14:56:19 INFO ui.SparkUI: Stopped Spark web UI at http://10.4.1.1:4040
16/09/08 14:56:19 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/09/08 14:56:19 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/09/08 14:56:19 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/09/08 14:56:19 INFO storage.MemoryStore: MemoryStore cleared
16/09/08 14:56:19 INFO storage.BlockManager: BlockManager stopped
16/09/08 14:56:19 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/09/08 14:56:19 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/09/08 14:56:19 INFO spark.SparkContext: Successfully stopped SparkContext
16/09/08 14:56:19 INFO util.ShutdownHookManager: Shutdown hook called
16/09/08 14:56:19 INFO util.ShutdownHookManager: Deleting directory /home/hadoop/data/sparktmp/spark-a72435b2-71e7-4c07-9d60-b0dd41b71ecc
16/09/08 14:56:19 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/09/08 14:56:19 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/09/08 14:56:19 INFO util.ShutdownHookManager: Deleting directory /home/hadoop/data/sparktmp/spark-a72435b2-71e7-4c07-9d60-b0dd41b71ecc/httpd-7cd8762c-85b6-4e62-8e91-be668830b0a7
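As an extra check while a job is running (a sketch; the paths are the ones used in this post):

ls -d /tmp/spark-* 2>/dev/null               # should show nothing new being created
ls -d /home/hadoop/data/sparktmp/spark-*     # the per-application temp dirs now land here
df -h /                                      # the / partition should stop filling up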
Then we verify by running the following command:
bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master spark://10.4.1.1:7077 \
--total-executor-cores 4 \
--driver-memory 2g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar 10
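On a standalone cluster, each executor's stdout/stderr lives under the worker's work directory, which is where the errors below were found (a sketch; $SPARK_HOME stands for the Spark installation directory on the worker):

# On a worker node: list the applications and inspect an executor's stderr.
ls $SPARK_HOME/work/
tail -n 100 $SPARK_HOME/work/app-*/*/stderr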
After the run completes, the executor logs under some workers' work directories contain errors like the following:
16/09/08 15:55:53 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 50212.
16/09/08 15:55:53 ERROR storage.DiskBlockManager: Failed to create local dir in . Ignoring this directory.
java.io.IOException: Failed to create a temp directory (under ) after 10 attempts!
at org.apache.spark.util.Utils$.createDirectory(Utils.scala:217)
at org.apache.spark.storage.DiskBlockManager$$anonfun$createLocalDirs$1.apply(DiskBlockManager.scala:135)
at org.apache.spark.storage.DiskBlockManager$$anonfun$createLocalDirs$1.apply(DiskBlockManager.scala:133)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:108)
at org.apache.spark.storage.DiskBlockManager.createLocalDirs(DiskBlockManager.scala:133)
at org.apache.spark.storage.DiskBlockManager.<init>(DiskBlockManager.scala:45)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:76)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
16/09/08 15:55:53 ERROR storage.DiskBlockManager: Failed to create any local dir.
We analyze the cause of this error through the source code.
(1) The following method in the DiskBlockManager class
The logs let us pinpoint this code as the place where the error occurs.
/**
* Create local directories for storing block data. These directories are
* located inside configured local directories and won't
* be deleted on JVM exit when using the external shuffle service.
*/
private def createLocalDirs(conf: SparkConf): Array[File] = {
Utils.getConfiguredLocalDirs(conf).flatMap { rootDir =>
try {
val localDir = Utils.createDirectory(rootDir, "blockmgr")
logInfo(s"Created local directory at $localDir")