Spark-submit Source Code: the Job Submission Flow (Spark 2.2.0)
Today I read through the source code behind Spark job submission and wanted to share a few observations. If anything here is wrong, please point it out, thanks.
1. First, a look at the code at the top of the SparkSubmit object
// Cluster managers
private val YARN = 1
private val STANDALONE = 2
private val MESOS = 4
private val LOCAL = 8
private val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL

// Deploy modes
private val CLIENT = 1
private val CLUSTER = 2
private val ALL_DEPLOY_MODES = CLIENT | CLUSTER

// Special primary resource names that represent shells rather than application jars.
private val SPARK_SHELL = "spark-shell"
private val PYSPARK_SHELL = "pyspark-shell"
private val SPARKR_SHELL = "sparkr-shell"
private val SPARKR_PACKAGE_ARCHIVE = "sparkr.zip"
private val R_PACKAGE_ARCHIVE = "rpkg.zip"

private val CLASS_NOT_FOUND_EXIT_STATUS = 101

// scalastyle:off println
// SparkSubmit is a Scala object (the equivalent of a Java singleton class),
// so the values above are initialized when the object is first referenced
private[spark] def printVersionAndExit(): Unit = {
  printStream.println("""Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version %s
      /_/
                        """.format(SPARK_VERSION))
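Notice that the cluster managers and deploy modes are plain Int bit flags, so a set of allowed values collapses into a single bitwise OR (ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES). Below is a minimal standalone sketch, not Spark code, of how such flags can be combined and tested:

// Each manager occupies one bit, so a set of managers is just a bitwise OR.
object ClusterFlags {
  val YARN = 1
  val STANDALONE = 2
  val MESOS = 4
  val LOCAL = 8
  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL  // = 15, all four bits set

  // A flag is in the set when ANDing it with the mask leaves its bit standing
  def isKnownManager(flag: Int): Boolean = (flag & ALL_CLUSTER_MGRS) != 0

  def main(args: Array[String]): Unit = {
    println(isKnownManager(STANDALONE))  // true
    println(isKnownManager(16))          // false: 16 is outside the mask
  }
}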
2. Find the object's main method
override def main(args: Array[String]): Unit = {
  // Parse the command-line arguments into appArgs
  val appArgs = new SparkSubmitArguments(args)
  if (appArgs.verbose) {
    // verbose defaults to false; when enabled, the parsed arguments are
    // echoed through printStream, which writes to System.err (see step 3)
    // scalastyle:off println
    printStream.println(appArgs)
    // scalastyle:on println
  }
  // Dispatch on the requested action type
  appArgs.action match {
    // SUBMIT hands the job off to submit(); let's step into that next
    case SparkSubmitAction.SUBMIT => submit(appArgs)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
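SparkSubmitAction here is a Scala Enumeration, and main() simply pattern-matches on it to dispatch. A minimal self-contained sketch of the same dispatch pattern (the names below are illustrative, not Spark's):

// An enumeration of actions, matched on with stable identifiers
object SubmitAction extends Enumeration {
  val SUBMIT, KILL, REQUEST_STATUS = Value
}

object Dispatcher {
  def dispatch(action: SubmitAction.Value): String = action match {
    case SubmitAction.SUBMIT         => "submitting the application"
    case SubmitAction.KILL           => "killing the driver"
    case SubmitAction.REQUEST_STATUS => "querying driver status"
  }

  def main(args: Array[String]): Unit =
    println(dispatch(SubmitAction.SUBMIT))  // prints: submitting the application
}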
3. What printStream.println(appArgs) actually writes to
private[spark] var printStream: PrintStream = System.err  // writes to standard error
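Since printStream is a var rather than a val, callers (tests, for example) can swap in their own stream before anything is printed. A small sketch of that pattern, assuming nothing beyond the JDK:

import java.io.{ByteArrayOutputStream, PrintStream}

object StreamSwapDemo {
  // Mirrors SparkSubmit's declaration: a var pointing at standard error by default
  var printStream: PrintStream = System.err

  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream()
    printStream = new PrintStream(buffer)  // redirect before anything is printed
    printStream.println("captured message")
    printStream.flush()
    println(buffer.toString.trim)          // prints: captured message
  }
}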
4. Step into submit and follow the execution flow
@tailrec
private def submit(args: SparkSubmitArguments): Unit = {
  val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

  // doRunMain is a local function wrapping the call to runMain; when
  // --proxy-user was supplied, it runs runMain as that proxy user
  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            // The real work happens in runMain; let's step into it next
            runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
          }
        })
      } catch {
        case e: Exception =>
          // Hadoop's AuthorizationException suppresses the exception's stack trace, which
          // makes the message printed to the output by the JVM not very helpful. Instead,
          // detect exceptions with empty stack traces here, and treat them differently.
          if (e.getStackTrace().length == 0) {
            // scalastyle:off println
            printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
            // scalastyle:on println
            exitFn(1)
          } else {
            throw e
          }
      }
    } else {
      runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
    }
  }
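Stripped of the Hadoop-specific parts, the shape of doRunMain is: wrap the real call in a local function that adds error handling, and treat exceptions with empty stack traces specially because their default JVM output is unhelpful. A minimal sketch of just that shape (not Spark code):

object RunWrapperDemo {
  // Stand-in for the real runMain
  def runMain(args: Seq[String]): Unit =
    println(s"running with: ${args.mkString(" ")}")

  def submit(args: Seq[String]): Unit = {
    // Local function wrapping the real call, as submit() does with doRunMain
    def doRunMain(): Unit = {
      try {
        runMain(args)
      } catch {
        // An empty stack trace prints almost nothing useful, so report it ourselves
        case e: Exception if e.getStackTrace.length == 0 =>
          Console.err.println(s"ERROR: ${e.getClass.getName}: ${e.getMessage}")
          sys.exit(1)
        case e: Exception =>
          throw e
      }
    }
    doRunMain()
  }

  def main(args: Array[String]): Unit =
    submit(Seq("--class", "example.Main"))  // prints: running with: --class example.Main
}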
5. The runMain method
private def runMain(
    childArgs: Seq[String],
    childClasspath: Seq[String],
    sysProps: Map[String, String],
    childMainClass: String,
    verbose: Boolean): Unit = {
  // scalastyle:off println
  if (verbose) {
    printStream.println(s"Main class:\n$childMainClass")
    printStream.println(s"Arguments:\n${childArgs.mkString("\n")}")
    // sysProps may contain sensitive information, so redact before printing
    printStream.println(s"System properties:\n${Utils.redact(sysProps).mkString("\n")}")
    printStream.println(s"Classpath elements:\n${childClasspath.mkString("\n")}")
    printStream.println("\n")
  }
  // scalastyle:on println

  val loader =
    if (sysProps.getOrElse("spark.driver.userClassPathFirst", "false").toBoolean) {
      new ChildFirstURLClassLoader(new Array[URL](0),
        Thread.currentThread.getContextClassLoader)
    } else {
      new MutableURLClassLoader(new Array[URL](0),
        Thread.currentThread.getContextClassLoader)
    }
  Thread.currentThread.setContextClassLoader(loader)

  for (jar <- childClasspath) {
    addJarToClasspath(jar, loader)
  }

  for ((key, value) <- sysProps) {
    System.setProperty(key, value)
  }

  var mainClass: Class[_] = null

  try {
    // Load the target class by name via reflection: this is the key step
    mainClass = Utils.classForName(childMainClass)
  } catch {
    case e: ClassNotFoundException =>
      e.printStackTrace(printStream)
      if (childMainClass.contains("thriftserver")) {
        // scalastyle:off println
        printStream.println(s"Failed to load main class $childMainClass.")
        printStream.println("You need to build Spark with -Phive and -Phive-thriftserver.")
        // scalastyle:on println
      }
      System.exit(CLASS_NOT_FOUND_EXIT_STATUS)
    case e: NoClassDefFoundError =>
      e.printStackTrace(printStream)
      if (e.getMessage.contains("org/apache/hadoop/hive")) {
        // scalastyle:off println
        printStream.println(s"Failed to load hive class.")
        printStream.println("You need to build Spark with -Phive and -Phive-thriftserver.")
        // scalastyle:on println
      }
      System.exit(CLASS_NOT_FOUND_EXIT_STATUS)
  }

  // SPARK-4170
  if (classOf[scala.App].isAssignableFrom(mainClass)) {
    printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
  }

  // Look up the main(Array[String]) method on the target class
  val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
  if (!Modifier.isStatic(mainMethod.getModifiers)) {
    throw new IllegalStateException("The main method in the given main class must be static")
  }

  @tailrec
  def findCause(t: Throwable): Throwable = t match {
    case e: UndeclaredThrowableException =>
      if (e.getCause() != null) findCause(e.getCause()) else e
    case e: InvocationTargetException =>
      if (e.getCause() != null) findCause(e.getCause()) else e
    case e: Throwable =>
      e
  }

  try {
    // Invoke the target class's main method: the user application starts here
    mainMethod.invoke(null, childArgs.toArray)
  } catch {
    case t: Throwable =>
      findCause(t) match {
        case SparkUserAppException(exitCode) =>
          System.exit(exitCode)
        case t: Throwable =>
          throw t
      }
  }
}

In short, runMain loads the target class via reflection, looks up the main method on that class, checks that it is static, and then invokes it to start the application.
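To see that whole pattern end to end, here is a minimal self-contained sketch. This is not Spark code: HelloApp is a hypothetical stand-in for the class passed via --class, and Class.forName plays the role of Utils.classForName.

import java.lang.reflect.Modifier

// Hypothetical target application, standing in for the user's --class
object HelloApp {
  def main(args: Array[String]): Unit =
    println(s"HelloApp started with: ${args.mkString(", ")}")
}

object ReflectiveLauncher {
  def launch(className: String, args: Array[String]): Unit = {
    val clazz = Class.forName(className)  // load the target class by name
    val mainMethod = clazz.getMethod("main", classOf[Array[String]])
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method in the given main class must be static")
    }
    mainMethod.invoke(null, args)  // null receiver because main is static
  }

  def main(args: Array[String]): Unit =
    launch("HelloApp", Array("--verbose"))  // prints: HelloApp started with: --verbose
}

A Scala object's main compiles to a static forwarder on the class of the same name, which is why the isStatic check passes here just as it does for a Java class with a static main.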