Reading the Spark Source Code - Scattered Notes (1): SparkConf
阿新 · Published: 2018-09-10
0 About these scattered notes
These scattered notes are simply things I feel are worth writing down so that I can find them again later.
1 Spark version
Spark 2.1.0.
2 Notes
Much of the Scala knowledge that comes up while reading the source is covered in my earlier Scala notes; most of the relevant material can be found there.
3 SparkConf source code
The SparkConf source is not particularly hard. With some understanding of Spark itself and a reasonable grasp of Scala, it reads quite easily. I only went through the core methods to get an overall picture and added some personal annotations (the "Leaf Note" comments below); usages I have not run into yet I simply skipped for now. The annotated source is recorded here.
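Before diving into the listing, here is a tiny usage sketch of the chained setters discussed below (the object name and the concrete keys/values are my own example, not part of the Spark source; it assumes spark-core 2.1.0 on the classpath). Every setter delegates to set, which stores the pair in the internal map and returns this, which is exactly what makes the chaining work:

import org.apache.spark.SparkConf

object SparkConfChainingExample {
  def main(args: Array[String]): Unit = {
    // Each setter delegates to set(key, value), which puts the pair into the
    // internal ConcurrentHashMap and returns `this`, so calls can be chained.
    val conf = new SparkConf()          // loadDefaults = true: also picks up spark.* system properties
      .setMaster("local[2]")            // stored under "spark.master"
      .setAppName("sparkconf-demo")     // stored under "spark.app.name"
      .set("spark.executor.memory", "1g")

    // Values set directly on the SparkConf take priority over system properties.
    println(conf.get("spark.app.name"))                  // sparkconf-demo
    println(conf.getSizeAsMb("spark.executor.memory"))   // 1024 ("1g" parsed as mebibytes)
    println(conf.toDebugString)                          // one sorted key=value per line
  }
}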
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark

import java.util.concurrent.ConcurrentHashMap

import scala.collection.JavaConverters._
import scala.collection.mutable.LinkedHashSet

import org.apache.avro.{Schema, SchemaNormalization}

import org.apache.spark.internal.Logging
import org.apache.spark.internal.config._
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.util.Utils

/**
 * Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
 *
 * Most of the time, you would create a SparkConf object with `new SparkConf()`, which will load
 * values from any `spark.*` Java system properties set in your application as well. In this case,
 * parameters you set directly on the `SparkConf` object take priority over system properties.
 *
 * For unit tests, you can also call `new SparkConf(false)` to skip loading external settings and
 * get the same configuration no matter what the system properties are.
 *
 * All setter methods in this class support chaining. For example, you can write
 * `new SparkConf().setMaster("local").setAppName("My app")`.
 * Leaf Note: the reason the properties can be set so elegantly is that every setter ultimately
 * calls set and returns this, i.e. the object itself.
 *
 * @param loadDefaults whether to also load values from Java system properties
 *
 * @note Once a SparkConf object is passed to Spark, it is cloned and can no longer be modified
 * by the user. Spark does not support modifying the configuration at runtime.
 */
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Serializable {

  /** Leaf Note:
   * extends ... with ... mixes in several traits; a trait is similar to a Java interface.
   * Logging is for writing logs. Serializable is needed because SparkConf objects are shipped
   * around in a distributed environment and therefore must be serializable.
   * As for Cloneable: clone() is overridden below and simply produces a SparkConf with the same
   * settings. The goal is to avoid concurrency problems when multiple components share one
   * SparkConf object: wherever a SparkConf is needed, call clone to get your own copy. Elegant.
   */

  import SparkConf._

  /** Create a SparkConf that loads defaults from system properties and the classpath */
  /** Leaf Note: this is an auxiliary constructor; it delegates to the primary constructor.
   * In Scala, the constructor whose parameters appear in the class declaration is the primary one. */
  def this() = this(true)

  // Leaf Note: a thread-safe map; this is where Spark's configuration properties are actually stored
  private val settings = new ConcurrentHashMap[String, String]()

  @transient private lazy val reader: ConfigReader = {
    val _reader = new ConfigReader(new SparkConfigProvider(settings))
    _reader.bindEnv(new ConfigProvider {
      override def get(key: String): Option[String] = Option(getenv(key))
    })
    _reader
  }

  if (loadDefaults) {
    loadFromSystemProperties(false)
  }

  /* Leaf Note: the "spark" in private[spark] is the org.apache.spark package,
   * so this method can only be used within that package */
  private[spark] def loadFromSystemProperties(silent: Boolean): SparkConf = {
    // Load any spark.* system properties
    for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
      set(key, value, silent)
    }
    this
  }

  /** Set a configuration variable. */
  def set(key: String, value: String): SparkConf = {
    set(key, value, false)
  }

  /** Leaf Note: when silent is false, deprecated configuration keys trigger a warning */
  private[spark] def set(key: String, value: String, silent: Boolean): SparkConf = {
    if (key == null) {
      throw new NullPointerException("null key")
    }
    if (value == null) {
      throw new NullPointerException("null value for " + key)
    }
    if (!silent) {
      logDeprecationWarning(key)
    }
    settings.put(key, value)
    this
  }

  private[spark] def set[T](entry: ConfigEntry[T], value: T): SparkConf = {
    set(entry.key, entry.stringConverter(value))
    this
  }

  private[spark] def set[T](entry: OptionalConfigEntry[T], value: T): SparkConf = {
    set(entry.key, entry.rawStringConverter(value))
    this
  }

  /**
   * The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
   * run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
   */
  def setMaster(master: String): SparkConf = {
    set("spark.master", master)
  }

  /** Set a name for your application. Shown in the Spark web UI. */
  def setAppName(name: String): SparkConf = {
    set("spark.app.name", name)
  }

  /** Set JAR files to distribute to the cluster. */
  def setJars(jars: Seq[String]): SparkConf = {
    for (jar <- jars if (jar == null)) logWarning("null jar passed to SparkContext constructor")
    set("spark.jars", jars.filter(_ != null).mkString(","))
  }

  /** Set JAR files to distribute to the cluster. (Java-friendly version.) */
  def setJars(jars: Array[String]): SparkConf = {
    setJars(jars.toSeq)
  }

  /**
   * Set an environment variable to be used when launching executors for this application.
   * These variables are stored as properties of the form spark.executorEnv.VAR_NAME
   * (for example spark.executorEnv.PATH) but this method makes them easier to set.
   */
  def setExecutorEnv(variable: String, value: String): SparkConf = {
    set("spark.executorEnv." + variable, value)
  }

  /**
   * Set multiple environment variables to be used when launching executors.
   * These variables are stored as properties of the form spark.executorEnv.VAR_NAME
   * (for example spark.executorEnv.PATH) but this method makes them easier to set.
   */
  def setExecutorEnv(variables: Seq[(String, String)]): SparkConf = {
    for ((k, v) <- variables) {
      setExecutorEnv(k, v)
    }
    this
  }

  /**
   * Set multiple environment variables to be used when launching executors.
   * (Java-friendly version.)
   */
  def setExecutorEnv(variables: Array[(String, String)]): SparkConf = {
    setExecutorEnv(variables.toSeq)
  }

  /**
   * Set the location where Spark is installed on worker nodes.
   */
  def setSparkHome(home: String): SparkConf = {
    set("spark.home", home)
  }

  /** Set multiple parameters together */
  def setAll(settings: Traversable[(String, String)]): SparkConf = {
    settings.foreach { case (k, v) => set(k, v) }
    this
  }

  /** Set a parameter if it isn't already configured */
  def setIfMissing(key: String, value: String): SparkConf = {
    if (settings.putIfAbsent(key, value) == null) {
      logDeprecationWarning(key)
    }
    this
  }

  private[spark] def setIfMissing[T](entry: ConfigEntry[T], value: T): SparkConf = {
    if (settings.putIfAbsent(entry.key, entry.stringConverter(value)) == null) {
      logDeprecationWarning(entry.key)
    }
    this
  }

  private[spark] def setIfMissing[T](entry: OptionalConfigEntry[T], value: T): SparkConf = {
    if (settings.putIfAbsent(entry.key, entry.rawStringConverter(value)) == null) {
      logDeprecationWarning(entry.key)
    }
    this
  }

  /**
   * Use Kryo serialization and register the given set of classes with Kryo.
   * If called multiple times, this will append the classes from all calls together.
   */
  def registerKryoClasses(classes: Array[Class[_]]): SparkConf = {
    val allClassNames = new LinkedHashSet[String]()
    allClassNames ++= get("spark.kryo.classesToRegister", "").split(',').map(_.trim)
      .filter(!_.isEmpty)
    allClassNames ++= classes.map(_.getName)

    set("spark.kryo.classesToRegister", allClassNames.mkString(","))
    set("spark.serializer", classOf[KryoSerializer].getName)
    this
  }

  private final val avroNamespace = "avro.schema."

  /**
   * Use Kryo serialization and register the given set of Avro schemas so that the generic
   * record serializer can decrease network IO
   */
  def registerAvroSchemas(schemas: Schema*): SparkConf = {
    for (schema <- schemas) {
      set(avroNamespace + SchemaNormalization.parsingFingerprint64(schema), schema.toString)
    }
    this
  }

  /** Gets all the avro schemas in the configuration used in the generic Avro record serializer */
  def getAvroSchema: Map[Long, String] = {
    getAll.filter { case (k, v) => k.startsWith(avroNamespace) }
      .map { case (k, v) => (k.substring(avroNamespace.length).toLong, v) }
      .toMap
  }

  /** Remove a parameter from the configuration */
  def remove(key: String): SparkConf = {
    settings.remove(key)
    this
  }

  private[spark] def remove(entry: ConfigEntry[_]): SparkConf = {
    remove(entry.key)
  }

  /** Get a parameter; throws a NoSuchElementException if it's not set */
  def get(key: String): String = {
    getOption(key).getOrElse(throw new NoSuchElementException(key))
  }

  /** Get a parameter, falling back to a default if not set */
  def get(key: String, defaultValue: String): String = {
    getOption(key).getOrElse(defaultValue)
  }

  /**
   * Retrieves the value of a pre-defined configuration entry.
   *
   * - This is an internal Spark API.
   * - The return type if defined by the configuration entry.
   * - This will throw an exception is the config is not optional and the value is not set.
   */
  private[spark] def get[T](entry: ConfigEntry[T]): T = {
    entry.readFrom(reader)
  }

  /**
   * Get a time parameter as seconds; throws a NoSuchElementException if it's not set. If no
   * suffix is provided then seconds are assumed.
   * @throws java.util.NoSuchElementException
   */
  def getTimeAsSeconds(key: String): Long = {
    Utils.timeStringAsSeconds(get(key))
  }

  /**
   * Get a time parameter as seconds, falling back to a default if not set. If no
   * suffix is provided then seconds are assumed.
   */
  def getTimeAsSeconds(key: String, defaultValue: String): Long = {
    Utils.timeStringAsSeconds(get(key, defaultValue))
  }

  /**
   * Get a time parameter as milliseconds; throws a NoSuchElementException if it's not set. If no
   * suffix is provided then milliseconds are assumed.
   * @throws java.util.NoSuchElementException
   */
  def getTimeAsMs(key: String): Long = {
    Utils.timeStringAsMs(get(key))
  }

  /**
   * Get a time parameter as milliseconds, falling back to a default if not set. If no
   * suffix is provided then milliseconds are assumed.
   */
  def getTimeAsMs(key: String, defaultValue: String): Long = {
    Utils.timeStringAsMs(get(key, defaultValue))
  }

  /**
   * Get a size parameter as bytes; throws a NoSuchElementException if it's not set. If no
   * suffix is provided then bytes are assumed.
   * @throws java.util.NoSuchElementException
   */
  def getSizeAsBytes(key: String): Long = {
    Utils.byteStringAsBytes(get(key))
  }

  /**
   * Get a size parameter as bytes, falling back to a default if not set. If no
   * suffix is provided then bytes are assumed.
   */
  def getSizeAsBytes(key: String, defaultValue: String): Long = {
    Utils.byteStringAsBytes(get(key, defaultValue))
  }

  /**
   * Get a size parameter as bytes, falling back to a default if not set.
   */
  def getSizeAsBytes(key: String, defaultValue: Long): Long = {
    Utils.byteStringAsBytes(get(key, defaultValue + "B"))
  }

  /**
   * Get a size parameter as Kibibytes; throws a NoSuchElementException if it's not set. If no
   * suffix is provided then Kibibytes are assumed.
   * @throws java.util.NoSuchElementException
   */
  def getSizeAsKb(key: String): Long = {
    Utils.byteStringAsKb(get(key))
  }

  /**
   * Get a size parameter as Kibibytes, falling back to a default if not set. If no
   * suffix is provided then Kibibytes are assumed.
   */
  def getSizeAsKb(key: String, defaultValue: String): Long = {
    Utils.byteStringAsKb(get(key, defaultValue))
  }

  /**
   * Get a size parameter as Mebibytes; throws a NoSuchElementException if it's not set. If no
   * suffix is provided then Mebibytes are assumed.
   * @throws java.util.NoSuchElementException
   */
  def getSizeAsMb(key: String): Long = {
    Utils.byteStringAsMb(get(key))
  }

  /**
   * Get a size parameter as Mebibytes, falling back to a default if not set. If no
   * suffix is provided then Mebibytes are assumed.
   */
  def getSizeAsMb(key: String, defaultValue: String): Long = {
    Utils.byteStringAsMb(get(key, defaultValue))
  }

  /**
   * Get a size parameter as Gibibytes; throws a NoSuchElementException if it's not set. If no
   * suffix is provided then Gibibytes are assumed.
   * @throws java.util.NoSuchElementException
   */
  def getSizeAsGb(key: String): Long = {
    Utils.byteStringAsGb(get(key))
  }

  /**
   * Get a size parameter as Gibibytes, falling back to a default if not set. If no
   * suffix is provided then Gibibytes are assumed.
   */
  def getSizeAsGb(key: String, defaultValue: String): Long = {
    Utils.byteStringAsGb(get(key, defaultValue))
  }

  /** Get a parameter as an Option */
  def getOption(key: String): Option[String] = {
    Option(settings.get(key)).orElse(getDeprecatedConfig(key, this))
  }

  /** Get all parameters as a list of pairs */
  def getAll: Array[(String, String)] = {
    settings.entrySet().asScala.map(x => (x.getKey, x.getValue)).toArray
  }

  /**
   * Get all parameters that start with `prefix`
   */
  def getAllWithPrefix(prefix: String): Array[(String, String)] = {
    getAll.filter { case (k, v) => k.startsWith(prefix) }
      .map { case (k, v) => (k.substring(prefix.length), v) }
  }

  /** Get a parameter as an integer, falling back to a default if not set */
  def getInt(key: String, defaultValue: Int): Int = {
    getOption(key).map(_.toInt).getOrElse(defaultValue)
  }

  /** Get a parameter as a long, falling back to a default if not set */
  def getLong(key: String, defaultValue: Long): Long = {
    getOption(key).map(_.toLong).getOrElse(defaultValue)
  }

  /** Get a parameter as a double, falling back to a default if not set */
  def getDouble(key: String, defaultValue: Double): Double = {
    getOption(key).map(_.toDouble).getOrElse(defaultValue)
  }

  /** Get a parameter as a boolean, falling back to a default if not set */
  def getBoolean(key: String, defaultValue: Boolean): Boolean = {
    getOption(key).map(_.toBoolean).getOrElse(defaultValue)
  }

  /** Get all executor environment variables set on this SparkConf */
  def getExecutorEnv: Seq[(String, String)] = {
    getAllWithPrefix("spark.executorEnv.")
  }

  /**
   * Returns the Spark application id, valid in the Driver after TaskScheduler registration and
   * from the start in the Executor.
   */
  def getAppId: String = get("spark.app.id")

  /** Does the configuration contain a given parameter? */
  def contains(key: String): Boolean = {
    settings.containsKey(key) ||
      configsWithAlternatives.get(key).toSeq.flatten.exists { alt => contains(alt.key) }
  }

  private[spark] def contains(entry: ConfigEntry[_]): Boolean = contains(entry.key)

  /** Copy this object */
  /** Leaf Note: clones the configuration held by this SparkConf into a brand-new SparkConf object */
  override def clone: SparkConf = {
    val cloned = new SparkConf(false)
    settings.entrySet().asScala.foreach { e =>
      cloned.set(e.getKey(), e.getValue(), true)
    }
    cloned
  }

  /**
   * By using this instead of System.getenv(), environment variables can be mocked
   * in unit tests.
   */
  private[spark] def getenv(name: String): String = System.getenv(name)

  /**
   * Checks for illegal or deprecated config settings. Throws an exception for the former. Not
   * idempotent - may mutate this conf object to convert deprecated settings to supported ones.
   */
  private[spark] def validateSettings() {
    if (contains("spark.local.dir")) {
      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
      logWarning(msg)
    }

    val executorOptsKey = "spark.executor.extraJavaOptions"
    val executorClasspathKey = "spark.executor.extraClassPath"
    val driverOptsKey = "spark.driver.extraJavaOptions"
    val driverClassPathKey = "spark.driver.extraClassPath"
    val driverLibraryPathKey = "spark.driver.extraLibraryPath"
    val sparkExecutorInstances = "spark.executor.instances"

    // Used by Yarn in 1.1 and before
    sys.props.get("spark.driver.libraryPath").foreach { value =>
      val warning =
        s"""
          |spark.driver.libraryPath was detected (set to '$value').
          |This is deprecated in Spark 1.2+.
          |
          |Please instead use: $driverLibraryPathKey
        """.stripMargin
      logWarning(warning)
    }

    // Validate spark.executor.extraJavaOptions
    getOption(executorOptsKey).foreach { javaOpts =>
      if (javaOpts.contains("-Dspark")) {
        val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " +
          "Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit."
        throw new Exception(msg)
      }
      if (javaOpts.contains("-Xmx")) {
        val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " +
          s"(was '$javaOpts'). Use spark.executor.memory instead."
        throw new Exception(msg)
      }
    }

    // Validate memory fractions
    val deprecatedMemoryKeys = Seq(
      "spark.storage.memoryFraction",
      "spark.shuffle.memoryFraction",
      "spark.shuffle.safetyFraction",
      "spark.storage.unrollFraction",
      "spark.storage.safetyFraction")
    val memoryKeys = Seq(
      "spark.memory.fraction",
      "spark.memory.storageFraction") ++
      deprecatedMemoryKeys
    for (key <- memoryKeys) {
      val value = getDouble(key, 0.5)
      if (value > 1 || value < 0) {
        throw new IllegalArgumentException(s"$key should be between 0 and 1 (was '$value').")
      }
    }

    // Warn against deprecated memory fractions (unless legacy memory management mode is enabled)
    val legacyMemoryManagementKey = "spark.memory.useLegacyMode"
    val legacyMemoryManagement = getBoolean(legacyMemoryManagementKey, false)
    if (!legacyMemoryManagement) {
      val keyset = deprecatedMemoryKeys.toSet
      val detected = settings.keys().asScala.filter(keyset.contains)
      if (detected.nonEmpty) {
        logWarning("Detected deprecated memory fraction settings: " +
          detected.mkString("[", ", ", "]") + ". As of Spark 1.6, execution and storage " +
          "memory management are unified. All memory fractions used in the old model are " +
          "now deprecated and no longer read. If you wish to use the old memory management, " +
          s"you may explicitly enable `$legacyMemoryManagementKey` (not recommended).")
      }
    }

    // Check for legacy configs
    sys.env.get("SPARK_JAVA_OPTS").foreach { value =>
      val warning =
        s"""
          |SPARK_JAVA_OPTS was detected (set to '$value').
          |This is deprecated in Spark 1.0+.
          |
          |Please instead use:
          | - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
          | - ./spark-submit with --driver-java-options to set -X options for a driver
          | - spark.executor.extraJavaOptions to set -X options for executors
          | - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)
        """.stripMargin
      logWarning(warning)

      for (key <- Seq(executorOptsKey, driverOptsKey)) {
        if (getOption(key).isDefined) {
          throw new SparkException(s"Found both $key and SPARK_JAVA_OPTS. Use only the former.")
        } else {
          logWarning(s"Setting '$key' to '$value' as a work-around.")
          set(key, value)
        }
      }
    }

    sys.env.get("SPARK_CLASSPATH").foreach { value =>
      val warning =
        s"""
          |SPARK_CLASSPATH was detected (set to '$value').
          |This is deprecated in Spark 1.0+.
          |
          |Please instead use:
          | - ./spark-submit with --driver-class-path to augment the driver classpath
          | - spark.executor.extraClassPath to augment the executor classpath
        """.stripMargin
      logWarning(warning)

      for (key <- Seq(executorClasspathKey, driverClassPathKey)) {
        if (getOption(key).isDefined) {
          throw new SparkException(s"Found both $key and SPARK_CLASSPATH. Use only the former.")
        } else {
          logWarning(s"Setting '$key' to '$value' as a work-around.")
          set(key, value)
        }
      }
    }

    if (!contains(sparkExecutorInstances)) {
      sys.env.get("SPARK_WORKER_INSTANCES").foreach { value =>
        val warning =
          s"""
            |SPARK_WORKER_INSTANCES was detected (set to '$value').
            |This is deprecated in Spark 1.0+.
            |
            |Please instead use:
            | - ./spark-submit with --num-executors to specify the number of executors
            | - Or set SPARK_EXECUTOR_INSTANCES
            | - spark.executor.instances to configure the number of instances in the spark config.
          """.stripMargin
        logWarning(warning)

        set("spark.executor.instances", value)
      }
    }

    if (contains("spark.master") && get("spark.master").startsWith("yarn-")) {
      val warning = s"spark.master ${get("spark.master")} is deprecated in Spark 2.0+, please " +
        "instead use \"yarn\" with specified deploy mode."

      get("spark.master") match {
        case "yarn-cluster" =>
          logWarning(warning)
          set("spark.master", "yarn")
          set("spark.submit.deployMode", "cluster")
        case "yarn-client" =>
          logWarning(warning)
          set("spark.master", "yarn")
          set("spark.submit.deployMode", "client")
        case _ => // Any other unexpected master will be checked when creating scheduler backend.
      }
    }

    if (contains("spark.submit.deployMode")) {
      get("spark.submit.deployMode") match {
        case "cluster" | "client" =>
        case e => throw new SparkException("spark.submit.deployMode can only be \"cluster\" or " +
          "\"client\".")
      }
    }
  }

  /**
   * Return a string listing all keys and values, one per line. This is useful to print the
   * configuration out for debugging.
   */
  def toDebugString: String = {
    getAll.sorted.map{case (k, v) => k + "=" + v}.mkString("\n")
  }

}

private[spark] object SparkConf extends Logging {

  /**
   * Maps deprecated config keys to information about the deprecation.
   *
   * The extra information is logged as a warning when the config is present in the user's
   * configuration.
   */
  private val deprecatedConfigs: Map[String, DeprecatedConfig] = {
    val configs = Seq(
      DeprecatedConfig("spark.cache.class", "0.8",
        "The spark.cache.class property is no longer being used! Specify storage levels using " +
        "the RDD.persist() method instead."),
      DeprecatedConfig("spark.yarn.user.classpath.first", "1.3",
        "Please use spark.{driver,executor}.userClassPathFirst instead."),
      DeprecatedConfig("spark.kryoserializer.buffer.mb", "1.4",
        "Please use spark.kryoserializer.buffer instead. The default value for " +
          "spark.kryoserializer.buffer.mb was previously specified as '0.064'. Fractional values " +
          "are no longer accepted. To specify the equivalent now, one may use '64k'."),
      DeprecatedConfig("spark.rpc", "2.0", "Not used any more."),
      DeprecatedConfig("spark.scheduler.executorTaskBlacklistTime", "2.1.0",
        "Please use the new blacklisting options, spark.blacklist.*")
    )

    Map(configs.map { cfg => (cfg.key -> cfg) } : _*)
  }

  /**
   * Maps a current config key to alternate keys that were used in previous version of Spark.
   *
   * The alternates are used in the order defined in this map. If deprecated configs are
   * present in the user's configuration, a warning is logged.
   */
  private val configsWithAlternatives = Map[String, Seq[AlternateConfig]](
    "spark.executor.userClassPathFirst" -> Seq(
      AlternateConfig("spark.files.userClassPathFirst", "1.3")),
    "spark.history.fs.update.interval" -> Seq(
      AlternateConfig("spark.history.fs.update.interval.seconds", "1.4"),
      AlternateConfig("spark.history.fs.updateInterval", "1.3"),
      AlternateConfig("spark.history.updateInterval", "1.3")),
    "spark.history.fs.cleaner.interval" -> Seq(
      AlternateConfig("spark.history.fs.cleaner.interval.seconds", "1.4")),
    "spark.history.fs.cleaner.maxAge" -> Seq(
      AlternateConfig("spark.history.fs.cleaner.maxAge.seconds", "1.4")),
    "spark.yarn.am.waitTime" -> Seq(
      AlternateConfig("spark.yarn.applicationMaster.waitTries", "1.3",
        // Translate old value to a duration, with 10s wait time per try.
        translation = s => s"${s.toLong * 10}s")),
    "spark.reducer.maxSizeInFlight" -> Seq(
      AlternateConfig("spark.reducer.maxMbInFlight", "1.4")),
    "spark.kryoserializer.buffer" -> Seq(
      AlternateConfig("spark.kryoserializer.buffer.mb", "1.4",
        translation = s => s"${(s.toDouble * 1000).toInt}k")),
    "spark.kryoserializer.buffer.max" -> Seq(
      AlternateConfig("spark.kryoserializer.buffer.max.mb", "1.4")),
    "spark.shuffle.file.buffer" -> Seq(
      AlternateConfig("spark.shuffle.file.buffer.kb", "1.4")),
    "spark.executor.logs.rolling.maxSize" -> Seq(
      AlternateConfig("spark.executor.logs.rolling.size.maxBytes", "1.4")),
    "spark.io.compression.snappy.blockSize" -> Seq(
      AlternateConfig("spark.io.compression.snappy.block.size", "1.4")),
    "spark.io.compression.lz4.blockSize" -> Seq(
      AlternateConfig("spark.io.compression.lz4.block.size", "1.4")),
    "spark.rpc.numRetries" -> Seq(
      AlternateConfig("spark.akka.num.retries", "1.4")),
    "spark.rpc.retry.wait" -> Seq(
      AlternateConfig("spark.akka.retry.wait", "1.4")),
    "spark.rpc.askTimeout" -> Seq(
      AlternateConfig("spark.akka.askTimeout", "1.4")),
    "spark.rpc.lookupTimeout" -> Seq(
      AlternateConfig("spark.akka.lookupTimeout", "1.4")),
    "spark.streaming.fileStream.minRememberDuration" -> Seq(
      AlternateConfig("spark.streaming.minRememberDuration", "1.5")),
    "spark.yarn.max.executor.failures" -> Seq(
      AlternateConfig("spark.yarn.max.worker.failures", "1.5")),
    "spark.memory.offHeap.enabled" -> Seq(
      AlternateConfig("spark.unsafe.offHeap", "1.6")),
    "spark.rpc.message.maxSize" -> Seq(
      AlternateConfig("spark.akka.frameSize", "1.6")),
    "spark.yarn.jars" -> Seq(
      AlternateConfig("spark.yarn.jar", "2.0"))
    )

  /**
   * A view of `configsWithAlternatives` that makes it more efficient to look up deprecated
   * config keys.
   *
   * Maps the deprecated config name to a 2-tuple (new config name, alternate config info).
   */
  private val allAlternatives: Map[String, (String, AlternateConfig)] = {
    configsWithAlternatives.keys.flatMap { key =>
      configsWithAlternatives(key).map { cfg => (cfg.key -> (key -> cfg)) }
    }.toMap
  }

  /**
   * Return whether the given config should be passed to an executor on start-up.
   *
   * Certain authentication configs are required from the executor when it connects to
   * the scheduler, while the rest of the spark configs can be inherited from the driver later.
   */
  def isExecutorStartupConf(name: String): Boolean = {
    (name.startsWith("spark.auth") && name != SecurityManager.SPARK_AUTH_SECRET_CONF) ||
    name.startsWith("spark.ssl") ||
    name.startsWith("spark.rpc") ||
    isSparkPortConf(name)
  }

  /**
   * Return true if the given config matches either `spark.*.port` or `spark.port.*`.
   */
  def isSparkPortConf(name: String): Boolean = {
    (name.startsWith("spark.") && name.endsWith(".port")) || name.startsWith("spark.port.")
  }

  /**
   * Looks for available deprecated keys for the given config option, and return the first
   * value available.
   */
  def getDeprecatedConfig(key: String, conf: SparkConf): Option[String] = {
    configsWithAlternatives.get(key).flatMap { alts =>
      alts.collectFirst { case alt if conf.contains(alt.key) =>
        val value = conf.get(alt.key)
        if (alt.translation != null) alt.translation(value) else value
      }
    }
  }

  /**
   * Logs a warning message if the given config key is deprecated.
   */
  def logDeprecationWarning(key: String): Unit = {
    deprecatedConfigs.get(key).foreach { cfg =>
      logWarning(
        s"The configuration key '$key' has been deprecated as of Spark ${cfg.version} and " +
        s"may be removed in the future. ${cfg.deprecationMessage}")
      return
    }

    allAlternatives.get(key).foreach { case (newKey, cfg) =>
      logWarning(
        s"The configuration key '$key' has been deprecated as of Spark ${cfg.version} and " +
        s"may be removed in the future. Please use the new key '$newKey' instead.")
      return
    }
    if (key.startsWith("spark.akka") || key.startsWith("spark.ssl.akka")) {
      logWarning(
        s"The configuration key $key is not supported any more " +
        s"because Spark doesn't use Akka since 2.0")
    }
  }

  /**
   * Holds information about keys that have been deprecated and do not have a replacement.
   *
   * @param key The deprecated key.
   * @param version Version of Spark where key was deprecated.
   * @param deprecationMessage Message to include in the deprecation warning.
   */
  private case class DeprecatedConfig(
      key: String,
      version: String,
      deprecationMessage: String)

  /**
   * Information about an alternate configuration key that has been deprecated.
   *
   * @param key The deprecated config key.
   * @param version The Spark version in which the key was deprecated.
   * @param translation A translation function for converting old config values into new ones.
   */
  private case class AlternateConfig(
      key: String,
      version: String,
      translation: String => String = null)

}
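To make the clone and deprecated-key machinery above a bit more concrete, here is a small sketch of my own (the object name and the chosen keys are just an example, not part of the Spark source; it assumes spark-core 2.1.0 on the classpath). Cloning copies every entry silently into a fresh SparkConf so components do not share one mutable object, and getOption falls back through configsWithAlternatives, so a value set under an old key such as spark.kryoserializer.buffer.mb should still be visible under the new key after the registered translation function is applied:

import org.apache.spark.SparkConf

object SparkConfCloneAndDeprecationDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(false)   // skip system properties, as in unit tests

    // Setting an old key goes through set(key, value, silent = false),
    // which calls logDeprecationWarning and logs a warning.
    conf.set("spark.kryoserializer.buffer.mb", "64")

    // getOption first checks settings, then getDeprecatedConfig, which walks
    // configsWithAlternatives and applies the translation (here "64" -> "64000k").
    println(conf.get("spark.kryoserializer.buffer"))   // expected: 64000k

    // clone copies every entry into a new SparkConf (silently), so each
    // component can hold its own copy instead of sharing one mutable object.
    val copy = conf.clone
    copy.set("spark.app.name", "copy-only")
    println(conf.contains("spark.app.name"))           // expected: false (original unaffected)
  }
}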