1. 程式人生 > >如何檢視SparkSQL 生成的抽象語法樹?

如何檢視SparkSQL 生成的抽象語法樹?

前言

    在《Spark SQL核心剖析》書中4.3章節,談到Catalyst體系中生成的抽象語法樹的節點都是以Context來結尾,在ANLTR4以及生成的SqlBaseParser解析SQL生成,其原始碼部分就是語法解析,其生成的抽象語法樹的節點都是ParserRuleContext的子類。

提出問題

    ANLTR4解析SQL生成抽象語法樹,最終這顆樹長成什麼樣子,如何檢視?

原始碼分析

測試示例

spark.sql("select id, count(name) from student group by id").show()

原始碼入口

SparkSession的sql 方法如下:

def sql(sqlText: String): DataFrame = {
    // TODO 1. 生成LogicalPlan
    // sqlParser 為 SparkSqlParser
    val logicalPlan: LogicalPlan = sessionState.sqlParser.parsePlan(sqlText)
    // 根據 LogicalPlan
    val frame: DataFrame = Dataset.ofRows(self, logicalPlan)
    frame // sqlParser
  }

定位SparkSqlParser

入口原始碼涉及到SessionState這個關鍵類,其初始化程式碼如下:

lazy val sessionState: SessionState = {
    parentSessionState
      .map(_.clone(this))
      .getOrElse {
        // 構建 org.apache.spark.sql.internal.SessionStateBuilder
        val state = SparkSession.instantiateSessionState(
          SparkSession.sessionStateClassName(sparkContext.conf),
          self)
        initialSessionOptions.foreach { case (k, v) => state.conf.setConfString(k, v) }
        state
      }
  }

org.apache.spark.sql.SparkSession$#sessionStateClassName 方法具體如下:

private def sessionStateClassName(conf: SparkConf): String = {
    // spark.sql.catalogImplementation, 分為 hive 和 in-memory模式,預設為 in-memory 模式
    conf.get(CATALOG_IMPLEMENTATION) match {
      case "hive" => HIVE_SESSION_STATE_BUILDER_CLASS_NAME // hive 實現 org.apache.spark.sql.hive.HiveSessionStateBuilder
      case "in-memory" => classOf[SessionStateBuilder].getCanonicalName // org.apache.spark.sql.internal.SessionStateBuilder
    }
  }

其中,這裡用到了builder模式,org.apache.spark.sql.internal.SessionStateBuilder就是用來構建 SessionState的。在 SparkSession.instantiateSessionState 中有具體說明,如下:

/**
   * Helper method to create an instance of `SessionState` based on `className` from conf.
   * The result is either `SessionState` or a Hive based `SessionState`.
   */
  private def instantiateSessionState(
      className: String,
      sparkSession: SparkSession): SessionState = {
    try {
      // org.apache.spark.sql.internal.SessionStateBuilder
      // invoke `new [Hive]SessionStateBuilder(SparkSession, Option[SessionState])`
      val clazz = Utils.classForName(className)
      val ctor = clazz.getConstructors.head
      ctor.newInstance(sparkSession, None).asInstanceOf[BaseSessionStateBuilder].build()
    } catch {
      case NonFatal(e) =>
        throw new IllegalArgumentException(s"Error while instantiating '$className':", e)
    }
  }

其中,BaseSessionStateBuilder下面有兩個主要實現,分別為 org.apache.spark.sql.hive.HiveSessionStateBuilder(hive模式) 和 org.apache.spark.sql.internal.SessionStateBuilder(in-memory模式,預設)

org.apache.spark.sql.internal.BaseSessionStateBuilder#build 方法,原始碼如下:

/**
   * Build the [[SessionState]].
   */
  def build(): SessionState = {
    new SessionState(
      session.sharedState,
      conf,
      experimentalMethods,
      functionRegistry,
      udfRegistration,
      () => catalog,
      sqlParser,
      () => analyzer,
      () => optimizer,
      planner,
      streamingQueryManager,
      listenerManager,
      () => resourceLoader,
      createQueryExecution,
      createClone)
  }

SessionState中,包含了很多的引數,關鍵引數介紹如下:

conf:SparkConf物件,對SparkSession的配置

functionRegistry:FunctionRegistry物件,負責函式的註冊,其內部維護了一個map物件用於維護註冊的函式。

UDFRegistration:UDFRegistration物件,用於註冊UDF函式,其依賴於FunctionRegistry

catalogBuilder: () => SessionCatalog:返回SessionCatalog物件,其主要用於管理SparkSession的Catalog

sqlParser: ParserInterface, 實際為 SparkSqlParser 例項,其內部呼叫ASTBuilder將SQL解析為抽象語法樹

analyzerBuilder: () => Analyzer, org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer 自定義 org.apache.spark.sql.catalyst.analysis.Analyzer.Analyzer

optimizerBuilder: () => Optimizer, // org.apache.spark.sql.internal.BaseSessionStateBuilder.optimizer --> 自定義 org.apache.spark.sql.execution.SparkOptimizer.SparkOptimizer

planner: SparkPlanner, // org.apache.spark.sql.internal.BaseSessionStateBuilder.planner --> 自定義 org.apache.spark.sql.execution.SparkPlanner.SparkPlanner

resourceLoaderBuilder: () => SessionResourceLoader,返回資源載入器,主要用於載入函式的jar或資源

createQueryExecution: LogicalPlan => QueryExecution:根據LogicalPlan生成QueryExecution物件

parsePlan方法

SparkSqlParser沒有該方法的實現,具體是現在其父類 AbstractSqlParser中,如下:

/** Creates LogicalPlan for a given SQL string. */
    // TODO 根據 sql語句生成 邏輯計劃 LogicalPlan
  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
      val singleStatementContext: SqlBaseParser.SingleStatementContext = parser.singleStatement()
    astBuilder.visitSingleStatement(singleStatementContext) match {
      case plan: LogicalPlan => plan
      case _ =>
        val position = Origin(None, None)
        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    }
  }

其中 parse 方法後面的方法是一個回撥函式,它在parse 方法中被呼叫,如下:

org.apache.spark.sql.execution.SparkSqlParser#parse原始碼如下:

private val substitutor = new VariableSubstitution(conf) // 引數替換器

  protected override def parse[T](command: String)(toResult: SqlBaseParser => T): T = {
    super.parse(substitutor.substitute(command))(toResult)
  }

其中,substitutor是一個引數替換器,用於把SQL中的引數都替換掉,繼續看其父類AbstractSqlParser的parse 方法:

protected def parse[T](command: String)(toResult: SqlBaseParser => T): T = {
    logDebug(s"Parsing command: $command")

    // 詞法分析
    val lexer = new SqlBaseLexer(new UpperCaseCharStream(CharStreams.fromString(command)))
    lexer.removeErrorListeners()
    lexer.addErrorListener(ParseErrorListener)
    lexer.legacy_setops_precedence_enbled = SQLConf.get.setOpsPrecedenceEnforced

    // 語法分析
    val tokenStream = new CommonTokenStream(lexer)
    val parser = new SqlBaseParser(tokenStream)
    parser.addParseListener(PostProcessor)
    parser.removeErrorListeners()
    parser.addErrorListener(ParseErrorListener)
    parser.legacy_setops_precedence_enbled = SQLConf.get.setOpsPrecedenceEnforced

    try {
      try {
        // first, try parsing with potentially faster SLL mode
        parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
        // 使用 AstBuilder 生成 Unresolved LogicalPlan
        toResult(parser)
      }
      catch {
        case e: ParseCancellationException =>
          // if we fail, parse with LL mode
          tokenStream.seek(0) // rewind input stream
          parser.reset()

          // Try Again.
          parser.getInterpreter.setPredictionMode(PredictionMode.LL)
          toResult(parser)
      }
    }
    catch {
      case e: ParseException if e.command.isDefined =>
        throw e
      case e: ParseException =>
        throw e.withCommand(command)
      case e: AnalysisException =>
        val position = Origin(e.line, e.startPosition)
        throw new ParseException(Option(command), e.message, position, position)
    }
  }

在這個方法中呼叫ANLTR4的API將SQL轉換為AST抽象語法樹,然後呼叫 toResult(parser) 方法,這個 toResult 方法就是parsePlan 方法的回撥方法。

截止到呼叫astBuilder.visitSingleStatement 方法之前, AST抽象語法樹已經生成。

列印生成的AST

修改原始碼

下面,看 astBuilder.visitSingleStatement  方法:

override def visitSingleStatement(ctx: SingleStatementContext): LogicalPlan = withOrigin(ctx) {
    val statement: StatementContext = ctx.statement
    printRuleContextInTreeStyle(statement, 1)
    // 呼叫accept 生成 邏輯運算元樹AST
    visit(statement).asInstanceOf[LogicalPlan]
  }

在使用訪問者模式訪問AST節點生成UnResolved LogicalPlan之前,我定義了一個方法用來列印剛解析生成的抽象語法樹, printRuleContextInTreeStyle 程式碼如下:

/**
   * 樹形列印抽象語法樹
   */
  private def printRuleContextInTreeStyle(ctx: ParserRuleContext, level:Int): Unit = {
    val prefix:String = "|"
    val curLevelStr: String = "-" * level
    val childLevelStr: String = "-" * (level + 1)
    println(s"${prefix}${curLevelStr} ${ctx.getClass.getCanonicalName}")
    val children: util.List[ParseTree] = ctx.children
    if( children == null || children.size() == 0) {
      return
    }
    children.iterator().foreach {
      case context: ParserRuleContext => printRuleContextInTreeStyle(context, level + 1)
      case _ => println(s"${prefix}${childLevelStr} ${ctx.getClass.getCanonicalName}")
    }
  }

三種SQL列印示例

SQL示例1(帶where)

select name from student where age > 18

其生成的AST如下:

|- org.apache.spark.sql.catalyst.parser.SqlBaseParser.StatementDefaultContext
|-- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryContext
|--- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SingleInsertQueryContext
|---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryTermDefaultContext
|----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryPrimaryDefaultContext
|------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.RelationContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableNameContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableIdentifierContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableAliasContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ConstantDefaultContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NumericLiteralContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
|---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext

SQL示例2(帶排序)

select name from student where age > 18 order by id desc

其生成的AST如下:

|- org.apache.spark.sql.catalyst.parser.SqlBaseParser.StatementDefaultContext
|-- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryContext
|--- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SingleInsertQueryContext
|---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryTermDefaultContext
|----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryPrimaryDefaultContext
|------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.RelationContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableNameContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableIdentifierContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableAliasContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ConstantDefaultContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NumericLiteralContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
|---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext
|----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext
|----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext
|----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SortItemContext
|------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.SortItemContext

SQL示例2(帶分組)

select id, count(name) from student group by id

其生成的AST如下:

|- org.apache.spark.sql.catalyst.parser.SqlBaseParser.StatementDefaultContext
|-- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryContext
|--- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SingleInsertQueryContext
|---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryTermDefaultContext
|----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryPrimaryDefaultContext
|------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.FunctionCallContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QualifiedNameContext
|-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|---------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FunctionCallContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
|-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|---------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|----------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|------------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FunctionCallContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.RelationContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableNameContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableIdentifierContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableAliasContext
|------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.AggregationContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.AggregationContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.AggregationContext
|-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
|--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
|---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
|----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
|------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
|------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
|---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext

總結

在本篇文章中,主要從測試程式碼出發,到如何呼叫ANTLR4解析SQL得到生成AST,並且修改了原始碼來列印這個AST樹。儘管現在看來,使用ANTLR解析SQL生成AST是一個black box,但對於Spark SQL來說,其後續流程的輸入已經得