[6] Hive 3.x SemanticAnalyzer and CalcitePlanner: Materialized View Source Code, Part 02

阿新 • Published: 2018-12-09

Continued from "Hive 3.x SemanticAnalyzer and CalcitePlanner: Materialized View Source Code" (Part 01).

SemanticAnalyzer

void analyzeInternal(ASTNode ast, PlannerContextFactory pcf) {
    ...
    // 1. Generate Resolved Parse tree from syntax tree
    boolean needsTransform = needsTransform();

    // 2. Gen OP Tree from resolved Parse Tree
    Operator sinkOp = genOPTree(ast, plannerCtx); // enters CalcitePlanner::genOPTree
    // --- to be continued
}

CalcitePlanner::genOPTree

Here the input parameter is Hive's ASTNode.
Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) {
    ...
    // 1. Gen Optimized AST
    ASTNode newAST = getOptimizedAST();
    ...
}

CalcitePlanner::getOptimizedAST

  ASTNode getOptimizedAST() throws SemanticException {
    // Optimize the query with Calcite, producing a Calcite RelNode
    RelNode optimizedOptiqPlan = logicalPlan();
    // Convert the RelNode back into a Hive ASTNode
    ASTNode optiqOptimizedAST = ASTConverter.convert(optimizedOptiqPlan, resultSchema,
        HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_COLUMN_ALIGNMENT));
    return optiqOptimizedAST;
  }
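To make the round trip concrete, here is a minimal sketch that models the chain genOPTree -> getOptimizedAST -> logicalPlan -> ASTConverter.convert as three composed stages: Hive AST to Calcite plan, Calcite optimization, and conversion back to a Hive AST. It assumes plain strings as stand-ins for ASTNode and RelNode; the class and field names (RoundTripSketch, TO_REL, OPTIMIZE, TO_AST) are hypothetical and are not Hive APIs.

```java
import java.util.function.Function;

// Hypothetical model of the round trip driven by genOPTree/getOptimizedAST.
// Plain strings stand in for Hive ASTNode and Calcite RelNode objects.
class RoundTripSketch {
    // Stand-in for building the Calcite logical plan from the Hive AST.
    static final Function<String, String> TO_REL = ast -> "Rel[" + ast + "]";
    // Stand-in for the Calcite optimization run inside the planner action.
    static final Function<String, String> OPTIMIZE = rel -> "Opt" + rel;
    // Stand-in for ASTConverter.convert: optimized plan back to a Hive AST.
    static final Function<String, String> TO_AST = rel -> "AST(" + rel + ")";

    // The three stages are applied strictly in this order.
    static String optimizedAst(String ast) {
        return TO_REL.andThen(OPTIMIZE).andThen(TO_AST).apply(ast);
    }

    public static void main(String[] args) {
        // Prints: AST(OptRel[select * from t])
        System.out.println(optimizedAst("select * from t"));
    }
}
```

The point of the composition is that Calcite never produces Hive operators directly in this path: its output is turned back into a Hive AST, which the rest of genOPTree then consumes.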

CalcitePlanner::logicalPlan

RelNode logicalPlan() throws SemanticException {
    RelNode optimizedOptiqPlan = null;
    CalcitePlannerAction calcitePlannerAction = null;

    /**
     * ColumnAccessInfo: map of table name to names of accessed columns
     */
    this.columnAccessInfo = new ColumnAccessInfo();
    /**
     * CalcitePlannerAction is code responsible for Calcite plan generation and optimization.
     */
    calcitePlannerAction = new CalcitePlannerAction(
        prunedPartitions,
        ctx.getOpContext().getColStatsCache(),
        this.columnAccessInfo);

    // Calcite optimizes the plan: Frameworks.withPlanner calls back into
    // CalcitePlanner::CalcitePlannerAction::apply(), which does the actual work
    optimizedOptiqPlan = Frameworks.withPlanner(calcitePlannerAction, Frameworks
        .newConfigBuilder().typeSystem(new HiveTypeSystemImpl()).build());
    return optimizedOptiqPlan;
  }
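Frameworks.withPlanner follows a callback style: the framework sets up the planning environment and then hands it to the caller's action, which is why all of Hive's planning logic lives in CalcitePlannerAction.apply(cluster, relOptSchema, rootSchema). The following is a dependency-free sketch of that pattern only; PlannerActionSketch and FrameworksSketch are hypothetical names, and strings stand in for Calcite's RelOptCluster, RelOptSchema and SchemaPlus.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free model of the Frameworks.withPlanner callback style.
// In Calcite the arguments are a RelOptCluster, a RelOptSchema and a SchemaPlus.
interface PlannerActionSketch<R> {
    R apply(String cluster, String relOptSchema, String rootSchema);
}

final class FrameworksSketch {
    static final List<String> LOG = new ArrayList<>();

    // The framework owns the setup; the caller only supplies the action.
    static <R> R withPlanner(PlannerActionSketch<R> action, String typeSystem) {
        String cluster = "cluster(" + typeSystem + ")";
        LOG.add("created " + cluster);
        R result = action.apply(cluster, "relOptSchema", "rootSchema");
        LOG.add("action finished");
        return result;
    }

    public static void main(String[] args) {
        String plan = withPlanner(
            (cluster, schema, root) -> "optimized plan built on " + cluster,
            "HiveTypeSystemImpl");
        System.out.println(plan);
        System.out.println(LOG);
    }
}
```

Because the action receives the environment rather than constructing it, apply() is free to discard parts of it; the Hive code below does exactly that when it recreates the cluster around its own HiveVolcanoPlanner.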

CalcitePlanner::CalcitePlannerAction

CalcitePlannerAction is code responsible for Calcite plan generation and optimization.

CalcitePlanner::CalcitePlannerAction::apply

    @Override
    public RelNode apply(RelOptCluster cluster, RelOptSchema relOptSchema, SchemaPlus rootSchema) {
      RelNode calciteGenPlan = null;
      RelNode calcitePreCboPlan = null;
      RelNode calciteOptimizedPlan = null;
      subqueryId = -1;

      /*
       * Recreate the cluster so that it picks up the additional traitDef.
       * The following sets keep track of whether a subquery is correlated and contains
       * an aggregate, since this is special-cased when it is rewritten in SubqueryRemoveRule:
       * Set<RelNode> corrScalarRexSQWithAgg = new HashSet<RelNode>();
       * Set<RelNode> scalarAggNoGbyNoWin = new HashSet<RelNode>();
       * conf is the Hive configuration.
       */
      // Creates a HiveVolcanoPlanner
      RelOptPlanner planner = createPlanner(conf, corrScalarRexSQWithAgg, scalarAggNoGbyNoWin);
      
      final RexBuilder rexBuilder = cluster.getRexBuilder();
      final RelOptCluster optCluster = RelOptCluster.create(planner, rexBuilder);
      this.cluster = optCluster;
      this.relOptSchema = relOptSchema;

      // 1. Gen Calcite Plan
      perfLogger.PerfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER);
      
      ...
      // getQB() returns the QB (implementation of the query block), i.e. the result of syntax analysis
      // Build calciteGenPlan (a RelNode) from the QB
      calciteGenPlan = genLogicalPlan(getQB(), true, null, null);
      ...

      // Validate query materialization (materialized views, query results caching).
      // This check needs to occur before constant folding, which may remove some
      // function calls from the query plan.
      HiveRelOpMaterializationValidator matValidator = new HiveRelOpMaterializationValidator();
      matValidator.validateQueryMaterialization(calciteGenPlan);
      if (!matValidator.isValidMaterialization()) {
        String reason = matValidator.getInvalidMaterializationReason();
        setInvalidQueryMaterializationReason(reason);
      }

      // Create executor
      RexExecutor executorProvider = new HiveRexExecutorImpl(optCluster);
      calciteGenPlan.getCluster().getPlanner().setExecutor(executorProvider);

      // We need to get the ColumnAccessInfo and viewToTableSchema for views.
      HiveRelFieldTrimmer fieldTrimmer = new HiveRelFieldTrimmer(null,
          HiveRelFactories.HIVE_BUILDER.create(optCluster, null), this.columnAccessInfo,
          this.viewProjectToTableSchema);

      fieldTrimmer.trim(calciteGenPlan);

      // Create and set MD provider
      HiveDefaultRelMetadataProvider mdProvider = new HiveDefaultRelMetadataProvider(conf);
      RelMetadataQuery.THREAD_PROVIDERS.set(
              JaninoRelMetadataProvider.of(mdProvider.getMetadataProvider()));

      //Remove subquery
      calciteGenPlan = hepPlan(calciteGenPlan, false, mdProvider.getMetadataProvider(), null,
              new HiveSubQueryRemoveRule(conf));
    
      calciteGenPlan = HiveRelDecorrelator.decorrelateQuery(calciteGenPlan);
      LOG.debug("Plan after decorrelation:\n" + RelOptUtil.toString(calciteGenPlan));

      // 2. Apply pre-join order optimizations
      calcitePreCboPlan = applyPreJoinOrderingTransforms(calciteGenPlan,
              mdProvider.getMetadataProvider(), executorProvider);

      // 3. Materialized view based rewriting
      // --- to be continued ...
      // We disable it for CTAS and MV creation queries (trying to avoid any problem
      // due to data freshness)
      if (conf.getBoolVar(ConfVars.HIVE_MATERIALIZED_VIEW_ENABLE_AUTO_REWRITING) &&
              !getQB().isMaterializedView() && !ctx.isLoadingMaterializedView() && !getQB().isCTAS()) {
        calcitePreCboPlan = applyMaterializedViewRewriting(planner,
            calcitePreCboPlan, mdProvider.getMetadataProvider(), executorProvider);
      }

      // 4. Apply join order optimizations: reordering MST algorithm
      
      // 5. Run other optimizations that do not need stats

      // 6. Run aggregate-join transpose (cost based)
      //    If it failed because of missing stats, we continue with
      //    the rest of optimizations
      
      // 7. Convert Join + GBy to semijoin
      // run this rule at later stages, since many calcite rules can't deal with semijoin
      
      // 8. convert SemiJoin + GBy to SemiJoin
     
      // 9. Get rid of sq_count_check if group by key is constant (HIVE-)
      
      // 10. Run rule to fix windowing issue when it is done over
      // aggregation columns (HIVE-10627)
      
      // 11. Apply Druid transformation rules
      
      // 12. Run rules to aid in translation from Calcite tree to Hive tree
      //     12.2. Introduce exchange operators below join/multijoin operators
        
      return calciteOptimizedPlan;
    }
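Two checks in apply() bear directly on materialized views: the materialization validation that must run before constant folding, and the condition that gates step 3. Below is a minimal sketch of both, under stated assumptions. The set of "non-materializable" functions is hypothetical (the real rules live in HiveRelOpMaterializationValidator); the gate simply mirrors the boolean expression in the source. A call such as current_timestamp is the kind of thing constant folding could turn into a literal, which would hide it from a later check.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of two checks in CalcitePlannerAction.apply():
// (a) query-materialization validation, which runs before constant folding, and
// (b) the gate that decides whether materialized-view rewriting runs.
final class MvChecksSketch {
    // Hypothetical list of calls that make a result unsafe to materialize or cache.
    static final Set<String> NON_MATERIALIZABLE = Set.of("current_timestamp", "rand");

    // Returns a reason if any function call in the plan is non-materializable.
    static Optional<String> invalidReason(List<String> functionCalls) {
        for (String fn : functionCalls) {
            if (NON_MATERIALIZABLE.contains(fn)) {
                return Optional.of("query uses " + fn);
            }
        }
        return Optional.empty();
    }

    // Mirrors the condition guarding applyMaterializedViewRewriting(...).
    static boolean shouldRewriteWithMaterializedViews(boolean autoRewritingEnabled,
            boolean isMaterializedView, boolean isLoadingMaterializedView, boolean isCtas) {
        return autoRewritingEnabled && !isMaterializedView && !isLoadingMaterializedView && !isCtas;
    }

    public static void main(String[] args) {
        System.out.println(invalidReason(List.of("upper", "current_timestamp")));
        System.out.println(shouldRewriteWithMaterializedViews(true, false, false, true));
    }
}
```

Note that a CTAS query never reaches step 3 even with auto rewriting enabled, which matches the "data freshness" comment in the source.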

createPlanner

private static RelOptPlanner createPlanner(
      HiveConf conf, Set<RelNode> corrScalarRexSQWithAgg, Set<RelNode> scalarAggNoGbyNoWin) {
    // Split-size and memory configuration parameters; not the focus here
    final Double maxSplitSize = (double) HiveConf.getLongVar(
            conf, HiveConf.ConfVars.MAPREDMAXSPLITSIZE);
    final Double maxMemory = (double) HiveConf.getLongVar(
            conf, HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD);
    HiveAlgorithmsConf algorithmsConf = new HiveAlgorithmsConf(maxSplitSize, maxMemory);
    
    // Rule registry
    /***
    public class HiveRulesRegistry {
       private SetMultimap<RelOptRule, RelNode> registryVisited;
       private ListMultimap<RelNode,Set<String>> registryPushedPredicates;
       }
    */
    HiveRulesRegistry registry = new HiveRulesRegistry();
    
    // Configuration properties: "timeZone" -> "Asia/Shanghai" (the author's local time zone); "materializationsEnabled" -> "false"
    Properties calciteConfigProperties = new Properties();
    calciteConfigProperties.setProperty(
        CalciteConnectionProperty.TIME_ZONE.camelName(),
        conf.getLocalTimeZone().getId());
    calciteConfigProperties.setProperty(
        CalciteConnectionProperty.MATERIALIZATIONS_ENABLED.camelName(),
        Boolean.FALSE.toString());
        
    // CalciteConnectionConfig   
    CalciteConnectionConfig calciteConfig = new CalciteConnectionConfigImpl(calciteConfigProperties);
    
    // Configuration parameters: isCorrelatedColumns = true; heuristicMaterializationStrategy = true
    boolean isCorrelatedColumns = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_STATS_CORRELATED_MULTI_KEY_JOINS);
    boolean heuristicMaterializationStrategy = HiveConf.getVar(conf,
        HiveConf.ConfVars.HIVE_MATERIALIZED_VIEW_REWRITING_SELECTION_STRATEGY).equals("heuristic");
    // Planner context
    HivePlannerContext confContext = new HivePlannerContext(algorithmsConf, registry, calciteConfig,
        corrScalarRexSQWithAgg, scalarAggNoGbyNoWin,
        new HiveConfPlannerContext(isCorrelatedColumns, heuristicMaterializationStrategy));
        
    return HiveVolcanoPlanner.createPlanner(confContext);
  }
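The configuration that createPlanner assembles can be reproduced with plain java.util.Properties. This sketch assumes the camelName() values quoted in the comment above ("timeZone", "materializationsEnabled") and uses java.util.TimeZone where Hive calls conf.getLocalTimeZone().getId(); the class and method names are hypothetical. It also shows the "heuristic" comparison that feeds HiveConfPlannerContext.

```java
import java.util.Properties;
import java.util.TimeZone;

// Sketch of the configuration createPlanner() assembles, using only java.util.
// Keys are the camelName() strings quoted in the comment above; the strategy string
// stands for the value of HIVE_MATERIALIZED_VIEW_REWRITING_SELECTION_STRATEGY.
final class PlannerConfigSketch {
    static Properties calciteProperties(TimeZone localTimeZone) {
        Properties props = new Properties();
        props.setProperty("timeZone", localTimeZone.getID());
        // Calcite's built-in materialization support is switched off.
        props.setProperty("materializationsEnabled", Boolean.FALSE.toString());
        return props;
    }

    // Null-safe variant of the equals("heuristic") check in createPlanner().
    static boolean isHeuristic(String selectionStrategy) {
        return "heuristic".equals(selectionStrategy);
    }

    public static void main(String[] args) {
        System.out.println(calciteProperties(TimeZone.getTimeZone("Asia/Shanghai")));
        System.out.println(isHeuristic("heuristic"));
    }
}
```

Turning off "materializationsEnabled" presumably reflects that Hive performs materialized-view rewriting itself, in step 3 of apply(), rather than relying on Calcite's connection-level materialization handling.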

HiveVolcanoPlanner

public class HiveVolcanoPlanner extends VolcanoPlanner {
  private static final boolean ENABLE_COLLATION_TRAIT = true;

  private final boolean isHeuristic;

  /** Creates a HiveVolcanoPlanner. */
  public HiveVolcanoPlanner(HivePlannerContext conf) {
    // Set the cost factory; see HiveCost for details
    super(HiveCost.FACTORY, conf);
    isHeuristic = conf.unwrap(HiveConfPlannerContext.class).isHeuristicMaterializationStrategy();
  }

  public static RelOptPlanner createPlanner(HivePlannerContext conf) {
    final VolcanoPlanner planner = new HiveVolcanoPlanner(conf);
    planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
    if (ENABLE_COLLATION_TRAIT) {
      planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
    }
    return planner;
  }

  @Override
  public void registerClass(RelNode node) {
    if (node instanceof DruidQuery) {
      // Special handling for Druid rules here as otherwise
      // planner will add Druid rules with logical builder
      addRule(HiveDruidRules.FILTER);
      ...
      return;
    }
    super.registerClass(node);
  }

  /**
   * The method extends the logic of the super method to decrease
   * the cost of the plan if it contains materialized views
   * (heuristic).
   */
  public RelOptCost getCost(RelNode rel, RelMetadataQuery mq) {
    ...
    return cost;
  }
}
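The getCost override is elided above, but its javadoc states the idea: when the heuristic strategy is on, plans that contain materialized views should look cheaper. Here is a minimal sketch of that idea as a cumulative cost with a discount on materialized-view scans. The Node type and the factor MV_DISCOUNT = 0.1 are illustrative assumptions, not Hive's actual rule or value; consult the real method body for those.

```java
import java.util.List;

// Hypothetical sketch of a "heuristic" cost in the spirit of HiveVolcanoPlanner.getCost():
// cumulative cost, with scans over materialized views made cheaper.
// Node and MV_DISCOUNT are illustrative assumptions.
final class HeuristicCostSketch {
    static final double MV_DISCOUNT = 0.1; // hypothetical discount factor

    static final class Node {
        final double ownCost;     // non-cumulative cost of this operator
        final boolean mvScan;     // true if this node scans a materialized view
        final List<Node> inputs;

        Node(double ownCost, boolean mvScan, List<Node> inputs) {
            this.ownCost = ownCost;
            this.mvScan = mvScan;
            this.inputs = inputs;
        }
    }

    // Cumulative cost: own cost plus the cost of every input subtree.
    static double cost(Node node, boolean isHeuristic) {
        double total = node.ownCost;
        for (Node input : node.inputs) {
            total += cost(input, isHeuristic);
        }
        if (isHeuristic && node.mvScan) {
            total *= MV_DISCOUNT; // bias the planner towards plans that read a materialized view
        }
        return total;
    }

    public static void main(String[] args) {
        Node baseScan = new Node(100, false, List.of());
        Node mvScan = new Node(50, true, List.of());
        Node join = new Node(10, false, List.of(baseScan, mvScan));
        System.out.println(cost(join, true));   // 115.0
        System.out.println(cost(join, false));  // 160.0
    }
}
```

With the heuristic off, the same tree costs 160.0; with it on, the materialized-view branch contributes only 5.0, so a Volcano planner comparing alternatives would tend to keep the plan that reads the view.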

genLogicalPlan

Build Calcite's RelNode tree from the result of syntax analysis (the QB).

private RelNode genLogicalPlan(QB qb, boolean outerMostQB,
                               ImmutableMap<String, Integer> outerNameToPosMap,
                               RowResolver outerRR) throws SemanticException {
    RelNode srcRel = null;
    RelNode filterRel = null;
    RelNode gbRel = null;
    RelNode gbHavingRel = null;
    RelNode selectRel = null;
    RelNode obRel = null;
    RelNode limitRel = null;

    // 1. Build Rel For Src (SubQuery, TS, Join)
    // 1.1 Recurse over the subqueries to fill the subquery part of the plan
    // 1.2 Recurse over all the source tables
    // 1.3 process join
    // 1.3.1 process hints
    // 1.3.2 process the actual join
    // 2. Build Rel for where Clause
    // 3. Build Rel for GB Clause
    // 4. Build Rel for GB Having Clause
    // 5. Build Rel for Select Clause
    // 6. Build Rel for OB Clause
    // 7. Build Rel for Limit Clause
    // 8. In case this QB corresponds to a subquery, modify its RR to point ...
    return srcRel;
}
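The numbered steps describe a bottom-up construction: each clause that is present adds one node on top of the plan built so far. The following sketch shows only that ordering. It assumes strings as stand-ins for RelNodes and uses generic labels (Filter, Aggregate, Project, Sort, Limit) rather than Hive's operator classes; the class and method names are hypothetical.

```java
import java.util.Optional;

// Hypothetical sketch of the bottom-up order in which genLogicalPlan stacks operators
// on the source rel. Strings stand in for RelNodes; an absent clause adds no node.
final class LogicalPlanOrderSketch {
    static String wrap(String input, String op, Optional<String> clause) {
        return clause.map(c -> op + "[" + c + "](" + input + ")").orElse(input);
    }

    static String genLogicalPlan(String source, Optional<String> where, Optional<String> groupBy,
            Optional<String> having, String select, Optional<String> orderBy, Optional<String> limit) {
        String rel = source;                            // 1. source: subquery, table scan or join
        rel = wrap(rel, "Filter", where);               // 2. WHERE
        rel = wrap(rel, "Aggregate", groupBy);          // 3. GROUP BY
        rel = wrap(rel, "Filter", having);              // 4. HAVING: a filter above the aggregate
        rel = "Project[" + select + "](" + rel + ")";   // 5. SELECT
        rel = wrap(rel, "Sort", orderBy);               // 6. ORDER BY
        rel = wrap(rel, "Limit", limit);                // 7. LIMIT
        return rel;
    }

    public static void main(String[] args) {
        // Prints: Limit[10](Project[a](Filter[a>1](Scan(t))))
        System.out.println(genLogicalPlan("Scan(t)", Optional.of("a>1"), Optional.empty(),
            Optional.empty(), "a", Optional.empty(), Optional.of("10")));
    }
}
```

Because the plan is stacked in this fixed order, SQL clause semantics fall out of the tree shape: HAVING filters aggregated rows, and LIMIT applies after projection and sorting.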

Continuing genLogicalPlan:

1.2 Recurse over all the source tables

RelNode op = genTableLogicalPlan(tableAlias, qb);

genTableLogicalPlan

Fetch the table metadata and build the table's logical plan (TableLogicalPlan).

private RelNode genTableLogicalPlan(String tableAlias, QB qb) throws SemanticException {
    // 1. If the table has a Sample specified, bail from Calcite path.
    // 2. If returnpath is on and hivetestmode is on, bail
    // 2. Get Table Metadata
    // Note: this is where the table metadata is fetched
    Table tabMetaData = qb.getMetaData().getSrcForAlias(tableAlias);

    // 3. Get Table Logical Schema (Row Type)
    // NOTE: Table logical schema = Non Partition Cols + Partition Cols +
    // Virtual Cols

    // 3.1 Add Column info for non partition cols (Object Inspector fields)
    // 3.2 Add column info corresponding to partition columns
    // 3.3 Add column info corresponding to virtual columns
    // 4. Build operator
    // 5. Build Hive Table Scan Rel
    tableRel = new HiveTableScan(cluster, cluster.traitSetOf(HiveRelNode.CONVENTION), optTable,
        null == tableAlias ? tabMetaData.getTableName() : tableAlias,
        getAliasId(tableAlias, qb),
        HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_CBO_RETPATH_HIVEOP),
        qb.isInsideView() || qb.getAliasInsideView().contains(tableAlias.toLowerCase()));
    // 6. Add Schema(RR) to RelNode-Schema map
    return tableRel;
}
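Step 3 fixes the column order of the table's logical row type: non-partition columns first, then partition columns, then virtual columns. A minimal sketch of that concatenation follows. The data and partition column names in the test are hypothetical; BLOCK__OFFSET__INSIDE__FILE and INPUT__FILE__NAME are examples of Hive virtual column names. The class name is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of step 3 in genTableLogicalPlan: the logical row type is the concatenation
// non-partition columns + partition columns + virtual columns, in that order.
final class TableRowTypeSketch {
    static List<String> logicalSchema(List<String> dataCols, List<String> partitionCols,
            List<String> virtualCols) {
        List<String> fields = new ArrayList<>(dataCols); // 3.1 non-partition columns
        fields.addAll(partitionCols);                    // 3.2 partition columns
        fields.addAll(virtualCols);                      // 3.3 virtual columns
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(logicalSchema(
            List.of("id", "amount"),
            List.of("dt"),
            List.of("BLOCK__OFFSET__INSIDE__FILE", "INPUT__FILE__NAME")));
    }
}
```

The order matters because Calcite expressions reference input fields by position, so every operator built on top of the HiveTableScan relies on this layout.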
