Apache Calcite Optimizer Explained (Part 3)

VolcanoPlanner

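Having covered HepPlanner, we now turn to how Calcite implements the cost-based optimization (CBO) model in practice. It is worth reading up on the Volcano theory first; jumping straight into the implementation can be confusing. There is also a sizable gap between the Volcano model on paper and its implementation. Start with a diagram of the overall VolcanoPlanner implementation, shown below (image from "Cost-based Query Optimization in Apache Phoenix using Apache Calcite"):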

[Figure: overall VolcanoPlanner implementation flow]

The diagram lays out the internal flow of VolcanoPlanner and highlights a few key points of its implementation (it is fine if some concepts are unfamiliar for now; they are introduced later):

  1. Add Rule matches to Queue: add the corresponding Rule Matches to the Rule Match Queue;

  2. Apply Rule match transformations to plan graph: apply Rule Matches to transform the plan graph (Rule specifies an Operator sub-graph to match and logic to generate equivalent better sub-graph);

  3. Iterate for fixed iterations or until cost doesn't change: iterate until the cost stops changing or every rule match in the Rule Match Queue has been applied;

  4. Match importance based on cost of RelNode and height: the importance of a Rule Match depends on the cost and the depth of the RelNode.

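The complete code that uses VolcanoPlanner is in SqlVolcanoTest.

Now for the details of the VolcanoPlanner implementation.

VolcanoPlanner introduces a few basic concepts. Understanding them first makes the implementation much easier to follow.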

RelSet

The source code describes RelSet as follows:

RelSet is an equivalence-set of expressions; that is, a set of expressions which have identical semantics. We are generally interested in using the expression which has the lowest cost. All of the expressions in a RelSet have the same calling convention.

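It has the following characteristics:

  1. It describes a group of equivalent relational expressions; all of its RelNodes are recorded in rels;

  2. Its expressions have the same calling convention;

  3. Relational expressions with the same physical properties are grouped in its member variable List<RelSubset> subsets.

The more important member variables of RelSet are: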

class RelSet {
  // All RelNodes that belong to this RelSet
  final List<RelNode> rels = new ArrayList<>();
  /**
   * Relational expressions that have a subset in this set as a child. This
   * is a multi-set. If multiple relational expressions in this set have the
   * same parent, there will be multiple entries.
   */
  final List<RelNode> parents = new ArrayList<>();
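  //note: subsets grouped by physical properties (a RelSubset does not store RelNodes itself; its members are obtained by filtering the RelSet's rels by physical properties, see the RelSubset section below)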
  final List<RelSubset> subsets = new ArrayList<>();

  /**
   * List of {@link AbstractConverter} objects which have not yet been
   * satisfied.
   */
  final List<AbstractConverter> abstractConverters = new ArrayList<>();

  /**
   * Set to the superseding set when this is found to be equivalent to another
   * set.
   */
  RelSet equivalentSet;
  RelNode rel;

  /**
   * Variables that are set by relational expressions in this set and available for use by parent and child expressions.
   */
  final Set<CorrelationId> variablesPropagated;

  /**
   * Variables that are used by relational expressions in this set.
   */
  final Set<CorrelationId> variablesUsed;
  final int id;

  /**
   * Reentrancy flag.
   */
  boolean inMetadataQuery;
}

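RelSubset

The source code describes RelSubset as follows:

Subset of an equivalence class where all relational expressions have the same physical properties.

Its characteristics are:

  1. It describes a group of equivalent relational expressions that share the same physical properties;

  2. Every RelSubset records the RelSet it belongs to;

  3. RelSubset extends AbstractRelNode, so it is itself a RelNode; its physical properties are stored in the member variable traitSet.

Some of the more important member variables of RelSubset are: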

public class RelSubset extends AbstractRelNode {
  /**
   * cost of best known plan (it may have improved since)
   */
  RelOptCost bestCost;

  /**
   * The set this subset belongs to.
   * note: a RelSubset does not store the RelNodes themselves; they are recorded in the rels of its RelSet
   */
  final RelSet set;

  /**
   * best known plan
   */
  RelNode best;

  /**
   * Flag indicating whether this RelSubset's importance was artificially
   * boosted.
   */
  boolean boosted;

  //~ Constructors -----------------------------------------------------------
  RelSubset(
      RelOptCluster cluster,
      RelSet set,
      RelTraitSet traits) {
    super(cluster, traits); // inherited from AbstractRelNode, which records the traits
    this.set = set;
    this.boosted = false;
    assert traits.allSimple();
    computeBestCost(cluster.getPlanner()); //note: compute best and bestCost
    recomputeDigest(); //note: compute the digest
  }
}

Every RelSubset records its best plan (best) and the cost of that plan (bestCost).
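
The relationship between the two structures can be illustrated with a minimal, self-contained sketch. It does not use Calcite's classes: Rel, EquivSet and Subset are hypothetical stand-ins for RelNode, RelSet and RelSubset, a single string stands in for the trait set, and the costs are made-up numbers. It only shows that the set owns the expressions while each subset is a filtered view that remembers its cheapest member.

```java
import java.util.ArrayList;
import java.util.List;

class RelSetSketch {
  /** Hypothetical stand-in for a RelNode: a name, one physical trait, and a cost. */
  static final class Rel {
    final String name;
    final String trait;
    final double cost;

    Rel(String name, String trait, double cost) {
      this.name = name;
      this.trait = trait;
      this.cost = cost;
    }
  }

  /** Stand-in for RelSet: owns every equivalent expression. */
  static final class EquivSet {
    final List<Rel> rels = new ArrayList<>();
    final List<Subset> subsets = new ArrayList<>();

    /** Adds rel to the set and updates the best plan of the matching subset. */
    Subset add(Rel rel) {
      rels.add(rel);
      Subset subset = null;
      for (Subset s : subsets) {
        if (s.trait.equals(rel.trait)) {
          subset = s;
        }
      }
      if (subset == null) {
        subset = new Subset(this, rel.trait);
        subsets.add(subset);
      }
      if (rel.cost < subset.bestCost) {
        subset.best = rel;
        subset.bestCost = rel.cost;
      }
      return subset;
    }
  }

  /** Stand-in for RelSubset: a view over the set, filtered by trait. */
  static final class Subset {
    final EquivSet set;
    final String trait;
    Rel best; // best known plan, null until a finite-cost member arrives
    double bestCost = Double.POSITIVE_INFINITY;

    Subset(EquivSet set, String trait) {
      this.set = set;
      this.trait = trait;
    }

    /** Members are derived from the owning set, not stored here. */
    List<Rel> getRels() {
      List<Rel> result = new ArrayList<>();
      for (Rel r : set.rels) {
        if (r.trait.equals(trait)) {
          result.add(r);
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    EquivSet set = new EquivSet();
    set.add(new Rel("LogicalJoin", "NONE", Double.POSITIVE_INFINITY));
    set.add(new Rel("EnumerableHashJoin", "ENUMERABLE", 120.0));
    Subset enumerable = set.add(new Rel("EnumerableMergeJoin", "ENUMERABLE", 80.0));
    System.out.println(set.rels.size() + " rels in " + set.subsets.size() + " subsets");
    System.out.println("best ENUMERABLE plan: " + enumerable.best.name);
  }
}
```

The logical node has infinite cost in this sketch, so its subset never gets a best plan; that loosely mirrors how getCost() treats Convention.NONE later in this article.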

RuleMatch

RuleMatch is an abstraction of the relationship between a Rule and a RelSubset; it records information about both.

A match of a rule to a particular set of target relational expressions, frozen in time.

importance

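importance determines the order in which rules are applied during optimization. It is a relative notion. VolcanoPlanner has two kinds of importance, that of a RelSubset and that of a RuleMatch, both introduced here ahead of time.

RelSubset importance

The formula for RelSubset importance is given in its API documentation (the figure contains an error: the sum should be Math.max{}):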

[Figure: API documentation of computeImportance]

For example, suppose a RelSubset (call it $s_0$) has cost 3 and importance 0.5, and its RelNode has two inputs whose RelSubsets are $s_1$ and $s_2$ (assume neither has further inputs), with costs 2 and 5 respectively. Then:

Importance of $s_1$ = $\frac{2}{3+2+5} \cdot 0.5 = 0.1$

Importance of $s_2$ = $\frac{5}{3+2+5} \cdot 0.5 = 0.25$

Here 2 is the cost of $s_1$, and $3+2+5$ is the cost of $s_0$ (the node's own cost plus the cost of all its inputs). The implementation follows (importance is computed by calling recompute() in RuleQueue):

//org.apache.calcite.plan.volcano.RuleQueue
/**
 * Recomputes the importance of the given RelSubset.
 *
 * @param subset RelSubset whose importance is to be recomputed
 * @param force  if true, forces an importance update even if the subset has
 *               not been registered
 */
public void recompute(RelSubset subset, boolean force) {
  Double previousImportance = subsetImportances.get(subset);
  if (previousImportance == null) { //note: the subset has not been registered yet
    if (!force) { //note: not forced, so return immediately
      // Subset has not been registered yet. Don't worry about it.
      return;
    }

    previousImportance = Double.NEGATIVE_INFINITY;
  }

  //note: compute its importance
  double importance = computeImportance(subset);
  if (previousImportance == importance) {
    return;
  }

  //note: update the cached importance
  updateImportance(subset, importance);
}


// Computes the importance of a node
double computeImportance(RelSubset subset) {
  double importance;
  if (subset == planner.root) {
    // The root always has importance = 1
    importance = 1.0;
  } else {
    final RelMetadataQuery mq = subset.getCluster().getMetadataQuery();

    // The importance of a subset is the max of its importance to its
    // parents
    //note: with several parents, the largest of these values is used
    importance = 0.0;
    for (RelSubset parent : subset.getParentSubsets(planner)) {
      //note: importance of this RelSubset relative to this parent
      final double childImportance =
          computeImportanceOfChild(mq, subset, parent);
      //note: keep the largest importance
      importance = Math.max(importance, childImportance);
    }
  }
  LOGGER.trace("Importance of [{}] is {}", subset, importance);
  return importance;
}

//note: computes the child's importance relative to its parent from their costs (a relative value)
private double computeImportanceOfChild(RelMetadataQuery mq, RelSubset child,
    RelSubset parent) {
  //note: importance of the parent
  final double parentImportance = getImportance(parent);
  //note: the costs of child and parent
  final double childCost = toDouble(planner.getCost(child, mq));
  final double parentCost = toDouble(planner.getCost(parent, mq));
  double alpha = childCost / parentCost;
  if (alpha >= 1.0) {
    // child is always less important than parent
    alpha = 0.99;
  }
  //note: scale the parent's importance by the cost ratio
  final double importance = parentImportance * alpha;
  LOGGER.trace("Importance of [{}] to its parent [{}] is {} (parent importance={}, child cost={},"
      + " parent cost={})", child, parent, importance, parentImportance, childCost, parentCost);
  return importance;
}

When computeImportanceOfChild() computes a RelSubset's importance relative to its parent RelSubset, the key question is how the cost is computed. The cost computation is:

//org.apache.calcite.plan.volcano.VolcanoPlanner
//note: Computes the cost of a RelNode.
public RelOptCost getCost(RelNode rel, RelMetadataQuery mq) {
  assert rel != null : "pre-condition: rel != null";
  if (rel instanceof RelSubset) { //note: a RelSubset already carries its computed best cost
    return ((RelSubset) rel).bestCost;
  }
  if (rel.getTraitSet().getTrait(ConventionTraitDef.INSTANCE)
      == Convention.NONE) {
    return costFactory.makeInfiniteCost(); //note: Convention.NONE also yields an infinite cost
  }
  //note: compute its own (non-cumulative) cost
  RelOptCost cost = mq.getNonCumulativeCost(rel);
  if (!zeroCost.isLt(cost)) { //note: the cost is not greater than zero
    // cost must be positive, so nudge it
    cost = costFactory.makeTinyCost();
  }
  //note: a RelNode's cost includes the cost of all its inputs
  for (RelNode input : rel.getInputs()) {
    cost = cost.plus(getCost(input, mq));
  }
  return cost;
}

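That is the code behind RelSubset importance. One property stands out from the implementation:

  1. The closer a RelSubset is to the root, the higher its importance. The benefit is that optimization tends to work on RelNodes near the root first, where the payoff is largest.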
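
The arithmetic of the earlier worked example ($s_0$, $s_1$, $s_2$) can be reproduced with a small standalone sketch. The two methods mirror the logic of computeImportanceOfChild() and the max-over-parents loop of computeImportance() shown above, but they take plain doubles instead of RelSubsets; the class name ImportanceSketch and the method signatures are assumptions made for illustration.

```java
class ImportanceSketch {
  /** Parent importance scaled by the child/parent cost ratio, capped at 0.99. */
  static double importanceOfChild(double parentImportance, double childCost, double parentCost) {
    double alpha = childCost / parentCost;
    if (alpha >= 1.0) {
      alpha = 0.99; // a child is always less important than its parent
    }
    return parentImportance * alpha;
  }

  /** For a non-root subset: the largest importance relative to any parent. */
  static double importance(double[] importanceToEachParent) {
    double result = 0.0;
    for (double value : importanceToEachParent) {
      result = Math.max(result, value);
    }
    return result;
  }

  public static void main(String[] args) {
    double s1Cost = 2.0;
    double s2Cost = 5.0;
    double s0Cost = 3.0 + s1Cost + s2Cost; // own cost plus the cost of both inputs
    double s0Importance = 0.5;
    System.out.println("s1: " + importanceOfChild(s0Importance, s1Cost, s0Cost));
    System.out.println("s2: " + importanceOfChild(s0Importance, s2Cost, s0Cost));
  }
}
```

Because every step multiplies by a ratio below 1, importance can only shrink on the way down from the root, which is the property noted above.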

RuleMatch importance

The importance of a RuleMatch is the larger of the following two values (provided the corresponding RelSubset has an importance):

  1. the importance of the RelSubset this RuleMatch corresponds to (the RelSubset of rels[0]);

  2. the importance of the output RelSubset (the target RelSubset), if that RelSubset exists in VolcanoPlanner's cache.

//org.apache.calcite.plan.volcano.VolcanoRuleMatch
/**
 * Computes the importance of this rule match.
 *
 * @return importance of this rule match
 */
double computeImportance() {
  assert rels[0] != null; //note: rels[0] is the rel whose RelSubset this rule match belongs to
  RelSubset subset = volcanoPlanner.getSubset(rels[0]);
  double importance = 0;
  if (subset != null) {
    //note: importance of the RelSubset
    importance = volcanoPlanner.ruleQueue.getImportance(subset);
  }
  //note: Returns a guess as to which subset the result of this rule will belong to.
  final RelSubset targetSubset = guessSubset();
  if ((targetSubset != null) && (targetSubset != subset)) {
    // If this rule will generate a member of an equivalence class
    // which is more important, use that importance.
    //note: importance of targetSubset
    final double targetImportance =
        volcanoPlanner.ruleQueue.getImportance(targetSubset);
    if (targetImportance > importance) {
      importance = targetImportance;

      // If the equivalence class is cheaper than the target, bump up
      // the importance of the rule. A converter is an easy way to
      // make the plan cheaper, so we'd hate to miss this opportunity.
      //
      // REVIEW: jhyde, 2007/12/21: This rule seems to make sense, but
      // is disabled until it has been proven.
      //
      // CHECKSTYLE: IGNORE 3
      if ((subset != null)
          && subset.bestCost.isLt(targetSubset.bestCost)
          && false) { //note: never entered
        importance *=
            targetSubset.bestCost.divideBy(subset.bestCost);
        importance = Math.min(importance, 0.99);
      }
    }
  }

  return importance;
}

RuleMatch importance mainly decides which RuleMatch is handled first when one is selected. In essence it simply reuses the importance of a RelSubset.
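
As a rough illustration of the rule above, the sketch below computes a match's importance as the larger of two cached subset importances. It is not Calcite code: subsets are identified by hypothetical string names, and treating an uncached subset as importance 0.0 is an assumption of this sketch rather than the behaviour of RuleQueue.getImportance().

```java
import java.util.HashMap;
import java.util.Map;

class RuleMatchImportanceSketch {
  /** Cached subset importances, keyed by a hypothetical subset name. */
  final Map<String, Double> subsetImportances = new HashMap<>();

  /** Assumption of this sketch: an uncached subset counts as importance 0.0. */
  double getImportance(String subset) {
    Double value = subsetImportances.get(subset);
    return value == null ? 0.0 : value;
  }

  /** The larger of the matched subset's and the target subset's importance. */
  double matchImportance(String subset, String targetSubset) {
    double importance = subset == null ? 0.0 : getImportance(subset);
    if (targetSubset != null && !targetSubset.equals(subset)) {
      importance = Math.max(importance, getImportance(targetSubset));
    }
    return importance;
  }

  public static void main(String[] args) {
    RuleMatchImportanceSketch queue = new RuleMatchImportanceSketch();
    queue.subsetImportances.put("root", 1.0);
    queue.subsetImportances.put("child", 0.2);
    // A match on "child" whose result is expected to land in "root" inherits root's importance.
    System.out.println(queue.matchImportance("child", "root"));
    System.out.println(queue.matchImportance("child", null));
  }
}
```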

VolcanoPlanner workflow

We reuse the earlier example, this time with VolcanoPlanner as the optimizer, and use it to walk through the planner's internal logic.

//1. Initialize the VolcanoPlanner and add the rules
VolcanoPlanner planner = new VolcanoPlanner();
planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
// add the rules
planner.addRule(FilterJoinRule.FilterIntoJoinRule.FILTER_ON_JOIN);
planner.addRule(ReduceExpressionsRule.PROJECT_INSTANCE);
planner.addRule(PruneEmptyRules.PROJECT_INSTANCE);
// add the ConverterRules
planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_SORT_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_VALUES_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_PROJECT_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_FILTER_RULE);
//2. Changes a relational expression to an equivalent one with a different set of traits.
RelTraitSet desiredTraits =
    relNode.getCluster().traitSet().replace(EnumerableConvention.INSTANCE);
relNode = planner.changeTraits(relNode, desiredTraits);
//3. Register the RelNode via VolcanoPlanner's setRoot(), which also performs the initialization
planner.setRoot(relNode);
//4. Find the lowest-cost plan via dynamic programming
relNode = planner.findBestExp();

The optimized result is:

EnumerableSort(sort0=[$0], dir0=[ASC])
  EnumerableProject(USER_ID=[$0], USER_NAME=[$1], USER_COMPANY=[$5], USER_AGE=[$2])
    EnumerableMergeJoin(condition=[=($0, $3)], joinType=[inner])
      EnumerableFilter(condition=[>($2, 30)])
        EnumerableTableScan(table=[[USERS]])
      EnumerableFilter(condition=[>($0, 10)])
        EnumerableTableScan(table=[[JOBS]])

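Using VolcanoPlanner involves four steps overall:

  1. Initialize VolcanoPlanner and add the rules (including ConverterRules);

  2. Convert the RelNode to an equivalent one; only its physical property (Convention) changes here;

  3. Register the RelNode through VolcanoPlanner's setRoot() method, which also performs the related initialization;

  4. Find the lowest-cost plan with a dynamic programming algorithm.

The sections below go through each step in detail.

1. VolcanoPlanner initialization

This step has three parts: constructing the VolcanoPlanner, adding RelTraitDefs with addRelTraitDef(), and adding rules with addRule(). First, the constructor: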

//org.apache.calcite.plan.volcano.VolcanoPlanner
/**
 * Creates a uninitialized <code>VolcanoPlanner</code>. To fully initialize it, the caller must register the desired set of relations, rules, and calling conventions.
 */
public VolcanoPlanner() {
  this(null, null);
}

/**
 * Creates a {@code VolcanoPlanner} with a given cost factory.
 * note: costFactory falls back to VolcanoCost.FACTORY when null
 */
public VolcanoPlanner(RelOptCostFactory costFactory, //
    Context externalContext) {
  super(costFactory == null ? VolcanoCost.FACTORY : costFactory, //
      externalContext);
  this.zeroCost = this.costFactory.makeZeroCost();
}

Little happens here beyond some simple initialization. To set a RelTraitDef, call addRelTraitDef(), implemented as follows:

//org.apache.calcite.plan.volcano.VolcanoPlanner
//note: adds a RelTraitDef
@Override public boolean addRelTraitDef(RelTraitDef relTraitDef) {
  return !traitDefs.contains(relTraitDef) && traitDefs.add(relTraitDef);
}

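To add a Rule to VolcanoPlanner, call addRule(). The key step in this method is recording the relationship between concrete RelNode classes and RelOptRuleOperands in classOperands; in effect, this cache records which rules can apply to which RelNode during optimization. Its implementation: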

//org.apache.calcite.plan.volcano.VolcanoPlanner
//note: adds a rule
public boolean addRule(RelOptRule rule) {
  if (locked) {
    return false;
  }
  if (ruleSet.contains(rule)) {
    // Rule already exists.
    return false;
  }
  final boolean added = ruleSet.add(rule);
  assert added;

  final String ruleName = rule.toString();
  //note: ruleNames is a multimap, so keys may repeat, but each rule description must still be unique and map to exactly one rule class
  if (ruleNames.put(ruleName, rule.getClass())) {
    Set<Class> x = ruleNames.get(ruleName);
    if (x.size() > 1) {
      throw new RuntimeException("Rule description '" + ruleName
          + "' is not unique; classes: " + x);
    }
  }

  //note: register the rule's description (stored in mapDescToRule)
  mapRuleDescription(rule);

  // Each of this rule's operands is an 'entry point' for a rule call. Register each operand against all concrete sub-classes that could match it.
  //note: record each matching subclass against the operand. A RelOptRuleOperand matches exactly one class; the loop looks up that class's known subclasses
  for (RelOptRuleOperand operand : rule.getOperands()) {
    for (Class<? extends RelNode> subClass
        : subClasses(operand.getMatchedClass())) {
      classOperands.put(subClass, operand);
    }
  }

  // If this is a converter rule, check that it operates on one of the
  // kinds of trait we are interested in, and if so, register the rule
  // with the trait.
  //note: for a ConverterRule, if its ruleTraitDef is among the traitDefs we registered,
  //note: register the converterRule with that ruleTraitDef;
  //note: otherwise the ConverterRule is not used in this optimization run
  if (rule instanceof ConverterRule) {
    ConverterRule converterRule = (ConverterRule) rule;

    final RelTrait ruleTrait = converterRule.getInTrait();
    final RelTraitDef ruleTraitDef = ruleTrait.getTraitDef();
    if (traitDefs.contains(ruleTraitDef)) { //note: this registration does not appear to be used
      ruleTraitDef.registerConverterRule(this, converterRule);
    }
  }

  return true;
}
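
The role of classOperands can be shown with a simplified sketch. It is not Calcite's implementation: the tiny Node hierarchy, the knownClasses list, and the use of plain rule names in place of RelOptRuleOperand objects are all assumptions made for illustration. The point is that an operand declared against a general class is registered under every known concrete subclass, so the later lookup needs only rel.getClass().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClassOperandsSketch {
  // Hypothetical miniature RelNode hierarchy.
  static class Node {}
  static class Filter extends Node {}
  static class LogicalFilter extends Filter {}
  static class Join extends Node {}

  /** Concrete classes the planner knows about (an assumption of this sketch). */
  final List<Class<?>> knownClasses = new ArrayList<>();
  /** Concrete class -> names of the rules whose operand can match it. */
  final Map<Class<?>, List<String>> classOperands = new HashMap<>();

  /** Registers an operand against every known class that it could match. */
  void addOperand(String ruleName, Class<?> matchedClass) {
    for (Class<?> candidate : knownClasses) {
      if (matchedClass.isAssignableFrom(candidate)) {
        classOperands.computeIfAbsent(candidate, k -> new ArrayList<>()).add(ruleName);
      }
    }
  }

  /** Candidate rules for a node are looked up by its concrete class alone. */
  List<String> candidates(Object node) {
    return classOperands.getOrDefault(node.getClass(), new ArrayList<>());
  }

  public static void main(String[] args) {
    ClassOperandsSketch planner = new ClassOperandsSketch();
    planner.knownClasses.add(LogicalFilter.class);
    planner.knownClasses.add(Join.class);
    planner.addOperand("FilterRule", Filter.class); // matches LogicalFilter only
    planner.addOperand("AnyNodeRule", Node.class);  // matches both classes
    System.out.println(planner.candidates(new LogicalFilter()));
    System.out.println(planner.candidates(new Join()));
  }
}
```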

2. RelNode changeTraits

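This step has two parts:

  1. RelTraitSet's replace() method updates the trait for the corresponding RelTraitDef in the RelTraitSet and leaves the other RelTraits unchanged;

  2. changeTraits() "changes a relational expression to an equivalent one with a different set of traits", that is, it applies a converter to the RelNode. A lot happens here in practice; it is covered in step 3, mainly in the implementation of registerImpl().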

3. VolcanoPlanner setRoot

VolcanoPlanner's setRoot() method registers the root RelNode and performs the initialization Volcano requires. Much of the work happens here, so let's look at the implementation in detail.

//org.apache.calcite.plan.volcano.VolcanoPlanner
public void setRoot(RelNode rel) {
  // We're registered all the rules, and therefore RelNode classes,
  // we're interested in, and have not yet started calling metadata providers.
  // So now is a good time to tell the metadata layer what to expect.
  registerMetadataRels();

  //note: register the RelNode (this performs a series of initialization steps); every RelNode gets a RelSubset
  this.root = registerImpl(rel, null);
  if (this.originalRoot == null) {
    this.originalRoot = rel;
  }

  // Making a node the root changes its importance.
  //note: recompute the importance of the root subset
  this.ruleQueue.recompute(this.root);
  //Ensures that the subset that is the root relational expression contains converters to all other subsets in its equivalence set.
  ensureRootConverters();
}

The core of setRoot() is registerImpl(), which performs the initialization (converting RelNodes to RelSubsets, computing RelSubset importance, and so on). The other calls are annotated above. Here is what registerImpl() does:

//org.apache.calcite.plan.volcano.VolcanoPlanner
/**
 * Registers a new expression <code>exp</code> and queues up rule matches.
 * If <code>set</code> is not null, makes the expression part of that
 * equivalence set. If an identical expression is already registered, we
 * don't need to register this one and nor should we queue up rule matches.
 *
 * @param rel relational expression to register. Must be either a
 *         {@link RelSubset}, or an unregistered {@link RelNode}
 * @param set set that rel belongs to, or <code>null</code>
 * @return the equivalence-set
 */
private RelSubset registerImpl(
    RelNode rel,
    RelSet set) {
  if (rel instanceof RelSubset) { //note: a RelSubset has already been registered
    return registerSubset(set, (RelSubset) rel); //note: performs the corresponding merge
  }

  assert !isRegistered(rel) : "already been registered: " + rel;
  if (rel.getCluster().getPlanner() != this) { //note: the cluster's planner differs from this one
    throw new AssertionError("Relational expression " + rel
        + " belongs to a different planner than is currently being used.");
  }

  // Now is a good time to ensure that the relational expression
  // implements the interface required by its calling convention.
  //note: get the RelNode's RelTraitSet
  final RelTraitSet traits = rel.getTraitSet();
  //note: get its Convention trait
  final Convention convention = traits.getTrait(ConventionTraitDef.INSTANCE);
  assert convention != null;
  if (!convention.getInterface().isInstance(rel)
      && !(rel instanceof Converter)) {
    throw new AssertionError("Relational expression " + rel
        + " has calling-convention " + convention
        + " but does not implement the required interface '"
        + convention.getInterface() + "' of that convention");
  }
  if (traits.size() != traitDefs.size()) {
    throw new AssertionError("Relational expression " + rel
        + " does not have the correct number of traits: " + traits.size()
        + " != " + traitDefs.size());
  }

  // Ensure that its sub-expressions are registered.
  //note: implemented in AbstractRelNode, which in turn calls ensureRegistered
  //note: to register all inputs of the RelNode with the planner;
  //note: registerImpl is called recursively until every input is registered.
  //note: ensureRegistered returns a RelSubset for each input
  rel = rel.onRegister(this);

  // Record its provenance. (Rule call may be null.)
  if (ruleCallStack.isEmpty()) { //note: provenance unknown
    provenanceMap.put(rel, Provenance.EMPTY);
  } else { //note: produced by a rule call
    final VolcanoRuleCall ruleCall = ruleCallStack.peek();
    provenanceMap.put(
        rel,
        new RuleProvenance(
            ruleCall.rule,
            ImmutableList.copyOf(ruleCall.rels),
            ruleCall.id));
  }

  // If it is equivalent to an existing expression, return the set that
  // the equivalent expression belongs to.
  //note: use the RelNode's digest (a globally unique summary) to check whether it already has a RelSubset; if so, return it directly
  String key = rel.getDigest();
  RelNode equivExp = mapDigestToRel.get(key);
  if (equivExp == null) { //note: not registered yet
    // do nothing
  } else if (equivExp == rel) { //note: already cached
    return getSubset(rel);
  } else {
    assert RelOptUtil.equal(
        "left", equivExp.getRowType(),
        "right", rel.getRowType(),
        Litmus.THROW);
    RelSet equivSet = getSet(equivExp); //note: an equivalent but different RelNode is already registered, so merge with its RelSet
    if (equivSet != null) {
      LOGGER.trace(
          "Register: rel#{} is equivalent to {}", rel.getId(), equivExp.getDescription());
      return registerSubset(set, getSubset(equivExp));
    }
  }

  //note: Converters are in the same set as their children.
  if (rel instanceof Converter) {
    final RelNode input = ((Converter) rel).getInput();
    final RelSet childSet = getSet(input);
    if ((set != null)
        && (set != childSet)
        && (set.equivalentSet == null)) {
      LOGGER.trace(
          "Register #{} {} (and merge sets, because it is a conversion)",
          rel.getId(), rel.getDigest());
      merge(set, childSet);
      registerCount++;

      // During the mergers, the child set may have changed, and since
      // we're not registered yet, we won't have been informed. So
      // check whether we are now equivalent to an existing
      // expression.
      if (fixUpInputs(rel)) {
        rel.recomputeDigest();
        key = rel.getDigest();
        RelNode equivRel = mapDigestToRel.get(key);
        if ((equivRel != rel) && (equivRel != null)) {
          assert RelOptUtil.equal(
              "rel rowtype",
              rel.getRowType(),
              "equivRel rowtype",
              equivRel.getRowType(),
              Litmus.THROW);

          // make sure this bad rel didn't get into the
          // set in any way (fixupInputs will do this but it
          // doesn't know if it should so it does it anyway)
          set.obliterateRelNode(rel);

          // There is already an equivalent expression. Use that
          // one, and forget about this one.
          return getSubset(equivRel);
        }
      }
    } else {
      set = childSet;
    }
  }

  // Place the expression in the appropriate equivalence set.
  //note: if no RelSet exists yet, create one
  if (set == null) {
    set = new RelSet(
        nextSetId++,
        Util.minus(
            RelOptUtil.getVariablesSet(rel),
            rel.getVariablesSet()),
        RelOptUtil.getVariablesUsed(rel));
    this.allSets.add(set);
  }

  // Chain to find 'live' equivalent set, just in case several sets are
  // merging at the same time.
  while (set.equivalentSet != null) {
    set = set.equivalentSet;
  }

  // Allow each rel to register its own rules.
  registerClass(rel);

  registerCount++;
  //note: 0 initially
  final int subsetBeforeCount = set.subsets.size();
  //note: add the RelNode to the equivalence set and update its best information
  RelSubset subset = addRelToSet(rel, set);

  //note: cache the rel; put() returns the value previously mapped to the key
  final RelNode xx = mapDigestToRel.put(key, rel);
  assert xx == null || xx == rel : rel.getDigest();

  LOGGER.trace("Register {} in {}", rel.getDescription(), subset.getDescription());

  // This relational expression may have been registered while we
  // recursively registered its children. If this is the case, we're done.
  if (xx != null) {
    return subset;
  }

  // Create back-links from its children, which makes children more
  // important.
  //note: if this is the root, initialize its importance to 1.0
  if (rel == this.root) {
    ruleQueue.subsetImportances.put(
        subset,
        1.0); // todo: remove
  }
  //note: record the current rel as a parent in the RelSet of each input's RelSubset;
  //note: in other words, a RelNode's inputs are the children of its RelSubset
  for (RelNode input : rel.getInputs()) {
    RelSubset childSubset = (RelSubset) input;
    childSubset.set.parents.add(rel);

    // Child subset is more important now a new parent uses it.
    //note: recompute the importance of the RelSubset
    ruleQueue.recompute(childSubset);
  }
  if (rel == this.root) { // TODO: 2019-03-11 why is it removed here?
    ruleQueue.subsetImportances.remove(subset);
  }

  // Remember abstract converters until they're satisfied
  //note: if it is an AbstractConverter instance, add it to the set's abstractConverters
  if (rel instanceof AbstractConverter) {
    set.abstractConverters.add((AbstractConverter) rel);
  }

  // If this set has any unsatisfied converters, try to satisfy them.
  //note: check set.abstractConverters
  checkForSatisfiedConverters(set, rel);

  // Make sure this rel's subset importance is updated
  //note: force a recomputation of the subset's importance
  ruleQueue.recompute(subset, true);

  //note: find all matching rules; here they are added to the RuleQueue
  // Queue up all rules triggered by this relexp's creation.
  fireRules(rel, true);

  // It's a new subset.
  //note: if the subset is new, fire once more
  if (set.subsets.size() > subsetBeforeCount) {
    fireRules(subset, true);
  }

  return subset;
}

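registerImpl() is fairly involved. Its implementation can be summarized in the following steps:

  1. After the initial validation, rel.onRegister(this) recursively calls VolcanoPlanner's ensureRegistered() on the RelNode's inputs, which in turn ends up in registerImpl(); leaf nodes are registered first, then their parents, and finally the root;

  2. The RelNode's digest (generally globally unique for a RelNode) is looked up in the mapDigestToRel cache. If an entry exists and it is the same RelNode, the node was registered before and its RelSubset is returned via getSubset(); otherwise the RelSubsets are merged;

  3. If the RelNode has no RelSet (null), a new RelSet is created. addRelToSet() adds the RelNode to the RelSet and updates VolcanoPlanner's mapRel2Subset cache (the RelNode to RelSubset mapping). At the end of addRelToSet() the RelSubset's best plan and best cost are updated (whenever a RelNode is added to a RelSubset, it is checked to see whether it is the new best plan);

  4. The RelNode's inputs become the children of its RelSubset (in practice, the parent is recorded in the parents list of the input's RelSet);

  5. The importance of the RelSubset for the current RelNode is forcibly recomputed;

  6. fireRules() is called for the RelNode and, if the RelSubset is new, once more for the RelSubset. It finds every Rule that can match and creates a VolcanoRuleMatch for each (recording the RelNode, the RelOptRuleOperand, and so on; the RelOptRuleOperand in turn records the Rule), then adds the VolcanoRuleMatch to the RuleQueue (the RuleQueue in the earlier diagram).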
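
Steps 1 and 2 (bottom-up registration and digest-based deduplication) can be sketched without Calcite. Everything here is a simplification made for illustration: Node is a hypothetical plan node, the digest is built from the operator text plus the set ids of the inputs, and the set-merging branch of registerImpl() is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RegisterSketch {
  /** Hypothetical plan node: an operator description plus its inputs. */
  static final class Node {
    final String op;
    final List<Node> inputs;

    Node(String op, Node... inputs) {
      this.op = op;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /** Plays the role of mapDigestToRel, mapping a digest to an equivalence-set id. */
  final Map<String, Integer> digestToSetId = new HashMap<>();
  /** Digests in the order they were first registered. */
  final List<String> registrationOrder = new ArrayList<>();
  int nextSetId = 0;

  /** Registers the inputs first, then the node; identical digests share one set id. */
  int register(Node node) {
    StringBuilder digest = new StringBuilder(node.op).append('(');
    for (Node input : node.inputs) {
      digest.append("set#").append(register(input)).append(',');
    }
    digest.append(')');
    String key = digest.toString();

    Integer existing = digestToSetId.get(key);
    if (existing != null) {
      return existing; // an identical expression is already registered
    }
    int id = nextSetId++;
    digestToSetId.put(key, id);
    registrationOrder.add(key);
    return id;
  }

  public static void main(String[] args) {
    Node left = new Node("Filter[age>30]", new Node("Scan[USERS]"));
    Node right = new Node("Filter[age>30]", new Node("Scan[USERS]"));
    RegisterSketch planner = new RegisterSketch();
    int rootId = planner.register(new Node("Join", left, right));
    System.out.println("root set id: " + rootId);
    System.out.println("registered: " + planner.registrationOrder);
  }
}
```

The two identical Filter/Scan branches collapse into the same two sets, so only three digests are registered even though the tree has five nodes.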

Now look at fireRules(), whose job is to add the matching RuleMatches to the RuleQueue. Its implementation:

//org.apache.calcite.plan.volcano.VolcanoPlanner
/**
 * Fires all rules matched by a relational expression.
 *
 * @param rel      Relational expression which has just been created (or maybe
 *                 from the queue)
 * @param deferred If true, each time a rule matches, just add an entry to
 *                 the queue.
 */
void fireRules(
    RelNode rel,
    boolean deferred) {
  for (RelOptRuleOperand operand : classOperands.get(rel.getClass())) {
    if (operand.matches(rel)) { //note: the rule matches
      final VolcanoRuleCall ruleCall;
      if (deferred) { //note: always true here by default, so the RuleMatch is added to the queue
        ruleCall = new DeferringRuleCall(this, operand);
      } else {
        ruleCall = new VolcanoRuleCall(this, operand);
      }
      ruleCall.match(rel);
    }
  }
}

/**
 * A rule call which defers its actions. Whereas {@link RelOptRuleCall}
 * invokes the rule when it finds a match, a <code>DeferringRuleCall</code>
 * creates a {@link VolcanoRuleMatch} which can be invoked later.
 */
private static class DeferringRuleCall extends VolcanoRuleCall {
  DeferringRuleCall(
      VolcanoPlanner planner,
      RelOptRuleOperand operand) {
    super(planner, operand);
  }

  /**
   * Rather than invoking the rule (as the base method does), creates a
   * {@link VolcanoRuleMatch} which can be invoked later.
   */
  protected void onMatch() {
    final VolcanoRuleMatch match =
        new VolcanoRuleMatch(
            volcanoPlanner,
            getOperand0(), //note: effectively the operand
            rels,
            nodeInputs);
    volcanoPlanner.ruleQueue.addMatch(match);
  }
}

For each matching Rule, the method above creates a VolcanoRuleMatch object, which is then added to the RuleQueue:

//org.apache.calcite.plan.volcano.RuleQueue
/**
 * Adds a rule match. The rule-matches are automatically added to all
 * existing {@link PhaseMatchList per-phase rule-match lists} which allow
 * the rule referenced by the match.
 */
void addMatch(VolcanoRuleMatch match) {
  final String matchName = match.toString();
  for (PhaseMatchList matchList : matchListMap.values()) {
    if (!matchList.names.add(matchName)) {
      // Identical match has already been added.
      continue;
    }

    String ruleClassName = match.getRule().getClass().getSimpleName();

    Set<String> phaseRuleSet = phaseRuleMapping.get(matchList.phase);
    //note: skip when phaseRuleSet is not ALL_RULES and does not contain this ruleClassName (true of the other three phases)
    //note: so when adding a rule match, phaseRuleSet controls which matches may be added
    //note: by default only the PhaseMatchList of the OPTIMIZE phase accepts rule matches
    if (phaseRuleSet != ALL_RULES) {
      if (!phaseRuleSet.contains(ruleClassName)) {
        continue;
      }
    }

    LOGGER.trace("{} Rule-match queued: {}", matchList.phase.toString(), matchName);

    matchList.list.add(match);

    matchList.matchMap.put(
        planner.getSubset(match.rels[0]), match);
  }
}
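
The per-phase filtering in addMatch() can be illustrated with a reduced sketch. It is not the Calcite class: matches are plain strings, the phase names and the "only OPTIMIZE allows all rules" default are assumptions based on the notes in the listing above, and the matchMap bookkeeping is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RuleQueueSketch {
  /** Assumed phase names for this sketch. */
  enum Phase { PRE_PROCESS_MDR, PRE_PROCESS, OPTIMIZE, CLEANUP }

  /** Sentinel compared by identity: every rule is allowed in the phase. */
  static final Set<String> ALL_RULES = Collections.unmodifiableSet(new HashSet<String>());

  final Map<Phase, Set<String>> phaseRuleMapping = new EnumMap<>(Phase.class);
  final Map<Phase, Set<String>> names = new EnumMap<>(Phase.class);
  final Map<Phase, List<String>> queued = new EnumMap<>(Phase.class);

  RuleQueueSketch() {
    for (Phase phase : Phase.values()) {
      phaseRuleMapping.put(phase, new HashSet<String>()); // no rule allowed by default
      names.put(phase, new HashSet<String>());
      queued.put(phase, new ArrayList<String>());
    }
    phaseRuleMapping.put(Phase.OPTIMIZE, ALL_RULES);
  }

  /** Offers the match to every phase; duplicates and disallowed rules are skipped. */
  void addMatch(String ruleClassName, String matchName) {
    for (Phase phase : Phase.values()) {
      if (!names.get(phase).add(matchName)) {
        continue; // an identical match has already been offered to this phase
      }
      Set<String> allowed = phaseRuleMapping.get(phase);
      if (allowed != ALL_RULES && !allowed.contains(ruleClassName)) {
        continue; // this phase does not run the rule
      }
      queued.get(phase).add(matchName);
    }
  }

  public static void main(String[] args) {
    RuleQueueSketch queue = new RuleQueueSketch();
    queue.addMatch("FilterJoinRule", "match-1");
    queue.addMatch("FilterJoinRule", "match-1"); // duplicate, ignored
    System.out.println(queue.queued);
  }
}
```

As in the original method, the match name is recorded before the phase filter runs, so a match rejected by a phase is not offered to that phase again.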

到這裡 VolcanoPlanner 需要初始化的內容都初始化完成了,下面就到了具體的優化部分。

4. VolcanoPlanner findBestExp

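VolcanoPlanner's findBestExp() is where the optimization actually happens. First, the strategy it follows (each iteration increments cumulativeTicks, which records the total number of iterations):

  1. The iteration at which an implementable plan is first found is recorded as firstFiniteTick, and that plan's cost is provisionally taken as BestCost;

  2. The next optimization target is set to BestCost*0.9 and, when the planner is impatient, giveUpTick is computed from firstFiniteTick and the current iteration count. If the iteration count passes giveUpTick before the target is reached, iteration stops and the current plan is taken as the best plan;

  3. If no RuleMatch is left in the RuleQueue, iteration also stops and the current plan is taken as the best plan;

  4. In each iteration one RuleMatch is taken from the RuleQueue, namely the one with the highest importance, so each step applies the match the planner currently estimates to be the most promising;

  5. Finally, the RelNode tree corresponding to the best plan is built.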
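
Step 2 is easier to follow with concrete numbers. The sketch below copies only the arithmetic of the impatient branch from the findBestExp() listing; the class and method names are made up for illustration.

```java
/** Arithmetic of the give-up budget in impatient mode (illustrative helper, not Calcite code). */
class GiveUpTickSketch {
  static int giveUpTick(int firstFiniteTick, int cumulativeTicks) {
    if (firstFiniteTick < 10) {
      // A plan was found almost immediately: allow a fixed 25 more iterations.
      return cumulativeTicks + 25;
    }
    // Otherwise allow 10% of the iterations it took to find the first plan,
    // but never fewer than 25. Integer division, as in the original code.
    return cumulativeTicks + Math.max(firstFiniteTick / 10, 25);
  }

  public static void main(String[] args) {
    System.out.println(giveUpTick(1000, 1000)); // 1100
    System.out.println(giveUpTick(100, 100));   // 125
    System.out.println(giveUpTick(5, 5));       // 30
  }
}
```

The budget is recomputed each time the current target is met, so a planner that keeps improving by 10% keeps extending its own deadline.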

That is the main design idea behind findBestExp(); here is the actual implementation:

//org.apache.calcite.plan.volcano.VolcanoPlanner
/**
 * Finds the most efficient expression to implement the query given via
 * {@link org.apache.calcite.plan.RelOptPlanner#setRoot(org.apache.calcite.rel.RelNode)}.
 *
 * note: finds the most efficient relational expression; the algorithm runs in a series of phases, and the rules that may fire can differ from phase to phase
 * <p>The algorithm executes repeatedly in a series of phases. In each phase
 * the exact rules that may be fired varies. The mapping of phases to rule
 * sets is maintained in the {@link #ruleQueue}.
 *
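 * note: in each phase the planner sets the initial importance of the existing RelSubsets, then iterates over the rule matches in the rule queue until:
 * note: 1. the rule queue becomes empty;
 * note: 2. for ambitious planners: the cost has not improved recently (specifically, within a number of iterations equal to 10% of the iterations needed to first reach an implementable plan, or 25 iterations, whichever is larger);
 * note: 3. for non-ambitious planners: an implementable plan has been found;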
 * <p>In each phase, the planner sets the initial importance of the existing
 * RelSubSets ({@link #setInitialImportance()}). The planner then iterates
 * over the rule matches presented by the rule queue until:
 *
 * <ol>
 * <li>The rule queue becomes empty.</li>
 * <li>For ambitious planners: No improvements to the plan have been made
 * recently (specifically within a number of iterations that is 10% of the
 * number of iterations necessary to first reach an implementable plan or 25
 * iterations whichever is larger).</li>
 * <li>For non-ambitious planners: When an implementable plan is found.</li>
 * </ol>
 *
 * note: furthermore, after every 10 iterations without an implementable plan, RelSubsets that contain only logical RelNodes are given an importance boost via injectImportanceBoost;
 * <p>Furthermore, after every 10 iterations without an implementable plan,
 * RelSubSets that contain only logical RelNodes are given an importance
 * boost via {@link #injectImportanceBoost()}. Once an implementable plan is
 * found, the artificially raised importance values are cleared (see
 * {@link #clearImportanceBoost()}).
 *
 * @return the most efficient RelNode tree found for implementing the given
 * query
 */
public RelNode findBestExp() {
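  //note: make sure the root relational expression's subset (RelSubset) has converters from every other RelSubset in its equivalence set (RelSet),
  //note: so that an implementation the planner finds in another subset can be converted to the root; otherwise a convention mismatch could make it unusable
  ensureRootConverters();
  //note: related to materialized views; can be ignored for now
  registerMaterializations();
  int cumulativeTicks = 0; //note: counter shared by all four phases
  //note: there are four phases in total, but by default only OPTIMIZE does any work, because the other phases have no RuleMatch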
  for (VolcanoPlannerPhase phase : VolcanoPlannerPhase.values()) {
    //note: in each phase, initialize the importance of the RelSubsets
    //note: importance is initialized for the root and for the nodes below it
    setInitialImportance();

    //note: VolcanoCost by default
    RelOptCost targetCost = costFactory.makeHugeCost();
    int tick = 0;
    int firstFiniteTick = -1;
    int splitCount = 0;
    int giveUpTick = Integer.MAX_VALUE;

    while (true) {
      ++tick;
      ++cumulativeTicks;
      //note: false on the first pass: the two costs differ, one comes from costFactory.makeHugeCost, the other from costFactory.makeInfiniteCost
      //note: once the best cost is at or below the target, set a new target and a new giveUpTick
      if (root.bestCost.isLe(targetCost)) {
        //note: first time in this phase that the target is met; record the tick and call clearImportanceBoost to clear the boosted importance values
        if (firstFiniteTick < 0) {
          firstFiniteTick = cumulativeTicks;

          //note: recompute importance for the RelSubsets whose importance was raised artificially
          clearImportanceBoost();
        }
        if (ambitious) {
          // Choose a slightly more ambitious target cost, and
          // try again. If it took us 1000 iterations to find our
          // first finite plan, give ourselves another 100
          // iterations to reduce the cost by 10%.
          //note: set the target to 0.9 of the current best cost and keep optimizing toward it
          targetCost = root.bestCost.multiplyBy(0.9);
          ++splitCount;
          if (impatient) {
            if (firstFiniteTick < 10) {
              // It's possible pre-processing can create
              // an implementable plan -- give us some time
              // to actually optimize it.
              //note: pre-processing may already have produced an implementable plan, so set a fixed budget first and optimize afterwards
              giveUpTick = cumulativeTicks + 25;
            } else {
              giveUpTick =
                  cumulativeTicks
                      + Math.max(firstFiniteTick / 10, 25);
            }
          }
        } else {
          break;
        }
      //note: no recent progress (giveUpTick exceeded without reaching the target); take the current best plan
      } else if (cumulativeTicks > giveUpTick) {
        // We haven't made progress recently. Take the current best.
        break;
      } else if (root.bestCost.isInfinite() && ((tick % 10) == 0)) {
        injectImportanceBoost();
      }

      LOGGER.debug("PLANNER = {}; TICK = {}/{}; PHASE = {}; COST = {}",
          this, cumulativeTicks, tick, phase.toString(), root.bestCost);

      VolcanoRuleMatch match = ruleQueue.popMatch(phase);
      //note: if no rule match is left, exit the current phase
      if (match == null) {
        break;
      }

      assert match.getRule().matches(match);
      //note: fire the rule for this match
      match.onMatch();

      // The root may have been merged with another
      // subset. Find the new root subset.
      root = canonize(root);
    }

    //note: the current phase is done; remove its rule-match list from ruleQueue
    ruleQueue.phaseCompleted(phase);
  }
  if (LOGGER.isTraceEnabled()) {
    StringWriter sw = new StringWriter();
    final PrintWriter pw = new PrintWriter(sw);
    dump(pw);
    pw.flush();
    LOGGER.trace(sw.toString());
  }
  //note: build the RelNode tree for the cheapest plan
  RelNode cheapest = root.buildCheapestPlan(this);
  if (LOGGER.isDebugEnabled()) {
    LOGGER.debug(
        "Cheapest plan:\n{}", RelOptUtil.toString(cheapest, SqlExplainLevel.ALL_ATTRIBUTES));

    LOGGER.debug("Provenance:\n{}", provenance(cheapest));
  }
  return cheapest;
}
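
To see how the target cost and the give-up deadline interact, the control flow of the loop above can be replayed on plain numbers. This is only a toy model under loud assumptions: it covers a single phase of an ambitious and impatient planner, each queue entry stands for "the cost of the plan produced by firing one rule match", and importance boosting, set merging and real RelNodes are left out entirely. The names FindBestLoopSketch, Result and run are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy replay of the findBestExp() loop on plain numbers (not Calcite code). */
class FindBestLoopSketch {
  static final class Result {
    final double bestCost;
    final int ticks;
    Result(double bestCost, int ticks) {
      this.bestCost = bestCost;
      this.ticks = ticks;
    }
  }

  static Result run(List<Double> costAfterEachMatch) {
    Deque<Double> queue = new ArrayDeque<>(costAfterEachMatch);
    double bestCost = Double.POSITIVE_INFINITY; // no implementable plan yet
    double targetCost = Double.MAX_VALUE;       // stands in for makeHugeCost()
    int tick = 0;
    int firstFiniteTick = -1;
    int giveUpTick = Integer.MAX_VALUE;
    while (true) {
      ++tick;
      if (bestCost <= targetCost) {
        if (firstFiniteTick < 0) {
          firstFiniteTick = tick;
        }
        // Target met: ask for another 10% and extend the deadline.
        targetCost = bestCost * 0.9;
        giveUpTick = tick + (firstFiniteTick < 10 ? 25 : Math.max(firstFiniteTick / 10, 25));
      } else if (tick > giveUpTick) {
        break; // no progress within the budget: keep the current best
      }
      Double next = queue.poll();
      if (next == null) {
        break; // queue exhausted
      }
      bestCost = Math.min(bestCost, next);
    }
    return new Result(bestCost, tick);
  }

  public static void main(String[] args) {
    Result r = run(Arrays.asList(Double.POSITIVE_INFINITY, 100.0, 95.0, 80.0));
    System.out.println(r.bestCost + " after " + r.ticks + " ticks");
  }
}
```

In this model a long run of matches that never beats the 0.9 target ends the loop once the deadline passes, even though matches are still queued; a short queue simply runs dry first.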

The overall flow is as described earlier. Next, the popMatch() method of RuleQueue, whose purpose is to pick the RuleMatch with the highest importance. It is implemented as follows:

//org.apache.calcite.plan.volcano.RuleQueue
/**
 * Removes the rule match with the highest importance, and returns it.
 *
 * note: returns the rule match with the highest importance and removes it from the queue (a match is removed once it has been taken)
 * note: returns null if the list is empty
 * <p>Returns {@code null} if there are no more matches.</p>
 *
 * <p>Note that the VolcanoPlanner may still decide to reject rule matches
 * which have become invalid, say if one of their operands belongs to an
 * obsolete set or has importance=0.
 *
 * @throws java.lang.AssertionError if this method is called with a phase
 *                              previously marked as completed via
 *                              {@link #phaseCompleted(VolcanoPlannerPhase)}.
 */
VolcanoRuleMatch popMatch(VolcanoPlannerPhase phase) {
  dump();

  //note: pick the PhaseMatchList for the current phase
  PhaseMatchList phaseMatchList = matchListMap.get(phase);
  if (phaseMatchList == null) {
    throw new AssertionError("Used match list for phase " + phase
        + " after phase complete");
  }

  final List<VolcanoRuleMatch> matchList = phaseMatchList.list;
  VolcanoRuleMatch match;
  for (;;) {
    //note: per the logic above, the PhaseMatchList is non-empty only in the OPTIMIZE phase; in the other phases it is empty
    // see the addMatch method
    if (matchList.isEmpty()) {
      return null;
    }
    if (LOGGER.isTraceEnabled()) {
      matchList.sort(MATCH_COMPARATOR);
      match = matchList.remove(0);

      StringBuilder b = new StringBuilder();
      b.append("Sorted rule queue:");
      for (VolcanoRuleMatch match2 : matchList) {
        final double importance = match2.computeImportance();
        b.append("\n");
        b.append(match2);
        b.append(" importance ");
        b.append(importance);
      }

      LOGGER.trace(b.toString());
    } else { //note: scan for the match with the highest importance (the branch above sorts only so that the log is ordered)
      // If we're not tracing, it's not worth the effort of sorting the
      // list to find the minimum.
      match = null;
      int bestPos = -1;
      int i = -1;
      for (VolcanoRuleMatch match2 : matchList) {
        ++i;
        if (match == null
            || MATCH_COMPARATOR.compare(match2, match) < 0) {
          bestPos = i;
          match = match2;
        }
      }
      match = matchList.remove(bestPos);
    }

    if (skipMatch(match)) {
      LOGGER.debug("Skip match: {}", match);
    } else {
      break;
    }
  }

  // A rule match's digest is composed of the operand RelNodes' digests,
  // which may have changed if sets have merged since the rule match was
  // enqueued.
  //note: recompute this RuleMatch's digest
  match.recomputeDigest();

  //note: remove this RuleMatch from phaseMatchList
  phaseMatchList.matchMap.remove(
      planner.getSubset(match.rels[0]), match);

  LOGGER.debug("Pop match: {}", match);
  return match;
}
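
The non-tracing branch of popMatch() avoids sorting: one linear pass finds the element that would sort first, and only that element is removed. A minimal, generic sketch of the same pattern follows. The helper popBest is invented for the example, and plain importance values stand in for rule matches, on the assumption (consistent with the listing above) that the comparator puts the highest importance first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Linear-scan "pop the best element" helper (illustrative, not Calcite code). */
class PopBestSketch {
  /** Removes and returns the element that sorts first under cmp, or null if the list is empty. */
  static <T> T popBest(List<T> queue, Comparator<? super T> cmp) {
    if (queue.isEmpty()) {
      return null;
    }
    int bestPos = 0;
    for (int i = 1; i < queue.size(); i++) {
      // Strict "<" keeps the earliest element among ties.
      if (cmp.compare(queue.get(i), queue.get(bestPos)) < 0) {
        bestPos = i;
      }
    }
    return queue.remove(bestPos);
  }

  public static void main(String[] args) {
    // Importance values stand in for rule matches; higher importance sorts first.
    List<Double> queue = new ArrayList<>(Arrays.asList(0.3, 0.9, 0.5));
    Comparator<Double> byImportanceDesc = Comparator.reverseOrder();
    System.out.println(popBest(queue, byImportanceDesc)); // 0.9
    System.out.println(queue);                            // [0.3, 0.5]
  }
}
```

Each pop costs O(n) instead of the O(n log n) of a full sort, and the remaining elements keep their original order, which is why the real method sorts only when it has to print the queue to the trace log.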

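That completes the walkthrough of VolcanoPlanner's optimization, although not every detail was covered. The overall processing flow of VolcanoPlanner is summarized in the following figure:

[Figure: VolcanoPlanner overall processing flow]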

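Some thoughts

1. What is the "one useless rule name" added during RuleQueue initialization for?

When RuleQueue is initialized, a PhaseMatchList object (holding that phase's RuleMatch entries) is created for each of VolcanoPlanner's four phases: PRE_PROCESS_MDR, PRE_PROCESS, OPTIMIZE and CLEANUP. Three of those phases are given one useless rule name at that point, as shown below: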

protected VolcanoPlannerPhaseRuleMappingInitializer
    getPhaseRuleMappingInitializer() {
  return phaseRuleMap -> {
    // Disable all phases except OPTIMIZE by adding one useless rule name.
    //note: disable the planner's other three phases by adding a useless rule name
    phaseRuleMap.get(VolcanoPlannerPhase.PRE_PROCESS_MDR).add("xxx");
    phaseRuleMap.get(VolcanoPlannerPhase.PRE_PROCESS).add("xxx");
    phaseRuleMap.get(VolcanoPlannerPhase.CLEANUP).add("xxx");
  };
}

At first it was puzzling what this is for; the following code makes it clear:

for (VolcanoPlannerPhase phase : VolcanoPlannerPhase.values()) {
  // empty phases get converted to "all rules"
  //note: if the rule set for a phase is empty, replace it with ALL_RULES
  //note: in other words, by default only the OPTIMIZE phase ends up with ALL_RULES
  if (phaseRuleMapping.get(phase).isEmpty()) {
    phaseRuleMapping.put(phase, ALL_RULES);
  }
}

Later, RuleQueue's addMatch() checks this mapping: if phaseRuleSet is not ALL_RULES and does not contain the ruleClassName, the RuleMatch is skipped. So in practice only the OPTIMIZE phase does anything; no RuleMatch is ever added to the other phases.
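
The two snippets and the check in addMatch() fit together as sketched below. This is a self-contained imitation rather than Calcite code: the Phase enum, buildDefaultMapping and accepts are invented names, and the content of the sentinel set is an assumption. What matters is that the sentinel is compared by identity and that a non-empty set containing only "xxx" matches no real rule class name.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Imitation of the phase-to-rule-set mapping (illustrative, not Calcite code). */
class PhaseRuleMappingSketch {
  enum Phase { PRE_PROCESS_MDR, PRE_PROCESS, OPTIMIZE, CLEANUP }

  // Sentinel meaning "every rule is allowed"; its content is an assumption,
  // only its identity is ever checked.
  static final Set<String> ALL_RULES = Collections.singleton("<ALL RULES>");

  static Map<Phase, Set<String>> buildDefaultMapping() {
    Map<Phase, Set<String>> mapping = new EnumMap<>(Phase.class);
    for (Phase phase : Phase.values()) {
      mapping.put(phase, new HashSet<>());
    }
    // Default initializer: a dummy name makes three phases non-empty.
    mapping.get(Phase.PRE_PROCESS_MDR).add("xxx");
    mapping.get(Phase.PRE_PROCESS).add("xxx");
    mapping.get(Phase.CLEANUP).add("xxx");
    // Phases left empty are converted to "all rules"; here that is only OPTIMIZE.
    for (Phase phase : Phase.values()) {
      if (mapping.get(phase).isEmpty()) {
        mapping.put(phase, ALL_RULES);
      }
    }
    return mapping;
  }

  /** Mirrors the filter in addMatch(): identity check first, then membership. */
  static boolean accepts(Map<Phase, Set<String>> mapping, Phase phase, String ruleClassName) {
    Set<String> allowed = mapping.get(phase);
    return allowed == ALL_RULES || allowed.contains(ruleClassName);
  }

  public static void main(String[] args) {
    Map<Phase, Set<String>> mapping = buildDefaultMapping();
    for (Phase phase : Phase.values()) {
      System.out.println(phase + " accepts SomeRule: " + accepts(mapping, phase, "SomeRule"));
    }
  }
}
```

Under this model the dummy name is simply the cheapest way to keep a phase's set from being empty, since an empty set would otherwise be promoted to ALL_RULES.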

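2. Only one of the four phases is actually used, so why define four?

Of VolcanoPlanner's four phases (PRE_PROCESS_MDR, PRE_PROCESS, OPTIMIZE and CLEANUP), only OPTIMIZE performs any real optimization by default. This leaves me with a couple of open questions:

  1. Why split the work into four phases? A RuleMatch is offered to all four phases at once; what does that design buy, and why optimize four times?

  2. Having designed four phases, why is only one used by default?

I do not have answers to these two questions yet; ideas are welcome.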