Flink Basics (42): Flink SQL (18): Configuration

阿新 • Published: 2020-11-28

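0 Configuration

The default configuration of the Table and SQL API ensures accurate results while providing acceptable performance.

Depending on the requirements of a Table program, specific parameters may need to be adjusted for optimization. For example, an unbounded streaming program may need to ensure that the state it requires stays bounded (see the Streaming Concepts documentation).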

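1 Overview

In every TableEnvironment, TableConfig provides the configuration options for the current session.

For common or important options, TableConfig provides getter and setter methods with detailed inline documentation.

For more advanced configuration, users can directly access the underlying key-value options. The following sections list all options available for tuning Flink Table and SQL API programs.

Note: Because options are read at different points in time while operations are executed, it is recommended to set them as early as possible after instantiating the TableEnvironment.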

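import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

// instantiate the table environment
// (one possible setup: Blink planner in streaming mode; adjust to your program)
val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
val tEnv: TableEnvironment = TableEnvironment.create(settings)

// access the underlying Flink configuration
val configuration = tEnv.getConfig.getConfiguration
// set low-level key-value options
configuration.setString("table.exec.mini-batch.enabled", "true")
configuration.setString("table.exec.mini-batch.allow-latency", "5 s")
configuration.setString("table.exec.mini-batch.size", "5000")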

Note: Currently, key-value options are only supported by the Blink planner.
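The typed getters and setters on TableConfig mentioned in the overview can be sketched as follows. This is a minimal example that assumes Flink 1.11 with the Blink planner on the classpath; the object name TableConfigSettersSketch and the chosen retention times and time zone are illustrative only. It uses the idle state retention setter as one way to keep the state of an unbounded query bounded.

```scala
import java.time.ZoneId

import org.apache.flink.api.common.time.Time
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

// Hypothetical object name; assumes Flink 1.11 with the Blink planner.
object TableConfigSettersSketch {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
    val tEnv = TableEnvironment.create(settings)
    val tableConfig = tEnv.getConfig

    // Keep idle state for at least 12 hours and clean it up after at most 24 hours
    // (illustrative values, not recommendations).
    tableConfig.setIdleStateRetentionTime(Time.hours(12), Time.hours(24))

    // Session time zone used when handling time zone dependent values.
    tableConfig.setLocalTimeZone(ZoneId.of("UTC"))

    // Read the values back through the matching getters.
    println(tableConfig.getMinIdleStateRetentionTime) // minimum retention in milliseconds
    println(tableConfig.getLocalTimeZone)
  }
}
```

As with the key-value options, these setters are best called right after the TableEnvironment is created, before any query is translated.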

2 Execution Options

The following options can be used to tune the performance of query execution.

Key	Mode	Default	Type	Description
table.exec.async-lookup.buffer-capacity	Batch, Streaming	100	Integer	The maximum number of async I/O operations that the async lookup join can trigger.
table.exec.async-lookup.timeout	Batch, Streaming	"3 min"	String	The timeout for an asynchronous operation to complete.
table.exec.disabled-operators	Batch	(none)	String	Mainly for testing. A comma-separated list of operator names, where each name represents a kind of disabled operator. Operators that can be disabled include "NestedLoopJoin", "ShuffleHashJoin", "BroadcastHashJoin", "SortMergeJoin", "HashAgg", "SortAgg". By default no operator is disabled.
table.exec.mini-batch.allow-latency	Streaming	"-1 ms"	String	The maximum latency that MiniBatch may use to buffer input records. MiniBatch is an optimization that buffers input records to reduce state access. MiniBatch is triggered at the allowed latency interval and when the maximum number of buffered records is reached. NOTE: If table.exec.mini-batch.enabled is set to true, this value must be greater than zero.
table.exec.mini-batch.enabled	Streaming	false	Boolean	Specifies whether to enable the MiniBatch optimization. MiniBatch is an optimization that buffers input records to reduce state access. It is disabled by default; to enable it, set this option to true. NOTE: If mini-batch is enabled, 'table.exec.mini-batch.allow-latency' and 'table.exec.mini-batch.size' must be set.
table.exec.mini-batch.size	Streaming	-1	Long	The maximum number of input records that can be buffered for MiniBatch. MiniBatch is an optimization that buffers input records to reduce state access. MiniBatch is triggered at the allowed latency interval and when the maximum number of buffered records is reached. NOTE: MiniBatch currently only works for non-windowed aggregations. If table.exec.mini-batch.enabled is set to true, this value must be positive.
table.exec.resource.default-parallelism	Batch, Streaming	-1	Integer	Sets the default parallelism for all operators (such as aggregate, join, filter) that run with parallel instances. This option has a higher priority than the parallelism of StreamExecutionEnvironment (that is, it overrides the parallelism of StreamExecutionEnvironment). A value of -1 indicates that no default parallelism is set, in which case the parallelism of StreamExecutionEnvironment is used.
table.exec.shuffle-mode	Batch	"ALL_EDGES_BLOCKING"	String	Sets the exec shuffle mode. Accepted values are: `ALL_EDGES_BLOCKING`: All edges use blocking shuffle. `FORWARD_EDGES_PIPELINED`: Forward edges use pipelined shuffle, others blocking. `POINTWISE_EDGES_PIPELINED`: Pointwise edges use pipelined shuffle, others blocking. Pointwise edges include forward and rescale edges. `ALL_EDGES_PIPELINED`: All edges use pipelined shuffle. `batch`: the same as `ALL_EDGES_BLOCKING`. Deprecated. `pipelined`: the same as `ALL_EDGES_PIPELINED`. Deprecated. Note: Blocking shuffle means data is fully produced before it is sent to consumer tasks. Pipelined shuffle means data is sent to consumer tasks as soon as it is produced.
table.exec.sink.not-null-enforcer	Batch, Streaming	ERROR	Enum (possible values: ERROR, DROP)	The NOT NULL column constraint on a table enforces that null values cannot be inserted into the table. Flink supports the 'error' (default) and 'drop' enforcement behaviors. By default, Flink checks values and throws a runtime exception when null values are written into NOT NULL columns. Users can change the behavior to 'drop' to silently drop such records without throwing an exception.
table.exec.sort.async-merge-enabled	Batch	true	Boolean	Whether to asynchronously merge sorted spill files.
table.exec.sort.default-limit	Batch	-1	Integer	The default limit applied when the user does not set a limit after ORDER BY. -1 indicates that this option is ignored.
table.exec.sort.max-num-file-handles	Batch	128	Integer	The maximal fan-in for external merge sort. It limits the number of file handles per operator. If it is too small, it may cause intermediate merging. If it is too large, too many files are opened at the same time, which consumes memory and leads to random reads.
table.exec.source.idle-timeout	Streaming	"-1 ms"	String	When a source does not receive any elements for the duration of the timeout, it is marked as temporarily idle. This allows downstream tasks to advance their watermarks without waiting for watermarks from this source while it is idle.
table.exec.spill-compression.block-size	Batch	"64 kb"	String	The memory size used for compression when spilling data. The larger the memory, the higher the compression ratio, but the more memory the job consumes.
table.exec.spill-compression.enabled	Batch	true	Boolean	Whether to compress spilled data. Currently, compression of spilled data is only supported for the sort, hash-agg, and hash-join operators.
table.exec.window-agg.buffer-size-limit	Batch	100000	Integer	Sets the window elements buffer size limit used in the group window agg operator.
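A few of the execution options above can be set through the key-value configuration as in the following sketch. It assumes Flink 1.11 with the Blink planner on the classpath; the object name ExecOptionsSketch and the concrete values (30 s, DROP, 4) are illustrative, not recommendations.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

// Hypothetical object name; assumes Flink 1.11 with the Blink planner.
object ExecOptionsSketch {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
    val tEnv = TableEnvironment.create(settings)
    val configuration = tEnv.getConfig.getConfiguration

    // Mark a source as temporarily idle after 30 seconds without elements,
    // so downstream watermarks can still advance.
    configuration.setString("table.exec.source.idle-timeout", "30 s")

    // Silently drop records that would write null into a NOT NULL column
    // instead of failing the job.
    configuration.setString("table.exec.sink.not-null-enforcer", "DROP")

    // Default parallelism for all operators; -1 would fall back to the
    // parallelism of StreamExecutionEnvironment.
    configuration.setInteger("table.exec.resource.default-parallelism", 4)

    // Read one option back to confirm what the session will use.
    println(configuration.getString("table.exec.source.idle-timeout", "-1 ms"))
  }
}
```

Options marked Batch in the table only take effect when the environment is created in batch mode, so a streaming session like this one would ignore them.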

3 Optimizer Options

The following options can be used to adjust the behavior of the query optimizer to obtain a better execution plan.

Key	Mode	Default	Type	Description
table.optimizer.agg-phase-strategy	Batch, Streaming	"AUTO"	String	Strategy for the aggregate phase. Only AUTO, TWO_PHASE or ONE_PHASE can be set. AUTO: No special enforcer for the aggregate stage. Whether a two-stage or a one-stage aggregate is chosen depends on cost. TWO_PHASE: Enforces a two-stage aggregate, which has a localAggregate and a globalAggregate. Note that if the aggregate call cannot be optimized into two phases, a one-stage aggregate is still used. ONE_PHASE: Enforces a one-stage aggregate, which only has a CompleteGlobalAggregate.
table.optimizer.distinct-agg.split.bucket-num	Streaming	1024	Integer	Configures the number of buckets when splitting a distinct aggregation. The number is used in the first-level aggregation to calculate a bucket key 'hash_code(distinct_key) % BUCKET_NUM', which is used as an additional group key after splitting.
table.optimizer.distinct-agg.split.enabled	Streaming	false	Boolean	Tells the optimizer whether to split a distinct aggregation (e.g. COUNT(DISTINCT col), SUM(DISTINCT col)) into two levels. The first aggregation is shuffled by an additional key, which is calculated from the hash code of distinct_key and the number of buckets. This optimization is very useful when there is data skew in a distinct aggregation and makes it possible to scale up the job. Default is false.
table.optimizer.join-reorder-enabled	Batch, Streaming	false	Boolean	Enables join reordering in the optimizer. Disabled by default.
table.optimizer.join.broadcast-threshold	Batch	1048576	Long	Configures the maximum size in bytes of a table that will be broadcast to all worker nodes when performing a join. Set this value to -1 to disable broadcasting.
table.optimizer.reuse-source-enabled	Batch, Streaming	true	Boolean	When true, the optimizer tries to find duplicated table sources and reuse them. This works only when table.optimizer.reuse-sub-plan-enabled is true.
table.optimizer.reuse-sub-plan-enabled	Batch, Streaming	true	Boolean	When true, the optimizer tries to find duplicated sub-plans and reuse them.
table.optimizer.source.predicate-pushdown-enabled	Batch, Streaming	true	Boolean	When true, the optimizer pushes predicates down into the FilterableTableSource. Default value is true.
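The bucket key formula 'hash_code(distinct_key) % BUCKET_NUM' behind the distinct aggregation split can be illustrated without Flink. The following plain Scala sketch is not Flink's implementation: the object DistinctSplitSketch, the helper names, and the use of String.hashCode with Math.floorMod are all assumptions made for illustration. It computes COUNT(DISTINCT value) per group in two levels and relies on the fact that every distinct value lands in exactly one bucket.

```scala
// Illustrative sketch only; names and hashing details are assumptions, not Flink internals.
object DistinctSplitSketch {
  // Stand-in for table.optimizer.distinct-agg.split.bucket-num (default 1024).
  val BucketNum: Int = 1024

  // Bucket key: hash_code(distinct_key) % BUCKET_NUM.
  // Math.floorMod keeps the result non-negative for negative hash codes.
  def bucketKey(distinctKey: String): Int =
    Math.floorMod(distinctKey.hashCode, BucketNum)

  // Two-level COUNT(DISTINCT value) per group over rows of (groupKey, value).
  def countDistinctTwoLevel(rows: Seq[(String, String)]): Map[String, Int] = {
    // Level 1: group by (groupKey, bucketKey) and count distinct values per bucket.
    val partial: Map[(String, Int), Int] =
      rows
        .groupBy { case (group, value) => (group, bucketKey(value)) }
        .map { case (key, bucketRows) => key -> bucketRows.map(_._2).distinct.size }

    // Level 2: group by groupKey only and sum the per-bucket counts.
    partial
      .groupBy { case ((group, _), _) => group }
      .map { case (group, parts) => group -> parts.values.sum }
  }
}
```

Because the first level is keyed by the group key plus the bucket key, a single hot group key is spread over up to BucketNum parallel partial aggregations, which is why the option helps with data skew. In a Flink program the optimization itself is switched on by setting "table.optimizer.distinct-agg.split.enabled" to "true" on the key-value configuration.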

4 Planner Options

The following options can be used to adjust the behavior of the planner.

Key	Mode	Default	Type	Description
table.dynamic-table-options.enabled	Batch, Streaming	false	Boolean	Enables or disables the OPTIONS hint used to specify table options dynamically. If disabled, an exception is thrown when any OPTIONS hint is specified.
table.sql-dialect	Batch, Streaming	"default"	String	The SQL dialect defines how a SQL query is parsed. Different SQL dialects may support different SQL grammar. Currently supported dialects are: default and hive.
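The two planner options can be set as in the following sketch, which assumes Flink 1.11 with the Blink planner on the classpath. The object name PlannerOptionsSketch, the table name orders, and the hinted connector option are hypothetical; the query is only printed, not executed, because no such table is registered here.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, SqlDialect, TableEnvironment}

// Hypothetical object name; assumes Flink 1.11 with the Blink planner.
object PlannerOptionsSketch {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
    val tEnv = TableEnvironment.create(settings)

    // Allow OPTIONS hints in queries; with the default (false) a hinted query fails.
    tEnv.getConfig.getConfiguration.setString("table.dynamic-table-options.enabled", "true")

    // Select the SQL dialect through the typed setter on TableConfig.
    // SqlDialect.HIVE is the other supported value and needs the Hive dependencies.
    tEnv.getConfig.setSqlDialect(SqlDialect.DEFAULT)

    // Shape of a query that uses the OPTIONS hint. The table `orders` and the
    // option in the hint are placeholders for whatever your connector defines.
    val hintedQuery =
      "SELECT * FROM orders /*+ OPTIONS('scan.startup.mode'='earliest-offset') */"
    println(hintedQuery)
  }
}
```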
