Flink Development: UDFs in Flink SQL and the Table API

阿新 • Published: 2020-12-05

Flink Table API & SQL

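 In relational databases:            database.schema.table
 In other (distributed) databases:   catalog.database.table
   The user-defined functions covered here are those at the Flink Table API & SQL level. Note that they differ from the function interfaces of the DataStream API.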

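1. Function categories

1. Functions are classified along two dimensions:
    By ownership: system (or built-in) functions, which have no namespace, vs. catalog functions, which belong to a catalog and database
    By lifecycle: temporary functions, which last only for the session, vs. persistent functions
2. Combining the two dimensions gives four categories:
    Temporary system functions
    System functions
    Temporary catalog functions
    Catalog functions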

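2. How to reference functions

Precise reference: qualify the function with its catalog and database (catalog.database.function). Available in Flink 1.10 and later.
Ambiguous reference: use only the function name in a SQL statement. When several functions share that name, it is resolved in this order:
   Temporary system function
   System function
   Temporary catalog function, in the current catalog and current database of the session
   Catalog function, in the current catalog and current database of the session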
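
The lookup order above can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: the class FunctionResolutionSketch, its Scope enum, and the string labels are hypothetical names invented for illustration, it has no Flink dependency, and it treats the session as having a single current catalog and database. It is not how Flink implements resolution internally.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, dependency-free model of ambiguous function resolution.
// It only mirrors the lookup order listed above; it is not Flink code.
class FunctionResolutionSketch {

    // Declaration order is the lookup order.
    enum Scope { TEMPORARY_SYSTEM, SYSTEM, TEMPORARY_CATALOG, CATALOG }

    private final Map<Scope, Map<String, String>> registry = new EnumMap<>(Scope.class);

    FunctionResolutionSketch() {
        for (Scope scope : Scope.values()) {
            registry.put(scope, new HashMap<>());
        }
    }

    // Registers an implementation label under a name in one scope.
    void register(Scope scope, String name, String implementation) {
        registry.get(scope).put(name, implementation);
    }

    // Returns "SCOPE:implementation" for the first scope that knows the name.
    Optional<String> resolve(String name) {
        for (Scope scope : Scope.values()) {
            String implementation = registry.get(scope).get(name);
            if (implementation != null) {
                return Optional.of(scope + ":" + implementation);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FunctionResolutionSketch functions = new FunctionResolutionSketch();
        functions.register(Scope.CATALOG, "myfunc", "catalogVersion");
        functions.register(Scope.TEMPORARY_CATALOG, "myfunc", "temporaryVersion");
        // The temporary catalog function shadows the persistent one.
        System.out.println(functions.resolve("myfunc").orElse("not found"));
    }
}
```

The practical consequence is shadowing: registering a temporary function under an existing name changes what an unqualified call resolves to, while a precise reference keeps pointing at the catalog function.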

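3. System (built-in) functions

Scalar Functions
    Arithmetic functions; comparison functions (=, <>, >, IS NULL, IN, EXISTS, BETWEEN); logical functions (AND, OR)
    String functions; temporal functions
    Conditional functions: CASE WHEN, COALESCE, IS_DIGIT
    Type conversion functions: CAST
    Collection functions (for ARRAY and MAP values)
    Grouping functions; hash functions
Aggregate Functions
   MAX, MIN, SUM, COUNT
   COLLECT
   ROW_NUMBER(), DENSE_RANK(), RANK()
      For four ordered rows in which the second and third are tied:
      ROW_NUMBER numbers every row consecutively:                        1 2 3 4
      DENSE_RANK gives tied rows the same number and leaves no gaps:     1 2 2 3
      RANK gives tied rows the same number and may skip numbers after:   1 2 2 4
Column Functions (used only in the Table API)
   withColumns
   withoutColumns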
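
The difference between the three numbering functions listed under the aggregate functions is easiest to see by computing them by hand. The following is a minimal plain-Java sketch, not Flink code: the class RankingSketch and its methods are hypothetical, and the input is assumed to be already sorted in the ORDER BY order of the window.

```java
import java.util.Arrays;

// Hypothetical helper that reproduces the numbering of ROW_NUMBER, RANK and
// DENSE_RANK over values that are already sorted in the window's order.
class RankingSketch {

    // ROW_NUMBER: consecutive numbers, ties are ignored.
    static int[] rowNumber(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = i + 1;
        }
        return out;
    }

    // RANK: tied rows share a number; the next distinct value gets its row position.
    static int[] rank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            boolean tie = i > 0 && sorted[i] == sorted[i - 1];
            out[i] = tie ? out[i - 1] : i + 1;
        }
        return out;
    }

    // DENSE_RANK: tied rows share a number; the next distinct value gets previous + 1.
    static int[] denseRank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else {
                out[i] = sorted[i] == sorted[i - 1] ? out[i - 1] : out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] scores = {90, 80, 80, 70}; // sorted descending, one tie
        System.out.println(Arrays.toString(rowNumber(scores))); // [1, 2, 3, 4]
        System.out.println(Arrays.toString(rank(scores)));      // [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(scores))); // [1, 2, 2, 3]
    }
}
```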

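Flink UDFs

 UDFs extend the expressiveness of queries and make it possible to open that capability up to users.
  UDFs can be written in JVM languages: Java and Scala.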

Types of user-defined functions

        Scalar functions      map scalar values                  to a new scalar value.
         Table functions      map scalar values                  to new rows.
     Aggregate functions      map scalar values of multiple rows to a new scalar value.
    Table aggregate functions map scalar values of multiple rows to new rows.
     Async table functions are special functions for table sources that perform a lookup.
     Compared with Hive, these correspond to UDF, UDTF, and UDAF.
     Flink subdivides them further and adds kinds of its own.
 Differences between function kinds in this version:
      UDF, UDTF: org.apache.flink.table.types.DataType
      UDAF     : org.apache.flink.api.common.typeinfo.TypeInformation
                 The aggregate part is being reworked: it currently uses TypeInformation and will use DataType once the rework is done.
 (Note: the classes in Flink that carry type information are
   TypeInformation  org.apache.flink.api.common.typeinfo.Types
                    org.apache.flink.api.common.typeinfo.TypeInformation
   Type             org.apache.flink.table.api.Types
   DataType         org.apache.flink.table.types.DataType  (from version 1.9 on, removes the dependency on TypeInformation)
 )
 Base classes, all in the package org.apache.flink.table.functions:
     ScalarFunction
     TableFunction
     AggregateFunction
     TableAggregateFunction
     AsyncTableFunction

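 How to write a UDF
 How to call it: from both the Table API and SQL.
   For SQL queries, a function must always be registered under a name.
   For the Table API, a function can be registered or used directly inline.
Example:
0. Write the UDF
   import org.apache.flink.table.functions.ScalarFunction;

   // define function logic
   public static class SubstringFunction extends ScalarFunction {
     // returns the part of s from index begin (inclusive) to index end (exclusive)
     public String eval(String s, Integer begin, Integer end) {
       return s.substring(begin, end);
     }
   }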
Using the UDF (env below is a TableEnvironment; call and $ are static imports from org.apache.flink.table.api.Expressions)
1. For SQL, register the function and then use it in the query
  // register function
  env.createTemporarySystemFunction("SubstringFunction", SubstringFunction.class);
  // call registered function in SQL
  env.sqlQuery("SELECT SubstringFunction(myField, 5, 12) FROM MyTable");

2. For the Table API, call the function inline, or register it first and call it by name
  // call function "inline" without registration in Table API
  env.from("MyTable").select(call(SubstringFunction.class, $("myField"), 5, 12));

  // register function
  env.createTemporarySystemFunction("SubstringFunction", SubstringFunction.class);
  // call registered function in Table API
  env.from("MyTable").select(call("SubstringFunction", $("myField"), 5, 12));
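Details

   UDFs provide open() and close() methods that can be overridden. They play a role similar to the methods of RichFunction in the DataSet and DataStream APIs.
 1. A scalar UDF extends the abstract class ScalarFunction and mainly implements eval methods.
    It produces one value per call.
    org.apache.flink.table.functions
      public abstract class ScalarFunction extends UserDefinedFunction {}
    Note on the return type:
        it can be a basic type or a custom complex type;
        for a complex type you may need to override getResultType().
 2. A table UDF extends the abstract class TableFunction and mainly implements eval methods.
    It can output any number of rows, and each row can contain one or more columns. Rows are emitted by calling the provided collect(T) method instead of being returned from eval.
    org.apache.flink.table.functions
      public abstract class TableFunction<T> extends UserDefinedFunction {}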
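
The collect-based emission pattern of a table function can be illustrated without Flink on the classpath. In this sketch the class SplitFunctionSketch, the row type WordRow, and the helper rowsFor are hypothetical, and a java.util.function.Consumer stands in for the collector that the runtime would provide. A real implementation would extend TableFunction and call the inherited collect(T) instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, dependency-free model of a table function that splits a line
// into words and emits one (word, length) row per word.
class SplitFunctionSketch {

    // One output row with two columns.
    static final class WordRow {
        final String word;
        final int length;

        WordRow(String word, int length) {
            this.word = word;
            this.length = length;
        }
    }

    // Stand-in for the collector the runtime would install behind collect(T).
    private final Consumer<WordRow> collector;

    SplitFunctionSketch(Consumer<WordRow> collector) {
        this.collector = collector;
    }

    // Mirrors the role of TableFunction.collect(T): emits one output row.
    protected void collect(WordRow row) {
        collector.accept(row);
    }

    // eval returns nothing; it calls collect once per output row.
    public void eval(String line) {
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                collect(new WordRow(word, word.length()));
            }
        }
    }

    // Demo helper: runs eval on one input value and gathers the emitted rows.
    static List<WordRow> rowsFor(String line) {
        List<WordRow> rows = new ArrayList<>();
        new SplitFunctionSketch(rows::add).eval(line);
        return rows;
    }

    public static void main(String[] args) {
        for (WordRow row : rowsFor("flink table api")) {
            System.out.println(row.word + ", " + row.length);
        }
    }
}
```

Because eval has no return value, one input can produce zero rows (an empty string here), one row, or many rows, which is the difference from a scalar function.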
 3. Aggregate functions
    The following methods are mandatory for each AggregateFunction:
      createAccumulator()
      accumulate()
      getValue()
    The following methods of AggregateFunction are required depending on the use case:
      merge()             required in the context of session group windows
      retract()           required for aggregations on bounded OVER windows
      resetAccumulator()  required for many batch aggregations
    Flink: org.apache.flink.table.functions.AggregateFunction
      public abstract class AggregateFunction<T, ACC> extends UserDefinedAggregateFunction<T, ACC> {}
        <IN, ACC, OUT>
    For comparison, the UDAF in Spark SQL: org.apache.spark.sql.expressions
      public abstract class UserDefinedAggregateFunction extends Serializable {}
        inputSchema  bufferSchema  dataType
        initialize  update  merge  evaluate
    For comparison, Hive: org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver
      1) Extend the abstract class AbstractGenericUDAFResolver and override getEvaluator(TypeInfo[] parameters).
      2) Have an inner static class extend the abstract class GenericUDAFEvaluator, override init(),
         and implement getNewAggregationBuffer(), reset(), iterate(), terminatePartial(), merge(), and terminate().
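
The accumulator contract of a Flink aggregate function (createAccumulator, accumulate, getValue, plus the optional retract, merge, and resetAccumulator) can be sketched in plain Java. The weighted-average use case, the class WeightedAvgSketch, and its accumulator Acc are assumptions chosen for illustration. The class deliberately does not extend Flink's AggregateFunction so that it compiles without Flink, and it uses integer division for brevity.

```java
import java.util.Arrays;

// Hypothetical, dependency-free model of the AggregateFunction method contract,
// computing a weighted average with integer arithmetic.
class WeightedAvgSketch {

    // Mutable intermediate state kept per group.
    static final class Acc {
        long sum = 0;
        long count = 0;
    }

    // Mandatory: creates the empty state.
    Acc createAccumulator() {
        return new Acc();
    }

    // Mandatory: folds one input row into the state.
    void accumulate(Acc acc, long value, long weight) {
        acc.sum += value * weight;
        acc.count += weight;
    }

    // Optional: undoes a previously accumulated row.
    void retract(Acc acc, long value, long weight) {
        acc.sum -= value * weight;
        acc.count -= weight;
    }

    // Optional: combines partial states, for example when session windows are merged.
    void merge(Acc acc, Iterable<Acc> others) {
        for (Acc other : others) {
            acc.sum += other.sum;
            acc.count += other.count;
        }
    }

    // Optional: clears the state for reuse.
    void resetAccumulator(Acc acc) {
        acc.sum = 0;
        acc.count = 0;
    }

    // Mandatory: derives the final result; null when nothing was accumulated.
    Long getValue(Acc acc) {
        if (acc.count == 0) {
            return null;
        }
        return acc.sum / acc.count;
    }

    public static void main(String[] args) {
        WeightedAvgSketch agg = new WeightedAvgSketch();
        Acc acc = agg.createAccumulator();
        agg.accumulate(acc, 10, 1);
        agg.accumulate(acc, 30, 3);
        System.out.println(agg.getValue(acc)); // (10*1 + 30*3) / 4 = 25

        Acc other = agg.createAccumulator();
        agg.accumulate(other, 50, 4);
        agg.merge(acc, Arrays.asList(other));
        System.out.println(agg.getValue(acc)); // (100 + 200) / 8 = 37
    }
}
```

The same shape maps onto the Spark and Hive APIs listed above: createAccumulator corresponds to initialize or getNewAggregationBuffer, accumulate to update or iterate, and getValue to evaluate or terminate.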

 4. Table aggregate functions
   TableAggregateFunction
   The following methods are mandatory for each TableAggregateFunction:
     createAccumulator()
     accumulate()
   The following methods of TableAggregateFunction are required depending on the use case:
     retract() is required for aggregations on bounded OVER windows.
     merge() is required for many batch aggregations and session window aggregations.
     resetAccumulator() is required for many batch aggregations.
     emitValue() is required for batch and window aggregations.
     emitUpdateWithRetract() emits the values that have been updated under retract mode.
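
A table aggregate differs from an ordinary aggregate in that the result is emitted as rows rather than returned as one value. The sketch below models that with a "top 2" example in plain Java. The class Top2Sketch, its accumulator, and the helper top2 are hypothetical, a Consumer stands in for the collector that would be passed to emitValue, and the class does not extend Flink's TableAggregateFunction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, dependency-free model of a table aggregate that keeps the two
// largest values of a group and emits them as (value, rank) rows.
class Top2Sketch {

    // Intermediate state: the largest and second-largest value seen so far.
    static final class Acc {
        int first = Integer.MIN_VALUE;
        int second = Integer.MIN_VALUE;
    }

    // Mandatory: creates the empty state.
    Acc createAccumulator() {
        return new Acc();
    }

    // Mandatory: folds one input value into the state.
    void accumulate(Acc acc, int value) {
        if (value > acc.first) {
            acc.second = acc.first;
            acc.first = value;
        } else if (value > acc.second) {
            acc.second = value;
        }
    }

    // Emits zero, one, or two rows; each row is {value, rank}.
    void emitValue(Acc acc, Consumer<int[]> out) {
        if (acc.first != Integer.MIN_VALUE) {
            out.accept(new int[] {acc.first, 1});
        }
        if (acc.second != Integer.MIN_VALUE) {
            out.accept(new int[] {acc.second, 2});
        }
    }

    // Demo helper: aggregates all values of one group and gathers the emitted rows.
    static List<int[]> top2(int... values) {
        Top2Sketch agg = new Top2Sketch();
        Acc acc = agg.createAccumulator();
        for (int value : values) {
            agg.accumulate(acc, value);
        }
        List<int[]> rows = new ArrayList<>();
        agg.emitValue(acc, rows::add);
        return rows;
    }

    public static void main(String[] args) {
        for (int[] row : top2(3, 9, 5)) {
            System.out.println(row[0] + " has rank " + row[1]);
        }
    }
}
```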

References:

https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/functions/systemFunctions.html
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/functions/udfs.html
