Spark Structured Streaming框架(5)之進程管理
Structured Streaming提供一些API來管理Streaming對象。用戶可以通過這些API來手動管理已經啟動的Streaming,保證在系統中的Streaming有序執行。
1. StreamingQuery
在調用DataStreamWriter方法的start啟動Streaming後,會返回一個StreamingQuery對象。所以用戶就可以通過這個對象來管理Streaming。
如下所示:
val query = df.writeStream.format("console").start() // get the query object
query.id // get the unique identifier of the running query that persists across restarts from checkpoint data
query.runId // get the unique id of this run of the query, which will be generated at every start/restart
query.name // get the name of the auto-generated or user-specified name
query.explain() // print detailed explanations of the query
query.stop() // stop the query
query.awaitTermination() // block until query is terminated, with stop() or with error
query.exception // the exception if the query has been terminated with error
query.recentProgress // an array of the most recent progress updates for this query
query.lastProgress // the most recent progress update of this streaming query |
2. StreamingQueryManager
Structured Streaming提供了另外一個管理Streaming的接口是:StreamingQueryManager。用戶可以通過SparkSession對象的streams方法獲得。
如下所示:
val spark: SparkSession = ... val streamManager = spark.streams() streamManager.active // get the list of currently active streaming queries
streamManager.get(id) // get a query object by its unique id
streamManager.awaitAnyTermination() // block until any one of them terminates |
3. 參考文獻
[1]. Structured Streaming Programming Guide.
[2]. Kafka Integration Guide.
Spark Structured Streaming框架(5)之進程管理