Spark Window 程式碼片段整理
阿新 • • 發佈:2018-12-10
參考地址:
Window Operations on Event Time
Dataset<Row> words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String } // Group the data by window and word and compute the count of each group Dataset<Row> windowedCounts = words.groupBy( functions.window(words.col("timestamp"), "10 minutes", "5 minutes"), words.col("word") ).count();
Handling Late Data and Watermarking
Dataset<Row> words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String } // Group the data by window and word and compute the count of each group Dataset<Row> windowedCounts = words .withWatermark("timestamp", "10 minutes") .groupBy( functions.window(words.col("timestamp"), "10 minutes", "5 minutes"), words.col("word")) .count();
比較好的一個例子