Running a Spark Streaming program on Linux
By 阿新 · Published 2018-11-10
● Package the code into a jar and upload it to Linux
package com.ws.saprk

import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

object StreamingTextFile {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingTextFile")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Pitfall: this cannot be run locally on Windows, and even on Linux the
    // monitored directory only yields data appended through a stream after
    // the job starts, e.g. `echo xxxxx >> /root/test/game.log` is picked up
    // by the streaming job.
    // Files that already existed in this directory are NOT recognized; only
    // newly added data written through a stream is!
    val test: DStream[String] = ssc.textFileStream("/root/test/")
    val splitArr = test.flatMap(_.split(" "))
    val result = splitArr.map(x => (x, 1)).reduceByKey(_ + _)
    result.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
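To produce the jar submitted below, a minimal sbt build could look like the following. This is a sketch, not the author's actual build: the project name, version, and "provided" scoping are assumptions; Spark is pinned to 2.1.3 to match the spark-2.1.3 installation used in the submit step.

// build.sbt -- minimal sketch; name and version are placeholders
name := "ws"
version := "0.1"
scalaVersion := "2.11.8"  // Spark 2.1.x is built against Scala 2.11
libraryDependencies ++= Seq(
  // "provided": the cluster supplies Spark at runtime, keeping the jar small
  "org.apache.spark" %% "spark-core"      % "2.1.3" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.1.3" % "provided"
)

Running `sbt package` then produces a jar under target/scala-2.11/ that can be uploaded as /root/ws.jar.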
● Run the jar with spark-submit
# Using the shorthand for the IP (the hostname qjw-01) here also causes problems
[root@qjw-01 spark-2.1.3]# ./bin/spark-submit --master spark://192.168.0.21:7077 --class com.ws.saprk.StreamingTextFile /root/ws.jar
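The command above submits to the standalone cluster master at 192.168.0.21:7077. For quick iteration on the same word-count logic without a cluster, and without the textFileStream-on-Windows pitfall noted in the code comments, a socket-based variant can run with a local master. This is a sketch only; the object name and port 9999 are made up, and it is fed with `nc -lk 9999` on the same machine.

package com.ws.saprk

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSocketTest {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingSocketTest")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Read lines from a TCP socket instead of monitoring a directory
    val lines = ssc.socketTextStream("localhost", 9999)
    val result = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    result.print()
    ssc.start()
    ssc.awaitTermination()
  }
}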
● Write data
[root@qjw-01 ~]# echo 1 2 3 4 5 6 7 8 9 1 2 43 5 6 5 >> /root/test/i.log
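Since textFileStream only notices data that newly appears in the monitored directory after the job starts, another way to feed it, besides appending with echo as above, is to write a file outside the directory and then move it in as one step, so the stream never sees a half-written file. The helper below is a hypothetical sketch; the file names are placeholders.

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical feeder: write outside the monitored dir, then atomically
// move the finished file into /root/test/ where the stream picks it up.
object FeedMonitoredDir {
  def main(args: Array[String]): Unit = {
    val tmp = Paths.get("/root/feed-" + System.currentTimeMillis() + ".log")
    Files.write(tmp, "1 2 3 4 5 6 7 8 9".getBytes(StandardCharsets.UTF_8))
    val dst = Paths.get("/root/test/").resolve(tmp.getFileName)
    Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE)
  }
}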
● Result
-------------------------------------------
Time: 1539098465000 ms
-------------------------------------------

-------------------------------------------
Time: 1539098470000 ms
-------------------------------------------
(4,1)
(8,1)
(6,2)
(2,2)
(7,1)
(5,3)
(9,1)
(3,1)
(1,2)
(43,1)
-------------------------------------------

The first batch is empty because the echo ran during the following 5-second batch interval; the second batch then shows the word counts for the line written above.