
Running a Spark Streaming program on Linux

● Package the code into a jar and upload it to Linux (a packaging sketch follows the code below)

package com.ws.saprk
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkConf
object StreamingTextFile {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingTextFile")

    val ssc = new StreamingContext(conf,Seconds(5))
    
    // Pitfall: this does not work when run locally (on Windows), and even on Linux
    // only data appended to this directory as a stream gets picked up, e.g.
    //   echo xxxxx >> /root/test/game.log
    // is recognized by the streaming job. Files that already existed in the
    // directory are never read; only data that arrives after the job starts is
    // detected (see the note after the write-data step below).
    val test: DStream[String] = ssc.textFileStream("/root/test/")

    val splitArr = test.flatMap(_.split(" "))

    val result = splitArr.map(x=>(x,1)).reduceByKey(_+_)
    
    result.print()
    
    ssc.start()
    
    ssc.awaitTermination()
  }
}
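A minimal packaging-and-upload sketch, assuming an sbt project built against Scala 2.11 (the jar name, project layout, and target host are assumptions; adjust to your own build):

# on the development machine, from the project root
sbt package
# copy the jar to the Linux box
scp target/scala-2.11/ws.jar root@qjw-01:/root/ws.jar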

● Run the jar with spark-submit

# Using the hostname shorthand (qjw-01) in place of the IP here also causes problems (see the note after this command)
[root@qjw-01 spark-2.1.3]# ./bin/spark-submit --master spark://192.168.0.21:7077 --class com.ws.saprk.StreamingTextFile /root/ws.jar
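The --master URL must match exactly what the standalone master registered with, which is why the hostname shorthand fails here. The exact spark:// URL is shown at the top of the master web UI (port 8080 by default) and in the master's startup log; one way to check it (the log path is the standalone default and may differ on your install):

[root@qjw-01 spark-2.1.3]# grep "Starting Spark master at" logs/spark-*Master*.out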

● Write data

[root@qjw-01 ~]# echo 1 2 3 4 5 6 7 8 9 1 2 43 5 6 5 >> /root/test/i.log
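Appending with echo >> worked in this run, but the pattern the Spark Streaming docs recommend is to write the file elsewhere and then move it into the monitored directory, since textFileStream only notices files that appear there after the job starts. A sketch, assuming a staging directory /root/stage on the same filesystem as /root/test (so the mv is atomic):

[root@qjw-01 ~]# echo 1 2 3 4 5 > /root/stage/j.log
[root@qjw-01 ~]# mv /root/stage/j.log /root/test/j.log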

● Result

The first batch below is empty because the file was written after that batch had already fired; the counts show up in the following 5-second batch.

-------------------------------------------
Time: 1539098465000 ms
-------------------------------------------

-------------------------------------------
Time: 1539098470000 ms
-------------------------------------------
(4,1)
(8,1)
(6,2)
(2,2)
(7,1)
(5,3)
(9,1)
(3,1)
(1,2)
(43,1)

-------------------------------------------