Spark: Grouped Top-N Algorithm
阿新 • Published: 2019-01-10
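Every record in the dataset has the form "http://bigdata.edu360.cn/laozhou": the first label of the host name is the subject (bigdata) and the last path segment is the teacher (laozhou). The task is to find the most popular teachers, i.e. the teachers whose URLs appear most often.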
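Splitting one record into its subject and teacher is the only parsing the job needs. The following is a minimal sketch in plain Scala, assuming every line has exactly the shape shown above; the object name `ParseTeacherUrl` and the method `parse` are hypothetical and not part of the original program.

```scala
object ParseTeacherUrl {
  // Hypothetical helper: split "http://<subject>.edu360.cn/<teacher>" into (subject, teacher).
  // Assumes a well-formed line; malformed input will throw ArrayIndexOutOfBoundsException.
  def parse(line: String): (String, String) = {
    // "http://bigdata.edu360.cn/laozhou".split("/") == Array("http:", "", "bigdata.edu360.cn", "laozhou")
    val parts = line.split("/")
    val host = parts(2)
    // "[.]" is a regex character class that matches a literal dot
    val subject = host.split("[.]")(0)
    val teacher = parts(3)
    (subject, teacher)
  }

  def main(args: Array[String]): Unit = {
    println(parse("http://bigdata.edu360.cn/laozhou")) // prints (bigdata,laozhou)
  }
}
```

Note that `split` takes a regular expression, which is why the dot is written as `[.]` rather than `.`.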
Top-N program (this version counts hits per teacher across all subjects and sorts the teachers globally):
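import org.apache.spark.{SparkConf, SparkContext}

object FavTeacher {
  def main(args: Array[String]): Unit = {
    // "local" runs Spark in a single local thread, which is convenient for testing
    val conf = new SparkConf().setAppName("FavTeacher").setMaster("local")
    val sc = new SparkContext(conf)
    // Read the input from the path given as the first program argument
    val lines = sc.textFile(args(0))
    // Parse each line into a (teacher, 1) pair
    val teacherAndOne = lines.map(line => {
      // e.g. line = "http://bigdata.edu360.cn/laozhou"
      val conSubject = line.split("/")(2)       // "bigdata.edu360.cn"
      val subject = conSubject.split("[.]")(0)  // "bigdata" (not used in this global version)
      val teacher = line.split("/")(3)          // "laozhou"
      (teacher, 1)
    })
    // Aggregate: sum the 1s for each teacher
    val reduced = teacherAndOne.reduceByKey(_ + _)
    // Sort by hit count in descending order
    val sorted = reduced.sortBy(_._2, false)
    // collect() is an action: it triggers the computation and returns the results to the driver
    val result = sorted.collect()
    // Print the ranked (teacher, count) pairs
    println(result.toBuffer)
    sc.stop()
  }
}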
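The program above ranks teachers over the whole dataset; a grouped Top-N ranks them separately within each subject. The following is a minimal sketch of that per-subject variant, written with plain Scala collections instead of Spark RDDs so it runs without a cluster. The object `GroupTopN`, the method `topNPerSubject`, and the sample lines are hypothetical; each step is commented with the RDD operation it stands in for.

```scala
object GroupTopN {
  // Same parsing as the Spark job: first host label = subject, last path segment = teacher
  def parse(line: String): (String, String) = {
    val parts = line.split("/")
    (parts(2).split("[.]")(0), parts(3))
  }

  // For each subject, return the n teachers with the most hits, highest count first
  def topNPerSubject(lines: Seq[String], n: Int): Map[String, List[(String, Int)]] = {
    // Step 1: count hits per (subject, teacher); stands in for map(... -> 1) + reduceByKey(_ + _)
    val counts: Map[(String, String), Int] =
      lines.map(parse).groupBy(identity).map { case (key, hits) => key -> hits.size }
    // Step 2: group the counts by subject; stands in for groupBy(_._1._1)
    val bySubject = counts.toList.groupBy { case ((subject, _), _) => subject }
    // Step 3: sort each subject's teachers by count (descending) and keep the first n
    bySubject.map { case (subject, entries) =>
      val ranked = entries
        .map { case ((_, teacher), count) => (teacher, count) }
        .sortBy { case (teacher, count) => (-count, teacher) } // ties broken by teacher name
        .take(n)
      subject -> ranked
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines, not taken from the original dataset
    val sample = Seq(
      "http://bigdata.edu360.cn/laozhou",
      "http://bigdata.edu360.cn/laozhou",
      "http://bigdata.edu360.cn/teacherA",
      "http://java.edu360.cn/teacherB",
      "http://java.edu360.cn/teacherB",
      "http://java.edu360.cn/teacherC"
    )
    topNPerSubject(sample, 1).foreach(println)
  }
}
```

On an RDD of `((subject, teacher), count)` pairs, the last two steps could be written roughly as `reduced.groupBy(_._1._1).mapValues(_.toList.sortBy(-_._2).take(n))`. Keep in mind that `groupBy` gathers every teacher of one subject into a single task, which is fine when each subject has only a handful of teachers.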