Word Count (a Scala program)
阿新 • Published: 2018-12-24
1. Create a Scala module and add a pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.ithuang</groupId>
    <artifactId>SparkDemo1</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
</project>
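If the module is not already configured as a Scala module in the IDE, Maven by itself will not compile Scala sources. A common (but not part of the original pom) way to handle this is to add the scala-maven-plugin; the version number below is an assumption and should be adjusted to match your Scala version:

```xml
<build>
    <plugins>
        <!-- Compiles src/main/scala during the Maven build -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```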
2. Write the Scala file
import org.apache.spark.{SparkConf, SparkContext}

object WordCountDemo {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration object
    val conf = new SparkConf()
    conf.setAppName("WordCountSpark")
    // Set the master property (local mode)
    conf.setMaster("local")
    // Create the SparkContext from the configuration
    val sc = new SparkContext(conf)
    // Load the text file
    val rdd1 = sc.textFile("d:/Test.txt")
    // Flatten each line into words
    val rdd2 = rdd1.flatMap(line => line.split(" "))
    // Map each word w => (w, 1)
    val rdd3 = rdd2.map((_, 1))
    // Sum the counts per word
    val rdd4 = rdd3.reduceByKey(_ + _)
    val r = rdd4.collect()
    r.foreach(println)
  }
}
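To see what the flatMap / map / reduceByKey pipeline computes without starting Spark, the same logic can be sketched with plain Scala collections (an illustration only, not the original program; `groupBy` plus a sum plays the role of `reduceByKey`):

```scala
// Word count over in-memory lines, mirroring the RDD pipeline above.
object WordCountLocal {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // flatten lines into words, like rdd1.flatMap
      .map(w => (w, 1))        // pair each word with 1, like rdd2.map
      .groupBy(_._1)           // group the pairs by word
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum counts, like reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    wordCount(Seq("hello world", "hello spark")).foreach(println)
  }
}
```

Running this prints each (word, count) pair; unlike a sorted report, the ordering is whatever the map iteration yields, which matches the unordered output of `collect()` in the Spark version.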
Note: you can create the file "d:/Test.txt" yourself; any content works as long as the words are separated by spaces.