Word Count (Scala Program)

1. Create a Scala module and add pom.xml
  

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.ithuang</groupId>
    <artifactId>SparkDemo1</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
</project>

        
2. Write the Scala file

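import org.apache.spark.{SparkConf, SparkContext}

object WordCountDemo {
    def main(args: Array[String]): Unit = {
        // Create the Spark configuration object
        val conf = new SparkConf()
        conf.setAppName("WordCountSpark")
        // Set the master; "local" runs Spark in-process with a single worker thread
        conf.setMaster("local")

        // Create the SparkContext from the configuration
        val sc = new SparkContext(conf)

        // Load the text file, one RDD element per line
        val rdd1 = sc.textFile("d:/Test.txt")
        // Flatten: split every line on spaces into individual words
        val rdd2 = rdd1.flatMap(line => line.split(" "))
        // Map each word w to the pair (w, 1)
        val rdd3 = rdd2.map((_, 1))
        // Sum the 1s for each distinct word
        val rdd4 = rdd3.reduceByKey(_ + _)
        // Bring the results back to the driver and print them
        val r = rdd4.collect()
        r.foreach(println)

        // Release Spark resources once the job is done
        sc.stop()
    }
}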
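
To see what each stage of the job above produces without starting Spark, the same flatMap, map, and reduce steps can be sketched on an ordinary Scala collection. This is only an illustration: the object name LocalWordCount, the helper countWords, and the sample lines are assumptions made for this example, and groupBy followed by a sum stands in for Spark's reduceByKey.

```scala
object LocalWordCount {
  // Mirrors the three stages of the Spark job on an in-memory Seq instead of an RDD.
  def countWords(lines: Seq[String]): Map[String, Int] = {
    // Flatten: every line becomes zero or more words
    val words = lines.flatMap(line => line.split(" "))
    // Map each word w to the pair (w, 1)
    val pairs = words.map(w => (w, 1))
    // Stand-in for reduceByKey(_ + _): group the pairs by word, then sum the 1s
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not taken from the original post
    val sample = Seq("hello spark", "hello scala")
    countWords(sample).foreach(println)
  }
}
```

The result is a Map, so the order in which the pairs are printed is not guaranteed, just as the order of the collected Spark results is not.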

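Note: you can create the file "d:/Test.txt" yourself. The only requirement is that the words on each line are separated by single spaces, because the program splits each line on " ".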
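
If you would rather generate a test file than type one by hand, a minimal sketch could look like the following. It writes to a temporary file rather than to d:/Test.txt, so the returned path would have to be passed to sc.textFile instead. The object name MakeSampleFile, the helper writeSample, and the file contents are all assumptions made for this example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object MakeSampleFile {
  // Writes two lines of space-separated words to a temp file and returns its path.
  def writeSample(): Path = {
    val path = Files.createTempFile("Test", ".txt")
    // Hypothetical sample content, not taken from the original post
    val content = "hello spark\nhello scala\n"
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
    path
  }

  def main(args: Array[String]): Unit = {
    // Print the location so it can be used as the input path of the word count job
    println(writeSample())
  }
}
```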