
3. Scala-Spark wordCount Example

1. Create a Maven project

2. Dependencies and plugins

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>
<build>
    <finalName>wordCount</finalName>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.2.0</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.1.0</version>
            <configuration>
                <archive>
                    <manifest>
                        <!-- Must be the fully qualified name of the main object -->
                        <mainClass>com.atgu.bigdata.spark.wordCount</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
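Because the assembly plugin is bound to the package phase, a single Maven command produces a runnable fat jar named after finalName. The spark-submit invocation below is a minimal sketch: it assumes Spark is installed locally and that the input file used later in the code exists at /opt/data/1.txt.

# Build the fat jar; output lands at target/wordCount-jar-with-dependencies.jar
mvn clean package

# Run it on a local Spark (paths are assumptions; adjust to your setup)
spark-submit \
  --master local[*] \
  --class com.atgu.bigdata.spark.wordCount \
  target/wordCount-jar-with-dependencies.jar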

3. The wordCount example

package com.atgu.bigdata.spark

import org.apache.spark._
import org.apache.spark.rdd.RDD

object wordCount extends App {
  // Local mode
  // 1. Create the SparkConf object
  val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("wordCount")
  // 2. Create the Spark context from the conf
  val sc: SparkContext = new SparkContext(conf)
  // 3. Read the input file as an RDD of lines
  val lines: RDD[String] = sc.textFile("file:///opt/data/1.txt")
  // 4. Split each line into words
  val words: RDD[String] = lines.flatMap(_.split(" "))
  // words.collect().foreach(println)
  // 5. Map each word to a (word, 1) pair
  val keycounts: RDD[(String, Int)] = words.map((_, 1))
  // 6. Sum the counts for each word
  val results: RDD[(String, Int)] = keycounts.reduceByKey(_ + _)
  // 7. Collect the results to the driver and print them
  val res: Array[(String, Int)] = results.collect()
  res.foreach(println)
  // 8. Stop the context to release resources
  sc.stop()
}
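To make the pipeline concrete, here is a small worked example. The file contents below are an assumption for illustration; note that reduceByKey does not sort, so the order of the printed pairs may vary.

/opt/data/1.txt (assumed contents):
hello spark
hello scala

Console output (order not guaranteed):
(hello,2)
(spark,1)
(scala,1)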

4. Project directory structure
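A typical layout for this project, assuming the standard src/main/scala convention expected by the scala-maven-plugin and the package declared in the code, looks like:

wordCount/
├── pom.xml
└── src
    └── main
        └── scala
            └── com
                └── atgu
                    └── bigdata
                        └── spark
                            └── wordCount.scala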