Spark English Word Analysis Example
1. Given the following file, testdata.txt:
```text
At a high level
every Spark application consists of a driver program that runs the user’s main function and executes various parallel operations on a cluster
The main abstraction Spark provides is a resilient distributed dataset (RDD)
which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel
RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system)
or an existing Scala collection in the driver program
and transforming it
Users may also ask Spark to persist an RDD in memory
allowing it to be reused efficiently across parallel operations. Finally
RDDs automatically recover from node failures
```
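To follow along, the file can first be loaded into an RDD from the spark-shell. This is a minimal sketch; the local file path used below is an assumption, not part of the exercise.

```scala
// spark-shell already provides the SparkContext as `sc`.
// The path below is an assumed location for testdata.txt; adjust it to wherever the file is saved.
val lines = sc.textFile("file:///usr/local/spark/testdata.txt")
lines.cache()  // the same RDD is reused by all three tasks below
```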
Complete the following tasks (one possible spark-shell solution is sketched after the list):
(1) Filter out the lines that contain "Spark" and count how many such lines there are.
(2) Output the number of words in the line that contains the most words.
(3) Count the number of lines in the data that contain "a" and the number of lines that contain "b".
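The sketch below builds on the `lines` RDD defined above and covers all three tasks. It assumes words are separated by whitespace and that matching is case-sensitive, since the exercise does not specify either.

```scala
// (1) Lines that contain "Spark", and how many there are
val sparkLines = lines.filter(line => line.contains("Spark"))
sparkLines.collect().foreach(println)
println(s"lines containing Spark: ${sparkLines.count()}")

// (2) The word count of the line with the most words
// (splitting on whitespace is an assumption; the exercise does not define a word)
val maxWords = lines.map(line => line.split("\\s+").length).reduce((a, b) => math.max(a, b))
println(s"most words in a single line: $maxWords")

// (3) How many lines contain "a" and how many contain "b" (case-sensitive)
val aCount = lines.filter(line => line.contains("a")).count()
val bCount = lines.filter(line => line.contains("b")).count()
println(s"lines containing a: $aCount, lines containing b: $bCount")
```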