The Way of Spark (Advanced): Spark from Beginner to Expert, Section 13: Spark Streaming with Spark SQL and DataFrame

阿新 • Published: 2018-12-25

Main contents

Spark SQL, DataFrame, and Spark Streaming

1. Spark SQL, DataFrame, and Spark Streaming

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Time, Seconds, StreamingContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

object SqlNetworkWordCount {
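  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: SqlNetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    // StreamingExamples is a helper from Spark's bundled examples
    // (org.apache.spark.examples.streaming); remove this call when
    // compiling the program outside that package.
    StreamingExamples.setStreamingLogLevels()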

    // Create the context with a 2-second batch interval
    val sparkConf = new SparkConf().setAppName("SqlNetworkWordCount").setMaster("local[4]")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create a socket stream on the target host:port and split the
    // newline-delimited input text (e.g. generated by 'nc') into words.
    // The storage level has no replication, which is acceptable only when
    // running locally; a distributed deployment needs replication for
    // fault tolerance.
    // Use a socket as the data source
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    // DStream of words
    val words = lines.flatMap(_.split(" "))

    // Convert RDDs of the words DStream to DataFrame and run SQL query
    // Call foreachRDD to iterate over each RDD in the DStream
    words.foreachRDD((rdd: RDD[String], time: Time) => {
      // Get the singleton instance of SQLContext
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._

      // Convert RDD[String] to RDD[case class] to DataFrame
      val wordsDataFrame = rdd.map(w => Record(w)).toDF()

      // Register as a temporary table (Spark 1.x API; deprecated since
      // Spark 2.0 in favor of createOrReplaceTempView)
      wordsDataFrame.registerTempTable("words")

      // Do word count on table using SQL and print it
      val wordCountsDataFrame =
        sqlContext.sql("select word, count(*) as total from words group by word")
      println(s"========= $time =========")
      wordCountsDataFrame.show()
    })

    ssc.start()
    ssc.awaitTermination()
  }
}


/** Case class for converting RDD to DataFrame */
case class Record(word: String)


/** Lazily instantiated singleton instance of SQLContext */
object SQLContextSingleton {

  @transient private var instance: SQLContext = _

  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}
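
The function passed to foreachRDD runs once per batch, so getInstance is called every 2 seconds; the singleton ensures the context is constructed only on the first call and reused afterwards. The following is a minimal plain-Scala sketch of that pattern, not the Spark code itself: `FakeContext`, `Counter`, and `FakeContextSingleton` are hypothetical stand-ins introduced for illustration, and the `synchronized` guard is an addition that the original singleton does not have.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
object Counter { var created = 0 }

// Pretends to be an expensive context; records every construction.
class FakeContext { Counter.created += 1 }

object FakeContextSingleton {
  private var instance: FakeContext = _

  // Builds the context on the first call and returns the same object afterwards.
  // synchronized is an added safeguard, assumed here; the original omits it.
  def getInstance(): FakeContext = synchronized {
    if (instance == null) {
      instance = new FakeContext
    }
    instance
  }
}

object SingletonDemo {
  def main(args: Array[String]): Unit = {
    // Simulate three batches, each asking for the context.
    val perBatch = (1 to 3).map(_ => FakeContextSingleton.getInstance())
    println(s"constructed: ${Counter.created}")
    println(s"same instance every batch: ${perBatch.forall(_ eq perBatch.head)}")
  }
}
```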

After starting the program, run the following command and enter some lines of text:

$ nc -lk 9999
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data

Output (one table per 2-second batch; a batch that receives no input prints an empty table):

========= 1448783840000 ms =========
+---------+-----+
|     word|total|
+---------+-----+
|    Spark|   12|
|   system|   12|
|  general|   12|
|     fast|   12|
|      and|   12|
|computing|   12|
|        a|   12|
|       is|   12|
|      for|   12|
|      Big|   12|
|  cluster|   12|
|     Data|   12|
+---------+-----+

========= 1448783842000 ms =========
+----+-----+
|word|total|
+----+-----+
+----+-----+

========= 1448783844000 ms =========
+----+-----+
|word|total|
+----+-----+
+----+-----+
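
The per-batch result can be reproduced without Spark. The sketch below mirrors the `flatMap(_.split(" "))` step and the `group by word` query using ordinary Scala collections; `BatchWordCount` and `countWords` are hypothetical names, and the twelve-line batch is an assumption chosen to match the totals in the first table, since the nc transcript does not show how many lines reached that batch.

```scala
// Plain-Scala illustration (no Spark required) of what each batch computes.
// BatchWordCount and countWords are hypothetical names, not part of Spark.
object BatchWordCount {
  // Mirrors lines.flatMap(_.split(" ")) followed by
  // "select word, count(*) as total from words group by word".
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val line = "Spark is a fast and general cluster computing system for Big Data"
    // Assumption: twelve copies of the line arrive within one batch.
    val firstBatch = countWords(Seq.fill(12)(line))
    firstBatch.foreach { case (word, total) => println(s"$word\t$total") }
    // A batch with no input yields no rows, like the empty tables in the output.
    println(countWords(Seq.empty).isEmpty)
  }
}
```

Each word appears once per line, so its total equals the number of lines in the batch.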
