Lesson 70: Spark SQL Built-in Functions Demystified, with Hands-on Practice (Study Notes)
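In this lesson:
1 Spark SQL built-in functions explained
2 Spark SQL built-in functions in practice
Spark SQL's DataFrame API introduces a large number of built-in functions. Most of them support code generation (CG), so they are heavily optimized at both compile time and run time.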
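Question: Is using Spark SQL on Hive the same as Hive on Spark?
=> No. When Spark SQL operates on Hive, Hive serves only as the data warehouse that supplies the data, and the compute engine is Spark SQL itself. Hive on Spark is a subproject of Hive whose core idea is to replace Hive's execution engine with Spark.
Using Spark SQL to work on data stored in Hive is called Spark on Hive, whereas Hive on Spark still has Hive at its core and merely swaps the compute engine from MapReduce to Spark.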
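To make the "Spark on Hive" side concrete, here is a minimal sketch, assuming the Spark 1.x HiveContext API and the spark-hive module on the classpath. The object name SparkOnHiveSketch, the helper olderThan, and the "age" column are hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.hive.HiveContext

object SparkOnHiveSketch {
  // Spark SQL plans and executes this query; Hive only supplies the table
  // and its metadata. The table name and the "age" column are assumptions.
  def olderThan(hiveContext: HiveContext, table: String, minAge: Int): DataFrame =
    hiveContext.table(table).filter(col("age") > minAge)
}
```

With an existing SparkContext sc and a Hive table named, say, people, this could be called as SparkOnHiveSketch.olderThan(new HiveContext(sc), "people", 30).show(). Hive on Spark, by contrast, is selected inside Hive itself, typically with the setting hive.execution.engine=spark, and queries are still submitted through Hive.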
The DataFrame API docs on the Spark website read:
class DataFrame extends Queryable with Serializable
Experimental
A distributed collection of data organized into named columns.
A DataFrame is equivalent to a relational table in Spark SQL. The following example creates a DataFrame by pointing Spark SQL to a Parquet data set.
val people = sqlContext.read.parquet("...") // in Scala
DataFrame people = sqlContext.read().parquet("...") // in Java
Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in: DataFrame (this class), Column, and functions.
To select a column from the data frame, use apply method in Scala and col in Java.
val ageCol = people("age") // in Scala
Column ageCol = people.col("age") // in Java
Note that the Column type can also be manipulated through its various functions.
// The following creates a new column that increases everybody's age by 10.
people("age") + 10 // in Scala
people.col("age").plus(10); // in Java
A more concrete example in Scala:
// To create DataFrame using SQLContext
val people = sqlContext.read.parquet("...")
val department = sqlContext.read.parquet("...")
people.filter("age > 30")
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), "gender")
  .agg(avg(people("salary")), max(people("age")))
and in Java:
// To create DataFrame using SQLContext
DataFrame people = sqlContext.read().parquet("...");
DataFrame department = sqlContext.read().parquet("...");
people.filter(people.col("age").gt(30))
  .join(department, people.col("deptId").equalTo(department.col("id")))
  .groupBy(department.col("name"), "gender")
  .agg(avg(people.col("salary")), max(people.col("age")));
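In the examples above, join, groupBy, and agg are operations built into the Spark SQL DataFrame API, while avg and max are built-in functions from org.apache.spark.sql.functions.
Starting with Spark 1.5.x, many built-in functions were introduced; by a rough count there are more than a hundred.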
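To give a feel for how a few of these functions combine, here is a minimal sketch that mixes a string function (regexp_extract) with aggregate functions (countDistinct, sum, max). It assumes the Spark 1.5/1.6-style SQLContext API in local mode; the object name BuiltinFunctionsSketch, the helper statsByProject, and the four sample rows are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{col, countDistinct, max, regexp_extract, sum}

object BuiltinFunctionsSketch {
  // Hypothetical sample rows: (day, user id, url, amount).
  val visits = Seq(
    ("2016-3-27", 1, "http://spark.apache.org/", 1000),
    ("2016-3-27", 2, "http://hadoop.apache.org/", 1001),
    ("2016-3-28", 1, "http://spark.apache.org/", 1800),
    ("2016-3-28", 3, "http://spark.apache.org/", 1010)
  )

  // Returns project -> (distinct users, total amount, largest amount).
  def statsByProject(sqlContext: SQLContext): Map[String, (Long, Long, Int)] = {
    import sqlContext.implicits._ // provides toDF on an RDD of tuples
    val df = sqlContext.sparkContext.parallelize(visits).toDF("time", "id", "url", "amount")
    // String function: take the first host label of the URL as the project name.
    df.withColumn("project", regexp_extract(col("url"), "^http://([^.]+)\\.", 1))
      .groupBy("project")
      // Aggregate functions: each call returns a Column that agg evaluates per group.
      .agg(countDistinct("id").as("users"), sum("amount").as("total"), max("amount").as("largest"))
      .collect()
      .map(r => r.getString(0) -> ((r.getLong(1), r.getLong(2), r.getInt(3))))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BuiltinFunctionsSketch").setMaster("local"))
    try statsByProject(new SQLContext(sc)).foreach(println)
    finally sc.stop()
  }
}
```

Each built-in function call only builds a Column expression; nothing is computed until an action such as collect runs.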
The following hands-on example develops an aggregation:
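package com.dt.spark.sql

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.functions._

/**
 * A Spark SQL aggregation program written in Scala.
 * @author DT大資料夢工廠
 * Sina Weibo: http://weibo.com/ilovepains/
 * Created by hp on 2016/3/28.
 *
 * Analyzes data with the built-in functions of Spark SQL. Unlike the rest of the Spark SQL API,
 * a built-in function applied to a DataFrame returns a Column object, and a DataFrame is by
 * definition "A distributed collection of data organized into named columns." This gives complex
 * data analysis a solid foundation and makes it very convenient: built-in functions can be called
 * at any point while operating on a DataFrame to do whatever processing the business requires.
 * When building complex business logic this removes a lot of unnecessary time (the code is
 * essentially a mapping of the actual model) and lets us focus on the data analysis itself,
 * which is very valuable for engineering productivity.
 * Spark 1.5.x and later provide a large number of built-in functions, for example agg:
 *   def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
 *     groupBy().agg(aggExpr, aggExprs : _*)
 *   }
 * as well as max, mean, min, sum, avg, explode, size, sort_array, day, to_date, abs, acos, asin, atan.
 * Overall, the built-in functions fall into seven basic categories:
 * 1. Aggregate functions, e.g. countDistinct, sumDistinct;
 * 2. Collection functions, e.g. sort_array, explode;
 * 3. Date and time functions, e.g. hour, quarter, next_day;
 * 4. Math functions, e.g. asin, atan, sqrt, tan, round;
 * 5. Window functions, e.g. rowNumber;
 * 6. String functions, e.g. concat, format_number, regexp_extract;
 * 7. Other functions, e.g. isNaN, sha, randn, callUDF.
 */
object SparkSQLAgg {
  def main(args: Array[String]): Unit = {
    /**
     * Step 1: Create the SparkConf configuration object and set the runtime configuration
     * of the Spark program. For example, setMaster sets the URL of the Spark cluster Master
     * to connect to; setting it to local runs the program locally, which suits beginners
     * with very limited hardware (for example, only 1 GB of memory).
     */
    val conf = new SparkConf() // create the SparkConf object
    conf.setAppName("SparkSQLInnerFunctions") // application name, shown in the monitoring UI while the program runs
    // conf.setMaster("spark://Master:7077") // with this setting the program runs on the Spark cluster
    conf.setMaster("local")

    /**
     * Step 2: Create the SparkContext object.
     * SparkContext is the single entry point to all Spark functionality; whether you use
     * Scala, Java, Python, or R, there must be a SparkContext.
     * Its core job is to initialize the components a Spark application needs to run,
     * including DAGScheduler, TaskScheduler, and SchedulerBackend.
     * It is also responsible for registering the application with the Master.
     * SparkContext is the most important object in the whole Spark application.
     */
    val sc = new SparkContext(conf) // the SparkConf instance customizes the run-time parameters and configuration
    val sqlContext = new SQLContext(sc) // build the SQL context
    // To use Spark SQL's built-in functions with the 'column symbol syntax below,
    // the implicit conversions under SQLContext must be imported.
    import sqlContext.implicits._

    /**
     * Step 3: Simulate e-commerce access data (real data is far more complex than this)
     * and turn it into an RDD.
     */
    val userData = Array(
      "2016-3-27,001,http://spark.apache.org/,1000",
      "2016-3-27,001,http://hadoop.apache.org/,1001",
      "2016-3-27,002,http://fink.apache.org/,1002",
      "2016-3-28,003,http://kafka.apache.org/,1020",
      "2016-3-28,004,http://spark.apache.org/,1010",
      "2016-3-28,002,http://hive.apache.org/,1200",
      "2016-3-28,001,http://parquet.apache.org/,1500",
      "2016-3-28,001,http://spark.apache.org/,1800"
    )
    val userDataRDD = sc.parallelize(userData) // create the distributed RDD collection

    /**
     * Step 4: Preprocess the data as the business requires and produce a DataFrame.
     * To convert an RDD into a DataFrame, first turn the RDD's elements into Row objects,
     * and at the same time provide the metadata that describes the DataFrame's columns.
     */
    val userDataRDDRow = userDataRDD.map { row =>
      val splited = row.split(",")
      Row(splited(0), splited(1).toInt, splited(2), splited(3).toInt)
    }
    val structTypes = StructType(Array(
      StructField("time", StringType, true),
      StructField("id", IntegerType, true),
      StructField("url", StringType, true),
      StructField("amount", IntegerType, true)
    ))
    val userDataDF = sqlContext.createDataFrame(userDataRDDRow, structTypes)

    /**
     * Step 5: Operate on the DataFrame with Spark SQL's built-in functions.
     * Note in particular: built-in functions produce Column objects, and CG is applied automatically.
     */
    // Distinct user ids per day. With Spark 1.5/1.6 defaults the grouping column is retained
    // and 'time is selected again, so indexes 1 and 2 hold the day and the distinct count.
    userDataDF.groupBy("time").agg('time, countDistinct('id))
      .map(row => Row(row(1), row(2))).collect.foreach(println)
    // Total amount per day.
    userDataDF.groupBy("time").agg('time, sum('amount)).show()

    sc.stop()
  }
}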
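As a sanity check on what the two aggregations in the program compute, the same per-day figures can be derived with plain Scala collections, no Spark required. This is only a cross-check sketch: the names AggCrossCheck and Visit are illustrative and not part of the original program, and the input is the same eight sample records.

```scala
object AggCrossCheck {
  final case class Visit(time: String, id: Int, url: String, amount: Int)

  // The same sample records as in the Spark program.
  val userData: Seq[String] = Seq(
    "2016-3-27,001,http://spark.apache.org/,1000",
    "2016-3-27,001,http://hadoop.apache.org/,1001",
    "2016-3-27,002,http://fink.apache.org/,1002",
    "2016-3-28,003,http://kafka.apache.org/,1020",
    "2016-3-28,004,http://spark.apache.org/,1010",
    "2016-3-28,002,http://hive.apache.org/,1200",
    "2016-3-28,001,http://parquet.apache.org/,1500",
    "2016-3-28,001,http://spark.apache.org/,1800"
  )

  // Split one CSV record into a typed Visit ("001".toInt is 1).
  def parse(line: String): Visit = {
    val f = line.split(",")
    Visit(f(0), f(1).toInt, f(2), f(3).toInt)
  }

  val byDay: Map[String, Seq[Visit]] = userData.map(parse).groupBy(_.time)

  // Counterpart of groupBy("time").agg(countDistinct('id)).
  val distinctUsersPerDay: Map[String, Int] =
    byDay.map { case (day, vs) => day -> vs.map(_.id).distinct.size }

  // Counterpart of groupBy("time").agg(sum('amount)).
  val amountPerDay: Map[String, Int] =
    byDay.map { case (day, vs) => day -> vs.map(_.amount).sum }

  def main(args: Array[String]): Unit = {
    distinctUsersPerDay.toSeq.sorted.foreach(println) // (2016-3-27,2) then (2016-3-28,4)
    amountPerDay.toSeq.sorted.foreach(println)        // (2016-3-27,3003) then (2016-3-28,6530)
  }
}
```

The Spark job should report the same per-day values, with the counts and sums typed as Long rather than Int.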
Running it in Eclipse produces the following:
16/04/10 23:54:04 INFO TaskSetManager: Finished task 58.0 in stage 6.0 (TID 461) in 18 ms on localhost (59/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:04 INFO Executor: Finished task 59.0 in stage 6.0 (TID 462). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 60.0 in stage 6.0 (TID 463, localhost, partition 61,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO Executor: Running task 60.0 in stage 6.0 (TID 463)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 59.0 in stage 6.0 (TID 462) in 15 ms on localhost (60/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 2 ms
16/04/10 23:54:04 INFO Executor: Finished task 60.0 in stage 6.0 (TID 463). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 61.0 in stage 6.0 (TID 464, localhost, partition 62,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 60.0 in stage 6.0 (TID 463) in 17 ms on localhost (61/199)
16/04/10 23:54:04 INFO Executor: Running task 61.0 in stage 6.0 (TID 464)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:04 INFO Executor: Finished task 61.0 in stage 6.0 (TID 464). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 62.0 in stage 6.0 (TID 465, localhost, partition 63,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO Executor: Running task 62.0 in stage 6.0 (TID 465)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 61.0 in stage 6.0 (TID 464) in 99 ms on localhost (62/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:04 INFO Executor: Finished task 62.0 in stage 6.0 (TID 465). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 63.0 in stage 6.0 (TID 466, localhost, partition 64,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 62.0 in stage 6.0 (TID 465) in 18 ms on localhost (63/199)
16/04/10 23:54:04 INFO Executor: Running task 63.0 in stage 6.0 (TID 466)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:04 INFO Executor: Finished task 63.0 in stage 6.0 (TID 466). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 64.0 in stage 6.0 (TID 467, localhost, partition 65,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 63.0 in stage 6.0 (TID 466) in 16 ms on localhost (64/199)
16/04/10 23:54:04 INFO Executor: Running task 64.0 in stage 6.0 (TID 467)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:04 INFO Executor: Finished task 64.0 in stage 6.0 (TID 467). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 65.0 in stage 6.0 (TID 468, localhost, partition 66,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO Executor: Running task 65.0 in stage 6.0 (TID 468)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 64.0 in stage 6.0 (TID 467) in 18 ms on localhost (65/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:04 INFO Executor: Finished task 65.0 in stage 6.0 (TID 468). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 66.0 in stage 6.0 (TID 469, localhost, partition 67,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 65.0 in stage 6.0 (TID 468) in 47 ms on localhost (66/199)
16/04/10 23:54:04 INFO Executor: Running task 66.0 in stage 6.0 (TID 469)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:04 INFO Executor: Finished task 66.0 in stage 6.0 (TID 469). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 67.0 in stage 6.0 (TID 470, localhost, partition 68,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO Executor: Running task 67.0 in stage 6.0 (TID 470)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 66.0 in stage 6.0 (TID 469) in 17 ms on localhost (67/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:04 INFO Executor: Finished task 67.0 in stage 6.0 (TID 470). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 68.0 in stage 6.0 (TID 471, localhost, partition 69,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO Executor: Running task 68.0 in stage 6.0 (TID 471)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 67.0 in stage 6.0 (TID 470) in 11 ms on localhost (68/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:04 INFO Executor: Finished task 68.0 in stage 6.0 (TID 471). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 69.0 in stage 6.0 (TID 472, localhost, partition 70,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO Executor: Running task 69.0 in stage 6.0 (TID 472)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 68.0 in stage 6.0 (TID 471) in 21 ms on localhost (69/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:04 INFO Executor: Finished task 69.0 in stage 6.0 (TID 472). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 70.0 in stage 6.0 (TID 473, localhost, partition 71,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO Executor: Running task 70.0 in stage 6.0 (TID 473)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 69.0 in stage 6.0 (TID 472) in 15 ms on localhost (70/199)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
16/04/10 23:54:04 INFO Executor: Finished task 70.0 in stage 6.0 (TID 473). 1652 bytes result sent to driver
16/04/10 23:54:04 INFO TaskSetManager: Starting task 71.0 in stage 6.0 (TID 474, localhost, partition 72,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:04 INFO TaskSetManager: Finished task 70.0 in stage 6.0 (TID 473) in 50 ms on localhost (71/199)
16/04/10 23:54:04 INFO Executor: Running task 71.0 in stage 6.0 (TID 474)
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 71.0 in stage 6.0 (TID 474). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 72.0 in stage 6.0 (TID 475, localhost, partition 73,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 71.0 in stage 6.0 (TID 474) in 42 ms on localhost (72/199)
16/04/10 23:54:05 INFO Executor: Running task 72.0 in stage 6.0 (TID 475)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 72.0 in stage 6.0 (TID 475). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 73.0 in stage 6.0 (TID 476, localhost, partition 74,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 72.0 in stage 6.0 (TID 475) in 102 ms on localhost (73/199)
16/04/10 23:54:05 INFO Executor: Running task 73.0 in stage 6.0 (TID 476)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 73.0 in stage 6.0 (TID 476). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 74.0 in stage 6.0 (TID 477, localhost, partition 75,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 73.0 in stage 6.0 (TID 476) in 32 ms on localhost (74/199)
16/04/10 23:54:05 INFO Executor: Running task 74.0 in stage 6.0 (TID 477)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 74.0 in stage 6.0 (TID 477). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 75.0 in stage 6.0 (TID 478, localhost, partition 76,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 74.0 in stage 6.0 (TID 477) in 66 ms on localhost (75/199)
16/04/10 23:54:05 INFO Executor: Running task 75.0 in stage 6.0 (TID 478)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 75.0 in stage 6.0 (TID 478). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 76.0 in stage 6.0 (TID 479, localhost, partition 77,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 75.0 in stage 6.0 (TID 478) in 56 ms on localhost (76/199)
16/04/10 23:54:05 INFO Executor: Running task 76.0 in stage 6.0 (TID 479)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 76.0 in stage 6.0 (TID 479). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 77.0 in stage 6.0 (TID 480, localhost, partition 78,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 76.0 in stage 6.0 (TID 479) in 15 ms on localhost (77/199)
16/04/10 23:54:05 INFO Executor: Running task 77.0 in stage 6.0 (TID 480)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 77.0 in stage 6.0 (TID 480). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 78.0 in stage 6.0 (TID 481, localhost, partition 79,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 78.0 in stage 6.0 (TID 481)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 77.0 in stage 6.0 (TID 480) in 15 ms on localhost (78/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 78.0 in stage 6.0 (TID 481). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 79.0 in stage 6.0 (TID 482, localhost, partition 80,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 78.0 in stage 6.0 (TID 481) in 54 ms on localhost (79/199)
16/04/10 23:54:05 INFO Executor: Running task 79.0 in stage 6.0 (TID 482)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 79.0 in stage 6.0 (TID 482). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 80.0 in stage 6.0 (TID 483, localhost, partition 81,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 80.0 in stage 6.0 (TID 483)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 79.0 in stage 6.0 (TID 482) in 19 ms on localhost (80/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 80.0 in stage 6.0 (TID 483). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 81.0 in stage 6.0 (TID 484, localhost, partition 82,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 81.0 in stage 6.0 (TID 484)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 80.0 in stage 6.0 (TID 483) in 19 ms on localhost (81/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 81.0 in stage 6.0 (TID 484). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 82.0 in stage 6.0 (TID 485, localhost, partition 83,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 82.0 in stage 6.0 (TID 485)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 81.0 in stage 6.0 (TID 484) in 14 ms on localhost (82/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 82.0 in stage 6.0 (TID 485). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 83.0 in stage 6.0 (TID 486, localhost, partition 84,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 82.0 in stage 6.0 (TID 485) in 79 ms on localhost (83/199)
16/04/10 23:54:05 INFO Executor: Running task 83.0 in stage 6.0 (TID 486)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 83.0 in stage 6.0 (TID 486). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 84.0 in stage 6.0 (TID 487, localhost, partition 85,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 84.0 in stage 6.0 (TID 487)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 83.0 in stage 6.0 (TID 486) in 31 ms on localhost (84/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 84.0 in stage 6.0 (TID 487). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 85.0 in stage 6.0 (TID 488, localhost, partition 86,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 84.0 in stage 6.0 (TID 487) in 26 ms on localhost (85/199)
16/04/10 23:54:05 INFO Executor: Running task 85.0 in stage 6.0 (TID 488)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 85.0 in stage 6.0 (TID 488). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 86.0 in stage 6.0 (TID 489, localhost, partition 87,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 86.0 in stage 6.0 (TID 489)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 85.0 in stage 6.0 (TID 488) in 14 ms on localhost (86/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 86.0 in stage 6.0 (TID 489). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 87.0 in stage 6.0 (TID 490, localhost, partition 88,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 86.0 in stage 6.0 (TID 489) in 48 ms on localhost (87/199)
16/04/10 23:54:05 INFO Executor: Running task 87.0 in stage 6.0 (TID 490)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 87.0 in stage 6.0 (TID 490). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 88.0 in stage 6.0 (TID 491, localhost, partition 89,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 88.0 in stage 6.0 (TID 491)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 87.0 in stage 6.0 (TID 490) in 20 ms on localhost (88/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO GenerateMutableProjection: Code generated in 136.381588 ms
16/04/10 23:54:05 INFO Executor: Finished task 88.0 in stage 6.0 (TID 491). 2032 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 89.0 in stage 6.0 (TID 492, localhost, partition 90,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 88.0 in stage 6.0 (TID 491) in 308 ms on localhost (89/199)
16/04/10 23:54:05 INFO Executor: Running task 89.0 in stage 6.0 (TID 492)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 89.0 in stage 6.0 (TID 492). 2032 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 90.0 in stage 6.0 (TID 493, localhost, partition 91,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 90.0 in stage 6.0 (TID 493)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 89.0 in stage 6.0 (TID 492) in 45 ms on localhost (90/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 90.0 in stage 6.0 (TID 493). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 91.0 in stage 6.0 (TID 494, localhost, partition 92,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO Executor: Running task 91.0 in stage 6.0 (TID 494)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 90.0 in stage 6.0 (TID 493) in 24 ms on localhost (91/199)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 91.0 in stage 6.0 (TID 494). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 92.0 in stage 6.0 (TID 495, localhost, partition 93,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 91.0 in stage 6.0 (TID 494) in 14 ms on localhost (92/199)
16/04/10 23:54:05 INFO Executor: Running task 92.0 in stage 6.0 (TID 495)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:05 INFO Executor: Finished task 92.0 in stage 6.0 (TID 495). 1652 bytes result sent to driver
16/04/10 23:54:05 INFO TaskSetManager: Starting task 93.0 in stage 6.0 (TID 496, localhost, partition 94,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:05 INFO TaskSetManager: Finished task 92.0 in stage 6.0 (TID 495) in 19 ms on localhost (93/199)
16/04/10 23:54:05 INFO Executor: Running task 93.0 in stage 6.0 (TID 496)
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:05 INFO Executor: Finished task 93.0 in stage 6.0 (TID 496). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 94.0 in stage 6.0 (TID 497, localhost, partition 95,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 93.0 in stage 6.0 (TID 496) in 28 ms on localhost (94/199)
16/04/10 23:54:06 INFO Executor: Running task 94.0 in stage 6.0 (TID 497)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 94.0 in stage 6.0 (TID 497). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 95.0 in stage 6.0 (TID 498, localhost, partition 96,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 95.0 in stage 6.0 (TID 498)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 94.0 in stage 6.0 (TID 497) in 113 ms on localhost (95/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 95.0 in stage 6.0 (TID 498). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 96.0 in stage 6.0 (TID 499, localhost, partition 97,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 95.0 in stage 6.0 (TID 498) in 42 ms on localhost (96/199)
16/04/10 23:54:06 INFO Executor: Running task 96.0 in stage 6.0 (TID 499)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 96.0 in stage 6.0 (TID 499). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 97.0 in stage 6.0 (TID 500, localhost, partition 98,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 97.0 in stage 6.0 (TID 500)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 96.0 in stage 6.0 (TID 499) in 23 ms on localhost (97/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 97.0 in stage 6.0 (TID 500). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 98.0 in stage 6.0 (TID 501, localhost, partition 99,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 98.0 in stage 6.0 (TID 501)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 97.0 in stage 6.0 (TID 500) in 14 ms on localhost (98/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 98.0 in stage 6.0 (TID 501). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 99.0 in stage 6.0 (TID 502, localhost, partition 100,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 99.0 in stage 6.0 (TID 502)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 98.0 in stage 6.0 (TID 501) in 21 ms on localhost (99/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 99.0 in stage 6.0 (TID 502). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 100.0 in stage 6.0 (TID 503, localhost, partition 101,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 100.0 in stage 6.0 (TID 503)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 99.0 in stage 6.0 (TID 502) in 11 ms on localhost (100/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 100.0 in stage 6.0 (TID 503). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 101.0 in stage 6.0 (TID 504, localhost, partition 102,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 101.0 in stage 6.0 (TID 504)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 100.0 in stage 6.0 (TID 503) in 12 ms on localhost (101/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 101.0 in stage 6.0 (TID 504). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 102.0 in stage 6.0 (TID 505, localhost, partition 103,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 102.0 in stage 6.0 (TID 505)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 101.0 in stage 6.0 (TID 504) in 10 ms on localhost (102/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/04/10 23:54:06 INFO Executor: Finished task 102.0 in stage 6.0 (TID 505). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 103.0 in stage 6.0 (TID 506, localhost, partition 104,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 102.0 in stage 6.0 (TID 505) in 42 ms on localhost (103/199)
16/04/10 23:54:06 INFO Executor: Running task 103.0 in stage 6.0 (TID 506)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 103.0 in stage 6.0 (TID 506). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 104.0 in stage 6.0 (TID 507, localhost, partition 105,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INFO Executor: Running task 104.0 in stage 6.0 (TID 507)
16/04/10 23:54:06 INFO TaskSetManager: Finished task 103.0 in stage 6.0 (TID 506) in 19 ms on localhost (104/199)
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/04/10 23:54:06 INFO Executor: Finished task 104.0 in stage 6.0 (TID 507). 1652 bytes result sent to driver
16/04/10 23:54:06 INFO TaskSetManager: Starting task 105.0 in stage 6.0 (TID 508, localhost, partition 106,NODE_LOCAL, 1999 bytes)
16/04/10 23:54:06 INF