Converting between Spark RDDs and DataFrames
阿新 • Posted: 2019-01-08
RDD → DataFrame

There are two ways:
1. Inferring the Schema Using Reflection

Map each element of the RDD to an object (a case class instance), then call `toDF()`:

```scala
// Assumes: case class Person(name: String, age: Int)
// and import spark.implicits._ (see below)
val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()
```
An RDD can also be converted directly to a Dataset; this requires the implicit conversions, which you bring in with `import spark.implicits._`.

If the objects being converted are tuples, the resulting columns are named positionally: `_1`, `_2`, ...
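A minimal sketch of the tuple case (assuming a `SparkSession` named `spark` already exists; the sample data here is made up for illustration):

```scala
import spark.implicits._  // enables rdd.toDF() / rdd.toDS()

// Tuples get positional column names _1, _2, ...
val tupleDF = spark.sparkContext
  .parallelize(Seq(("Alice", 29), ("Bob", 31)))
  .toDF()
tupleDF.printSchema()  // root: _1 (string), _2 (int)

// The columns can be renamed by passing names to toDF:
val named = tupleDF.toDF("name", "age")
```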
2. Programmatically Specifying the Schema

Build the column metadata (a `StructType`) separately, then combine it with an `RDD[Row]` via `createDataFrame`:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
```
DataFrame → RDD

Going the other way is a single property access: every DataFrame exposes its underlying `RDD[Row]` via `.rdd`:

```scala
val tt = teenagersDF.rdd  // RDD[org.apache.spark.sql.Row]
```
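For context, a `teenagersDF` like the one above could come from a SQL query against the `people` view registered earlier; this sketch assumes that view exists, and shows that the resulting `RDD[Row]` elements are accessed by field index or name:

```scala
// Query the temporary view created above
val teenagersDF = spark.sql("SELECT name, age FROM people WHERE age BETWEEN 13 AND 19")

// .rdd yields an RDD[Row]; read fields with getAs or by position
val names = teenagersDF.rdd
  .map(row => "Name: " + row.getAs[String]("name"))
names.collect().foreach(println)
```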