Converting a Spark RDD to a DataFrame
阿新 • Published 2017-12-07
There are two ways to build a DataFrame from an RDD: let Spark infer the schema from a case class via reflection, or construct the schema explicitly with StructType and Row.

// Approach 1: define a case class and let reflection/implicit conversion infer the schema
scala> import spark.implicits._
scala> val rdd = sc.textFile("input/textdata.txt")
scala> case class Person(id: Int, name: String)
scala> val df = rdd.map(_.split(",")).map(x => Person(x(0).toInt, x(1))).toDF
scala> df.show
+---+--------+
| id|    name|
+---+--------+
|  1|zhangsan|
|  2|    lisi|
|  3|  wangwu|
|  4| zhaoliu|
+---+--------+

// Approach 2: build the DataFrame from an explicit schema and Row objects
scala> import org.apache.spark.sql.types._
scala> import org.apache.spark.sql.Row
scala> val structFields = Array(StructField("id", IntegerType, true), StructField("name", StringType, true))
scala> val structType = StructType(structFields)          // create the schema
scala> val lines = sc.textFile("input/textdata.txt")
scala> val rdd = lines.map(_.split(",")).map(x => Row(x(0).toInt, x(1)))   // create an RDD[Row]
scala> val df = spark.createDataFrame(rdd, structType)    // build the DataFrame from the RDD[Row] and the schema
scala> df.show
+---+--------+
| id|    name|
+---+--------+
|  1|zhangsan|
|  2|    lisi|
|  3|  wangwu|
|  4| zhaoliu|
+---+--------+
cat textdata.txt
1,zhangsan
2,lisi
3,wangwu
4,zhaoliu
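The snippets above assume spark-shell, where spark and sc already exist. For reference, here is a minimal sketch of the same two conversions as a standalone application; the object name RddToDataFrame, the appName, and the local[*] master are illustrative assumptions, not part of the original post.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// The case class must be defined at the top level (outside main) so that
// the implicit toDF conversion can derive the schema via reflection.
case class Person(id: Int, name: String)

object RddToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddToDataFrame")
      .master("local[*]")            // assumption: local mode for testing
      .getOrCreate()
    import spark.implicits._

    val lines = spark.sparkContext.textFile("input/textdata.txt")

    // Approach 1: case class + reflection
    val df1 = lines.map(_.split(",")).map(x => Person(x(0).toInt, x(1))).toDF()
    df1.show()

    // Approach 2: explicit schema + RDD[Row]
    val schema = StructType(Array(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)))
    val rowRdd = lines.map(_.split(",")).map(x => Row(x(0).toInt, x(1)))
    val df2 = spark.createDataFrame(rowRdd, schema)
    df2.show()

    spark.stop()
  }
}

As a rule of thumb, the reflection-based approach is the most concise when the record structure is known at compile time, while the StructType/Row approach is useful when the schema is only known at runtime.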