Creating and operating on Spark DataFrames
阿新 • Published: 2019-02-16
Working with Spark
1. Get the Spark environment: a JavaSparkContext via getSparkContext()
Create a DataFrame
// Obtain the Spark context
JavaSparkContext sc = this.getSparkContext();
if (sc == null) {
    throw new TransStepException("Spark environment is not running");
}
SQLContext sqlContext = new SQLContext(sc);
StructType structType = DataTypes.createStructType(structFieldList);
// rowRDD is the JavaRDD<Row> holding the row data
DataFrame newDF = sqlContext.createDataFrame(rowRDD, structType);
Create an RDD from a DataFrame
DataRows dataRows = getRows();
DataFrame df = ((SparkDataRows) dataRows).getDataFrame();
JavaRDD<Row> javaRDD = df.javaRDD();
Get the schema (the "header") from a DataFrame
StructType st = df.schema();
Get the fields from a StructType
StructField[] structFields = st.fields();
Build a schema (header)
List<StructField> structFieldList = new ArrayList<StructField>();
// createStructField(name, dataType, nullable) — here a nullable string column
structFieldList.add(DataTypes.createStructField(name, DataTypes.StringType, true));
StructType structType = DataTypes.createStructType(structFieldList);
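Putting the steps above together, here is a minimal end-to-end sketch. It assumes a Spark 1.x dependency on the classpath (the post uses the 1.x SQLContext/DataFrame API), a local master, and illustrative column names ("name", "city") and sample rows that are not from the original post; the original obtains its context via this.getSparkContext() instead of building one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class DataFrameDemo {
    public static void main(String[] args) {
        // Local Spark context for illustration only
        SparkConf conf = new SparkConf().setAppName("DataFrameDemo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Build the schema ("header"): two nullable string columns
        List<StructField> structFieldList = new ArrayList<StructField>();
        structFieldList.add(DataTypes.createStructField("name", DataTypes.StringType, true));
        structFieldList.add(DataTypes.createStructField("city", DataTypes.StringType, true));
        StructType structType = DataTypes.createStructType(structFieldList);

        // The row data as a JavaRDD<Row>
        JavaRDD<Row> rowRDD = sc.parallelize(Arrays.asList(
                RowFactory.create("alice", "beijing"),
                RowFactory.create("bob", "shanghai")));

        // RDD + schema -> DataFrame
        DataFrame df = sqlContext.createDataFrame(rowRDD, structType);
        df.show();

        // DataFrame -> RDD, and back to the schema and its fields
        JavaRDD<Row> backToRDD = df.javaRDD();
        StructType schema = df.schema();
        StructField[] fields = schema.fields();
        System.out.println(fields.length + " columns, " + backToRDD.count() + " rows");

        sc.stop();
    }
}
```

Note that on Spark 2.x and later, DataFrame in Java is replaced by Dataset<Row> and SQLContext by SparkSession; the code above follows the 1.x API used throughout this post.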