
Spark DataFrame creation and operations

Operating on Spark

1. Get the Spark environment: a JavaSparkContext, obtained via getSparkContext()

Create a DataFrame

        // get the Spark context
        JavaSparkContext sc = this.getSparkContext();

        if (sc == null) {
            throw new TransStepException("Spark environment is not running");
        }
        SQLContext sqlContext = new SQLContext(sc);
        StructType structType = DataTypes.createStructType(structFieldList);
        // rowRDD is the JavaRDD<Row> that holds the data
        DataFrame newDF = sqlContext.createDataFrame(rowRDD, structType);
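The snippet above can be sketched as a complete program. This is a minimal sketch assuming the Spark 1.x Java API (SQLContext / DataFrame, as used throughout this post) and a local-mode runtime; the class name, column names, and sample rows are illustrative, not from the original.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class CreateDataFrameSketch {
    public static void main(String[] args) {
        // Local-mode context standing in for this.getSparkContext()
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("sketch").setMaster("local[1]"));
        SQLContext sqlContext = new SQLContext(sc);

        // Build the schema (header): two columns, both nullable
        List<StructField> structFieldList = Arrays.asList(
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("age", DataTypes.IntegerType, true));
        StructType structType = DataTypes.createStructType(structFieldList);

        // The JavaRDD<Row> that backs the DataFrame
        JavaRDD<Row> rowRDD = sc.parallelize(Arrays.asList(
                RowFactory.create("alice", 30),
                RowFactory.create("bob", 25)));

        DataFrame newDF = sqlContext.createDataFrame(rowRDD, structType);
        newDF.show();
        sc.stop();
    }
}
```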

Create an RDD from a DataFrame

DataRows dataRows = getRows();
DataFrame df = ((SparkDataRows) dataRows).getDataFrame();
JavaRDD<Row> javaRDD = df.javaRDD();

Get the schema (header) from a DataFrame

StructType st = df.schema();

Get the fields from a StructType

StructField[] structFields = st.fields();

Create the schema (header)

List<StructField> structFields = new ArrayList<StructField>();
// name: column name; type: the data type (defaults to string); nullable: whether null is allowed
structFields.add(DataTypes.createStructField(name, type, nullable));
StructType structType = DataTypes.createStructType(structFields);