Writing Java RDD Results into a Hive Table

阿新 • Published: 2019-01-13

Case 1: only one column needs to be inserted

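```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

JavaRDD<String> titleParticiple = ....;

/*
 * Save the word-segmentation results to a Hive table for data exploration.
 * A single HiveContext is used both to build the DataFrame and to run the SQL,
 * so the temporary table registered in step 4 is visible to the Hive queries.
 */
HiveContext hiveCtx = new HiveContext(jsc);  // jsc: the existing JavaSparkContext

// Wrap every element of the RDD in a Row so that a schema can be attached to it.
JavaRDD<Row> brandRDD = titleParticiple.map(line -> RowFactory.create(line));

// 1. Build the DataFrame metadata dynamically. In general, the number of columns
//    and the concrete type of each column may come from a JSON file or a database.
List<StructField> structFields = new ArrayList<>();
// An id column would be declared like this:
//structFields.add(DataTypes.createStructField("id", DataTypes.IntegerType, true));
structFields.add(DataTypes.createStructField("brand", DataTypes.StringType, true));

// 2. Build the StructType that describes the DataFrame's metadata.
StructType structType = DataTypes.createStructType(structFields);

// 3. Construct the DataFrame from the metadata and the RDD<Row>.
Dataset<Row> personsDF = hiveCtx.createDataFrame(brandRDD, structType);

// 4. Register it as a temporary table for the SQL statements that follow
//    (createOrReplaceTempView replaces registerTempTable, deprecated since Spark 2.0).
personsDF.createOrReplaceTempView("brands");
hiveCtx.sql("use sousuo");  // switch to the sousuo database
hiveCtx.sql("drop table if exists sousuo.temp_yeqingyun_20170913");  // drop the old table
hiveCtx.sql("CREATE TABLE IF NOT EXISTS sousuo.temp_yeqingyun_20170913 (brand STRING)");  // create the table
// Copy every row of the brands temporary table into temp_yeqingyun_20170913.
hiveCtx.sql("insert into sousuo.temp_yeqingyun_20170913 select brand from brands");
```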
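The steps above describe the same columns three times: in the StructType, in the CREATE TABLE statement, and in the select list of the INSERT. A minimal sketch of deriving both SQL strings from one ordered column list, so they cannot drift apart when a column is added. `HiveSqlSketch`, `createTable` and `insertSelect` are hypothetical names, not a Spark or Hive API; column types are written as Hive type names (STRING, INT), and column names are assumed to be plain identifiers that need no quoting.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Spark or Hive): builds the CREATE TABLE and
// INSERT ... SELECT statements from one ordered column list, so the Hive table
// and the temporary table are always described by the same columns.
public class HiveSqlSketch {

    // columns: column name -> Hive type name, in table order
    // (a LinkedHashMap keeps insertion order).
    static String createTable(String table, Map<String, String> columns) {
        StringJoiner defs = new StringJoiner(", ", "CREATE TABLE IF NOT EXISTS " + table + " (", ")");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            defs.add(c.getKey() + " " + c.getValue());
        }
        return defs.toString();
    }

    // Selects the columns from the temporary table in the same order as the DDL.
    static String insertSelect(String table, String tempTable, Map<String, String> columns) {
        return "insert into " + table + " select " + String.join(",", columns.keySet())
                + " from " + tempTable;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("brand", "STRING");
        String table = "sousuo.temp_yeqingyun_20170913";
        System.out.println(createTable(table, columns));
        System.out.println(insertSelect(table, "brands", columns));
    }
}
```

Run as-is, it prints `CREATE TABLE IF NOT EXISTS sousuo.temp_yeqingyun_20170913 (brand STRING)` and `insert into sousuo.temp_yeqingyun_20170913 select brand from brands`, the same two statements passed to `hiveCtx.sql` in case 1. The same map could also drive the `DataTypes.createStructField` calls, but that part needs Spark on the classpath and is left out of the sketch.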

Case 2: multiple columns need to be inserted, with both int and String types:

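```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Tuple2;

// Key goes into the directory3 column; value has the form "<brandItemModel>:<num>",
// as implied by the split(":") below.
JavaPairRDD<String, String> brandTypeGoodsPair = ...;

// One HiveContext for both the DataFrame and the SQL statements.
HiveContext hiveCtx = new HiveContext(jsc);

// A counter field inside the map function (an "int i" incremented per call) restarts
// at 0 in every task, so ids would repeat across partitions. zipWithIndex() assigns a
// globally unique 0-based index instead (it runs an extra Spark job when the RDD has
// more than one partition).
JavaRDD<Row> brandRDD = brandTypeGoodsPair.zipWithIndex().map(entry -> {
    Tuple2<String, String> pair = entry._1();
    int id = Math.toIntExact(entry._2() + 1);      // 1-based id; throws past Integer.MAX_VALUE
    String[] valueArray = pair._2().split(":");
    String value0 = valueArray[0];                 // brandItemModel
    int value1 = Integer.parseInt(valueArray[1]);  // num
    return RowFactory.create(id, pair._1(), value0, value1);
});

List<StructField> structFields = new ArrayList<>();
structFields.add(DataTypes.createStructField("id", DataTypes.IntegerType, true));
structFields.add(DataTypes.createStructField("directory3", DataTypes.StringType, true));
structFields.add(DataTypes.createStructField("brandItemModel", DataTypes.StringType, true));
structFields.add(DataTypes.createStructField("num", DataTypes.IntegerType, true));
StructType structType = DataTypes.createStructType(structFields);
Dataset<Row> brandDF = hiveCtx.createDataFrame(brandRDD, structType);
brandDF.createOrReplaceTempView("brands_test2");
hiveCtx.sql("use sousuo");
hiveCtx.sql("drop table if exists sousuo.temp_yeqingyun_test2_20170913");
hiveCtx.sql("CREATE TABLE IF NOT EXISTS sousuo.temp_yeqingyun_test2_20170913 (id INT, directory3 STRING, brandItemModel STRING, num INT)");
hiveCtx.sql("insert into sousuo.temp_yeqingyun_test2_20170913 select id,directory3,brandItemModel,num from brands_test2");
```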
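Numbering rows with a counter field on the map function does not give unique ids: Spark serializes the function into every task, and each task increments its own copy, so the numbering restarts in every partition (and again if a partition is recomputed). `zipWithIndex` avoids this by giving partition k a starting offset equal to the number of elements in partitions 0..k-1. The sketch below imitates both schemes on plain Java lists standing in for RDD partitions; it does not call Spark, and `RowIdSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java imitation (no Spark) of two ways to number rows spread over partitions.
public class RowIdSketch {

    // Counter inside the map function: every task works on its own deserialized
    // copy of the function, so the counter starts from 0 in each partition.
    static List<Integer> perPartitionCounter(List<List<String>> partitions) {
        List<Integer> ids = new ArrayList<>();
        for (List<String> partition : partitions) {
            int i = 0;  // fresh copy of the field for this task
            for (String ignored : partition) {
                i++;
                ids.add(i);
            }
        }
        return ids;
    }

    // zipWithIndex-style numbering: partition k starts at the total size of
    // partitions 0..k-1, so indices are unique across the whole dataset.
    static List<Long> globalIndex(List<List<String>> partitions) {
        List<Long> ids = new ArrayList<>();
        long offset = 0;
        for (List<String> partition : partitions) {
            for (int pos = 0; pos < partition.size(); pos++) {
                ids.add(offset + pos);
            }
            offset += partition.size();
        }
        return ids;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a:1", "b:2"),
                Arrays.asList("c:3", "d:4", "e:5"));
        System.out.println(perPartitionCounter(partitions)); // [1, 2, 1, 2, 3]
        System.out.println(globalIndex(partitions));         // [0, 1, 2, 3, 4]
    }
}
```

With two partitions of sizes 2 and 3, the counter produces the duplicate ids 1 and 2, while the offset scheme produces 0 through 4; adding 1 gives 1-based ids, which is what the case 2 code stores in the id column.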
