Creating a Hive Partitioned Table with Spark SQL
阿新 • Published: 2019-01-04
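Today I needed to import data from a text file into a Hive database. Because the table is expected to hold a large amount of data, I went with a partitioned-table design, partitioning the table by area and date. I ran into a bug along the way and am recording it here for future reference.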
The first version of the script, with the partition columns in uppercase:

# spark is an existing SparkSession with Hive support;
# tablename and addoroverwrite are defined earlier in the script.
spark.sql("use oracledb")
spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
    GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
    PARTITIONED BY(AREA STRING,OBUDATE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
# Enable dynamic partitioning
spark.sql("set hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("set hive.exec.dynamic.partition = true")
# print("Database created")
if addoroverwrite:
    # Append
    spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(AREA,OBUDATE) SELECT OBUID,BUS_ID,REVTIME,OBUTIME,LONGITUDE,LATITUDE,GPSKEY,DIRECTION,SPEED,\
        RUNNING_NO,DATA_SERIAL,GPS_MILEAGE,SATELLITE_COUNT,ROUTE_CODE,SERVICE,'gz' AS AREA,SUBSTR(OBUTIME,1,10) AS OBUDATE FROM " + tablename + "_tmp")

Running the script produced the following error:
Partition spec {area=, obudate=, AREA=gz, OBUDATE=2017-01-} contains non-partition columns;
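The spec in the message lists each partition column twice: once in lowercase with an empty value and once in uppercase with a value. One way to picture how that can happen is a case-sensitive merge of two sets of names. The sketch below uses plain Python dicts only; it is an illustration of the idea, not Spark's or Hive's actual code, and the function names and the date value are made up for the example.

```python
def merge_case_sensitive(table_spec, insert_spec):
    """Merge two partition specs, treating 'area' and 'AREA' as different keys."""
    merged = dict(table_spec)
    merged.update(insert_spec)
    return merged


def merge_case_insensitive(table_spec, insert_spec):
    """Merge two partition specs after normalizing every key to lowercase."""
    merged = {key.lower(): value for key, value in table_spec.items()}
    merged.update({key.lower(): value for key, value in insert_spec.items()})
    return merged


# Lowercase names with no values yet, as they appear in the error message.
table_spec = {"area": "", "obudate": ""}
# Uppercase names from the INSERT statement; the date is a made-up placeholder.
insert_spec = {"AREA": "gz", "OBUDATE": "2000-01-01"}

print(merge_case_sensitive(table_spec, insert_spec))
print(merge_case_insensitive(table_spec, insert_spec))
```

The case-sensitive merge ends up with four keys, which matches the shape of the spec in the error, while the lowercase-normalized merge ends up with the expected two.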
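After searching on Baidu, I found mentions of a case-sensitivity bug with partitioned tables, so I changed the partition columns in the script to lowercase, and it then ran successfully. The modified script: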
spark.sql("use oracledb")
spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
    GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
    PARTITIONED BY(area STRING,obudate STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
# Set parameters
# hive > set hive.exec.dynamic.partition.mode = nonstrict;
# hive > set hive.exec.dynamic.partition = true;
spark.sql("set hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("set hive.exec.dynamic.partition = true")
# print("Database created")
if addoroverwrite:
    # Append
    spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(area,obudate) SELECT OBUID,BUS_ID,REVTIME,OBUTIME,LONGITUDE,LATITUDE,GPSKEY,DIRECTION,SPEED,\
        RUNNING_NO,DATA_SERIAL,GPS_MILEAGE,SATELLITE_COUNT,ROUTE_CODE,SERVICE,'gz' AS area,SUBSTR(OBUTIME,1,10) AS obudate FROM " + tablename + "_tmp")
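To avoid retyping the partition names in three places (the PARTITIONED BY clause, the PARTITION clause, and the SELECT aliases), the statements could be generated by small helpers that always lowercase them. This is a minimal sketch, not part of the original script: the helper names and the table name gps_data are hypothetical, and every column is assumed to be STRING as in the script above. The helpers only build SQL text; the result would still be passed to spark.sql.

```python
def build_partitioned_table_sql(table, columns, partition_cols):
    """Return a CREATE TABLE statement whose partition columns are lowercase."""
    cols = ", ".join(f"{name} STRING" for name in columns)
    parts = ", ".join(f"{name.lower()} STRING" for name in partition_cols)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
    )


def build_dynamic_insert_sql(table, source, columns, partition_exprs):
    """Return an INSERT ... PARTITION(...) SELECT statement.

    partition_exprs maps each partition column name to the SQL expression
    that produces its value; names are lowercased before use.
    """
    names = [name.lower() for name in partition_exprs]
    select_list = list(columns) + [
        f"{expr} AS {name.lower()}" for name, expr in partition_exprs.items()
    ]
    return (
        f"INSERT INTO TABLE {table} PARTITION({', '.join(names)}) "
        f"SELECT {', '.join(select_list)} FROM {source}"
    )


# Data columns taken from the script above.
COLUMNS = [
    "OBUID", "BUS_ID", "REVTIME", "OBUTIME", "LONGITUDE", "LATITUDE",
    "GPSKEY", "DIRECTION", "SPEED", "RUNNING_NO", "DATA_SERIAL",
    "GPS_MILEAGE", "SATELLITE_COUNT", "ROUTE_CODE", "SERVICE",
]

# "gps_data" is a hypothetical table name used only for this example.
ddl = build_partitioned_table_sql("gps_data", COLUMNS, ["AREA", "OBUDATE"])
insert = build_dynamic_insert_sql(
    "gps_data",
    "gps_data_tmp",
    COLUMNS,
    {"AREA": "'gz'", "OBUDATE": "SUBSTR(OBUTIME,1,10)"},
)
print(ddl)
print(insert)
```

Even when the caller passes AREA and OBUDATE in uppercase, the generated statements use area and obudate throughout, which is the form that ran successfully above.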