ClickHouse Input and Output Formats: ORC
阿新 · Published: 2021-03-20
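Reading and writing ORC data
ClickHouse can only read ORC data, that is, insert it into tables. Exporting query results as ORC is not supported.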
Mapping between ORC and ClickHouse data types
ORC data type (INSERT) | ClickHouse data type |
---|---|
UINT8, BOOL | UInt8 |
INT8 | Int8 |
UINT16 | UInt16 |
INT16 | Int16 |
UINT32 | UInt32 |
INT32 | Int32 |
UINT64 | UInt64 |
INT64 | Int64 |
FLOAT, HALF_FLOAT | Float32 |
DOUBLE | Float64 |
DATE32 | Date |
DATE64, TIMESTAMP | DateTime |
STRING, BINARY | String |
DECIMAL | Decimal |
Notes:
- Unsupported ORC data types: DATE32, TIME32, FIXED_SIZE_BINARY, JSON, UUID, ENUM.
- Column names in the ClickHouse table must match the column names in the ORC file.
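Both notes can be checked before running an INSERT. The following is a minimal Scala sketch, not a ClickHouse or Spark API: `OrcSchemaCheck`, `clickHouseTypeFor` and `missingColumns` are hypothetical names. It encodes the mapping table above as a lookup and reports ClickHouse columns that have no identically named ORC column, assuming exact, case-sensitive name matching.

```scala
import java.util.Locale

// Hypothetical pre-import check; not part of ClickHouse or Spark.
object OrcSchemaCheck {
  // ORC type -> ClickHouse type, copied from the mapping table above (INSERT direction)
  private val orcToClickHouse: Map[String, String] = Map(
    "UINT8" -> "UInt8", "BOOL" -> "UInt8",
    "INT8" -> "Int8",
    "UINT16" -> "UInt16", "INT16" -> "Int16",
    "UINT32" -> "UInt32", "INT32" -> "Int32",
    "UINT64" -> "UInt64", "INT64" -> "Int64",
    "FLOAT" -> "Float32", "HALF_FLOAT" -> "Float32",
    "DOUBLE" -> "Float64",
    "DATE32" -> "Date",
    "DATE64" -> "DateTime", "TIMESTAMP" -> "DateTime",
    "STRING" -> "String", "BINARY" -> "String",
    "DECIMAL" -> "Decimal"
  )

  // ClickHouse type an ORC type is read as, or None if the table lists no mapping
  def clickHouseTypeFor(orcType: String): Option[String] =
    orcToClickHouse.get(orcType.toUpperCase(Locale.ROOT))

  // ClickHouse columns with no ORC column of exactly the same (case-sensitive) name
  def missingColumns(clickHouseColumns: Seq[String], orcColumns: Seq[String]): Seq[String] = {
    val available = orcColumns.toSet
    clickHouseColumns.filterNot(available.contains)
  }
}
```

For the demo in this article, `OrcSchemaCheck.missingColumns(Seq("srcip", "destip", "time"), Seq("srcip", "destip", "time"))` is empty, because the Spark DataFrame uses the same column names as the ClickHouse table.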
Generating an ORC file with Spark
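// Run in spark-shell, where `sc` (SparkContext) and `spark` (SparkSession) are predefined
import spark.implicits._

// Sample rows: (source IP, destination IP, timestamp string)
val list = List(
  ("113.248.234.232", "123.212.22.01", "2018-07-12 14:35:31"),
  ("115.248.158.231", "154.245.56.23", "2020-07-12 13:26:26"),
  ("115.248.158.231", "154.245.56.23", "2020-07-12 13:22:13"),
  ("187.248.135.230", "221.228.112.45", "2019-08-09 13:17:39"),
  ("187.248.234.232", "221.228.112.24", "2019-08-09 20:51:16"),
  ("115.248.158.231", "154.245.56.23", "2020-07-12 17:22:56")
)
// The column names must match the columns of the ClickHouse table
val df = sc.makeRDD(list).toDF("srcip", "destip", "time")
// repartition(1) makes Spark write a single part-*.orc file into the /tmp/orc directory;
// mode("append") adds new part files on every run instead of overwriting
df.repartition(1).write.format("orc").mode("append").save("/tmp/orc")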
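Spark's `save("/tmp/orc")` produces a directory, not a single file, so the file to import has to be picked out of it. A minimal Scala sketch, assuming the directory is on the local filesystem (the object and method names are hypothetical): it keeps only the `part-*.orc` data files and skips the `_SUCCESS` marker and the hidden `.crc` checksum files Spark writes next to them.

```scala
import java.io.File

// Hypothetical helper for locating Spark's ORC output files.
object OrcPartFiles {
  // Data files in a Spark output directory such as /tmp/orc (local filesystem only).
  // Returns an empty Seq if the directory does not exist.
  def partFiles(outputDir: File): Seq[File] =
    Option(outputDir.listFiles())
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      // keep "part-....orc"; drops "_SUCCESS" and hidden ".part-....orc.crc" files
      .filter(f => f.isFile && f.getName.startsWith("part-") && f.getName.endsWith(".orc"))
      .sortBy(_.getName)
}
```

Because the Spark job above writes with `mode("append")`, running it more than once leaves several part files, so `OrcPartFiles.partFiles(new File("/tmp/orc"))` may return more than one file; each of them can be imported separately.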
Creating the test table
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE test.orc_demo (srcip String, destip String, time DateTime) ENGINE = TinyLog;
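Importing the data
Here file.orc stands for the part-*.orc file that Spark wrote into /tmp/orc. If Spark's default filesystem is HDFS, download that file to the local machine first.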
cat file.orc | clickhouse-client --query="INSERT INTO test.orc_demo FORMAT ORC"
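The shell pipeline passes the query text as a command-line argument and streams the ORC bytes through standard input. The same import can be driven from Scala with `scala.sys.process`; this is a sketch with hypothetical names (`OrcImport`, `clientArgs`, `importFile`) and it assumes `clickhouse-client` is on the `PATH`.

```scala
import java.io.File
import scala.sys.process._

// Hypothetical wrapper around clickhouse-client; mirrors the shell pipeline above.
object OrcImport {
  // argv for clickhouse-client; no shell is involved, so the query needs no extra quoting
  def clientArgs(table: String): Seq[String] =
    Seq("clickhouse-client", s"--query=INSERT INTO $table FORMAT ORC")

  // Equivalent of `cat file.orc | clickhouse-client --query=...`:
  // the file is redirected to the client's stdin; returns the client's exit code
  def importFile(file: File, table: String): Int =
    (clientArgs(table) #< file).!
}
```

For example, `OrcImport.importFile(new File("file.orc"), "test.orc_demo")` returns 0 when the insert succeeds. Building the argument list separately from running it keeps the command easy to inspect without a ClickHouse server.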
Querying the result
SELECT * FROM test.orc_demo;