Common Ways to Create Tables in Greenplum, with Explanations

阿新 • Published: 2018-12-12

1 Creating a heap table

drop table if exists test_head;
create table test_head(id int primary key) distributed by (id);

distributed by specifies the distribution key, which determines the segment on which each row is stored.
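To make the role of the distribution key concrete, here is a minimal Python sketch of hash distribution. It is an illustration only: Greenplum uses its own internal hash function, and the `segment_for` helper, the modulo-based hash, and the segment count of 4 are all assumptions made for this example.

```python
from collections import Counter

NUM_SEGMENTS = 4  # assumed cluster size, for illustration only


def segment_for(key: int, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a segment index.

    Hypothetical stand-in for Greenplum's internal hash: a plain modulo.
    """
    return key % num_segments


# Distribute ids 1..1000 and count how many rows land on each segment.
rows_per_segment = Counter(segment_for(i) for i in range(1, 1001))
for seg in sorted(rows_per_segment):
    print(f"segment {seg}: {rows_per_segment[seg]} rows")
```

A key with many distinct values, such as the primary key used here, spreads rows evenly; a key with only a few distinct values would concentrate the rows on a few segments.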

2 Creating AO tables

2.1 AO table without compression

drop table if exists test_ao;
create table test_ao(id int) with (appendonly=true) distributed by (id);

appendonly=true marks the table as an AO (append-optimized) table. The parameter accepts true or false, for example appendonly=true or appendonly=false.

2.2 AO table with compression

drop table if exists test_ao;
create table test_ao(id int) with (appendonly=true, compresslevel=5) distributed by (id);

compresslevel is the compression level, from 1 to 9. A higher value gives a higher compression ratio; 5 is usually enough.

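2.3 Column-oriented AO table with compression (a different compression layout from the table above)

drop table if exists test_ao;
create table test_ao(id int) with (appendonly=true, compresslevel=5, orientation=column) distributed by (id);

orientation=column stores the table by column, so compression is applied column by column. The parameter accepts column or row, and row is the default.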
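The three AO variants above differ only in their WITH options. As a rough sketch, the hypothetical Python helper below (`ao_with_clause` is not part of Greenplum or any library) assembles that clause and rejects a compresslevel outside 1 to 9. It accepts `row` as an orientation because row is Greenplum's default storage orientation.

```python
from typing import Optional


def ao_with_clause(compresslevel: Optional[int] = None,
                   orientation: Optional[str] = None) -> str:
    """Build the WITH (...) clause for an append-optimized table.

    Hypothetical helper for illustration; it only formats a string.
    """
    options = ["appendonly=true"]
    if compresslevel is not None:
        # The article gives 1..9 as the valid range.
        if not 1 <= compresslevel <= 9:
            raise ValueError("compresslevel must be between 1 and 9")
        options.append(f"compresslevel={compresslevel}")
    if orientation is not None:
        if orientation not in ("row", "column"):
            raise ValueError("orientation must be 'row' or 'column'")
        options.append(f"orientation={orientation}")
    return "with (" + ", ".join(options) + ")"


# Print the three statements from sections 2.1, 2.2 and 2.3.
for clause in (ao_with_clause(),
               ao_with_clause(compresslevel=5),
               ao_with_clause(compresslevel=5, orientation="column")):
    print(f"create table test_ao(id int) {clause} distributed by (id);")
```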

2.3.1 Testing the orientation parameter

2.3.1.1 Table creation statements

Create the table without column orientation:

CREATE TABLE **********_20180810( ******** )
WITH (appendonly=true, compresstype=zlib, compresslevel=5)
DISTRIBUTED BY (pripid);

Create the table with column orientation:

CREATE TABLE ********_20180812( ********* )
WITH (appendonly=true, compresstype=zlib, compresslevel=5, orientation=column)
DISTRIBUTED BY (pripid);

Each table has 15 columns.

2.3.1.2 Checking the size of the source data

$ du -sh *******_20180922.csv
48G ********_20180922.csv

2.3.1.3 Importing the data with the COPY command

$ time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "\COPY *******_20180810 FROM '/data/oracle-export-data/DATA20180922/*******_20180922.csv' WITH csv DELIMITER E'\001' LOG ERRORS SEGMENT REJECT LIMIT 3000 ROWS"
Password for user gpadmin:

real 11m49.978s
user 1m17.379s
sys 0m43.668s

$ time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "\COPY *******_20180812 FROM '/data/oracle-export-data/DATA20180922/*******_20180922.csv' WITH csv DELIMITER E'\001' LOG ERRORS SEGMENT REJECT LIMIT 3000 ROWS"
Password for user gpadmin:

real 12m11.227s
user 1m27.575s
sys 0m50.548s

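The results above show that loading the table without column orientation took 11m49.978s, while loading the column-oriented table took 12m11.227s, a difference of about 21 seconds.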
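The gap between the two `real` times can be checked with a few lines of Python. The `to_seconds` helper is written for this example only; the two durations are the ones reported by `time` above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style duration such as '11m49.978s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


row_load = to_seconds("11m49.978s")     # table without column orientation
column_load = to_seconds("12m11.227s")  # column-oriented table
extra = column_load - row_load

print(f"extra load time: {extra:.3f} s ({extra / row_load:.1%} slower)")
```

The column-oriented load is slower by a little over 21 seconds, roughly 3% of the total load time.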

2.3.1.4 Checking the space used in the database

select pg_size_pretty(pg_relation_size('*******_20180810')); -- 14 GB

select pg_size_pretty(pg_relation_size('*******_20180812')); -- 11 GB

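Column-oriented compression saved 3 GB of space, which is striking. About 21 extra seconds of load time for 3 GB saved is well worth it.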
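The saving can be put in relative terms with a short calculation. The figures are the rounded sizes reported above by `du` and `pg_size_pretty`, so the results are approximate.

```python
csv_gb = 48      # source CSV size reported by du
row_gb = 14      # table without column orientation (pg_relation_size)
column_gb = 11   # column-oriented table (pg_relation_size)

saved_gb = row_gb - column_gb
saved_pct = saved_gb / row_gb * 100

print(f"saved {saved_gb} GB ({saved_pct:.1f}% of the row-oriented table)")
print(f"row-oriented ratio:    {csv_gb / row_gb:.2f}x")
print(f"column-oriented ratio: {csv_gb / column_gb:.2f}x")
```

On these numbers the column-oriented table is about 21% smaller, and the 48G CSV shrinks by a factor of roughly 3.4 with row orientation versus roughly 4.4 with column orientation.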

2.3.1.5 Checking the table row counts

select count(*) from *******_20180810; -- 156784862

select count(*) from *******_20180812; -- 156784862

3 HDFS external table example

3.1 External table creation example

CREATE EXTERNAL TABLE *******_20180812(
****************
)
LOCATION ('gphdfs://nameservice1/tmp/******_20180812/******/*')
format 'text' (delimiter E'\u0001' FILL MISSING FIELDS)
LOG ERRORS SEGMENT REJECT LIMIT 3000 ROWS;

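The EXTERNAL keyword is required when creating an external table.

nameservice1 is the HDFS HA nameservice address, which must be configured beforehand.

tmp/******_20180812/*******/ is the path on HDFS.

The delimiter is E'\u0001', the non-printing SOH character.

LOG ERRORS stores the rejected rows in Greenplum's built-in error log, which is read through gp_read_error_log.

SEGMENT REJECT LIMIT 3000 ROWS sets the maximum number of bad rows tolerated on each segment; it can be raised or lowered, and the smallest accepted value is 2.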
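The effect of the SOH delimiter and FILL MISSING FIELDS can be pictured with a small Python sketch. This is not Greenplum's parser: `parse_line` is a hypothetical helper, and the three-column layout is invented for the example (only pripid appears in the article). The sketch mirrors the documented behavior of setting missing trailing fields to NULL (None here) and rejecting rows that have too many fields.

```python
from typing import List, Optional

SOH = "\u0001"  # the non-printing delimiter used by the external table
COLUMNS = ["pripid", "name", "region"]  # hypothetical column list


def parse_line(line: str, fill_missing: bool = True) -> List[Optional[str]]:
    """Split one text row on SOH, padding missing trailing fields with None."""
    fields: List[Optional[str]] = list(line.rstrip("\n").split(SOH))
    if len(fields) > len(COLUMNS):
        raise ValueError("extra data after last expected column")
    if len(fields) < len(COLUMNS):
        if not fill_missing:
            raise ValueError("missing data for column")
        fields += [None] * (len(COLUMNS) - len(fields))
    return fields


print(parse_line(f"1001{SOH}acme{SOH}east\n"))
print(parse_line(f"1002{SOH}globex\n"))  # last field is filled with None
```

In the real table, a row that fails to parse is written to the error log instead of raising an exception, and the load continues until the reject limit is reached.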

3.2 Viewing the rejected rows

SELECT gp_read_error_log('tableName');

Explanation of the error log fields:

4 Quickly copying a table

CREATE TABLE ********_20180814
WITH (appendonly = TRUE, compresstype = zlib, compresslevel = 5, orientation = column)
AS SELECT * FROM **********_20180812
DISTRIBUTED BY (pripid);

Check the execution time:

**************

FROM ********_20180812 Distributed BY (pripid)

Time: 69.977s

Affected rows: 156,784,862

As shown, copying 156,784,862 rows took 69.977s.
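The copy rate follows directly from those two numbers. The short calculation below uses only the row count and elapsed time reported above.

```python
rows = 156_784_862   # rows copied, matching the earlier count(*)
elapsed_s = 69.977   # time reported by the client

rows_per_second = rows / elapsed_s
print(f"{rows_per_second:,.0f} rows/s")
print(f"{rows_per_second / 1e6:.2f} million rows/s")
```

That works out to a little over 2.2 million rows per second for this copy.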
