Citus, Part 3: reference tables
阿新 • Published: 2018-11-14
os: ubuntu 16.04
postgresql: 9.6.8
citus: postgresql-9.6-citus 8.0.0
With installation done, this post turns to creating tables.
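Citus has two kinds of tables:
- distributed table: a sharded table whose rows are spread across the worker nodes. Mainly used for large fact tables.
- reference table: a broadcast table; every worker node stores an identical copy of the data. Mainly used for dimension tables.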
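To make the difference concrete, here is a toy Python model of where rows land under each scheme. It is a sketch under stated assumptions: `zlib.crc32` stands in for Citus's own hash function, the shard count of 4 and the round-robin shard-to-worker assignment are hypothetical, and only the worker names pgsql2/pgsql3 come from this post.

```python
from __future__ import annotations

import zlib

WORKERS = ["pgsql2", "pgsql3"]  # worker names used in this post
SHARD_COUNT = 4  # hypothetical; real Citus tables take this from citus.shard_count


def place_distributed(key: str) -> str:
    """Distributed table: each row goes to exactly one worker.

    Assumption: crc32 modulo the shard count stands in for Citus's hashing,
    and shards are spread over the workers round-robin.
    """
    shard = zlib.crc32(key.encode("utf-8")) % SHARD_COUNT
    return WORKERS[shard % len(WORKERS)]


def place_reference(key: str) -> list[str]:
    """Reference table: one shard, replicated to every worker, so every row goes everywhere."""
    return list(WORKERS)


def rows_per_worker(keys: list[str], reference: bool) -> dict[str, int]:
    """Count how many rows each worker stores under either scheme."""
    counts = {w: 0 for w in WORKERS}
    for key in keys:
        targets = place_reference(key) if reference else [place_distributed(key)]
        for worker in targets:
            counts[worker] += 1
    return counts


if __name__ == "__main__":
    keys = [str(i) for i in range(1, 1001)]
    print("distributed:", rows_per_worker(keys, reference=False))
    print("reference:  ", rows_per_worker(keys, reference=True))
```

The distributed counts add up to the row total, while under the reference scheme every worker holds the full row total: extra storage is traded for the ability to answer queries on any single worker.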
Log in to the coordinator and create the reference tables:
$ psql -h 192.168.0.92 -U cituser citusdb
citusdb=# create table ref_t0(c0 varchar(100),c1 varchar(100));
CREATE TABLE
citusdb=# create table ref_t1(c0 varchar(100),c1 varchar(100));
CREATE TABLE
citusdb=# select create_reference_table('ref_t0');
create_reference_table
------------------------

(1 row)
Time: 664.340 ms
citusdb=# select create_reference_table('ref_t1');
create_reference_table
------------------------

(1 row)
Time: 211.499 ms
citusdb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+--------+-------+---------+---------+-------------
public | ref_t0 | table | cituser | 0 bytes |
public | ref_t1 | table | cituser | 0 bytes |
On worker pgsql2:
citusdb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------+-------+---------+---------+-------------
public | ref_t0_102072 | table | cituser | 0 bytes |
public | ref_t1_102073 | table | cituser | 0 bytes |
On worker pgsql3:
citusdb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------+-------+---------+---------+-------------
public | ref_t0_102072 | table | cituser | 0 bytes |
public | ref_t1_102073 | table | cituser | 0 bytes |
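The worker-side names follow Citus's `<table>_<shardid>` convention, where the numeric suffix is the shard id recorded in `pg_dist_shard` on the coordinator. Because a reference table has a single shard placed on every worker, the same name (e.g. ref_t0_102072) shows up on both pgsql2 and pgsql3. The two helpers below are hypothetical, written only to illustrate the mapping:

```python
from __future__ import annotations


def shard_relation_name(table: str, shard_id: int) -> str:
    """Build the name a shard placement gets on a worker: <table>_<shardid>."""
    return f"{table}_{shard_id}"


def split_shard_name(name: str) -> tuple[str, int]:
    """Recover (table, shard_id) by splitting on the last underscore.

    Splitting on the last underscore keeps table names that themselves
    contain underscores (such as ref_t0) intact.
    """
    table, _, shard_id = name.rpartition("_")
    return table, int(shard_id)


if __name__ == "__main__":
    print(shard_relation_name("ref_t0", 102072))  # ref_t0_102072
    print(split_shard_name("ref_t1_102073"))      # ('ref_t1', 102073)
```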
Insert data:
citusdb=# insert into ref_t0(c0,c1)
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,2000000) as id;
INSERT 0 2000000
citusdb=# insert into ref_t1(c0,c1)
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,1000000) as id;
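The generated data is easy to reproduce outside the database. PostgreSQL's md5() returns a 32-character lowercase hex string, so c1 is the md5 of the id's text form and c0 is the md5 of that hex string. Since ref_t1 uses ids 1..1,000,000 and ref_t0 uses 1..2,000,000, every c0 in ref_t1 also appears in ref_t0, which is why the join later returns exactly 1,000,000. A scaled-down Python check (2,000 vs 1,000 rows instead of the real counts):

```python
import hashlib
from typing import Iterator, Tuple


def md5_text(s: str) -> str:
    """Equivalent of PostgreSQL md5(text): lowercase hex digest of the UTF-8 bytes."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def gen_rows(n: int) -> Iterator[Tuple[str, str]]:
    """Yield (c0, c1) rows like the INSERT ... generate_series(1, n) above."""
    for i in range(1, n + 1):
        c1 = md5_text(str(i))  # md5((id)::varchar)
        c0 = md5_text(c1)      # md5(md5((id)::varchar))
        yield c0, c1


def join_count(n0: int, n1: int) -> int:
    """count(1) of t0 JOIN t1 ON t0.c0 = t1.c0 (t0 from ids 1..n0, t1 from ids 1..n1).

    Using a set for t0 is valid because c0 values are unique here.
    """
    t0_keys = {c0 for c0, _ in gen_rows(n0)}
    return sum(1 for c0, _ in gen_rows(n1) if c0 in t0_keys)


if __name__ == "__main__":
    print(md5_text("1"))            # c4ca4238a0b923820dcc509a6f75849b
    print(join_count(2000, 1000))   # 1000
```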
On worker pgsql2:
citusdb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------+-------+---------+---------+-------------
public | ref_t0_102072 | table | cituser | 193 MB |
public | ref_t1_102073 | table | cituser | 97 MB |
On worker pgsql3:
citusdb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------+-------+---------+---------+-------------
public | ref_t0_102072 | table | cituser | 193 MB |
public | ref_t1_102073 | table | cituser | 97 MB |
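The shards ref_t0_102072 (193 MB) and ref_t1_102073 (97 MB) have the same size on workers pgsql2 and pgsql3: each worker holds a full copy of both tables.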
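Matching sizes are a good sign, but not proof that two replicas hold the same rows. One way to check, sketched here in Python as an assumption rather than a Citus feature, is an order-independent fingerprint: fetch the rows of ref_t0_102072 from pgsql2 and pgsql3 separately, sort them, hash them, and compare the digests.

```python
import hashlib
from typing import Iterable, Sequence


def table_fingerprint(rows: Iterable[Sequence[str]]) -> str:
    """Order-independent digest of a table's rows.

    Rows are sorted first because two workers may return them in different
    physical orders. The \\x1f/\\x1e separators keep ("ab", "c") distinct
    from ("a", "bc").
    """
    h = hashlib.md5()
    for row in sorted(tuple(r) for r in rows):
        h.update("\x1f".join(row).encode("utf-8"))
        h.update(b"\x1e")
    return h.hexdigest()


if __name__ == "__main__":
    replica_a = [("x1", "y1"), ("x2", "y2")]
    replica_b = [("x2", "y2"), ("x1", "y1")]  # same rows, different order
    print(table_fingerprint(replica_a) == table_fingerprint(replica_b))  # True
```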
The coordinator node pgsql1 stores no data itself; its tables are empty:
citusdb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+--------+-------+---------+---------+-------------
public | ref_t0 | table | cituser | 0 bytes |
public | ref_t1 | table | cituser | 0 bytes |
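The cost of full replication can be estimated from the listings above. A rough Python back-of-the-envelope, assuming the \d+ sizes are 1024-based units (as PostgreSQL's pg_size_pretty reports them) and keeping in mind that they are rounded:

```python
MIB = 1024 * 1024
WORKERS = 2  # pgsql2 and pgsql3

# Per-worker shard sizes and row counts, taken from the \d+ output and INSERTs above.
SIZE_MB = {"ref_t0": 193, "ref_t1": 97}
ROWS = {"ref_t0": 2_000_000, "ref_t1": 1_000_000}


def bytes_per_row(table: str) -> float:
    """Approximate on-disk bytes per row, including tuple and page overhead."""
    return SIZE_MB[table] * MIB / ROWS[table]


def cluster_footprint_mb() -> int:
    """Total space used across workers: every worker holds a full copy."""
    return sum(SIZE_MB.values()) * WORKERS


if __name__ == "__main__":
    for t in SIZE_MB:
        print(t, round(bytes_per_row(t), 1))  # ref_t0 101.2 / ref_t1 101.7
    print(cluster_footprint_mb(), "MB")       # 580 MB
```

About 290 MB of data ends up occupying about 580 MB across the two workers; a distributed table would instead split roughly the same 290 MB between them.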
Execution plan:
citusdb=# explain verbose select count(1) from ref_t0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Custom Scan (Citus Router) (cost=0.00..0.00 rows=0 width=0)
Output: remote_scan.count
Task Count: 1
Tasks Shown: All
-> Task
Node: host=192.168.0.90 port=5432 dbname=citusdb
-> Aggregate (cost=49692.00..49692.01 rows=1 width=8)
Output: count(1)
-> Seq Scan on public.ref_t0_102072 ref_t0 (cost=0.00..44692.00 rows=2000000 width=0)
Output: c0, c1
(10 rows)
Time: 15.737 ms
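The plan shows Task Count: 1. Because every worker has the whole table, the Citus router planner sends count(1) to a single placement and the coordinator just relays the result. For contrast, a count over a distributed table needs one task per shard plus a final sum on the coordinator. A toy Python model of both; the data structures are hypothetical, and which replica Citus actually picks is decided by Citus, not by the "first replica" rule used here:

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]


def count_reference(replicas: Dict[str, List[Row]]) -> Tuple[int, int]:
    """Reference table: any one full replica answers the query.

    Returns (count, task_count). Picking the first replica is an assumption
    of this sketch, not Citus's placement policy.
    """
    worker = next(iter(replicas))
    return len(replicas[worker]), 1


def count_distributed(shards: List[List[Row]]) -> Tuple[int, int]:
    """Distributed table: one partial count per shard, summed by the coordinator."""
    partials = [len(shard) for shard in shards]  # one task per shard
    return sum(partials), len(shards)


if __name__ == "__main__":
    rows = [(f"k{i}", f"v{i}") for i in range(10)]
    replicas = {"pgsql2": rows, "pgsql3": list(rows)}
    shards = [rows[0:3], rows[3:6], rows[6:8], rows[8:10]]
    print(count_reference(replicas))   # (10, 1)
    print(count_distributed(shards))   # (10, 4)
```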
Multi-table join:
citusdb=#
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
citusdb=# explain verbose select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Custom Scan (Citus Router) (cost=0.00..0.00 rows=0 width=0)
Output: remote_scan.count
Task Count: 1
Tasks Shown: All
-> Task
Node: host=192.168.0.90 port=5432 dbname=citusdb
-> Aggregate (cost=146414.00..146414.01 rows=1 width=8)
Output: count(1)
-> Hash Join (cost=42659.00..143914.00 rows=1000000 width=0)
Hash Cond: ((t0.c0)::text = (t1.c0)::text)
-> Seq Scan on public.ref_t0_102072 t0 (cost=0.00..44692.00 rows=2000000 width=33)
Output: t0.c0
-> Hash (cost=22346.00..22346.00 rows=1000000 width=33)
Output: t1.c0
-> Seq Scan on public.ref_t1_102073 t1 (cost=0.00..22346.00 rows=1000000 width=33)
Output: t1.c0
(16 rows)
Time: 393.115 ms
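This join is also a single router task: both ref_t0 and ref_t1 have a full placement on every worker, so any one worker can run the whole hash join locally. The check behind that can be sketched as "find a worker that holds a placement of every table in the query". The placement map below is a hypothetical simplification of Citus's placement metadata, not a real API:

```python
from typing import Dict, List, Optional, Set


def local_join_worker(tables: List[str], placements: Dict[str, Set[str]]) -> Optional[str]:
    """Return a worker holding a full copy of every table in the query, or None.

    placements maps table name -> set of workers that have a full copy.
    This models reference tables only; joins between distributed tables
    depend on co-location, which this sketch ignores.
    """
    common = set.intersection(*(placements[t] for t in tables))
    return min(common) if common else None  # min() only makes the pick deterministic


if __name__ == "__main__":
    placements = {
        "ref_t0": {"pgsql2", "pgsql3"},
        "ref_t1": {"pgsql2", "pgsql3"},
    }
    print(local_join_worker(["ref_t0", "ref_t1"], placements))  # pgsql2
```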
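For comparison, I ran the same join against distributed tables (tmp_t0 and tmp_t1). Once the data volume is even moderately large, the distributed tables come out ahead: over the four runs below, the reference-table join took about 3.5 to 6.0 s, while the distributed-table join took about 1.3 to 2.4 s.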
Run 1
citusdb=# \timing
Timing is on.
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 3478.988 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 2362.888 ms
Run 2
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 5947.913 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 1783.115 ms
Run 3
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 5951.641 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 1274.679 ms
Run 4
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 4662.890 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)
Time: 1347.655 ms
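Averaging the four runs puts the comparison into one number. With only four runs and no control over caching, treat it as a rough indication rather than a benchmark:

```python
from statistics import mean

# Wall-clock times (ms) from the four runs above.
REF_MS = [3478.988, 5947.913, 5951.641, 4662.890]   # ref_t0 JOIN ref_t1
DIST_MS = [2362.888, 1783.115, 1274.679, 1347.655]  # tmp_t0 JOIN tmp_t1


def summarize(ref, dist):
    """Return (mean reference ms, mean distributed ms, ratio of the two means)."""
    r, d = mean(ref), mean(dist)
    return r, d, r / d


if __name__ == "__main__":
    r, d, ratio = summarize(REF_MS, DIST_MS)
    print(f"{r:.1f} {d:.1f} {ratio:.2f}")  # 5010.4 1692.1 2.96
```

On these numbers the reference-table join is roughly three times slower on average.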