Citus, part 3: reference table

os: ubuntu 16.04
postgresql: 9.6.8
citus: postgresql-9.6-citus 8.0.0

With the installation finished, this post shows how to create tables.
Citus has two kinds of tables:

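  1. distributed table: a sharded table whose rows are spread across the worker nodes. Mainly used for fact tables that hold large volumes of data.
  2. reference table: a broadcast table; every worker node stores an identical, full copy of the data. Mainly used for dimension tables.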
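To make the difference between the two table types concrete, here is a minimal sketch. The table names `fact_events` and `dim_country` and their columns are hypothetical and do not appear elsewhere in this post; only `create_distributed_table` and `create_reference_table` are Citus functions.

```sql
-- Hypothetical fact table: rows are hash-distributed across the workers
-- by the distribution column (event_id here).
CREATE TABLE fact_events (event_id bigint, country_code char(2), payload text);
SELECT create_distributed_table('fact_events', 'event_id');

-- Hypothetical dimension table: a full copy is kept on every worker.
CREATE TABLE dim_country (country_code char(2) PRIMARY KEY, name text);
SELECT create_reference_table('dim_country');

-- Because each worker holds all of dim_country, a join against it can be
-- evaluated locally next to every shard of fact_events.
SELECT d.name, count(*)
FROM fact_events f
JOIN dim_country d USING (country_code)
GROUP BY d.name;
```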

Log in to the coordinator and create the reference tables

$ psql -h 192.168.0.92 -U cituser citusdb
citusdb=# create table ref_t0(c0 varchar(100),c1 varchar(100));
CREATE TABLE
citusdb=# create table ref_t1(c0 varchar(100),c1 varchar(100));
CREATE TABLE

citusdb=# select create_reference_table('ref_t0');
 create_reference_table 
------------------------
 
(1 row)

Time: 664.340 ms
citusdb=# select create_reference_table('ref_t1');
 create_reference_table 
------------------------
 
(1 row)

Time: 211.499 ms

citusdb=# \d+
                     List of relations
 Schema |  Name  | Type  |  Owner  |  Size   | Description 
--------+--------+-------+---------+---------+-------------
 public | ref_t0 | table | cituser | 0 bytes | 
 public | ref_t1 | table | cituser | 0 bytes |
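Besides `\d+`, the Citus metadata on the coordinator can confirm that both tables were registered as reference tables. The query below is a sketch that reads the `pg_dist_partition` catalog; for a reference table the `partmethod` column should be `n` (no distribution column).

```sql
-- Run on the coordinator. pg_dist_partition records how each
-- Citus-managed table is partitioned and replicated.
SELECT logicalrelid, partmethod, repmodel
FROM pg_dist_partition
WHERE logicalrelid IN ('ref_t0'::regclass, 'ref_t1'::regclass);
```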

On node pgsql2:

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 0 bytes | 
 public | ref_t1_102073 | table | cituser | 0 bytes | 

On node pgsql3:

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 0 bytes | 
 public | ref_t1_102073 | table | cituser | 0 bytes | 
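The worker-side names such as `ref_t0_102072` are the table name followed by the shard ID. Instead of logging in to every worker, the same placement information can be listed from the coordinator. This is a sketch that joins the `pg_dist_shard` catalog with the `pg_dist_shard_placement` view; for a reference table it should return one shard with one placement per worker.

```sql
-- Run on the coordinator: one row per (shard, worker) placement.
SELECT s.logicalrelid, s.shardid, p.nodename, p.nodeport
FROM pg_dist_shard s
JOIN pg_dist_shard_placement p USING (shardid)
WHERE s.logicalrelid IN ('ref_t0'::regclass, 'ref_t1'::regclass)
ORDER BY s.shardid, p.nodename;
```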

Insert data

citusdb=# insert into ref_t0(c0,c1) 
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,2000000) as id;

INSERT 0 2000000

citusdb=# insert into ref_t1(c0,c1) 
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,1000000) as id;

On node pgsql2:

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 193 MB  | 
 public | ref_t1_102073 | table | cituser | 97 MB   |

On node pgsql3:

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 193 MB  | 
 public | ref_t1_102073 | table | cituser | 97 MB   |

As shown, the data on the worker nodes pgsql2 and pgsql3 is identical.
The coordinator node pgsql1 does not store any of the table data itself, as its \d+ output shows:

citusdb=# \d+
                     List of relations
 Schema |  Name  | Type  |  Owner  |  Size   | Description 
--------+--------+-------+---------+---------+-------------
 public | ref_t0 | table | cituser | 0 bytes | 
 public | ref_t1 | table | cituser | 0 bytes |
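The 0 bytes reported here is the size of the local, empty relation on the coordinator. To see the size of each copy without connecting to the workers one by one, a sketch using the Citus helper `run_command_on_placements` could look like this; the `%s` placeholder is replaced with the shard name on each worker.

```sql
-- Run on the coordinator: ask every worker for the on-disk size
-- of its placement of ref_t0.
SELECT nodename, shardid, success, result AS total_size
FROM run_command_on_placements(
       'ref_t0',
       $cmd$ SELECT pg_size_pretty(pg_total_relation_size('%s')) $cmd$
     );
```

The same call with `'ref_t1'` covers the second table.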

Execution plan

citusdb=# explain verbose select count(1) from ref_t0;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Router)  (cost=0.00..0.00 rows=0 width=0)
   Output: remote_scan.count
   Task Count: 1
   Tasks Shown: All
   ->  Task
         Node: host=192.168.0.90 port=5432 dbname=citusdb
         ->  Aggregate  (cost=49692.00..49692.01 rows=1 width=8)
               Output: count(1)
               ->  Seq Scan on public.ref_t0_102072 ref_t0  (cost=0.00..44692.00 rows=2000000 width=0)
                     Output: c0, c1
(10 rows)

Time: 15.737 ms

Multi-table join

citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)
citusdb=# explain verbose select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Router)  (cost=0.00..0.00 rows=0 width=0)
   Output: remote_scan.count
   Task Count: 1
   Tasks Shown: All
   ->  Task
         Node: host=192.168.0.90 port=5432 dbname=citusdb
         ->  Aggregate  (cost=146414.00..146414.01 rows=1 width=8)
               Output: count(1)
               ->  Hash Join  (cost=42659.00..143914.00 rows=1000000 width=0)
                     Hash Cond: ((t0.c0)::text = (t1.c0)::text)
                     ->  Seq Scan on public.ref_t0_102072 t0  (cost=0.00..44692.00 rows=2000000 width=33)
                           Output: t0.c0
                     ->  Hash  (cost=22346.00..22346.00 rows=1000000 width=33)
                           Output: t1.c0
                           ->  Seq Scan on public.ref_t1_102073 t1  (cost=0.00..22346.00 rows=1000000 width=33)
                                 Output: t1.c0
(16 rows)

Time: 393.115 ms

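I also ran a simple comparison between distributed tables (tmp_t0 and tmp_t1 in the queries below) and the reference tables. Once the data volume grows a little, the advantage of sharded tables starts to show: over the four runs below, the reference-table join took roughly 3.5 to 6.0 s, while the same join on the tmp tables took roughly 1.3 to 2.4 s. This is consistent with the plans above, where the reference-table query runs as a single task (Task Count: 1) on one worker.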
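This post does not show how tmp_t0 and tmp_t1 were created. The following is only a sketch of one plausible setup, assuming they have the same columns and data as ref_t0 and ref_t1 and are hash-distributed on the join column `c0`; the actual definitions used for the timings may differ.

```sql
-- ASSUMPTION: same structure as ref_t0 / ref_t1, distributed on c0.
CREATE TABLE tmp_t0 (c0 varchar(100), c1 varchar(100));
CREATE TABLE tmp_t1 (c0 varchar(100), c1 varchar(100));

-- Distributing both tables on the same column type with default settings
-- lets Citus co-locate matching shards, so the join on c0 can run
-- shard by shard in parallel on the workers.
SELECT create_distributed_table('tmp_t0', 'c0');
SELECT create_distributed_table('tmp_t1', 'c0');

-- ASSUMPTION: same generated data as the reference tables.
INSERT INTO tmp_t0 (c0, c1)
SELECT md5(md5(id::varchar)), md5(id::varchar)
FROM generate_series(1, 2000000) AS id;

INSERT INTO tmp_t1 (c0, c1)
SELECT md5(md5(id::varchar)), md5(id::varchar)
FROM generate_series(1, 1000000) AS id;
```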
Run 1

citusdb=# \timing
Timing is on. 
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 3478.988 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 2362.888 ms

Run 2

citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 5947.913 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 1783.115 ms

Run 3

citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 5951.641 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 1274.679 ms

Run 4

citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 4662.890 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 1347.655 ms
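To summarize the four runs, the timings above can be averaged directly in SQL. This is just arithmetic on the numbers already shown and runs on any PostgreSQL instance; the column names `ref_ms` and `dist_ms` are labels chosen here, and `dist_ms` assumes that tmp_t0 and tmp_t1 are the distributed tables.

```sql
-- Each row is one run: (reference-table join, tmp-table join), in ms.
SELECT round(avg(ref_ms), 1)                  AS ref_avg_ms,
       round(avg(dist_ms), 1)                 AS dist_avg_ms,
       round(avg(ref_ms) / avg(dist_ms), 2)   AS ratio
FROM (VALUES
        (3478.988, 2362.888),
        (5947.913, 1783.115),
        (5951.641, 1274.679),
        (4662.890, 1347.655)
     ) AS runs(ref_ms, dist_ms);
-- ref_avg_ms = 5010.4, dist_avg_ms = 1692.1, ratio = 2.96
```

On these four runs the reference-table join is on average about three times slower. With only four samples and visible run-to-run variation, this is an indication rather than a benchmark.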

References:
https://www.citusdata.com/
https://docs.citusdata.com/en/v8.0/
https://docs.citusdata.com/en/stable/index.html