PostgreSQL: Regex and Fuzzy Queries over 10 Billion Rows with Response Times in Seconds
Original: https://yq.aliyun.com/articles/7444?spm=5176.blog7549.yqblogcon1.6.2wcXO2
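Regular-expression matching and fuzzy matching are usually the specialty of search engines, but PostgreSQL can do both, and with respectable performance. Combined with a sharding solution (for example plproxy, pg_shard, fdw shard, pg-xc, pg-xl, or greenplum), it handles regex and fuzzy matching over more than 10 billion rows very well, without giving up the features a database normally provides.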
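The Internet of Things produces huge volumes of data. Besides numeric data there is string data such as barcodes, license plates, phone numbers, email addresses, and names.
Suppose users need fuzzy search, or even regular-expression matching, over a large volume of sensor data. Is there an efficient way to do it?
Such scenarios are quite common. For example, a batch of drugs on the market is found to be possibly defective, and a regular-expression query over drug barcodes is needed to trace where the matching drugs went.
Another example is searching for leads during an investigation, such as a partial phone number, email address, license plate, IP address, QQ number, or WeChat ID supplied by a user.
Fuzzy matching and correlating these fragments, together with time information, eventually identifies the offender.
As you can see, fuzzy matching and regular-expression matching are a bit like assembling a facial composite, and the need for them is pressing.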
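First, let's classify the use cases and the optimizations available with current technology.
.1. Fuzzy query with a known prefix, for example like 'ABC%', which in PG can also be written as ~ '^ABC'
This can be optimized with a btree index, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).
.2. Fuzzy query with a known suffix, for example like '%ABC', which in PG can also be written as ~ 'ABC$'
This can be optimized with a btree index on the reverse() function, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).
.3. Fuzzy query with neither a fixed prefix nor a fixed suffix, for example like '%AB_C%', which in PG can also be written as ~ 'AB.C'
This can be optimized with a pg_trgm GIN index, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).
.4. Regular-expression query, for example ~ '[\d]+def1.?[a|b|0|8]{1,3}'
This can be optimized with a pg_trgm GIN index, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).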
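The pg_trgm extension has supported index use for fuzzy (LIKE) queries since PostgreSQL 9.1 and for regular-expression queries since 9.3, which greatly improves PostgreSQL's usefulness for criminal-investigation work.
For the code, see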
https://github.com/postgrespro/pg_trgm_pro
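How pg_trgm works: it prepends two spaces and appends one space to the string, then splits the padded string into tokens of three adjacent characters each (trigrams).
For a regular-expression or fuzzy query, the index is first used to find candidate rows by how well their trigrams match the pattern, and the candidates are then filtered against the actual condition.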
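The padding-and-splitting rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not pg_trgm's implementation: the function name `trigrams` is hypothetical, and word splitting and case folding are left out.

```python
def trigrams(word):
    """Return the set of 3-character tokens for a single word.

    Padding follows the rule described in the text: two spaces in
    front of the word and one space behind it.
    """
    padded = "  " + word + " "
    # Slide a window of width 3 over the padded string.
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


print(sorted(trigrams("cat")))
print(len(trigrams("376821ab")), "tokens for an 8-character value")
```

An 8-character value such as the ones used in the tests of this article therefore produces up to nine tokens, which is why a single row touches many index entries.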
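GIN index structure (figure):
When a matching token is found in the btree of index keys, its entry points to the corresponding posting list, and the ctids stored in that list lead to the matching rows.
Because one string is split into many tokens, every inserted row updates many index entries, which is why GIN indexes need fastupdate.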
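As a rough mental model, the structure can be pictured as a map from token to a list of row identifiers. The sketch below uses a Python dictionary in place of the key btree and plain integers in place of ctids; the class name `TrigramIndex` and its methods are hypothetical and only illustrate why one insert touches many entries.

```python
from collections import defaultdict


def trigrams(word):
    """Padded 3-character tokens of a value (two spaces before, one after)."""
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Toy inverted index: token -> set of row ids (stand-ins for ctids)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def insert(self, row_id, value):
        """Add one row and return how many index entries were touched."""
        keys = trigrams(value)
        for key in keys:
            self.postings[key].add(row_id)
        return len(keys)


index = TrigramIndex()
touched = index.insert(1, "376821ab")
print(touched, "index entries updated by a single row")
```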
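How is regular-expression matching done?
For details, see https://raw.githubusercontent.com/postgrespro/pg_trgm_pro/master/trgm_regexp.c
In essence, it converts the regular expression into an NFA, then scans multiple tokens and combines the results with bit AND/OR matching.
If the regular expression translates into many bit AND/OR combinations, a large amount of recheck is needed and performance will not be very good.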
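The candidate-then-recheck idea can be sketched as follows. This is a strong simplification and not what trgm_regexp.c does: instead of deriving tokens from an NFA, the caller passes a literal fragment that every match must contain (the hypothetical `literal` parameter), and the sample rows are made up for illustration.

```python
import re
from collections import defaultdict


def trigrams(value):
    """Padded trigrams of a stored value (two spaces in front, one behind)."""
    padded = "  " + value + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_postings(rows):
    """Map each trigram to the set of row ids whose value contains it."""
    postings = defaultdict(set)
    for row_id, value in rows.items():
        for key in trigrams(value):
            postings[key].add(row_id)
    return postings


def search(rows, postings, pattern, literal):
    """Return (candidates, matches) for a regex that must contain `literal`."""
    # Unpadded trigrams: the literal may sit anywhere inside a value.
    needed = {literal[i:i + 3] for i in range(len(literal) - 2)}
    if needed:
        candidates = set.intersection(*(postings.get(t, set()) for t in needed))
    else:
        candidates = set(rows)  # literal too short: every row is a candidate
    # Recheck: run the real regex only on the candidate rows.
    matches = {rid for rid in candidates if re.search(pattern, rows[rid])}
    return candidates, matches


rows = {1: "e6500ab8", 2: "4e6567ab", 3: "e6564ab7", 4: "376821ab"}
postings = build_postings(rows)
candidates, matches = search(rows, postings, r"e65\d{2}ab{1,2}8", "e65")
print(len(candidates) - len(matches), "candidate rows removed by recheck")
```

The fewer distinguishing tokens a pattern yields, the larger the candidate set and the more work the recheck has to do.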
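The four scenarios above are worked through below with examples.
.1. Fuzzy query with a known prefix, for example like 'ABC%', which in PG can also be written as ~ '^ABC'
This can be optimized with a btree index, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).
Example: the first 8 characters of 10 million randomly generated MD5 values. (A plain btree index serves prefix patterns only under byte-wise ordering such as the C collation; with other collations the index needs the text_pattern_ops operator class.)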
postgres=# create table tb(info text);
CREATE TABLE
postgres=# insert into tb select substring(md5(random()::text),1,8) from generate_series(1,10000000);
INSERT 0 10000000
postgres=# create index idx_tb on tb(info);
CREATE INDEX
postgres=# select * from tb limit 1;
info
----------
376821ab
(1 row)
postgres=# explain select * from tb where info ~ '^376821' limit 10;
QUERY PLAN
-------------------------------------------------------------------------------
Limit (cost=0.43..0.52 rows=10 width=9)
-> Index Only Scan using idx_tb on tb (cost=0.43..8.46 rows=1000 width=9)
Index Cond: ((info >= '376821'::text) AND (info < '376822'::text))
Filter: (info ~ '^376821'::text)
(4 rows)
postgres=# select * from tb where info ~ '^376821' limit 10;
info
----------
376821ab
(1 row)
Time: 0.536 ms
postgres=# set enable_indexscan=off;
SET
Time: 1.344 ms
postgres=# set enable_bitmapscan=off;
SET
Time: 0.158 ms
postgres=# explain select * from tb where info ~ '^376821' limit 10;
QUERY PLAN
----------------------------------------------------------------
Limit (cost=0.00..1790.55 rows=10 width=9)
-> Seq Scan on tb (cost=0.00..179055.00 rows=1000 width=9)
Filter: (info ~ '^376821'::text)
(3 rows)
Time: 0.505 ms
Without the index, the prefix query takes 5483 ms.
With the index, the prefix query takes only 0.5 ms.
postgres=# select * from tb where info ~ '^376821' limit 10;
info
----------
376821ab
(1 row)
Time: 5483.655 ms
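The Index Cond in the first plan above (info >= '376821' AND info < '376822') shows how a prefix pattern becomes a range scan over sorted keys. A minimal Python sketch of that rewrite, using a sorted list and `bisect` in place of the btree; the function names are hypothetical, ordering is by code point (comparable to the C collation for ASCII data), and the edge case of a prefix ending in the highest possible character is ignored.

```python
from bisect import bisect_left


def prefix_range(prefix):
    """Return (low, high) such that low <= s < high for every s starting with prefix."""
    # The upper bound is the prefix with its last character incremented by one.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_scan(sorted_values, prefix):
    """Range-scan a sorted list for values that start with prefix."""
    low, high = prefix_range(prefix)
    start = bisect_left(sorted_values, low)
    end = bisect_left(sorted_values, high)
    return sorted_values[start:end]


values = sorted(["220821ab", "376821ab", "376822ff", "3768", "376821"])
print(prefix_range("376821"))
print(prefix_scan(values, "376821"))
```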
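.2. Fuzzy query with a known suffix, for example like '%ABC', which in PG can also be written as ~ 'ABC$'
This can be optimized with a btree index on the reverse() function, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).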
postgres=# create index idx_tb1 on tb(reverse(info));
CREATE INDEX
postgres=# explain select * from tb where reverse(info) ~ '^ba128' limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------
Limit (cost=0.43..28.19 rows=10 width=9)
-> Index Scan using idx_tb1 on tb (cost=0.43..138778.43 rows=50000 width=9)
Index Cond: ((reverse(info) >= 'ba128'::text) AND (reverse(info) < 'ba129'::text))
Filter: (reverse(info) ~ '^ba128'::text)
(4 rows)
postgres=# select * from tb where reverse(info) ~ '^ba128' limit 10;
info
----------
220821ab
671821ab
305821ab
e65821ab
536821ab
376821ab
668821ab
4d8821ab
26c821ab
(9 rows)
Time: 0.506 ms
With the index, the suffix query takes only 0.5 ms.
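The reverse() trick turns a suffix search into a prefix search on reversed keys. A small sketch of the idea, assuming a sorted list of (reversed value, original value) pairs stands in for the expression index; the helper names are hypothetical and the sample values are taken from the outputs above.

```python
from bisect import bisect_left


def build_reverse_index(values):
    """Sorted (reversed value, original value) pairs, like an index on reverse(info)."""
    return sorted((v[::-1], v) for v in values)


def suffix_scan(reverse_index, suffix):
    """Find values ending with suffix via a prefix range scan on reversed keys."""
    low = suffix[::-1]
    high = low[:-1] + chr(ord(low[-1]) + 1)
    # A 1-tuple sorts before any 2-tuple that shares its first element.
    start = bisect_left(reverse_index, (low,))
    end = bisect_left(reverse_index, (high,))
    return [original for _, original in reverse_index[start:end]]


reverse_index = build_reverse_index(
    ["220821ab", "671821ab", "376821ab", "5821a8a3", "9fe5821a"]
)
print(suffix_scan(reverse_index, "821ab"))
```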
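.3. Fuzzy query with neither a fixed prefix nor a fixed suffix, for example like '%AB_C%', which in PG can also be written as ~ 'AB.C'
This can be optimized with a pg_trgm GIN index, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).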
postgres=# create extension pg_trgm;
postgres=# create index idx_tb_2 on tb using gin (info gin_trgm_ops);
postgres=# explain select * from tb where info ~ '5821a';
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on tb (cost=103.75..3677.71 rows=1000 width=9)
Recheck Cond: (info ~ '5821a'::text)
-> Bitmap Index Scan on idx_tb_2 (cost=0.00..103.50 rows=1000 width=0)
Index Cond: (info ~ '5821a'::text)
(4 rows)
Time: 0.647 ms
postgres=# select * from tb where info ~ '5821a';
info
----------
5821a8a3
945821af
45821a74
9fe5821a
5821a7e0
5821af2a
1075821a
e5821ac9
d265821a
45f5821a
df5821a4
de5821af
71c5821a
375821a3
fc5821af
5c5821ad
e65821ab
5821adde
c35821a6
5821a642
305821ab
5821a1c8
75821a5c
ce95821a
a65821ad
(25 rows)
Time: 3.808 ms
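With the index, the unanchored fuzzy query takes only 3.8 ms.
.4. Regular-expression query, for example ~ '[\d]+def1.?[a|b|0|8]{1,3}'
This can be optimized with a pg_trgm GIN index, or by splitting the string into columns and combining the per-column indexes with bitmap AND or bitmap OR (suitable only for short, fixed-length strings such as char(8)).
With the index, the regular-expression query takes 108 ms.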
postgres=# select * from tb where info ~ 'e65[\d]{2}a[b]{1,2}8' limit 10;
info
----------
4e6567ab
1e6530ab
e6500ab8
ae6583ab
e6564ab7
5e6532ab
e6526abf
e6560ab6
(8 rows)
Time: 108.577 ms
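Most of the time is spent eliminating false candidates.
The index scan returned 14794 rows and the recheck removed 14793 of them. Much of the time goes into wasted work, but it is still far better than a full table scan. Note that the plan below reports a single matching row, and only e6500ab8 in the listing above satisfies the pattern with its trailing 8; the 8-row listing appears to come from a variant of the pattern without the final 8.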
postgres=# explain (verbose,analyze,buffers,costs,timing) select * from tb where info ~ 'e65[\d]{2}a[b]{1,2}8' limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=511.75..547.49 rows=10 width=9) (actual time=89.934..120.567 rows=1 loops=1)
Output: info
Buffers: shared hit=13054
-> Bitmap Heap Scan on public.tb (cost=511.75..4085.71 rows=1000 width=9) (actual time=89.930..120.562 rows=1 loops=1)
Output: info
Recheck Cond: (tb.info ~ 'e65[\d]{2}a[b]{1,2}8'::text)
Rows Removed by Index Recheck: 14793
Heap Blocks: exact=12929
Buffers: shared hit=13054
-> Bitmap Index Scan on idx_tb_2 (cost=0.00..511.50 rows=1000 width=0) (actual time=67.589..67.589 rows=14794 loops=1)
Index Cond: (tb.info ~ 'e65[\d]{2}a[b]{1,2}8'::text)
Buffers: shared hit=125
Planning time: 0.493 ms
Execution time: 120.618 ms
(14 rows)
Time: 124.693 ms
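Optimization:
With a GIN index, write performance needs attention: the info column is broken into many 3-character tokens, so each row involves many index entries. Under highly concurrent inserts it is best to set gin_pending_list_limit high, which improves insert throughput and reduces the response-time (RT) increase caused by merging into the index in real time.
With fastupdate enabled, the pending entries are merged into the GIN index automatically each time the table is vacuumed.
One more point: queries do not trigger a merge; GIN entries that have not yet been merged are searched by scanning them sequentially.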
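The trade-off described above can be pictured with a toy model: inserts only append to a pending list, a merge step (standing in for vacuum) moves the entries into the main structure, and lookups must scan whatever is still pending. The class `FastUpdateIndex` and its method names are hypothetical and the model ignores size limits and concurrency.

```python
from collections import defaultdict


def trigrams(value):
    """Padded 3-character tokens of a value (two spaces before, one after)."""
    padded = "  " + value + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class FastUpdateIndex:
    """Toy model of a token index with a pending list."""

    def __init__(self):
        self.main = defaultdict(set)  # merged entries: token -> row ids
        self.pending = []             # unmerged (row_id, tokens) in arrival order

    def insert(self, row_id, value):
        # One cheap append instead of one update per token.
        self.pending.append((row_id, trigrams(value)))

    def merge(self):
        """Move all pending entries into the main structure (the vacuum step)."""
        for row_id, keys in self.pending:
            for key in keys:
                self.main[key].add(row_id)
        self.pending.clear()

    def lookup(self, token):
        found = set(self.main.get(token, ()))
        # Unmerged entries are found only by scanning the pending list linearly.
        found.update(rid for rid, keys in self.pending if token in keys)
        return found


idx = FastUpdateIndex()
idx.insert(1, "376821ab")
idx.insert(2, "220821ab")
before = idx.lookup("821")
idx.merge()
after = idx.lookup("821")
print(before == after, len(idx.pending))
```

The longer the pending list is allowed to grow, the cheaper inserts become and the more each query pays for the linear scan, which is why the settings below make autovacuum run very frequently.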
Stress test under high concurrency:
create table tbl(id serial8, crt_time timestamp, sensorid int, sensorloc point, info text) with (autovacuum_enabled=on, autovacuum_vacuum_threshold=0.000001,autovacuum_vacuum_cost_delay=0);
CREATE INDEX trgm_idx ON tbl USING GIN (info gin_trgm_ops) with (fastupdate='on', gin_pending_list_limit='6553600');
alter sequence tbl_id_seq cache 10000;
Change the configuration so that autovacuum iterates quickly and merges the GIN pending list.
vi $PGDATA/postgresql.conf
autovacuum_naptime=1s
maintenance_work_mem=1GB
autovacuum_work_mem=1GB
autovacuum = on
autovacuum_max_workers = 3
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay=0
$ pg_ctl reload
Create a test function that generates random test data.
postgres=# create or replace function f() returns void as $$
insert into tbl (crt_time,sensorid,info) values ( clock_timestamp(),trunc(random()*500000),substring(md5(random()::text),1,8) );
$$ language sql strict;
vi test.sql
select f();
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 48 -j 48 -T 10000
progress: 50.0 s, 52800.9 tps, lat 0.453 ms stddev 0.390
progress: 51.0 s, 52775.8 tps, lat 0.453 ms stddev 0.398
progress: 52.0 s, 53173.2 tps, lat 0.449 ms stddev 0.371
progress: 53.0 s, 53010.0 tps, lat 0.451 ms stddev 0.390
progress: 54.0 s, 53360.9 tps, lat 0.448 ms stddev 0.365
progress: 55.0 s, 53285.0 tps, lat 0.449 ms stddev 0.362
progress: 56.0 s, 53662.1 tps, lat 0.445 ms stddev 0.368
progress: 57.0 s, 53283.8 tps, lat 0.448 ms stddev 0.385
progress: 58.0 s, 53703.4 tps, lat 0.445 ms stddev 0.355
progress: 59.0 s, 53818.7 tps, lat 0.444 ms stddev 0.344
progress: 60.0 s, 53889.2 tps, lat 0.443 ms stddev 0.361
progress: 61.0 s, 53613.8 tps, lat 0.446 ms stddev 0.355
progress: 62.0 s, 53339.9 tps, lat 0.448 ms stddev 0.392
progress: 63.0 s, 54014.9 tps, lat 0.442 ms stddev 0.346
progress: 64.0 s, 53112.1 tps, lat 0.450 ms stddev 0.374
progress: 65.0 s, 53706.1 tps, lat 0.445 ms stddev 0.367
progress: 66.0 s, 53720.9 tps, lat 0.445 ms stddev 0.353
progress: 67.0 s, 52858.1 tps, lat 0.452 ms stddev 0.415
progress: 68.0 s, 53218.9 tps, lat 0.449 ms stddev 0.387
progress: 69.0 s, 53403.0 tps, lat 0.447 ms stddev 0.377
progress: 70.0 s, 53179.9 tps, lat 0.449 ms stddev 0.377
progress: 71.0 s, 53232.4 tps, lat 0.449 ms stddev 0.373
progress: 72.0 s, 53011.7 tps, lat 0.451 ms stddev 0.386
progress: 73.0 s, 52685.1 tps, lat 0.454 ms stddev 0.384
progress: 74.0 s, 52937.8 tps, lat 0.452 ms stddev 0.377
At this rate, more than 4 billion rows can be ingested per day.
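The daily figure follows directly from the pgbench output. A quick check, assuming a sustained rate of about 53,000 tps (a round number read off the progress lines above, with one inserted row per transaction):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Approximate sustained rate taken from the pgbench progress lines.
tps = 53_000

rows_per_day = tps * SECONDS_PER_DAY
print(f"{rows_per_day:,} rows per day")
```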
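Next, compare this with the string-splitting approach. It applies only when the strings are short and of fixed length; if the length varies, the method does not work.
The test results for the split approach are unsatisfactory, so pg_trgm remains the more dependable choice.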
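The split-column idea can be sketched as one small index per character position, with the per-position hits intersected in the way a BitmapAnd combines several index scans. The function names below are hypothetical and the three sample rows are made up; the conditions mirror the fixed-position query used in the comparison that follows.

```python
from collections import defaultdict

DIGITS = set("0123456789")


def build_position_indexes(rows, width=8):
    """One index per character position: character -> set of row ids."""
    indexes = [defaultdict(set) for _ in range(width)]
    for row_id, value in rows.items():
        for pos, ch in enumerate(value[:width]):
            indexes[pos][ch].add(row_id)
    return indexes


def match_fixed_positions(indexes, conditions):
    """conditions maps a 0-based position to the set of allowed characters.

    Allowed characters at one position are OR-ed, positions are AND-ed.
    """
    result = None
    for pos, allowed in conditions.items():
        hits = set()
        for ch in allowed:
            hits |= indexes[pos].get(ch, set())
        result = hits if result is None else result & hits
    return result if result is not None else set()


rows = {1: "33eed779", 2: "3fed9479", 3: "33f07429"}
indexes = build_position_indexes(rows)
conditions = {0: {"3"}, 1: DIGITS, 2: {"e"}, 3: {"e"},
              4: {"d"}, 5: DIGITS, 6: {"7"}, 7: {"9"}}
print(match_fixed_positions(indexes, conditions))
```

Each single-character condition matches a large share of the table (roughly one sixteenth for hex data), so the individual scans are expensive even though their intersection is tiny, which is consistent with the slower timings shown below.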
postgres=# create table t_split(id int, crt_time timestamp, sensorid int, sensorloc point, info text, c1 char(1), c2 char(1), c3 char(1), c4 char(1), c5 char(1), c6 char(1), c7 char(1), c8 char(1));
CREATE TABLE
Time: 2.123 ms
postgres=# insert into t_split(id,crt_time,sensorid,info,c1,c2,c3,c4,c5,c6,c7,c8) select id,ct,sen,info,substring(info,1,1),substring(info,2,1),substring(info,3,1),substring(info,4,1),substring(info,5,1),substring(info,6,1),substring(info,7,1),substring(info,8,1) from (select id, clock_timestamp() ct, trunc(random()*500000) sen, substring(md5(random()::text), 1, 8) info from generate_series(1,10000000) t(id)) t;
INSERT 0 10000000
Time: 81829.274 ms
postgres=# create index idx1 on t_split (c1);
postgres=# create index idx2 on t_split (c2);
postgres=# create index idx3 on t_split (c3);
postgres=# create index idx4 on t_split (c4);
postgres=# create index idx5 on t_split (c5);
postgres=# create index idx6 on t_split (c6);
postgres=# create index idx7 on t_split (c7);
postgres=# create index idx8 on t_split (c8);
postgres=# create index idx9 on t_split using gin (info gin_trgm_ops);
postgres=# select * from t_split limit 1;
id | crt_time | sensorid | sensorloc | info | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----
1 | 2016-03-02 09:58:03.990639 | 161958 || 33eed779 | 3 | 3 | e | e | d | 7 | 7 | 9
(1 row)
postgres=# select * from t_split where info ~ '^3[\d]?eed[\d]?79$' limit 10;
id | crt_time | sensorid | sensorloc | info | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----
1 | 2016-03-02 09:58:03.990639 | 161958 || 33eed779 | 3 | 3 | e | e | d | 7 | 7 | 9
(1 row)
Time: 133.041 ms
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where info ~ '^3[\d]?eed[\d]?79$' limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=575.75..612.78 rows=10 width=57) (actual time=92.406..129.838 rows=1 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Buffers: shared hit=13798
-> Bitmap Heap Scan on public.t_split (cost=575.75..4278.56 rows=1000 width=57) (actual time=92.403..129.833 rows=1 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Recheck Cond: (t_split.info ~ '^3[\d]?eed[\d]?79$'::text)
Rows Removed by Index Recheck: 14690
Heap Blocks: exact=13669
Buffers: shared hit=13798
-> Bitmap Index Scan on idx9 (cost=0.00..575.50 rows=1000 width=0) (actual time=89.576..89.576 rows=14691 loops=1)
Index Cond: (t_split.info ~ '^3[\d]?eed[\d]?79$'::text)
Buffers: shared hit=129
Planning time: 0.385 ms
Execution time: 129.883 ms
(14 rows)
Time: 130.678 ms
postgres=# select * from t_split where c1='3' and c3='e' and c4='e' and c5='d' and c7='7' and c8='9' and c2 between '0' and '9' and c6 between '0' and '9' limit 10;
id | crt_time | sensorid | sensorloc | info | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----
1 | 2016-03-02 09:58:03.990639 | 161958 || 33eed779 | 3 | 3 | e | e | d | 7 | 7 | 9
(1 row)
Time: 337.367 ms
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where c1='3' and c3='e' and c4='e' and c5='d' and c7='7' and c8='9' and c2 between '0' and '9' and c6 between '0' and '9' limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=33582.31..41499.35 rows=1 width=57) (actual time=339.230..344.675 rows=1 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Buffers: shared hit=7581
-> Bitmap Heap Scan on public.t_split (cost=33582.31..41499.35 rows=1 width=57) (actual time=339.228..344.673 rows=1 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Recheck Cond: ((t_split.c3 = 'e'::bpchar) AND (t_split.c8 = '9'::bpchar) AND (t_split.c5 = 'd'::bpchar))
Filter: ((t_split.c2 >= '0'::bpchar) AND (t_split.c2 <= '9'::bpchar) AND (t_split.c6 >= '0'::bpchar) AND (t_split.c6 <= '9'::bpchar) AND (t_split.c1 = '3'::bpchar) AND (t_split.c4 = 'e'::bpchar) AND (t_split.c7 = '7'::bpchar))
Rows Removed by Filter: 2480
Heap Blocks: exact=2450
Buffers: shared hit=7581
-> BitmapAnd (cost=33582.31..33582.31 rows=2224 width=0) (actual time=338.512..338.512 rows=0 loops=1)
Buffers: shared hit=5131
-> Bitmap Index Scan on idx3 (cost=0.00..11016.93 rows=596333 width=0) (actual time=104.418..104.418 rows=624930 loops=1)
Index Cond: (t_split.c3 = 'e'::bpchar)
Buffers: shared hit=1711
-> Bitmap Index Scan on idx8 (cost=0.00..11245.44 rows=608667 width=0) (actual time=100.185..100.185 rows=625739 loops=1)
Index Cond: (t_split.c8 = '9'::bpchar)
Buffers: shared hit=1712
-> Bitmap Index Scan on idx5 (cost=0.00..11319.44 rows=612667 width=0) (actual time=99.480..99.480 rows=624269 loops=1)
Index Cond: (t_split.c5 = 'd'::bpchar)
Buffers: shared hit=1708
Planning time: 0.262 ms
Execution time: 344.731 ms
(23 rows)
Time: 346.424 ms
postgres=# select * from t_split where info ~ '^33.+7.+9$' limit 10;
id | crt_time | sensorid | sensorloc | info | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
--------+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----
1 | 2016-03-02 09:58:03.990639 | 161958 || 33eed779 | 3 | 3 | e | e | d | 7 | 7 | 9
24412 | 2016-03-02 09:58:04.186359 | 251599 || 33f07429 | 3 | 3 | f | 0 | 7 | 4 | 2 | 9
24989 | 2016-03-02 09:58:04.191112 | 214569 || 334587d9 | 3 | 3 | 4 | 5 | 8 | 7 | d | 9
50100 | 2016-03-02 09:58:04.398499 | 409819 || 33beb7b9 | 3 | 3 | b | e | b | 7 | b | 9
92623 | 2016-03-02 09:58:04.745372 | 280100 || 3373e719 | 3 | 3 | 7 | 3 | e | 7 | 1 | 9
106054 | 2016-03-02 09:58:04.855627 | 155192 || 33c575c9 | 3 | 3 | c | 5 | 7 | 5 | c | 9
107070 | 2016-03-02 09:58:04.863827 | 464325 || 337dd729 | 3 | 3 | 7 | d | d | 7 | 2 | 9
135152 | 2016-03-02 09:58:05.088217 | 240500 || 336271d9 | 3 | 3 | 6 | 2 | 7 | 1 | d | 9
156425 | 2016-03-02 09:58:05.25805 | 218202 || 333e7289 | 3 | 3 | 3 | e | 7 | 2 | 8 | 9
170210 | 2016-03-02 09:58:05.368371 | 132530 || 33a8d789 | 3 | 3 | a | 8 | d | 7 | 8 | 9
(10 rows)
Time: 20.431 ms
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where info ~ '^33.+7.+9$' limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43.75..80.78 rows=10 width=57) (actual time=19.573..21.212 rows=10 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Buffers: shared hit=566
-> Bitmap Heap Scan on public.t_split (cost=43.75..3746.56 rows=1000 width=57) (actual time=19.571..21.206 rows=10 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Recheck Cond: (t_split.info ~ '^33.+7.+9$'::text)
Rows Removed by Index Recheck: 647
Heap Blocks: exact=552
Buffers: shared hit=566
-> Bitmap Index Scan on idx9 (cost=0.00..43.50 rows=1000 width=0) (actual time=11.712..11.712 rows=39436 loops=1)
Index Cond: (t_split.info ~ '^33.+7.+9$'::text)
Buffers: shared hit=14
Planning time: 0.301 ms
Execution time: 21.255 ms
(14 rows)
Time: 21.995 ms
postgres=# select * from t_split where c1='3' and c2='3' and c8='9' and (c4='7' or c5='7' or c6='7') limit 10;
id | crt_time | sensorid | sensorloc | info | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
--------+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----
1 | 2016-03-02 09:58:03.990639 | 161958 || 33eed779 | 3 | 3 | e | e | d | 7 | 7 | 9
24412 | 2016-03-02 09:58:04.186359 | 251599 || 33f07429 | 3 | 3 | f | 0 | 7 | 4 | 2 | 9
24989 | 2016-03-02 09:58:04.191112 | 214569 || 334587d9 | 3 | 3 | 4 | 5 | 8 | 7 | d | 9
50100 | 2016-03-02 09:58:04.398499 | 409819 || 33beb7b9 | 3 | 3 | b | e | b | 7 | b | 9
92623 | 2016-03-02 09:58:04.745372 | 280100 || 3373e719 | 3 | 3 | 7 | 3 | e | 7 | 1 | 9
106054 | 2016-03-02 09:58:04.855627 | 155192 || 33c575c9 | 3 | 3 | c | 5 | 7 | 5 | c | 9
107070 | 2016-03-02 09:58:04.863827 | 464325 || 337dd729 | 3 | 3 | 7 | d | d | 7 | 2 | 9
135152 | 2016-03-02 09:58:05.088217 | 240500 || 336271d9 | 3 | 3 | 6 | 2 | 7 | 1 | d | 9
156425 | 2016-03-02 09:58:05.25805 | 218202 || 333e7289 | 3 | 3 | 3 | e | 7 | 2 | 8 | 9
170210 | 2016-03-02 09:58:05.368371 | 132530 || 33a8d789 | 3 | 3 | a | 8 | d | 7 | 8 | 9
(10 rows)
Time: 37.739 ms
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where c1='3' and c2='3' and c8='9' and (c4='7' or c5='7' or c6='7') limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..8135.78 rows=10 width=57) (actual time=0.017..35.532 rows=10 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Buffers: shared hit=1755
-> Seq Scan on public.t_split (cost=0.00..353093.00 rows=434 width=57) (actual time=0.015..35.526 rows=10 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8
Filter: ((t_split.c1 = '3'::bpchar) AND (t_split.c2 = '3'::bpchar) AND (t_split.c8 = '9'::bpchar) AND ((t_split.c4 = '7'::bpchar) OR (t_split.c5 = '7'::bpchar) OR (t_split.c6 = '7'::bpchar)))
Rows Removed by Filter: 170200
Buffers: shared hit=1755
Planning time: 0.210 ms
Execution time: 35.572 ms
(10 rows)
Time: 36.260 ms
postgres=# select * from t_split where info ~ '^3.?[b-g]+ed[\d]+79' order by info <-> '^3.?[b-g]+ed[\d]+79' limit 10;
id | crt_time | sensorid | sensorloc | info | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
---------+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----
1 | 2016-03-02 09:58:03.990639 | 161958 || 33eed779 | 3 | 3 | e | e | d | 7 | 7 | 9
1308724 | 2016-03-02 09:58:14.590901 | 458822 || 3fed9479 | 3 | f | e | d | 9 | 4 | 7 | 9
2866024 | 2016-03-02 09:58:27.20105 | 106467 || 3fed2279 | 3 | f | e | d | 2 | 2 | 7 | 9
4826729 | 2016-03-02 09:58:42.907431 | 228023 || 3ded9879 | 3 | d | e | d | 9 | 8 | 7 | 9
6113373 | 2016-03-02 09:58:53.211146 | 499702 || 36fed479 | 3 | 6 | f | e | d | 4 | 7 | 9
1768237 | 2016-03-02 09:58:18.310069 | 345027 || 30fed079 | 3 | 0 | f | e | d | 0 | 7 | 9
1472324 | 2016-03-02 09:58:15.913629 | 413283 || 3eed5798 | 3 | e | e | d | 5 | 7 | 9 | 8
8319056 | 2016-03-02 09:59:10.902137 | 336740 || 3ded7790 | 3 | d | e | d | 7 | 7 | 9 | 0
8576573 | 2016-03-02 09:59:12.962923 | 130223 || 3eed5793 | 3 | e | e | d | 5 | 7 | 9 | 3
(9 rows)
Time: 268.661 ms
postgres=# explain (analyze,verbose,timing,buffers,costs) select * from t_split where info ~ '^3.?[b-g]+ed[\d]+79' order by info <-> '^3.?[b-g]+ed[\d]+79' limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4302.66..4302.69 rows=10 width=57) (actual time=269.214..269.217 rows=9 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8, ((info <-> '^3.?[b-g]+ed[\d]+79'::text))
Buffers: shared hit=52606
-> Sort (cost=4302.66..4305.16 rows=1000 width=57) (actual time=269.212..269.212 rows=9 loops=1)
Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8, ((info <-> '^3.?[b-g]+ed[\d]+79'::text))
Sort Key: ((t_split.info <-> '^3.?[b-g]+ed[\d]+79'::text))
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=52606
-> Bitmap Heap Scan on public.t_split (cost=575.75..4281.06 rows=1000 width=57) (actual time=100.771..269.180 rows=9 loo