1. 程式人生 > 其它 >PostgreSQL模糊查詢

PostgreSQL模糊查詢

1、建立測試表、插入資料

postgres=# create table test_like(id int, name varchar);
CREATE TABLE
postgres=# insert into test_like select generate_series(1,1000000),md5(random()::varchar);
INSERT 0 1000000
postgres=# analyze test_like ;
ANALYZE

2、未建立索引的執行計劃

postgres=# explain select * from test_like where name like '111%' ;
                             QUERY PLAN
--------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Seq Scan on test_like  (cost=0.00..7950.56 rows=33 width=37)
         Filter: ((name)::text ~~ '111%'::text)
(4 rows)

3、普通btree索引不走索引

postgres=# create index idx_test_like_name on test_like (name);
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
                             QUERY PLAN
--------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Seq Scan on test_like  (cost=0.00..7950.56 rows=33 width=37)
         Filter: ((name)::text ~~ '111%'::text)
(4 rows)

4、前模糊匹配查詢

1)collate "C"

postgres=# create index idx_test_like_name_c on test_like (name collate "C");
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=2.99..108.88 rows=33 width=37)
         Filter: ((name)::text ~~ '111%'::text)
         ->  Bitmap Index Scan on idx_test_like_name_c  (cost=0.00..2.98 rows=56 width=0)
               Index Cond: (((name)::text >= '111'::text) AND ((name)::text < '112'::text))
(6 rows)

2)操作符類varchar_pattern_ops方式

postgres=# create index idx_test_like_name_ops on test_like (name varchar_pattern_ops);
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=2.97..105.20 rows=33 width=37)
         Filter: ((name)::text ~~ '111%'::text)
         ->  Bitmap Index Scan on idx_test_like_name_ops  (cost=0.00..2.96 rows=54 width=0)
               Index Cond: (((name)::text ~>=~ '111'::text) AND ((name)::text ~<~ '112'::text))
(6 rows)

但後匹配、中間匹配不支援:

postgres=# explain select * from test_like where name like '%111%1' ;
                             QUERY PLAN
--------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Seq Scan on test_like  (cost=0.00..7950.56 rows=33 width=37)
         Filter: ((name)::text ~~ '%111%1'::text)
(4 rows)

postgres=# explain select * from test_like where name like '%111' ;
                             QUERY PLAN
--------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Seq Scan on test_like  (cost=0.00..7950.56 rows=33 width=37)
         Filter: ((name)::text ~~ '%111'::text)
(4 rows)

postgres=# explain select * from test_like where name like '111%' ;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=2.98..107.04 rows=33 width=37)
         Filter: ((name)::text ~~ '111%'::text)
         ->  Bitmap Index Scan on idx_test_like_name_ops  (cost=0.00..2.97 rows=55 width=0)
               Index Cond: (((name)::text ~>=~ '111'::text) AND ((name)::text ~<~ '112'::text))
(6 rows)

postgres=# explain select * from test_like where name like '%111%' ;
                             QUERY PLAN
--------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Seq Scan on test_like  (cost=0.00..7950.56 rows=33 width=37)
         Filter: ((name)::text ~~ '%111%'::text)
(4 rows)

5、方式二:使用pg_trim外掛

The pg_trgm module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.

A trigram is a group of three consecutive characters taken from a string. We can measure the similarity of two strings by counting the number of trigrams they share. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages.

Note: pg_trgm ignores non-word characters (non-alphanumerics) when extracting trigrams from a string. Each word is considered to have two spaces prefixed and one space suffixed when determining the set of trigrams contained in the string. For example, the set of trigrams in the string "cat" is " c", " ca", "cat", and "at ". The set of trigrams in the string "foo|bar" is " f", " fo", "foo", "oo ", " b", " ba", "bar", and "ar ".

postgres=# create extension pg_trgm;
CREATE EXTENSION
postgres=# create index idx_test_like_name_trgm on test_like using gin (name gin_trgm_ops);
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=18.26..81.59 rows=33 width=37)
         Recheck Cond: ((name)::text ~~ '111%'::text)
         ->  Bitmap Index Scan on idx_test_like_name_trgm  (cost=0.00..18.25 rows=33 width=0)
               Index Cond: ((name)::text ~~ '111%'::text)
(6 rows)

postgres=# explain select * from test_like where name like '%111%' ;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=62.16..3816.37 rows=6731 width=37)
         Recheck Cond: ((name)::text ~~ '%111%'::text)
         ->  Bitmap Index Scan on idx_test_like_name_trgm  (cost=0.00..60.48 rows=6731 width=0)
               Index Cond: ((name)::text ~~ '%111%'::text)
(6 rows)

postgres=# explain select * from test_like where name like '%111' ;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=12.26..75.59 rows=33 width=37)
         Recheck Cond: ((name)::text ~~ '%111'::text)
         ->  Bitmap Index Scan on idx_test_like_name_trgm  (cost=0.00..12.25 rows=33 width=0)
               Index Cond: ((name)::text ~~ '%111'::text)
(6 rows)

postgres=# explain select * from test_like where name like '%111%1' ;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=6.26..69.59 rows=33 width=37)
         Recheck Cond: ((name)::text ~~ '%111%1'::text)
         ->  Bitmap Index Scan on idx_test_like_name_trgm  (cost=0.00..6.25 rows=33 width=0)
               Index Cond: ((name)::text ~~ '%111%1'::text)
(6 rows)

缺陷 :就像上面描述的那樣,pg_trgm把字串切成多個三元組,然後做匹配,當匹配關鍵詞少於三個的時候,只有字首匹配會走索引:

postgres=# explain select * from test_like where name like '%11%' ;
                              QUERY PLAN
-----------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Seq Scan on test_like  (cost=0.00..7950.56 rows=40384 width=37)
         Filter: ((name)::text ~~ '%11%'::text)
(4 rows)

postgres=# explain select * from test_like where name like '%111%' ;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0)
   Node/s: dn001, dn002, dn003
   ->  Bitmap Heap Scan on test_like  (cost=34.08..2908.28 rows=3365 width=37)
         Recheck Cond: ((name)::text ~~ '%111%'::text)
         ->  Bitmap Index Scan on idx_test_like_name_trgm  (cost=0.00..33.24 rows=3365 width=0)
               Index Cond: ((name)::text ~~ '%111%'::text)
(6 rows)
嚴以律己、寬以待人