Using JSON in PostgreSQL: fuzzy matching on values

阿新 • Published: 2019-01-05

Differences between json and jsonb

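PostgreSQL offers two JSON types, json and jsonb. They accept almost the same input; the main practical difference is efficiency. json stores an exact copy of the input text and parses it again on every use, so it preserves whitespace, duplicate keys, and key order. jsonb parses the input once and stores a decomposed binary form: insignificant whitespace is discarded, key order may differ from the input, and when a key is repeated only the last value is kept. (json keeps every duplicate in storage, although its processing functions treat the last value as the effective one.) Because jsonb needs no reparsing, json is faster to write and slower to query, while jsonb is slightly slower to write and faster to query, and it supports a number of additional operators.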
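
The difference in how the two types treat the input text can be seen by casting the same literal to both. This is only a small sketch; the literal is made up for illustration and does not come from the tests in this post.

```sql
-- The same text cast to json and to jsonb.
-- json returns the input verbatim; jsonb drops the extra whitespace,
-- keeps only the last value of the repeated key "a", and reorders the keys.
select '{"b": 1,   "a": 2, "a": 3}'::json  as as_json,
       '{"b": 1,   "a": 2, "a": 3}'::jsonb as as_jsonb;
--           as_json           |     as_jsonb
-- ----------------------------+------------------
--  {"b": 1,   "a": 2, "a": 3} | {"a": 3, "b": 1}
```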

The rest of this post mainly tests insert, update, delete, and query operations on jsonb.

Operators common to json and jsonb

Operator	Return type	Array [1,2,3]	Object {"a":1,"b":2,"c":3}	Nested {"a":{"b":{"c":1}},"d":[4,5,6]}
->	json or jsonb (same as the left operand)	select '[1,2,3]'::jsonb -> 2; returns 3	select '{"a":1,"b":2,"c":3}'::jsonb -> 'a'; returns 1	select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb -> 'a'; returns {"b": {"c": 1}}
->>	text	select '[1,2,3]'::jsonb ->> 2; returns 3	select '{"a":1,"b":2,"c":3}'::jsonb ->> 'a'; returns 1	select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb ->> 'a'; returns {"b": {"c": 1}}
#>	json or jsonb (same as the left operand)	--	--	select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb #> '{a,b}'; returns {"c": 1}
#>>	text	--	--	select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb #>> '{a,b}'; returns {"c": 1}
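
One way to check the return types listed in the table is pg_typeof. The sketch below reuses the nested example value; the column aliases are chosen only for illustration.

```sql
-- -> and #> keep the JSON type of the left operand; ->> and #>> return text.
select pg_typeof(v -> 'a')      as op_arrow,
       pg_typeof(v ->> 'a')     as op_arrow_text,
       pg_typeof(v #> '{a,b}')  as op_path,
       pg_typeof(v #>> '{a,b}') as op_path_text
from (select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb as v) as t;
--  op_arrow | op_arrow_text | op_path | op_path_text
-- ----------+---------------+---------+--------------
--  jsonb    | text          | jsonb   | text

-- Array elements are addressed by zero-based index, and a path may mix keys and indexes.
select v -> 'd' -> 0 as first_d,
       v #>> '{d,2}' as last_d
from (select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb as v) as t;
--  first_d | last_d
-- ---------+--------
--  4       | 6
```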

Additional jsonb operators

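Operator	Right operand type	Description	Example
@>	jsonb	Does the left JSON value contain the right JSON path/value entries at the top level?	'{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb
<@	jsonb	Are the left JSON path/value entries contained at the top level within the right JSON value?	'{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb
?	text	Does the string exist as a top-level key within the JSON value?	'{"a":1, "b":2}'::jsonb ? 'b'
?|	text[]	Do any of these array strings exist as top-level keys?	'{"a":1, "b":2, "c":3}'::jsonb ?| array['b','c']
?&	text[]	Do all of these array strings exist as top-level keys?	'["a", "b"]'::jsonb ?& array['a', 'b']
||	jsonb	Concatenate two jsonb values into a new jsonb value	'["a", "b"]'::jsonb || '["c", "d"]'::jsonb
-	text	Delete a key/value pair or string element from the left operand. Key/value pairs are matched by their key.	'{"a": "b"}'::jsonb - 'a'
-	integer	Delete the array element at the given index (negative integers count from the end). Throws an error if the top-level container is not an array.	'["a", "b"]'::jsonb - 1
#-	text[]	Delete the field or element at the given path (for JSON arrays, negative integers count from the end)	'["a", {"b":1}]'::jsonb #- '{1,b}'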
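
The operators in the table can be tried directly on literals. The following sketch uses values adapted from the table's examples; the key 'x' and the column aliases are made up for illustration.

```sql
select '{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb      as has_b_2,      -- t
       '{"a":1, "b":2, "c":3}'::jsonb ?| array['b','x'] as any_key,      -- t  ('b' exists)
       '{"a":1, "b":2, "c":3}'::jsonb ?& array['b','x'] as all_keys,     -- f  ('x' is missing)
       '["a", "b"]'::jsonb || '["c", "d"]'::jsonb       as joined,       -- ["a", "b", "c", "d"]
       '["a", "b"]'::jsonb - 1                          as drop_index_1, -- ["a"]
       '["a", {"b":1}]'::jsonb #- '{1,b}'               as drop_path;    -- ["a", {}]
```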

jsonb: insert, update, delete

--1.1 Create the table
abase=> create table test_jsonb(c_bh char(32),j_jsonb jsonb);
CREATE TABLE

-- Insert data (uuid_generate_v4() comes from the uuid-ossp extension)
abase=> insert into test_jsonb(c_bh,j_jsonb) values(replace(uuid_generate_v4()::text,'-',''),'{"c_xm":"張三","c_mx":{"c_ssdw":"一大隊","c_dwbm":"11"}}');
INSERT 0 1
-- View the data
abase=# select * from test_jsonb where j_jsonb @> '{"c_xm":"張三","c_mx":{"c_ssdw":"一大隊","c_dwbm":"11"}}';              
               c_bh               |                            j_jsonb                             
----------------------------------+--------------------------------------------
 c217c624152943ab93f502117514f432 | {"c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)
--1.2 The || operator can add an element; here we add '{"c_id":"111"}'
abase=# update test_jsonb set j_jsonb = j_jsonb ||'{"c_id":"111"}'::jsonb  where c_bh = 'c217c624152943ab93f502117514f432'; 
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432'; 
                                    j_jsonb                                    
-------------------------------------------------------------------------------
 {"c_id": "111", "c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)


--1.3 Update an element (method 1): if the jsonb value already has the same key, || overwrites it; here '{"c_id":"111"}' is updated to 112
abase=# update test_jsonb set j_jsonb = j_jsonb ||'{"c_id":"112"}'::jsonb  where c_bh = 'c217c624152943ab93f502117514f432'; 
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432'; 
                                    j_jsonb                                    
-------------------------------------------------------------------------------
 {"c_id": "112", "c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)
-- Update an element (method 2): use jsonb_set to change "c_id": "112" to 123
abase=# update test_jsonb set j_jsonb=  jsonb_set(j_jsonb,'{c_id}','"123"'::jsonb,false)  where c_bh = 'c217c624152943ab93f502117514f432';
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432'; 
                                    j_jsonb                                    
-------------------------------------------------------------------------------
 {"c_id": "123", "c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)


--1.4 Update a nested element with jsonb_set (available from PostgreSQL 9.5), changing c_ssdw to 二大隊
abase=# update test_jsonb set j_jsonb=  jsonb_set(j_jsonb,'{c_mx,c_ssdw}','"二大隊"'::jsonb,false)  where c_bh = 'c217c624152943ab93f502117514f432';
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432'; 
                                    j_jsonb                                    
-------------------------------------------------------------------------------
 {"c_id": "123", "c_mx": {"c_dwbm": "11", "c_ssdw": "二大隊"}, "c_xm": "張三"}
(1 row)


--1.5 Delete an element: remove the c_id key
abase=# update test_jsonb set  j_jsonb = j_jsonb-'c_id' where c_bh = 'c217c624152943ab93f502117514f432' ;
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432'; 
                            j_jsonb                             
----------------------------------------------------------------
 {"c_mx": {"c_dwbm": "11", "c_ssdw": "二大隊"}, "c_xm": "張三"}
(1 row)
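
Two details of the update operators are easy to miss: || merges only at the top level, and the fourth argument of jsonb_set (create_missing) controls whether an absent key is added. The sketch below shows both on literals modeled on the test row; it is an illustration, not part of the original test.

```sql
-- || replaces top-level keys wholesale; nested objects are not merged,
-- so the c_dwbm key inside c_mx is lost here.
select '{"c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}}'::jsonb
       || '{"c_mx": {"c_ssdw": "二大隊"}}'::jsonb as merged;
-- {"c_mx": {"c_ssdw": "二大隊"}}

-- jsonb_set changes only the addressed path. With create_missing = true a
-- missing last path step is added; with false the value is returned unchanged.
select jsonb_set('{"c_mx": {"c_dwbm": "11"}}'::jsonb, '{c_mx,c_ssdw}', '"二大隊"', true)  as created,
       jsonb_set('{"c_mx": {"c_dwbm": "11"}}'::jsonb, '{c_mx,c_ssdw}', '"二大隊"', false) as unchanged;
-- created:   {"c_mx": {"c_dwbm": "11", "c_ssdw": "二大隊"}}
-- unchanged: {"c_mx": {"c_dwbm": "11"}}

-- #- removes a nested key by path.
select '{"c_mx": {"c_dwbm": "11", "c_ssdw": "二大隊"}}'::jsonb #- '{c_mx,c_dwbm}' as removed;
-- {"c_mx": {"c_ssdw": "二大隊"}}
```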

Querying jsonb

--1. Function that generates random text
abase=> create or replace function random_string(INTEGER)  
abase-> RETURNS TEXT AS  
abase-> $BODY$  
abase$> select array_to_string(  
abase$>     array(  
abase$>         select substring(  
abase$>             'pg社群的作風非常嚴謹，一個補丁可能在郵件組中討論幾個月甚至幾年，根據大家的意見反覆的修正，補丁合併到主幹已經非常成熟，所以pg的穩定性也是遠近聞名的'   
abase$>         from (ceil(random()*73))::int FOR 2  
abase$>         )  
abase$>         from generate_series(1,$1)  
abase$>     ),''  
abase$> )  $BODY$  
abase-> LANGUAGE sql VOLATILE; 
CREATE FUNCTION

--2. Load the test data:
abase=> insert into test_jsonb select replace(uuid_generate_v4()::text,'-',''),('{"a":'||random()*100||', "kxhbsl":"'|| random_string(10) ||'"}')::jsonb    from generate_series(1,2000000); 
INSERT 0 2000000
abase=> insert into test_jsonb select replace(uuid_generate_v4()::text,'-',''),('{"a":'||random()*100||', "kxhbsl":"索尼是大法官"}')::jsonb    from generate_series(1,10000); 
INSERT 0 10000

--3. First query form: rows that contain '{"kxhbsl": "索尼是大法官"}'; without an index this is a full table scan
abase=# explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb @> '{"kxhbsl": "索尼是大法官"}';
                                                            QUERY PLAN                                                       
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..53379.78 rows=2010 width=134) (actual time=470.729..490.979 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..52175.85 rows=838 width=134) (actual time=465.234..480.573 rows=3333 loops=3)
         Filter: (j_jsonb @> '{"kxhbsl": "索尼是大法官"}'::jsonb)
         Rows Removed by Filter: 666667
 Planning time: 0.318 ms
 Execution time: 506.204 ms
(8 rows)

-- After a GIN index is created on the j_jsonb column, the query can use the index
abase=# create index i_t_test_jsonb_j_jsonb on test_jsonb using gin(j_jsonb);
CREATE INDEX
abase=#  explain analyze select j_jsonb->>'kxhbsl',* from test_jsonb where j_jsonb @> '{"kxhbsl": "索尼是大法官"}';
                                                              QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=59.58..6664.09 rows=2010 width=167) (actual time=3.579..17.065 rows=10000 loops=1)
   Recheck Cond: (j_jsonb @> '{"kxhbsl": "索尼是大法官"}'::jsonb)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_t_test_jsonb_j_jsonb  (cost=0.00..59.08 rows=2010 width=0) (actual time=3.480..3.480 rows=10000 loops=1)
         Index Cond: (j_jsonb @> '{"kxhbsl": "索尼是大法官"}'::jsonb)
 Planning time: 0.429 ms
 Execution time: 17.964 ms
(7 rows)
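
If @> is the only operator needed on the whole column, PostgreSQL also ships the jsonb_path_ops operator class for GIN, which is usually smaller than the default jsonb_ops class but does not support the key-existence operators. The sketch below is an alternative that was not part of the original test; the index name is hypothetical and no timings were measured for it.

```sql
-- Hypothetical alternative to i_t_test_jsonb_j_jsonb (not part of the original test).
-- jsonb_path_ops supports @> but not ?, ?| or ?&.
create index i_t_test_jsonb_j_jsonb_path on test_jsonb using gin (j_jsonb jsonb_path_ops);

-- The containment query from step 3 is written exactly as before.
explain analyze
select j_jsonb->>'kxhbsl', j_jsonb
from test_jsonb
where j_jsonb @> '{"kxhbsl": "索尼是大法官"}';
```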


--4. Second query form: rows whose kxhbsl element is '索尼是大法官'; full table scan
abase=#  explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb -> 'kxhbsl' ? '索尼是大法官';
                                                             QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..55473.53 rows=2010 width=134) (actual time=1724.170..1769.543 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 0
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..54269.60 rows=838 width=134) (actual time=1723.752..1767.187 rows=10000 loops=1)
         Filter: ((j_jsonb -> 'kxhbsl'::text) ? '索尼是大法官'::text)
         Rows Removed by Filter: 2000000
 Planning time: 0.267 ms
 Execution time: 1770.422 ms
(8 rows)

-- Create a GIN index on the kxhbsl element of the jsonb column; the query can then use the index.
abase=# create index i_t_test_jsonb_j_jsonb_kxhbsl on test_jsonb using gin((j_jsonb->'kxhbsl'));
CREATE INDEX
abase=#  explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb -> 'kxhbsl' ? '索尼是大法官';
                                                                  QUERY PLAN                                                                
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=39.58..6649.12 rows=2010 width=134) (actual time=2.166..13.999 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb -> 'kxhbsl'::text) ? '索尼是大法官'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_t_test_jsonb_j_jsonb_kxhbsl  (cost=0.00..39.08 rows=2010 width=0) (actual time=2.045..2.045 rows=10000 loops=1)
         Index Cond: ((j_jsonb -> 'kxhbsl'::text) ? '索尼是大法官'::text)
 Planning time: 0.221 ms
 Execution time: 14.715 ms
(7 rows)
-- Or the equivalent form:
abase=# explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb -> 'kxhbsl' @>'"索尼是大法官"';
                                                                  QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=39.58..6649.12 rows=2010 width=134) (actual time=2.080..14.959 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb -> 'kxhbsl'::text) @> '"索尼是大法官"'::jsonb)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_t_test_jsonb_j_jsonb_kxhbsl  (cost=0.00..39.08 rows=2010 width=0) (actual time=1.980..1.980 rows=10000 loops=1)
         Index Cond: ((j_jsonb -> 'kxhbsl'::text) @> '"索尼是大法官"'::jsonb)
 Planning time: 0.199 ms
 Execution time: 15.635 ms
(7 rows)

--5. Third query form: rows where kxhbsl equals '索尼是大法官'; full table scan
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' = '索尼是大法官';
                                                            QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..56272.50 rows=10050 width=135) (actual time=458.676..476.454 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..54267.50 rows=4188 width=135) (actual time=453.472..466.544 rows=3333 loops=3)
         Filter: ((j_jsonb ->> 'kxhbsl'::text) = '索尼是大法官'::text)
         Rows Removed by Filter: 666667
 Planning time: 0.821 ms
 Execution time: 492.763 ms
(8 rows)
-- For this kind of query, j_jsonb->>'kxhbsl' returns text, so a btree expression index can be created and the query can use it
abase=# create index i_test_jsonb_j_jsonb_btree on test_jsonb using btree((j_jsonb ->> 'kxhbsl') );
CREATE INDEX
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' = '索尼是大法官';
                                                                 QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=498.44..24049.15 rows=10050 width=135) (actual time=4.150..8.168 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb ->> 'kxhbsl'::text) = '索尼是大法官'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_test_jsonb_j_jsonb_btree  (cost=0.00..495.93 rows=10050 width=0) (actual time=4.042..4.042 rows=10000 loops=1)
         Index Cond: ((j_jsonb ->> 'kxhbsl'::text) = '索尼是大法官'::text)
 Planning time: 0.684 ms
 Execution time: 8.991 ms
(7 rows)


--6. Because j_jsonb->>'kxhbsl' returns text, many operations can be applied to it, such as in, exists, and so on
-- Execution plan for an in query:
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' in ('索尼是大法官','3');
                                                                 QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=992.88..35800.76 rows=20100 width=135) (actual time=2.666..5.992 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb ->> 'kxhbsl'::text) = ANY ('{索尼是大法官,3}'::text[]))
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_test_jsonb_j_jsonb_btree  (cost=0.00..987.86 rows=20100 width=0) (actual time=2.576..2.576 rows=10000 loops=1)
         Index Cond: ((j_jsonb ->> 'kxhbsl'::text) = ANY ('{索尼是大法官,3}'::text[]))
 Planning time: 0.360 ms
 Execution time: 6.856 ms
(7 rows)

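All three query forms return the same rows. The first relies on a GIN index over the whole jsonb column, which covers every element in that column; the second and third rely on a GIN index and a btree index, respectively, built on a single element.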

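For equality lookups, a btree index on a single element will likely take less space and run faster (8.991 ms above, versus 14.715 ms and 17.964 ms for the two GIN indexes), so it is a reasonable choice when one particular element is queried frequently. A GIN index on the whole jsonb column, by contrast, serves every element.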
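
The space claim can be checked on your own data rather than assumed. A minimal sketch, using the standard pg_stat_user_indexes view to list the size of every index on the test table (no sizes were reported in the original test, so none are shown here):

```sql
-- List every index on test_jsonb with its on-disk size, largest first.
select indexrelname as index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_stat_user_indexes
where relname = 'test_jsonb'
order by pg_relation_size(indexrelid) desc;
```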

The first form takes a JSON document as its argument, while the second and third take a string.

Fuzzy matching on jsonb element values

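--1. Sometimes the value of a jsonb element needs to be matched fuzzily
-- With only the earlier indexes in place (such as the GIN index on j_jsonb), a like pattern with wildcards on both sides cannot use an index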
abase=#  explain  analyze select * from test_jsonb where j_jsonb->>'kxhbsl' like '%大法官%';
                                                           QUERY PLAN                                                      
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..55287.60 rows=201 width=135) (actual time=832.031..857.306 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..54267.50 rows=84 width=135) (actual time=826.065..844.494 rows=3333 loops=3)
         Filter: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
         Rows Removed by Filter: 666667
 Planning time: 0.314 ms
 Execution time: 873.938 ms
(8 rows)

-- Since (j_jsonb ->>'kxhbsl') returns text, consider using pg_trgm on it and creating a GIN index.
abase=# create index i_test_jsonb_j_jsonb_gin on test_jsonb using gin((j_jsonb ->>'kxhbsl') gin_trgm_ops);
CREATE INDEX
-- Check the execution plan: the fuzzy match can now use the index.
abase=#  explain  analyze select * from test_jsonb where j_jsonb->>'kxhbsl' like '%大法官%';
                                                               QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=17.56..782.71 rows=201 width=135) (actual time=3.781..16.256 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_test_jsonb_j_jsonb_gin  (cost=0.00..17.51 rows=201 width=0) (actual time=3.649..3.649 rows=10000 loops=1)
         Index Cond: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
 Planning time: 0.575 ms
 Execution time: 17.514 ms
(7 rows)
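
The gin_trgm_ops operator class used above comes from the pg_trgm extension, which has to be installed in the database before the index can be created. A minimal sketch of that prerequisite, plus two optional checks that were not part of the original test:

```sql
-- gin_trgm_ops is provided by the pg_trgm extension (creating it needs sufficient privileges).
create extension if not exists pg_trgm;

-- show_trgm lists the trigrams pg_trgm extracts from a string. The result depends
-- on the database encoding and locale, so no output is shown here.
select show_trgm('索尼是大法官');

-- pg_trgm GIN indexes also support ILIKE, so the case-insensitive form of the
-- query should be able to use the same index.
explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' ilike '%大法官%';
```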


--2. Another option is to cast the whole jsonb column to text and build a trigram GIN index on that
-- Create the GIN index
abase=# create index i_jsonb_ops on test_jsonb using gin ((j_jsonb::text) gin_trgm_ops);
CREATE INDEX
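-- A fuzzy match on the whole text may also hit the same value inside other elements, so add an auxiliary condition, j_jsonb->>'kxhbsl' like '%大法官%', to make sure the match is in that element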
abase=#  explain  analyze select * from test_jsonb where j_jsonb->>'kxhbsl' like '%大法官%' and j_jsonb ::text like '%大法官%';
                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=1297.51..2064.17 rows=1 width=135) (actual time=5.318..28.149 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb)::text ~~ '%大法官%'::text)
   Filter: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_jsonb_ops  (cost=0.00..1297.51 rows=201 width=0) (actual time=5.198..5.198 rows=10000 loops=1)
         Index Cond: ((j_jsonb)::text ~~ '%大法官%'::text)
 Planning time: 0.479 ms
 Execution time: 29.147 ms
(8 rows)

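The second method is somewhat slower than the first (29.147 ms versus 17.514 ms in these runs), but its single index can be used for every element.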

Conclusion

1. When choosing between json and jsonb: json is better suited to storage, jsonb to retrieval.

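2. A GIN index can be created on the whole jsonb column or on a single element within it, and a btree index can be created on a single element. In the equality tests above, the btree index was the fastest.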

3. (j_jsonb ->> 'kxhbsl') returns text, so any index type that suits text, such as btree or GIN, can be created on that expression.

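4. For fuzzy matching on element values, a trigram GIN index can be created either on a single element or on the whole jsonb column cast to text; the former is faster, the latter covers every element.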
