PostgreSQL JSON usage: fuzzy matching on values
阿新 • Published: 2019-01-05
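Differences between json and jsonb
PostgreSQL offers two JSON types, json and jsonb. They accept almost the same input; the main practical difference is efficiency. json stores an exact copy of the input text and parses it again each time it is used, so it preserves whitespace, duplicate keys, and key order. jsonb parses the input and stores it in a binary format: insignificant whitespace is dropped, key order may differ from the input, and no re-parsing is needed when the value is used. For duplicate keys, jsonb keeps only the last key/value pair, whereas json keeps all of them in the stored text, although its processing functions treat the last value as the operative one. In terms of efficiency, json is fast to store and slow to query; jsonb is slightly slower to store and faster to query, and it supports many additional operators.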
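A quick way to see the difference is to cast the same literal to both types. This is a minimal sketch; the literal is illustrative only and is not part of the test data used later in this post.

```sql
-- json keeps the input text verbatim: whitespace, key order, and duplicate keys.
select '{"b":1,   "a":2, "a":3}'::json;
-- {"b":1,   "a":2, "a":3}

-- jsonb normalizes on input: whitespace is dropped, keys are reordered,
-- and only the last value of a duplicated key is kept.
select '{"b":1,   "a":2, "a":3}'::jsonb;
-- {"a": 3, "b": 1}
```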
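The storage and parsing efficiency of json and jsonb is covered in a separate linked article.
This post mainly tests insert, update, delete, and query operations on jsonb.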
Operators common to json and jsonb
Operator | Return type | Array [1,2,3] | Object {"a":1,"b":2,"c":3} | Nested {"a":{"b":{"c":1}},"d":[4,5,6]} |
---|---|---|---|---|
-> | json or jsonb (same as the input) | select '[1,2,3]'::jsonb -> 2 returns 3 | select '{"a":1,"b":2,"c":3}'::jsonb -> 'a' returns 1 | select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb -> 'a' returns {"b": {"c": 1}} |
->> | text | select '[1,2,3]'::jsonb ->> 2 returns 3 | select '{"a":1,"b":2,"c":3}'::jsonb ->> 'a' returns 1 | select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb ->> 'a' returns {"b": {"c": 1}} |
#> | json or jsonb (same as the input) | -- | -- | select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb #> '{a,b}' returns {"c": 1} |
#>> | text | -- | -- | select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb #>> '{a,b}' returns {"c": 1} |
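The operators in the table can be tried directly in psql. The sketch below uses the same literals as the table; the comparisons at the end are an added illustration of how to test a result, since a jsonb value cannot be compared with a bare integer.

```sql
-- -> returns json/jsonb, ->> returns text (array indexes are zero-based).
select '[1,2,3]'::jsonb -> 2;                          -- 3 (jsonb)
select '[1,2,3]'::jsonb ->> 2;                         -- 3 (text)
select pg_typeof('[1,2,3]'::jsonb -> 2),               -- jsonb
       pg_typeof('[1,2,3]'::jsonb ->> 2);              -- text

-- #> and #>> follow a path given as a text array.
select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb #> '{a,b}';     -- {"c": 1}
select '{"a":{"b":{"c":1}},"d":[4,5,6]}'::jsonb #>> '{a,b,c}';  -- 1 (text)

-- To compare a result, cast the right-hand side to jsonb or compare as text.
select '{"a":1,"b":2,"c":3}'::jsonb -> 'a' = '1'::jsonb;   -- t
select '{"a":1,"b":2,"c":3}'::jsonb ->> 'a' = '1';         -- t
```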
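Additional jsonb operators
Operator | Right operand type | Description | Example |
---|---|---|---|
@> | jsonb | Does the left JSON value contain the right JSON path/value entries at the top level? | '{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb |
<@ | jsonb | Are the left JSON path/value entries contained at the top level within the right JSON value? | '{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb |
? | text | Does the string exist as a top-level key within the JSON value? | '{"a":1, "b":2}'::jsonb ? 'b' |
?| | text[] | Does any of these array strings exist as a top-level key? | '{"a":1, "b":2, "c":3}'::jsonb ?| array['b','c'] |
?& | text[] | Do all of these array strings exist as top-level keys? | '["a", "b"]'::jsonb ?& array['a', 'b'] |
|| | jsonb | Concatenate two jsonb values into a new jsonb value | '["a", "b"]'::jsonb || '["c", "d"]'::jsonb |
- | text | Delete a key/value pair or string element from the left operand. Key/value pairs are matched by their key. | '{"a": "b"}'::jsonb - 'a' |
- | integer | Delete the array element at the specified index (negative integers count from the end). Raises an error if the top-level container is not an array. | '["a", "b"]'::jsonb - 1 |
#- | text[] | Delete the field or element at the specified path (for JSON arrays, negative integers count from the end) | '["a", {"b":1}]'::jsonb #- '{1,b}' |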
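For reference, here is a short sketch of what these operators return. Most literals come from the table above; the 'x' key and the three-element array are added only for illustration.

```sql
select '{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb;          -- t
select '{"a":1, "b":2}'::jsonb ? 'b';                        -- t
select '{"a":1, "b":2, "c":3}'::jsonb ?| array['b','x'];     -- t (one match is enough)
select '{"a":1, "b":2, "c":3}'::jsonb ?& array['b','x'];     -- f (every key must exist)
select '["a", "b"]'::jsonb || '["c", "d"]'::jsonb;           -- ["a", "b", "c", "d"]
select '{"a": "b", "c": "d"}'::jsonb - 'a';                  -- {"c": "d"}
select '["a", "b", "c"]'::jsonb - (-1);                      -- ["a", "b"]
select '["a", {"b":1}]'::jsonb #- '{1,b}';                   -- ["a", {}]
```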
jsonb insert, update, and delete
--1.1 Create the table
abase=> create table test_jsonb(c_bh char(32),j_jsonb jsonb);
CREATE TABLE

--Insert data (uuid_generate_v4() comes from the uuid-ossp extension)
insert into test_jsonb(c_bh,j_jsonb) values(replace(uuid_generate_v4()::text,'-',''),'{"c_xm":"張三","c_mx":{"c_ssdw":"一大隊","c_dwbm":"11"}}');
INSERT 0 1

--View the data
abase=# select * from test_jsonb where j_jsonb @> '{"c_xm":"張三","c_mx":{"c_ssdw":"一大隊","c_dwbm":"11"}}';
               c_bh               |                            j_jsonb
----------------------------------+----------------------------------------------------------------
 c217c624152943ab93f502117514f432 | {"c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)

--1.2 The || operator can add an element; add the element '{"c_id":"111"}'
abase=# update test_jsonb set j_jsonb = j_jsonb ||'{"c_id":"111"}'::jsonb where c_bh = 'c217c624152943ab93f502117514f432';
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432';
                                    j_jsonb
-------------------------------------------------------------------------------
 {"c_id": "111", "c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)

--1.3 Update an element (method 1): || overwrites a key that already exists in the jsonb; change '{"c_id":"111"}' to 112
abase=# update test_jsonb set j_jsonb = j_jsonb ||'{"c_id":"112"}'::jsonb where c_bh = 'c217c624152943ab93f502117514f432';
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432';
                                    j_jsonb
-------------------------------------------------------------------------------
 {"c_id": "112", "c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)

--Update an element (method 2): use jsonb_set to change "c_id": "112" to 123
abase=# update test_jsonb set j_jsonb= jsonb_set(j_jsonb,'{c_id}','"123"'::jsonb,false) where c_bh = 'c217c624152943ab93f502117514f432';
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432';
                                    j_jsonb
-------------------------------------------------------------------------------
 {"c_id": "123", "c_mx": {"c_dwbm": "11", "c_ssdw": "一大隊"}, "c_xm": "張三"}
(1 row)

--1.4 Update a nested element with jsonb_set (requires PostgreSQL 9.5 or later); change c_ssdw to 二大隊
abase=# update test_jsonb set j_jsonb= jsonb_set(j_jsonb,'{c_mx,c_ssdw}','"二大隊"'::jsonb,false) where c_bh = 'c217c624152943ab93f502117514f432';
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432';
                                    j_jsonb
-------------------------------------------------------------------------------
 {"c_id": "123", "c_mx": {"c_dwbm": "11", "c_ssdw": "二大隊"}, "c_xm": "張三"}
(1 row)

--1.5 Delete an element: remove c_id
abase=# update test_jsonb set j_jsonb = j_jsonb-'c_id' where c_bh = 'c217c624152943ab93f502117514f432' ;
UPDATE 1
abase=# select j_jsonb from test_jsonb where c_bh = 'c217c624152943ab93f502117514f432';
                            j_jsonb
----------------------------------------------------------------
 {"c_mx": {"c_dwbm": "11", "c_ssdw": "二大隊"}, "c_xm": "張三"}
(1 row)
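One reason to prefer jsonb_set for nested elements is that || only merges at the top level. The sketch below illustrates this with standalone literals modeled on the test row (no table is touched), and also shows the effect of the fourth jsonb_set argument, which the statements above always pass as false.

```sql
-- || replaces the whole value of a top-level key; nested objects are not merged,
-- so c_dwbm is lost here.
select '{"c_mx":{"c_ssdw":"一大隊","c_dwbm":"11"}}'::jsonb
       || '{"c_mx":{"c_ssdw":"二大隊"}}'::jsonb;
-- {"c_mx": {"c_ssdw": "二大隊"}}

-- jsonb_set changes only the element at the given path.
select jsonb_set('{"c_mx":{"c_ssdw":"一大隊","c_dwbm":"11"}}'::jsonb,
                 '{c_mx,c_ssdw}', '"二大隊"'::jsonb, false);
-- {"c_mx": {"c_dwbm": "11", "c_ssdw": "二大隊"}}

-- The fourth argument controls whether a missing key is created.
select jsonb_set('{"a":1}'::jsonb, '{b}', '2'::jsonb, false);   -- {"a": 1}
select jsonb_set('{"a":1}'::jsonb, '{b}', '2'::jsonb, true);    -- {"a": 1, "b": 2}
```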
jsonb queries
--1. Helper function that generates random text
abase=> create or replace function random_string(INTEGER)
abase-> RETURNS TEXT AS
abase-> $BODY$
abase$> select array_to_string(
abase$>   array(
abase$>     select substring(
abase$>       'pg社群的作風非常嚴謹,一個補丁可能在郵件組中討論幾個月甚至幾年,根據大家的意見反覆的修正,補丁合併到主幹已經非常成熟,所以pg的穩定性也是遠近聞名的'
abase$>       from (ceil(random()*73))::int FOR 2
abase$>     )
abase$>     from generate_series(1,$1)
abase$>   ),''
abase$> ) $BODY$
abase-> LANGUAGE sql VOLATILE;
CREATE FUNCTION

--2. Load the test data:
abase=> insert into test_jsonb select replace(uuid_generate_v4()::text,'-',''),('{"a":'||random()*100||', "kxhbsl":"'|| random_string(10) ||'"}')::jsonb from generate_series(1,2000000);
INSERT 0 2000000
abase=> insert into test_jsonb select replace(uuid_generate_v4()::text,'-',''),('{"a":'||random()*100||', "kxhbsl":"索尼是大法官"}')::jsonb from generate_series(1,10000);
INSERT 0 10000

--3. First query: rows containing '{"kxhbsl": "索尼是大法官"}'; without an index this is a full table scan
abase=# explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb @> '{"kxhbsl": "索尼是大法官"}';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..53379.78 rows=2010 width=134) (actual time=470.729..490.979 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..52175.85 rows=838 width=134) (actual time=465.234..480.573 rows=3333 loops=3)
         Filter: (j_jsonb @> '{"kxhbsl": "索尼是大法官"}'::jsonb)
         Rows Removed by Filter: 666667
 Planning time: 0.318 ms
 Execution time: 506.204 ms
(8 rows)

--After creating a GIN index on the j_jsonb column, the query can use the index
abase=# create index i_t_test_jsonb_j_jsonb on test_jsonb using gin(j_jsonb);
CREATE INDEX
abase=# explain analyze select j_jsonb->>'kxhbsl',* from test_jsonb where j_jsonb @> '{"kxhbsl": "索尼是大法官"}';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=59.58..6664.09 rows=2010 width=167) (actual time=3.579..17.065 rows=10000 loops=1)
   Recheck Cond: (j_jsonb @> '{"kxhbsl": "索尼是大法官"}'::jsonb)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_t_test_jsonb_j_jsonb  (cost=0.00..59.08 rows=2010 width=0) (actual time=3.480..3.480 rows=10000 loops=1)
         Index Cond: (j_jsonb @> '{"kxhbsl": "索尼是大法官"}'::jsonb)
 Planning time: 0.429 ms
 Execution time: 17.964 ms
(7 rows)

--4. Second query: rows whose kxhbsl element contains '索尼是大法官'; the whole-column GIN index does not apply to this expression, so it is a full table scan
abase=# explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb -> 'kxhbsl' ? '索尼是大法官';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..55473.53 rows=2010 width=134) (actual time=1724.170..1769.543 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 0
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..54269.60 rows=838 width=134) (actual time=1723.752..1767.187 rows=10000 loops=1)
         Filter: ((j_jsonb -> 'kxhbsl'::text) ? '索尼是大法官'::text)
         Rows Removed by Filter: 2000000
 Planning time: 0.267 ms
 Execution time: 1770.422 ms
(8 rows)

--Create a GIN index on the kxhbsl element of the jsonb column; the query can then use the index
abase=# create index i_t_test_jsonb_j_jsonb_kxhbsl on test_jsonb using gin((j_jsonb->'kxhbsl'));
CREATE INDEX
abase=# explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb -> 'kxhbsl' ? '索尼是大法官';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=39.58..6649.12 rows=2010 width=134) (actual time=2.166..13.999 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb -> 'kxhbsl'::text) ? '索尼是大法官'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_t_test_jsonb_j_jsonb_kxhbsl  (cost=0.00..39.08 rows=2010 width=0) (actual time=2.045..2.045 rows=10000 loops=1)
         Index Cond: ((j_jsonb -> 'kxhbsl'::text) ? '索尼是大法官'::text)
 Planning time: 0.221 ms
 Execution time: 14.715 ms
(7 rows)

--Or the equivalent form:
abase=# explain analyze select j_jsonb->>'kxhbsl',j_jsonb from test_jsonb where j_jsonb -> 'kxhbsl' @>'"索尼是大法官"';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=39.58..6649.12 rows=2010 width=134) (actual time=2.080..14.959 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb -> 'kxhbsl'::text) @> '"索尼是大法官"'::jsonb)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_t_test_jsonb_j_jsonb_kxhbsl  (cost=0.00..39.08 rows=2010 width=0) (actual time=1.980..1.980 rows=10000 loops=1)
         Index Cond: ((j_jsonb -> 'kxhbsl'::text) @> '"索尼是大法官"'::jsonb)
 Planning time: 0.199 ms
 Execution time: 15.635 ms
(7 rows)

--5. Third query: rows where kxhbsl equals '索尼是大法官'; without a matching index this is a full table scan
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' = '索尼是大法官';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..56272.50 rows=10050 width=135) (actual time=458.676..476.454 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..54267.50 rows=4188 width=135) (actual time=453.472..466.544 rows=3333 loops=3)
         Filter: ((j_jsonb ->> 'kxhbsl'::text) = '索尼是大法官'::text)
         Rows Removed by Filter: 666667
 Planning time: 0.821 ms
 Execution time: 492.763 ms
(8 rows)

--For this kind of query, j_jsonb->>'kxhbsl' returns text, so a btree index on the expression can be used
abase=# create index i_test_jsonb_j_jsonb_btree on test_jsonb using btree((j_jsonb ->> 'kxhbsl') );
CREATE INDEX
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' = '索尼是大法官';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=498.44..24049.15 rows=10050 width=135) (actual time=4.150..8.168 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb ->> 'kxhbsl'::text) = '索尼是大法官'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_test_jsonb_j_jsonb_btree  (cost=0.00..495.93 rows=10050 width=0) (actual time=4.042..4.042 rows=10000 loops=1)
         Index Cond: ((j_jsonb ->> 'kxhbsl'::text) = '索尼是大法官'::text)
 Planning time: 0.684 ms
 Execution time: 8.991 ms
(7 rows)

--6. Because j_jsonb->>'kxhbsl' returns text, many other operations work on it, such as in and exists
--Execution plan for an in query:
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' in ('索尼是大法官','3');
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=992.88..35800.76 rows=20100 width=135) (actual time=2.666..5.992 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb ->> 'kxhbsl'::text) = ANY ('{索尼是大法官,3}'::text[]))
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_test_jsonb_j_jsonb_btree  (cost=0.00..987.86 rows=20100 width=0) (actual time=2.576..2.576 rows=10000 loops=1)
         Index Cond: ((j_jsonb ->> 'kxhbsl'::text) = ANY ('{索尼是大法官,3}'::text[]))
 Planning time: 0.360 ms
 Execution time: 6.856 ms
(7 rows)
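All three queries return the same rows. The first relies on a GIN index over the whole jsonb column, which covers every element in it; the second and third rely on a GIN index and a btree index, respectively, built on a single element.
For equality lookups, a btree index on a single element is likely to take less space and run faster. If one particular element is queried frequently, a btree index on it is a good choice, while a GIN index on the whole jsonb column works for all elements.
The first query takes a JSON value as its argument, whereas the second and third take a string.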
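The space claim can be checked on your own data by comparing index sizes. This is a sketch that assumes the three indexes were created with the names used above; the sizes depend on the data, so no output is shown.

```sql
-- Compare the on-disk size of the whole-column GIN index,
-- the single-element GIN index, and the single-element btree index.
select c.relname as index_name,
       pg_size_pretty(pg_relation_size(c.oid)) as index_size
from pg_class c
where c.relname in ('i_t_test_jsonb_j_jsonb',
                    'i_t_test_jsonb_j_jsonb_kxhbsl',
                    'i_test_jsonb_j_jsonb_btree')
order by pg_relation_size(c.oid) desc;
```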
Fuzzy matching on jsonb element values
--1. Sometimes an element value in a jsonb needs to be matched with a pattern
--With only the indexes created so far, a LIKE pattern with a wildcard on both sides cannot use an index
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' like '%大法官%';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..55287.60 rows=201 width=135) (actual time=832.031..857.306 rows=10000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_jsonb  (cost=0.00..54267.50 rows=84 width=135) (actual time=826.065..844.494 rows=3333 loops=3)
         Filter: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
         Rows Removed by Filter: 666667
 Planning time: 0.314 ms
 Execution time: 873.938 ms
(8 rows)

--Because (j_jsonb ->>'kxhbsl') returns text, consider a pg_trgm GIN index on it (gin_trgm_ops requires the pg_trgm extension)
abase=# create index i_test_jsonb_j_jsonb_gin on test_jsonb using gin((j_jsonb ->>'kxhbsl') gin_trgm_ops);
CREATE INDEX

--Check the execution plan: the fuzzy match can now use the index
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' like '%大法官%';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=17.56..782.71 rows=201 width=135) (actual time=3.781..16.256 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_test_jsonb_j_jsonb_gin  (cost=0.00..17.51 rows=201 width=0) (actual time=3.649..3.649 rows=10000 loops=1)
         Index Cond: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
 Planning time: 0.575 ms
 Execution time: 17.514 ms
(7 rows)

--2. Another option is to cast the whole jsonb column to text and build the GIN index on that
--Create the GIN index
abase=# create index i_jsonb_ops on test_jsonb using gin ((j_jsonb::text) gin_trgm_ops);
CREATE INDEX

--This pattern can also match the same text inside other elements, so add the extra condition j_jsonb->>'kxhbsl' like '%大法官%' to make sure the match is in that element
abase=# explain analyze select * from test_jsonb where j_jsonb->>'kxhbsl' like '%大法官%' and j_jsonb ::text like '%大法官%';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=1297.51..2064.17 rows=1 width=135) (actual time=5.318..28.149 rows=10000 loops=1)
   Recheck Cond: ((j_jsonb)::text ~~ '%大法官%'::text)
   Filter: ((j_jsonb ->> 'kxhbsl'::text) ~~ '%大法官%'::text)
   Heap Blocks: exact=481
   ->  Bitmap Index Scan on i_jsonb_ops  (cost=0.00..1297.51 rows=201 width=0) (actual time=5.198..5.198 rows=10000 loops=1)
         Index Cond: ((j_jsonb)::text ~~ '%大法官%'::text)
 Planning time: 0.479 ms
 Execution time: 29.147 ms
(8 rows)
The second approach is somewhat slower than the first (29.147 ms versus 17.514 ms here), but it works for every element.
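Both approaches depend on the pg_trgm extension, which the session above does not show being installed. The following is a minimal sketch of the setup and of a quick sanity check, assuming you have the privilege to create extensions; how non-ASCII text is split into trigrams may depend on the database locale, so no output is shown for show_trgm.

```sql
-- Install the extension that provides gin_trgm_ops (no-op if already installed).
create extension if not exists pg_trgm;

-- Inspect the trigrams extracted from a sample value.
select show_trgm('索尼是大法官');

-- A gin_trgm_ops index also supports ILIKE; check the plan on your own data.
explain select * from test_jsonb where j_jsonb->>'kxhbsl' ilike '%大法官%';
```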
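Conclusion
1. When choosing between json and jsonb: json is better suited to storing documents as-is, and jsonb is better suited to querying them.
2. A GIN index can be built on the whole jsonb column, and a GIN or btree index can be built on a single element inside it. For equality lookups on one element, btree was the fastest in the tests above (8.991 ms, versus 14.715 ms and 17.964 ms for the two GIN indexes).
3. (j_jsonb ->> 'kxhbsl') returns text, so any index type that supports text can be built on that expression, such as btree or GIN.
4. For fuzzy matching on element values, build a pg_trgm GIN index either on a single element or on the whole jsonb column cast to text; the former is faster, the latter works for all elements.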