A Brief Look at PostgreSQL Indexes

阿新 • Published: 2017-09-15


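1. Index characteristics

1.1 Faster conditional lookups

As a table grows, queries against it slow down. An index on the columns used in the query's conditions lets PostgreSQL jump straight to the rows that may satisfy the condition instead of scanning every row.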

create table t(id int, info text);
insert into t select generate_series(1,10000),'lottu'||generate_series(1,10000);
create table t1 as select * from t;
create table t2 as select * from t;
create index ind_t2_id on t2(id);

lottu=# analyze t1;
ANALYZE
lottu=# analyze t2;
ANALYZE
# Without an index
lottu=# explain (analyze,buffers,verbose) select * from t1 where id < 10;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9 width=13) (actual time=0.073..5.650 rows=9 loops=1)
   Output: id, info
   Filter: (t1.id < 10)
   Rows Removed by Filter: 9991
   Buffers: shared hit=55
 Planning time: 25.904 ms
 Execution time: 5.741 ms
(7 rows)
# With an index
lottu=# explain (analyze,verbose,buffers) select * from t2 where id < 10;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..8.44 rows=9 width=13) (actual time=0.008..0.014 rows=9 loops=1)
   Output: id, info
   Index Cond: (t2.id < 10)
   Buffers: shared hit=3
 Planning time: 0.400 ms
 Execution time: 0.052 ms
(6 rows)

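# In this example the same SQL statement runs against both tables: with the index on t2 the execution time is 0.052 ms; without an index on t1 it is 5.741 ms. The buffer counts tell the same story: 3 shared buffers read through the index versus 55 for the sequential scan.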
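
The speed-up above depends on the predicate being selective. As a sketch for comparison, assuming the same t2 table and ind_t2_id index, a predicate that matches most of the table can be run the same way. The planner is free to choose a sequential scan when it estimates that to be cheaper, so no expected plan is shown here.

```sql
-- Same table and index as above; only the predicate changes.
-- id < 9000 matches roughly 90% of the 10000 rows, so the index
-- offers little benefit and the planner may ignore it.
explain (analyze, buffers) select * from t2 where id < 9000;

-- A highly selective predicate for comparison.
explain (analyze, buffers) select * from t2 where id = 42;
```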

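1.2 Ordering

A B-tree index stores its entries in sorted order, so a query that needs rows sorted by the indexed column can read them from the index in that order and skip the sort step.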

# Without an index
lottu=# explain (analyze,verbose,buffers) select * from t1 where id > 2 order by id;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Sort  (cost=844.31..869.31 rows=9999 width=13) (actual time=8.737..11.995 rows=9998 loops=1)
   Output: id, info
   Sort Key: t1.id
   Sort Method: quicksort  Memory: 853kB
   Buffers: shared hit=55
   ->  Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9999 width=13) (actual time=0.038..5.133 rows=9998 loops=1)
         Output: id, info
         Filter: (t1.id > 2)
         Rows Removed by Filter: 2
         Buffers: shared hit=55
 Planning time: 0.116 ms
 Execution time: 15.205 ms
(12 rows)
# With an index
lottu=# explain (analyze,verbose,buffers) select * from t2 where id > 2 order by id;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..353.27 rows=9999 width=13) (actual time=0.030..5.304 rows=9998 loops=1)
   Output: id, info
   Index Cond: (t2.id > 2)
   Buffers: shared hit=84
 Planning time: 0.295 ms
 Execution time: 7.027 ms
(6 rows)

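# In this example the same SQL statement runs against both tables:

With the index on t2 the execution time is 7.027 ms; without an index on t1 it is 15.205 ms.
The query on t1 also needs an in-memory quicksort (Memory: 853kB), which the index scan on t2 avoids.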
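
The same ordering property helps ORDER BY ... LIMIT queries, because a B-tree can be read in either direction. A small sketch on the t1 and t2 tables from above; the plans you see depend on your data and settings, so none is reproduced here.

```sql
-- Ten largest ids: the planner can walk ind_t2_id backward and stop
-- after ten entries instead of sorting the whole table.
explain (analyze, buffers) select * from t2 order by id desc limit 10;

-- The same query on t1, which has no index, for comparison.
explain (analyze, buffers) select * from t1 order by id desc limit 10;
```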

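2. Index scan methods

PostgreSQL has three ways of scanning an index.

2.1 Index Scan

Look up the index to find the ctid of each matching row, then fetch the row from the heap table by that ctid.

2.2 Bitmap Scan

Look up the index to collect the set of ctids of the matching rows, combine and sort those ctids in a bitmap, then fetch from the heap table.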
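
The article shows no plan for a bitmap scan. A minimal sketch of how one might provoke it on the t2 table from section 1, assuming the session-level enable_* switches listed later in this section. The planner makes the final choice, so the resulting plan is not guaranteed.

```sql
begin;
-- Discourage the alternatives for this transaction only.
set local enable_seqscan = off;
set local enable_indexscan = off;

-- Two ranges on the same indexed column joined by OR: each range can be
-- turned into a bitmap of ctids and the two bitmaps combined before the
-- heap is visited in physical order.
explain (analyze, buffers)
select * from t2 where id < 10 or id > 9990;
rollback;
```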

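2.3 Index Only Scan

If the index contains every column the query returns, then for heap pages that the visibility map (VM) marks as all-visible, the values are returned straight from the index without visiting the heap.

The rest of this section compares Index Scan and Index Only Scan.
To borrow Oracle's terminology, an Index Scan performs a table access by rowid: after reading the index it still has to visit the table. An Index Only Scan reads only the index. Does that mean an Index Only Scan always beats an Index Scan? Let's find out.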

Table t has an index ind_t_id on the column id.
1. Table t has no VM file.
lottu=# \d+ t
                           Table "lottu.t"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 id     | integer |           | plain    |              | 
 info   | text    |           | extended |              | 
Indexes:
    "ind_t_id" btree (id)

lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.009..0.015 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Heap Fetches: 9
   Buffers: shared hit=3
 Planning time: 0.177 ms
 Execution time: 0.050 ms
(7 rows)
# Manually force a different plan
lottu=# set enable_indexonlyscan = off;
SET
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.008..0.014 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Buffers: shared hit=3
 Planning time: 0.188 ms
 Execution time: 0.050 ms
(6 rows)
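# The two plans are nearly identical. The only difference is that the Index Only Scan reports Heap Fetches: 9. That is a count of rows that still had to be checked in the heap, not a time. Because there is no VM yet, every matching row needs a heap visit, and that work is already included in Execution time.
2. Table t has a VM file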
lottu=# delete from t where id >200 and id < 500;
DELETE 299
lottu=# vacuum t;
VACUUM
lottu=# analyze t;
ANALYZE
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..4.44 rows=9 width=4) (actual time=0.008..0.012 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Heap Fetches: 0
   Buffers: shared hit=3
 Planning time: 0.174 ms
 Execution time: 0.048 ms
(7 rows)

lottu=# set enable_indexonlyscan = off;
SET
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.012..0.022 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Buffers: shared hit=3
 Planning time: 0.179 ms
 Execution time: 0.077 ms
(6 rows)

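Summary:

Without a VM file, an Index Only Scan is no faster than an Index Scan and may be marginally slower: it must still visit the heap for every matching row to check visibility (Heap Fetches: 9 above), so the two run in practically the same time.
With a VM file, an Index Only Scan is faster than an Index Scan, because rows on pages marked all-visible are returned without touching the heap (Heap Fetches: 0 above).

Key point 1:

VM file: the visibility map. It records, for each heap page, whether every row on that page is visible to all transactions, which also means the page holds nothing for vacuum to clean up. These bits are set by VACUUM.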
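
One way to see how much of a table the visibility map covers is to compare relallvisible with relpages in pg_class. This is a sketch using table t from this section. Both columns are estimates maintained by VACUUM and ANALYZE, so they may lag behind the real state.

```sql
-- Refresh the visibility map and the statistics first.
vacuum analyze t;

-- relpages: number of heap pages; relallvisible: pages marked
-- all-visible in the visibility map. The closer the two values, the more
-- rows an Index Only Scan can return without a heap fetch.
select relname, relpages, relallvisible
from pg_class
where relname = 't'
  and relkind = 'r';
```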

Key point 2:

Steering the planner by hand. The following enable_xxx parameters can be set:

enable_bitmapscan
enable_hashagg
enable_hashjoin
enable_indexonlyscan
enable_indexscan
enable_material
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan
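
These switches are meant for experiments rather than production tuning. A sketch of limiting a change to one transaction with SET LOCAL, so the rest of the session keeps the default planner behavior:

```sql
begin;
-- Applies until the end of this transaction only.
set local enable_indexonlyscan = off;
explain select id from t where id < 10;
commit;

-- Outside the transaction the previous setting is back in effect.
show enable_indexonlyscan;

-- A plain SET lasts for the session and can be undone with RESET.
set enable_seqscan = off;
reset enable_seqscan;
```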

References

Digoal (德哥), 《PostgreSQL 性能優化培訓 3 DAY.pdf》
https://www.postgresql.org/docs/9.6/static/runtime-config-query.html

3. Index types

PostgreSQL supports the following index types: B-tree, Hash, GiST, SP-GiST, GIN, and BRIN.

B-tree indexes: http://www.cnblogs.com/alianbog/p/5621749.html
Hash indexes: generally used only for simple equality lookups; rarely used.
GiST indexes: http://www.cnblogs.com/alianbog/p/5628543.html
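
As an illustration of choosing a non-default type, the sketch below builds a hash index on the info column of table t and runs an equality lookup. The index name ind_t_info_hash is made up for this example. Before PostgreSQL 10, hash indexes were not WAL-logged, which is one reason they were rarely used.

```sql
-- Hash indexes support only the = operator.
create index ind_t_info_hash on t using hash (info);

-- An equality predicate is the only kind this index can serve.
explain select * from t where info = 'lottu42';

-- Clean up the example index.
drop index ind_t_info_hash;
```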

4. Managing indexes

4.1 Creating an index

Syntax for creating an index:

lottu=# \h create index
Command:     CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]
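The examples below use table t.
1. Keyword 【UNIQUE】
# Create a unique index. A primary key is enforced by a unique index.
CREATE UNIQUE INDEX ind_t_id_1 on t (id);
2. Keyword 【CONCURRENTLY】
# Build the index concurrently, much like Oracle's ONLINE index build. While the index is being built, updates, inserts, and deletes on the table are not blocked. The price is a noticeably longer build.
CREATE INDEX CONCURRENTLY ind_t_id_2 on t (id);
3. Keyword 【IF NOT EXISTS】
# Check whether the index name is already taken. If it is, no error is raised; a notice is issued and nothing is created.
CREATE INDEX IF NOT EXISTS ind_t_id_3 on t (id);
4. Keyword 【USING】
# Choose the index type. The default is B-tree.
CREATE INDEX ind_t_id_4 on t using btree (id);
5. Keyword 【[ ASC | DESC ] [ NULLS { FIRST | LAST } ]】
# Choose ascending or descending order, and whether nulls sort first or last. For example, descending with nulls first:
CREATE INDEX ind_t_id_5 on t (id desc nulls first);
6. Keyword 【WITH ( storage_parameter = value )】
# Set storage parameters such as the index fill factor. For example, create an index with a fill factor of 75:
CREATE INDEX ind_t_id_6 on t (id) with (fillfactor = 75);
7. Keyword 【TABLESPACE】
# Choose the tablespace in which the index is created.
CREATE INDEX ind_t_id_7 on t (id) TABLESPACE tsp_lottu;
8. Keyword 【WHERE】
# Build a partial index: index only the subset of rows you care about rather than every row, by giving a WHERE condition.
CREATE INDEX ind_t_id_8 on t (id) WHERE id < 1000;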
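
The syntax also allows an index on an expression, which none of the eight examples covers. A sketch on table t; the index name ind_t_info_lower is hypothetical.

```sql
-- Index the lower-cased value of info rather than the column itself.
create index ind_t_info_lower on t (lower(info));

-- The index can be considered only when the query uses the same expression.
explain select * from t where lower(info) = 'lottu42';
```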

4.2 Altering an index

Syntax for altering an index:

lottu=# \h alter index
Command:     ALTER INDEX
Description: change the definition of an index
Syntax:
# Rename an index
ALTER INDEX [ IF EXISTS ] name RENAME TO new_name
# Move an index to another tablespace
ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name
# Change storage parameters of an index, such as the fill factor
ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter = value [, ... ] )
# Reset storage parameters of an index, such as the fill factor, to their defaults
ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )
# Move all indexes in one tablespace to a new tablespace
ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
    SET TABLESPACE new_tablespace [ NOWAIT ]
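
Concrete forms of the statements above, as a sketch that reuses the ind_t_id_6 index created in section 4.1; the new name ind_t_id_6_ff is made up. Changing fillfactor does not rewrite existing index pages, so a REINDEX is needed for it to take full effect.

```sql
-- Rename the index.
alter index ind_t_id_6 rename to ind_t_id_6_ff;

-- Change the fill factor; this affects pages written from now on.
alter index ind_t_id_6_ff set (fillfactor = 90);

-- Rebuild so existing pages are packed with the new fill factor.
reindex index ind_t_id_6_ff;

-- Go back to the default fill factor.
alter index ind_t_id_6_ff reset (fillfactor);
```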

4.3 Dropping an index

Syntax for dropping an index:

lottu=# \h drop index
Command:     DROP INDEX
Description: remove an index
Syntax:
DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
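
A sketch of the two common forms, using index names from section 4.1. DROP INDEX CONCURRENTLY accepts a single index name, does not support CASCADE, and cannot run inside a transaction block.

```sql
-- Plain drop: quick, but takes an exclusive lock on the table while it runs.
drop index if exists ind_t_id_7;

-- Concurrent drop: waits for conflicting transactions to finish instead of
-- blocking selects, inserts, updates, and deletes on the table.
drop index concurrently if exists ind_t_id_8;
```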

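5. Index maintenance

Indexes speed up lookups and sorting on a table and enforce unique constraints, but they also come at a cost:

Indexes take up additional storage in the database.
Every insert, update, or delete on the table also has to update its indexes.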

5.1 Checking index size

select pg_size_pretty(pg_relation_size('ind_t_id'));
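
To see every index on a table at once, the size function can be applied to each row of pg_stat_user_indexes. A sketch for table t:

```sql
-- One row per index on t, largest first.
select indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_stat_user_indexes
where relname = 't'
order by pg_relation_size(indexrelid) desc;
```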

5.2 Index usage

-- pg_stat_user_indexes.idx_scan shows how many times an index has been used for scans, which helps identify indexes that can be dropped.
select idx_scan from pg_stat_user_indexes where indexrelname = 'ind_t_id';
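
Building on idx_scan, a sketch of a query listing candidate indexes for removal: those never scanned since the statistics were last reset. Unique indexes are excluded because they enforce a constraint even if no query reads them. Treat the result as a starting point only; the counters are kept per server, so a replica may use an index that the primary never does.

```sql
select s.schemaname, s.relname, s.indexrelname, s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) as index_size
from pg_stat_user_indexes s
join pg_index i on i.indexrelid = s.indexrelid
where s.idx_scan = 0          -- never used for a scan
  and not i.indisunique       -- keep constraint-enforcing indexes
order by pg_relation_size(s.indexrelid) desc;
```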

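5.3 Rebuilding an index

-- After a table has been updated heavily, its indexes can become bloated and perform poorly; they then need to be rebuilt.
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));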
 pg_size_pretty 
----------------
 2200 kB
(1 row)

lottu=# delete from t where id > 1000;
DELETE 99000

lottu=# analyze t;
ANALYZE
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));
 pg_size_pretty 
----------------
 2200 kB
 
lottu=# insert into t select generate_series(2000,100000),‘lottu‘;
INSERT 0 98001

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));
 pg_size_pretty 
----------------
 4336 kB
(1 row)

lottu=# vacuum full t;
VACUUM

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));
 pg_size_pretty 
----------------
 2176 kB
 
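Ways to rebuild:
1. REINDEX: in the PostgreSQL 9.6 release referenced here, REINDEX has no CONCURRENTLY option (it was added in PostgreSQL 12). It blocks writes to the table, and reads that try to use the index, while it runs.
2. VACUUM FULL: rewrites the table and rebuilds its indexes along the way; it also locks the table.
3. Create a new index under a different name, then drop the old one.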
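
Method 3 can be done without blocking writes. A sketch assuming the plain ind_t_id index on t(id); the temporary name ind_t_id_new is made up. None of the CONCURRENTLY statements may run inside a transaction block, and an index that backs a primary key or unique constraint needs extra steps not shown here.

```sql
-- Build a fresh copy of the index without blocking writes.
create index concurrently ind_t_id_new on t (id);

-- A failed concurrent build leaves an invalid index behind; check first.
select indexrelid::regclass as index_name, indisvalid
from pg_index
where indexrelid = 'ind_t_id_new'::regclass;

-- Drop the bloated index and give the new one the old name.
drop index concurrently ind_t_id;
alter index ind_t_id_new rename to ind_t_id;
```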
