In PostgreSQL, an index only scan does not always scan only the index

阿新 • Published: 2020-11-06

PostgreSQL introduced index only scans in version 9.2. Unfortunately, not every index only scan avoids visiting the table.

postgres=# create table t1(a int,b int,c int);
CREATE TABLE
postgres=# insert into t1 select a.*,a.*,a.* from generate_series(1,1000000) a;
INSERT 0 1000000
postgres-# \d+ t1
                                    Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              | 
 b      | integer |           |          |         | plain   |              | 
 c      | integer |           |          |         | plain   |              | 

postgres-#

A query like the following, for which no index is available, has to read the whole table to get the data:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Gather (actual time=1.069..70.557 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=5406
   ->  Parallel Seq Scan on t1 (actual time=11.805..34.050 rows=0 loops=3)
         Filter: (b = 5)
         Rows Removed by Filter: 333333
         Buffers: shared hit=5406
 Planning Time: 0.414 ms
 Execution Time: 70.612 ms
(10 rows)

postgres=#

PostgreSQL is right to choose a parallel sequential scan here. Without an index, the only other option would be a serial sequential scan. Usually, we would create an index on the table.

postgres=# create index i1 on t1(b);
CREATE INDEX
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | integer |           |          | 
 c      | integer |           |          | 
Indexes:
    "i1" btree (b)

Now the index can be used to return the data:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Index Scan using i1 on t1 (actual time=0.066..0.068 rows=1 loops=1)
   Index Cond: (b = 5)
   Buffers: shared hit=1 read=3
 Planning Time: 0.773 ms
 Execution Time: 0.128 ms
(5 rows)

postgres=#

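The execution plan shows that the index is used, but PostgreSQL still has to visit the table to get the value of column a. We can also create an index that contains all the columns we need: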

postgres=# create index i2 on t1(b,a);
CREATE INDEX
postgres=# \d+ t1
                                    Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              | 
 b      | integer |           |          |         | plain   |              | 
 c      | integer |           |          |         | plain   |              | 
Indexes:
    "i1" btree (b)
    "i2" btree (b, a)

postgres=#
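As an aside, on PostgreSQL 11 or later a covering index with INCLUDE is another way to make column a available from the index. The following is only a sketch: the index name i3 is hypothetical and was not part of the original session, and the rest of this walkthrough keeps using i2.

```sql
-- Hypothetical alternative to i2 (not created in the original session):
-- b is the search key, a is stored only as payload in the index leaf pages.
-- Requires PostgreSQL 11 or later.
create index i3 on t1(b) include (a);
```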

Let's look at the same query again:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.346..0.353 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 1
   Buffers: shared hit=1 read=3
 Planning Time: 0.402 ms
 Execution Time: 0.401 ms
(6 rows)

postgres=#

But the plan still shows Heap Fetches: 1.

Why? To answer that, let's first look at the files of table t1 on disk:

postgres=# select pg_relation_filepath('t1');
 pg_relation_filepath 
----------------------
 base/13878/74982
(1 row)

postgres=# \! ls -l /pg/11/data/base/13878/74982*
-rw------- 1 postgres postgres 44285952 Oct 31 15:12 /pg/11/data/base/13878/74982
-rw------- 1 postgres postgres    32768 Oct 31 15:08 /pg/11/data/base/13878/74982_fsm
postgres=#

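The table has a free space map file, but no visibility map file yet. Without a visibility map, PostgreSQL cannot know whether all rows on a page are visible to all transactions, so it has to visit the table to check. Once the visibility map has been created: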

postgres=# vacuum t1;
VACUUM
postgres=# \! ls -l /pg/11/data/base/13878/74982*
-rw------- 1 postgres postgres 44285952 Oct 31 15:12 /pg/11/data/base/13878/74982
-rw------- 1 postgres postgres    32768 Oct 31 15:08 /pg/11/data/base/13878/74982_fsm
-rw------- 1 postgres postgres     8192 Oct 31 15:39 /pg/11/data/base/13878/74982_vm
postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.044..0.045 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning Time: 0.230 ms
 Execution Time: 0.102 ms
(6 rows)

postgres=#

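This time the plan shows Heap Fetches: 0.

Nothing was fetched from the table, so this is a true index only scan (although the visibility map is still consulted).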
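How much of the table is currently marked all-visible can also be checked in the pg_class catalog. A minimal sketch, assuming the table name t1 from this walkthrough; note that relallvisible is an estimate that is only refreshed by VACUUM, ANALYZE, and a few DDL commands, so it is not updated in real time:

```sql
-- relpages:      estimated number of pages in the table
-- relallvisible: estimated number of pages marked all-visible
--                in the visibility map
select relname, relpages, relallvisible
from pg_class
where relname = 't1';
```

The closer relallvisible is to relpages, the more often an index only scan can skip the heap.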

To make this clearer, let's look at the physical location of the row:

postgres=# select ctid,* from t1 where b=5;
 ctid  | a | b | c 
-------+---+---+---
 (0,5) | 5 | 5 | 5
(1 row)

postgres=#

The row is in block 0 and is the fifth item in that block. Let's check whether the rows in that block are visible to all transactions:

postgres=# create extension pg_visibility;
CREATE EXTENSION
postgres=# select pg_visibility_map('t1'::regclass, 0);
 pg_visibility_map 
-------------------
 (t,f)
(1 row)

postgres=#
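The composite value returned above consists of two flags, all_visible and all_frozen. As a sketch (these queries were not run in the original session), the same function can be selected from to get named columns, and pg_visibility_map_summary from the same extension reports page counts for the whole table:

```sql
-- Visibility map bits for block 0 of t1, with named columns
select all_visible, all_frozen
from pg_visibility_map('t1'::regclass, 0);

-- Number of all-visible and all-frozen pages for the whole table
select * from pg_visibility_map_summary('t1'::regclass);
```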

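The t in the first field means that all rows on the page are visible to all transactions. What happens if we update a row in another session?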

In session 2:

postgres=# update t1 set a=8 where b=5;
UPDATE 1
postgres=#

Back in the original session, check again:

postgres=# select pg_visibility_map('t1'::regclass, 0);
 pg_visibility_map 
-------------------
 (f,f)
(1 row)

postgres=#

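Two things follow from this:

1. Modifying the page cleared its all-visible bit in the visibility map

2. The index only scan now has to visit the table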

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.080..0.082 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 2
   Buffers: shared hit=6 dirtied=3
 Planning Time: 0.132 ms
 Execution Time: 0.120 ms
(6 rows)

postgres=#

The question now is: why Heap Fetches: 2?

First of all, every update in PostgreSQL creates a new row version:

postgres=# select ctid,* from t1 where b=5;
   ctid    | a | b | c 
-----------+---+---+---
 (5405,76) | 8 | 5 | 5
(1 row)

postgres=#

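The row now lives in a different block (and even if it had stayed in the same block, it would be at a different location). This affects the index as well: because column a is part of i2, the update added a new index entry that points to the new row version, while the old entry still points to the old version. Neither block 0 nor block 5405 is marked all-visible, so the scan has to visit the table once for each entry, hence two heap fetches. (A HOT update, which avoids new index entries by chaining from the old version to the new one, is only possible when no indexed column changes and the new version fits on the same page. Neither is the case here.)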
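Whether updates on a table were HOT can be checked from the cumulative statistics. A minimal sketch, assuming the statistics for t1 have not been reset; the counters are reported asynchronously, so they may lag slightly behind the update:

```sql
-- n_tup_upd:     total number of rows updated
-- n_tup_hot_upd: number of those updates that were HOT
--                (no separate index entry was needed)
select n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 't1';
```

If n_tup_hot_upd stays unchanged while n_tup_upd grows, the update was not HOT and new index entries were created.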

On the next execution, we see only one visit to the table:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.039..0.042 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 1
   Buffers: shared hit=5
 Planning Time: 0.112 ms
 Execution Time: 0.071 ms
(6 rows)

postgres=#

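Why only one this time? The most likely reason is that the previous execution found the old row version to be dead and marked its index entry as dead, which would also explain the dirtied buffers in that plan. The next scan skips that entry and only has to check the new row version, whose page is still not marked all-visible.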

The takeaway: index only scans do not always scan only the index.
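Following the earlier steps, vacuuming the table again should set the all-visible bits for the modified pages once more. The sketch below was not run in the original session; the expectation is that Heap Fetches drops back to 0, provided no other transaction still needs the old row version:

```sql
-- Rebuild the visibility map bits for the pages touched by the update
vacuum t1;

-- Re-check the plan; look at the "Heap Fetches" line
explain (analyze,buffers,costs off) select a from t1 where b = 5;
```

On a busy system autovacuum normally takes care of this, so on frequently updated tables the number of heap fetches depends on how recently the table was vacuumed.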
