How Long Does an Index Rebuild Take? Making PostgreSQL Index Creation a Bit Faster

阿新 • Published: 2021-01-05

Tags: how long does an index rebuild take

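Preface

We have a database cutover coming up for system XXX. Our team has rehearsed it several times, and every run adds up to roughly 4 hours.

Then the QQ group chimed. The avatar flashing was the developer gal from the XXX system team, and honestly, I have learned to dread that.

She said sweetly: "DBA bro, our cutover plan is settled. The window is 2 hours. Can you make it?"

We did the math among ourselves: "Come on, isn't that mission impossible?"

After some haggling we split the difference and settled on 3 hours.

To speed things up, we first classified the tables. Some tables hold static data, such as the historical monthly tables; because their data never changes, they can be migrated ahead of time. The remaining tables with live data can only be migrated on the night itself, after the database is stopped.

The tables on the target side were all created in advance. Before migrating data we drop the indexes on each table, then rebuild them once the load is done. Some tables are very large, so rebuilding their indexes is slow. That is where we cut today: shrinking the index rebuild time.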
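The drop-then-rebuild step described above could be scripted roughly as follows. This is only a sketch: `index_ddl_backup` is a hypothetical helper table introduced for illustration, and `e2e_busi_accept` is the table used later in this post.

```sql
-- Sketch only: index_ddl_backup is a hypothetical helper table.
-- 1. Save the DDL of every index on the table before dropping anything.
CREATE TABLE index_ddl_backup AS
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename = 'e2e_busi_accept';

-- 2. Generate the DROP statements (review them before running any).
SELECT format('DROP INDEX %I.%I;', schemaname, indexname) AS drop_stmt
FROM index_ddl_backup;

-- 3. After the data load, generate the CREATE statements to replay.
SELECT indexdef || ';' AS create_stmt
FROM index_ddl_backup;
```

Indexes that back a primary key or unique constraint cannot be removed with a plain DROP INDEX; those would have to be handled through ALTER TABLE ... DROP CONSTRAINT and re-added afterwards.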

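Parallelism

The first thing that comes to mind is using parallelism. We are currently on PostgreSQL 12. By default, CREATE INDEX decides its degree of parallelism based on the parameter max_parallel_maintenance_workers, which caps the number of parallel worker processes that a single B-tree index build may use. Its default value is 2; on our server it is currently set to 4.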

e2e=> show max_parallel_maintenance_workers;
 max_parallel_maintenance_workers 
----------------------------------
 4
(1 row)

e2e=> select pg_size_pretty(pg_relation_size('e2e_busi_accept'));
 pg_size_pretty 
----------------
 70 GB
(1 row)

e2e=> create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
CREATE INDEX
Time: 217000.335 ms (03:37.000)

From another session, the pg_stat_activity view shows that 4 worker processes were started for the parallel build.

postgres=# select query from pg_stat_activity WHERE backend_type = 'parallel worker';
                                query                                
---------------------------------------------------------------------
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
(4 rows)

With 4 parallel workers the build takes 3 minutes 37 seconds. Not bad. Let's add more workers: we raise the parameter to 64 and test again.

e2e=> SET max_parallel_maintenance_workers TO 64;
SET
Time: 1.102 ms
e2e=> create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);

CREATE INDEX
Time: 156543.775 ms (02:36.544)

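This time the build improved to 2 minutes 36 seconds. However, although max_parallel_maintenance_workers was set to 64, the build did not get anywhere near 64 workers. It used only 9.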

postgres=# select query from pg_stat_activity WHERE backend_type = 'parallel worker';
                                query                                
---------------------------------------------------------------------
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
 create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
(9 rows)

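This looked a bit odd. After some digging we found that setting the system parameter max_parallel_maintenance_workers is not enough; the degree of parallelism on the table has to be set as well. With the table's parallel_workers also set to 64, the build requests the full 64 workers.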

e2e=> alter table e2e_busi_accept set (parallel_workers=64);
ALTER TABLE
Time: 4.798 ms
e2e=> create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
CREATE INDEX
Time: 144274.618 ms (02:24.275)
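As a rough way to see where the earlier figure of 9 workers comes from: when the table has no parallel_workers setting, the planner (as I read the PostgreSQL 12 source) starts from one worker and adds another each time the heap size triples relative to min_parallel_table_scan_size, then caps the result at max_parallel_maintenance_workers. The query below is a sketch of that rule, not the planner itself; the 70 GB figure is the table size shown earlier in this post.

```sql
-- Sketch of the size-based worker estimate (an approximation of planner logic).
-- Ignores the case where the table is smaller than min_parallel_table_scan_size.
WITH RECURSIVE heap AS (
    SELECT pg_size_bytes('70 GB') / current_setting('block_size')::bigint AS pages
), steps(workers, threshold_pages) AS (
    -- start: 1 worker at the minimum scan size threshold (in pages)
    SELECT 1,
           pg_size_bytes(current_setting('min_parallel_table_scan_size'))
             / current_setting('block_size')::bigint
  UNION ALL
    -- one more worker each time the heap is at least 3x the current threshold
    SELECT s.workers + 1, s.threshold_pages * 3
    FROM steps AS s, heap AS h
    WHERE h.pages >= s.threshold_pages * 3
)
SELECT least(max(workers),
             current_setting('max_parallel_maintenance_workers')::int) AS estimated_workers
FROM steps;
```

With the default 8MB min_parallel_table_scan_size, 8kB blocks, and max_parallel_maintenance_workers set to 64, this arithmetic gives 9 for a 70 GB heap, which matches what pg_stat_activity showed. Setting parallel_workers on the table bypasses this estimate, which is why the ALTER TABLE above changes the outcome.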

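Checking from another window, only 47 parallel workers were actually started. This is mainly because of the limit imposed by the max_parallel_workers parameter.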

postgres=# select count(1) from pg_stat_activity WHERE backend_type = 'parallel worker';     
 count 
-------
    47
(1 row)
postgres=# show max_parallel_workers;
 max_parallel_workers 
----------------------
 48
(1 row)
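Since several settings cap the worker count at different levels, it can help to look at them together. The following is a sketch using the standard pg_settings and pg_class catalogs; the table name is the one from this post.

```sql
-- Server- and session-level limits relevant to a parallel index build.
-- The "context" column indicates how each setting can be changed
-- (for example, "postmaster" means a server restart is required).
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_maintenance_workers',
               'maintenance_work_mem',
               'min_parallel_table_scan_size')
ORDER BY name;

-- Table-level override set earlier with ALTER TABLE ... SET (parallel_workers = 64).
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'e2e_busi_accept';

-- Once the index builds are done, the override can be removed again:
-- ALTER TABLE e2e_busi_accept RESET (parallel_workers);
```

Resetting the table option afterwards is probably worth doing, because parallel_workers also influences parallel scans in ordinary queries on that table, not just index builds.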

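Going from 9 workers to 47 only brought the build time down from 2 minutes 36 seconds to 2 minutes 24 seconds, a saving of 12 seconds, so there is not much headroom left. For now, parallelism does not give us anything close to a linear speedup.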
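To put numbers on the diminishing returns, the timings measured above can be compared against the 4-worker baseline. The values below are copied from the three test runs in this post; nothing else is assumed.

```sql
-- Speedup relative to the 4-worker run, next to the increase in worker count.
SELECT workers,
       elapsed_ms,
       round(first_value(elapsed_ms) OVER (ORDER BY workers) / elapsed_ms, 2) AS speedup_vs_4,
       round(workers / 4.0, 2) AS worker_ratio
FROM (VALUES (4, 217000.335),
             (9, 156543.775),
             (47, 144274.618)) AS t(workers, elapsed_ms)
ORDER BY workers;
```

The speedup works out to roughly 1.39x at 9 workers and 1.50x at 47 workers, while the worker count grew 2.25x and 11.75x respectively.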

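Tuning parameters

Building an index requires a sort, and we want as much of that sort as possible to run in memory. So the parameter to adjust is maintenance_work_mem, which sets the maximum amount of memory available to each index build as a whole. Our database currently has it at 2GB; we start by raising it to 4GB.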

postgres=# show maintenance_work_mem;
 maintenance_work_mem 
----------------------
 2GB
(1 row)

e2e=> SET maintenance_work_mem TO '4 GB';
SET

e2e=> create index idx1 on e2e_busi_accept(cust_id,staff_id,create_date);
CREATE INDEX
Time: 134956.833 ms (02:14.957)

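Setting it to 4GB saves about another 10 seconds. The gain is again fairly small, because maintenance_work_mem on our server was not small to begin with. If you are still on the default value (64MB), raising it to 4GB makes a much more noticeable difference.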
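Memory and parallelism also interact. The PostgreSQL documentation for CREATE INDEX notes that each parallel worker needs at least a 32MB share of maintenance_work_mem, with another 32MB share left for the leader. The sketch below works out what that floor allows for a few budgets, then shows one way to scope the settings to a single transaction; the worker count of 8 is an arbitrary illustrative value.

```sql
-- Upper bound on workers implied by the 32MB-per-participant floor
-- (applies when the table has no parallel_workers setting).
SELECT budget,
       pg_size_bytes(budget) / (32 * 1024 * 1024) - 1 AS max_workers_by_memory
FROM (VALUES ('64MB'), ('2GB'), ('4GB')) AS t(budget);

-- Scope the tuning to one transaction so it does not leak into the session.
BEGIN;
SET LOCAL maintenance_work_mem = '4GB';
SET LOCAL max_parallel_maintenance_workers = 8;  -- illustrative value
CREATE INDEX idx1 ON e2e_busi_accept (cust_id, staff_id, create_date);
COMMIT;
```

By this arithmetic the 64MB default leaves room for only 1 worker, while 2GB and 4GB allow 63 and 127, so on a default configuration the memory setting may be what holds parallelism back.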

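Abbreviated keys

The last technique to mention is an algorithm called abbreviated keys, which speeds up sorting and index creation on character/string columns (text and varchar(n)).

For this test we first presort the data externally and create the index after loading it; index creation on presorted data is very fast. This does, however, raise the bar for our migration procedure.

Here is a small case to demonstrate.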

e2e=> set max_parallel_workers_per_gather=48;
SET

e2e=> select cust_id,staff_id,create_date into indexing_e2e_busi_accept_sorted from e2e_busi_accept order by cust_id,staff_id,create_date;
SELECT 73101417
Time: 160949.947 ms (02:40.950)

This first builds a table whose rows are inserted in presorted order.

hbe2e=> create index idx1 on indexing_e2e_busi_accept_sorted(cust_id,staff_id,create_date);
CREATE INDEX
Time: 74960.184 ms (01:14.960)

Next we create the index on the presorted table, and the build takes 1 minute 14 seconds.
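One way to check that the rows really landed on disk in sorted order is the correlation statistic in pg_stats. This is a sketch; the value is a planner estimate collected by ANALYZE, not an exact measurement.

```sql
-- Refresh statistics, then inspect physical-vs-logical ordering per column.
ANALYZE indexing_e2e_busi_accept_sorted;

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'indexing_e2e_busi_accept_sorted'
ORDER BY attname;
```

A correlation close to 1 for the leading index column (cust_id here) suggests the heap order follows the sort order, which is the condition the presorting step is meant to create.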

With the C locale and C collation, the build is reportedly faster still.

create database testdb lc_collate "C" lc_ctype "C" template template0;
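Before comparing timings, it may be worth confirming that the new database really uses the C collation. A minimal check against the pg_database catalog, assuming PostgreSQL 12 as in this post:

```sql
-- Both columns should show "C" for testdb.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = 'testdb';
```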

The data of indexing_e2e_busi_accept_sorted is then loaded into testdb via pg_dump.

testdb=# select cust_id,staff_id,create_date into indexing_e2e_busi_accept_sorted2 from indexing_e2e_busi_accept_sorted order by cust_id,staff_id,create_date;
SELECT 73101417
Time: 116103.450 ms (01:56.103)

testdb=# create index idx1 on  indexing_e2e_busi_accept_sorted2(cust_id,staff_id,create_date);
CREATE INDEX
Time: 64462.081 ms (01:02.462)

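The wiki suggests a speedup of up to 20x, but in our test the gain was only about 15-17%.

Note that with a non-C collation this can trigger a bug in which index scans miss rows. For details see: https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue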

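Summary

When creating indexes, we can speed up the build through parallelism and by tuning the memory parameter. On top of that, we can also use presorting.

In the actual cutover we combined parallelism with parameter tuning, which sped up index creation considerably and met the developer's requirement.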
