
The Impact of Clustered Indexes on Data Insertion



Background

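Developers reported that a stored procedure was running extremely slowly. Investigation showed that the procedure inserts a large amount of data into a newly created task table, and that the slowdown was caused by that table's primary key being a clustered index. Why does a clustered index make inserts slow, and how large is the effect?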

How it works

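  • In a nonclustered index, the physical storage order of the data differs from the index order, and the lowest level of the index contains pointers to the rows on the data pages.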


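  • In a clustered index, the physical storage order of the data is the same as the index order, and the lowest level of the index contains the actual data pages.
    Because a clustered index forces rows to be stored in key order, inserts and deletes may have to move existing rows (for example, when a new row belongs in a page that is already full and the page has to be split), which causes extra disk I/O.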
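To make "moving rows" concrete, here is a toy Python model of the leaf (data) level of a clustered index. It is not how ASE actually manages pages: the page capacity `rows_per_page` is an assumed parameter, a full page is split 50/50, and a key larger than every existing key simply opens a new page. `simulate_inserts`, `random_keys` and `sequential_keys` are hypothetical helpers written for this illustration, and Python's `uuid` module stands in for `newid()`.

```python
import bisect
import random
import uuid


def simulate_inserts(keys, rows_per_page):
    """Toy model of the data level of a clustered index.

    Rows live in pages kept in key order. When a page overflows:
    - if the new key is the largest so far, a fresh page is started and
      no existing rows move;
    - otherwise the page is split 50/50 and its upper half moves to a new page.
    """
    pages = [[]]  # each page is a sorted list of keys
    splits = 0
    rows_moved = 0
    for key in keys:
        # The lowest key of every page after the first tells us where `key` belongs
        low_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(low_keys, key)
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= rows_per_page:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            pages.append([page.pop()])  # new last page, nothing else moves
        else:
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])  # upper half goes to a new page
            rows_moved += len(page) - mid
            del page[mid:]
            splits += 1
    return {"pages": len(pages), "splits": splits, "rows_moved": rows_moved}


def random_keys(n, seed=1):
    # Stand-in for newid(): 32-hex-character random (version 4) UUIDs
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128), version=4).hex for _ in range(n)]


def sequential_keys(n):
    # Ever-increasing 32-character keys, like an identity-style counter
    return [f"{i:032x}" for i in range(n)]


n, rows_per_page = 5000, 20  # assumed sizes, for illustration only
print("sequential:", simulate_inserts(sequential_keys(n), rows_per_page))
print("random:    ", simulate_inserts(random_keys(n), rows_per_page))
```

With ever-increasing keys no page is ever split and every page ends up full. With random keys new pages come from splits, each split moves about half a page of existing rows, and the pages end up only partly full.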


Tests

1. Basic environment information

  • Check the operating system version
    # lsb_release -a
    LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
    Distributor ID: CentOS
    Description:    CentOS release 6.4 (Final)
    Release:        6.4
    Codename:       Final

  • Check the disk information
    # cat /proc/scsi/scsi
    Attached devices:
    Host: scsi0 Channel: 02 Id: 00 Lun: 00
      Vendor: IBM      Model: ServeRAID M5110  Rev: 3.24
      Type:   Direct-Access                    ANSI SCSI revision: 05
    Host: scsi1 Channel: 00 Id: 00 Lun: 00
      Vendor: IBM SATA Model: DEVICE 81Y3674   Rev: IB01
      Type:   CD-ROM                           ANSI SCSI revision: 05

  • Check the disk write speed
    # time dd if=/dev/zero of=/home/4kb.1GBFILE bs=4k count=262144
    262144+0 records in
    262144+0 records out
    1073741824 bytes (1.1 GB) copied, 1.58541 s, 677 MB/s
    real 0m1.589s
    user 0m0.050s
    sys 0m1.533s
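As a quick check of the dd output, throughput is simply bytes written divided by elapsed time. This sketch assumes dd's "MB/s" uses decimal megabytes (1 MB = 1,000,000 bytes), which matches the "1.1 GB" it prints for 1,073,741,824 bytes.

```python
# Values copied from the dd command and its output above
block_size = 4 * 1024   # bs=4k
block_count = 262144    # count=262144
elapsed_s = 1.58541     # elapsed time reported by dd

total_bytes = block_size * block_count
# dd's "MB/s" uses decimal megabytes (1 MB = 1,000,000 bytes)
throughput_mb_s = total_bytes / elapsed_s / 1_000_000

print(total_bytes)             # 1073741824
print(round(throughput_mb_s))  # 677
```

Because the command has no flag such as `conv=fdatasync`, dd can return before the data has been flushed to disk, so this figure may partly reflect the page cache rather than the RAID volume alone.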

  • Check the database version
    1> select @@version
    2> go
    --------------------------------------------------------------------------------------
    Adaptive Server Enterprise/15.7/EBF 21708 SMP SP110 /P/x86_64/Enterprise Linux/ase157sp11x/3546/64-bit/FBO/Fri Nov 8 05:39:38 2013
    (1 row affected)

2. Data preparation

  • Create the clustered-index table (in Sybase a primary key is a clustered index by default)
    USE DB_TASK
    GO
    CREATE TABLE T_TASKITEM_CI (
    C_BH char(32) primary key,
    C_BH_TASK char(32) null,
    C_BH_AJ varchar(32) null,
    N_AJBS numeric(15,0) null,
    C_AJLB varchar(6) null,
    N_JBFY int null,
    N_ZT int null,
    C_AH varchar(75) null
    )
    go

  • Create the nonclustered-index table
    USE DB_TASK
    go
    CREATE TABLE T_TASKITEM_NCI (
    C_BH char(32) NOT NULL,
    C_BH_TASK char(32) null,
    C_BH_AJ varchar(32) null,
    N_AJBS numeric(15,0) null,
    C_AJLB varchar(6) null,
    N_JBFY int null,
    N_ZT int null,
    C_AH varchar(75) null
    )
    go
    CREATE UNIQUE INDEX PK_TASKITEM ON DB_TASK.dbo.T_TASKITEM_NCI (C_BH)
    go

  • Build the test data
    Create a table T_TASKITEM_CC with the same structure and populate it with about 500,000 rows using the following SQL.
    INSERT INTO T_TASKITEM_CC (C_BH, C_BH_AJ, N_ZT, N_AJBS, N_JBFY, C_BH_TASK, C_AJLB, C_AH)
    SELECT newid()
    , a.C_BH
    , 1 AS N_ZT
    , a.N_AJBS
    , a.N_JBFY
    , '5813b6d7ce8847d68b34daa956776659' AS C_BH_TASK
    , (CASE WHEN (a.N_YWLX = 20100) THEN '0201'
            WHEN (a.N_YWLX = 20200) THEN '0202'
            WHEN (a.N_YWLX = 20304) THEN '0207'
            WHEN (a.N_YWLX = 20501) THEN '0210'
            WHEN (a.N_YWLX = 20801) THEN '0224'
            WHEN (a.N_YWLX = 20601) THEN '0214'
            WHEN (a.N_YWLX = 20603) THEN '0216'
            WHEN (a.N_YWLX = 20602) THEN '0215'
       END) AS C_AJLB
    , a.C_AH
    FROM YWST..T_XS_AJ a

Row count: 501,132
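What matters about the generated keys is their order. A minimal sketch, assuming `newid()` values behave like random version-4 UUIDs (Python's `uuid` module is used as a stand-in): count how many inserts would land at the right-hand end of a key-ordered table, that is, after every key already present. `count_appends` is a hypothetical helper.

```python
import random
import uuid


def count_appends(keys):
    """Count keys larger than every key inserted before them, i.e. inserts
    that land at the right-hand end of a key-ordered table."""
    appends = 0
    current_max = None
    for key in keys:
        if current_max is None or key > current_max:
            appends += 1
            current_max = key
    return appends


n = 10_000
rng = random.Random(7)
guid_keys = [uuid.UUID(int=rng.getrandbits(128), version=4).hex for _ in range(n)]
seq_keys = [f"{i:032x}" for i in range(n)]  # 32 chars, like char(32)

print(count_appends(seq_keys))   # 10000
print(count_appends(guid_keys))  # only a handful
```

Every sequential key is appended at the end, whereas for keys in random order the expected number of such appends grows only like ln n; every other random key has to be placed between existing rows.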

3. Insert comparison

  • Nonclustered-index table
    1> insert into T_TASKITEM_NCI SELECT newid(),C_BH_TASK,C_BH_AJ,N_AJBS,C_AJLB,N_JBFY,N_ZT,C_AH FROM T_TASKITEM_CC
    2> GO
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Table: T_TASKITEM_NCI scan count 0, logical reads: (regular=2025588 apf=0 total=2025588), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Table: T_TASKITEM_CC scan count 1, logical reads: (regular=10957 apf=27 total=10984), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Total writes for this command: 3538
    Execution Time 97.
    Adaptive Server cpu time: 9688 ms. Adaptive Server elapsed time: 13381 ms.
    (501132 rows affected)

  • Clustered-index table
    1> insert into T_TASKITEM_CI SELECT newid(),C_BH_TASK,C_BH_AJ,N_AJBS,C_AJLB,N_JBFY,N_ZT,C_AH FROM T_TASKITEM_CC
    2> GO
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Table: T_TASKITEM_CI scan count 0, logical reads: (regular=6422447 apf=0 total=6422447), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Table: T_TASKITEM_CC scan count 1, logical reads: (regular=10957 apf=27 total=10984), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Total writes for this command: 11945
    Execution Time 176.
    Adaptive Server cpu time: 17350 ms. Adaptive Server elapsed time: 28206 ms.
    (501132 rows affected)

    Metric           Clustered index    Nonclustered index
    Writes           11945              3538
    Logical reads    6422447            2025588
    Elapsed time     28206 ms           13381 ms

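Conclusion: for the same number of rows, the insert into the nonclustered-index table took about half as long as the insert into the clustered-index table (13,381 ms vs 28,206 ms) and needed roughly two-thirds less I/O.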
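The ratios behind that conclusion can be recomputed directly from the statistics output. The numbers below are copied from the two INSERT runs; the per-row figure is just logical reads divided by rows inserted, with no claim about where the extra reads come from.

```python
# Figures from the two INSERT runs above (501,132 rows each)
rows = 501_132
clustered = {"writes": 11_945, "logical_reads": 6_422_447, "elapsed_ms": 28_206}
nonclustered = {"writes": 3_538, "logical_reads": 2_025_588, "elapsed_ms": 13_381}

speedup = clustered["elapsed_ms"] / nonclustered["elapsed_ms"]
write_reduction = 1 - nonclustered["writes"] / clustered["writes"]
read_reduction = 1 - nonclustered["logical_reads"] / clustered["logical_reads"]

print(f"time ratio:    {speedup:.2f}x")         # 2.11x
print(f"fewer writes:  {write_reduction:.0%}")  # 70%
print(f"fewer reads:   {read_reduction:.0%}")   # 68%
print(f"reads per row: {clustered['logical_reads'] / rows:.1f} vs "
      f"{nonclustered['logical_reads'] / rows:.1f}")  # 12.8 vs 4.0
```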

4. Delete comparison

  • Prepare the rows to delete

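Sort by the index column C_BH, take the C_BH values at positions 100, 200, 300, and so on in that order (every 100th row), and store the keys to delete in T_DELETE_CI_BH and T_DELETE_NCI_BH respectively.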

    select C_BH,N_ORDER = identity(10) INTO T_ALL_CI_BH FROM T_TASKITEM_CI ORDER BY C_BH asc
    SELECT C_BH,N_ORDER INTO T_DELETE_CI_BH FROM T_ALL_CI_BH WHERE N_ORDER%100 = 0
    select C_BH,N_ORDER = identity(10) INTO T_ALL_NCI_BH FROM T_TASKITEM_NCI ORDER BY C_BH asc
    SELECT C_BH,N_ORDER INTO T_DELETE_NCI_BH FROM T_ALL_NCI_BH WHERE N_ORDER%100 = 0
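The selection above can be mirrored in Python to see exactly which keys it picks. This sketch assumes the identity values are assigned in ORDER BY order, as the SQL relies on; `every_nth_key` is a hypothetical helper and the 1,050 random keys are sample data, not the test table.

```python
import random
import uuid


def every_nth_key(keys, n=100):
    """Mimic the SQL above: number the keys 1, 2, 3, ... in ascending order
    (N_ORDER) and keep the rows where N_ORDER % n == 0."""
    ordered = sorted(keys)
    return [(key, order) for order, key in enumerate(ordered, start=1) if order % n == 0]


rng = random.Random(3)
keys = [uuid.UUID(int=rng.getrandbits(128), version=4).hex for _ in range(1050)]
picked = every_nth_key(keys)
print(len(picked))                     # 10
print([order for _, order in picked])  # [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
```

Because the picks are evenly spaced in key order, the deletes are spread across the whole table rather than clustered in one region.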

  • Delete from the clustered-index table
    1> DELETE FROM T_TASKITEM_CI where C_BH IN (SELECT C_BH FROM T_DELETE_CI_BH)
    2> go
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Table: T_TASKITEM_CI scan count 0, logical reads: (regular=20004 apf=0 total=20004), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Table: T_DELETE_CI_BH scan count 1, logical reads: (regular=31 apf=0 total=31), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Table: T_TASKITEM_CI scan count 5001, logical reads: (regular=15070 apf=0 total=15070), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Total writes for this command: 241
    Execution Time 1.
    Adaptive Server cpu time: 128 ms. Adaptive Server elapsed time: 379 ms.
    (5001 rows affected)

  • Delete from the nonclustered-index table
    1> DELETE FROM T_TASKITEM_NCI where C_BH IN (SELECT C_BH FROM T_DELETE_NCI_BH)
    2> go
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Parse and Compile Time 0.
    Adaptive Server cpu time: 0 ms.
    Table: T_TASKITEM_NCI scan count 0, logical reads: (regular=20004 apf=0 total=20004), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Table: T_DELETE_NCI_BH scan count 1, logical reads: (regular=31 apf=0 total=31), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Table: T_TASKITEM_NCI scan count 5001, logical reads: (regular=15070 apf=0 total=15070), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    Total writes for this command: 242
    Execution Time 1.
    Adaptive Server cpu time: 128 ms. Adaptive Server elapsed time: 403 ms.
    (5001 rows affected)

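Conclusion: when deleting by the index column, the clustered and nonclustered tables show essentially the same I/O (identical logical reads, 241 vs 242 writes) and similar elapsed time (379 ms vs 403 ms).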
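Summing the per-table logical reads from each statistics output (the target table appears twice, once per scan) makes the delete comparison explicit. The numbers are copied from the two DELETE logs; this is plain arithmetic.

```python
# Figures from the two DELETE runs above (5,001 rows each)
clustered = {"writes": 241, "elapsed_ms": 379,
             "logical_reads": 20_004 + 31 + 15_070}
nonclustered = {"writes": 242, "elapsed_ms": 403,
                "logical_reads": 20_004 + 31 + 15_070}

print(clustered["logical_reads"], nonclustered["logical_reads"])  # 35105 35105
print(nonclustered["writes"] - clustered["writes"])               # 1
time_gap = abs(nonclustered["elapsed_ms"] - clustered["elapsed_ms"]) / clustered["elapsed_ms"]
print(f"{time_gap:.1%}")                                          # 6.3%
```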

Finding clustered indexes

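Inserting unordered primary keys (GUID/UUID) into a clustered-index table costs extra disk I/O and time, so projects that use unordered (GUID/UUID) primary keys prohibit clustered indexes by design. How do you find the tables in a project that break this rule? Use the sp_dba_citable stored procedure written by the DBA team.
Core code: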

    use sybsystemprocs
    GO
    if object_id('sp_dba_citable') is not null
    drop procedure sp_dba_citable
    GO
    create procedure sp_dba_citable
    AS
    --List user tables that have a clustered index, in every database
    --add by wangzhen 2017-07-17
    begin
    declare @temp_sql varchar(500)
    declare @sql varchar(1000)
    declare @dbname varchar(100)
    declare dbname_cursor cursor for select name from master..sysdatabases
    create table #objectinfo (
    dbname varchar(100),
    objid int,
    tablename varchar(300),
    indexid int,
    indexname varchar(300),
    keycnt int,
    indextype varchar(100)
    )
    --@dbname# is a placeholder, replaced with each database name in the loop below
    --indid = 1: clustered index on an allpages-locked table
    --status2 & 512: clustered index on a data-only-locked table
    set @temp_sql = 'insert into #objectinfo '
        + 'select ''@dbname#'', '
        + '  obj.id, '
        + '  obj.name, '
        + '  ind.indid, '
        + '  ind.name, '
        + '  ind.keycnt, '
        + '  ''clustered index'' '
        + 'from @dbname#..sysindexes ind left join @dbname#..sysobjects obj on ind.id = obj.id '
        + 'where (ind.status2 & 512 = 512 or ind.indid = 1) and obj.type = ''U'' '
    open dbname_cursor
    --Fetch before testing @@sqlstatus so the last database is not processed twice
    FETCH dbname_cursor into @dbname
    while @@sqlstatus = 0
    BEGIN
      set @sql = str_replace(@temp_sql, '@dbname#', @dbname)
      EXECUTE(@sql)
      FETCH dbname_cursor into @dbname
    END
    close dbname_cursor
    deallocate cursor dbname_cursor
    select
        t.dbname as "db_name",
        t.objid as "object_id",
        t.tablename as "table_name",
        t.indexname as "index_name"
    from #objectinfo t
    where t.dbname not in ('master', 'tempdb', 'sybsecurity', 'sybsystemdb', 'sybsystemprocs')
    group by t.dbname, t.objid, t.tablename, t.indexname, t.keycnt, t.indextype
    order by t.dbname asc, t.tablename asc
    end
    go
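The procedure's selection logic can be modelled in a few lines of Python, which is handy for checking the predicate without a server. This is a sketch only: the `IndexRow` fields mirror the sysindexes/sysobjects columns the procedure reads, every sample row (names such as DB_DEMO, ci_dol, pk_ci) is invented, and reading status2 bit 512 as "clustered index on a data-only-locked table" is our understanding of the ASE system tables rather than something stated in the original.

```python
from dataclasses import dataclass

CLUSTERED_ON_DOL = 512  # the status2 bit tested in the procedure's WHERE clause
SYSTEM_DBS = {"master", "tempdb", "sybsecurity", "sybsystemdb", "sybsystemprocs"}


@dataclass(frozen=True)
class IndexRow:
    dbname: str
    table: str
    index: str
    indid: int     # sysindexes.indid
    status2: int   # sysindexes.status2
    obj_type: str  # sysobjects.type; 'U' = user table


def is_clustered(row: IndexRow) -> bool:
    # Same predicate as the procedure: indid = 1 or (status2 & 512) = 512
    return row.indid == 1 or (row.status2 & CLUSTERED_ON_DOL) == CLUSTERED_ON_DOL


def clustered_user_indexes(rows):
    """Distinct (db, table, index) triples the procedure would report, sorted."""
    return sorted({(r.dbname, r.table, r.index) for r in rows
                   if r.obj_type == "U" and is_clustered(r)
                   and r.dbname not in SYSTEM_DBS})


# Invented sample catalog rows, for illustration only
sample = [
    IndexRow("DB_TASK", "T_TASKITEM_CI", "pk_ci", 1, 0, "U"),
    IndexRow("DB_TASK", "T_TASKITEM_NCI", "PK_TASKITEM", 2, 0, "U"),
    IndexRow("DB_DEMO", "T_DOL_TABLE", "ci_dol", 2, 512, "U"),
    IndexRow("master", "t_admin", "ci_admin", 1, 0, "U"),
    IndexRow("DB_TASK", "sysobjects", "csysobjects", 1, 0, "S"),
]
for hit in clustered_user_indexes(sample):
    print(hit)
```

Only the clustered user tables outside the system databases survive: the nonclustered PK_TASKITEM, the row from master and the system table are all filtered out.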

Summary

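In a clustered index, the physical storage order of the data matches the index order, and the lowest level of the index contains the actual data pages. In the test above, inserting a large volume of data keyed on an unordered column (GUID/UUID) took about twice as long with a clustered index as with a nonclustered one and needed roughly three times as much I/O. In fact, NP stipulated from the start of its design that business tables must not define a physical primary key (including a clustered index); they should define a logical primary key instead (unique constraint + index + NOT NULL). Projects that use unordered (GUID/UUID) primary keys can use sp_dba_citable to find clustered-index tables.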
