1. 程式人生 > >SQL Server中LIKE %search_string% 走索引查詢(Index Seek)淺析

SQL Server中LIKE %search_string% 走索引查詢(Index Seek)淺析

在SQL Server的SQL優化過程中,如果遇到WHERE條件中包含LIKE '%search_string%'是一件非常頭痛的事情。這種情況下,一般要修改業務邏輯或改寫SQL才能解決SQL執行計劃走索引掃描或全表掃描的問題。最近在優化SQL語句的時候,遇到了一個很有意思的問題。某些使用LIKE '%' + @search_string + '%'(或者 LIKE @search_string)這樣寫法的SQL語句的執行計劃居然走索引查詢(Index Seek)。下面這篇文章來分析一下這個奇怪的現象。

 

首先,我們來看看WHERE查詢條件中使用LIKE的幾種情況,這些是我們對LIKE的一些常規認識:

 

1: LIKE 'condition%'

   

    執行計劃會走索引查詢(Index Seek or Clustered Index Seek)。

   

2:  LIKE '%condition'

 

    執行計劃會走索引掃描(Index Scan or Clustered Index Scan)或全表掃描(Table Scan)

 

3:  LIKE '%condition%'

   

    執行計劃會走索引掃描(Index Scan or Clustered Index Scan)或全表掃描(Table Scan)

 

4: LIKE 'condition1%condition%';

 

    執行計劃會走索引查詢(Index Seek)

 

下面我們以AdventureWorks2014示例資料庫為測試環境(測試環境為SQL Server 2014 SP2),測試上面四種情況,如下所示:

 

clip_image001

 

 

clip_image002

 

 

clip_image003

 

clip_image004

 

 

其實複雜的情況下,LIKE 'search_string%'也有走索引掃描(Index Scan)的情況,上面情況並不是唯一、絕對的。如下所示

 

在表Person.Person的 rowguid欄位上建立有唯一索引AK_Person_rowguid

 

 

clip_image005

 

 

那麼我們來看看上面所說的這個特殊案例(這裡使用一個現成的案例,懶得構造案例了),如何讓LIKE %search_string%走索引查詢(Index Seek),這個技巧就是使用變數,如下SQL對比所示:

 

如下所示,表[dbo].[GEN_CUSTOMER]在欄位CUSTOMER_CD有聚集索引。

 

 

clip_image006

 

 

可以看到CUSTOMER_CD LIKE '%' + @CUSTOMER_CD + '%'這樣的SQL寫法(或者CUSTOMER_CD LIKE @CUSTOMER_CD也可以), 執行計劃就走聚集索引查詢(Clustered Index Seek)了, 而條件中直接使用CUSTOMER_CD LIKE '%00630%' 反而走聚集索引掃描(Clustered Index Scan),另外可以看到實際執行的Cost開銷比為4% VS 96% ,初一看,還真的以為第一個執行計劃比第二個執行的代價要小很多。但是從IO開銷,以及CPU time、elapsed time對比來看,兩者幾乎沒有什麼差異。在這個案例中,並不是走索引查詢(Index Seek)就真的開銷代價小很多。

 

 

clip_image007

 

 

考慮到這裡資料量較小,我使用網上的一個指令碼,在AdventureWorks2014資料庫構造了一個10000000的大表,然後順便做了一些測試對比

 

CREATE TABLE dbo.TestLIKESearches
(
     ID1         INT
    ,ID2         INT
    ,AString     VARCHAR(100)
    ,Value       INT
    ,PRIMARY KEY (ID1, ID2)
);
 
WITH Tally (n) AS
(
SELECT TOP 10000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO dbo.TestLIKESearches
    (ID1, ID2, AString, Value)
SELECT 1+n/500, n%500
    ,CASE WHEN n%500 > 299 THEN
            SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26, 1) +
            SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26, 1) +
            SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26, 1) +
            RIGHT(1000+n%1000, 3) +
            SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26, 1) +
            SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26, 1) +
            SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26, 1)
          END
    ,1+ABS(CHECKSUM(NEWID()))%100
FROM Tally;
 
 
CREATE INDEX IX_TestLIKESearches_N1 ON dbo.TestLIKESearches(AString);

 

如下測試所示,在一個大表上面,LIKE @search_string這種SQL寫法,IO開銷確實要小一些,CPU Time也要小一些。個人多次測試都是這種結果。也就是說對於資料量較大的表,這種SQL寫法效能確實要好一些。

 

clip_image008

 

clip_image009

 

 

現在回到最開始那個SQL語句,個人對執行計劃有些疑惑,檢視執行計劃,你會看到優化器對CUSTOMER_CD LIKE '%' + @CUSTOMER_CD + '%' 進行了轉換。如下截圖或通過執行計劃的XML,你會發現上面轉換為使用三個內部函式LikeRangeStart, LikeRangeEnd,  LikeRangeInfo.

 

clip_image010

 

<OutputList>
                    <ColumnReference Column="Expr1007" />
                    <ColumnReference Column="Expr1008" />
                    <ColumnReference Column="Expr1009" />
                  </OutputList>
                  <ComputeScalar>
                    <DefinedValues>
                      <DefinedValue>
                        <ColumnReference Column="Expr1007" />
                        <ScalarOperator ScalarString="LikeRangeStart((N'%'+[@CUSTOMER_CD])+N'%')">
                          <Identifier>
                            <ColumnReference Column="ConstExpr1004">
                              <ScalarOperator>
                                <Intrinsic FunctionName="LikeRangeStart">
                                  <ScalarOperator>
                                    <Arithmetic Operation="ADD">
                                      <ScalarOperator>
                                        <Arithmetic Operation="ADD">
                                          <ScalarOperator>
                                            <Const ConstValue="N'%'" />
                                          </ScalarOperator>
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Column="@CUSTOMER_CD" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Arithmetic>
                                      </ScalarOperator>
                                      <ScalarOperator>
                                        <Const ConstValue="N'%'" />
                                      </ScalarOperator>
                                    </Arithmetic>
                                  </ScalarOperator>
                                  <ScalarOperator>
                                    <Const ConstValue="" />
                                  </ScalarOperator>
                                </Intrinsic>
                              </ScalarOperator>
                            </ColumnReference>
                          </Identifier>
                        </ScalarOperator>
                      </DefinedValue>
                      <DefinedValue>
                        <ColumnReference Column="Expr1008" />
                        <ScalarOperator ScalarString="LikeRangeEnd((N'%'+[@CUSTOMER_CD])+N'%')">
                          <Identifier>
                            <ColumnReference Column="ConstExpr1005">
                              <ScalarOperator>
                                <Intrinsic FunctionName="LikeRangeEnd">
                                  <ScalarOperator>
                                    <Arithmetic Operation="ADD">
                                      <ScalarOperator>
                                        <Arithmetic Operation="ADD">
                                          <ScalarOperator>
                                            <Const ConstValue="N'%'" />
                                          </ScalarOperator>
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Column="@CUSTOMER_CD" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Arithmetic>
                                      </ScalarOperator>
                                      <ScalarOperator>
                                        <Const ConstValue="N'%'" />
                                      </ScalarOperator>
                                    </Arithmetic>
                                  </ScalarOperator>
                                  <ScalarOperator>
                                    <Const ConstValue="" />
                                  </ScalarOperator>
                                </Intrinsic>
                              </ScalarOperator>
                            </ColumnReference>
                          </Identifier>
                        </ScalarOperator>
                      </DefinedValue>
                      <DefinedValue>
                        <ColumnReference Column="Expr1009" />
                        <ScalarOperator ScalarString="LikeRangeInfo((N'%'+[@CUSTOMER_CD])+N'%')">
                          <Identifier>
                            <ColumnReference Column="ConstExpr1006">
                              <ScalarOperator>
                                <Intrinsic FunctionName="LikeRangeInfo">
                                  <ScalarOperator>
                                    <Arithmetic Operation="ADD">
                                      <ScalarOperator>
                                        <Arithmetic Operation="ADD">
                                          <ScalarOperator>
                                            <Const ConstValue="N'%'" />
                                          </ScalarOperator>
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Column="@CUSTOMER_CD" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Arithmetic>
                                      </ScalarOperator>
                                      <ScalarOperator>
                                        <Const ConstValue="N'%'" />
                                      </ScalarOperator>
                                    </Arithmetic>
                                  </ScalarOperator>
                                  <ScalarOperator>
                                    <Const ConstValue="" />
                                  </ScalarOperator>
                                </Intrinsic>
                              </ScalarOperator>
                            </ColumnReference>
                          </Identifier>
                        </ScalarOperator>
                      </DefinedValue>
                    </DefinedValues>

 

 

另外,你會發現Nested Loops & Compute Scalar 等步驟的Cost都為0.後面在“Dynamic Seeks and Hidden Implicit Conversions”這篇部落格裡面看到了一個新名詞Dynamic Seeks。文字提到因為成本估算為0,所以,你看到的執行計劃的Cost又是“不準確”的,具體描述如下:

 

The plan now contains an extra Constant Scan,  a Compute Scalar and a Nested Loops Join.  These operators are interesting because they have zero cost estimates: no CPU, no I/O, nothing.  That’s because they are purely architectural: a workaround for the fact that SQL Server cannot currently perform a dynamic seek within the Index Seek operator itself.  To avoid affecting plan choices, this extra machinery is costed at zero.

The Constant Scan produces a single in-memory row with no columns.  The Compute Scalar defines expressions to describe the covering seek range (using the runtime value of the @Like variable).  Finally, the Nested Loops Join drives the seek using the computed range information as correlated values.

The upper tooltip shows that the Compute Scalar uses three internal functions,LikeRangeStartLikeRangeEnd, and LikeRangeInfo.  The first two functions describe the range as an open interval.  The third function returns a set of flags encoded in an integer, that are used internally to define certain seek properties for the Storage Engine.  The lower tooltip shows the seek on the open interval described by the result ofLikeRangeStart and LikeRangeEnd, and the application of the residual predicate ‘LIKE @Like’.

 

 

 不管你返回的記錄有多少,執行計劃Nested Loops & Compute Scalar 等步驟的Cost都為0,如下測試所示,返回1000條記錄,它的成本估算依然為0 ,顯然這樣是不夠精確的。深層次的原因就不太清楚了。執行計劃Cost不可靠的案例很多。

 

SET STATISTICS IO ON;
 
SET STATISTICS TIME ON;
 
DECLARE @CUSTOMER_CD NVARCHAR(10);
 
SET @CUSTOMER_CD=N'%44%'
 
 
 
SELECT * FROM  [dbo].[GEN_CUSTOMER] WHERE CUSTOMER_CD LIKE @CUSTOMER_CD

 

 

 

另外,其實還一點沒有搞清楚的時候在什麼條件下出現Index Seek的情況。有些情況下,使用變數的方式,依然是索引掃描

 

 

clip_image011

 

 

不過我在測試過程,發現有一個原因是書籤查詢(Bookmark Lookup:鍵查詢(Key Lookup)或RID查詢 (RID Lookup))開銷過大會導致索引掃描。如下測試對比所示:

 

CREATE NONCLUSTERED INDEX [IX_xriteWhite_N1] ON.[dbo].[xriteWhite] ([Item_NO]) INCLUDE ([Iden],[WI_CE],[CIE],[Operate_Time])

 

 

clip_image012

 

作者:瀟湘隱者

出處:http://www.cnblogs.com/kerrycode/

 

 

鄭州不孕不育檢查

鄭州不孕不育

鄭州最好的不孕不育醫院

鄭州不孕不育