Index Pitfalls: Indexes and max(), min()

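The previous section covered how indexes relate to count(*). Now let's look at another kind of aggregate query, max() and min(), and how it relates to indexes. Do you think these aggregates can use an index?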

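After the previous section, some of you might answer: "Yes, but the indexed column has to be the primary key, or the query has to say where column is not null." That answer deserves credit; it is exactly right, so the earlier material was not wasted. But which index scan method is used? The previous section used INDEX FULL SCAN, as you will remember. If max() and min() are to use the index, do they also go through INDEX FULL SCAN?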

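Think about how an index is structured: it runs from root to branch to leaf, like a pyramid. The leaf level at the bottom of the pyramid is ordered, for example ascending from left to right, or descending. Given that, does max() or min() really need an INDEX FULL SCAN? Finding the head or the tail of the leaf level already gives the minimum or maximum, so why traverse every leaf?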

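This is where another Oracle index scan type comes in: INDEX FULL SCAN (MIN/MAX), with the extra (MIN/MAX) keyword. INDEX FULL SCAN (MIN/MAX) implies a stopkey mechanism: the scan starts at the leftmost or rightmost leaf block and stops as soon as it has read the first value.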
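The idea can be sketched with a toy model. The following Python is only an illustration of the root, branch, leaf structure described above, not Oracle's actual implementation; the names `Node`, `build_index` and `probe`, as well as the block size and fan-out, are assumptions made for the example. It walks one edge of the tree and stops at the first key it reads.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One index block: a leaf holds sorted keys, a branch holds children."""
    keys: List[int] = field(default_factory=list)         # used by leaves
    children: List["Node"] = field(default_factory=list)  # used by root/branches

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build_index(values: List[int], leaf_size: int = 4, fanout: int = 4) -> Node:
    """Build a toy root -> branch -> leaf structure (values must be non-empty)."""
    ordered = sorted(values)
    level = [Node(keys=ordered[i:i + leaf_size])
             for i in range(0, len(ordered), leaf_size)]
    while len(level) > 1:
        level = [Node(children=level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def probe(root: Node, want_max: bool) -> Tuple[int, int]:
    """Walk the leftmost or rightmost edge; return (key, blocks_visited)."""
    node, visited = root, 1
    while not node.is_leaf:
        node = node.children[-1] if want_max else node.children[0]
        visited += 1
    # Stop key: the first entry read at the edge leaf is already the answer.
    return (node.keys[-1] if want_max else node.keys[0]), visited


index = build_index(list(range(1, 1001)))   # 250 leaf blocks in this toy model
print(probe(index, want_max=False))         # (1, 5)
print(probe(index, want_max=True))          # (1000, 5)
```

In this model each probe touches one block per level (5 here), no matter how many leaf blocks exist, which is the intuition behind the very low cost shown in the plans that follow.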

Checking the max() query shows that it does indeed use INDEX FULL SCAN (MIN/MAX):

SQL> explain plan for select max(object_id) from ljb_test where object_id is not null;

Explained

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 613051030
--------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    13 |     2   (0)
|   1 |  SORT AGGREGATE             |              |     1 |    13 |
|   2 |   FIRST ROW                 |              | 49190 |   624K|     2   (0)
|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("OBJECT_ID" IS NOT NULL)

Note
   - dynamic sampling used for this statement

19 rows selected

Checking the min() query shows that it also uses INDEX FULL SCAN (MIN/MAX):

SQL> explain plan for select min(object_id) from ljb_test where object_id is not null;

Explained

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 613051030
--------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    13 |     2   (0)
|   1 |  SORT AGGREGATE             |              |     1 |    13 |
|   2 |   FIRST ROW                 |              | 49190 |   624K|     2   (0)
|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("OBJECT_ID" IS NOT NULL)

Note
   - dynamic sampling used for this statement

19 rows selected

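By now you should understand why the execution plan uses INDEX FULL SCAN (MIN/MAX) for max() and min(). Given the right information, Oracle naturally chooses this scan method for such queries, and I hope the principle behind that choice is clear. Some may say this knowledge is useless: Oracle decides how to use the index on its own, so knowing about INDEX FULL SCAN (MIN/MAX) changes nothing. In my view, understanding more is always useful, especially when it comes to underlying principles. For example, let me ask this: how does Oracle handle select min(object_id),max(object_id) from ljb_test where object_id is not null? What would you answer?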

Let me run the experiment (many people guess it will still be INDEX FULL SCAN (MIN/MAX)):

The result is below: the index scan type is INDEX FAST FULL SCAN, and the (MIN/MAX) keyword is nowhere to be seen. What happened?

SQL> explain plan for select min(object_id),max(object_id) from ljb_test where object_id is not null;

Explained

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1341606234
--------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |     1 |    13 |    61   (4)| 00:0
|   1 |  SORT AGGREGATE       |              |     1 |    13 |            |
|*  2 |   INDEX FAST FULL SCAN| IDX_LJB_TEST | 49190 |   624K|    61   (4)| 00:0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("OBJECT_ID" IS NOT NULL)

Note
-----
   - dynamic sampling used for this statement

18 rows selected

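It turns out that this SQL statement asks Oracle to fetch both values from the index in a single operation. INDEX FULL SCAN (MIN/MAX) cannot return two values in one pass, so Oracle had to choose INDEX FAST FULL SCAN instead, reading every leaf entry of the index and picking up both values along the way.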
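A rough way to see why a single pass cannot stop early: a fast full scan reads index blocks in physical order rather than key order, so neither the smallest nor the largest key is known until every entry has been seen. The Python below is a sketch under that assumption; the function name `min_max_single_pass` and the block layout are made up for illustration.

```python
import random
from typing import List, Optional, Tuple


def min_max_single_pass(
    blocks: List[List[int]],
) -> Tuple[Optional[int], Optional[int], int]:
    """Return (min, max, entries_read) after visiting every entry once."""
    lowest: Optional[int] = None
    highest: Optional[int] = None
    entries_read = 0
    for block in blocks:              # blocks arrive in no particular key order
        for key in block:
            entries_read += 1
            if lowest is None or key < lowest:
                lowest = key
            if highest is None or key > highest:
                highest = key
    return lowest, highest, entries_read


keys = list(range(1, 1001))
blocks = [keys[i:i + 4] for i in range(0, len(keys), 4)]
random.Random(0).shuffle(blocks)      # simulate physical, not logical, order
print(min_max_single_pass(blocks))    # (1, 1000, 1000)
```

All 1000 entries are read to produce two values, which mirrors the jump in cost from 2 to 61 in the plan above.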

Once the principle is clear, the problem is easy to handle. Rewrite the statement as follows:

select (select max(object_id) from ljb_test where object_id is not null) c, (select min(object_id) from ljb_test where object_id is not null) b from dual;

Now the plan finally uses INDEX FULL SCAN (MIN/MAX). As you can see, it is very effective: two INDEX FULL SCAN (MIN/MAX) operations cost only 4 in total, far below the cost of 61 for a single INDEX FAST FULL SCAN.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
Plan hash value: 3189180828
------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |    26 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |     1 |    26 |     4   (0)| 00:00:01 |
|   2 |   VIEW                        |              |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE             |              |     1 |    13 |            |          |
|   4 |     FIRST ROW                 |              | 49190 |   624K|     2   (0)| 00:00:01 |
|*  5 |      INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)| 00:00:01 |
|   6 |   VIEW                        |              |     1 |    13 |     2   (0)| 00:00:01 |
|   7 |    SORT AGGREGATE             |              |     1 |    13 |            |          |
|   8 |     FIRST ROW                 |              | 49190 |   624K|     2   (0)| 00:00:01 |
|*  9 |      INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("OBJECT_ID" IS NOT NULL)
   9 - filter("OBJECT_ID" IS NOT NULL)

Note
   - dynamic sampling used for this statement

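Summary: max() and min() are used very frequently in SQL, and billing projects are full of reports that need them. I encourage you to create an index for such queries; as long as the column is guaranteed not to be null, the query may be able to use the INDEX FULL SCAN (MIN/MAX) scan method, which can improve query performance considerably. Beyond that, with a little thought you can rewrite SQL so that a statement that could not use INDEX FULL SCAN (MIN/MAX), such as select min(object_id),max(object_id) from ljb_test where object_id is not null, can use it after the rewrite. I hope this encourages developers to apply the SQL knowledge they already have to write efficient statements.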

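Further thoughts: as mentioned earlier, INDEX FULL SCAN (MIN/MAX) implies a stopkey mechanism. Readers with some tuning background will recognize stopkey. It often appears in the execution plans of pagination queries, and when it does, the plan can generally be considered a sound one.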

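Take select * from (select * from table where id = :id order by name desc) where rownum<11; as an example. The statement means: for a given id value, sort by name and return the first 10 rows. It has two parts: id equals some value, and name in descending order. Suppose an index exists on (id, name desc). Within the same id, its entries are stored in descending name order, so the index satisfies both conditions at once and the query gets faster: Oracle only needs to read 10 rowids from the index and then visit the table with those 10 rowids, which is very quick. Pagination statements of this kind can therefore be sped up by building an index that follows the intent of the SQL. However, if the where clause contains a non-equality condition, no index can satisfy both conditions at the same time (the structure of an index makes this easy to see), and a suitable index has to be chosen based on the selectivity of the columns.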
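The reasoning above can be checked with a small simulation. The Python below is a sketch only: the rows, the `Entry` layout and the helper `first_n_for_id` are hypothetical, and a sorted list stands in for an index on (id, name desc). It shows that an equality predicate on id allows the read to stop after 10 entries, while a range predicate does not.

```python
from bisect import bisect_left
from typing import List, Tuple

Entry = Tuple[int, str, int]  # (id, name, rowid); all values are made up

# Hypothetical table rows: 300 rows spread over ids 0, 1 and 2.
rows: List[Entry] = [(n % 3, f"name{n:03d}", n) for n in range(300)]

# Toy index on (id, name desc): sort by name descending first, then rely on
# the stable sort to order by id while keeping names descending within an id.
index: List[Entry] = sorted(sorted(rows, key=lambda e: e[1], reverse=True),
                            key=lambda e: e[0])
index_ids = [e[0] for e in index]


def first_n_for_id(target_id: int, n: int = 10) -> List[Entry]:
    """Equality on id: seek to the first matching entry, read n, then stop."""
    pos = bisect_left(index_ids, target_id)
    picked: List[Entry] = []
    while pos < len(index) and index[pos][0] == target_id and len(picked) < n:
        picked.append(index[pos])
        pos += 1
    return picked


page = first_n_for_id(1)
print([e[1] for e in page][:3])   # ['name298', 'name295', 'name292']

# Range predicate (id >= 1): index order is no longer name order, so reading
# the first 10 entries is not enough and the whole range has to be sorted.
in_range = [e for e in index if e[0] >= 1]
by_index_order = in_range[:10]
by_name = sorted(in_range, key=lambda e: e[1], reverse=True)[:10]
print(by_index_order == by_name)  # False
```

With id = 1 the first 10 index entries are already the answer. With id >= 1 the entries for id 1 come before those for id 2, so the globally largest name (name299, under id 2) is not among the first 10 entries read.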