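Testing driving tables under Oracle hash join and nested loops
Oracle driving tables
In Oracle, the driving table (also called the outer table) is the table read first in a multi-table join. For each row of the driving table, Oracle looks up the matching rows in the other table and then computes the final result.
The concept of a driving table applies only to nested loops and hash joins (in a hash join plan it is the first table listed under HASH JOIN).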
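Principles:
1. The driving table is usually the small table, but not always; see below.
2. The driving table is usually the table with fewer rows left after the WHERE conditions are applied.
3. A table whose rows are very long, each spanning several data blocks, can also suit being the driving table.
4. The CBO and the RBO choose the driving table differently: the CBO computes its choice from statistics, whereas the RBO follows fixed rules.
5. For queries involving a driving table, an index on the join column matters a great deal. The driving table's join column can do without an index, but the driven table is scanned once for every row that remains in the driving table after filtering, so an index on the driven table's join column is very important.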
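To make the last principle concrete, here is a minimal PL/SQL sketch of what "driving" means in a nested loops join. It assumes tables shaped like the a(id, name) and b(object_id, ...) tables created in the experiment further down; the variable and loop names are made up for illustration, and this is only a picture of the access pattern, not how Oracle implements the join internally.

```sql
set serveroutput on

declare
  l_matches pls_integer := 0;  -- hypothetical counter, for illustration only
begin
  -- Outer loop: the driving table is read once, after its filter is applied.
  for drv in (select id from a where id = 53) loop
    -- Inner loop: the driven table is probed once per surviving driving row.
    -- Without an index on b.object_id, every probe is a full scan of b.
    for hit in (select object_id from b where object_id = drv.id) loop
      l_matches := l_matches + 1;
    end loop;
  end loop;
  dbms_output.put_line('matching pairs: ' || l_matches);
end;
/
```

The fewer rows the outer query returns, the fewer times the inner query runs, which is why the filtered row count of the driving table and the index on the driven table's join column both matter.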
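Analysis:
Suppose table a has 10 rows and table b has 1000 rows. Both tables have an id column, and the query joins on id:
Select * from a,b where a.id=b.id and a.id=100;
Table a is the better driving table. Suppose a.id=100 matches only 1 row; even a full scan of a touches only a few blocks. Suppose a occupies 10 blocks.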
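Suppose b.id is not unique and b has an index on id, with the index occupying 10 blocks and the rows with id=100 occupying 2 blocks (the figures used in the calculation below).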
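Then the cost of this statement (counted in blocks, here and below; this is a rough model, not the optimizer's actual formula) is:
table a (10 blocks) * index on b (10 blocks) + 2 blocks of b holding id=100 = 102 blocks
If b has no index, the cost is:
table a (10 blocks) * table b (100 blocks) = 1000 blocks
If neither a nor b has an index, the cost is the same whichever table drives.
If both a and b have an index on id, and the index on a.id occupies 2 blocks, the cost is:
index on a.id (2 blocks) * index on b.id (10 blocks) + 2 blocks of b holding id=100 = 22 blocks
If the rows of b are very long, the case for making b the driving table is more complicated; readers can work out suitable scenarios themselves.
This shows how important an index on the join column is.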
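The three block counts can be recomputed with a one-line query against dual. This is only a sketch of the rough model used in the analysis (10 blocks for table a, 100 for table b, 10 for the index on b.id, 2 for the index on a.id, 2 blocks holding the id=100 rows of b); the column aliases are made up for illustration, and the numbers say nothing about how the optimizer really costs a plan.

```sql
select 10 * 10 + 2 as a_full_b_index,   -- full scan of a, index on b: 102
       10 * 100    as a_full_b_full,    -- no index on b: 1000
       2 * 10 + 2  as a_index_b_index   -- indexes on both join columns: 22
from   dual;
```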
Supporting experiment
SQL> create table a(id,name) as select object_id,object_name from all_objects where rownum < 200;
Table created.
SQL>
SQL> create table b as select * from all_objects ;
Table created.
SQL> select count(*) from a;
COUNT(*)
----------
199
SQL> select count(*) from b;
COUNT(*)
----------
89083
SQL>
SQL> exec dbms_stats.gather_table_stats('TEST','A');
PL/SQL procedure successfully completed.
SQL>
SQL> exec dbms_stats.gather_table_stats('TEST','B');
PL/SQL procedure successfully completed.
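Once statistics are gathered, the figures the CBO works from can be read back from the data dictionary, which gives real numbers to set against the block counts assumed in the analysis. A minimal sketch, assuming the session is connected as the owner of both tables (TEST in this experiment):

```sql
-- Row and block counts recorded by dbms_stats for the two test tables.
select table_name, num_rows, blocks, avg_row_len, last_analyzed
from   user_tables
where  table_name in ('A', 'B')
order  by table_name;
```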
Neither table has an index
Select count(*) from a,b where a.id=b.object_id
And a.id=53
Execution plan (B is the driving table):
SQL> Select count(*) from a,b where a.id=b.object_id
2 And a.id=53
3 /
COUNT(*)
----------
1
Execution Plan
----------------------------------------------------------
Plan hash value: 319234518
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |     9 |   420   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE     |      |     1 |     9 |            |          |
|*  2 |   HASH JOIN         |      |     1 |     9 |   420   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| B    |     1 |     5 |   417   (1)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| A    |     1 |     4 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"."ID"="B"."OBJECT_ID")
   3 - filter("B"."OBJECT_ID"=53)
   4 - filter("A"."ID"=53)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1506 consistent gets
0 physical reads
0 redo size
542 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
A as the driving table
SQL> Select /*+ ordered use_nl(a) */count(*) from a,b where a.id=b.object_id
2 And a.id=53;
COUNT(*)
----------
1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1397777030
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |     9 |   420   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE     |      |     1 |     9 |            |          |
|*  2 |   HASH JOIN         |      |     1 |     9 |   420   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| A    |     1 |     4 |     3   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| B    |     1 |     5 |   417   (1)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"."ID"="B"."OBJECT_ID")
   3 - filter("A"."ID"=53)
   4 - filter("B"."OBJECT_ID"=53)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1506 consistent gets
0 physical reads
0 redo size
542 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
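The two statements above have the same cost.
/*+ ordered use_nl(table_name) */ -- The ordered hint makes Oracle join the tables in FROM-clause order, so A becomes the driving table. use_nl(a) was specified, yet the plan uses a hash join: use_nl names the inner (driven) table of a nested loop, and A, being first in the join order, cannot be the inner table, so that hint cannot be applied. With no index available, the optimizer then chose a hash join.
In the execution plan, the first table under HASH JOIN is the driving table, here A.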
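If the goal is a nested loop with A driving, the hint has to name B, the table that should sit on the inner side. A minimal sketch follows. It was not run as part of the tests above, so no plan or statistics are shown, and using leading(a) instead of ordered is a substitution of mine rather than the original test.

```sql
-- leading(a): start the join order with A (A drives).
-- use_nl(b):  join B using nested loops, with B as the inner (driven) table.
select /*+ leading(a) use_nl(b) */ count(*)
from   a, b
where  a.id = b.object_id
and    a.id = 53;
```

With no index on b.object_id at this point in the experiment, each probe of B would be a full table scan, so the optimizer's preference for a hash join is understandable.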
Case: table B has an index on object_id
SQL> create index id_b_object_id on b(object_id);
Index created.
SQL> exec dbms_stats.gather_table_stats(ownname => 'TEST',TABNAME => 'B',CASCADE=> TRUE);
PL/SQL procedure successfully completed.
SQL>
Execution plan:
SQL> Select count(*) from a,b where a.id=b.object_id
2 And a.id=53;
COUNT(*)
----------
1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3168189658
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |     1 |     9 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |                |     1 |     9 |            |          |
|   2 |   MERGE JOIN CARTESIAN|                |     1 |     9 |     4   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | A              |     1 |     4 |     3   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |                |     1 |     5 |     1   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN  | ID_B_OBJECT_ID |     1 |     5 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("A"."ID"=53)
   5 - access("B"."OBJECT_ID"=53)
Statistics
----------------------------------------------------------
92 recursive calls
0 db block gets
134 consistent gets
23 physical reads
0 redo size
542 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
12 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
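The plan uses neither nested loops nor a hash join, but once the index is used the cost drops sharply (from 420 to 4). The plan is a MERGE JOIN CARTESIAN, and its predicate section lists no join condition: both row sources are already filtered to the value 53. The BUFFER SORT step holds the second row source in a work area; that is fine while memory suffices, but becomes time-consuming when it does not.
Forcing a hash join
A as the driving table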
SQL> Select /*+ use_hash(a,b) */count(*) from a,b where a.id=b.object_id
2 And a.id=53;
COUNT(*)
----------
1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 895278611
--------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                |     1 |     9 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |                |     1 |     9 |            |          |
|*  2 |   HASH JOIN         |                |     1 |     9 |     4   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| A              |     1 |     4 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN | ID_B_OBJECT_ID |     1 |     5 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"."ID"="B"."OBJECT_ID")
   3 - filter("A"."ID"=53)
   4 - access("B"."OBJECT_ID"=53)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
542 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
-- With a hash join forced, A becomes the driving table by default; the cost is very low, as desired.
Attempting to make B the driving table
SQL> Select /*+ ordered use_hash(b) */count(*) from a,b where a.id=b.object_id
2 And a.id=53;
COUNT(*)
----------
1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 895278611
--------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                |     1 |     9 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |                |     1 |     9 |            |          |
|*  2 |   HASH JOIN         |                |     1 |     9 |     4   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| A              |     1 |     4 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN | ID_B_OBJECT_ID |     1 |     5 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"."ID"="B"."OBJECT_ID")
   3 - filter("A"."ID"=53)
   4 - access("B"."OBJECT_ID"=53)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
542 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
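With the index and statistics in place, B did not become the driving table. This does not mean Oracle ignored the hint, though: ordered asks for the tables to be joined in FROM-clause order, and the FROM clause lists a first, so the hint itself requests A as the driving table.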
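To actually request B as the first (build) table of the hash join, the join order has to start with B. A minimal sketch, not run as part of these tests, so no plan is shown. The no_swap_join_inputs(a) hint is an extra precaution I am assuming is wanted here: it asks the optimizer not to swap the build and probe sides after the join order is fixed.

```sql
-- leading(b):              start the join order with B.
-- use_hash(a):             join A to it with a hash join.
-- no_swap_join_inputs(a):  keep B as the build side rather than swapping in A.
select /*+ leading(b) use_hash(a) no_swap_join_inputs(a) */ count(*)
from   a, b
where  a.id = b.object_id
and    a.id = 53;
```

An equivalent way to keep the ordered hint is to list the tables as "from b, a", since ordered follows the FROM clause.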
Try deleting the statistics:
SQL> EXEC dbms_stats.delete_table_stats(user,'B',cascade_parts =>TRUE);
PL/SQL procedure successfully completed
SQL> EXEC dbms_stats.delete_table_stats(user,'A',cascade_parts =>TRUE);
PL/SQL procedure successfully completed
SQL>
-- Testing shows that B still does not become the driving table; change optimizer_mode to rule
alter session set optimizer_mode=rule;
SQL> Select /*+ ordered use_nl(b) */count(*) from a,b where a.id=b.object_id
2 And object_id=53;
-- B still does not become the driving table (ordered with "from a,b" continues to request A first)
Forcing nested loops
SQL> Select /*+ ordered use_nl(b) */count(*) from a,b where a.id=b.object_id
2 And object_id=53;
COUNT(*)
----------
1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1183094437
--------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                |     1 |    26 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |                |     1 |    26 |            |          |
|   2 |   NESTED LOOPS      |                |     1 |    26 |     4   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| A              |     1 |    13 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN | ID_B_OBJECT_ID |     1 |    13 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("A"."ID"=53)
   4 - access("OBJECT_ID"=53)
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
10 recursive calls
0 db block gets
73 consistent gets
1 physical reads
0 redo size
542 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
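-- The cost is about the same as with the hash join. B is again not the driving table, but that is what this hint asks for: ordered puts A first, and use_nl(b) makes B the inner (driven) table of the nested loop.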
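For comparison, a nested loop that really is driven by B would need the roles reversed in the hint. A minimal sketch, not run as part of these tests, so no plan or statistics are shown:

```sql
-- leading(b): B is read first and drives the loop.
-- use_nl(a):  A is the inner (driven) table, probed once per row from B.
select /*+ leading(b) use_nl(a) */ count(*)
from   a, b
where  a.id = b.object_id
and    b.object_id = 53;
```

At this point in the experiment A has no index on id, so every probe of A would be a full scan of A. That is cheap only because A is tiny and the filter leaves very few rows in B.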
Case: both tables have an index
SQL> create index id_a_id on a(id);
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'A',CASCADE=>TRUE);
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user,'B',cascade => true);
PL/SQL procedure successfully completed.
SQL>
SQL> Select /*+ ordered use_nl(b) */count(*) from a,b where a.id=b.object_id
2 And object_id=53;
COUNT(*)
----------
1
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 2751652919
-------------------------------------------------------------------------------------
| Id  | Operation          | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                |     1 |     9 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                |     1 |     9 |            |          |
|   2 |   NESTED LOOPS     |                |     1 |     9 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN| ID_A_ID        |     1 |     4 |     1   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN| ID_B_OBJECT_ID |     1 |     5 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("A"."ID"=53)
   4 - access("OBJECT_ID"=53)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
542 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
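-- This hint does not make B the driving table either: ordered use_nl(b) requests A first, with B as the inner table.
The cost drops again, by half, from 4 to 2 (which shows how much indexes matter).
I ran these tests on a 12c database. 12c does seem to produce more accurate execution plans, and hints look less and less necessary as an aid. Is this the trend that puts DBAs out of work, or one that lightens their load? ~~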
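When it is unclear whether a hint took effect, one way to check is to print the plan of the last statement together with its outline, which lists the join order and join methods the optimizer settled on as hints such as LEADING and USE_NL. A minimal sketch, assuming the session can read the V$ views that dbms_xplan.display_cursor relies on. Autotrace and serveroutput are switched off so that the "last statement" really is the test query.

```sql
set autotrace off
set serveroutput off

select /*+ ordered use_nl(b) */ count(*)
from   a, b
where  a.id = b.object_id
and    b.object_id = 53;

-- null, null = the last statement executed in this session;
-- +OUTLINE adds the Outline Data section to the plan output.
select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL +OUTLINE'));
```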