在oracle数据库中,两个表之间的表连接方法有排序合并连接、嵌套循环连接、哈希连接和笛卡尔连接四种
1.排序合并连接(sort merge join)
排序合并连接是一种两表在做表连接时用排序(sort)操作和合并(merge)操作来得到连接结果集的表连接方法
如果t1表和t2表在做表连接时使用的是排序合并连接,那么oracle会依次执行如下步骤:
a.以目标sql中指定的谓词条件访问t1表,然后对访问结果按照t1表的连接列排序,排好序后的结果集记为s1
b.以目标sql中指定的谓词条件访问t2表,然后对访问结果按照t2表的连接列排序,排好序后的结果集记为s2
c.对s1和s2进行合并操作,从中取出匹配记录作为最终的结果集
排序合并连接的优缺点及适用场景:
a.通常情况下hash join的效果都比sort merge join要好,但是,如果行源已经被排过序,在执行sort merge join时不需要再排序,这时sort merge join的性能会优于hash join
b.通常情况下,只有在以下情况发生时,才会使用排序合并连接:
1)rbo模式
2)不等值连接(>,<,>=,<=)
3)哈希连接被禁用时(_hash_join_enabled=false)
4)数据源已排序
–示例
1 sql> select * from scott.emp t1,scott.emp t2 where t1.empno > t2.mgr;
2
3 89 rows selected.
4
5
6 execution plan
7 ----------------------------------------------------------
8 plan hash value: 3950110903
9
10 ----------------------------------------------------------------------------------------
11 | id | operation | name | rows | bytes | cost (%cpu)| time |
12 ----------------------------------------------------------------------------------------
13 | 0 | select statement | | 62 | 4712 | 6 (17)| 00:00:01 |
14 | 1 | merge join | | 62 | 4712 | 6 (17)| 00:00:01 |
15 | 2 | sort join | | 14 | 532 | 2 (0)| 00:00:01 |
16 | 3 | table access by index rowid| emp | 14 | 532 | 2 (0)| 00:00:01 |
17 | 4 | index full scan | pk_emp | 14 | | 1 (0)| 00:00:01 |
18 |* 5 | sort join | | 14 | 532 | 4 (25)| 00:00:01 |
19 | 6 | table access full | emp | 14 | 532 | 3 (0)| 00:00:01 |
20 ----------------------------------------------------------------------------------------
21
22 predicate information (identified by operation id):
23 ---------------------------------------------------
24
25 5 - access(internal_function("t1"."empno")>internal_function("t2"."mgr"))
26 filter(internal_function("t1"."empno")>internal_function("t2"."mgr"))
27
28
29 statistics
30 ----------------------------------------------------------
31 1 recursive calls
32 0 db block gets
33 8 consistent gets
34 0 physical reads
35 0 redo size
36 6612 bytes sent via sql*net to client
37 575 bytes received via sql*net from client
38 7 sql*net roundtrips to/from client
39 2 sorts (memory)
40 0 sorts (disk)
41 89 rows processed
42
43 sql>
2.嵌套循环连接(nested loops join)
嵌套循环连接是一种两表在做表连接时依靠两层嵌套循环(外层循环/内层循环)来得到连接结果集的表连接方法
如果t1表和t2表在做表连接时使用的是嵌套循环连接,那么oracle会依次执行如下步骤:
a.首先,优化器会按照一定的规则来决定t1和t2谁是驱动表谁是被驱动表,驱动表用于外层循环,被驱动表用于内存循环。假设t1是驱动表
b.以目标sql中指定的谓词条件访问驱动表t1,得到结果集s1
c.遍历s1,同时遍历被驱动表t2,即取出s1中的记录按照连接条件和被驱动表t2做匹配。最终将得到的结果集返回
嵌套循环连接的优缺点及适用场景:
a.能够实现快速响应,即可以第一时间先返回已经连接过且满足连接条件的记录,而不必等待所有的连接操作全部做完后才返回连接结果
b.适用于驱动表所对应的驱动结果集的记录数较少,同时在被驱动表的连接列上又存在唯一性索引(或者在被驱动表的连接列上存在选择性很好的非唯一性索引)的情况
–示例
sql> select /*+ gather_plan_statistics use_nl(t1,t2)*/* from scott.emp t1,scott.dept t2 where t1.deptno = t2.deptno;
sql> select * from table(dbms_xplan.display_cursor(null,0,'allstats,last'));
plan_table_output
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sql_id dcsf9m1rzzga5, child number 0
-------------------------------------
select /*+ gather_plan_statistics use_nl(t1,t2)*/* from scott.emp
t1,scott.dept t2 where t1.deptno = t2.deptno
plan hash value: 4192419542
-------------------------------------------------------------------------------------
| id | operation | name | starts | e-rows | a-rows | a-time | buffers |
-------------------------------------------------------------------------------------
| 0 | select statement | | 1 | | 14 |00:00:00.01 | 32 |
| 1 | nested loops | | 1 | 14 | 14 |00:00:00.01 | 32 |
| 2 | table access full| dept | 1 | 4 | 4 |00:00:00.01 | 7 |
|* 3 | table access full| emp | 4 | 4 | 14 |00:00:00.01 | 25 |
-------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
3 - filter("t1"."deptno"="t2"."deptno")
21 rows selected.
sql>
3.哈希连接(hash join)
哈希连接是一种两表在做表连接时依靠哈希运算来得到连接结果集的表连接方法,oracle 7.3之后引入
hash join的工作方式是将一个表(通常是小一点的那个表)做hash运算并存储到hash列表中,从另一个表中抽取记录,做hash运算,到hash 列表中找到相应的值,做匹配
哈希连接只适用于cbo,也只能用于等值连接条件
哈希连接很适合于小表和大表做连接,特别是在小表的连接列的可选择性非常好的情况下,这时候哈希连接的执行时间就可以近似看作是和全表扫描那个大表所耗费的时间相当
哈希连接时,驱动结果集对应的hash table能够完全被容纳在内存中(pga的工作区),此时的哈希连接的执行效率非常高
哈希连接的性能问题可以通过10104事件来诊断,相关说明如下:
number of in-memory partitions (may have changed): hash partition
final number of hash buckets: hash bucket数量
total buckets: empty buckets: non-empty buckets: hash bucket中空记录及非空记录的情况
total number of rows: 驱动结果集的记录数
maximum number of rows in a bucket: 包含记录数最多的hash bucket所含记录的数量
disabled bitmap filtering: 是否启用位图过滤
–示例
1 sql> select /*+ gather_plan_statistics use_hash(t1,t2)*/* from scott.emp t1,scott.dept t2 where t1.deptno = t2.deptno;
2 sql> select * from table(dbms_xplan.display_cursor(null,0,'allstats,last'));
3
4 plan_table_output
5 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
6 sql_id 0j83q86ara5u2, child number 0
7 -------------------------------------
8 select /*+ gather_plan_statistics use_hash(t1,t2)*/* from scott.emp
9 t1,scott.dept t2 where t1.deptno = t2.deptno
10
11 plan hash value: 615168685
12
13 ----------------------------------------------------------------------------------------------------------------
14 | id | operation | name | starts | e-rows | a-rows | a-time | buffers | omem | 1mem | used-mem |
15 ----------------------------------------------------------------------------------------------------------------
16 | 0 | select statement | | 1 | | 14 |00:00:00.01 | 13 | | | |
17 |* 1 | hash join | | 1 | 14 | 14 |00:00:00.01 | 13 | 1321k| 1321k| 1070k (0)|
18 | 2 | table access full| dept | 1 | 4 | 4 |00:00:00.01 | 6 | | | |
19 | 3 | table access full| emp | 1 | 14 | 14 |00:00:00.01 | 7 | | | |
20 ----------------------------------------------------------------------------------------------------------------
21
22 predicate information (identified by operation id):
23 ---------------------------------------------------
24
25 1 - access("t1"."deptno"="t2"."deptno")
26
27
28 21 rows selected.
29
30 sql>