Oracle: Using the 10104 Event to Analyze a Hash Join

A few days ago I worked through a SQL tuning case. The statement is below. big_table is about 150 GB; small_table is tiny, just over 9,000 rows and under 1 MB. hash_area_size and sort_area_size were both set large enough to allow an optimal (fully in-memory) hash join and an in-memory sort.
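For reference, the session-level setup this assumes looks roughly like the sketch below; the 100 MB values are illustrative placeholders, not the values from the actual system:

alter session set workarea_size_policy = manual; -- required for the *_area_size parameters to take effect
alter session set hash_area_size = 104857600;    -- ~100 MB, hypothetical value
alter session set sort_area_size = 104857600;    -- ~100 MB, hypothetical value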

select /*+ leading(b) use_hash(a b) */ distinct a.id
from big_table a, small_table b
where (a.category = b.from_cat or a.category2 = b.from_cat)
and a.site_id = b.site_id
and a.sale_end >= sysdate;

The execution plan:

-------------------------------------------------------------------------
| id | operation            | name        | rows  | bytes | cost (%cpu)|
-------------------------------------------------------------------------
|  0 | select statement     |             |     2 |   174 |     18 (17)|
|  1 |  sort unique         |             |     2 |   174 |     18 (17)|
|* 2 |   hash join          |             |     2 |   174 |     17 (12)|
|  3 |    table access full | small_table |  1879 | 48854 |     14  (8)|
|* 4 |    table access full | big_table   |     4 |   244 |      3 (34)|
-------------------------------------------------------------------------

predicate information (identified by operation id):
---------------------------------------------------

   2 - access("a"."site_id"="b"."site_id")
       filter("a"."category"="b"."from_cat" or
              "a"."category2"="b"."from_cat")
   4 - filter("a"."sale_end">=sysdate@!)

At first glance the plan looks perfect and the hints are spot on: the small table is the build input, hashed in memory, and the big table probes it from the outside. Experience says the whole statement should take about as long as an FTS (full table scan) of big_table.

But an FTS of big_table takes only about 8 minutes, while the whole statement ran for 3 to 4 hours.

So where exactly is the problem?

The FTS time should not change much, so the problem has to be in the hash join. Set an event to trace the hash join process:

alter session set events '10104 trace name context forever, level 2';
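The output goes to the session trace file; assuming 11g or later, its location can be found with:

select value from v$diag_info where name = 'Default Trace File';

The hash table section of the resulting trace: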
### hash table ###
# note: the calculated number of rows in non-empty buckets may be smaller
# than the true number.
number of buckets with 0 rows: 16373
number of buckets with 1 rows: 0
number of buckets with 2 rows: 0
number of buckets with 3 rows: 1
number of buckets with 4 rows: 0
number of buckets with 5 rows: 0
number of buckets with 6 rows: 0
number of buckets with 7 rows: 1
number of buckets with 8 rows: 0
number of buckets with 9 rows: 0
number of buckets with between 10 and 19 rows: 1
number of buckets with between 20 and 29 rows: 1
number of buckets with between 30 and 39 rows: 3
number of buckets with between 40 and 49 rows: 0
number of buckets with between 50 and 59 rows: 0
number of buckets with between 60 and 69 rows: 0
number of buckets with between 70 and 79 rows: 0
number of buckets with between 80 and 89 rows: 0
number of buckets with between 90 and 99 rows: 0
number of buckets with 100 or more rows: 4
### hash table overall statistics ###
total buckets: 16384
empty buckets: 16373
non-empty buckets: 11
total number of rows: 9232
maximum number of rows in a bucket: 2531
average number of rows in non-empty buckets: 839.272705

Look closely: the fullest bucket holds as many as 2,531 rows. A bucket is a linked list, so those thousands of rows are all chained on a single list.
This points to the hash key behind the table having too few distinct values and far too many duplicates; otherwise that many rows could not land in the same bucket.
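A quick sanity check of that suspicion against the build table (a sketch; the concatenation is only a rough way to count distinct pairs and assumes the values contain no commas):

select count(distinct site_id) as site_ids,
       count(distinct from_cat) as from_cats,
       count(distinct site_id || ',' || from_cat) as pairs
from small_table;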

The join condition involves two columns, from_cat and site_id, so enumerating the possible hash keys gives three cases:

1. build hash table based on (site_id, from_cat):

sql> select site_id,from_cat,count(*) from small_table group by site_id,from_cat having count(*)>100;

no rows selected

2. build hash table based on (from_cat):

sql> select from_cat,count(*) from small_table group by from_cat having count(*)>100;

no rows selected

3. build hash table based on (site_id):

sql> select site_id,count(*) from small_table group by site_id having count(*)>100;

   site_id   count(*)
---------- ----------
         0       2531
         2       2527
       146       1490
       210       2526

Here we can see that case 3, building on site_id alone, matches these two lines of the trace file:

number of buckets with 100 or more rows: 4
maximum number of rows in a bucket: 2531

Note: this can also be read straight from the "predicate information" section of the execution plan:

access("a"."site_id"="b"."site_id")

So the hash table must have been built on site_id alone. A huge number of rows in big_table have site_id = 0, and all of them hash to the bucket with the longest linked list; most of those rows walk the entire chain only to be thrown away at the end. That makes the hash join extremely inefficient, close to degenerating into a nested loops operation.
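To confirm the skew on the probe side without another full 150 GB scan, a sampled count is enough; this is a sketch using Oracle's sample clause, and the 1% rate is arbitrary:

select site_id, count(*)
from big_table sample (1)  -- read roughly 1% of the blocks' rows
group by site_id
order by count(*) desc;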

With the root cause identified, the fix follows naturally.

Ideally the hash table should be built on (site_id, from_cat), so the problem must be the or: a predicate like (a.category = b.from_cat or a.category2 = b.from_cat) cannot serve as a single equality hash key, which leaves site_id as the only access predicate. Rewrite the or as a union:

select /*+ leading(b) use_hash(a b) */ distinct a.id
from big_table a, small_table b
where a.category = b.from_cat
and a.site_id = b.site_id
and a.sale_end >= sysdate
union
select /*+ leading(b) use_hash(a b) */ distinct a.id
from big_table a, small_table b
where a.category2 = b.from_cat
and a.site_id = b.site_id
and a.sale_end >= sysdate;
-------------------------------------------------------------------------
| id | operation             | name        | rows  | bytes | cost (%cpu)|
-------------------------------------------------------------------------
|  0 | select statement      |             |     2 |   148 |     36 (59)|
|  1 |  sort unique          |             |     2 |   148 |     36 (59)|
|  2 |   union-all           |             |       |       |            |
|* 3 |    hash join          |             |     1 |    74 |     17 (12)|
|  4 |     table access full | small_table |  1879 | 48854 |     14  (8)|
|* 5 |     table access full | big_table   |     4 |   192 |      3 (34)|
|* 6 |    hash join          |             |     1 |    74 |     17 (12)|
|  7 |     table access full | small_table |  1879 | 48854 |     14  (8)|
|* 8 |     table access full | big_table   |     4 |   192 |      3 (34)|
-------------------------------------------------------------------------

predicate information (identified by operation id):
---------------------------------------------------

   3 - access("a"."category"="b"."from_cat" and "a"."site_id"="b"."site_id")
   5 - filter("a"."sale_end">=sysdate@!)
   6 - access("a"."category2"="b"."from_cat" and "a"."site_id"="b"."site_id")
   8 - filter("a"."sale_end">=sysdate@!)

At first glance this plan looks worse than the first one, because it runs the FTS of big_table twice. But note the access predicates: both hash joins (operations 3 and 6) now build on (from_cat, site_id). And look at the hash table structure again:

### hash table ###
# note: the calculated number of rows in non-empty buckets may be smaller
# than the true number.
number of buckets with 0 rows: 9306
number of buckets with 1 rows: 5310
number of buckets with 2 rows: 1436
number of buckets with 3 rows: 285
number of buckets with 4 rows: 43
number of buckets with 5 rows: 4
number of buckets with 6 rows: 0
number of buckets with 7 rows: 0
number of buckets with 8 rows: 0
number of buckets with 9 rows: 0
number of buckets with between 10 and 19 rows: 0
number of buckets with between 20 and 29 rows: 0
number of buckets with between 30 and 39 rows: 0
number of buckets with between 40 and 49 rows: 0
number of buckets with between 50 and 59 rows: 0
number of buckets with between 60 and 69 rows: 0
number of buckets with between 70 and 79 rows: 0
number of buckets with between 80 and 89 rows: 0
number of buckets with between 90 and 99 rows: 0
number of buckets with 100 or more rows: 0
### hash table overall statistics ###
total buckets: 16384
empty buckets: 9306
non-empty buckets: 7078
total number of rows: 9232
maximum number of rows in a bucket: 5
average number of rows in non-empty buckets: 1.304323

This is exactly the hash table we want: the longest chain holds only five rows.

The total execution time dropped from three or four hours to 16 minutes, which is almost exactly the cost of the two 8-minute full scans of big_table, far better than the developer expected.

A problem like this is hard to see from the plan alone; it takes an understanding of the hash join mechanism and a deeper level of analysis.
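For future cases, the whole diagnostic loop fits in a few commands (a sketch; the event syntax is the same as used above):

alter session set events '10104 trace name context forever, level 2';
-- run the statement under investigation here
alter session set events '10104 trace name context off';
-- then inspect the "### hash table ###" section of the trace file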
