Sort merge joinsql
排序合并总结:session
1 一般状况下,排序合并链接的效率远不如hash join,前者适用范围更广,hj只使用于等值链接,smj范围更广(<,>,>=,<=)oracle
2 一般状况下,smj并不适合OLTP系统,排序操做是很是昂贵的操做,ide
优化器会根据必定的规则来肯定表T1,T2谁是驱动表,谁是被驱动表,驱动表用于外层循环,被驱动表用于内存循环,这里假设驱动表时T1,被驱动表时T2oop
目标sql中指定的谓词条件去访问T1,获得的结果集为1性能
而后遍历驱动结果集1同时遍历被驱动表T2,即先取出1中的第一条记录,接着遍历T2并按照条件去判断T2中是否存在配匹的记录,而后在取出1中的第二条记录。。。。优化
嵌套循环总结:this
1 若是t1对应的驱动结果集较少,同时t2的链接列上又有惟一性索引,则效率会很高spa
2 只要驱动结果集不多就具有嵌套循环的前提条件3d
3 嵌套循环能够实现快速响应,便可以第一时间返回通过链接且知足条件的记录,而没必要等待全部的链接操做所有作完才返回链接结果
若是使用了nl链接,而且t2的链接列上index,那么oracle访问该index是一般会使用单块读,则t1的返回n条结果,就会是t2访问该index n次,若是要回表,
则会回表n次,这就使得不在index 或者data buffer cache中的数据,发生物理I/O,
Oracle 11g使用了向量I/O,提升nl的链接效率
nested loop
outer table --驱动表
inner table
The second picture, shown in Figure 11-2, includes a representation of working through
an index on the second table, because an index is usually involved in this way when there is a
nested loop around.
例 create table t1 (col1 number, col2 varchar2(1)); create table t2 (col2 varchar2(1), col3 varchar2(2)); insert into t1 values(1,'A'); insert into t1 values(2,'B'); insert into t1 values(3,'C'); insert into t2 values('A','A1'); insert into t2 values('B','B1'); insert into t2 values('D','D1'); Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options SQL> set linesize 1000 SQL> set pagesize 1000 SQL> set timing on SQL> set autot trace only SP2-0158: unknown SET option "only" SQL> set autotrace traceonly; SQL> select t1.col1,t1.col2,t2.col3 2 from t1,t2 3 where t1.col2=t2.col2; Elapsed: 00:00:00.04 Execution Plan ---------------------------------------------------------- Plan hash value: 2253255382 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 60 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 5 | 1 (0)| 00:00:01 | | 2 | NESTED LOOPS | | 3 | 60 | 4 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL | T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | IDX_T2 | 1 | | 0 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."COL2"="T2"."COL2") Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 13 consistent gets 0 physical reads 0 redo size 469 bytes sent via SQL*Net to client 337 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed
哈希链接是两个表在作链接时只要依靠哈希运算来获得结果集(仅适合CBO),在解析目标sql 时是否考虑哈希链接受限制与隐含参数(_HASH_JOIN_ENABLED),
默认值TRUE,若是值为false,强制使用hint,也是会走hj的
1 oracle会根据参数HASH_AREA_SIZE,DB_BLOCK_SIZE,_HASH_MULTIBLOCK_IO_COUNT来决定hash partition的数量,全部hash partition的集合称为Hash table,
2 表t1,t2在目标sql中的谓词条件后,获得结果集中的数据量较少的那个结果集会被oracle选为哈希链接的驱动结果集,假设t1的结果集1较少(驱动结果集),t2的结果2(被驱动结果集)
3 oracle会遍历结果集1,读取1中的每一条记录,并对每一条记录按照该记录t1中的链接列作哈希运算,
--小表在数据在指定谓词后作哈希运算放入pga中(超过放入temp),大表数据按照链接列作哈希运算,而后大表去配匹pga中的值,遍历完为止
哈希链接的优缺点:
1 哈希链接不必定会排序,大多数状况下不须要排序
2 哈希链接的驱动表所对应的链接列的可选择性尽量的好,会影响hash bucket中的记录数,哈希链接中,遍历hash bucket的动做发生在pga工做区中,不消耗逻辑读,
3 哈希链接适用于CBO,等值链接
4 哈希链接适合大表跟小表的链接,2个表作哈希链接,在指定了谓词后的sql中获得的数量较少的结果集所对应的hash table能彻底容纳在pga中,则效率会很高。
SQL> select /*+ leading (t1) use_hash(t2) */ 2 t1.col1,t1.col2,t2.col3 3 from t1,t2 4 where t1.col2=t2.col2; Elapsed: 00:00:00.25 Execution Plan ---------------------------------------------------------- Plan hash value: 1838229974 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 60 | 7 (15)| 00:00:01 | |* 1 | HASH JOIN | | 3 | 60 | 7 (15)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL| T2 | 3 | 15 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."COL2"="T2"."COL2") Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 32 consistent gets 0 physical reads 0 redo size 469 bytes sent via SQL*Net to client 337 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 2 rows processed
2个表在作链接是,没有指定任何链接条件的链接
SQL> select 2 t1.col1,t1.col2,t2.col3 3 from t1,t2; 9 rows selected. Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 787647388 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9 | 162 | 8 (0)| 00:00:01 | | 1 | MERGE JOIN CARTESIAN| | 9 | 162 | 8 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T1 | 3 | 45 | 3 (0)| 00:00:01 | | 3 | BUFFER SORT | | 3 | 9 | 5 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL | T2 | 3 | 9 | 2 (0)| 00:00:01 | -----------------------------------------------------------------------------
作子查询展开时,oracle会常常把那些外部where条件为 no exists,not in ,<>all的子查询转换成对应的反链接
SQL> select * from t1 2 where t1.col2 not in (select col2 from t2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 895956251 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 15 | 5 (0)| 00:00:01 | |* 1 | FILTER | | | | | | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL| T2 | 3 | 6 | 2 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "T2" "T2" WHERE LNNVL("COL2"<>:B1))) 3 - filter(LNNVL("COL2"<>:B1)) SQL> select * from t1 2 where not exists (select 1 from t2 where t1.col2=t2.col2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 1534930707 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 17 | 3 (0)| 00:00:01 | | 1 | NESTED LOOPS ANTI | | 1 | 17 | 3 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IDX_T2 | 3 | 6 | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("T1"."COL2"="T2"."COL2") alter session set "_optimizer_null_aware_antijoin"=false
半链接跟普通的链接不一样,半链接会去重?
对子查询展开,exists,in等
SQL> select * from t1 2 where t1.col2 in (select col2 from t2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3783859632 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 51 | 3 (0)| 00:00:01 | | 1 | NESTED LOOPS SEMI | | 3 | 51 | 3 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IDX_T2 | 3 | 6 | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("T1"."COL2"="COL2") SQL> select * from t1 2 where exists (select 1 from t2 where t1.col2=t2.col2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3783859632 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 51 | 3 (0)| 00:00:01 | | 1 | NESTED LOOPS SEMI | | 3 | 51 | 3 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IDX_T2 | 3 | 6 | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("T1"."COL2"="T2"."COL2")
A)排序合并链接(Sort Merge Join, SMJ):
a) 对于非等值链接,这种链接方式的效率是比较高的。
b) 若是在关联的列上都有索引,效果更好。
c) 对于将2个较大的表源作链接,该链接方法比NL链接要好一些。
B)嵌套循环(Nested Loops, NL):
a) 若是驱动表(外部表)比较小,而且在被驱动表(内部表)上有惟一索引,或有高选择性非惟一索引时,使用这种方法能够获得较好的效率。
b)嵌套循环链接有其它链接方法没有的的一个优势是:能够先返回已经链接的行,而没必要等待全部的链接操做处理完才返回数据,这能够实现快速的响应时间。
C)哈希链接(Hash Join, HJ):
a) 这种方法是在oracle7后来引入的,使用了比较先进的链接理论,通常来讲,其效率应该好于其它2种链接,可是这种链接只能用在CBO优化器中,
并且须要设置合适的hash_area_size参数,才能取得较好的性能。
b) 在2个较大的表源之间链接时会取得相对较好的效率,在一个表源较小时则能取得更好的效率。
c) 只能用于等值链接中