一、exists和not exists的用法
exists : 强调的是是否返回结果集,不要求知道返回什么, 比如: select name from student where sex = 'm' and mark exists(select 1 from grade where ...) ,只要exists引导的子句有结果集返回,那么exists这个条件就算成立了,大家注意返回的字段始终为1,如果改成“select 2 from grade where ...”,那么返回的字段就是2,这个数字没有意义。所以exists子句不在乎返回什么,而是在乎是不是有结果集返回。
而 exists 与 in 最大的区别在于 in引导的子句只能返回一个字段,比如:
select name from student where sex = 'm' and mark in (select 1,2,3 from grade where ...),in子句返回了三个字段,这是不正确的,exists子句是允许的,但in只允许有一个字段返回,在1,2,3中随便去了两个字段即可。
而not exists 和not in 分别是exists 和 in 的 对立面。
exists (sql 返回结果集为真)
not exists (sql 不返回结果集为真)
表A
ID NAME
1 A1
2 A2
3 A3
表B
ID AID NAME
1 1 B1
2 2 B2
3 2 B3
表A和表B是1对多的关系 A.ID => B.AID
实验如下
SQL> create table a (id number,name varchar2(10));
Table created.
SQL> create table b (id number,aid number,name varchar2(10));
Table created.
SQL> insert into a values(1,'A1');
1 row created.
SQL> insert into a values(2,'A2');
1 row created.
SQL> insert into a values(3,'A3');
1 row created.
SQL> insert into b values(1,1,'B1');
1 row created.
SQL> insert into b values(2,2,'B2');
1 row created.
SQL> insert into b values(3,2,'B3');
1 row created.
SQL> commit;
SQL> select id,name from a;
ID NAME
---------- ----------
1 A1
2 A2
3 A3
SQL> select * from b;
ID AID NAME
---------- ---------- ----------
1 1 B1
2 2 B2
3 2 B3
SQL> select * from b,a where a.id=b.aid;
ID AID NAME ID NAME
---------- ---------- ---------- ---------- ----------
1 1 B1 1 A1
3 2 B3 2 A2
2 2 B2 2 A2
SQL> select id,name from a where exists (select 1 from b where a.id=b.aid);
ID NAME
---------- ----------
1 A1
2 A2
SQL> select id,name from a where exists (select * from b where a.id=b.aid);
ID NAME
---------- ----------
1 A1
2 A2
上述结果分析:原因可以按照如下分析
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=1)
--->SELECT * FROM B WHERE B.AID=1有值返回真所以有数据
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=2)
--->SELECT * FROM B WHERE B.AID=2有值返回真所以有数据
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=3)
--->SELECT * FROM B WHERE B.AID=3无值返回真所以没有数据
2.NOT EXISTS 就是反过来
SQL> select id,name from a where not exists (select * from b where a.id=b.aid);
ID NAME
---------- ---------- 3 A3
3.in /not in
EXISTS = IN,意思相同不过语法上有点点区别,好像使用IN效率要差点,应该是不会执行索引的原因
SQL> SELECT ID,NAME FROM A WHERE ID IN (SELECT AID FROM B);
ID NAME
---------- --------------------
1 A1
2 A2
NOT EXISTS = NOT IN ,意思相同不过语法上有点点区别
SQL> SELECT ID,NAME FROM A WHERE ID NOT IN(SELECT AID FROM B);
ID NAME
---------- --------------------
3 A3
二、in与exist , not in与not exist 的区别
in和exists in 是把外表和内表作hash 连接,而exists是对外表作loop循环,每次loop循环再对内表进行查询。一直以来认为exists比in效率高的说法是不准确的。
如果查询的两个表大小相当,那么用in和exists差别不大。
如果两个表中一个较小,一个是大表,则子查询表大的用exists,子查询表小的用in:
例如:表A(小表),表B(大表)
1:select * from A where cc in (select cc from B)
效率低,用到了A表上cc列的索引;
select * from A where exists(select cc from B where cc=A.cc)
效率高,用到了B表上cc列的索引。
相反的2:
select * from B where cc in (select cc from A)
效率高,用到了B表上cc列的索引;
select * from B where exists(select cc from A where cc=B.cc)
效率低,用到了A表上cc列的索引。
not in 和not exists
如果查询语句使用了not in 那么内外表都进行全表扫描,没有用到索引;而not extsts 的子查询依然能用到表上的索引。所以无论那个表大,用not exists都比not in要快。
not in 逻辑上不完全等同于not exists,如果你误用了not in,小心你的程序存在致命的BUG:
请看下面的例子:
create table t1 (c1 number,c2 number);
create table t2 (c1 number,c2 number);
insert into t1 values (1,2);
insert into t1 values (1,3);
insert into t2 values (1,2);
insert into t2 values (1,null);
SQL> select * from t1;
C1 C2
---------- ----------
1 2
1 3
SQL>
SQL> select * from t2;
C1 C2
---------- ----------
1 2
1
--查询如下:
SQL> select * from t1 where c2 not in (select c2 from t2);
no rows selected
SQL> select * from t1 where not exists (select 1 from t2 where t1.c2=t2.c2);
C1 C2
---------- ----------
1 3
正如所看到的,not in 出现了不期望的结果集,存在逻辑错误。如果看一下上述两个select语句的执行计划,也会不同。后者使用了hash_aj。
因此,请尽量不要使用not in(它会调用子查询),而尽量使用not exists(它会调用关联子查询)。如果子查询中返回的任意一条记录含有空值,则查询将不返回任何记录,正如上面例子所示。 除非子查询字段有非空限制,这时可以使用not in ,并且也可以通过提示让它使用hasg_aj或merge_aj连接
三、实验测试in,exists,not in, not exists
创建测试表test1/test2:
SQL> create table test1 as select object_id,object_name,object_type from dba_objects;
Table created.
SQL> select count(1) from test1;
COUNT(1)
----------
87042
SQL> create table test2 as select object_id,object_name,object_type from dba_objects where rownum <5000;
Table created.
SQL> select count(1) from test2;
COUNT(1)
----------
4999
SQL> alter table test1 add constraint pk_test1 primary key(object_id);
Table altered.
SQL> alter table test2 add constraint pk_test2 primary key(object_id);
Table altered.
SQL> create index idx_test1_name on test1(object_name);
Index created.
SQL> create index idx_test2_name on test2(object_name);
Index created.
SQL>
SQL> select TABLE_OWNER,table_name,index_name,status from dba_indexes where table_name like 'TEST%';
TABLE_OWNER TABLE_NAME INDEX_NAME STATUS
------------------------------ ------------------------------ ------------------------------ --------
HR TEST1 PK_TEST1 VALID
HR TEST1 IDX_TEST1_NAME VALID
HR TEST2 PK_TEST2 VALID
HR TEST2 IDX_TEST2_NAME VALID
SQL> col COLUMN_NAME for a15
SQL> select TABLE_OWNER,TABLE_NAME,INDEX_NAME,COLUMN_NAME,COLUMN_POSITION from dba_ind_columns where table_name like 'TEST%';
TABLE_OWNER TABLE_NAME INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ ------------------------------ ------------------------------ --------------- ---------------
HR TEST1 PK_TEST1 OBJECT_ID 1
HR TEST1 IDX_TEST1_NAME OBJECT_NAME 1
HR TEST2 PK_TEST2 OBJECT_ID 1
HR TEST2 IDX_TEST2_NAME OBJECT_NAME 1
1.in & exists 测试
select a.* from test1 a where a.object_name in (select b.object_name from test2 b );
select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);
1.1 in 的测试
SQL> set autot trace;
SQL> set lines 200
SQL> select a.* from test1 a where a.object_name in (select b.object_name from test2 b );
5017 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41686 | 6350K| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT SEMI | | 41686 | 6350K| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
26 recursive calls
0 db block gets
1009 consistent gets
20 physical reads
0 redo size
205691 bytes sent via SQL*Net to client
4193 bytes received via SQL*Net from client
336 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5017 rows processed
1.2 exists的测试
SQL> select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);
5017 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41686 | 6350K| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT SEMI | | 41686 | 6350K| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
989 consistent gets
0 physical reads
0 redo size
205691 bytes sent via SQL*Net to client
4193 bytes received via SQL*Net from client
336 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5017 rows processed
1.3总结
通过比较执行计划我们可以发现in和exists的执行计划基本是一致的,都以小表(test2)作为外部循环表,然后大表(test1)作为内部循环表,所以执行效率是一样的。
注意: 这里我们把test1和test2的位置交换一下oracle依然能做出正确的判断,以小表作为外部循环表:
select a.* from test2 a where a.object_name in (select b.object_name from test1 b );
select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);
SQL> select a.* from test2 a where a.object_name in (select b.object_name from test1 b );
4999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN SEMI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
836 consistent gets
369 physical reads
0 redo size
190890 bytes sent via SQL*Net to client
4182 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4999 rows processed
SQL>
SQL> select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);
4999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN SEMI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."OBJECT_NAME"="A"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
836 consistent gets
0 physical reads
0 redo size
190890 bytes sent via SQL*Net to client
4182 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4999 rows processed
2. not in & not exists 测试
2.1 not in 测试
select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );
select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );
SQL> select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );
82025 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3618458784
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 93550 | 13M| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT ANTI NA| | 93550 | 13M| 158 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6089 consistent gets
0 physical reads
0 redo size
4297213 bytes sent via SQL*Net to client
60667 bytes received via SQL*Net from client
5470 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
82025 rows processed
SQL>
SQL> select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1072756877
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 2600 | 158 (1)| 00:00:02 |
|* 1 | HASH JOIN ANTI NA | | 50 | 2600 | 158 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL| TEST2 | 4999 | 131K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST1 | 87042 | 2125K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
551 consistent gets
17 physical reads
0 redo size
483 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
2.2 not exists 测试
select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);
select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);
SQL> select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);
82025 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3712868752
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 93550 | 13M| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT ANTI | | 93550 | 13M| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6088 consistent gets
0 physical reads
0 redo size
4297213 bytes sent via SQL*Net to client
60667 bytes received via SQL*Net from client
5470 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
82025 rows processed
SQL> select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 319202148
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN ANTI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
572 consistent gets
0 physical reads
0 redo size
483 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
2.3 总结
通过观察这个两个执行计划我们可以清楚的看到not in 用的是filter来连接两张表,而not exists 用的hash join 来连接两张表。对于只有两个子节点的filter来说,我们可以把这个filter看成是一个nested loop。
通过执行计划我们可以清楚的看到,oracle把test2表的结果集作为构造输入(build input),test1的结果集作为探测数据(probe input),然后把这两个子集做hash join。
(对于hash join 一般把小表作为build input,这些数据将存放在PGA内存中,如果内存不够将放在temporary tablespace)