exists和not exists及in和not in的用法与区别

一、exists和not exists的用法

exists : 强调的是是否返回结果集,不要求知道返回什么, 比如: select name from student where sex = 'm' and mark exists(select 1 from grade where ...) ,只要exists引导的子句有结果集返回,那么exists这个条件就算成立了,大家注意返回的字段始终为1,如果改成“select 2 from grade where ...”,那么返回的字段就是2,这个数字没有意义。所以exists子句不在乎返回什么,而是在乎是不是有结果集返回。

而 exists 与 in 最大的区别在于 in引导的子句只能返回一个字段,比如:
select name from student where sex = 'm' and mark in (select 1,2,3 from grade where ...),in子句返回了三个字段,这是不正确的,exists子句是允许的,但in只允许有一个字段返回,在1,2,3中随便去了两个字段即可。

而not exists 和not in 分别是exists 和 in 的 对立面。

exists (sql 返回结果集为真)
not exists (sql 不返回结果集为真)


表A
ID   NAME
1    A1
2    A2
3    A3

表B
ID   AID NAME
1    1   B1
2    2   B2
3    2   B3

表A和表B是1对多的关系 A.ID => B.AID

实验如下


SQL> create table a (id number,name varchar2(10));

Table created.

SQL> create table b (id number,aid number,name varchar2(10));

Table created.

SQL> insert into a values(1,'A1');

1 row created.

SQL> insert into a values(2,'A2');

1 row created.

SQL> insert into a values(3,'A3');

1 row created.

SQL> insert into b values(1,1,'B1');

1 row created.

SQL> insert into b values(2,2,'B2');

1 row created.

SQL> insert into b values(3,2,'B3');

1 row created.

SQL> commit;

SQL> select id,name from a;

        ID NAME
---------- ----------
         1 A1
         2 A2
         3 A3

SQL> select * from b;

        ID        AID NAME
---------- ---------- ----------
         1          1 B1
         2          2 B2
         3          2 B3

SQL> select * from b,a where a.id=b.aid;

        ID        AID NAME               ID NAME
---------- ---------- ---------- ---------- ----------
         1          1 B1                  1 A1
         3          2 B3                  2 A2
         2          2 B2                  2 A2

SQL> select id,name from a where exists (select 1 from b where a.id=b.aid);

        ID NAME
---------- ----------
         1 A1
         2 A2
         
SQL> select id,name from a where exists (select * from b where a.id=b.aid);

        ID NAME
---------- ----------
         1 A1
         2 A2
         
上述结果分析:原因可以按照如下分析
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=1)
--->SELECT * FROM B WHERE B.AID=1有值返回真所以有数据

SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=2)
--->SELECT * FROM B WHERE B.AID=2有值返回真所以有数据

SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=3)
--->SELECT * FROM B WHERE B.AID=3无值返回真所以没有数据

2.NOT EXISTS 就是反过来         
SQL> select id,name from a where not exists (select * from b where a.id=b.aid);

        ID NAME
---------- ----------         3 A3

3.in /not in
EXISTS = IN,意思相同不过语法上有点点区别,好像使用IN效率要差点,应该是不会执行索引的原因
SQL> SELECT ID,NAME FROM A WHERE ID IN (SELECT AID FROM B);

        ID NAME
---------- --------------------
         1 A1
         2 A2

NOT EXISTS = NOT IN ,意思相同不过语法上有点点区别
SQL> SELECT ID,NAME FROM A WHERE ID NOT IN(SELECT AID FROM B);

        ID NAME
---------- --------------------
         3 A3

二、in与exist , not in与not exist 的区别


in和exists   in 是把外表和内表作hash 连接,而exists是对外表作loop循环,每次loop循环再对内表进行查询。一直以来认为exists比in效率高的说法是不准确的。  

如果查询的两个表大小相当,那么用in和exists差别不大。  
如果两个表中一个较小,一个是大表,则子查询表大的用exists,子查询表小的用in:  
例如:表A(小表),表B(大表)
1:select * from A where cc in (select cc from B)  
效率低,用到了A表上cc列的索引;
select * from A where exists(select cc from B where cc=A.cc)  
效率高,用到了B表上cc列的索引。
 
相反的2:
select * from B where cc in (select cc from A)  
效率高,用到了B表上cc列的索引;
select * from B where exists(select cc from A where cc=B.cc)  
效率低,用到了A表上cc列的索引。  
 
not in 和not exists  
 
如果查询语句使用了not in 那么内外表都进行全表扫描,没有用到索引;而not extsts 的子查询依然能用到表上的索引。所以无论那个表大,用not exists都比not in要快。  

not in 逻辑上不完全等同于not exists,如果你误用了not in,小心你的程序存在致命的BUG:  
 
请看下面的例子:  
create table t1 (c1 number,c2 number);  
create table t2 (c1 number,c2 number);  
 
insert into t1 values (1,2);  
insert into t1 values (1,3);  
insert into t2 values (1,2);  
insert into t2 values (1,null);      

SQL> select * from t1;

        C1         C2
---------- ----------
         1          2
         1          3

SQL>
SQL> select * from t2;

        C1         C2
---------- ----------
         1          2
         1

--查询如下:
SQL> select * from t1 where c2 not in (select c2 from t2);  

no rows selected

SQL> select * from t1 where not exists (select 1 from t2 where t1.c2=t2.c2);

        C1         C2
---------- ----------
         1          3
 
正如所看到的,not in 出现了不期望的结果集,存在逻辑错误。如果看一下上述两个select语句的执行计划,也会不同。后者使用了hash_aj。  
因此,请尽量不要使用not in(它会调用子查询),而尽量使用not exists(它会调用关联子查询)。如果子查询中返回的任意一条记录含有空值,则查询将不返回任何记录,正如上面例子所示。  除非子查询字段有非空限制,这时可以使用not in ,并且也可以通过提示让它使用hasg_aj或merge_aj连接 

三、实验测试in,exists,not in, not exists

创建测试表test1/test2:

SQL> create table test1 as select object_id,object_name,object_type from dba_objects;

Table created.

SQL> select count(1) from test1;

  COUNT(1)
----------
     87042

SQL> create table test2 as select object_id,object_name,object_type from dba_objects where rownum <5000;

Table created.

SQL> select count(1) from test2;

  COUNT(1)
----------
      4999
      
SQL> alter table test1 add constraint pk_test1 primary key(object_id);

Table altered.

SQL> alter table test2 add constraint pk_test2 primary key(object_id);

Table altered.

SQL> create index idx_test1_name on test1(object_name);

Index created.

SQL> create index idx_test2_name on test2(object_name);

Index created.

SQL>
SQL> select TABLE_OWNER,table_name,index_name,status from dba_indexes where table_name like 'TEST%';

TABLE_OWNER                    TABLE_NAME                     INDEX_NAME                     STATUS
------------------------------ ------------------------------ ------------------------------ --------
HR                             TEST1                          PK_TEST1                       VALID
HR                             TEST1                          IDX_TEST1_NAME                 VALID
HR                             TEST2                          PK_TEST2                       VALID
HR                             TEST2                          IDX_TEST2_NAME                 VALID

SQL> col COLUMN_NAME for a15
SQL> select TABLE_OWNER,TABLE_NAME,INDEX_NAME,COLUMN_NAME,COLUMN_POSITION from dba_ind_columns where table_name like 'TEST%';

TABLE_OWNER                    TABLE_NAME                     INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ ------------------------------ ------------------------------ --------------- ---------------
HR                             TEST1                          PK_TEST1                       OBJECT_ID                     1
HR                             TEST1                          IDX_TEST1_NAME                 OBJECT_NAME                   1
HR                             TEST2                          PK_TEST2                       OBJECT_ID                     1
HR                             TEST2                          IDX_TEST2_NAME                 OBJECT_NAME                   1

1.in & exists 测试

select a.* from test1 a where a.object_name in (select b.object_name from test2 b );

select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);

1.1 in 的测试
SQL>  set autot trace;
SQL> set lines 200
SQL> select a.* from test1 a where a.object_name in (select b.object_name from test2 b );

5017 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT SEMI |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|   2 |   INDEX FAST FULL SCAN| IDX_TEST2_NAME |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL   | TEST1          | 93550 |  8222K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
         26  recursive calls
          0  db block gets
       1009  consistent gets
         20  physical reads
          0  redo size
     205691  bytes sent via SQL*Net to client
       4193  bytes received via SQL*Net from client
        336  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       5017  rows processed


1.2 exists的测试
SQL> select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);

5017 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT SEMI |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|   2 |   INDEX FAST FULL SCAN| IDX_TEST2_NAME |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL   | TEST1          | 93550 |  8222K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
--------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        989  consistent gets
          0  physical reads
          0  redo size
     205691  bytes sent via SQL*Net to client
       4193  bytes received via SQL*Net from client
        336  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       5017  rows processed

1.3总结
通过比较执行计划我们可以发现in和exists的执行计划基本是一致的,都以小表(test2)作为外部循环表,然后大表(test1)作为内部循环表,所以执行效率是一样的。

注意: 这里我们把test1和test2的位置交换一下oracle依然能做出正确的判断,以小表作为外部循环表:

select a.* from test2 a where a.object_name in (select b.object_name from test1 b );

select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);


SQL> select a.* from test2 a where a.object_name in (select b.object_name from test1 b );

4999 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |  4999 |   761K|   135   (1)| 00:00:02 |
|*  1 |  HASH JOIN SEMI       |                |  4999 |   761K|   135   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL   | TEST2          |  4999 |   439K|     9   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 |  6029K|   125   (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        836  consistent gets
        369  physical reads
          0  redo size
     190890  bytes sent via SQL*Net to client
       4182  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4999  rows processed

SQL>
SQL> select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);

4999 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |  4999 |   761K|   135   (1)| 00:00:02 |
|*  1 |  HASH JOIN SEMI       |                |  4999 |   761K|   135   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL   | TEST2          |  4999 |   439K|     9   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 |  6029K|   125   (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("B"."OBJECT_NAME"="A"."OBJECT_NAME")
Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        836  consistent gets
          0  physical reads
          0  redo size
     190890  bytes sent via SQL*Net to client
       4182  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4999  rows processed

2. not in & not exists 测试
2.1 not in 测试

select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );

select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );

SQL> select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );

82025 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3618458784
---------------------------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |       | 93550 |    13M|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT ANTI NA|       | 93550 |    13M|   158   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL     | TEST2 |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL     | TEST1 | 93550 |  8222K|   149   (1)| 00:00:02 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       6089  consistent gets
          0  physical reads
          0  redo size
    4297213  bytes sent via SQL*Net to client
      60667  bytes received via SQL*Net from client
       5470  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      82025  rows processed

SQL>
SQL> select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1072756877
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    50 |  2600 |   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN ANTI NA |       |    50 |  2600 |   158   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| TEST2 |  4999 |   131K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TEST1 | 87042 |  2125K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        551  consistent gets
         17  physical reads
          0  redo size
        483  bytes sent via SQL*Net to client
        508  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


2.2 not exists 测试

select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);

select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);

SQL> select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);

82025 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3712868752
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                | 93550 |    13M|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT ANTI |                | 93550 |    13M|   158   (1)| 00:00:02 |
|   2 |   INDEX FAST FULL SCAN| IDX_TEST2_NAME |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL   | TEST1          | 93550 |  8222K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       6088  consistent gets
          0  physical reads
          0  redo size
    4297213  bytes sent via SQL*Net to client
      60667  bytes received via SQL*Net from client
       5470  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      82025  rows processed

SQL>  select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 319202148
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |  4999 |   761K|   135   (1)| 00:00:02 |
|*  1 |  HASH JOIN ANTI       |                |  4999 |   761K|   135   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL   | TEST2          |  4999 |   439K|     9   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 |  6029K|   125   (0)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        572  consistent gets
          0  physical reads
          0  redo size
        483  bytes sent via SQL*Net to client
        508  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

2.3 总结
通过观察这个两个执行计划我们可以清楚的看到not in 用的是filter来连接两张表,而not exists 用的hash join 来连接两张表。对于只有两个子节点的filter来说,我们可以把这个filter看成是一个nested loop。

通过执行计划我们可以清楚的看到,oracle把test2表的结果集作为构造输入(build input),test1的结果集作为探测数据(probe input),然后把这两个子集做hash join。
(对于hash join 一般把小表作为build input,这些数据将存放在PGA内存中,如果内存不够将放在temporary tablespace)


请使用浏览器的分享功能分享到微信等