no zuo no die系列，来自于pg的wiki。

这是第四部分，不要使用NOT IN。
理由是:

1.NOT IN behaves in unexpected ways if there is a null present
2.Because of point 1 above, NOT IN (SELECT …) does not optimize very well. In particular, the planner can’t transform it into an anti-join, and so it becomes either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the planner only allows that plan for small result sets; the plain subplan is horrifically slow (in fact O(N²)). This means that the performance can look good in small-scale tests but then slow down by 5 or more orders of magnitude once a size threshold is crossed; you do not want this to happen.

第一个理由是如存在null的话，会有未如预期的结果。

[local]:5432 pg12@testdb=# select * from tbl;
 id | value 
----+-------
  1 |     2
(1 row)
Time: 1.359 ms
[local]:5432 pg12@testdb=# select * from tbl where id not in (1,null);
 id | value 
----+-------
(0 rows)
Time: 1.443 ms

原因是任意值跟null的运算结果都是null，不是false也不是true，因此没有结果返回。

[local]:5432 pg12@testdb=# create table t_null(id int);
CREATE TABLE
Time: 168.932 ms
[local]:5432 pg12@testdb=# insert into t_null values(1);
INSERT 0 1
Time: 6.136 ms
[local]:5432 pg12@testdb=# insert into t_null values(null);
INSERT 0 1
Time: 2.611 ms
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_null b);
 id | value 
----+-------
(0 rows)
Time: 6.682 ms
[local]:5432 pg12@testdb=#

第二个理由是在结果集很大的情况下会有性能问题（实际上，o( ￣︶￣)o）

[local]:5432 pg12@testdb=# explain verbose select * from tbl a where a.id not in (select b.id from t_big_null b);
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Seq Scan on public.tbl a  (cost=0.00..291892417.45 rows=1130 width=8)
   Output: a.id, a.value
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..233311.72 rows=10000048 width=4)
           Output: b.id
           ->  Seq Scan on public.t_big_null b  (cost=0.00..144248.48 rows=10000048 width=4)
                 Output: b.id
(8 rows)
Time: 2.552 ms
[local]:5432 pg12@testdb=# explain verbose select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=308312.08..347459.58 rows=1 width=8)
   Output: a.id, a.value
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on public.tbl a  (cost=0.00..32.60 rows=2260 width=8)
         Output: a.id, a.value
   ->  Hash  (cost=144248.48..144248.48 rows=10000048 width=4)
         Output: b.id
         ->  Seq Scan on public.t_big_null b  (cost=0.00..144248.48 rows=10000048 width=4)
               Output: b.id
(9 rows)
Time: 2.960 ms

实际执行SQL：

[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value 
----+-------
(0 rows)
Time: 2.176 ms
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
 id | value 
----+-------
(0 rows)
Time: 3540.645 ms (00:03.541)
[local]:5432 pg12@testdb=#

执行计划，not exists的成本比not in要低，但实际上，not in的执行要比not exists要快很多！

参考资料
Don’t Do This

PostgreSQL DBA(106) - pgAdmin(Don't do this：NOT IN)