NULL列时,如何使得IS NULL或者IS NOT NULL可以使用索引来提高查询效率
-
zhang41082
2019-06-30 09:18:04
-
数据库开发技术
-
原创
虽然我们一直强调在进行数据库设计的时候,要尽量设置所有的列为NOT
NULL,这样的最直接的好处就是假如这个列上有一个索引,那么对这个表进行COUNT(*)统计的时候(假设这个表只有这一列有索引,并且索引全扫描比
全表扫描COST低),ORACLE会直接对这个列的索引进行一个快速扫描得到COUNT结果,而不是进行一个全表扫描。虽然B*TREE索引是不存储
NULL值的,但是因为你定义列的时候明确的告诉ORACLE这一列是NOT
NULL的,所以ORACLE会知道扫描索引得到的结果跟全表扫描得到的结果是一致的(因为列的值都是NOT
NULL的,也就是说列的所有的值都会存储在了索引中)。但是,很多时候,有些列确实就是有空值存在,不能设置为NOT
NULL。当然,这个时候可以使用DEFAULT值来代替空值从而使得列非空。或者单纯为了统计的时候可以在列上建BITMAP索引使得NULL也被包含
在索引中。
这里提到的将是另外方法,不改变列的属性,而是通过改变索引来实现同样的功能。[@more@]
create table bear(id number,c1 char(1000),c2 char(1000));
首先创建一个表,为了使得扫描索引的COST跟扫描全表的COST拉开差距,这里跟了两个CHAR(1000)的列
create sequence seq_bear;
建一个SEQUENCE用来生成ID列的数据
SQL> begin
2 for r in 1..10000 loop
3 insert into bear(id,c1,c2) values(seq_bear.nextval,'c1','c2');
4 end loop;
5 commit;
6 end;
7 /
SQL> create index idx_bear on bear(id);
SQL> analyze table bear compute statistics for table for all indexes for all indexed columns;
插入1万条数据并建立ID字段上的索引,同时对表和索引进行分析。
1、先来看直接对表进行COUNT的情况:
SQL> select count(*) from bear;
Execution Plan
----------------------------------------------------------
Plan hash value: 2093678280
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 907 (1)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| BEAR | 10000 | 907 (1)| 00:00:11 |
-------------------------------------------------------------------
这个时候虽然ID列上有索引,并且所有值都非空,但是ORACLE无法判断里面是否有空值,所以是一个全表扫描。同样IS NULL查询也是一个全表扫描:
SQL> select count(*) from bear where id is null;
Execution Plan
----------------------------------------------------------
Plan hash value: 2093678280
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 907 (1)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| BEAR | 1 | 3 | 907 (1)| 00:00:11 |
---------------------------------------------------------------------------
但是IS NOT NULL查询会是一个索引扫描
SQL> select count(*) from bear where id is not null;
Execution Plan
----------------------------------------------------------
Plan hash value: 4145915393
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 30000 | 7 (0)| 00:00:01 |
----------------------------------------------------------------------------------
2、修改ID列为NOT NULL,然后从新进行COUNT:
SQL> alter table bear modify id not null;
Table altered.
SQL> select count(*) from bear;
Execution Plan
----------------------------------------------------------
Plan hash value: 4145915393
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 7 (0)| 00:00:01 |
--------------------------------------------------------------------------
可以看到,修改为NOT NULL后,ORACLE明确的知道这个索引中不包含NULL值,所以进行了一个索引快速全扫描。同时,IS NULL和IS NOT NULL也都会使用一个索引扫描:
SQL> select count(*) from bear where id is null;
Execution Plan
----------------------------------------------------------
Plan hash value: 1833075309
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 0 (0)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | FILTER | | | | | |
| 3 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 30000 | 7 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
其实这里只是一个执行计划,ORACLE貌似不走这个SCAN直接就返回结果了,这个从执行后的consistent gets为0可以看的出来
SQL> select count(*) from bear where id is not null;
Execution Plan
----------------------------------------------------------
Plan hash value: 4145915393
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
| 2 | INDEX FAST FULL SCAN| IDX_BEAR | 10000 | 30000 | 7 (0)| 00:00:01 |
----------------------------------------------------------------------------------
4、接下来我们修改ID列为允许NULL,并且调整ID列上的索引:
SQL> alter table bear modify id null;
Table altered.
SQL> drop index idx_bear;
Index dropped.
SQL> create index idx_bear_null on bear(id,0);
Index created.
5、再次执行COUNT,IS NULL和IS NOT NULL相关查询:
SQL> select count(*) from bear;
Execution Plan
----------------------------------------------------------
Plan hash value: 2000879051
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| IDX_BEAR_NULL | 10000 | 8 (0)| 00:00:01 |
-------------------------------------------------------------------------------
SQL> select count(*) from bear where id is null;
Execution Plan
----------------------------------------------------------
Plan hash value: 1944383045
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX RANGE SCAN| IDX_BEAR_NULL | 1 | 3 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
注意这里是一个RANGE SCAN,而不是FULL SCAN
SQL> select count(*) from bear where id is not null;
Execution Plan
----------------------------------------------------------
Plan hash value: 2000879051
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX FAST FULL SCAN| IDX_BEAR_NULL | 10000 | 30000 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
可以看到,这三种情况都正确的使用了索引,从而提高了查询效率,但需要牺牲的就是单列索引变成了复合索引,增加了维护成本,占用了更多的空间,同时增加了索引扫描的成本。但这样索引中也就可以把ID列为NULL的值存了进去,从而实现了对NULL和IS NOT NULL时候的正确查询。
优化没有最好的办法,只有最合适的,很多时候,我们要根据不同的场景来选择最适合的办法。