Height-Balanced Histograms:列中不同值的个数小于直方图的桶数,每个桶的存放值的数量不同。
Frequency Histograms:列中不同值的个数大于直方图的桶数,每个桶存放值的数量相同,最后一个桶除外。
The database automatically creates frequency histograms instead of height-balanced histograms under the following conditions:
-
The number of distinct values is less than or equal to the number of histogram buckets specified (up to 254).
-
It is not true that each column value repeats only once.
收集方法:GATHER_TABLE_STATS==》method_opt 通过size参数指定直方图的桶数:
SIZE {integer | REPEAT | AUTO | SKEWONLY}
auto:基于列的使用信息(sys.col_usage$)和是否存在数据倾斜来收集直方图
integer:人为的指定创建直方图的桶数范围是1到254,如果size 1意味着不创建直方图
repeat:只会对已经存在直方图的列重新生成直方图.如果是一个分区表,repeat会确保对在全局级别存在直方图的列重新生成直方图.这是不被推荐的设置的.当前直方图的桶数将会作为重新生成直方图所使用的桶数的最大值.比如,当前直方图的桶数是5,那么生成的直方图最大桶数就是5,说的直白点就是刷新现有直方图的列上的统计信息.
skewonly:对任何数据分布出现倾斜列的自动创建直方图
建立测试表:
Begin
Create table histest as select rownum id,10000 sal from dual connect by level <10000;
update histest set sal=id where rownum< =10;
update histest set sal=9 where id between '100' and '200';
update histest set sal=8 where id between '1000' and '5000';
commit;
select sal,count(*) from histest group by sal;
不收集柱状图:
exec dbms_stats.gather_table_stats('SCOTT','HISTEST',method_opt=>'for all columns size 1');
select column_name,num_distinct,density,HISTOGRAM from user_tab_col_statistics where table_name='HISTEST';
//density=1/num_distinct
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM
-------------------------------------------------- ------------ ---------- ---------------
ID 9999 .00010001 NONE
SAL 11 .090909091 NONE
收集柱状图:
exec dbms_stats.gather_table_stats('SCOTT','HISTEST',method_opt=>'for all columns size 11');
select column_name,num_distinct,density,HISTOGRAM from user_tab_col_statistics where table_name='HISTEST' and COLUMN_NAME='SAL';
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM
-------------------------------------------------- ------------ ---------- ---------------
SAL 11 .000050005 FREQUENCY
--查看直方图信息:
select column_name,endpoint_number,endpoint_value from user_tab_histograms where
table_name='HISTEST' and COLUMN_NAME='SAL';
ENDPOINT_NUMBER:累计的行数
ENDPOINT_VALUE:真实列值
COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------------------------------------- --------------- --------------
SAL 1 1
SAL 2 2
SAL 3 3
SAL 4 4
SAL 5 5
SAL 6 6
SAL 7 7
SAL 4009 8
SAL 4111 9
SAL 4112 10 <=== 计算列值为10的个数4112-4111
SAL 9999 10000 <=== 计算列值为10000的个数9999-4112
高度平衡直方图:
exec dbms_stats.gather_table_stats('SCOTT','HISTEST',method_opt=>'for all columns size 5');
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM
-------------------------------------------------- ------------ ---------- ---------------
SAL 11 .009466401 HEIGHT BALANCED
COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------------------------------------- --------------- --------------
SAL 0 1
SAL 2 8
SAL 5 10000
----这里buckets 1,2都是用8作为它的endpoint所以bucket 1为了节省空间没有被存储.下面的查询能用来显示桶数和它的endpoinit值
----这里buckets 3到5都是用10000作为它的endpoint所以bucket 3到4为了节省空间没有被存储.下面的查询能用来显示桶数和它的endpoinit值
BUCKET_NUMBER:桶的编号。
ENDPOINT_VALUE:桶中的值的最大值。
为了节省空间进行了合并:
| BUCKET_NUMBER | ENDPOINT_VALUE | BUCKET_NUMBER | ENDPOINT_VALUE | |
| 0 | 1 | 2 | 8 | |
| 1 | 8 | 1 | 8 | |
| 2 | 8 | 5 | 10000 | |
| 3 | 10000 | |||
| 4 | 10000 | |||
| 5 | 10000 |
BUCKET_NUMBER ENDPOINT_VALUE
------------- --------------
1 8
2 8
3 10000
4 10000
5 10000
---
SELECT bucket_number, max(sal) AS endpoint_value
FROM (
SELECT sal, ntile(5) OVER (ORDER BY sal) AS bucket_number
FROM histest)
GROUP BY bucket_number
ORDER BY bucket_number;