这里介绍的的是一种SQL查询方法,颠覆您日常以select开始的SQL查询写法。
这种神奇的使用方法的背后隐藏着性能的提升!
我们一起来体验一下其中的奥妙。
1.创建表T,并简单初始化几条数据
sec@ora10g> create table t (x number(10), y number(10));
sec@ora10g> insert into t values (1,110);
sec@ora10g> insert into t values (2,120);
sec@ora10g> insert into t values (2,80);
sec@ora10g> insert into t values (3,150);
sec@ora10g> insert into t values (3,30);
sec@ora10g> insert into t values (3,60);
sec@ora10g> commit;
sec@ora10g> select * from t;
X Y
---------- ----------
1 110
2 120
2 80
3 150
3 30
3 60
6 rows selected.
2.需求描述
按照x列分组后统计y列的总值,我们的最终目标是选出比y列总值的三分之一大的那些分组统计信息。
3.使用子查询方式实现
这可能是大家最容易想到的方法。
sec@ora10g> SELECT x, SUM (y) AS total_y
2 FROM t
3 GROUP BY x
4 HAVING SUM (y) > (SELECT SUM (y) / 3 FROM t)
5 ORDER BY total_y
6 /
X TOTAL_Y
---------- ----------
2 200
3 240
4.WITH Clause方法闪亮登场
sec@ora10g> WITH secooler_sum AS (SELECT x, SUM (y) total_y
2 FROM t
3 GROUP BY x)
4 SELECT x, total_y
5 FROM secooler_sum
6 WHERE total_y > (SELECT SUM (total_y) / 3 FROM secooler_sum)
7 ORDER BY total_y
8 /
X TOTAL_Y
---------- ----------
2 200
3 240
是不是很神奇!这里的查询语句不是以select开始的,而是以“WITH”关键字开头。可以认为Oracle在真正进行查询之前预先构造了一个临时表secooler_sum,之后我们便可多次使用它做进一步的分析和处理。
5.WITH Clause方法的优点
使用WITH Clause方法有什么好处呢?
首先,增加了SQL的易读性,如果构造了多个子查询,结构会更清晰;
更重要的是:“一次分析,多次使用”,这也是为什么会提供性能的地方,达到了“少读”的目标。
6.知其所以然
为什么WITH Clause方法会提高效率?通过查看上面两种方法的执行计划便可略知一二。
1)使用子查询的执行计划
sec@ora10g> set autot trace exp
sec@ora10g> SELECT x, SUM (y) AS total_y
2 FROM t
3 GROUP BY x
4 HAVING SUM (y) > (SELECT SUM (y) / 3 FROM t)
5 ORDER BY total_y
6 /
Execution Plan
----------------------------------------------------------
Plan hash value: 4167292448
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 156 | 5 (40)| 00:00:01 |
| 1 | SORT ORDER BY | | 6 | 156 | 5 (40)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | HASH GROUP BY | | 6 | 156 | 5 (40)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T | 6 | 156 | 3 (0)| 00:00:01 |
| 5 | SORT AGGREGATE | | 1 | 13 | | |
| 6 | TABLE ACCESS FULL| T | 6 | 78 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SUM("Y")> (SELECT SUM("Y")/3 FROM "T" "T"))
Note
-----
- dynamic sampling used for this statement
2)使用子WITH Clause的执行计划
sec@ora10g> WITH secooler_sum AS (SELECT x, SUM (y) total_y
2 FROM t
3 GROUP BY x)
4 SELECT x, total_y
5 FROM secooler_sum
6 WHERE total_y > (SELECT SUM (total_y) / 3 FROM secooler_sum)
7 ORDER BY total_y
8 /
Execution Plan
----------------------------------------------------------
Plan hash value: 706070671
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 156 | 9 (23)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | HASH GROUP BY | | 6 | 156 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T | 6 | 156 | 3 (0)| 00:00:01 |
| 5 | SORT ORDER BY | | 6 | 156 | 5 (20)| 00:00:01 |
|* 6 | VIEW | | 6 | 156 | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6660_23A196E | 6 | 156 | 2 (0)| 00:00:01 |
| 8 | SORT AGGREGATE | | 1 | 13 | | |
| 9 | VIEW | | 6 | 78 | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6660_23A196E | 6 | 156 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - filter("TOTAL_Y"> (SELECT SUM("TOTAL_Y")/3 FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */
"C0" "X","C1" "TOTAL_Y" FROM "SYS"."SYS_TEMP_0FD9D6660_23A196E" "T1") "SECOOLER_SUM"))
Note
-----
- dynamic sampling used for this statement
可见,第一种使用子查询的方法T表被扫描了两次,而使用WITH Clause方法,T表仅被扫描一次。
这也是为什么在大型数据仓库系统中推荐使用WITH Clause方法进行查询统计的原因,这样可以大大的提高数据分析和查询的效率。
另外,观察WITH Clause方法执行计划,其中“SYS_TEMP_0FD9D6660_23A196E”便是在运行过程中构造的中间统计结果临时表。
7.Oracle官方文档中有关WITH Clause的简单描述
Computation Using the WITH Clause
The WITH clause (formally known as subquery_factoring_clause) enables you to reuse the same query block in a SELECT statement when it occurs more than once within a complex query. WITH is a part of the SQL-99 standard. This is particularly useful when a query has multiple references to the same query block and there are joins and aggregations. Using the WITH clause, Oracle retrieves the results of a query block and stores them in the user's temporary tablespace. Note that Oracle Database does not support recursive use of the WITH clause.
参考链接:http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/aggreg.htm#sthref1670
8.小结
WITH Clause方法在数据仓库或大数据量查询中有着自己的优势。
一切提高性能的措施都是推崇的。灵活掌握,裨益无限。在遇到具体问题时可善加利用。
Good luck.
secooler
10.03.22
-- The End --