问题说明:
数据库迁移到新服务器后,某一张视图查询缓慢,原库查询耗时3分钟,新库查询5小时以上。
环境说明:
新库:
DB:Oracle 11.2.0.4.0 RAC OS:Redhat 7.6
原库:
DB:Oracle 11.2.0.4.0 单机 OS:AIX 5.1
问题分析:
处理此类问题,常见思路如下: 1 分别对比原库和新库该SQL对应的执行计划。 2 对比两个执行计划,找到差异部分或最耗时的部分。 3 分析执行计划差异原因,根据原因尝试解决问题。
本次案例视图引用的SQL使用了28个union all组成,关联了30多张表。
在执行如下SQL时,新库会卡住很久,甚至无法返回结果。 select count(*) from v_cjc_01;
视图内SQL类似如下:
create or replace view v_cjc_01 as select col1,col2,col3,...col10 from t1 where xxx=yyy union all select col1,col2,col3,...col10 from t2 where zzz=yyy union all ...... union all select col1,col2,col3,...col10 from t3 where xxx=yyy and col2 not in (select col2 from t3 a where col1=aaa col3=bbb) union all select col1,col2,col3,...col10 from t5 where ttt=yyy;
查看执行计划,发现主要有两处不同
一:
原执行计划
|* 40 | HASH JOIN | | 72408 | 2616K| 2264K| 28026 (1)| 00:05:37 | | 41 | VIEW | VW_JF_SET$5EF83EA7 | 72408 | 1414K| | 3028 (1)| 00:00:37 | | 42 | UNION-ALL | | | | | | | |* 43 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 72407 | 2050K| | 1514 (1)| 00:00:19 | |* 44 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 1 | 29 | | 1514 (1)| 00:00:19 | | 45 | INDEX FAST FULL SCAN | IDX_MB_ACCINF_ACCNO | 9179K| 148M| | 12212 (1)| 00:02:27 | ----------------------------------------------------------------------------------------------------------------------
新执行计划
| 13 | NESTED LOOPS | | 236 | 9676 | 2003 (1)| 00:00:25 | |* 14 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 235 | 5640 | 1533 (1)| 00:00:19 | |* 15 | INDEX RANGE SCAN | IDX_MB_ACCINF_ACCNO | 1 | 17 | 2 (0)| 00:00:01 | | 16 | NESTED LOOPS | | 1 | 41 | 1535 (1)| 00:00:19 | |* 17 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 1 | 24 | 1533 (1)| 00:00:19 | |* 18 | INDEX RANGE SCAN | IDX_MB_ACCINF_ACCNO | 1 | 17 | 2 (0)| 00:00:01 |
二:
原执行计划
|* 16 | HASH JOIN RIGHT ANTI NA | | 6647 | 1174K| 12M| 26593 (1)| 00:05:20 | |* 17 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 130K| 10M| | 11309 (1)| 00:02:16 | |* 18 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 664K| 60M| | 11309 (1)| 00:02:16 |
新执行计划
|* 21 | FILTER | | | | | | |* 22 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 4928 | 202K| 11126 (1)| 00:02:14 | |* 23 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 1 | 27 | 11124 (1)| 00:02:14 |
根据第一处不同,可以看到:
原执行计划,出现了VW_JF_SET$5EF83EA7,说明原库使用到了Join Factorization特性, 在该特性下,union all连接各个分支查询SQL时,会将各个分支部分中公共部分单独提出作为一个单独的结果集, 然后在和原union all剩下的部分做关联,从而减少了公共部分表重复扫描的问题,性能会有提升。
为什么新库没有使用到Join Factorization特性呢?
对比两个库的_OPTIMIZER_JOIN_FACTORIZATION参数,都是使用的默认值true。 说明优化器认为新库不适合使用Join Factorization特性。 但是从执行计划看两种表数据量都比较大,结果集也很大,新库使用的NESTED LOOPS明显比原库HASH JOIN性能差。
可以通过10053事件对比查看原库和新库对同一SQL是如何进行改写,如何计算cost的?
原库和新库分别执行:
alter session set tracefile_identifier='10053'; ALTER SESSION SET EVENTS='10053 trace name context forever, level 1'; select count(*) from v_cjc_01; ALTER SESSION SET EVENTS '10053 trace name context off'; select value from v$diag_info where name='Default Trace File';
查看生成的trace文件:
搜索关键字 Final query after transformations
新库SQL改写部分:
UNION ALL
(SELECT '0' "KERNAL_INTER4"
FROM "CHEN"."CJC_T1_00000000000000000" "A",
CHENPMB."MB_ACCINF" "B"
WHERE "B"."AIF_ACCNO" = "A"."CJC_T3_000000"
AND "A"."MPB_BUSINESSTYPE" = '10'
AND "A"."MPB_TRANTIME" IS NOT NULL
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" > '20200501000000') UNION ALL
(SELECT '0' "KERNAL_INTER4"
FROM "CHEN"."CJC_T1_00000000000000000" "A",
CHENPMB."MB_ACCINF" "B"
WHERE "B"."AIF_ACCNO" = "A"."CJC_T3_000000"
AND "A"."MPB_BUSINESSTYPE" = '11'
AND "A"."MPB_TRANTIME" IS NOT NULL
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" > '20200501000000')
原库SQL改写部分:
UNION ALL
(SELECT '0' "KERNAL_INTER4"
FROM CHENPMB."EA_EACCT_INFO_S" "EA_EACCT_INFO_S"
WHERE "EA_EACCT_INFO_S"."EA_ACCOUNT_OPEN_RESULT" = '1') UNION ALL
(SELECT "VW_JF_SET$5EF83EA7"."ITEM_2" "KERNAL_INTER4"
FROM ((SELECT "A"."CJC_T3_000000" "ITEM_1", '0' "ITEM_2"
FROM "CHEN"."CJC_T1_00000000000000000" "A"
WHERE "A"."MPB_TRANTIME" > '20200501000000'
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" IS NOT NULL
AND "A"."MPB_BUSINESSTYPE" = '10') UNION ALL
(SELECT "A"."CJC_T3_000000" "ITEM_1", '0' "ITEM_2"
FROM "CHEN"."CJC_T1_00000000000000000" "A"
WHERE "A"."MPB_TRANTIME" > '20200501000000'
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" IS NOT NULL
AND "A"."MPB_BUSINESSTYPE" = '11')) "VW_JF_SET$5EF83EA7",
CHENPMB."MB_ACCINF" "B"
WHERE "B"."AIF_ACCNO" = "VW_JF_SET$5EF83EA7"."ITEM_1")) "MB_YINGXIAO_COUNT1"
尝试按照原库10053转换的SQL改写新库SQL后,执行速度恢复正常,但是SQL改动量较大。
根据第二处不同,可以看到:
改写前原SQL部分:
select CUSTOMERID as CSTNO, TRANTIME as TRANTIME, '10' as TRANTYPE, to_date(TRANTIME, 'yyyymmddhh24miss') as TRANTIME1, createtime, 1 as cnt, 0 as amt, '0' as kernal_inter1, '0' as kernal_inter2, '0' as kernal_inter3, '0' as kernal_inter4 from CJC_T2_00000000000000000 c where c.tranname = 'XX绑定' and c.retcode = '0000' and c.TRANTIME is not null and length(c.TRANTIME) = 14 and c.TRANTIME > '20200501000000' and c.medicalcardno not in (select a.medicalcardno from CJC_T2_00000000000000000 a where a.tranname = 'XX解绑' and a.retcode = '0000')
新库SQL改写部分:
SELECT '0' "KERNAL_INTER4"
FROM CHENPMB."CJC_T2_00000000000000000" "A",
CHENPMB."CJC_T2_00000000000000000" "C"
WHERE "C"."TRANNAME" = 'XX绑定'
AND "C"."RETCODE" = '0000'
AND "C"."TRANTIME" IS NOT NULL
AND LENGTH("C"."TRANTIME") = 14
AND "C"."TRANTIME" > '20200501000000'
AND "C"."MEDICALCARDNO" = "A"."MEDICALCARDNO"
AND "A"."TRANNAME" = 'XX解绑'
AND "A"."RETCODE" = '0000'
原库SQL改写部分:
(SELECT '0' "KERNAL_INTER4"
FROM CHENPMB."CJC_T2_00000000000000000" "C"
WHERE "C"."TRANNAME" = 'XX绑定'
AND "C"."RETCODE" = '0000'
AND "C"."TRANTIME" IS NOT NULL
AND LENGTH("C"."TRANTIME") = 14
AND "C"."TRANTIME" > '20200501000000'
AND NOT EXISTS
(SELECT 0
FROM CHENPMB."CJC_T2_00000000000000000" "A"
WHERE LNNVL("A"."MEDICALCARDNO" <> "C"."MEDICALCARDNO")
AND "A"."TRANNAME" = 'XX解绑'
AND "A"."RETCODE" = '0000'))
根据执行计划可知,
原库将not in改写成NOT EXISTS,表关联走的 HASH JOIN RIGHT ANTI NA,效率较高。 尝试将新库SQL中not in改写成NOT EXISTS后,效率有明显提升,不到5分钟可以执行完成。
SQL是否还有优化空间呢?
可以看看Oracle给出的建议sql tuning advisor:
SQL> DECLARE my_task_name VARCHAR2(30); BEGIN my_task_name := DBMS_SQLTUNE.CREATE_TUNING_TASK(sql_id => 'gzjy5nh7g0sfs', scope => 'COMPREHENSIVE', time_limit => 3600, task_name => 'cjc_sql_tuing_task', description => 'SQL TUNE TEST'); DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'cjc_sql_tuing_task'); END; /
SELECT status FROM DBA_ADVISOR_TASKS WHERE task_name = 'cjc_sql_tuing_task';
SET LONG 10000
SET LONGCHUNKSIZE 1000
SET LINESIZE 300
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('cjc_sql_tuing_task') FROM DUAL;
建议如下:
1- SQL Profile Finding (see explain plans section below) -------------------------------------------------------- 为此语句找到了性能更好的执行计划 2。选择以下 SQL 概要文件之一进行实施。 Recommendation (estimated benefit: 95.14%) ------------------------------------------ - 考虑接受推荐的 SQL 概要文件。 execute dbms_sqltune.accept_sql_profile(task_name => 'cjc_sql_tuing_task', task_owner => 'CHENPMB', replace => TRUE); Recommendation (estimated benefit: 99.95%) ------------------------------------------ - 考虑接受建议的 SQL 概要文件, 以便对此语句使用并行执行。 execute dbms_sqltune.accept_sql_profile(task_name => 'cjc_sql_tuing_task', task_owner => 'CHENPMB', replace => TRUE, profile_type => DBMS_SQLTUNE.PX_PROFILE); 与 DOP 128 并行执行此查询会使 SQL 概要文件计划上的响应时间缩短 99.13%。但是, 启用并行执行时要付出一些代价。它将增加语句的资源消耗 (预计为 11.07%), 这会导致系统吞吐量降低。此外, 由于在非常短的持续时间内消耗了这些资源, 因此如果没有足够可用的硬件容量, 并发语句的响应时间将受到负面影响。
2- Index Finding (see explain plans section below)
--------------------------------------------------
通过创建一个或多个索引可以改进此语句的执行计划。
Recommendation (estimated benefit: 99.7%)
-----------------------------------------
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000001 on
CHENPMB.MB_PAY_LOG("MPL_TRANCODE","MPL_STT",LENGTH("MPL_TIME"),"MPL_TIME");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000002 on
CHENPMB.MB_FINANCE_TRANS_INCREMENT("MFT_TRAN_STATUS",LENGTH("MFT_TRAN_TIME"
),"MFT_TRAN_CODE","MFT_TRAN_TIME");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHEN.IDX$$_04000003 on CHEN.CJC_T1_00000000000000000("MPB_BUSI
NESSTYPE",LENGTH("MPB_TRANTIME"),"MPB_TRANTIME","CJC_T3_000000");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000004 on
CHENPMB.MB_LOANBYFINANCE_RECORD("MLR_TRANCODE","MLR_LOANTYPEFLAG","MLR_STAT
US",LENGTH("MLR_TRANTIME"),"MLR_TRANTIME");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000006 on
CHENPMB.MB_TRANFLOW_INCREMENT("TRF_BSNCODE","TRF_STT",LENGTH("TRF_SUBTIME")
,"TRF_SUBTIME","TRF_FLOWNO");
Rationale
---------
创建推荐的索引可以显著地改进此语句的执行计划。但是, 使用典型的 SQL 工作量运行 "访问指导"
可能比单个语句更可取。通过这种方法可以获得全面的索引建议案, 包括计算索引维护的开销和附加的空间消耗。
最终建议:
1 将SQL中not in部分改写成not exists。 2 创建组合索引。
优化效果:
SQL由原来5小时查不出结果,优化到12秒完成查询。
###2021-03-31 22:00 chenjuchao###