存储过程慢?可能是 plan cache 失效了

作者:程柳润,爱可生 DBA 团队成员,OceanBase 数据库技术爱好者。

问题描述

业务反馈一个存储过程执行慢,需要排查原因。由于存储过程过长且涉及多张表,所以主要分享排查思路。

分析过程

1. 查找该存储过程的 trace_id

每次执行存储过程后会有两条记录,两条记录 SQL 的请求类型不同,先产生的记录是预编译的记录(执行耗时很小),另一个是实际执行的记录。

obclient [oceanbase]> select usec_to_time(request_time),svr_ip,trace_id,query_sql,elapsed_time from gv$sql_audit where query_sql like 'call TEST_CHENG%' order by request_time;+----------------------------+----------------+-----------------------------------+-------------------+--------------+--------------+| usec_to_time(request_time) | svr_ip         | trace_id                          | query_sql         | elapsed_time | request_type |+----------------------------+----------------+-----------------------------------+-------------------+--------------+--------------+| 2023-12-16 18:20:31.741933 | xx.xxx.xxx.xxx | YB420ABA3D1F-00060A149FDC35FD-0-0 | call TEST_CHENG() |           73 |             5|| 2023-12-16 18:20:31.742838 | xx.xxx.xxx.xxx | YB420ABA3D1F-00060A149CFDF57C-0-0 | call TEST_CHENG() |      2362075 |             6|+----------------------------+----------------+-----------------------------------+-------------------+--------------+--------------+2 rows in set (0.614 sec)

2. 分析存储中较慢的 SQL

根据上一步中执行较慢的   trace_id 查看该存储过程中执行较慢的 SQL。发现存储过程中的几个  UPDATE 耗时较长。另外, IS_HIT_PLAN 都为 0,表示没有命中  plan cache。然而其  GET_PLAN_TIME(获取执行计划时间)也为 0,是不正常的,这一点会在后面解释。

obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where trace_id='YB420ABA3D1F-00060A149CFDF57C-0-0' order by elapsed_time desc limit 6\G*************************** 1. row ***************************usec_to_time(request_time): 2023-12-16 18:20:31.742838                 query_sql: call TEST_CHENG()              elapsed_time: 2362075              execute_time: 2362015......             GET_PLAN_TIME: 21                QUEUE_TIME: 15               IS_HIT_PLAN: 0......*************************** 2. row ***************************usec_to_time(request_time): 2023-12-16 18:20:32.359304                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select               elapsed_time: 630735              execute_time: 630735......             GET_PLAN_TIME: 0                QUEUE_TIME: 0               IS_HIT_PLAN: 0......*************************** 3. row ***************************usec_to_time(request_time): 2023-12-16 18:20:32.999169                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."COUNTY" = (select               elapsed_time: 400111              execute_time: 400111......             GET_PLAN_TIME: 0                QUEUE_TIME: 0               IS_HIT_PLAN: 0......*************************** 4. row ***************************usec_to_time(request_time): 2023-12-16 18:20:33.731726                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."ROAD" = (select               elapsed_time: 335918              execute_time: 335918......             GET_PLAN_TIME: 0                QUEUE_TIME: 0               IS_HIT_PLAN: 0......*************************** 5. row ***************************usec_to_time(request_time): 2023-12-16 18:20:33.402608                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."ROAD" = (select               elapsed_time: 319150              execute_time: 319150......             GET_PLAN_TIME: 0                QUEUE_TIME: 0               IS_HIT_PLAN: 0......*************************** 6. row ***************************usec_to_time(request_time): 2023-12-16 18:20:32.086302                 query_sql: truncate Table  TEST1              elapsed_time: 142128              execute_time: 142128......             GET_PLAN_TIME: 0                QUEUE_TIME: 0               IS_HIT_PLAN: 0......6 rows in set (5.342 sec)

3. 查看存储过程执行情况

发现调用该存储过程之后,对  __all_tenant_dependency 等系统视图有增删改查的操作(篇幅有限,这里只截取了一部分)。这是由于编译 PL 对象会对系统视图做一些操作, gv$sql_audit 会记录下来,可以用  tenant_id=业务租户 ID 来过滤掉这些记录。

另外,在  UPDATE 前有  TRUNCATE TEST1 的操作。该操作会导致两个问题:

  • 整个存储过程的 plan cache 无法命中,每次都要重新编译。
  • 存储过程中这张表相关的 SQL 的 plan cache 也会失效,每次执行也需要重新获取执行计划。
obclient [oceanbase]> select substr(query_sql,1,60),usec_to_time(request_time),elapsed_time,svr_ip from gv$sql_audit where trace_id='YB420ABA3D1F-00060A149CFDF57C-0-0' order by request_time;+---------------------------------------------------+----------------------------+--------------+----------------+| substr(query_sql,1,60)                            | usec_to_time(request_time) | elapsed_time | svr_ip         |+---------------------------------------------------+----------------------------+--------------+----------------+| call TEST_CHENG();                                | 2023-12-16 18:50:30.086159 |      2514812 | xx.xxx.xxx.xxx || START TRANSACTION                                 | 2023-10-07 18:50:30.472741 |          194 | xx.xxx.xxx.xxx || delete FROM __all_tenant_dependency WHERE dep_obj_| 2023-10-07 18:50:30.473589 |         1570 | xx.xxx.xxx.xxx || INSERT INTO __all_tenant_dependency (tenant_id, de| 2023-10-07 18:50:30.476450 |          243 | xx.xxx.xxx.xxx ||......                                                                                                                                                                                                                         ||......                                                                                                                                                                                                                         || insert into "CC"."TEST5"( "LOG_DATE", "LOG_LEVEL",| 2023-12-16 18:50:30.504996 |          393 | xx.xxx.xxx.xxx || truncate Table  TEST1                             | 2023-12-16 18:50:30.505998 |       569828 | xx.xxx.xxx.xxx || select max("T"."CREATE_TIME") from "CC"."TEST6    | 2023-12-16 18:50:31.176795 |          295 | xx.xxx.xxx.xxx || select max("T"."CREATE_TIME") from "CC"."TEST6    | 2023-12-16 18:50:31.177108 |          180 | xx.xxx.xxx.xxx || insert into "CC"."TEST1"( "ID", "NUMBER           | 2023-12-16 18:50:31.178344 |         8039 | xx.xxx.xxx.xxx || SELECT * FROM __all_tenant_meta_table WHERE tenant| 2023-12-16 18:50:31.184392 |          328 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."ID" =            | 2023-12-16 18:50:31.187246 |         4696 | xx.xxx.xxx.xxx || SELECT partition_id, object_type, row_cnt as row_c| 2023-12-16 18:50:31.189366 |          181 | xx.xxx.xxx.xxx || SELECT tenant_id, table_id, partition_id, column_i| 2023-12-16 18:50:31.190145 |          224 | xx.xxx.xxx.xxx || SELECT tenant_id, table_id, partition_id, column_i| 2023-12-16 18:50:31.190774 |          131 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.192182 |         4577 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.197606 |         3896 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.202269 |       347793 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.559553 |       338883 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.901602 |       311209 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:32.218374 |         6085 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:32.225213 |       338044 | xx.xxx.xxx.xxx || update "CC"."TEST1" "T" set "T"."ORGAN_CODE" =    | 2023-12-16 18:50:32.566794 |        15760 | xx.xxx.xxx.xxx || insert into "CC"."TEST8"( "MANAGER_ID", "APART_ID | 2023-12-16 18:50:32.593934 |         5915 | xx.xxx.xxx.xxx |+---------------------------------------------------+----------------------------+--------------+----------------+154 rows in set (4.125 sec)

  1. 手工执行慢的几个update,发现命中 plan cache 时只需要几毫秒,而没有命中 plan cache 时需要300多毫秒。

手工执行时会命中执行计划

obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where query_sql like 'update "CC"."TEST1" "T" set "T"."CITY"%' and GET_PLAN_TIME>0 order by request_time\Gusec_to_time(request_time): 2023-12-16 19:12:30.063271                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select               elapsed_time: 8810              execute_time: 8810......             GET_PLAN_TIME: 0                QUEUE_TIME: 0               IS_HIT_PLAN: 1......

手工执行update前先做truncate操作

不会命中执行计划,此时  GET_PLAN_TIME 为 372ms。

obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where query_sql like 'update "CC"."TEST1" "T" set "T"."CITY"%' and GET_PLAN_TIME>0 order by request_time\G*************************** 1. row ***************************usec_to_time(request_time): 2023-12-16 19:21:53.069019                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select               elapsed_time: 378148              execute_time: 6036......             GET_PLAN_TIME: 372055                QUEUE_TIME: 28               IS_HIT_PLAN: 0......

注意:存储过程里的 SQL,即使没有命中 plan cache,在 gv$sql_audit  中还是会显示 GET_PLAN_TIME: 0,都体现在 execute_time。正常来说(比如手工执行的 SQL),如果没有命中 plan cache,GET_PLAN_TIME 值会比较大,并且不会计算到 execute_time 里。

结论

TRUNCATE 导致  UPDATE 无法命中  plan cache,所以存储过程慢。

解决方案

将存储过程里的 TRUNCATE 改成了  DELETE


请使用浏览器的分享功能分享到微信等