故障描述
某次,一套运行已经有点岁月的 O racle 11 g R2 单机环境,用户反馈某个物化视图不会定时自动刷新,进一步了解后,确认是 D BMS_JOB 无法按照指定的时间和间隔执行,根据以往的运维经验,估计大概率是触发 J OB 自动执行的条件不具备导致,该问题是比较容易处理。但实际情况并没有想的那么简单,详情如下。
根因分析
1 、对异常 2 17 J OB 初步分析与处理
创建多个物化视图,但没有一个能定时自动触发

选择 2 17 号 J OB ,修改其下一次执行时间和间隔

修改 2 17 JOB 下一次执行时间 next_date
begin
dbms_job.next_date(217,to_date('2022-12-07 17:35:00','yyyy-mm-dd hh24:mi:ss'));
commit;
end;
/
修改 2 17 JOB 执行间隔 interval
begin dbms_job.interval(217,interval => 'sysdate+10/1440'); commit; end; /
删除 2 17 JOB ,并重建
exec dbms_job.remove(217);
commit;
declare job2 number;
begin
dbms_job.submit(job2,'dbms_mview.refresh(''"PMS"."MV_HEX_PMS_USER_XBRL"'');',sysdate,'sysdate+10/1440');
commit;
end;
/
重建之后的 2 19 号 J OB 依旧没有执行
select job, what, last_date, next_date, interval, broken, failures, schema_user, log_user, priv_user from dba_jobs WHERE JOB = 219;

2 、模拟用户场景,在测试环境上进行验证
创建业务表 znfx_agg_zc_GT_bb
create table xuh.znfx_agg_zc_GT_bb( MONTHID varchar2(20), ZCZT varchar2(20), QYLX varchar2(20), CLRQ varchar2(20), SFXS varchar2(20), NUM number(20) , ZCZB number(20) , COUNT number(20) , NUM_CNT number(20) , ZCZB_CNT number(20) );
创建存储过程 znfx_tab_pro
create or replace procedure xuh.znfx_tab_pro(arg1 in varchar2) is v_count number := 1; begin while v_count <= 10000 loop insert into xuh.znfx_agg_zc_GT_bb values (arg1, 'AAA', 'BBB', 'CCC', 'DDD', v_count, v_count, v_count, v_count, v_count); v_count := v_count + 1; end loop; commit; end; /
执行存储过程,模拟数据插入
SQL> exec xuh.znfx_tab_pro('202212');
创建物化视图 JCD_AGG_ZC_GT_BB_MV
CREATE MATERIALIZED VIEW "XUH"."JCD_AGG_ZC_GT_BB_MV" ("MONTHID", "ZCZT", "QYLX", "CLRQ", "SFXS", "NUM", "ZCZB", "COUNT", "NUM_CNT", "ZCZB_CNT")
REFRESH force ON DEMAND
AS
select monthid,
zczt,
qylx,
clrq,
sfxs,
sum(num) num,
sum(zczb) zczb,
count(*) count,
Count(num) num_cnt,
Count(zczb) zczb_cnt
from xuh.znfx_agg_zc_GT_bb
group by monthid, zczt, qylx, clrq, sfxs;
手动执行物化视图刷新,确保刷新没有问题
SQL> exec dbms_mview.refresh('XUH.JCD_AGG_ZC_GT_BB_MV');
创建 J OB ,并跟踪观察 J OB 执行情况
declare job3 number;
begin
dbms_job.submit(job3,'dbms_mview.refresh(''"XUH"."JCD_AGG_ZC_GT_BB_MV"'');',sysdate,'sysdate+1/1440');
commit;
end;
/
可以看到该 J OB 确实可以定时自动成功执行

3 、排查可能造成 J OB 无法定时自动执行的原因
Check the most common reasons why jobs don't execute automatically and as scheduled: 1) Instance in RESTRICTED SESSIONS mode? Check if the instance is in restricted sessions mode: select instance_name,logins from v$instance; If logins=RESTRICTED, then: alter system disable restricted session; ^-- Checked! 2) JOB_QUEUE_PROCESSES=0 Make sure that job_queue_processes is > 0 show parameter job_queue_processes ^-- Checked! 3) _SYSTEM_TRIG_ENABLED=FALSE Check if _system_enabled_trigger=false col parameter format a25 col value format a15 select a.ksppinm parameter,b.ksppstvl value from x$ksppi a,x$ksppcv b where a.indx=b.indx and ksppinm='_system_trig_enabled'; If _system_trig_enabled=false, then alter system set "_system_trig_enabled"=TRUE scope=both; ^-- Checked! 4) Is the job BROKEN? select job,broken from dba_jobs where job=; If broken, then check the alert log and trace files to diagnose the issue. ^-- Checked! The job is not broken. 5) Is the job COMMITted? Make sure a commit is issued after submitting the job: DECLARE X NUMBER; BEGIN SYS.DBMS_JOB.SUBMIT ( job => X ,what => 'dbms_utility.analyze_schema (''SCOTT'',''COMPUTE'',NULL,NULL,NULL);' ,next_date => to_date('08/06/2005 09:35:00','dd/mm/yyyy hh24:mi:ss') ,no_parse => FALSE ); COMMIT; END; / If the job executes fine if forced (i.e., exec dbms_jobs.run( );), then likely a commit is missing. ^-- Checked! The job is committed after submission. 6) UPTIME > 497 days Check if the server (machine) has been up for more than 497 days: For SUN, use 'uptime' OS command. If uptime>497 and the jobs do not execute automatically, then you are hitting unpublished bug 3427424 (Jobs may stop running after 497 days uptime) which is fixed in 9206 and A102 ^-- Checked! The server in this case has been up 126 days only 7) DBA_JOBS_RUNNING Check dba_jobs_running to see if the job is still running: select * from dba_jobs_running; ^-- Checked! The job is not running. 8) LAST_DATE and NEXT_DATE Check if the last_date and next_date for the job are proper: select Job,Next_date,Last_date from dba_jobs where job= ; ^-- NEXT_DATE is proper, however LAST_DATE is null since the job never executes automatically. 9) NEXT_DATE and INTERVAL Check if the Next_date is changing properly as per the interval set in dba_jobs: select Job,Interval,Next_date,Last_date from dba_jobs where job= ; ^-- This is not possible since the job never gets executed automatically. 10) Toggle value for JOB_QUEUE_PROCESSES Stop and restart CJQ process(es) alter system set job_queue_processes=0 ; -- alter system set job_queue_processes=4 ; Ref: Bug 2649244 (fixed by: 9015, 9203, 10201) ^-- Done but did not help 11) Check for unsuccessful shutdowns: A shutdown immediate may get canceled because active sessions prevent the database close operation. Please review the alert log for the last two shutdown / startups and the messages: SHUTDOWN: Active sessions prevent database close operation Please refer to Note 434690.1 - Database Jobs Do Not Run After a Failed 'Shutdown Immediate' 12) DBMS_IJOB (non-documented package): Either restart the database or try the following: exec dbms_ijob.set_enabled(true); Ref: Bug 3505718 (Closed, Not a Bug) ^-- Done but did not help 13) Check view DBA_SCHEDULER_GLOBAL_ATTRIBUTE for CURRENT_OPEN_WINDOW: SQL> select * from DBA_SCHEDULER_GLOBAL_ATTRIBUTE where attribute_name='CURRENT_OPEN_WINDOW'; If a window is open close it (e.g.): ATTRIBUTE_NAME VALUE --------------------------------- ---------------------------- CURRENT_OPEN_WINDOW WEEKNIGHT_WINDOW SQL> exec DBMS_SCHEDULER.close_window ('WEEKNIGHT_WINDOW'); ^-- Done but did not help These are the most common causes for this behavior.
通过分析 M OS 文章 Jobs Not Executing Automatically (Doc ID 313102.1) , 检查所有的可能原因,并根据官方建议调整,但是问题依旧。此时,想短时间通过 D BMS_JOB 方式解决问题不太现实,因此尝试使用高版本官方建议的 DBMS_SCHEDULER ,确认可行性。

4 、使用 DBMS_SCHEDULER 创建任务,观察任务执行情况
创建 J OB2 ,并立即执行
BEGIN
DBMS_SCHEDULER.CREATE_JOB(
job_name => 'job2',
job_type => 'PLSQL_BLOCK',
job_action => 'dbms_mview.refresh(''"PMS"."MV_HEX_PMS_USER_XBRL"'');',
start_date => SYSDATE,
repeat_interval => 'FREQ = MINUTELY; INTERVAL = 10',
enabled => true);
END;
/
检查 DBA_SCHEDULER_JOBS 字典视图,发现任务没有自动执行
select OWNER, JOB_NAME, ENABLED, to_char(START_DATE, 'yyyy-mm-dd hh24:mi:ss') START_DATE, REPEAT_INTERVAL, RUN_COUNT, to_char(LAST_START_DATE, 'yyyy-mm-dd hh24:mi:ss') LAST_START_DATE, to_char(LAST_RUN_DURATION, 'yyyy-mm-dd hh24:mi:ss') LAST_RUN_DURATION, to_char(NEXT_RUN_DATE, 'yyyy-mm-dd hh24:mi:ss') NEXT_RUN_DATE from DBA_SCHEDULER_JOBS where job_name = 'JOB2'; [object Object]
select * from dba_scheduler_job_log t where owner='PMS' and job_name='JOB2';

5 、创建测试任务 test _job ,并重启 D BMS_JOB& DBMS_SCHEDULER 属性
创建测试 test_job 任务
begin dbms_scheduler.create_job ( job_name => 'test_job', job_type => 'plsql_block', job_action => 'null;', start_date => SYSDATE, repeat_interval => 'FREQ = MINUTELY; INTERVAL = 1', enabled => true); end; /
检查 D BMS_JOB& DBMS_SCHEDULER 属性生效情况
select value from dba_scheduler_global_attribute where attribute_name='SCHEDULER_DISABLED'; 未选定行 SQL> oradebug setmypid SQL> oradebug dumpvar sga kkjsre sword kkjsre_ [060040500, 060040504) = 00000001
先禁用任务属性,再重新触发生效
exec dbms_ijob.set_enabled(FALSE);
exec dbms_ijob.set_enabled(TRUE);
alter system set job_queue_processes=0;
alter system set job_queue_processes=500;
--用户环境重启DBMS_SCHEDULER属性报错分析
exec dbms_scheduler.set_scheduler_attribute('SCHEDULER_DISABLED', 'TRUE');
--exec dbms_scheduler.set_scheduler_attribute('SCHEDULER_DISABLED', 'FALSE');
关闭 DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE 属性,命中 O RA-1033 报错

通过分析 M OS 文章 DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE Failing With ORA-01033. (Doc ID 756740.1) ,官方未给出明确的故障原因,等待一会之后重试成功,同时观测到 DBMS_SCHEDULER 创建的任务正常定时自动执行。

这跟其他运行正常的 Oracle 相比有些异常,甚至是相反,但不排除是这套环境自身的原因,甚至是 B UG 。其实,官方原文中也有提到 Oracle 文档, 并 没有支持完全禁用调度器的方法 ,只是确保在升级模式下,自动任务不会执行。

此时检查任务 test_job 执行情况,发现该任务已经在自动调度了
select OWNER, JOB_NAME, ENABLED, to_char(START_DATE, 'yyyy-mm-dd hh24:mi:ss') START_DATE, REPEAT_INTERVAL, RUN_COUNT, to_char(LAST_START_DATE, 'yyyy-mm-dd hh24:mi:ss') LAST_START_DATE, to_char(LAST_RUN_DURATION, 'yyyy-mm-dd hh24:mi:ss') LAST_RUN_DURATION, to_char(NEXT_RUN_DATE, 'yyyy-mm-dd hh24:mi:ss') NEXT_RUN_DATE from DBA_SCHEDULER_JOBS where job_name = 'TEST_JOB';
同理,此时业务任务 job 2 ,同样已经在自动调度了,如下:
select OWNER, JOB_NAME, ENABLED, to_char(START_DATE, 'yyyy-mm-dd hh24:mi:ss') START_DATE, REPEAT_INTERVAL, RUN_COUNT, to_char(LAST_START_DATE, 'yyyy-mm-dd hh24:mi:ss') LAST_START_DATE, to_char(LAST_RUN_DURATION, 'yyyy-mm-dd hh24:mi:ss') LAST_RUN_DURATION, to_char(NEXT_RUN_DATE, 'yyyy-mm-dd hh24:mi:ss') NEXT_RUN_DATE from DBA_SCHEDULER_JOBS where job_name = 'JOB2';
select * from dba_scheduler_job_log t where owner='PMS' and job_name='JOB2';
解决办法
SQL> exec dbms_scheduler.set_scheduler_attribute('SCHEDULER_DISABLED', 'TRUE');