Oracle分布式事务典型案例处理

一、分布式事务概念 介绍

数据库分布式事务是指同一事务中 DML语句对两个或多个数据库进行修改,为确保事务原子性,事务内的所有操作只能一起提交或一起回滚。ORACLE通过使用DATABASE LINK实现分布式事务。

insert into dept@remote_db values (41,'SUPPORT','BRUSSELS'); -- 远程数据库插入

insert into emp values (1041,'MULDER',10);             -- 本地数据库插入

commit;

1.1  基本概念

1.Global Coordinator :分布事务的发起者,负责协调这个分布事务。

2.Commit Point Site :在分布事务中,首先执行 COMMIT ROLLBACK 操作的站点。一般可以把业务中的关键数据库作为 Commit Point Site Commit Point Site 与其它站点不同,事务不会进入 prepared 状态,不会存在 IN-DOUBT (悬疑)事务,也不会因为分布式事务的失败而导致相关表被阻塞。

数据库相关参数:

COMMIT_POINT_STRENGTH ,默认为 1 ,数值大的为 Commit Point Site 。两值相同时一般远程数据库为 Commit Point Site

 

1.2  两阶段提交

Doc ID 126069.1

 

STAGES PHASE CRASH-TEST-NR's

------------ -------- ---------------

(02) -> (06) PREPARE 1, 2, 3, 4

(07) -> (13) COMMIT 5, 6, 7, 8, 9

(14) -> (16) FORGET 10

 

1.2.1  准备阶段( PREPARE PHASE

(01) The global coordinator initiates and commits the distributed transaction.During execution of the SQL statements within the transaction, the definition of the session tree is completed (-> dba_2pc_neighbors).

全局协调器发起分布式事务提交

 

(02) The commit point site is determined (commit_point_strength) and the SCNs (System Change Number) of the communicating nodes are oordinated.The highest SCN at all nodes is determined. This will be the commit SCN at the commit point site later on (-> highest global SCN -> global integrity).

确认 commit point site,取得所有节点中SCN最大的节点值,将最高的SCN号作为分布事物的全局SCN号,保证全局完整性。

 

(03) The global coordinator asks participating参与节点 nodes other than the commit point site to promise to commit or roll back (-> prepare message) the transaction, even if there is a failure. If any node cannot prepare,the transaction is rolled back.

等待其它参与节点返回信息确认是否提交或回滚事务,如果有节点不能 prepare,那事务进行回滚

 

(04) Every participating node allocates resources it needs to commit or rollback the transaction if data is changed.

It saves redo records corresponding to changes made by the transaction to its online redo log. This makes it possible to recover the database back to the prepare state in case of an instance failure.

The node guarantees that locks held for the transaction are able to survive a failure.

刷新日志到日志文件

 

(05) All participating nodes place a distributed lock on modified tables preventing reads/writes.

对分布事物修改的表加分布式锁,防止被

 

(06) All participating nodes respond with a prepared message to their global/local coordinator and wait until a commit or rollback request is received from the global/local coordinator.

After the nodes are prepared, the distributed transaction is said to be in-doubt.

Note that all participating nodes need to be prepared for the two phase commit to continue to the next phase (-> commit phase).

所有参与节点通知并等待协调器的提交或者回滚请求,并将事务切换成 in-doubt存疑状态。

 

1.2.2  提交阶段( COMMIT PHASE

(07) The global coordinator instructs the commit point site to commit.

全局协调器发起提交点站点提交

(08) The commit point site commits with the highest SCN (see step 02).

提交点站点提交

(09) The commit point site informs the global coordinator of the commit.

提交点站点返回提交信息

(10)  The global/local coordinator instructs all the participating nodes to commit.

其它参与者进行提交

(11)  Every node commits the local portion of the distributed transaction and releases locks.

提交本地事务并释放锁

(12)  Every node records an additional redo entry in the local redo log indicating that the transaction has commited.

写日志

(13) The participating nodes notify the global coordinator that they have committed.

On completion of the commit phase, the data on all nodes of the distributed system is consistent with one another.

所有参与节点返回提交完成信息,完成提交

 

1.2.3  注销阶段( FORGET PHASE

(14) After receiving notice from the global coordinator that all nodes have committed, the commit point site erases status information about this transaction.

接收到提交信息后,提交点站点清理事务信息

 

(15) The commit point site informs the global coordinator that it has erased the status information.

commit point site返回消息

(16)  The global coordinator erases its own information about the transaction.

global coordinator清理本次事务的相关信息。

 

此时分布事物的两阶段提交全部完成。

 

分布式事务故障模拟

如果两阶段提交完成之前,数据库或网络出现异常,分布事物处于 IN_DOUBT状态。相关状态会记录到dba_2pc_pending视图中。一旦数据库或网络恢复正常,RECO进程会自动处理IN_DOUBT状态的分布事物。少数情况需要DBA手工处理IN_DOUBT状态的分布事物。

2.1  Co mmit Comment

ORACLE 提供的用于模拟分布式事务各阶段异常的 COMMIT COMMENT

参见: Doc ID 126069.

 

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';

where n is one of the following integers:

n Effect

1 Crash commit point site after collect

2 Crash non-commit point site after collect

3 Crash before prepare (non-commit point site)

4 Crash after prepare (non-commit point site)

5 Crash commit point site before commit

6 Crash commit point site after commit

7 Crash non-commit point site before commit

8 Crash non-commit point site after commit

9 Crash commit point site before forget

10 Crash non-commit point site before forget

 

2.2  Testing the script   setup_rem.sql

-->>>>>>>>>> Begin of setup_rem.sql <<<<<<<<<<--

/* Execute at the remote site */

 

connect sys/change_on_install@v817

 

create user identified by tiger

default tablespace users

temporary tablespace temp;

 

grant dba to ;

grant force transaction,force any transaction to ;

/* To be able to crash a distributed transaction with

COMMIT COMMENT 'ORA-2PC-CRASH-n'; */

grant alter system to ;

/* To be able to do ALTER SYSTEM DISABLE/ENABLE DISTRIBUTED

RECOVERY; */

grant delete on sys.pending_trans$ to ;

grant delete on sys.pending_sessions$ to ;

grant delete on sys.pending_sub_sessions$ to ;

/* To be able to use DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY */

 

connect /tiger@v817

 

create database link v817rep.be.oracle.com connect to

identified by tiger using 'v817rep.be.oracle.com';

 

SET TERMOUT OFF

SET ECHO OFF

CREATE TABLE DEPT

(DEPTNO NUMBER(2) CONSTRAINT PK_DEPT PRIMARY KEY,

DNAME VARCHAR2(14) ,

LOC VARCHAR2(13) ) ;

CREATE TABLE EMP

(EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,

ENAME VARCHAR2(10),

DEPTNO NUMBER(2) CONSTRAINT FK_DEPTNO REFERENCES DEPT);

INSERT INTO DEPT VALUES (10,'ACCOUNTING','CITY1');

INSERT INTO DEPT VALUES (20,'RESEARCH','CITY2');

INSERT INTO DEPT VALUES (30,'SALES','CITY3');

INSERT INTO DEPT VALUES (40,'OPERATIONS','CITY4');

INSERT INTO EMP VALUES (1000,'NAME1',20);

INSERT INTO EMP VALUES (1001,'NAME2',30);

INSERT INTO EMP VALUES (1002,'NAME3',30);

INSERT INTO EMP VALUES (1003,'NAME4',20);

COMMIT;

CREATE SYNONYM S_DEPT FOR DEPT@v817rep.be.oracle.com;

CREATE SYNONYM S_EMP FOR EMP@v817rep.be.oracle.com;

SET TERMOUT ON

SET ECHO ON

 

-->>>>>>>>>> End of setup_rem.sql <<<<<<<<<<--

 

-->>>>>>>>>> Begin of setup_loc.sql <<<<<<<<<<--

 

/* Execute at the local site */

 

connect sys/change_on_install@v817rep

 

create user identified by tiger

default tablespace users

temporary tablespace temp;

 

grant dba to ;

grant force transaction,force any transaction to ;

/* To be able to crash a distributed transaction with

COMMIT COMMENT 'ORA-2PC-CRASH-n'; */

grant alter system to ;

/* To be able to do ALTER SYSTEM DISABLE/ENABLE DISTRIBUTED

RECOVERY; */

grant delete on sys.pending_trans$ to ;

grant delete on sys.pending_sessions$ to ;

grant delete on sys.pending_sub_sessions$ to ;

/* To be able to use DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY */

 

connect /tiger@v817rep

 

create database link v817.be.oracle.com connect to

identified by tiger using 'v817.be.oracle.com';

 

SET TERMOUT OFF

SET ECHO OFF

CREATE TABLE DEPT

(DEPTNO NUMBER(2) CONSTRAINT PK_DEPT PRIMARY KEY,

DNAME VARCHAR2(14) ,

LOC VARCHAR2(13) ) ;

CREATE TABLE EMP

(EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,

ENAME VARCHAR2(10),

DEPTNO NUMBER(2) CONSTRAINT FK_DEPTNO REFERENCES DEPT);

INSERT INTO DEPT VALUES (10,'ACCOUNTING','CITY1');

INSERT INTO DEPT VALUES (20,'RESEARCH','CITY2');

INSERT INTO DEPT VALUES (30,'SALES','CITY3');

INSERT INTO DEPT VALUES (40,'OPERATIONS','CITY4');

INSERT INTO EMP VALUES (1000,'NAME1',20);

INSERT INTO EMP VALUES (1001,'NAME2',30);

INSERT INTO EMP VALUES (1002,'NAME3',30);

INSERT INTO EMP VALUES (1003,'NAME4',20);

COMMIT;

CREATE SYNONYM S_DEPT FOR DEPT@v817.be.oracle.com;

CREATE SYNONYM S_EMP FOR EMP@v817.be.oracle.com;

SET TERMOUT ON

SET ECHO ON

 

-->>>>>>>>>> End of setup_loc.sql <<<<<<<<<<--

 

-->>>>>>>>>> Begin of crash_1.sql <<<<<<<<<<--

 

/* Crash Scenario 1 */

/* Crash commit point site after collect */

 

connect /tiger@v817rep

alter system disable distributed recovery;

 

/* DML remote */

/* object s_dept is a synonym for table dept@v817.be.oracle.com */

insert into s_dept values (41,'SUPPORT','BRUSSELS');

 

/* DML local */

insert into emp values (1041,'MULDER',10);

commit comment 'ORA-2PC-CRASH-TEST-1';

 

2.3  Crash commit point site after collect

commit comment 'ORA-2PC-CRASH-TEST-1';

-> after step (06) above

 

step (06) All participating nodes respond with a prepared message to their global/local coordinator and wait until a commit or rollback request is received from the global/local coordinator.

After the nodes are prepared, the distributed transaction is said to be in-doubt.

Note that all participating nodes need to be prepared for the two phase commit to continue to the next phase (-> commit phase).

 

SQL> connect local/local

Connected.

SQL> alter system disable distributed recovery;

System altered.

SQL> insert into s_dept values (42,'SUPPORT','BRUSSELS');

1 row created.

SQL> insert into emp values (1041,'MULDER',10);

1 row created.

SQL> commit comment 'ORA-2PC-CRASH-TEST-1';

commit comment 'ORA-2PC-CRASH-TEST-1'

*

ERROR at line 1:

ORA-02054: transaction 1.8.664 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-1 in commit comment

ORA-02063: preceding line from TONODE2

 

本地数据库 alert 日志

Error 2059 trapped in 2PC on transaction 1.8.664. Cleaning up.

Error stack returned to user:

ORA-02054: transaction 1.8.664 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-1 in commit comment

ORA-02063: preceding line from TONODE2

Mon Jan 03 13:52:54 2022

DISTRIB TRAN ORCL.6c28f3e5.1.8.664

  is local tran 1.8.664 (hex=01.08.298)

  insert pending prepared tran, scn=1002823 (hex=0.000f4d47)

 

远程数据库 alert 日志

Mon Jan 03 13:52:54 2022

Error 2059 trapped in 2PC on transaction 1.11.660. Cleaning up.

Error stack returned to user:

ORA-02059: ORA-2PC-CRASH-TEST-1 in commit comment

 

远程节点

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

no rows selected

 

本地节点

col GLOBAL_TRAN_ID for a40

select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending ;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID                           STATE            COMMIT#

---------------------- ---------------------------------------- ---------------- ----------------

1.8.664                ORCL.6c28f3e5.1.8.664                    prepared         1002823

 

SQL> ROLLBACK FORCE '1.8.664';

Rollback complete.

SQL> COMMIT FORCE '1.8.664';

 

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID                           STATE            COMMIT#

---------------------- ---------------------------------------- ---------------- ----------------

1.8.664                ORCL.6c28f3e5.1.8.664                    forced rollback  1002823

 

SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.8.664');

PL/SQL procedure successfully completed.

 

SQL> commit;

Commit complete.

 

2.4  Crash commit point site before commit

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-5'

Crash commit point site before commit

-> after step (07) above

(07) The global coordinator instructs the commit point site to commit.

 

SQL> connect local/local

Connected.

SQL> alter system disable distributed recovery;

System altered.

SQL> insert into s_dept values (43,'SUPPORT','BRUSSELS');

1 row created.

SQL> insert into emp values (1043,'MULDER',10);

1 row created.

SQL> commit comment 'ORA-2PC-CRASH-TEST-5';

commit comment 'ORA-2PC-CRASH-TEST-5'

*

ERROR at line 1:

ORA-02054: transaction 5.9.879 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-5 in commit comment

ORA-02063: preceding line from TONODE2

 

本地 alert 日志

Mon Jan 03 14:14:56 2022

Error 2059 trapped in 2PC on transaction 5.9.879. Cleaning up.

Mon Jan 03 14:14:56 2022

DISTRIB TRAN ORCL.6c28f3e5.5.9.879

   is local tran 5.9.879 (hex=05.09.36f)

   insert pending prepared tran, scn=1003725 (hex=0.000f50cd)

Error stack returned to user:

ORA-02054: transaction 5.9.879 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-5 in commit comment

ORA-02063: preceding line from TONODE2

 

远程 alert 日志

Mon Jan 03 14:14:56 2022

Error 2059 trapped in 2PC on transaction 1.14.661. Cleaning up.

Error stack returned to user:

ORA-02059: ORA-2PC-CRASH-TEST-5 in commit comment

 

远程节点

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

no rows selected

 

本地节点

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID                           STATE            COMMIT#

---------------------- ---------------------------------------- ---------------- ----------------

5.9.879                ORCL.6c28f3e5.5.9.879                    prepared         1003725

 

SQL> ROLLBACK FORCE '5.9.879';

Rollback complete.

 

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

 

LOCAL_TRAN_ID          GLOBAL_TRAN_ID                           STATE            COMMIT#

---------------------- ---------------------------------------- ---------------- ----------------

5.9.879                ORCL.6c28f3e5.5.9.879                    forced rollback  1003725

 

SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.9.879');

PL/SQL procedure successfully completed.

 

SQL> commit;

Commit complete.

 

2.5  伪行构造

SQL > connect local/local

Connected.

SQL> alter system disable distributed recovery;

System altered.

SQL> insert into s_dept values (46,'SUPPORT','BRUSSELS');

1 row created.

SQL> insert into emp values (1046,'MULDER',10);

1 row created.

SQL> commit comment 'ORA-2PC-CRASH-TEST-4';

commit comment 'ORA-2PC-CRASH-TEST-4'

*

ERROR at line 1:

ORA-02054: transaction 5.26.883 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-4 in commit comment

 

- - 本地 alert 日志

Mon Jan 03 18:51:33 2022

Error 2059 trapped in 2PC on transaction 5.26.883. Cleaning up.

Mon Jan 03 18:51:33 2022

DISTRIB TRAN ORCL.6c28f3e5.5.26.883

  is local tran 5.26.883 (hex=05.1a.373)

  insert pending prepared tran, scn=1045010 (hex=0.000ff212)

Error stack returned to user:

ORA-02054: transaction 5.26.883 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-4 in commit comment

 

- - 远程 alert 日志,无输出

 

-- 远程节点

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

no rows selected

 

本地节点

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID                           STATE            COMMIT#

---------------------- ---------------------------------------- ---------------- ----------------

5.26.883               ORCL.6c28f3e5.5.26.883                   prepared         1045010

 

SQL> ROLLBACK FORCE '5.26.883';

模拟 hang

 

SQL> select * from emp;

ERROR:

ORA-01591: lock held by in-doubt distributed transaction 5.26.883

 

删除基表信息

SQL> set transaction use rollback segment SYSTEM;

SQL> delete from sys.pending_trans$ where local_tran_id = '5.26.883';

SQL> delete from sys.pending_sessions$ where local_tran_id = '5.26.883';

SQL> delete from sys.pending_sub_sessions$ where local_tran_id = '5.26.883';

SQL> commit;

 

SQL> select * from emp;

ERROR:

ORA-01591: lock held by in-doubt distributed transaction 5.26.883

 

SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup

ORACLE instance started.

 

Total System Global Area  400846848 bytes

Fixed Size                  2253664 bytes

Variable Size             176164000 bytes

Database Buffers          218103808 bytes

Redo Buffers                4325376 bytes

Database mounted.

Database opened.

SQL> conn local/local

Connected.

 

重启数据库后事务未释放

SQL> select * from emp;

ERROR:

ORA-01591: lock held by in-doubt distributed transaction 5.26.883

 

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

no rows selected

 

SQL> SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */ KTUXESTA Status,KTUXECFL Flags FROM x$ktuxe WHERE ktuxesta!='INACTIVE' AND ktuxeusn= 5;

  KTUXEUSN   KTUXESLT   KTUXESQN STATUS           FLAGS

---------- ---------- ---------- ---------------- ------------------------

         5         26        883 PREPARED         SCO|COL|REV|DEAD

 

SQL> commit force '5.26.883';

commit force '5.26.883'

*

ERROR at line 1:

ORA-02058: no prepared transaction found with ID 5.26.883

 

SQL> rollback force '5.26.883';

rollback force '5.26.883'

*

ERROR at line 1:

ORA-02058: no prepared transaction found with ID 5.26.883

 

raises ORA-02058 since dba_2pc views are empty. In order to use commit force or rollback force a dummy record should be inserted into pending_trans$ as follows:

insert into pending_trans$ (

LOCAL_TRAN_ID,

GLOBAL_TRAN_FMT,

GLOBAL_ORACLE_ID,

STATE,

STATUS,

SESSION_VECTOR,

RECO_VECTOR,

TYPE#,

FAIL_TIME,

RECO_TIME)

values( '5.26.883', /* <== Replace this with your local tran id */

306206, /* */

'XXXXXXX.12345.1.2.3', /* These values can be used without any */

'prepared','P', /* modification. Most of the values are */

hextoraw( '00000001' ), /* constant. */

hextoraw( '00000000' ), /* */

0, sysdate, sysdate );

 

insert into pending_sessions$

values( '5.26.883',/* <==Replace only this with your local tran id */

1, hextoraw('05004F003A1500000104'),

'C', 0, 30258592, '',

146

);

commit;

 

在插入模拟事务数据后,可查询到事务信息,强制提交后,事务完成

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID                           STATE            COMMIT#

---------------------- ---------------------------------------- ---------------- ----------------

5.26.883               XXXXXXX.12345.1.2.3                      prepared         1045010

 

commit force '5.26.883';

 

SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.26.883');

PL/SQL procedure successfully completed.

 

SQL> commit;

Commit complete.

 

SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;

no rows selected

 

分布式事务案例总结

分布式事务的手工处理主要按 dba_2pc_pending 的结果来灵活对应。

dba_2pc_pending 中的数据时,主要看 state字段,

1) 如果 state状态是prepared,表示事务未提交或回滚,此时需要手工进行强制回滚或提交。

ROLLBACK FORCE 'transaction_id';

-OR-

COMMIT FORCE 'transaction_id','commit#';

 

2)如果state状态是committed, rollback forced或者commit forced状态,表示事务已经完成了,但是在FORGET阶段处理时,数据库字典的信息没能及时清除。此时,我们调用oracle的清理丢失事务信息的语句就可以完成处理:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.26.883');

Commit;

 

3 dba_2pc_pending 中无数据时

有可能基表 pending_trans$ pending_sessions$ 被执行过删除操作,此时需要向表中重新插回伪造的数据后再进行强制回滚或提交。


请使用浏览器的分享功能分享到微信等