一、分布式事务概念 介绍
数据库分布式事务是指同一事务中 DML语句对两个或多个数据库进行修改,为确保事务原子性,事务内的所有操作只能一起提交或一起回滚。ORACLE通过使用DATABASE LINK实现分布式事务。
insert into dept@remote_db values (41,'SUPPORT','BRUSSELS'); -- 远程数据库插入
insert into emp values (1041,'MULDER',10); -- 本地数据库插入
commit;
1.1 基本概念
1.Global Coordinator :分布事务的发起者,负责协调这个分布事务。
2.Commit Point Site :在分布事务中,首先执行 COMMIT 或 ROLLBACK 操作的站点。一般可以把业务中的关键数据库作为 Commit Point Site 。 Commit Point Site 与其它站点不同,事务不会进入 prepared 状态,不会存在 IN-DOUBT (悬疑)事务,也不会因为分布式事务的失败而导致相关表被阻塞。
数据库相关参数:
COMMIT_POINT_STRENGTH ,默认为 1 ,数值大的为 Commit Point Site 。两值相同时一般远程数据库为 Commit Point Site 。
1.2 两阶段提交
Doc ID 126069.1
STAGES PHASE CRASH-TEST-NR's
------------ -------- ---------------
(02) -> (06) PREPARE 1, 2, 3, 4
(07) -> (13) COMMIT 5, 6, 7, 8, 9
(14) -> (16) FORGET 10
1.2.1 准备阶段( PREPARE PHASE )
(01) The global coordinator initiates and commits the distributed transaction.During execution of the SQL statements within the transaction, the definition of the session tree is completed (-> dba_2pc_neighbors).
全局协调器发起分布式事务提交
(02) The commit point site is determined (commit_point_strength) and the SCNs (System Change Number) of the communicating nodes are oordinated.The highest SCN at all nodes is determined. This will be the commit SCN at the commit point site later on (-> highest global SCN -> global integrity).
确认 commit point site,取得所有节点中SCN最大的节点值,将最高的SCN号作为分布事物的全局SCN号,保证全局完整性。
(03) The global coordinator asks participating参与节点 nodes other than the commit point site to promise to commit or roll back (-> prepare message) the transaction, even if there is a failure. If any node cannot prepare,the transaction is rolled back.
等待其它参与节点返回信息确认是否提交或回滚事务,如果有节点不能 prepare,那事务进行回滚
(04) Every participating node allocates resources it needs to commit or rollback the transaction if data is changed.
It saves redo records corresponding to changes made by the transaction to its online redo log. This makes it possible to recover the database back to the prepare state in case of an instance failure.
The node guarantees that locks held for the transaction are able to survive a failure.
刷新日志到日志文件
(05) All participating nodes place a distributed lock on modified tables preventing reads/writes.
对分布事物修改的表加分布式锁,防止被 读 写
(06) All participating nodes respond with a prepared message to their global/local coordinator and wait until a commit or rollback request is received from the global/local coordinator.
After the nodes are prepared, the distributed transaction is said to be in-doubt.
Note that all participating nodes need to be prepared for the two phase commit to continue to the next phase (-> commit phase).
所有参与节点通知并等待协调器的提交或者回滚请求,并将事务切换成 in-doubt存疑状态。
1.2.2 提交阶段( COMMIT PHASE )
(07) The global coordinator instructs the commit point site to commit.
全局协调器发起提交点站点提交
(08) The commit point site commits with the highest SCN (see step 02).
提交点站点提交
(09) The commit point site informs the global coordinator of the commit.
提交点站点返回提交信息
(10) The global/local coordinator instructs all the participating nodes to commit.
其它参与者进行提交
(11) Every node commits the local portion of the distributed transaction and releases locks.
提交本地事务并释放锁
(12) Every node records an additional redo entry in the local redo log indicating that the transaction has commited.
写日志
(13) The participating nodes notify the global coordinator that they have committed.
On completion of the commit phase, the data on all nodes of the distributed system is consistent with one another.
所有参与节点返回提交完成信息,完成提交
1.2.3 注销阶段( FORGET PHASE )
(14) After receiving notice from the global coordinator that all nodes have committed, the commit point site erases status information about this transaction.
接收到提交信息后,提交点站点清理事务信息
(15) The commit point site informs the global coordinator that it has erased the status information.
commit point site返回消息
(16) The global coordinator erases its own information about the transaction.
global coordinator清理本次事务的相关信息。
此时分布事物的两阶段提交全部完成。
二 、 分布式事务故障模拟
如果两阶段提交完成之前,数据库或网络出现异常,分布事物处于 IN_DOUBT状态。相关状态会记录到dba_2pc_pending视图中。一旦数据库或网络恢复正常,RECO进程会自动处理IN_DOUBT状态的分布事物。少数情况需要DBA手工处理IN_DOUBT状态的分布事物。
2.1 Co mmit Comment
ORACLE 提供的用于模拟分布式事务各阶段异常的 COMMIT COMMENT
参见: Doc ID 126069.
COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';
where n is one of the following integers:
n Effect
1 Crash commit point site after collect
2 Crash non-commit point site after collect
3 Crash before prepare (non-commit point site)
4 Crash after prepare (non-commit point site)
5 Crash commit point site before commit
6 Crash commit point site after commit
7 Crash non-commit point site before commit
8 Crash non-commit point site after commit
9 Crash commit point site before forget
10 Crash non-commit point site before forget
2.2 Testing the script setup_rem.sql
-->>>>>>>>>> Begin of setup_rem.sql <<<<<<<<<<--
/* Execute at the remote site */
connect sys/change_on_install@v817
create user
default tablespace users
temporary tablespace temp;
grant dba to
grant force transaction,force any transaction to
/* To be able to crash a distributed transaction with
COMMIT COMMENT 'ORA-2PC-CRASH-n'; */
grant alter system to
/* To be able to do ALTER SYSTEM DISABLE/ENABLE DISTRIBUTED
RECOVERY; */
grant delete on sys.pending_trans$ to
grant delete on sys.pending_sessions$ to
grant delete on sys.pending_sub_sessions$ to
/* To be able to use DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY */
connect
create database link v817rep.be.oracle.com connect to
identified by tiger using 'v817rep.be.oracle.com';
SET TERMOUT OFF
SET ECHO OFF
CREATE TABLE DEPT
(DEPTNO NUMBER(2) CONSTRAINT PK_DEPT PRIMARY KEY,
DNAME VARCHAR2(14) ,
LOC VARCHAR2(13) ) ;
CREATE TABLE EMP
(EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,
ENAME VARCHAR2(10),
DEPTNO NUMBER(2) CONSTRAINT FK_DEPTNO REFERENCES DEPT);
INSERT INTO DEPT VALUES (10,'ACCOUNTING','CITY1');
INSERT INTO DEPT VALUES (20,'RESEARCH','CITY2');
INSERT INTO DEPT VALUES (30,'SALES','CITY3');
INSERT INTO DEPT VALUES (40,'OPERATIONS','CITY4');
INSERT INTO EMP VALUES (1000,'NAME1',20);
INSERT INTO EMP VALUES (1001,'NAME2',30);
INSERT INTO EMP VALUES (1002,'NAME3',30);
INSERT INTO EMP VALUES (1003,'NAME4',20);
COMMIT;
CREATE SYNONYM S_DEPT FOR DEPT@v817rep.be.oracle.com;
CREATE SYNONYM S_EMP FOR EMP@v817rep.be.oracle.com;
SET TERMOUT ON
SET ECHO ON
-->>>>>>>>>> End of setup_rem.sql <<<<<<<<<<--
-->>>>>>>>>> Begin of setup_loc.sql <<<<<<<<<<--
/* Execute at the local site */
connect sys/change_on_install@v817rep
create user
default tablespace users
temporary tablespace temp;
grant dba to
grant force transaction,force any transaction to
/* To be able to crash a distributed transaction with
COMMIT COMMENT 'ORA-2PC-CRASH-n'; */
grant alter system to
/* To be able to do ALTER SYSTEM DISABLE/ENABLE DISTRIBUTED
RECOVERY; */
grant delete on sys.pending_trans$ to
grant delete on sys.pending_sessions$ to
grant delete on sys.pending_sub_sessions$ to
/* To be able to use DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY */
connect
create database link v817.be.oracle.com connect to
identified by tiger using 'v817.be.oracle.com';
SET TERMOUT OFF
SET ECHO OFF
CREATE TABLE DEPT
(DEPTNO NUMBER(2) CONSTRAINT PK_DEPT PRIMARY KEY,
DNAME VARCHAR2(14) ,
LOC VARCHAR2(13) ) ;
CREATE TABLE EMP
(EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,
ENAME VARCHAR2(10),
DEPTNO NUMBER(2) CONSTRAINT FK_DEPTNO REFERENCES DEPT);
INSERT INTO DEPT VALUES (10,'ACCOUNTING','CITY1');
INSERT INTO DEPT VALUES (20,'RESEARCH','CITY2');
INSERT INTO DEPT VALUES (30,'SALES','CITY3');
INSERT INTO DEPT VALUES (40,'OPERATIONS','CITY4');
INSERT INTO EMP VALUES (1000,'NAME1',20);
INSERT INTO EMP VALUES (1001,'NAME2',30);
INSERT INTO EMP VALUES (1002,'NAME3',30);
INSERT INTO EMP VALUES (1003,'NAME4',20);
COMMIT;
CREATE SYNONYM S_DEPT FOR DEPT@v817.be.oracle.com;
CREATE SYNONYM S_EMP FOR EMP@v817.be.oracle.com;
SET TERMOUT ON
SET ECHO ON
-->>>>>>>>>> End of setup_loc.sql <<<<<<<<<<--
-->>>>>>>>>> Begin of crash_1.sql <<<<<<<<<<--
/* Crash Scenario 1 */
/* Crash commit point site after collect */
connect
alter system disable distributed recovery;
/* DML remote */
/* object s_dept is a synonym for table dept@v817.be.oracle.com */
insert into s_dept values (41,'SUPPORT','BRUSSELS');
/* DML local */
insert into emp values (1041,'MULDER',10);
commit comment 'ORA-2PC-CRASH-TEST-1';
2.3 Crash commit point site after collect
commit comment 'ORA-2PC-CRASH-TEST-1';
-> after step (06) above
step (06) : All participating nodes respond with a prepared message to their global/local coordinator and wait until a commit or rollback request is received from the global/local coordinator.
After the nodes are prepared, the distributed transaction is said to be in-doubt.
Note that all participating nodes need to be prepared for the two phase commit to continue to the next phase (-> commit phase).
SQL> connect local/local
Connected.
SQL> alter system disable distributed recovery;
System altered.
SQL> insert into s_dept values (42,'SUPPORT','BRUSSELS');
1 row created.
SQL> insert into emp values (1041,'MULDER',10);
1 row created.
SQL> commit comment 'ORA-2PC-CRASH-TEST-1';
commit comment 'ORA-2PC-CRASH-TEST-1'
*
ERROR at line 1:
ORA-02054: transaction 1.8.664 in-doubt
ORA-02059: ORA-2PC-CRASH-TEST-1 in commit comment
ORA-02063: preceding line from TONODE2
本地数据库 alert 日志
Error 2059 trapped in 2PC on transaction 1.8.664. Cleaning up.
Error stack returned to user:
ORA-02054: transaction 1.8.664 in-doubt
ORA-02059: ORA-2PC-CRASH-TEST-1 in commit comment
ORA-02063: preceding line from TONODE2
Mon Jan 03 13:52:54 2022
DISTRIB TRAN ORCL.6c28f3e5.1.8.664
is local tran 1.8.664 (hex=01.08.298)
insert pending prepared tran, scn=1002823 (hex=0.000f4d47)
远程数据库 alert 日志
Mon Jan 03 13:52:54 2022
Error 2059 trapped in 2PC on transaction 1.11.660. Cleaning up.
Error stack returned to user:
ORA-02059: ORA-2PC-CRASH-TEST-1 in commit comment
远程节点
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
no rows selected
本地节点
col GLOBAL_TRAN_ID for a40
select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending ;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE COMMIT#
---------------------- ---------------------------------------- ---------------- ----------------
1.8.664 ORCL.6c28f3e5.1.8.664 prepared 1002823
SQL> ROLLBACK FORCE '1.8.664';
Rollback complete.
SQL> COMMIT FORCE '1.8.664';
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE COMMIT#
---------------------- ---------------------------------------- ---------------- ----------------
1.8.664 ORCL.6c28f3e5.1.8.664 forced rollback 1002823
SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.8.664');
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
2.4 Crash commit point site before commit
COMMIT COMMENT 'ORA-2PC-CRASH-TEST-5'
Crash commit point site before commit
-> after step (07) above
(07) The global coordinator instructs the commit point site to commit.
SQL> connect local/local
Connected.
SQL> alter system disable distributed recovery;
System altered.
SQL> insert into s_dept values (43,'SUPPORT','BRUSSELS');
1 row created.
SQL> insert into emp values (1043,'MULDER',10);
1 row created.
SQL> commit comment 'ORA-2PC-CRASH-TEST-5';
commit comment 'ORA-2PC-CRASH-TEST-5'
*
ERROR at line 1:
ORA-02054: transaction 5.9.879 in-doubt
ORA-02059: ORA-2PC-CRASH-TEST-5 in commit comment
ORA-02063: preceding line from TONODE2
本地 alert 日志
Mon Jan 03 14:14:56 2022
Error 2059 trapped in 2PC on transaction 5.9.879. Cleaning up.
Mon Jan 03 14:14:56 2022
DISTRIB TRAN ORCL.6c28f3e5.5.9.879
is local tran 5.9.879 (hex=05.09.36f)
insert pending prepared tran, scn=1003725 (hex=0.000f50cd)
Error stack returned to user:
ORA-02054: transaction 5.9.879 in-doubt
ORA-02059: ORA-2PC-CRASH-TEST-5 in commit comment
ORA-02063: preceding line from TONODE2
远程 alert 日志
Mon Jan 03 14:14:56 2022
Error 2059 trapped in 2PC on transaction 1.14.661. Cleaning up.
Error stack returned to user:
ORA-02059: ORA-2PC-CRASH-TEST-5 in commit comment
远程节点
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
no rows selected
本地节点
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE COMMIT#
---------------------- ---------------------------------------- ---------------- ----------------
5.9.879 ORCL.6c28f3e5.5.9.879 prepared 1003725
SQL> ROLLBACK FORCE '5.9.879';
Rollback complete.
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE COMMIT#
---------------------- ---------------------------------------- ---------------- ----------------
5.9.879 ORCL.6c28f3e5.5.9.879 forced rollback 1003725
SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.9.879');
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
2.5 伪行构造
SQL > connect local/local
Connected.
SQL> alter system disable distributed recovery;
System altered.
SQL> insert into s_dept values (46,'SUPPORT','BRUSSELS');
1 row created.
SQL> insert into emp values (1046,'MULDER',10);
1 row created.
SQL> commit comment 'ORA-2PC-CRASH-TEST-4';
commit comment 'ORA-2PC-CRASH-TEST-4'
*
ERROR at line 1:
ORA-02054: transaction 5.26.883 in-doubt
ORA-02059: ORA-2PC-CRASH-TEST-4 in commit comment
- - 本地 alert 日志
Mon Jan 03 18:51:33 2022
Error 2059 trapped in 2PC on transaction 5.26.883. Cleaning up.
Mon Jan 03 18:51:33 2022
DISTRIB TRAN ORCL.6c28f3e5.5.26.883
is local tran 5.26.883 (hex=05.1a.373)
insert pending prepared tran, scn=1045010 (hex=0.000ff212)
Error stack returned to user:
ORA-02054: transaction 5.26.883 in-doubt
ORA-02059: ORA-2PC-CRASH-TEST-4 in commit comment
- - 远程 alert 日志,无输出
-- 远程节点
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
no rows selected
本地节点
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE COMMIT#
---------------------- ---------------------------------------- ---------------- ----------------
5.26.883 ORCL.6c28f3e5.5.26.883 prepared 1045010
SQL> ROLLBACK FORCE '5.26.883';
模拟 hang 住
SQL> select * from emp;
ERROR:
ORA-01591: lock held by in-doubt distributed transaction 5.26.883
删除基表信息
SQL> set transaction use rollback segment SYSTEM;
SQL> delete from sys.pending_trans$ where local_tran_id = '5.26.883';
SQL> delete from sys.pending_sessions$ where local_tran_id = '5.26.883';
SQL> delete from sys.pending_sub_sessions$ where local_tran_id = '5.26.883';
SQL> commit;
SQL> select * from emp;
ERROR:
ORA-01591: lock held by in-doubt distributed transaction 5.26.883
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 400846848 bytes
Fixed Size 2253664 bytes
Variable Size 176164000 bytes
Database Buffers 218103808 bytes
Redo Buffers 4325376 bytes
Database mounted.
Database opened.
SQL> conn local/local
Connected.
重启数据库后事务未释放
SQL> select * from emp;
ERROR:
ORA-01591: lock held by in-doubt distributed transaction 5.26.883
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
no rows selected
SQL> SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */ KTUXESTA Status,KTUXECFL Flags FROM x$ktuxe WHERE ktuxesta!='INACTIVE' AND ktuxeusn= 5;
KTUXEUSN KTUXESLT KTUXESQN STATUS FLAGS
---------- ---------- ---------- ---------------- ------------------------
5 26 883 PREPARED SCO|COL|REV|DEAD
SQL> commit force '5.26.883';
commit force '5.26.883'
*
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 5.26.883
SQL> rollback force '5.26.883';
rollback force '5.26.883'
*
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 5.26.883
raises ORA-02058 since dba_2pc views are empty. In order to use commit force or rollback force a dummy record should be inserted into pending_trans$ as follows:
insert into pending_trans$ (
LOCAL_TRAN_ID,
GLOBAL_TRAN_FMT,
GLOBAL_ORACLE_ID,
STATE,
STATUS,
SESSION_VECTOR,
RECO_VECTOR,
TYPE#,
FAIL_TIME,
RECO_TIME)
values( '5.26.883', /* <== Replace this with your local tran id */
306206, /* */
'XXXXXXX.12345.1.2.3', /* These values can be used without any */
'prepared','P', /* modification. Most of the values are */
hextoraw( '00000001' ), /* constant. */
hextoraw( '00000000' ), /* */
0, sysdate, sysdate );
insert into pending_sessions$
values( '5.26.883',/* <==Replace only this with your local tran id */
1, hextoraw('05004F003A1500000104'),
'C', 0, 30258592, '',
146
);
commit;
在插入模拟事务数据后,可查询到事务信息,强制提交后,事务完成
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE COMMIT#
---------------------- ---------------------------------------- ---------------- ----------------
5.26.883 XXXXXXX.12345.1.2.3 prepared 1045010
commit force '5.26.883';
SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.26.883');
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,COMMIT# from dba_2pc_pending;
no rows selected
三 、 分布式事务案例总结
分布式事务的手工处理主要按 dba_2pc_pending 的结果来灵活对应。
当 dba_2pc_pending 中的数据时,主要看 state字段,
1) 如果 state状态是prepared,表示事务未提交或回滚,此时需要手工进行强制回滚或提交。
ROLLBACK FORCE 'transaction_id';
-OR-
COMMIT FORCE 'transaction_id','commit#';
2)如果state状态是committed, rollback forced或者commit forced状态,表示事务已经完成了,但是在FORGET阶段处理时,数据库字典的信息没能及时清除。此时,我们调用oracle的清理丢失事务信息的语句就可以完成处理:
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.26.883');
Commit;
3 ) 当 dba_2pc_pending 中无数据时
有可能基表 pending_trans$ , pending_sessions$ 被执行过删除操作,此时需要向表中重新插回伪造的数据后再进行强制回滚或提交。