关于切换时primary-primary 双主问题,看起来是因为使用“alter database switchover to xx” 执行切换时的流程上缺陷问题:
首先在以往传统的switchover切换方式下,切换的流程是这样的:
假如当前A库(primary),B库(standby)
1. 在A库(primary)库上,执行切换语句"alter database commit to switchover to physical standby with session shutdown"。此时主库A在最后一个redo文件头中的EOR(End-of-redo)置为YES,表示其是当前作为主库的最后一个日志。
--dump出最后一个归档文件头:
ALTER SYSTEM DUMP LOGFILE '/home/oracle/archive/1_59_1047029376.dbf' RBA MIN 1 1 RBA MAX 1 1;
DUMP OF REDO FROM FILE '/home/oracle/archive/1_59_1047029376.dbf'
Container ID: 0
Container UID: 0
Opcodes *.*
Container ID: 0
Container UID: 0
RBAs: 0x000001.00000001.0000 thru 0x000001.00000001.0000
SCNs: scn: 0x0000000000000000 thru scn: 0xffffffffffffffff
……..
Largest LWN: 0 blocks
End-of-redo stream : Yes <<<<<<<<<<<
Unprotected mode
2. A库将这个日志发送到B(standby)库上。
3. B库接收到这个日志后并应用到EOR(End-of-redo)时,才将v$database.switchover_status从"not allowed" 切换为 "to primary" 为 "sessions active". 此时B standby库便可以开始进行切换为主库。
4. 原standby库b库在没有接收到EOR前,v$database.switchover_status时not allowed,代表不会出现 primary-primary情况出现。
当在19c下使用alter database switchover to xx的方法切换,其流程如下:
假如依然如同以上A库(primary),B库(standby),
1. 此时我在A库上执行alter database switchover to B;命令,将主库切换到B库;
2. 开始切换后,A库会发送角色转换的指令到B库(standby),此时,B库角色提升为primary,B库实例重新启动到MOUNT状态下:
OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
-------------------- ---------------- --------------------
MOUNTED PRIMARY NOT ALLOWED
3. A库则被shutdown abort掉,重新启动A库并启动应用日志时,此时的switchover_status状态变为to primary:
启动到mount状态:
OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
-------------------- ---------------- --------------------
MOUNTED PHYSICAL STANDBY RECOVERY NEEDED
启动日志应用:
OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
-------------------- ---------------- --------------------
MOUNTED PHYSICAL STANDBY TO PRIMARY
4.查看alert日志发现时A库recover了65号归档,这个归档其实A库做primary库时的最后一个归档。
2021-09-24T17:48:32.739450+08:00
TT02 (PID:40943): LOG_FILE_NAME_CONVERT is not defined, stop clearing ORLs
2021-09-24T17:48:32.870791+08:00
PR00 (PID:40937): Media Recovery Log /home/oracle/archive/1_65_1047029376.dbf
2021-09-24T17:48:32.941701+08:00
PR00 (PID:40937): Resetting standby activation ID 0 (0x0)
2021-09-24T17:48:32.944149+08:00
对这个归档的文件头dump出来时发现其EOR是YES.如上面的机制所说,这也是为何此时A库的状态可以为to primary的原因:
DUMP OF REDO FROM FILE '/home/oracle/archive/1_65_1047029376.dbf'
Container ID: 0
Container UID: 0
Opcodes *.*
Container ID: 0
Container UID: 0
RBAs: 0x000001.00000001.0000 thru 0x000001.00000001.0000
SCNs: scn: 0x0000000000000000 thru scn: 0xffffffffffffffff
Times: creation thru eternity
FILE HEADER:
Compatibility Vsn = 318767104=0x13000000
Db ID=2146799166=0x7ff58e3e, Db Name='ORCL193'
Activation ID=2183894505=0x822b95e9
Control Seq=1084123756=0x409e6a6c, File size=409600=0x64000
File Number=3, Blksiz=512, File Type=2 LOG
…….
End-of-redo stream : Yes<<<<<<<<<<
Unprotected mode
Miscellaneous flags: 0x88000a9
Miscellaneous second flags: 0x0
Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000000000000000
此时,你刚好再去将B库提升为主库,便出现primary-primary 双主的情况了。
5.这应该是一种流程上的缺陷,因为在原standby库B库提升为primary后,应该先将其open起来,再到新的standby库启动应用日志,此时新的standby库应用到新主库的日志,switchover_status的状态便变回not allowed:
2021-09-24T18:35:31.461468+08:00
PR00 (PID:40937): Media Recovery Log /home/oracle/archive/1_66_1047029376.dbf
PR00 (PID:40937): Media Recovery Waiting for T-1.S-67 (in transit)
OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
-------------------- ---------------- --------------------
MOUNTED PHYSICAL STANDBY NOT ALLOWED