gc current/cr block busy wait events

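I recently ran into a performance problem where the top 5 wait events were log file sync and gc cr block busy. This post summarizes these two wait events and how they relate to each other.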

I: gc current/cr block busy wait events

First, note that CR and current are different concepts: a read issues a CR (consistent read) request, while a modification issues a current request (a current read).
1) gc current block busy wait event
When a request needs a block in current mode, it sends a request to the master instance. The requestor eventually gets the block via cache fusion transfer. However, sometimes the block transfer is delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online logfile immediately.
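To make the CR/current distinction and the two causes of a busy transfer concrete, here is a toy Python model. It is a minimal sketch under simplifying assumptions: `request_mode`, `HolderState`, and `classify` are hypothetical names invented for illustration, not Oracle internals, and the real Cache Fusion protocol (master instance, 2-way/3-way transfers, lock states) is far richer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def request_mode(operation: str) -> str:
    """A query asks for a CR copy; DML asks for the current block."""
    if operation == "select":
        return "cr"
    if operation in ("insert", "update", "delete"):
        return "current"
    raise ValueError(f"unknown operation: {operation}")


@dataclass
class HolderState:
    """Hypothetical state of the instance currently holding the block."""
    pinned_by_session: bool    # a session on the holding instance is using the block
    redo_pending_flush: bool   # LGWR has not yet written the redo for the last change


def classify(operation: str, holder: HolderState) -> Tuple[Optional[str], List[str]]:
    """Return the busy wait event the requester would see (or None) and why."""
    mode = request_mode(operation)
    reasons: List[str] = []
    if holder.pinned_by_session:
        reasons.append("block in use on holding instance")
    if holder.redo_pending_flush:
        reasons.append("waiting for redo to be written to the online log")
    event = f"gc {mode} block busy" if reasons else None
    return event, reasons
```

For example, `classify("update", HolderState(False, True))` yields `("gc current block busy", ["waiting for redo to be written to the online log"])`, while a `select` against an idle holder yields `(None, [])`.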

One can use the session-level dynamic performance views v$session and v$session_event to find the programs or sessions causing the most waits on this event:

SQL> select a.sid, a.time_waited, b.program, b.module from v$session_event a, v$session b where a.sid = b.sid and a.event = 'gc current block busy' order by a.time_waited desc;


2) gc cr block busy wait event
When a request needs a block in CR mode, it sends a request to the master instance. The requestor eventually gets the block via cache fusion transfer. However, sometimes the block transfer is delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online logfile immediately.

One can use the session-level dynamic performance views v$session and v$session_event to find the programs or sessions causing the most waits on this event:


SQL> select a.sid, a.time_waited, b.program, b.module from v$session_event a, v$session b where a.sid = b.sid and a.event = 'gc cr block busy' order by a.time_waited desc;
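The two queries above return one row per session. When many sessions run the same program, it can help to total the waits per program. A minimal sketch, assuming the rows have already been fetched as `(sid, time_waited, program, module)` tuples in the order the queries select them; `top_programs` and the sample rows are made up for illustration. `TIME_WAITED` in `v$session_event` is reported in hundredths of a second, so the sketch converts to seconds.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# (sid, time_waited, program, module), matching the columns the queries select.
Row = Tuple[int, int, str, str]


def top_programs(rows: Iterable[Row], n: int = 5) -> List[Tuple[str, float]]:
    """Sum time_waited per program and return the n largest totals in seconds."""
    totals = defaultdict(int)
    for _sid, time_waited_cs, program, _module in rows:
        totals[program] += time_waited_cs  # TIME_WAITED is in centiseconds
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(program, cs / 100) for program, cs in ranked[:n]]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        (101, 4500, "batch_load@node1", "LOAD"),
        (102, 300, "sqlplus@node2", "ADHOC"),
        (103, 1500, "batch_load@node1", "LOAD"),
    ]
    print(top_programs(sample))
```

With the sample rows, the two `batch_load@node1` sessions add up to 6000 centiseconds, so the result is `[("batch_load@node1", 60.0), ("sqlplus@node2", 3.0)]`.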

3) Notes
gc current block busy is the RAC wait event for contention on current blocks in the global cache. The time spent serving such a request has three components:
 
Time to process current block request in the cache = pin time + flush time + send time
gc current block flush time
The current block flush time is part of the service (or processing) time for a current block. The pending redo needs to be flushed to the log file by LGWR before LMS sends it. The operation is asynchronous in that LMS queues the request, posts LGWR, and continues processing. The LMS would check its log flush queue for completions and then send the block, or go to sleep and be posted by LGWR. The redo log write time and redo log sync time can influence the overall service time significantly.
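The decomposition above (pin + flush + send) is simple arithmetic, but it shows how quickly a slow redo write can dominate the service time. A sketch with invented numbers; `CurrentBlockServiceTime` is a hypothetical helper, not an Oracle view or statistic.

```python
from dataclasses import dataclass


@dataclass
class CurrentBlockServiceTime:
    """Hypothetical container for the three components of current block service time."""
    pin_ms: float    # time the buffer stays pinned before it can be shipped
    flush_ms: float  # time LMS waits for LGWR to write the pending redo
    send_ms: float   # time to send the block to the requesting instance

    def total_ms(self) -> float:
        return self.pin_ms + self.flush_ms + self.send_ms

    def flush_share(self) -> float:
        """Fraction of the service time spent waiting for the redo flush."""
        total = self.total_ms()
        return self.flush_ms / total if total > 0 else 0.0
```

With `CurrentBlockServiceTime(pin_ms=0.5, flush_ms=3.0, send_ms=0.5)`, `total_ms()` is 4.0 and `flush_share()` is 0.75: three quarters of the service time would be spent waiting for LGWR, so speeding up the interconnect alone would barely help.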
 
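Flush time exists to guarantee instance recovery: after a current block has been modified (modify/update) on the local instance, the redo for that change must be written to the log file (LGWR must finish the write before returning) before LMS may ship the block to another instance. (LMS only ships the block when another RAC instance requests it, i.e. through Cache Fusion.) This is exactly where log file sync waits come into the picture!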
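The flush rule (LMS may only ship a block once the redo for its last change is on disk) and the asynchronous queue described in the flush-time quote can be sketched as a toy queue. Assumptions: durability is tracked with a single monotonically increasing SCN, and `LmsFlushQueue`, `request_block`, and `lgwr_wrote_up_to` are hypothetical names; the real LMS/LGWR handshake is more involved.

```python
from collections import deque
from typing import Deque, List, Tuple


class LmsFlushQueue:
    """Toy model: a block may be shipped only once its redo is on disk."""

    def __init__(self) -> None:
        self.on_disk_scn = 0  # highest SCN whose redo LGWR has written
        self.pending: Deque[Tuple[str, int]] = deque()  # (block_id, redo_scn) waiting for LGWR
        self.shipped: List[str] = []  # blocks already sent to the requester

    def request_block(self, block_id: str, redo_scn: int) -> None:
        """Called when another instance asks for the block."""
        if redo_scn <= self.on_disk_scn:
            self.shipped.append(block_id)  # redo already durable: send immediately
        else:
            # Queue the request; in the real flow LMS would also post LGWR here
            # and carry on serving other requests instead of blocking.
            self.pending.append((block_id, redo_scn))

    def lgwr_wrote_up_to(self, scn: int) -> None:
        """LGWR reports completion; ship every queued block that is now covered."""
        self.on_disk_scn = max(self.on_disk_scn, scn)
        still_waiting: Deque[Tuple[str, int]] = deque()
        for block_id, redo_scn in self.pending:
            if redo_scn <= self.on_disk_scn:
                self.shipped.append(block_id)
            else:
                still_waiting.append((block_id, redo_scn))
        self.pending = still_waiting
```

If LGWR has written up to SCN 100 and blocks A (redo SCN 90), B (120) and C (150) are requested, only A ships at once. After `lgwr_wrote_up_to(130)`, B follows and C keeps waiting. Every extra millisecond LGWR needs is added directly to the requester's wait.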

4) gc buffer busy acquire/release
gc buffer busy acquire/release is often a by-product of gc current block busy. When several processes on the same instance access the same block concurrently, the first process to issue the request waits on gc current block busy, and the later processes queued on the buffer waiter list wait on gc buffer busy acquire/release (A user on the same instance has started a remote operation on the same resource and the request has not completed yet, or the block was requested by another node and the block has not been released by the local instance when the new local access was made). This creates a queuing effect: if gc current block busy is slow, the gc buffer busy acquire/release waits queued behind it are even slower:

Pin time = (time to read the block into cache) + (time to modify/process the buffer)
Busy time = (average pin time) * (number of interested users waiting ahead of me)
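The two formulas above can be turned into a small calculation that shows the queuing effect. A sketch with made-up timings; the function names are hypothetical.

```python
from typing import List


def pin_time(read_ms: float, process_ms: float) -> float:
    """Pin time = time to read the block into cache + time to modify/process it."""
    return read_ms + process_ms


def busy_time(avg_pin_ms: float, waiters_ahead: int) -> float:
    """Busy time = average pin time * number of interested users ahead of me."""
    return avg_pin_ms * waiters_ahead


def queue_busy_times(avg_pin_ms: float, sessions: int) -> List[float]:
    """Busy time for each of `sessions` users queued on the same block, front first."""
    return [busy_time(avg_pin_ms, ahead) for ahead in range(sessions)]
```

`queue_busy_times(pin_time(0.5, 1.5), 4)` gives `[0.0, 2.0, 4.0, 6.0]`. If the holder becomes twice as slow (average pin 4.0 ms), every queued session waits twice as long, and the last one waits 12 ms.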

Flush time is not limited to current blocks (see the AWR metric Avg global cache current block flush time (ms)); CR blocks have a flush time as well (Avg global cache cr block flush time (ms)).
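AWR reports these averages per snapshot interval, i.e. from the difference between two snapshots of cumulative counters. A sketch of that calculation, assuming the flush-time counter is cumulative and in centiseconds and that a matching cumulative count of flushed blocks exists. The statistic names in the example are assumptions, so check `v$sysstat` on your version before relying on them; `avg_flush_ms` is a hypothetical helper.

```python
from typing import Mapping


def avg_flush_ms(begin: Mapping[str, int], end: Mapping[str, int],
                 time_stat: str, count_stat: str) -> float:
    """Average flush time per flushed block over a snapshot interval, in ms.

    Assumes `time_stat` is a cumulative counter in centiseconds and
    `count_stat` is the matching cumulative count of flushed blocks.
    """
    delta_cs = end[time_stat] - begin[time_stat]
    delta_count = end[count_stat] - begin[count_stat]
    if delta_count <= 0:
        return 0.0
    return delta_cs * 10 / delta_count  # 1 centisecond = 10 ms


if __name__ == "__main__":
    # Invented snapshot values; statistic names are assumptions.
    begin = {"gc cr block flush time": 1000, "gc cr blocks flushed": 2000}
    end = {"gc cr block flush time": 1600, "gc cr blocks flushed": 5000}
    print(avg_flush_ms(begin, end, "gc cr block flush time", "gc cr blocks flushed"))
```

With the invented values, 600 centiseconds over 3000 flushed blocks is 2.0 ms per block.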

Setting _cr_server_log_flush to false stops the CR server from flushing redo (LMS are/is waiting for LGWR to flush the pending redo during CR fabrication. Without going too much into details, you can turn off the behaviour by setting _cr_server_log_flush to false.). However, the parameter has no effect on current block flush time, and using it is strongly discouraged.

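II: Solutions
For the gc cr block busy wait event:
1) Modify the application to avoid fetching data across instances as much as possible. This also reduces the flush time component of gc current block busy!
2) Set _cr_server_log_flush to false to stop the CR server from flushing redo. This parameter has no effect on current block flush time.
3) Increase the I/O throughput of the disks holding the redo log files. This treats the symptom rather than the cause. Compare the log file parallel write and log file sync wait times: if the two are close, storage I/O pressure is the main cause of log file sync, because log file parallel write covers only the I/O part.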
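The comparison suggested in item 3 can be written as a small helper. A sketch, assuming averages are computed from `TIME_WAITED_MICRO` and `TOTAL_WAITS` as found in `v$system_event`; the 0.8 ratio that counts as "close" is an arbitrary assumption, not an Oracle rule, and the function names are hypothetical.

```python
def avg_wait_ms(time_waited_micro: int, total_waits: int) -> float:
    """Average wait in ms from TIME_WAITED_MICRO and TOTAL_WAITS-style counters."""
    return time_waited_micro / total_waits / 1000 if total_waits else 0.0


def log_file_sync_is_io_bound(sync_avg_ms: float, lfpw_avg_ms: float,
                              threshold: float = 0.8) -> bool:
    """Heuristic: log file parallel write covers only the I/O part of log file sync,
    so if it accounts for most of log file sync, storage I/O is the main cause.
    The 0.8 threshold is an arbitrary assumption."""
    if sync_avg_ms <= 0:
        return False
    return lfpw_avg_ms / sync_avg_ms >= threshold
```

For instance, an average log file sync of 10 ms with log file parallel write at 9 ms points at storage, whereas 10 ms against 2 ms suggests the time is going elsewhere (CPU scheduling of LGWR, commit frequency), where faster disks would help little.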
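III: Why does setting _cr_server_log_flush to false have no effect on the gc current block busy wait event?
First, gc current block busy concerns updates. Suppose one instance has updated a block without committing, and another instance now needs to update the same block: it needs a current read of the block, and this is where a gc current block busy wait can occur. Flush time exists to guarantee instance recovery: after a current block has been modified (modify/update) on the local instance, the redo for that change must be written to the log file (LGWR must finish the write before returning) before LMS may ship the block to another instance. (LMS only ships the block when another RAC instance requests it, i.e. through Cache Fusion.) A gc current block busy wait therefore means that the other instance is going to modify the block further on top of the first instance's changes. Because every instance in Oracle RAC has its own undo tablespace (and its own redo thread), the instance that first modified the block must write the related redo to its log file so that instance recovery is guaranteed to succeed.

gc cr block busy is different: it concerns cross-instance queries and does not involve cross-instance recovery, which is why the CR server's redo flush can be disabled by setting _cr_server_log_flush to false.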
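The argument above boils down to a rule about when LMS must wait for LGWR before shipping a block. A toy sketch of that rule; `lms_must_wait_for_lgwr` and `total_flush_wait_ms` are hypothetical names, and the model ignores everything except the block mode and the `_cr_server_log_flush` setting.

```python
from typing import Iterable, Tuple


def lms_must_wait_for_lgwr(mode: str, cr_server_log_flush: bool = True) -> bool:
    """Current blocks always need their redo on disk before shipping;
    CR blocks only while _cr_server_log_flush is TRUE (the default)."""
    if mode == "current":
        return True
    if mode == "cr":
        return cr_server_log_flush
    raise ValueError(f"unknown mode: {mode}")


def total_flush_wait_ms(requests: Iterable[Tuple[str, float]],
                        cr_server_log_flush: bool = True) -> float:
    """Sum the flush component over (mode, flush_ms) requests under a given setting."""
    return sum(flush_ms for mode, flush_ms in requests
               if lms_must_wait_for_lgwr(mode, cr_server_log_flush))
```

For the invented requests `[("cr", 2.0), ("current", 3.0), ("cr", 1.0)]`, the total flush wait is 6.0 ms with the default setting and 3.0 ms with the parameter set to false. Only the CR share disappears, which is why the parameter does nothing for gc current block busy.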




