通常情况下是因为等待其他节点传送块,如果出现太多次,则意味着集群之间有contentionhttp://www.ardentperf.com/2007/09/12/gc-buffer-busy-waits-in-rac-finding-hot-blocks/ 过去几个小时系统比较慢,通过查询dba_hist_active_sess_history了解一下经历的等待事件首先要确定这一时间段对应的snapshotselect min(begin_interval_time) min, max(end_interval_time) maxfrom dba_hist_snapshotwhere snap_id between 12831 and 12838; MIN MAX------------------------------ ------------------------------
12-SEP-07 09.00.17.451 AM 12-SEP-07 05.00.03.683
查看过去一段时间的等待事件类型汇总
select wait_class_id, wait_class, count(*) cnt
from dba_hist_active_sess_history
where snap_id between 12831 and 12838
group by wait_class_id, wait_class
order by 3;
WAIT_CLASS_ID WAIT_CLASS CNT------------- ------------------------------ ---------- 3290255840 Configuration 169 2000153315 Network 934 4108307767 System I/O 7199 3386400367 Commit 7809 4217450380 Application 12248 3875070507 Concurrency 14754 1893977003 Other 35499 3871361733 Cluster 104810
1740759767 User I/O 121999
可以更细化为具体的等待事件
select event_id, event, count(*) cnt from dba_hist_active_sess_historywhere snap_id between 12831 and 12838 and wait_class_id=3871361733group by event_id, eventorder by 3;1457266432 gc current split 2292685450749 gc current grant 2-way 290 957917679 gc current block lost 579 737661873 gc cr block 2-way 6992277737081 gc current grant busy 9913570184881 gc current block 3-way 11903151901526 gc cr block lost 1951 111015833 gc current block 2-way 20783046984244 gc cr block 3-way 2107 661121159 gc cr multi block request 40923201690383 gc cr grant 2-way 41291520064534 gc cr block busy 45762701629120 gc current block busy 14379
1478861578 gc buffer busy 67275
--在这段时间内累计有67275个session等待gc buffer busy,然后查看出是哪些sql引起的
select sql_id, count(*) cnt from dba_hist_active_sess_historywhere snap_id between 12831 and 12838and event_id in (1478861578)group by sql_idhaving count(*)>1000order by 2; SQL_ID CNT------------- ----------22ggtj4z9ak3a 1574gsqhbt5a6d4uv 1744cyt90uk11a22c 224039dtqqpr7ygcw 4251
8v3b2m405atgy 42292
运行次数最多的sql达到4万多次,通过dba_hist_sqltext查看出对应的sql
insert into bigtable(id, version, client, cl_business_id, cl_order_id, desc。。。。。
该表是个分区表,且其上有很多trigger和大量的索引,需要明确该sql等待的对象
select current_obj#, count(*) cnt from dba_hist_active_sess_historywhere snap_id between 12831 and 12838and event_id=1478861578 and sql_id='8v3b2m405atgy'group by current_obj#order by 2;CURRENT_OBJ# CNT------------ ---------- 0 511 3122795 617 3064433 880 3208619 3913 3208620 5411 3208618 22215
后三个是次数比较多的
select object_id, owner, object_name, subobject_name, object_type from dba_objectswhere object_id in (3208618, 3208619, 3208620); OBJECT_ID OWNER OBJECT_NAME SUBOBJECT_NAME OBJECT_TYPE---------- ---------- ------------------------------ ------------------------------ ------------------- 3208618 JSCHDER BIGTABLE_LOG P_2007_09 TABLE PARTITION 3208619 JSCHDER BIGTABL_LG_X_ID P_2007_09 INDEX PARTITION
3208620 JSCHDER BIGTABL_LG_X_CHANGE_DATE P_2007_09 INDEX PARTITION
等待次数最多的却是bigtable_log,这是因为表上有一个trigger,每对bigtable更新一次,就要往该表insert7次;
我们可以进一步确认那些块竞争最激烈
select current_file#, current_block#, count(*) cntfrom dba_hist_active_sess_historywhere snap_id between 12831 and 12838and event_id=1478861578 and sql_id='8v3b2m405atgy'and current_obj# in (3208618, 3208619, 3208620)group by current_file#, current_block#having count(*)>50order by 3; CURRENT_FILE# CURRENT_BLOCK# CNT------------- -------------- ---------- 1330 238073 51 1542 22645 55 1487 237914 56 1330 238724 61 1330 244129 76
1487 233206 120
似乎并没有过热的块,因为有超过4万个session访问这些object,但等待次数最多的块只有120次;
检查一下这些块是否为段头块,
select segment_name, header_file, header_blockfrom dba_segments where wner='JHEIDER' and partition_name='P_2007_09'and segment_name in ('PLACEMENTS_LOG','PLCMNTS_LG_X_ID', 'PLCMNTS_LG_X_CHANGE_DATE'); SEGMENT_NAME HEADER_FILE HEADER_BLOCK------------------------------ ----------- ------------BIGTABL_LG_X_CHANGE_DATE 1207 204809BIGTABL_LG_X_ID 1207 196617
BIGTABLE_LOG 1209 16393
可惜这些只是普通的数据块。
任何时候,只要表数据块在insert时候有严重的并发问题,首先要想到的就是space management;
检查该表所属表空间的管理方式
select distinct tablespace_name from dba_tab_partitionswhere table_name='BIGTABLE_LOG'; TABLESPACE_NAME------------------------------BIGTABLE_LOG_DATA select extent_management, allocation_type, segment_space_managementfrom dba_tablespaces where tablespace_name='BIGTABLE_LOG_DATA'; EXTENT_MAN ALLOCATIO SEGMEN---------- --------- ------LOCAL USER MANUAL
SQL> select distinct freelists, freelist_groups from dba_tab_partitions 2 where table_name='BIGTABLE_LOG'; FREELISTS FREELIST_GROUPS---------- ---------------
1 1
其表空间使用手工段管理且只有一个freelist;
小结:这是一个6节点RAC,最忙的表却只有一个freelist,难怪会引起严重的gc buffer busy;
但是因为每个块都很快的被填满,所以没有等待次数过多的块;
不需要考虑ITL的问题,因为插入的都是空块,ITL如果不够会自动分配直到块满为止;
该系统是从non-rac升级到RAC的,因此所有的object都是MSSM,解决完这个object其他object的问题则会后续显现;