HP Patch PHKL_40208

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.3 to 11.1.0.7
HP-UX Itanium
Symptoms
syslog.16:Aug 16 16:55:46 s51uf33 vmunix: kthread: table is full
syslog.16:Aug 16 16:56:09 s51uf33 vmunix: kthread: table is full
syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kkthread: table is full
syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kthread: tablket hirse afdu:l lt
syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kthread: table is full
syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kthread: table is full
syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kkthread: table is full
syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kthread: tablekthread: table is full
syslog.16:Aug 16 16:58:47 s51uf33 vmunix: kthread:k ttharbelaed :i st afbulle is full
syslog.16:Aug 16 16:58:48 s51uf33 vmunix: kthread: table is fkthread: table is full
....
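
As a rough aid for gauging how fast these messages arrive, the sketch below counts "kthread: table is full" lines per timestamp. It assumes the syslog layout shown in the excerpt above (an optional `file:` prefix from grep, then date, host, and `vmunix:`). The function name `count_kthread_messages` is hypothetical. Because concurrent console writes garble the tail of some lines, only the start of the message is matched.

```python
import re
from collections import Counter

# Matches e.g. "syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kkthread: ..."
# The leading "file:" prefix is optional, and "k+thread" tolerates the
# doubled "k" that appears when console writes are interleaved.
PATTERN = re.compile(
    r"^(?:\S+?:)?(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+vmunix:\s+k+thread:"
)

def count_kthread_messages(lines):
    """Return a Counter mapping 'Mon DD HH:MM:SS' to the number of kthread messages."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# A few lines copied from the excerpt above, including garbled ones.
sample = [
    "syslog.16:Aug 16 16:55:46 s51uf33 vmunix: kthread: table is full",
    "syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kkthread: table is full",
    "syslog.16:Aug 16 16:58:46 s51uf33 vmunix: kthread: tablket hirse afdu:l lt",
    "syslog.16:Aug 16 16:58:47 s51uf33 vmunix: kthread:k ttharbelaed :i st afbulle is full",
]

for stamp, n in sorted(count_kthread_messages(sample).items()):
    print(stamp, n)
```

A burst of many messages within the same second, as at 16:58:46 in the excerpt, suggests the thread table stayed exhausted rather than being hit once.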

There are a large number of racgimon threads.

[s51uf33:/var/adm/crash/crash.16]# Threads -B ; pview | grep racgimon | wc -l


# Loading all inuse kthreads ...
Loaded 4740 kthread_t entries in 'DefaultView'
3360

--> 3360 racgimon threads running

The following commands hang on all nodes:

srvctl status service -d orcl

and

crs_stat -p

Also pstack for crs_stat shows:

$ pstack 27576
27576: /opt/crs/oracle/product/10.2.0/crs/bin/crs_stat.bin

-------------------------------- lwpid : 2554775 ------------------------------

0: c000000000439190 : _poll_sys() + 0x30 (/usr/lib/hpux64/libc.so.1)
1: c00000000044d7a0 : poll() + 0xe0 (/usr/lib/hpux64/libc.so.1)
2: c00000000972bad0 : sntrecvhdl() + 0x1b0 (/opt/crs/oracle/product/10.2.0/crs/lib/libttsh10.so)
3: c000000009726c80 : ntevque() + 0x260 (/opt/crs/oracle/product/10.2.0/crs/lib/libttsh10.so)
4: c000000009687bf0 : nsevwait() + 0x9e0 (/opt/crs/oracle/product/10.2.0/crs/lib/libttsh10.so)
5: c000000007347d80 : clsc_cvtimewait() + 0x1730 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
6: c0000000073406e0 : clsc_select_ext() + 0x2e0 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
7: c00000000733a4d0 : clscsendstatus() + 0x1ac0 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
8: c0000000073300b0 : clscreceive() + 0x1030 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
9: c000000007343a60 : clsc_event_hndlr() + 0x1f40 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
10: c000000007353a90 : clsaauthmsg() + 0x270 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
11: c000000007354d50 : clsavalidate() + 0x490 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
12: c000000007329510 : clscanswer() + 0x3e00 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
13: c00000000732be10 : clscconnect() + 0x18f0 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
14: c000000007188df0 : proac_init() + 0x710 (/opt/crs/oracle/product/10.2.0/crs/lib/libocr10.so)
15: c000000007195ef0 : proa_init() + 0x2dd0 (/opt/crs/oracle/product/10.2.0/crs/lib/libocr10.so)
16: c0000000071adb80 : procr_init_ext2() + 0x260 (/opt/crs/oracle/product/10.2.0/crs/lib/libocr10.so)
17: c0000000071ad880 : procr_init_ext() + 0x70 (/opt/crs/oracle/product/10.2.0/crs/lib/libocr10.so)
18: c000000007355760 : clse_init() + 0x120 (/opt/crs/oracle/product/10.2.0/crs/lib/libhasgen10.so)
Memory fault(coredump)

Changes
Specific to HP-UX Itanium operating system.
Cause
Investigation at the operating system level by OS support confirmed that the problem is caused by OS bug ( QX:QXCR1000940361 ).

( QX:QXCR1000940361 )
When the timeout expiry and the wakeup for a time-based
sleep on a synchronization object happen simultaneously
in the kernel, the sleep can return ETIMEDOUT even though
the wakeup also reports that the thread was woken.
Resolution:
Corrected the return value of sleep for the case where the
timeout expiry and the wakeup for a time-based sleep on a
synchronization object happen simultaneously in the kernel.
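
To illustrate the class of race described above, here is a generic user-level sketch. It is not the HP-UX kernel code or racgimon's logic, neither of which is shown in the source. It shows a timed wait that decides on the guarded predicate rather than on the timeout indication alone, so a wakeup that coincides with the timeout is not misreported as a failure. The helper name `timed_wait` is hypothetical.

```python
import threading
import time

def timed_wait(cond, predicate, timeout):
    """Wait until predicate() is true or `timeout` seconds pass.

    Returns the final value of predicate(), checked while holding the lock,
    so a notification racing with the timeout is still observed.
    """
    deadline = time.monotonic() + timeout
    with cond:
        while not predicate():
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            # The return value of wait() is deliberately ignored: the
            # predicate, not the timeout flag, decides the outcome.
            cond.wait(remaining)
        return predicate()

state = {"ready": False}
cond = threading.Condition()

def setter():
    # Publish the state change and wake any waiter.
    with cond:
        state["ready"] = True
        cond.notify_all()

worker = threading.Thread(target=setter)
worker.start()
ok = timed_wait(cond, lambda: state["ready"], 5.0)
worker.join()
print("woken with state ready:", ok)
```

Python's standard library offers the same pattern directly as `threading.Condition.wait_for`.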
Solution
Apply the operating system patch PHKL_40208
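
Before and after patching, it can help to confirm whether the patch appears in the host's software inventory. The sketch below scans text captured from a listing command such as HP-UX `swlist -l patch`. The exact output layout is an assumption here, so the check only looks for the patch ID as a whitespace-separated token or as the prefix of a `PATCH.fileset` token. The function name `patch_listed` and the sample text are hypothetical.

```python
def patch_listed(listing_text, patch_id):
    """Return True if patch_id appears as a token (or fileset prefix) in the listing."""
    for line in listing_text.splitlines():
        for token in line.split():
            if token == patch_id or token.startswith(patch_id + "."):
                return True
    return False

# Hypothetical listing text, for illustration only; real output will differ.
example_listing = """\
# PHKL_00001    1.0    example patch description
  PHKL_00001.EXAMPLE-FILESET    1.0    example fileset
"""

print(patch_listed(example_listing, "PHKL_40208"))
```

A listed patch is not necessarily fully configured, so treat this as a first check rather than proof that the fix is active.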
References
@ ( QX:QXCR1000940361 ) patch PHKL_40208 many racgimon

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.4 - Release: 10.2
Information in this document applies to any platform.
Symptoms
On Oracle Clusterware 10.2.0.4, the following error occurs.

2009-11-11 10:48:58.085: [ CRSEVT][53546] CAAMonitorHandler :: 0:Could not join /oracle/product/10.2/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child


Cause
The problem is related to HP bug QXCR1000940361.
Solution
Please apply HP patch PHKL_40208.
