Installing 11.2.0.4 RAC on SUSE 12.4: root.sh fails with "ORA-12547: TNS:lost contact" (__lll_unlock_elision)

Operating system: SLES_SAP-release-12.4-1.131.x86_64

Running root.sh on node 1 reports an error: /u01/app/11.2.0/grid/root.sh

Performing root user operation for Oracle 11g 
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'v3erpzgd01'
CRS-2676: Start of 'ora.mdnsd' on 'v3erpzgd01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'v3erpzgd01'
CRS-2676: Start of 'ora.gpnpd' on 'v3erpzgd01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'v3erpzgd01'
CRS-2672: Attempting to start 'ora.gipcd' on 'v3erpzgd01'
CRS-2676: Start of 'ora.cssdmonitor' on 'v3erpzgd01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'v3erpzgd01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'v3erpzgd01'
CRS-2672: Attempting to start 'ora.diskmon' on 'v3erpzgd01'
CRS-2676: Start of 'ora.diskmon' on 'v3erpzgd01' succeeded
CRS-2676: Start of 'ora.cssd' on 'v3erpzgd01' succeeded
ASM failed to start. Check /u01/app/grid/cfgtoollogs/asmca/asmca-200507PM121627.log for details.
Configuration of ASM ... failed
see asmca logs at /u01/app/grid/cfgtoollogs/asmca for details
Did not succssfully configure and start ASM at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6912.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

The file /u01/app/grid/cfgtoollogs/asmca/asmca-200507PM121627.log shows the following errors:

[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logException:174]  ORA-12547: TNS:lost contact
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-12547: TNS:lost contact
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
oracle.sysman.assistants.util.sqlEngine.SQLEngine.connect(SQLEngine.java:981)
oracle.sysman.assistants.usmca.backend.USMInstance.connectToASM(USMInstance.java:626)
oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3016)
oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logInfo:143]  ASM failed to start. Check /u01/app/grid/cfgtoollogs/asmca/asmca-200507AM095050.log for details.
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logInfo:143]  Instance running false
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logInfo:143]  ASM failed to start. Check /u01/app/grid/cfgtoollogs/asmca/asmca-200507AM095050.log for details.

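This does not reveal much beyond the fact that the ASM instance cannot be created. Checking the ASM instance's alert log then shows the following errors: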

System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/oracleasm/asm-*"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  10.206.110.3
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Process PMON died, see its trace file
USER (ospid: 14306): terminating the instance due to error 443
Instance terminated by USER, pid = 14306
Thu May 07 12:16:34 2020
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x7F8962019640, __lll_unlock_elision()+48] [flags: 0x0, count: 1]
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

The key error is __lll_unlock_elision. Searching for this keyword turns up mostly bug reports.

Detailed log of the root.sh run: $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<hostname>.log

Solution: https://www.novell.com/support/kb/doc.php?id=7022289
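Since the crash signature is what identifies this problem, it can help to search the whole diagnostic directory for it instead of opening logs one by one. The following is a minimal sketch; the function name find_elision_crashes is hypothetical, and the example path is an assumption based on the diagnostic_dest value in the alert log above.

```shell
#!/bin/sh
# Sketch: list files under a diagnostic directory that mention the
# __lll_unlock_elision crash signature. The function name is hypothetical.
find_elision_crashes() {
    diag_dir=$1
    # -r recurse, -l print only file names, -F match a fixed string
    grep -rlF -- '__lll_unlock_elision' "$diag_dir" 2>/dev/null
}

# Example (assumed path, taken from diagnostic_dest above):
# find_elision_crashes /u01/app/grid/diag
```

The function returns a non-zero status when no file contains the signature, so it can be used directly in an if statement.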

export LD_LIBRARY_PATH=/lib64/noelision/:$LD_LIBRARY_PATH
echo "/lib64/noelision" > /etc/ld.so.conf.d/noelision.conf
ldconfig
vi /etc/ld.so.conf
Add /lib64/noelision as the first entry, then run ldconfig again.
ln -s /lib64/noelision/libpthread-2.22.so /u01/app/11.2.0/grid/lib/libpthread.so.0
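The libpthread version in the ln command differs between systems (2.22 here, 2.19 in the Oracle note quoted further down). As a hedged sketch, the link can be created from whatever versioned file is actually present. The helper name link_noelision_pthread is hypothetical, and unlike the plain ln -s above it uses -f, which replaces an existing link.

```shell
#!/bin/sh
# Hypothetical helper: link the noelision libpthread into a Grid/Oracle lib dir.
link_noelision_pthread() {
    noelision_dir=$1   # e.g. /lib64/noelision
    target_lib_dir=$2  # e.g. /u01/app/11.2.0/grid/lib
    # Use the versioned library that is actually present instead of hard-coding 2.22.
    src=$(ls "$noelision_dir"/libpthread-*.so 2>/dev/null | head -n 1)
    if [ -z "$src" ]; then
        echo "no libpthread-*.so found in $noelision_dir" >&2
        return 1
    fi
    # -s symbolic link, -f replace an existing link, -n do not follow an existing link
    ln -sfn "$src" "$target_lib_dir/libpthread.so.0"
}

# Example: link_noelision_pthread /lib64/noelision /u01/app/11.2.0/grid/lib
```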

How to disable Hardware Lock Elision

This document (7022289) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12 (SLES 12)
SUSE Linux Enterprise Server 12 Service Pack 1 (SLES 12 SP1)
SUSE Linux Enterprise Server 12 Service Pack 2 (SLES 12 SP2)
SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)
SUSE Linux Enterprise Server 12 Service Pack 4 (SLES 12 SP4)
SUSE Linux Enterprise Server 12 Service Pack 5 (SLES 12 SP5)
SUSE Linux Enterprise Server 15 (SLES 15)
SUSE Linux Enterprise Server 15 Service Pack 1 (SLES 15 SP1)

Situation

Some third-party applications installed on SUSE Linux Enterprise Server may require that the 'Hardware Lock Elision' functionality is disabled.

Resolution

Temporary solution: Add the appropriate noelision path to the beginning of the library load path (LD_LIBRARY_PATH) so that the noelision libraries are used in preference to any others appearing later in the path, e.g.

export LD_LIBRARY_PATH=/lib64/noelision/:$LD_LIBRARY_PATH

On server reboot, this change will be lost.

Permanent solution: Create the file /etc/ld.so.conf.d/noelision.conf and add the appropriate line, e.g.

/lib64/noelision

After saving the noelision.conf changes, run `ldconfig` to rebuild the caches.
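The permanent solution can be scripted so that running it twice does not duplicate the entry. This is only a sketch: the function name write_noelision_conf is hypothetical, and the directories are passed as arguments so the logic can be tried on a scratch directory before touching /etc.

```shell
#!/bin/sh
# Hypothetical helper: make sure <conf_dir>/noelision.conf contains the
# noelision library directory exactly once.
write_noelision_conf() {
    conf_dir=$1       # e.g. /etc/ld.so.conf.d
    noelision_dir=$2  # e.g. /lib64/noelision
    conf_file=$conf_dir/noelision.conf
    # -x match the whole line, -F fixed string, -q quiet
    if ! grep -qxF -- "$noelision_dir" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$noelision_dir" >> "$conf_file"
    fi
}

# Real use (as root), followed by a cache rebuild:
# write_noelision_conf /etc/ld.so.conf.d /lib64/noelision && ldconfig
```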

Cause

Some Intel CPUs have a bug that causes Hardware Lock Elision to be handled incorrectly.
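Whether a server advertises the relevant CPU features at all can be read from /proc/cpuinfo, where Intel TSX shows up as the flags hle and rtm. The sketch below only reports that the flags are present; it cannot tell whether a particular CPU is one of the faulty ones. The function name cpu_has_tsx_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if the CPU flags line lists hle or rtm.
cpu_has_tsx_flags() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Take the first "flags" line, put one token per line,
    # then look for an exact hle or rtm token.
    grep -m 1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qxE 'hle|rtm'
}

# Example:
# if cpu_has_tsx_flags; then echo "CPU advertises HLE/RTM"; fi
```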

Additional Information

Oracle grid software is susceptible to this CPU bug. If elision locking is not disabled on servers with such faulty CPUs, Oracle may crash during initialization.

The 'noelision' 32-bit and 64-bit libraries are found here:

   /lib/noelision
   /lib64/noelision

NOTE: Make sure that /etc/ld.so.conf includes the /etc/ld.so.conf.d/ directory (default) or add the directory you want to be included (the directory where you placed the noelision.conf file)   * it is recommended to keep everything under /etc/ld.so.conf.d/

NOTE: The files present in the directories listed in /etc/ld.so.conf are applied in alpha-numerical file name order. You need to be aware of what is in each file, in each of the included directories, to make sure that the desired settings are being applied and not 'overwritten' or ignored.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

SLES 12: CLSRSC-366: Failed to import credentials for ASM During root.sh Execution When Adding New Node (Doc ID 2420338.1)

APPLIES TO:

Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.

SYMPTOMS

On : 12.2.0.1 version, Clusterware

The following error is reported during root.sh execution while adding a new node to a 12.2 cluster environment:

#/u01/app/12.2.0.1/grid # ./root.sh

Performing root user operation.

The following environment variables are set as:

ORACLE_OWNER= grid

ORACLE_HOME= /u01/app/12.2.0.1/grid

.

CRS-4133: Oracle High Availability Services has been stopped.

CRS-4123: Oracle High Availability Services has been started.

2018/06/25 17:34:46 CLSRSC-366: Failed to import credentials for ASM

Died at /u01/app/12.2.0.1/grid/crs/install/crsutils.pm line 8461.
The command '/u01/app/12.2.0.1/grid/perl/bin/perl -I/u01/app/12.2.0.1/grid/perl/lib -I/u01/app/12.2.0.1/grid/crs/install /u01/app/12.2.0.1/grid/crs/install/rootcrs.pl ' execution failed

The root.sh execution log indicates an error when importing ASM credentials, and a core dump is created.

/crsdata/node3/crsconfig/rootcrs_node3_2018-06-25_05-33-24PM.log:

2018-06-25 17:34:45: It is an add node scenario
2018-06-25 17:34:45: Importing asm credentials
2018-06-25 17:34:45: Executing cmd: /u01/app/12.2.0.1/grid/bin/crsctl add credmaint -path ASM -local
2018-06-25 17:34:45: Command output:
> CRS-10405: (:CLSCRED0006:)Credential domain already exists.
> CRS-4000: Command Add failed, or completed with errors.
>End Command output
Jun 25 17:34:46. 20181657:2018-06-25 17:34:45: Executing cmd: /u01/app/12.2.0.1/grid/bin/crsctl setperm credmaint -o grid -path ASM -local
2018-06-25 17:34:46: Running as user grid: /u01/app/12.2.0.1/grid/bin/kfod op=credimport wrap=/u01/app/12.2.0.1/grid/gpnp/seed/asm/credentials.xml olr=TRUE force=TRUE
2018-06-25 17:34:46: s_run_as_user2: Running /bin/su grid -c ' echo CLSRSC_START; /u01/app/12.2.0.1/grid/bin/ kfod op=credimport wrap=/u01/app/12.2.0.1/grid/gpnp/seed/asm/credentials.xml olr=TRUE force=TRUE
'
2018-06-25 17:34:46: Removing file /tmp/xsFtxcsJX0
2018-06-25 17:34:46: Successfully removed file: /tmp/xsFtxcsJX0
2018-06-25 17:34:46: pipe exit code: 35584
2018-06-25 17:34:46: /bin/su exited with rc=139

2018-06-25 17:34:46: kfod op=credimport rc: 139
2018-06-25 17:34:46: Failed to enable flex ASM on local node, error: bash: line 1: 29674 Segmentation fault (core dumped) /u01/app/12.2.0.1/grid/bin/kfod op=credimport wrap=/u01/app/12.2.0.1/grid/gpnp/seed/asm/credentials.xml olr=TRUE force=TRUE

2018-06-25 17:34:46: Executing cmd: /u01/app/12.2.0.1/grid/bin/clsecho -p has -f clsrsc -m 366
2018-06-25 17:34:46: Command output:
> CLSRSC-366: Failed to import credentials for ASM
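The numbers in this log are consistent with a segmentation fault: a shell reports a child killed by signal N as exit code 128 + N, so rc=139 means signal 11 (SIGSEGV), and the "pipe exit code: 35584" is the same value shifted into the high byte (35584 / 256 = 139). A small sketch of that arithmetic, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: explain a shell exit code such as the rc=139 above.
describe_exit_code() {
    rc=$1
    if [ "$rc" -gt 128 ]; then
        # Exit codes above 128 mean "terminated by signal (rc - 128)".
        sig=$((rc - 128))
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $rc"
    fi
}

# Example: describe_exit_code 139
# Example: describe_exit_code $((35584 / 256))
```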

Generate stack trace from the generated core file using Doc ID 1812.1

Script started on Mon 25 Jun 2018 05:42:46 PM CEST
[grid@rac3/+ASM3 ~]> gdb /u01/app/12.2.0.1/grid/bin/kfod.bin /var/local/dumps/core/core.kfod.bin.27988
GNU gdb (GDB; SUSE Linux Enterprise 12) 8.0
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:

Find the GDB manual and other documentation resources online at:

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /u01/app/12.2.0.1/grid/bin/kfod.bin...done.
[New LWP 27988]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by 'kfod.bin 12.2.0.1/grid/bin/kfod.bin op=credimport
wrap=/u01/app/12.2.0.1/grid/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fdbc171a9db in raise () from /lib64/libpthread.so.0
Missing separate debuginfos, use: zypper install
glibc-debuginfo-2.22-61.3.x86_64 libaio1-debuginfo-0.3.109-17.15.x86_64
libgcc_s1-debuginfo-6.2.1+r239768-2.4.x86_64
libnuma1-debuginfo-2.0.9-9.1.x86_64
(gdb) bt
#0 0x00007fdbc171a9db in raise () from /lib64/libpthread.so.0
#1 0x00007fdbc6cc0f12 in skgesigOSCrash () from
/u01/app/12.2.0.1/grid/lib/libclntsh.so.12.1
#2 0x00007fdbc72e2295 in kpeDbgSignalHandler () from
/u01/app/12.2.0.1/grid/lib/libclntsh.so.12.1
#3 0x00007fdbc6cc1250 in skgesig_sigactionHandler () from
/u01/app/12.2.0.1/grid/lib/libclntsh.so.12.1
#4 <signal handler called>
#5 0x00007fdbc171c4a0 in __lll_unlock_elision () from /lib64/libpthread.so.0 <<<<<<<<<<<<<<<<<<<<<<<<
#6 0x00007fdbc2a9b794 in scls_iddb_is_a_privgrp_member_by_id (ctx=0x105b988, <<<<<<<<<<<<<<<<<<<<<<<<
ose=0x7ffe45412214, flags=256, u_id=0x105ba08, g_id=0x105be48,
result=0x7ffe45412204) at scls.c:3669
#7 0x00007fdbc25c5626 in proa_is_valid_user_group (ocrctx=0x1055e20,
sec_attr=0x10620a8, flag=0) at proa.c:15016
#8 0x00007fdbc25b1a31 in proa_write (ocrctx=0x1055e20, keyhandle=0x10618e8,
dtype=procr_dtype_NOVALUE, sec_attr=0x10620a8, invalue=0x1062134 "",
inout_size=0, write_option=10, flags=33554432)
at proa.c:5049
#9 0x00007fdbc25ba742 in proa_batch_execute (ocrctx=0x1055e20,
batch=0x105bc48, flags=33554432) at proa.c:9520
#10 0x00007fdbc25d9061 in procr_batch_execute (ocrctx=0x1055e20,
batch=0x105bc48, flags=0) at procr.c:8859
#11 0x00007fdbc33b3901 in clsCredOcrBatchExec (pOcrCtx=0x1055e20,
pBatch=0x105bc48, ocrFlags=0, pDruid=0x7fdbc3998a74 "(:CLSCRED0138:)",
msgId=10401, pErr=0xf0c4f0) at clsCredUtils.c:5463
#12 0x00007fdbc33b383b in clsCredStoreBatchExec (pDom=0x1025240,
pBatch=0x7ffe45417b60, pDruid=0x7fdbc3998a74 "(:CLSCRED0138:)",
pErr=0xf0c4f0) at clsCredUtils.c:5443

CHANGES

CAUSE

This issue was investigated in Bug 28268288 - ADDNODE FAIL WITH KFOD OP=CREDIMPORT CORE DUMP and was determined to be a duplicate of Bug 26399297 - CLUSTER STARTUP FAILS IF HLE IS ENABLED.

The issue is caused by Hardware Lock Elision being enabled.

SOLUTION

Bug 26399297 is fixed in a future release. Apply interim patch 26399297, if available for your platform and Oracle version, and re-run root.sh. If no patch exists for your version, please contact Oracle Support for a backport request.

The workaround is to disable Hardware Lock Elision and re-run root.sh.

Reference:

https://www.novell.com/support/kb/doc.php?id=7022289

Oracle Grid Infrastructure Install Fails on SuSE 12 when Running root.sh (Doc ID 2253054.1)

APPLIES TO:

Oracle Database - Enterprise Edition - Version 12.1.0.1 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Linux x86-64

SYMPTOMS

This problem can be encountered when installing Oracle Restart (SIHA) or Grid Infrastructure for a Cluster. The root.sh script fails with:

Using configuration parameter file: ./crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'dba'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node ldwebi101 successfully pinned.
2017/03/24 15:52:51 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

PRCR-1006 : Failed to add resource ora.ons for ons
PRCR-1115 : Failed to find entities of type resource type that match filters
(TYPE_NAME ends .type) and contain attributes
CRS-0184 : Cannot communicate with the CRS daemon.
2017/03/24 15:53:03 CLSRSC-180: An error occurred while executing the command
'srvctl add ons' (error code 0)

Review of the OHASD Trace ($DIAG_DEST/crs//crs/trace/ohasd.trc) will show OHASD crashing with Signal 11:

2017-03-29 08:09:27.014827 :OHASDMAIN:1063044736: OHASD Daemon Starting. Command string :restart
2017-03-29 08:09:27.014839 :OHASDMAIN:1063044736: OHASD params []
2017-03-29 08:09:27.015156 :OHASDMAIN:1063044736: Initializing OLR
. . .
2017-03-29 08:11:09.927125 : default:121607936: clsvactversion:4: Retrieving
Active Version from local storage.
2017-03-29 08:11:09.947361 :UiServer:121607936: {0:0:54} Container [ Name:
UI_REGISTER_TYPE
API_HDR_VER:
TextMessage[3]
ATTR_LIST:
TextMessage[BASE_TYPE=ora.local_asm.typeBASE_TYPE=_STRINGBASE_TYPE=_READONL
Y TYPE_NAME=ora.asm.typeTYPE_NAME=_STRINGTYPE_NAME=_READONLY ]
CLIENT:
TextMessage[]
CLIENT_NAME:
TextMessage[/opt/oracle/product/grid_12102/bin/crsctl.bin]
CLIENT_PID:
TextMessage[31662]
CLIENT_PRIMARY_GROUP:
TextMessage[dba]
LOCALE:
TextMessage[AMERICAN_AMERICA.AL32UTF8]
QUEUE_TAG:
TextMessage[1]
RESOURCE:
TextMessage[ora.asm.type]
]
Trace file /opt/oracle/diag/crs//crs/trace/ohasd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
DDE: Flood control is not active
2017-03-29 08:11:09.947455 :UiServer:121607936: {0:0:54} Sending to PE. ctx=0x7f66e8038030, ClientPID=31662
2017-03-29 08:11:09.947666 : CRSPE:123709184: {0:0:54} Cmd : 0x7f66e41e1c60 : flags: QUEUE_TAG
2017-03-29 08:11:09.947703 : CRSPE:123709184: {0:0:54} Processing PE command id=108 origin:ldwebi101. Description: [Register Type : : 0x7f66e41e1c60]
2017-03-29 08:11:09.948247 : CRSSEC:123709184: {0:0:54} Allow all users to register resources in the new engine.
CLSB:123709184: Oracle Clusterware infrastructure error in OHASD (OS PID 30985): Fatal signal 11 has occurred in program ohasd thread 123709184; nested signal count is 1
Incident 265 created, dump file:
/opt/oracle/diag/crs//crs/incident/incdir_265/ohasd_i265.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Trace file /opt/oracle/diag/crs//crs/trace/ohasd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.

Review of the incident file generated from the OHASD crash ($DIAG_DEST/crs//crs/incident/incdir_/ohasd_i.trc) shows the signaling function in the crash to be "__lll_unlock_elision":

----- Incident Context Dump -----
Address: 0x7f67075d9ac0
Incident ID: 265
Problem Key: CRS 8503
Error: CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
[00]: dbgexProcessError [diag_dde]
[01]: dbgeExecuteForError [diag_dde]
[02]: dbgePostErrorDirect [diag_dde]
[03]: clsdAdrPostError []<-- Signaling
[04]: clsbSigErrCB []
[05]: skgesig_sigactionHandler []
[06]: __sighandler []
[07]: __lll_unlock_elision []
[08]: sltsmnr []
[09]: scls_iddb_is_a_privgrp_member_by_id []
[10]: _ZNK3CAA8Identity14belongsToGroupESs []
[11]: _ZN3CAA17PrimaryGroupEntry3hasERKNS_8IdentityERKSs []
[12]: _ZN3CAA3Acl3hasERKNS_8IdentityERKSs []
[13]: _ZN3CAA10Authorizer15checkPermissionERKNS_8IdentityERKSs []
[14]: _ZNK8cls_pe1210Authorizer15checkPermissionERKN3CAA8IdentityERNS1_10AuthorizerEj []
[15]: _ZNK8cls_pe1213ManagedObject17verifyPermissionsERKN3CAA8IdentityEj []
[16]: _ZN8cls_pe1229RegisterResourceTypeOperation17validateOperationEv []
[17]: _ZN8cls_pe1229RegisterResourceTypeOperation10initializeEv []
[18]: _ZN8cls_pe1229RegisterResourceTypeUiCommand15createOperationERKSsS2_ []
[19]: _ZN8cls_pe1229RegisterResourceTypeUiCommand19createIceOperationsEv []
[20]: _ZN8cls_pe127Command7processEv []
[21]: _ZN8cls_pe1218PolicyEngineModule15invokeUiCommandEPNS_7CommandE []
[22]: _ZN8cls_pe1218PolicyEngineModule23processUiCommandHandlerEPNS_14PeUiCmdMessageE []
[23]: _ZN3cls11ThreadModel12processQueueEP7sltstid []
[24]: _ZN3cls11ThreadModel5runTMEPv []
[25]: _ZN13CLS_Threading13CLSthreadMain8cppStartEPv []
[26]: start_thread []
MD [00]: 'Client ProcId'='ohasd.bin@ldwebi101.30985_140080482068224' (0x0)

Other symptoms of this issue have been seen during oracssdagent startup, where a similar call stack ($DIAG_DEST/crs//crs/incident/incdir_/ohasd_cssdagent_root_i.trc) shows the signaling function in the crash to be "__lll_unlock_elision":

Call Stack
-----------
Dump continued from file:
/u01/app/grid/diag/crs//crs/trace/ohasd_cssdagent_root.trc
[TOC00001]
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
[TOC00001-END]
[TOC00002]
========= Dump for incident 9 (CRS 8503) ========
Starting a Diag Context default dump (level=3)

Problem Key: CRS 8503
Error: CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
[00]: dbgexProcessError [diag_dde]
[01]: dbgeExecuteForError [diag_dde]
[02]: dbgePostErrorDirect [diag_dde]
[03]: clsdAdrPostError []<-- Signaling
[04]: clsbSigErrCB []
[05]: skgesig_sigactionHandler []
[06]: __sighandler []
[07]: __lll_unlock_elision []
[08]: sltsmnr []
[09]: scls_iddb_is_a_privgrp_member_by_id []
[10]: scls_canexec []
[11]: scls_process_spawn []
[12]: clsncssd_cssdstart []
[13]: _ZN8cls_agfw3Cmd7executeEv []
[14]: _ZN8cls_agfw5CmdEx10executeCmdEPN3cls7MessageE []
[15]: _ZN8cls_agfw5CmdEx14clsRequestHdlrEPN3cls7MessageE []
[16]: _ZN3cls11ThreadModel12processQueueEP7sltstid []
[17]: _ZN3cls11ThreadModel5runTMEPv []
[18]: _ZN13CLS_Threading13CLSthreadMain8cppStartEPv []
[19]: start_thread []

CHANGES

New Installation of Oracle GI 12.1 and above on SuSE 12.

CAUSE

glibc in SuSE 12 makes use of a Hardware Lock Elision (HLE) available in newer Intel Processors. This exposes Bug 25851874 causing Clusterware Processes (OHASD, CSSDAGENT, etc) to crash on startup.

SOLUTION

The fix for Bug 25851874 MUST be applied prior to running root.sh as described in MOS Note: 1410202.1. Bug 25851874 will be fixed in an upcoming GI PSU for 12.1.0.2 and RU for 12.2.0.1.

If for some reason you are unable to apply the fix for Bug 25851874 prior to running root.sh, the following workaround may be implemented to allow the installation/upgrade to succeed:

1. Assuming root.sh has already failed, deconfigure the failed install as the ROOT user:

Note: Do NOT close the OUI window; it is needed to complete the installation after root.sh succeeds.

# $GI_HOME/crs/install/roothas.pl -deconfig -force

2. Modify the /etc/ld.so.conf adding /lib64/noelision as the FIRST entry. It should look similar to the following:

/lib64/noelision
/usr/local/lib64
/usr/local/lib
include /etc/ld.so.conf.d/*.conf
# /lib64, /lib, /usr/lib64 and /usr/lib gets added
# automatically by ldconfig after parsing this file.
# So, they do not need to be listed.

3. Create a link in $GI_HOME/lib for the noelision version of the libpthread library:

# ln -s /lib64/noelision/libpthread-2.19.so $GI_HOME/lib/libpthread.so.0

4. Rerun the root.sh script and complete the installation via the OUI once root.sh has successfully completed.
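Step 2 depends on /lib64/noelision being the first effective entry in /etc/ld.so.conf, which is easy to get wrong when comments or blank lines come first. A minimal sketch for checking it before rerunning root.sh follows; the function name first_ld_conf_entry is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first entry of an ld.so.conf-style file,
# ignoring blank lines and comment lines.
first_ld_conf_entry() {
    conf=${1:-/etc/ld.so.conf}
    grep -vE '^[[:space:]]*(#|$)' "$conf" | head -n 1
}

# Example:
# [ "$(first_ld_conf_entry)" = "/lib64/noelision" ] || echo "noelision is not first"
```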

"sqlplus / as sysdba" reports ORA-12547 on SUSE 12 (Doc ID 2297117.1)

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Linux x86-64

SYMPTOMS

On : 11.2.0.4 version, RDBMS, SUSE 12 platform:

1. "sqlplus / as sysdba" reports ORA-12547:

2. Strace shows below error:

15069 0.000019 munmap(0x7ff21311f000, 268435456) = 0 <0.000010>
15069 0.000024 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
15068 0.000293 <... read resumed> "", 64) = 0 <0.010658>
15069 0.000024 +++ killed by SIGSEGV +++
15068 0.000005 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=15069, si_uid=300, si_status=SIGSEGV, si_utime=0, si_stime=0} ---

3. ORA-7445 [__lll_unlock_elision] is raised when attempting to start up the database:

Alert log:

Process PMON died, see its trace file
USER (ospid: 3075): terminating the instance due to error 443
Instance terminated by USER, pid = 3075
Tue Aug 08 01:43:12 2017
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x7F11DF21C490, __lll_unlock_elision()+48] [flags: 0x0, count: 1]

Trace File:

*** 2017-08-08 01:43:12.268
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x7F11DF21C490, __lll_unlock_elision()+48] [flags: 0x0, count: 1]
DDE: Flood control is not active
========= Dump for critical error (no incident) (ORA 7445 [__lll_unlock_elision()+48]) ========
Registers:
%rax: 0x0000000000000000 %rbx: 0x0000000000048006 %rcx: 0x000000000000001e
%rdx: 0x000000000c1c2e88 %rdi: 0x000000000c1c2e88 %rsi: 0x0000000000000000
%rsp: 0x00007ffc48934788 %rbp: 0x00007ffc48934790 %r8: 0x0000000000000000
%r9: 0x00000000bfffffff %r10: 0x0000000008000000 %r11: 0x0000000000000202
%r12: 0x000000000c195340 %r13: 0x00007ffc48934a38 %r14: 0x0000000010000000
%r15: 0x00007f11de6015e8 %rip: 0x00007f11df21c490 %efl: 0x0000000000010246
__lll_unlock_elision()+29 (0x7f11df21c47d) add $0x80,%rsp
__lll_unlock_elision()+36 (0x7f11df21c484) xor %eax,%eax
__lll_unlock_elision()+38 (0x7f11df21c486) ret
__lll_unlock_elision()+39 (0x7f11df21c487) nopw 0x0(%rax,%rax)
> __lll_unlock_elision()+48 (0x7f11df21c490) lgdt %bp
__lll_unlock_elision()+51 (0x7f11df21c493) xor %eax,%eax
__lll_unlock_elision()+53 (0x7f11df21c495) ret
__lll_unlock_elision()+54 (0x7f11df21c496) cs: nopw 0x0(%rax,%rax)
__lll_unlock_elision()+64 (0x7f11df21c4a0) movzwl (%rsi),%eax

----- Call Stack Trace -----

skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- ssexhd <- sighandler <- lll_unlock_elisio <- sltsimr <- sskgmdt
<- skgmdtmany <- skgmdetach0 <- skgmdetach <- ksmdsgi <- ksmdsg
<- ksuabt <- opistr_real <- opistr <- opiodr <- ttcpip
<- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main <- libc_start_main <- start

4. "dbca" will also fail with above error.

CHANGES

This is a newly installed 11.2.0.4 on SUSE 12.

CAUSE

glibc in SuSE 12 makes use of a Hardware Lock Elision (HLE) available in newer Intel Processors.
This can cause a process crash with "__lll_unlock_elision" on the call stack.

SOLUTION

1. Modify the "/etc/ld.so.conf" adding "/lib64/noelision" as the FIRST entry. It should look similar to the following:

/lib64/noelision
/usr/local/lib64
/usr/local/lib
include /etc/ld.so.conf.d/*.conf
# /lib64, /lib, /usr/lib64 and /usr/lib gets added
# automatically by ldconfig after parsing this file.
# So, they do not need to be listed.

2. Create a link in $ORACLE_HOME/lib for the noelision version of the libpthread library (use the libpthread version present on your own system):

su - oracle
ln -s /lib64/noelision/libpthread-.so $ORACLE_HOME/lib/libpthread.so.0

3. Restart the host, log in again as oracle, and check whether sqlplus works.

su - oracle
ldd $ORACLE_HOME/bin/sqlplus
ldd $ORACLE_HOME/bin/oracle
sqlplus / as sysdba
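When reading the ldd output in step 3, note that libpthread.so.0 may be reported under $ORACLE_HOME/lib, because that is where the symlink lives. One way to confirm that it really ends up in the noelision directory is to resolve the link first. The two helpers below are a hypothetical sketch and assume GNU readlink with the -f option.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the path
# that libpthread.so.0 was resolved to.
pthread_path_from_ldd() {
    awk '$1 == "libpthread.so.0" && $2 == "=>" { print $3; exit }'
}

# Hypothetical helper: succeed if the path, after following symlinks,
# lies inside a noelision directory.
is_noelision_lib() {
    case $(readlink -f "$1") in
        */noelision/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# p=$(ldd "$ORACLE_HOME/bin/oracle" | pthread_path_from_ldd)
# is_noelision_lib "$p" && echo "noelision libpthread in use"
```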

Note: The solution can also be applied to the GRID/ASM home if ORA-12547 is reported on SUSE 12 while connecting to the ASM instance with sqlplus. Please also refer to Note 2253054.1 for more details.

REFERENCES

NOTE:2253054.1 - Oracle Grid Infrastructure Install Fails on SuSE 12 when Running root.sh

All in all, this was exhausting; it took a day and a half.

About Me

● Author: 小麦苗. Parts of this article were compiled from online sources; if anything infringes your rights, please contact 小麦苗 to have it removed.

● This article is also published on itpub, cnblogs (博客园), CSDN, and the author's WeChat official account (DB宝).

● Written in Xi'an between 2020-05-01 06:00 and 2020-05-30 24:00

● Last modified: 2020-05-01 06:00 ~ 2020-05-30 24:00

● All rights reserved. Sharing is welcome; please credit the source when reposting.
