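Startup failures in RAC are often related to GPnP. In this article I explain what the GPnP profile is and how Clusterware uses it, and then walk through a small experiment that performs the frequently mentioned task of editing the GPnP profile.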
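What is GPnP?
The GPnP profile is a small XML file, profile.xml, located in GRID_HOME/gpnp/<hostname>/profiles/peer. It is used to establish the correct global personality of a node. Every node keeps a local copy of the GPnP profile, and that copy is maintained by the GPnP daemon (GPnPD).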
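What does the GPnP profile contain?
The GPnP profile stores the information needed to start Oracle Clusterware, such as the SPFILE location and the ASM disk string:
- Cluster name
- Network classification (public/private)
- Storage used for ASM: SPFILE location, ASM disk string, and so on
- Digital signature information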
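To make the list above concrete, here is a minimal Python sketch that pulls those fields out of a profile. The sample XML is illustrative only: its element and attribute names (ClusterName, Network, ASM-Profile, DiscoveryString, SPFile) are my assumption about the layout, and the cluster name, adapters and SPFILE path are made up. Compare it against your own profile.xml before relying on it.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: layout and values are assumptions,
# not a copy of a real profile.xml.
SAMPLE_PROFILE = """\
<gpnp:GPnP-Profile xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
                   xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
                   ClusterName="rac-cluster">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" Adapter="eth0" Use="public"/>
      <gpnp:Network id="net2" Adapter="eth1" Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*"
                    SPFile="+DATA/rac-cluster/asmparameterfile/registry.253.1"/>
</gpnp:GPnP-Profile>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_profile(xml_text):
    """Collect cluster name, network roles and ASM settings from a profile."""
    root = ET.fromstring(xml_text)
    summary = {
        "cluster_name": root.get("ClusterName"),
        "networks": {},          # adapter name -> public / cluster_interconnect
        "asm_discovery": None,
        "asm_spfile": None,
    }
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "Network":
            summary["networks"][elem.get("Adapter")] = elem.get("Use")
        elif name == "ASM-Profile":
            summary["asm_discovery"] = elem.get("DiscoveryString")
            summary["asm_spfile"] = elem.get("SPFile")
    return summary


if __name__ == "__main__":
    for key, value in summarize_profile(SAMPLE_PROFILE).items():
        print(key, "=", value)
```

Matching on the local tag name rather than on a fixed namespace keeps the sketch tolerant of namespace differences between versions.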
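What updates the GPnP profile?
The GPnPD daemon synchronizes changes to the profile, for example when the software is installed or upgraded.
The profile is also updated whenever the cluster is changed with tools such as:
• oifcfg (changing the network),
• crsctl (changing the voting disk location),
• asmcmd (changing ASM_DISKSTRING or the SPFILE location), and so on.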
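How does Clusterware use the GPnP profile?
In 11g R2 RAC the voting disk is stored in an ASM disk group, but CSSD needs the voting files before ASM is mounted. At startup, CSSD therefore scans the headers of all devices specified in the GPnP profile XML file under the "DiscoveryString" entry, which holds the value of the asm_diskstring parameter.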
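The effect of the discovery string can be sketched in a few lines of Python. This is only an approximation under stated assumptions: it treats the string as a comma-separated list of shell-style patterns and matches them with fnmatch, which is not how CSSD is implemented internally. The device names are the ones used in the experiment later in this article, and the idea that /dev/asm-diskb is the disk holding the voting file is assumed here for illustration.

```python
from fnmatch import fnmatchcase


def split_discovery_string(discovery):
    """Split a comma-separated discovery string into individual patterns."""
    return [p.strip() for p in discovery.split(",") if p.strip()]


def discoverable(devices, discovery):
    """Return the devices whose path matches at least one pattern."""
    patterns = split_discovery_string(discovery)
    return [d for d in devices if any(fnmatchcase(d, p) for p in patterns)]


if __name__ == "__main__":
    # Hypothetical device list; /dev/asm-diskb is assumed to hold the voting file.
    devices = ["/dev/asm-diskb", "/dev/asm-diskc", "/dev/asm-diskd", "/dev/asm-diske"]

    # A wildcard string covers every disk, including the voting disk.
    print(discoverable(devices, "/dev/asm*"))

    # An explicit list that omits /dev/asm-diskb hides the voting disk from the scan.
    print(discoverable(devices, "/dev/asm-diskc,/dev/asm-diskd,/dev/asm-diske"))
```

A device that no pattern covers never has its header read, so a voting file on it cannot be found, whatever the disk itself contains.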
The voting disk is accessed when Clusterware starts. If the voting disk is on ASM, its location is read from the GPnP profile (…).
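Next, Clusterware checks whether all nodes have the updated GPnP profile, and the node joins the cluster according to the GPnP configuration. When a node is started or added to the cluster, the Clusterware software on that node starts a GPnP agent, which does the following:
• If the node is already part of the cluster, the GPnP agent reads the existing profile on that node.
• If the node is being added to the cluster, the GPnP agent uses a multicast protocol (provided by mDNS) to locate an agent on another existing node and obtains the profile from that agent.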
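Next, CRSD needs to read the OCR in order to start the various resources on the node. If the OCR is on ASM, the location of the ASM SPFILE has to be determined first, and the GPnP profile is the first place searched.
Once the location has been found in the GPnP profile, the related information is read from the ASM disk header. The following simulates that process.
The output below shows the disk that contains the spfile (spfflg is non-zero):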
[grid@rac1]$ kfed read /dev/asm-diskb | grep -E 'spf|ausize'
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000 <<<<< 1M
kfdhdb.spfile: 437 ; 0x0f4: 0x000001b5 <<<
The output shows that the device /dev/asm-diskb contains an ASM spfile (spfflg=1). The spfile sits at offset 437 on the disk, counted in allocation units (spfile=437). Given the allocation unit size (kfdhdb.ausize = 1M), the ASM spfile can be dumped from the device:
[grid@rac1]$ dd if=/dev/asm-diskb of=/tmp/spfileASM_Copy.ora skip=437 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.406486 s, 2.6 MB/s
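The arithmetic behind the dd call is simple: the block size is the allocation unit size, and the number of blocks to skip is the spfile allocation unit number. The Python sketch below derives both from the two kfed lines shown above. The helper names are mine, and it assumes the spfile fits in a single allocation unit, as the count=1 in the dd command implies.

```python
import re

# The two header fields from the kfed output above (trailing annotations dropped).
KFED_OUTPUT = """\
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile: 437 ; 0x0f4: 0x000001b5
"""


def parse_kfed(text):
    """Return {field: decimal value} for every 'kfdhdb.<field>: <n> ;' line."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*kfdhdb\.(\w+):\s*(\d+)\s*;", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def spfile_dd_args(fields):
    """Translate the header fields into dd parameters and a byte offset."""
    au_size = fields["ausize"]     # allocation unit size in bytes
    start_au = fields["spfile"]    # allocation unit where the spfile starts
    return {
        "bs": au_size,
        "skip": start_au,
        "count": 1,                # assumption: the spfile fits in one AU
        "byte_offset": au_size * start_au,
    }


if __name__ == "__main__":
    args = spfile_dd_args(parse_kfed(KFED_OUTPUT))
    print("dd if=/dev/asm-diskb of=/tmp/spfileASM_Copy.ora "
          f"skip={args['skip']} bs={args['bs']} count={args['count']}")
    print("byte offset on disk:", args["byte_offset"])
```

With a 1 MB allocation unit and spfile=437, the copy starts 458227712 bytes into the device.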
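A small experiment: modifying the GPnP profile
Here I changed ASM_DISKSTRING so that the ASM disk holding the voting disk was no longer covered, and then tried to restart CRS on the node. The CSSD log showed that CSSD could not find its voting files. At that point the ASM disk string has to be changed while ASM is down and no CSSD is available, so I used gpnptool to edit the GPnP profile:
1. Check the voting disk location
2. Find all disks that belong to the other disk groups
3. Set ASM_DISKSTRING to all disks in the DATA/FLASH/TEST disk groups: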
4. Check the disks named by the discovery string in the GPnP profile; the voting disk is no longer among them.
DiscoveryString="/dev/asm-diskc,/dev/asm-diskd,/dev/asm-diske"
5. Restart CRS
crsctl stop crs
crsctl start crs
6. Check the stack: HAS has started, but the other services cannot start
[root@rac1 peer]# crsctl check has
CRS-4638: Oracle High Availability Services is online
[root@rac1 peer]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 peer]# crsctl check css
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
[root@rac1 peer]#
7. Check ocssd.log: CSSD scans the disks listed in the discovery string but does not find the voting disk.
2018-02-28 09:24:01.804: [ CSSD][607938304]clssnmvDiskVerify: Successful discovery of 0 disks
2018-02-28 09:24:01.804: [ CSSD][607938304]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2018-02-28 09:24:01.804: [ CSSD][607938304]clssnmvFindInitialConfigs: No voting files found
2018-02-28 09:24:01.804: [ CSSD][607938304](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2018-02-28 09:24:01.844: [ CSSD][610629376]clssscSelect: cookie accept request 0x1747880
2018-02-28 09:24:01.844: [ CSSD][610629376]clssgmAllocProc: (0x7f6e10036470) allocated
2018-02-28 09:24:01.845: [ CSSD][610629376]clssgmClientConnectMsg: properties of cmProc 0x7f6e10036470 - 1,2,3,4,5
2018-02-28 09:24:01.845: [ CSSD][610629376]clssgmClientConnectMsg: Connect from con(0xdeb) proc(0x7f6e10036470) pid(6857) version 11:2:1:4, properties: 1,2,3,4,5
2018-02-28 09:24:01.845: [ CSSD][610629376]clssgmClientConnectMsg: msg flags 0x0000
2018-02-28 09:24:02.364: [ CSSD][610629376]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f6e10036470) client((nil))
2018-02-28 09:24:02.366: [ CSSD][610629376]clssgmDeadProc: proc 0x7f6e10036470
2018-02-28 09:24:02.366: [ CSSD][610629376]clssgmDestroyProc: cleaning up proc(0x7f6e10036470) con(0xdeb) skgpid ospid 6857 with 0 clients, refcount 0
2018-02-28 09:24:02.366: [ CSSD][610629376]clssgmDiscEndpcl: gipcDestroy 0xdeb
2018-02-28 09:24:06.840: [ CSSD][610629376]clssscSelect: cookie accept request 0x1747880
2018-02-28 09:24:06.840: [ CSSD][610629376]clssgmAllocProc: (0x7f6e1001d630) allocated
2018-02-28 09:24:06.843: [ CSSD][610629376]clssgmClientConnectMsg: properties of cmProc 0x7f6e1001d630 - 1,2,3,4,5
2018-02-28 09:24:06.843: [ CSSD][610629376]clssgmClientConnectMsg: Connect from con(0xe4f) proc(0x7f6e1001d630) pid(6857) version 11:2:1:4, properties: 1,2,3,4,5
2018-02-28 09:24:06.843: [ CSSD][610629376]clssgmClientConnectMsg: msg flags 0x0000
2018-02-28 09:24:07.375: [ CSSD][610629376]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f6e1001d630) client((nil))
2018-02-28 09:24:07.380: [ CSSD][610629376]clssgmDeadProc: proc 0x7f6e1001d630
2018-02-28 09:24:07.380: [ CSSD][610629376]clssgmDestroyProc: cleaning up proc(0x7f6e1001d630) con(0xe4f) skgpid ospid 6857 with 0 clients, refcount 0
2018-02-28 09:24:07.380: [ CSSD][610629376]clssgmDiscEndpcl: gipcDestroy 0xe4f
2018-02-28 09:24:11.849: [ CSSD][610629376]clssscSelect: cookie accept request 0x1747880
2018-02-28 09:24:11.849: [ CSSD][610629376]clssgmAllocProc: (0x7f6e1005a090) allocated
2018-02-28 09:24:11.851: [ CSSD][610629376]clssgmClientConnectMsg: properties of cmProc 0x7f6e1005a090 - 1,2,3,4,5
2018-02-28 09:24:11.851: [ CSSD][610629376]clssgmClientConnectMsg: Connect from con(0xeb3) proc(0x7f6e1005a090) pid(6857) version 11:2:1:4, properties: 1,2,3,4,5
2018-02-28 09:24:11.851: [ CSSD][610629376]clssgmClientConnectMsg: msg flags 0x0000
2018-02-28 09:24:12.392: [ CSSD][610629376]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f6e1005a090) client((nil))
2018-02-28 09:24:12.396: [ CSSD][610629376]clssgmDeadProc: proc 0x7f6e1005a090
2018-02-28 09:24:12.396: [ CSSD][610629376]clssgmDestroyProc: cleaning up proc(0x7f6e1005a090) con(0xeb3) skgpid ospid 6857 with 0 clients, refcount 0
2018-02-28 09:24:12.396: [ CSSD][610629376]clssgmDiscEndpcl: gipcDestroy 0xeb3
2018-02-28 09:24:16.807: [ GPNP][607938304]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2104] get-profile call to url "ipc://GPNPD_rac1" disco "" [f=0 claimed- host: cname: seq: auth:]
2018-02-28 09:24:16.833: [ GPNP][607938304]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2234] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_rac1" disco ""
2018-02-28 09:24:16.833: [ CSSD][607938304]clssscGetParameterProfile: buffer passed for parameter ASM discovery (3) is too short, required 45, passed 20
2018-02-28 09:24:16.833: [ CSSD][607938304]clssnmReadDiscoveryProfile: voting file discovery string(/dev/asm-diskc,/dev/asm-diskd,/dev/asm-diske)
2018-02-28 09:24:16.833: [ CSSD][607938304]clssnmvDDiscThread: using discovery string /dev/asm-diskc,/dev/asm-diskd,/dev/asm-diske for initial discovery
2018-02-28 09:24:16.833: [ SKGFD][607938304]Discovery with str:/dev/asm-diskc,/dev/asm-diskd,/dev/asm-diske:
2018-02-28 09:24:16.834: [ SKGFD][607938304]UFS discovery with :/dev/asm-diskc:
2018-02-28 09:24:16.838: [ SKGFD][607938304]Fetching UFS disk :/dev/asm-diskc:
2018-02-28 09:24:16.838: [ SKGFD][607938304]OSS discovery with :/dev/asm-diskc:
2018-02-28 09:24:16.839: [ SKGFD][607938304]Discovery advancing to nxt string :/dev/asm-diskd:
2018-02-28 09:24:16.839: [ SKGFD][607938304]UFS discovery with :/dev/asm-diskd:
2018-02-28 09:24:16.843: [ SKGFD][607938304]Fetching UFS disk :/dev/asm-diskd:
2018-02-28 09:24:16.843: [ SKGFD][607938304]OSS discovery with :/dev/asm-diskd:
2018-02-28 09:24:16.843: [ SKGFD][607938304]Discovery advancing to nxt string :/dev/asm-diske:
2018-02-28 09:24:16.843: [ SKGFD][607938304]UFS discovery with :/dev/asm-diske:
2018-02-28 09:24:16.847: [ SKGFD][607938304]Fetching UFS disk :/dev/asm-diske:
2018-02-28 09:24:16.847: [ SKGFD][607938304]OSS discovery with :/dev/asm-diske:
2018-02-28 09:24:16.847: [ SKGFD][607938304]Handle 0x7f6e14135840 from lib :UFS:: for disk :/dev/asm-diskc:
2018-02-28 09:24:16.848: [ SKGFD][607938304]Handle 0x7f6e14132060 from lib :UFS:: for disk :/dev/asm-diskd:
2018-02-28 09:24:16.849: [ SKGFD][607938304]Handle 0x7f6e14136d00 from lib :UFS:: for disk :/dev/asm-diske:
2018-02-28 09:24:16.850: [ SKGFD][607938304]Lib :UFS:: closing handle 0x7f6e14135840 for disk :/dev/asm-diskc:
2018-02-28 09:24:16.852: [ SKGFD][607938304]Lib :UFS:: closing handle 0x7f6e14132060 for disk :/dev/asm-diskd:
2018-02-28 09:24:16.854: [ SKGFD][607938304]Lib :UFS:: closing handle 0x7f6e14136d00 for disk :/dev/asm-diske:
8. Make a copy of the GPnP profile
[root@rac1 peer]# pwd
/u01/app/11.2.0/grid/gpnp/rac1/profiles/peer
[root@rac1 peer]# cp profile.xml profile.bak
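9. Remove the Oracle signature from the file
gpnptool unsign -p=profile.bak
10. Change the DiscoveryString value
gpnptool edit -asm_dis='/dev/asm*' -p=profile.bak -o=profile.bak -ovr
11. Re-sign the profile XML file
gpnptool sign -p=profile.bak -w=file:/u01/app/11.2.0/grid/gpnp/rac1/wallets/peer/ -o=profile.new
12. Move the original profile.xml out of the way (the newly signed profile.new then presumably takes its place as profile.xml, since the next step inspects profile.xml).
13. Confirm that the discovery string has been changed
[root@host01 peer]# vi profile.xml
DiscoveryString="/dev/asm*"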
14. Restart CRS. You can either kill all the processes or use:
/u01/app/11.2.0/grid/bin/crsctl stop crs -f
crsctl start crs
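When the log is as noisy as the ocssd.log excerpt in step 7, it helps to filter for the handful of lines that matter. The Python sketch below is one way to do that. It relies only on message fragments that appear in that excerpt ("Successful discovery of N disks", "No voting files found", "voting file discovery string(...)"). The function name is my own, and the sample lines are copied from the excerpt rather than read from a live log.

```python
import re

# Three lines copied from the ocssd.log excerpt in step 7.
SAMPLE_LOG = """\
2018-02-28 09:24:01.804: [ CSSD][607938304]clssnmvDiskVerify: Successful discovery of 0 disks
2018-02-28 09:24:01.804: [ CSSD][607938304]clssnmvFindInitialConfigs: No voting files found
2018-02-28 09:24:16.833: [ CSSD][607938304]clssnmReadDiscoveryProfile: voting file discovery string(/dev/asm-diskc,/dev/asm-diskd,/dev/asm-diske)
"""

DISCOVERY_RE = re.compile(r"voting file discovery string\((.*)\)")
DISK_COUNT_RE = re.compile(r"Successful discovery of (\d+) disks")


def summarize_voting_discovery(log_text):
    """Pick out the discovery strings, disk counts and failure markers."""
    summary = {"discovery_strings": [], "disk_counts": [], "no_voting_files": False}
    for line in log_text.splitlines():
        found = DISCOVERY_RE.search(line)
        if found:
            summary["discovery_strings"].append(found.group(1))
        found = DISK_COUNT_RE.search(line)
        if found:
            summary["disk_counts"].append(int(found.group(1)))
        if "No voting files found" in line or "Voting file not found" in line:
            summary["no_voting_files"] = True
    return summary


if __name__ == "__main__":
    result = summarize_voting_discovery(SAMPLE_LOG)
    print("discovery strings:", result["discovery_strings"])
    print("disks discovered :", result["disk_counts"])
    print("voting files missing:", result["no_voting_files"])
```

Running the same function on the log after step 14 gives a quick before-and-after comparison of the discovery string CSSD actually used.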