RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic) [ID 810394.1]

Modified 21-JUL-2011 Type BULLETIN Status PUBLISHED

In this Document
Purpose
Scope and Application
RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic)
RAC Platform Specific Starter Kits and Best Practices
RAC Platform Generic Load Testing and System Test Plan Outline

RAC Platform Generic Highlighted Recommendations
RAC Platform Generic Best Practices
Getting Started - Preinstallation and Design Considerations
Clusterware Considerations
Networking Considerations
Storage Considerations
Installation Considerations
Patching Considerations
Upgrade Considerations
Oracle VM Considerations
Database Initialization Parameter Considerations
Performance Tuning Considerations
General Configuration Considerations
E-Business Suite (with RAC) Considerations
Peoplesoft (with RAC) Considerations
Tools/Utilities for Diagnosing and Working with Oracle Support
11gR2 Specific Considerations
11.2.0.2 Specific Considerations
CRS / RAC Related References
RAC / RDBMS Related References
VIP References
ASM References
11.2 References
Infiniband References
MAA / Standby References
Patching References
Upgrade References
E-Business References
Unix References
Weblogic/RAC References
References Related to Working with Oracle Support
Modification History


Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.1.0 - Release: 10.2 to 11.2
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.1.0.7 [Release: 10.2 to 11.1]
Information in this document applies to any platform.

Purpose

The goal of the Oracle Real Application Clusters (RAC) Starter Kit is to provide you with the latest information on generic and platform specific best practices for implementing an Oracle RAC cluster. This document is compiled and maintained based on Oracle's experience with its global RAC customer base.

This Starter Kit is not meant to replace or supplant the Oracle documentation set; rather, it is meant as a supplement to it. It is imperative that the Oracle documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this Starter Kit.

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. It is therefore imperative that any recommendations from this Starter Kit be thoroughly tested and validated, using a test environment that is a replica of the target production environment, before being implemented in production, to ensure that there is no negative impact.

Scope and Application

This article is intended for use by all new (and existing) Oracle RAC implementers.

RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic)

RAC Platform Specific Starter Kits and Best Practices

While this note focuses on platform generic RAC Best Practices, the following notes contain detailed platform specific best practices including Step-By-Step installation cookbooks.

Document 811306.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Linux)
Document 811280.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Solaris)
Document 811271.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Windows)
Document 811293.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (AIX)
Document 811303.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (HP-UX)

RAC Platform Generic Load Testing and System Test Plan Outline

A critical component of any successful implementation, particularly in the High Availability arena, is testing. For a RAC environment, testing should include both load generation, to monitor and measure how the system works under heavy load, and a system test plan, to understand how the system reacts to certain types of failures. To assist with this type of testing, this document contains links to documents to get you started in both of these areas.

Click here for a White Paper on available RAC System Load Testing Tools

Click here for a platform generic RAC System Test Plan Outline for 10gR2 and 11gR1

Click here for a platform generic RAC System Test Plan Outline for 11gR2

Use these documents to validate your system setup and configuration, and also as a means to practice responses and establish procedures in case of certain types of failures.

RAC Platform Generic Highlighted Recommendations

Highlighted Recommendations are the recommendations thought to have the greatest impact, or to answer the most commonly raised questions and issues. The Generic Highlighted Recommendations below address commonly asked or encountered issues that apply to RAC implementations on all platforms.
  • Having a step-by-step plan for your RAC project implementation is invaluable. The following OTN article contains a sample project outline: http://www.oracle.com/technetwork/articles/haskins-rac-project-guide-099429.html
  • To simplify the stack and vendor interactions, Oracle recommends avoiding 3rd party clusterware unless absolutely necessary.
  • Automatic Storage Management (ASM) is recommended for datafile storage. See the ASM Overview and Technical Best Practices White Paper. Reference: http://www.oracle.com/technetwork/database/asm-10gr2-bestpractices.pdf
  • The RAC Assurance Team recommends placement of Oracle Homes on local drives whenever possible. The following white paper contains an analysis of the pros and cons of shared versus local Oracle Homes: http://www.oracle.com/technetwork/database/clustering/overview/oh-rac-133684.pdf
  • Having a system test plan to help plan for and practice unplanned outages is crucial. The following paper discusses Best Practices for Optimizing Availability During Unplanned Outages Using Oracle Clusterware and Oracle Real Application Clusters: http://www.oracle.com/technetwork/database/features/availability/maa-wp-10gr2-fastrecoveryoracleclus-130899.pdf
    In addition, this note has an attached sample System Test Plan Outline, to guide your system testing to help prepare for potential unplanned failures.
  • Develop a proactive patching strategy, to stay ahead of the latest known issues. Keep current with the latest Patch Set Updates (as documented in Document 850471.1) and be aware of the most current recommended patches (as documented in Document 756671.1). Plan for periodic (for example: quarterly) maintenance windows to keep current with the latest recommended patch (set) updates and patches.
  • Understanding how to minimize downtime while patching is a key piece of this strategy. The following paper discusses patching strategies geared towards minimizing downtime in a RAC/Clusterware environment: http://www.oracle.com/technetwork/database/asm.pdf
  • For all Unix platforms running Oracle version 11.1.0.6 or 11.1.0.7: Take note / implement the solution explained in Document 858279.1.
  • When patching, be sure to use the latest version of OPATCH. Available for download from My Oracle Support under Patch 6880880.

RAC Platform Generic Best Practices

Beyond the Highlighted Recommendations above, the RAC Assurance Team has recommendations for various different parts/components of your RAC setup. These additional recommendations are broken into categories and listed below.

Getting Started - Preinstallation and Design Considerations

  • Check with the disk vendor that the number of nodes, OS version, RAC version, CRS version, network fabric, and patches are certified, as some storage/SAN vendors may require special certification for a certain number of nodes.
  • Check the support matrix to ensure that the combination of product, version, and platform is supported, and to identify any extra steps that must be completed for some such combinations. Document 337737.1
  • Avoid SSH and XAUTH warnings before RAC 10g installation. Reference Document 285070.1
  • Consider configuring the system logger to log messages to one central server.
  • For CRS, ASM, and Oracle (RDBMS), ensure that one unique user ID with a single name is in use across the cluster. Problems can occur accessing OCR keys when multiple O/S users share the same UID, and this also results in logical corruptions and permission problems which are hard to diagnose.
  • Make sure machine clocks are synchronized on all nodes to the same NTP source.
    Implementing NTP (Network Time Protocol) on all nodes prevents evictions and helps to facilitate problem diagnosis. Run the NTP daemon with the '-x' flag (i.e. ntpd -x, xntpd -x) if available; this calls for gradual time changes (known as slewing). Large time changes, which are the default when the -x flag is not used, can result in unnecessary node evictions (see the example at the end of this list). Document 759143.1
  • Eliminate any single points of failure in the architecture. Examples include (but are not limited to): cluster interconnect redundancy (NIC bonding, etc.), multiple access paths to storage using two or more HBAs or initiators and multipathing software, and disk mirroring/RAID.
  • Plan and document capacity requirements. Work with your server vendor to produce a detailed capacity plan and system configuration, and use your normal capacity planning process to estimate the number of CPUs required to run the workload. Both SMP and RAC clusters have synchronization costs as the number of CPUs increases: SMPs normally scale well for a small number of CPUs, while RAC clusters normally scale better than SMPs for a large number of CPUs. Typical synchronization cost: 5-20%.
  • Use proven high availability strategies. RAC is one component in a high availability architecture; make sure all parts are covered. Review Oracle's Maximum Availability Architecture recommendations and references further down in this document.
  • It is strongly advised that a production RAC instance does not share a node with a DEV, TEST, QA or TRAINING instance. These extra instances can often introduce unexpected performance changes into a production environment.
  • Configure Servers to boot from SAN disk, rather than local disk for easier repair, quick provisioning and consistency.
  • For Oracle 10g and 11gR1 it is recommended to utilize Oracle redundancy for the OCR and Voting Disks. These files should be stored on RAW or block devices (depending on the OS and Oracle Version). Voting Disks should always be created in odd numbers (1,3,5,etc). This is because losing 1/2 or more of all of your voting disks will cause nodes to get evicted from the cluster, or nodes to evict themselves out of the cluster. Document 428681.1 explains how to add OCR mirror and how to add additional voting disks.
  • For Oracle 11gR2 it is a best practice to store the OCR and Voting Disk within ASM and to maintain the ASM best practice of having no more than 2 diskgroups (Flash Recovery Area and Database Area). This means that the OCR and Voting disk will be stored along with the database related files. If you are utilizing external redundancy for your disk groups this means you will have 1 Voting Disk and 1 OCR.

    For those who wish to utilize Oracle supplied redundancy for the OCR and Voting disks one could create a separate (3rd) ASM Diskgroup having a minimum of 2 fail groups (total of 3 disks). This configuration will provide 3 Voting Disks and a single OCR which takes on the redundancy of that disk group (mirrored within ASM). The minimum size of the 3 disks that make up this normal redundancy diskgroup is 1GB.
  • If you are planning to use T-Series servers in a RAC environment, review Document 1181315.1 'Important Considerations for Operating Oracle RAC on T-Series Servers'
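
As referenced in the NTP item above, the following is a minimal sketch of enabling slewed time adjustment for ntpd on Linux; the file location and service command are assumptions that vary by platform and OS release (other platforms use different daemons and configuration files).

    # /etc/sysconfig/ntpd (Linux example; path and options vary by platform)
    OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"

    # restart the NTP service so the -x (slewing) flag takes effect
    service ntpd restart

    # verify the daemon is now running with -x
    ps -ef | grep ntpd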

Clusterware Considerations

  • For versions prior to 11gR2, configure three or more voting disks (always an odd number). This is because losing 1/2 or more of all of your voting disks will cause nodes to get evicted from the cluster, or nodes to evict themselves out of the cluster.
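
For pre-11.2 environments, the following is a minimal command sketch (per Document 428681.1) for checking and adding voting disks; the raw device paths are placeholders, and in 10.2 additions are typically performed as root with Oracle Clusterware stopped on all nodes, using the -force option.

    # check the current voting disk configuration
    crsctl query css votedisk

    # add voting disks to reach an odd number (paths are examples only);
    # in 10.2 run as root with the clusterware down on all nodes
    crsctl add css votedisk /dev/raw/raw3 -force
    crsctl add css votedisk /dev/raw/raw4 -force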

Networking Considerations

  • Underscores should not be used in a host or domain name, according to RFC 952 (DoD Internet host table specification). The same applies to net, host, gateway, and domain names. Reference: http://www.faqs.org/rfcs/rfc952.html
  • Ensure the default gateway is on the same subnet as the VIP; otherwise this can cause problems with racgvip and cause the VIP and listener to keep restarting.
  • Make sure network interfaces have the same name on all nodes. This is required. To check - use ifconfig (on Unix) or ipconfig (on Windows).
  • Use Jumbo Frames if supported and possible in the system. Reference: Document 341788.1
  • Use non-routable network addresses for private interconnect; Class A: 10.0.0.0 to 10.255.255.255, Class B: 172.16.0.0 to 172.31.255.255, Class C: 192.168.0.0 to 192.168.255.255. Reference: http://www.faqs.org/rfcs/rfc1918.html and Document 338924.1
  • Make sure network interfaces are configured correctly in terms of speed, duplex, etc. Various tools exist to monitor and test network: ethtool, iperf, netperf, spray and tcp. Document 563566.1
  • Configure nics for fault tolerance (bonding/link aggregation). Document 787420.1.
  • Performance: check for faulty switches and bad HBAs or ports that drop packets. Most network-related evictions occur either when there is too much traffic on the interconnect (its capacity is exhausted, which is where link aggregation or some other hardware solution helps) or when the switch or network card is not configured properly. If the UDP protocol is used for RAC IPC, this shows up in the "netstat -s | grep udp" output as underflows (UDP buffer size configuration) or as errors caused by bad ports, switches, network cards, or network card settings. Review these statistics in the context of errors reported for packets sent through the interface.
  • For more predictable hardware discovery, place HBA and NIC cards in the same corresponding slot on each server in the Grid.
  • Ensure that all network cables are terminated in a grounded socket. A switch is required for the private network. Use dedicated redundant switches for private interconnect and VLAN considerations. RAC and Clusterware deployment best practices recommend that the interconnection be deployed on a stand-alone, physically separate, dedicated switch.
  • Deploying the RAC/Clusterware interconnect on a shared switch with a segmented VLAN may expose the interconnect links to congestion and instability in the larger IP network topology. If deploying the interconnect on a VLAN, there should be a 1:1 mapping of VLAN to non-routable subnet, and the interconnect should not span multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and lack of fine-grained monitoring of the VLAN/port. Reference Bug 9761210.
  • Consider using Infiniband on the interconnect for workloads that have high volume requirements. Infiniband can also improve performance by lowering latency, particularly with Oracle 11g, with the RDS protocol. See Document 751343.1.
  • Configure the IPC address first in the listener.ora address list. For databases upgraded from earlier versions to 10gR2, netca did not configure the IPC address first in the listener.ora file; in 10gR2 this is the default, but an upgrade does not change it unless you do so manually. Failure to do so can adversely impact the amount of time it takes the VIP to fail over if the public network interface fails. Therefore, check the 10gR1 and 10gR2 listener.ora file: the IPC address should not only be contained in the address list, it should be FIRST (see the example following this list). Document 403743.1
  • Increase the SDU (and, in older versions, the TDU as well) to a higher value (e.g. 4 KB, 8 KB, up to 32 KB), thus reducing round trips on the network and possibly decreasing response time and the overall perceived responsiveness of the system. Document 44694.1
  • To avoid ORA-12545 errors, ensure that client HOSTS files and/or DNS are furnished with both VIP and Public hostnames.
  • Starting with Oracle 10g the TNSListener is secure out of the box. The 10g listener uses local OS authentication. If you want to allow a secondary user to administer the listener you have to set a listener password as described in Document 260986.1.
  • Please note that IPv6 addressing is not yet supported with RAC. For more information, reference: http://stcontent.oracle.com/content/dav/oracle/Users/Users-K/kant.patel/IPv6/OracleDatabase_IPv6_SOD.pdf
  • Network Interface Card (NIC) names must not contain a dot (".").
  • For version 11.2.0.2 multicast traffic must be allowed on the private network for the 230.0.1.0 subnet. Reference: Document 1212703.1.
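
As noted in the IPC-address item above, the following is a sketch of a 10g RAC listener.ora entry with the IPC address listed first; the listener name, key, host names, and port are placeholders for illustration only.

    LISTENER_NODE1 =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521)(IP = FIRST))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521)(IP = FIRST))
        )
      )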

Storage Considerations

  • Ensure correct mount options for NFS disks when RAC is used with NFS. The documented mount options are detailed in Document 359515.1 for each platform.
  • Implement multiple access paths to the storage array using two or more HBAs or initiators, with multi-pathing software over these HBAs. Where possible, use the pseudo devices (multi-path I/O) as the diskstring for ASM (see the asm_diskstring sketch at the end of this list). Examples are: EMC PowerPath, Veritas DMP, Sun Traffic Manager, Hitachi HDLM, IBM SDDPC, Linux 2.6 Device Mapper. This is useful for I/O load balancing and failover. Reference: Document 294869.1 and Document 394956.1. See also our Multipathing Best Practices paper: http://www.oracle.com/technetwork/database/asm.pdf
  • Adhere to ASM best practices. Reference: Document 265633.1 ASM Technical Best Practices
  • ORA-15196 (ASM block corruption) can occur if LUNs larger than 2 TB are presented to an ASM diskgroup. As a result of the fix, ORA-15099 will be raised if a disk larger than 2 TB is specified, irrespective of the presence of ASMLib. Workaround: do not add disks larger than 2 TB to a diskgroup. Reference: Document 6453944.8
  • On some platforms, repeated warnings about AIO limits may be seen in the alert log:
    "WARNING:Oracle process running out of OS kernel I/O resources." Apply Patch 6687381, available on many platforms. This issue affects 10.2.0.3, 10.2.0.4, and 11.1.0.6 and is fixed in 11.1.0.7. Document 6687381.8
  • Create two ASM disk groups, one for the database area and one for the flash recovery area, on separate physical disks. RAID storage array LUNs can be used as ASM disks to minimize the number of LUNs presented to the OS. Place database and redo log files in the database area.
  • The occurrence of Bug 5100163 (possible metadata corruption) has been identified during an ASM upgrade from release 10.2 to release 11.1 or 11.2. This bug can only occur with ASM diskgroups that have an AU size > 1 MB (before the ASM upgrade is performed); it is not encountered with brand new diskgroups created directly on release 11.1 or 11.2.

    In order to prevent any occurrence of Bug 5100163, a public alert has been generated and it is visible through My Oracle Support. Reference: Document 1145365.1 Alert: Querying v$asm_file Gives ORA-15196 After ASM Was Upgraded From 10gR2 To 11gR2. In short, you would want to run an "alter diskgroup check all repair" to validate and repair any upgraded diskgroups.
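
The following is a minimal sketch of the multipathing recommendation above, pointing the ASM disk discovery string at pseudo devices. The device path pattern is an example only and depends on the multipathing software in use (e.g. /dev/mapper for Linux device-mapper, /dev/emcpower* for EMC PowerPath), and the ALTER SYSTEM form assumes the ASM instance uses an spfile.

    -- connected to the ASM instance as SYSDBA/SYSASM
    ALTER SYSTEM SET asm_diskstring = '/dev/mapper/asm*' SCOPE=BOTH;

    -- verify which devices ASM now discovers
    SELECT path, header_status FROM v$asm_disk;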

Installation Considerations

  • Check cluster prerequisites using cluvfy (Cluster Verification Utility). Use cluvfy at all stages prior to and during installation of the Oracle software. Also, rather than using the version on the installation media, it is crucial to download the latest version of cluvfy from OTN: http://www.oracle.com/technetwork/database/clustering/downloads/cvu-download-homepage-099973.html. Document 339939.1 and Document 316817.1 contain more relevant information on this topic (a sample invocation follows this list).
  • It is recommended to patch the Clusterware Home to the desired level before doing any RDBMS or ASM home install.
    For example, install clusterware 10.2.0.1 and patch to 10.2.0.4 before installing 10.2.0.1 RDBMS.
  • Install ASM in a separate ORACLE_HOME from the database for maintenance and availability reasons (eg., to independently patch and upgrade).
  • If you are installing Oracle Clusterware as a user that is a member of multiple operating system groups, the installer installs files on all nodes of the cluster with group ownership set to the user's current active (primary) group. Therefore, either ensure that the first group listed in the /etc/group file is the current active group, or invoke the Oracle Clusterware installation with the following additional command line option to force the installer to use the proper group when setting group ownership on all files: runInstaller s_usergroup=current_active_group (Bug 4433140)
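
As mentioned in the cluvfy item above, the following is a sample invocation for pre-installation checks; the node names are placeholders and the available stage/component options vary by release.

    # before installing Oracle Clusterware (from the CVU download or staged software)
    ./runcluvfy.sh stage -pre crsinst -n node1,node2 -verbose

    # after hardware/OS setup, verify connectivity and shared storage
    cluvfy stage -post hwos -n node1,node2 -verbose
    cluvfy comp ssa -n node1,node2 -verbose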

Patching Considerations

This section is targeted at customers beginning a new implementation of Oracle Real Application Clusters, or customers developing a proactive patching strategy for an existing implementation. For new implementations, it is strongly recommended that the latest available patchset for your platform be applied at the outset of your testing. Where the latest version of the RDBMS cannot be used, because of lags in internal or 3rd party application certification or other limitations, it is still supported to run the CRS and ASM homes at a later patch level than the RDBMS home; it may therefore still be possible to run either the CRS or ASM home at the latest patchset level. As a best practice (with some exceptions, see the Note in the references section below), Oracle Support recommends that the following be true:
  • The CRS_HOME must be at a patch level or version that is greater than or equal to the patch level or version of the ASM home, and greater than or equal to the patch level or version of the RDBMS home.
  • The ASM_HOME must be at a patch level or version that is greater than or equal to the patch level or version of the RDBMS home, and equal to but not greater than the patch level or version of the CRS_HOME.
  • Before patching the database, ASM, or clusterware homes with OPatch, check the available space on the filesystem and use Document 550522.1 to estimate how much space will be needed and to handle the situation if the filesystem fills up during the patching process (see the OPatch sketch at the end of this section).
  • Document 557934.1 provides a basic overview of patching Oracle Clusterware and clarifies how the Oracle Clusterware components are updated through patching
  • Develop a proactive patching strategy, to stay ahead of the latest known issues. Keep current with the latest Patch Set Updates (as documented in Document 850471.1) and be aware of the most current recommended patches (as documented in Document 756671.1). Plan for periodic (for example: quarterly) maintenance windows to keep current with the latest recommended patch (set) updates and patches.

    For more detailed notes and references on patching in a RAC environment, see the patching section below, in the "RAC Platform Generic References" section at the end of this note.
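
The following is a brief sketch of the OPatch checks referenced in this section; the paths are placeholders and the space estimation itself should follow Document 550522.1.

    # confirm the OPatch version in each home before patching
    $ORACLE_HOME/OPatch/opatch version

    # review the inventory and the patches already applied
    $ORACLE_HOME/OPatch/opatch lsinventory -detail

    # check free space on the filesystem holding the home to be patched
    df -h /u01/app/oracle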

Upgrade Considerations

Note: A new prerequisite check has been added to ensure that Oracle Clusterware release 10.2.0.x is at release 10.2.0.3 (or higher), before you attempt to upgrade it to Oracle Clusterware 11g release 1 (11.1). If this check fails, then you are instructed to apply Oracle Clusterware patch set release 10.2.0.3.0 or later to your existing release 10.2.0.1 or 10.2.0.2 before it can be upgraded. All other upgrade paths and fresh install cycles are unaffected by this prerequisite check.

  • Use rolling upgrades where appropriate for Oracle Clusterware (CRS) Document 338706.1. For detailed upgrade assistance, refer to the appropriate Upgrade Companion for your release: Document 466181.1 10g Upgrade Companion and Document 601807.1 Oracle 11gR1 Upgrade Companion
  • For information about upgrading a database using a transient logical standby, refer to: Document 949322.1 : Oracle11g Data Guard: Database Rolling Upgrade Shell Script
  • If upgrading ASM to 11.x, run an "alter diskgroup check all repair" to validate and repair any upgraded diskgroups (see the example below).
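
The following is a worked example of the post-upgrade check mentioned in the last item above, assuming a diskgroup named DATA (placeholder); run it from the ASM instance against each upgraded diskgroup.

    -- connected to the ASM instance
    ALTER DISKGROUP data CHECK ALL REPAIR;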

Oracle VM Considerations

Database Initialization Parameter Considerations

  • Set PRE_PAGE_SGA=false. If set to true, it can significantly increase the time required to establish database connections. If clients complain that connections to the database are very slow, consider setting this parameter to false: doing so avoids mapping the whole SGA during process startup and thus saves connection time.
  • Be sure to monitor the number of active parallel servers and calculate the average value to be applied for PARALLEL_MIN_SERVERS. This can be done with:
    Select * from v$pq_sysstat;
    Then get/save the value for the row "Servers Highwater".
  • Tune PARALLEL_MAX_SERVERS to your hardware. Start with 2 * (2 threads) * CPU_COUNT = 4 x CPU count and repeat the test for higher values with test data.
  • Consider setting FAST_START_PARALLEL_ROLLBACK. This parameter determines how many processes are used for transaction recovery, which is done after redo application. Optimizing transaction recovery is important to ensure an efficient workload after an unplanned failure. As long as the system is not CPU bound, setting this to a value of HIGH is a best practice. This causes Oracle to use four times the CPU count (4 X cpu_count) parallel processes for transaction recovery. The default for this parameter is LOW, or two times the CPU count (2 X cpu_count).
  • Set FAST_START_MTTR_TARGET to a non-zero value in seconds. Crash recovery will complete within this desired time frame.
  • In 10g and 11g databases, init parameter ACTIVE_INSTANCE_COUNT should no longer be set. This is because the RACG layer doesn't take this parameter into account. As an alternative, you should create a service with one preferred instance.
  • For versions prior to 11gR2, increase PARALLEL_EXECUTION_MESSAGE_SIZE from the default (normally 2048) to 8192 (see the example following this list). This can be set higher for data warehousing systems where a lot of data is transferred through PQ. In version 11gR2, the default for PARALLEL_EXECUTION_MESSAGE_SIZE is 16K, which should prove sufficient in most cases.
  • Set OPTIMIZER_DYNAMIC_SAMPLING = 1 or simply analyze your objects, because 10g dynamic sampling can generate extra CR buffers during execution of SQL statements.
  • Tune Data Guard to avoid cluster-related waits. Improperly tuned Data Guard settings can cause high LOG FILE SYNC WAIT and GLOBAL CACHE LOG FLUSH TIME. Reference: http://www.oracle.com/technetwork/database/features/availability/maa-wp-10gr2-dataguardnetworkbestpr-134557.pdf, http://www.oracle.com/technetwork/database/features/availability/maa-wp-10gr2-recoverybestpractices-131010.pdf
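
For illustration, the following sketch applies several of the parameter recommendations in this list through the spfile; the values shown are the examples given above and must be validated in your own testing. PRE_PAGE_SGA and PARALLEL_EXECUTION_MESSAGE_SIZE are static parameters and take effect only after an instance restart.

    -- dynamic parameters, applied to all instances
    ALTER SYSTEM SET fast_start_parallel_rollback = HIGH SCOPE=BOTH SID='*';
    ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE=BOTH SID='*';

    -- static parameters, picked up at the next instance restart
    ALTER SYSTEM SET pre_page_sga = FALSE SCOPE=SPFILE SID='*';
    ALTER SYSTEM SET parallel_execution_message_size = 8192 SCOPE=SPFILE SID='*';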

Performance Tuning Considerations

In any database system, RAC or single instance, the most significant performance gains are usually obtained from traditional application tuning techniques. The benefits of those techniques are even more remarkable in a RAC database.
  • Many sites run with too few redo logs or with logs that are sized too small. With too few redo logs configured, there is the potential that the archiver process(es) cannot keep up which could cause the database to stall. Small redo logs cause frequent log switches, which can put a high load on the buffer cache and I/O system. As a general practice each thread should have at least three redo log groups with two members in each group.
    Oracle Database 10g introduced the Redo Logfile Size Advisor which determines the optimal, smallest online redo log file size based on the current FAST_START_MTTR_TARGET setting and corresponding statistics. Thus, the Redo Logfile Size Advisor is enabled only if FAST_START_MTTR_TARGET is set.
    A new column, OPTIMAL_LOGFILE_SIZE, was added to V$INSTANCE_RECOVERY. This column shows the redo log file size (in megabytes) that is considered optimal based on the current FAST_START_MTTR_TARGET setting. It is recommended that you set all online redo log files to at least this value (see the example at the end of this list).
  • Avoid and eliminate long full table scans in OLTP environments.
  • Use Automatic Segment Space Management (ASSM); it is the default for new tablespaces in 10gR2 and higher. All tablespaces except SYSTEM, TEMP, and UNDO should use ASSM.
  • Increasing sequence caches in insert-intensive applications improves instance affinity to index keys that derive their values from sequences. Increase the cache for application sequences, and some system sequences, for better performance; use a large cache value, perhaps 10,000 or more. Additionally, use of the NOORDER attribute is most effective, but it does not guarantee that sequence numbers are generated in order of request (NOORDER is the default). See the sequence example at the end of this list.
  • The default cache setting for the SYS.AUDSES$ sequence is 20, which is too low for a RAC system where logins can occur concurrently from multiple nodes. Refer to Document 395314.1.
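
Two small worked examples for the redo log sizing and sequence cache items above; the sequence owner and name are placeholders.

    -- optimal redo log size (in MB) suggested by the Redo Logfile Size Advisor
    -- (only populated when FAST_START_MTTR_TARGET is set)
    SELECT optimal_logfile_size FROM v$instance_recovery;

    -- increase the cache on a hot application sequence
    ALTER SEQUENCE app_owner.order_seq CACHE 10000 NOORDER;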

General Configuration Considerations

  • In 10gR2 and above the LMS process is intended to run in the real time scheduling class. In some instances we have seen this prevented due to incorrect ownership or permissions for the oradism executable which is stored in the $ORACLE_HOME/bin directory. See Document 602419.1 for more details on this.
  • Avoid setting the ORA_CRS_HOME environment variable. Setting this variable can cause problems for various Oracle components, and it is never necessary for CRS programs because they all have wrapper scripts.
  • Use Enterprise Manager or Grid Control to create database services - all features are available in one tool. For 10.1 and 10.2, DBCA can also be used to create these services and define the preferred and available instances for them as part of database creation; in 11.1.0.6 this is only available in Enterprise Manager and has been removed from DBCA.
  • Configure Oracle Net Services load balancing properly to distribute connections. Load balancing should be used in combination with 10g Workload Services to provide the highest availability. The CLB_GOAL attribute of 10g workload services should be configured appropriately depending upon application requirements. Different workloads might require different load balancing goals. Use separate services for each workload with different CLB_GOAL.
  • For versions prior to 11gR2 (where NUMA is disabled by default), ensure the NUMA (Non Uniform Memory Architecture) feature is turned OFF unless explicitly required and tested, as there have been issues reported with NUMA enabled. Refer to Document 759565.1 for more details.
  • Read and follow the Best Practices Guide for XA and RAC to avoid problems with XA transactions being split across RAC Instances. Reference: http://www.oracle.com/technetwork/database/enterprise-edition/bestpracticesforxaandrac-128676.pdf and http://www.oracle.com/technetwork/database/clustering/overview/distributed-transactions-and-xa-163941.pdf
  • Increase the retention period for AWR data from 7 days to at least one business cycle. Use the awrinfo.sql script to budget for the amount of information that needs to be stored in the AWR and size it accordingly.
  • A known issue can cause ONS to spin, consuming high CPU and/or memory; this is fixed in 10.2.0.4 and 11.1.0.6. Refer to Document 4417761.8 and Document 731370.1 for more details and a workaround.
  • Use SRVCTL to register resources as the Oracle user, not as the root user (see the sketch at the end of this list). Registering resources (database, instances, ASM, listener, and services) as root can lead to inconsistent behavior. During the clusterware install, nodeapps is created by the root user, and only the VIP resource should be owned by root. Any other resources owned by root will need to be removed (as root) and then re-created as the oracle user. Check the OCRDUMP output for resource keys owned by root.
  • For versions 10gR2 and 11gR1, it is a best practice on all platforms to set the CSS diagwait parameter to 13 in order to provide time for dumping diagnostics in case of node evictions. Setting the diagwait above 13 is NOT recommended without explicit instruction from Support. This setting is no longer required in Oracle Clusterware 11g Release 2. Reference Document 559365.1 for more details on diagwait.
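
The following is a sketch of registering database resources with SRVCTL as the oracle software owner (10g/11gR1 syntax); the database name, Oracle Home path, instance, node, and service names are placeholders.

    # run as the oracle user, not as root
    srvctl add database -d ORCL -o /u01/app/oracle/product/10.2.0/db_1
    srvctl add instance -d ORCL -i ORCL1 -n node1
    srvctl add instance -d ORCL -i ORCL2 -n node2
    srvctl add service -d ORCL -s OLTP -r ORCL1,ORCL2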

E-Business Suite (with RAC) Considerations

  • Patch against known issues Bug 6142040 : ICM DOES NOT UPDATE TARGET NODE AFTER FAILOVER and Bug 6161806 : APPSRAP: PCP NODE FAILURE IS NOT WORKING
  • Change RAC APPS default setting to avoid slow Purchase Order approval. Document 339508.1
  • For 10gR2, it is recommended to set the init.ora parameter MAX_COMMIT_PROPAGATION_DELAY = 0 in the init.ora or spfile for E-Business Suite on RAC. Reference: Document 259454.1. For 11gR1, zero is the default value for this parameter, and the parameter has been obsoleted in 11gR2.
  • You can use Advanced Planning and Scheduling (APS) in a separate RAC (clustered) database. Merging APS into the OLTP database and isolating its load to a separate RAC instance is supported. Refer to Knowledge Documents Document 279156.1 and Document 286729.1 for more details.
  • You can run Email Center in a RAC environment. Reference Knowledge Document 272266.1 for RAC-specific instructions.
  • You can run Oracle Financial Services Applications (OFSA) in a RAC environment. Refer to Knowledge Document 280294.1 for RAC related best practices.
  • Activity Based Management (ABM) is supported in a RAC environment. Reference Knowledge Document 303542.1 for RAC related best practices.
  • When using Oracle Application Tablespace Migration Utility (OATM) in a RAC environment, be sure to follow the instructions for RAC environments in Document 404954.1.

Peoplesoft (with RAC) Considerations

  • Each instance and service must have its own row in the PSDBOWNER table: the table must have as many rows as the number of database instances in the cluster plus the number of services in the database.
  • If the batch servers are on database nodes, set USELOCALORACLEDB=1. By default the process scheduler connects to the database using SQL*Net over TCP/IP even when it is running locally. Setting UseLocalOracleDB=1 in the process scheduler domain configuration file (prcs.conf) makes it use a bequeath connection rather than TCP/IP, which improves performance. If UseLocalOracleDB=1 is set, ORACLE_SID must also be set in the PeopleSoft user's profile, otherwise the process scheduler will not boot (see the sketch at the end of this list).
  • For the REN (Remote Event Notification) server to work properly, the DB_NAME parameter must match between the Application Server domain and the Process Scheduler domain configuration used to run the report. With RAC, always use the service name as the database name for the application and batch servers, so that DB_NAME matches for the REN server and the load is balanced across all instances.
  • See Document 747587.1 regarding PeopleSoft Enterprise PeopleTools Certifications
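
The following is a rough sketch of the process scheduler settings discussed above; the exact configuration file and profile locations vary by PeopleTools release and are assumptions here.

    # in the Process Scheduler domain configuration file (prcs.conf):
    UseLocalOracleDB=1

    # in the PeopleSoft owner's shell profile (file name assumed), so the scheduler can boot:
    export ORACLE_SID=ORCL1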

Tools/Utilities for Diagnosing and Working with Oracle Support

  • Install and run OSWatcher (OSW) proactively for OS resource utilization diagnosability. OSW is a collection of UNIX shell scripts that collect and archive operating system and network metrics to aid in diagnosing performance issues; it is designed to run continuously and to write the metrics to ASCII files saved in an archive directory. The amount of archived data saved and the frequency of collection are based on user parameters set when starting OSW. It is highly recommended that OSW be installed and run continuously on ALL cluster nodes, at all times (see the example at the end of this list). Document 301137.1. When using OSWatcher in a RAC environment, each node must write its output files to a separate archive directory; combining the output files under one archive (on shared storage) is not supported and causes the OSWg graphing tool to crash. Shared storage is fine, but each node needs its own archive directory.
  • Use the ASM command line utility (ASMCMD) to manage Automatic Storage Management (ASM). Oracle database 10gR2 provides two new options to access and manage ASM files and related information via command line interface - asmcmd and ASM ftp. Document 332180.1 discusses asmcmd and provides sample Linux shell script to demonstrate the asmcmd in action.
  • Use the cluster deinstall tool to remove CRS install - if needed. The clusterdeconfig tool removes and deconfigures all of the software and shared files that are associated with an Oracle Clusterware or Oracle RAC Database installation. The clusterdeconfig tool removes the software and shared files from all of the nodes in a cluster. Reference: http://www.oracle.com/technology/products/database/clustering/index.html
  • Use diagcollection.pl for CRS diagnostic collections. Located in $ORA_CRS_HOME/bin as part of a default installation. Document 330358.1
  • On Windows and Linux Platforms, the Cluster Health Monitor can be used to track OS resource consumption and collect and analyze data cluster-wide. For more information, and to download the tool, refer to the following link on OTN: http://www.oracle.com/technetwork/database/clustering/downloads/ipd-download-homepage-087212.html
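
The following is a minimal sketch of starting OSWatcher on each node, per the first item above; the installation directory, snapshot interval (seconds), and archive retention (hours) are examples, and the exact script names depend on the OSWatcher version (see Document 301137.1).

    # run on every node, each writing to its own (non-shared) archive directory
    cd /u01/app/oswatcher/osw
    nohup ./startOSW.sh 60 48 &

    # stop collection when it is no longer needed
    ./stopOSW.sh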

11gR2 Specific Considerations

  • Review the note: Complete Checklist for Manual Upgrades to 11gR2, Document 837570.1
  • Review: How to Download and Run Oracle's Database Pre-Upgrade Utility, Document 884522.1
  • While upgrading Oracle Clusterware 10g or 11gR1 to Oracle Grid Infrastructure 11gR2, be aware of Bug 8884781, whereby running the script 'rootupgrade.sh' on the last node reports that the nodeapps have failed to start. There is, as yet, no fix for this bug. The most straightforward workaround is to remove and re-add nodeapps (including the public interface) after rootupgrade.sh is done on all nodes but before the config assistants are kicked off.
  • ADVM/ACFS is only available for Linux (and soon Windows) in version 11.2.0.1. ADVM and ACFS will be ported to Solaris and other Unix platforms in future releases. Reference: Document 973387.1
  • For specific information regarding use of ASM in 11gR2, see the following white paper: http://www.oracle.com/technetwork/database/features/storage/extending.pdf
  • For information on integrating 10gR2 or 11gR1 databases with 11.2 Grid Infrastructure, reference: Document 1058646.1 How to integrate a 10g/11gR1 RAC database with 11gR2 clusterware (SCAN)

11.2.0.2 Specific Considerations

  • Be sure to review Document 1189783.1 Important Changes to Oracle Database Patch Sets Starting With 11.2.0.2
  • For version 11.2.0.2 multicast traffic must be allowed on the private network for the 230.0.1.0 subnet. Reference: Document 1212703.1
  • Starting from 11.2.0.2, a redundant interconnect without any 3rd-party IP failover technology (bonding, IPMP or similar) is supported natively by Grid Infrastructure. Multiple private network adapters can be defined either during the installation phase or afterward using oifcfg (see the sketch below). Reference Document 1210883.1
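
The following is a sketch of defining an additional private interconnect interface with oifcfg, per Document 1210883.1; the interface name and subnet are placeholders.

    # list the currently registered interfaces
    oifcfg getif

    # register a second private interconnect interface (example values)
    oifcfg setif -global eth2/192.168.2.0:cluster_interconnect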

CRS / RAC Related References