Checked for relevance 8-18-2010.

PURPOSE
-------
To increase awareness of the performance impact that deadlock detection can have
on an instance.

SCOPE & APPLICATION
-------------------

This article does not cover any material about how to solve deadlocks, just how
Oracle detects them and how this detection impacts the overall performance.

It is intended for Application Developers, DBAs, Consultants and Support Engineers.

The Performance Impact of Deadlock Detection
--------------------------------------------

A deadlock can occur when two or more users are waiting for data locked by each other.
Oracle automatically detects deadlock situations and resolves them by rolling back one
of the statements involved in the deadlock, thereby releasing one set of the conflicting
row locks.

There are other kinds of deadlocks that could involve deadlocks between latches,
between distributed operations, etc. Those are not discussed in this document and are
usually outside the control of the database or are bugs.

But how does Oracle detect a deadlock, and what impact does detection have on performance?

The Basics
----------
Sessions waiting to acquire locks are put in a queue. The queue is chosen by a hash
function that uses the type of lock and some meaningful values that depend on the
operation the session wants to perform. These queues are called "Hash chains" because
each one is a chain of requests that fall within the same hash value.

An example is the TM lock; it uses the table object id as part of its hash key.
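
As a rough illustration of the idea, the sketch below hashes a lock identity (lock type
plus two id values) to one of a fixed number of chains. The hash formula, the number of
chains and all names are assumptions made for this example; they are not Oracle's
internal algorithm.

```python
NUM_CHAINS = 8  # hypothetical chain count, chosen only for illustration

def chain_index(lock_type, id1, id2=0, num_chains=NUM_CHAINS):
    """Map a lock identity to a chain number (illustrative hash, not Oracle's)."""
    type_code = sum(ord(ch) for ch in lock_type)
    return (type_code * 31 + id1 * 17 + id2) % num_chains

# One list of pending requests per chain.
chains = [[] for _ in range(NUM_CHAINS)]

def add_request(session, lock_type, id1, id2=0):
    """Append a request to the chain its lock identity hashes to."""
    idx = chain_index(lock_type, id1, id2)
    chains[idx].append((session, lock_type, id1, id2))
    return idx

# Two sessions asking for a TM lock on the same table (object id 51234)
# always land in the same chain.
first = add_request("session_1", "TM", 51234)
second = add_request("session_2", "TM", 51234)
print(first == second)  # True
```

Requests for different objects may or may not share a chain; that is why each chain has
to be searched for the matching resource.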

If the lock mode that I want to acquire is "compatible" with the current lock on the
object, then I'm put in the list of sessions that hold the lock.

An example is again the TM lock. Most DDL will request the lock in exclusive (X) mode,
blocking everyone requesting any lock. But if I insert a row, I only request a
row-exclusive (RX) lock, allowing any other session that requests RX mode to continue
but blocking X mode requests.
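
The compatibility test can be pictured as a simple lookup. The sketch below uses the
table lock modes described in the concepts manual (RS, RX, S, SRX, X); the function and
variable names are hypothetical and the code is only a model of the rule, not of the
server's implementation.

```python
# For each held mode, the set of modes another session may be granted.
COMPATIBLE = {
    "RS":  {"RS", "RX", "S", "SRX"},
    "RX":  {"RS", "RX"},
    "S":   {"RS", "S"},
    "SRX": {"RS"},
    "X":   set(),
}

def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode currently held."""
    return all(requested in COMPATIBLE[held] for held in held_modes)

print(can_grant("RX", ["RX", "RS"]))  # True: concurrent DML is allowed
print(can_grant("X", ["RX"]))         # False: exclusive requests must wait
```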

A more detailed explanation of how locks work can be found in the concepts manual.

The queues are memory structures in the SGA, and latches protect them.
They are called "Enqueue Hash Chain" latches. There is one parent latch and many child
latches.

To be put in a queue, the session grabs an "enqueue resource", which is the structure
that holds the details of the lock, and attempts to acquire the latch that protects the
Hash chain associated with the lock type and the rest of the special values. Once the
latch is acquired, the session places the "enqueue resource" in the chain either as
holder or as waiter, depending on the circumstances, and releases the latch.
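
The sequence above (get the latch, link the request as holder or waiter, release the
latch) can be sketched as follows. This is a deliberately simplified model: it knows only
two modes, S and X, it uses a Python threading.Lock to stand in for a child latch, and
every class and method name is hypothetical.

```python
import threading

class EnqueueResource:
    """Details of one lock: who holds it and who is queued behind them."""
    def __init__(self, name):
        self.name = name
        self.holders = []   # list of (session, mode)
        self.waiters = []   # list of (session, mode), in arrival order

class HashChain:
    def __init__(self):
        self.latch = threading.Lock()   # stands in for a child latch
        self.resources = {}

    def request(self, session, name, mode):
        """Link a request into the chain; return 'holder' or 'waiter'."""
        with self.latch:                 # the latch is held only while linking
            res = self.resources.setdefault(name, EnqueueResource(name))
            shared_only = mode == "S" and all(m == "S" for _, m in res.holders)
            if not res.holders or (shared_only and not res.waiters):
                res.holders.append((session, mode))
                return "holder"
            res.waiters.append((session, mode))
            return "waiter"

chain = HashChain()
print(chain.request("session_1", "TM-51234", "S"))  # holder
print(chain.request("session_2", "TM-51234", "S"))  # holder
print(chain.request("session_3", "TM-51234", "X"))  # waiter
```

Note that a later shared request also becomes a waiter once someone is already queued,
so that the exclusive request is not starved; that policy is an assumption of the sketch.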

A session waiting for a lock will have a row in v$session_wait view with the event 'enqueue'
and the rest of the columns will have the details of the kind of lock and lock mode being
requested.
See note:34566.1 WAITEVENT: "enqueue" Reference Note for more information on how to
decode the columns.

Initiating Deadlock Detection
-----------------------------

A session launches deadlock detection (we'll call that session the "Requesting Session")
when it already holds a lock and is forced to wait when requesting another.

Deadlock detection starts by acquiring the Parent "Enqueue Hash Chain" latch.
By doing that, the session automatically requests and holds all child "Enqueue Hash Chain"
latches, and it does not release them until the deadlock detection finishes.

One important fact to remember is that the database has many more types of locks besides
table locks, and the deadlock detection includes them all. That is why it needs to acquire
the parent latch and not only the child latch that belongs to the specific mode.
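
To make the consequence concrete, here is a small model in which taking the "parent"
means taking every child in turn. Python locks stand in for latches and the class name
is hypothetical; the point is only that, while the parent is held, no other session can
get any child latch.

```python
import threading
from contextlib import contextmanager

class LatchSet:
    """A parent latch modelled as 'hold every child latch at once'."""
    def __init__(self, n_children):
        self.children = [threading.Lock() for _ in range(n_children)]

    @contextmanager
    def parent(self):
        for latch in self.children:            # acquire in a fixed order
            latch.acquire()
        try:
            yield
        finally:
            for latch in reversed(self.children):
                latch.release()

latches = LatchSet(4)

with latches.parent():
    # A non-blocking attempt on any child fails while the parent is held.
    blocked_during = not latches.children[2].acquire(blocking=False)

free_after = latches.children[2].acquire(blocking=False)
if free_after:
    latches.children[2].release()

print(blocked_during, free_after)  # True True
```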

For example, the requesting session could be holding a TM lock on a table and requesting
the ST lock because it needs to allocate another extent to the table. It can be blocked
by SMON because it is doing space consolidation. At that moment, the requesting session
does not know that SMON is holding the ST. It only knows someone is holding it. So the
requesting session needs to verify that someone is not waiting for the requesting
session's TM lock or that someone is not waiting for another someone that is waiting for
the requesting session's TM lock.

Another important fact is that deadlock detection is only initiated when the lock being
requested is an application type of lock such as TM, TX or UL. The rest of the lock types
are usually for internal or very specific operations in which it is not possible to
encounter a deadlock.

Some Performance Impact
=======================
Once the Parent and child latches are acquired, no one can create or verify any lock until
the latches are released. If the deadlock detection takes too long, it can effectively
feel like an instance hang.

To try to speed things up, the session first checks whether the immediate owner of the lock
being requested (we'll call this session the "Holding Session") is also waiting on 'enqueue'.

If not, then it is likely that there is no immediate deadlock, so the deadlock
detection finishes and the latches are released. But the session will try again
later, since it is still possible that the holding session will wait on one of the
requesting session's locks later. Furthermore, the requesting session may not be next in
line to acquire the lock, and the session that is next in line can be waiting on one of
the requesting session's locks.

Climbing up the Tree
--------------------
If the Holding session is waiting on 'enqueue' then we start with it.

It can be that the session was killed or is orphaned in which case we need to wait for
the cleanup. We wait up to 15 seconds for it to happen. Otherwise, an internal error
ORA-600 [1151] is given.

The requesting session checks which sessions are blocking the holding session.
If it finds that the blocker is the requesting session or the holding session itself
(because of an autonomous or recursive transaction), then a deadlock has been found.

For each of the blocking sessions, it is necessary to do deadlock detection because
they may be waiting on a lock from another session.

The more locks each session has, the more complex the scenarios get and the more recursive
calls are needed.
Also, the more sessions begin to wait on 'enqueue', the more deadlock detections need
to be done.
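
The climb can be pictured as a recursive walk over "who is blocking whom". The sketch
below is a model under stated assumptions: blocked_by is a hypothetical dictionary that
maps each waiting session to the set of sessions holding the lock it wants, sessions that
are not waiting have no entry, and the function is not the server's actual algorithm.

```python
def find_deadlock(requester, holder, blocked_by):
    """Return the chain of sessions forming a deadlock, or None."""
    if holder not in blocked_by:
        return None   # quick exit: the holding session is not waiting

    def climb(session, path, visited):
        for blocker in sorted(blocked_by.get(session, ())):
            if blocker == requester or blocker == holder:
                return path + [blocker]          # the chain closed on itself
            if blocker in visited:
                continue                         # already examined
            visited.add(blocker)
            found = climb(blocker, path + [blocker], visited)
            if found:
                return found
        return None

    return climb(holder, [requester, holder], {requester, holder})

# A waits for B, B waits for C, C waits for A: a three-session deadlock.
print(find_deadlock("A", "B", {"B": {"C"}, "C": {"A"}}))
# ['A', 'B', 'C', 'A']

# B waits for C, but C is not waiting for anyone: no deadlock.
print(find_deadlock("A", "B", {"B": {"C"}}))
# None
```

Every extra blocker adds another branch to explore, which is why the cost grows with
the number of locks held and the number of sessions waiting.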

More Performance Impact
=======================
Complex applications can easily create multiple locks that raise the suspicion of
deadlocks, forcing the sessions to consume resources. A session can also block other
sessions by acquiring the latches and holding them until its check has finished.

Here is an example of how many times the latches can be requested in a normal database.
It comes from a statspack report covering a 2-hour period in which not a single deadlock
was reported.

Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
enqueue conversions                          611,346           84.9          3.8
enqueue requests                           1,894,232          263.1         11.9

Latch Name                       Requests      Misses      Sleeps  Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
enqueue hash chains             4,402,265       2,053          39 2014/39/0/0/
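
The figures can be cross-checked with a little arithmetic. The sketch below assumes the
snapshot interval was exactly 2 hours (7,200 seconds), which is consistent with the
per-second columns; the variable names are made up for the example.

```python
PERIOD_SECONDS = 2 * 60 * 60   # assumed length of the statspack interval

enqueue_requests = 1_894_232
enqueue_conversions = 611_346
latch_requests = 4_402_265
latch_misses = 2_053

requests_per_sec = enqueue_requests / PERIOD_SECONDS
conversions_per_sec = enqueue_conversions / PERIOD_SECONDS
latch_gets_per_sec = latch_requests / PERIOD_SECONDS
miss_pct = 100 * latch_misses / latch_requests
gets_per_enqueue_op = latch_requests / (enqueue_requests + enqueue_conversions)

print(round(requests_per_sec, 1))     # 263.1, matches the report
print(round(conversions_per_sec, 1))  # 84.9, matches the report
print(round(latch_gets_per_sec, 1))   # 611.4 latch requests per second
print(round(miss_pct, 3))             # 0.047 percent of requests missed
print(round(gets_per_enqueue_op, 2))  # 1.76 latch requests per enqueue operation
```

So the latch is requested more than 600 times per second even though misses are rare,
which shows how much traffic would stall if a single session held the latches for long.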

Finally, when a deadlock is found, a trace file is generated
with the deadlock graph. By default, it dumps the process state of the session that
found the deadlock while still holding the latches!
So the more complex the application and the more state objects it has, the longer
a process state dump will take, and the longer it takes for the latches to be released.

For instances where deadlocks are common and the cause is known and unavoidable, there
is little point in generating the traces, since they become just extra overhead. To
alleviate this situation, patch 2235386 introduced event 10027, which when set to level 1
suppresses the process state dump. This makes the trace smaller and the
release of the latches quicker. Setting the level to 1 does NOT eliminate the overhead
of the deadlock detection algorithm. It only helps to release the latches a bit more
quickly once the algorithm has finished scanning.

On the other hand, if more information is needed by support, then the event can be set
at level 2 to generate a System State dump. This gives us more information, but the
customer must be aware that it will take even longer to release the latches.

event="10027 trace name context forever,level 1"

The patch is included starting in patchset 9.2.0.3 and in 10g.

The Performance Impact of Deadlock Detection [ID 285270.1]