Controlling Transactions and Locks with Hints and Snapshots in SQL 2000 and 2005.

Locks

Locks are the mechanism databases use to control problems that may arise from simultaneous multi-user access. Some common issues locks try to manage include; what happens when two users try to change the same piece of information at the same time - who wins? Or user one tries to read data while user two is making changes on the same information, - which version of the data should user one see, changed or unchanged? Without locks, these situations may cause the data or results of a query to be logically incorrect. Lock sophistication is a key difference between simple databases designed for single user applications and enterprise databases capable of simultaneously handling large volumes of multi-user actions. Locks are automatically created and managed by both Microsoft SQL 2000 and SQL 2005, but there are cases when a developer or DBA will obtain better performance by dictating the locking behavior SQL should use. This article will introduce Transactions and Locks, as well as methods for controlling them such as using Locking Hints and Snapshots.

Before a review of locks can take place, transactions (the smallest unit of work in TSQL), must be introduced.

[@more@]

Transactions

Think of a transaction as a logical container holding TSQL that can stand alone, not needing any other statements to be complete. Every transaction causes at least one lock to be issued. If you issue a simple one-statement update such as:

UPDATE authors
SET au_fname = 'Don'

Then your transaction consists of only this one statement. However, if the above statement first required that some new records be inserted into the table, then the transaction would consist of two statements:

INSERT INTO authors
 (au_fname)
 SELECT au_fname
 FROM some_other_table
UPDATE pubs
SET first_name = 'Don'

To tell SQL that these two statements must happen together, in the order presented, they are both wrapped together inside a single transaction, noted by the key phrase BEGIN TRANSACTION:

BEGIN TRANSACTION
 INSERT INTO authors
 (au_fname)
 SELECT au_fname
 FROM some_other_table
UPDATE pubs
SET first_name = 'Don'
COMMIT

COMMIT

The COMMIT tells SQL that it has reached the end of the transaction, and whatever data modifications the statements have made should become a permanent part of the database. The other option for ending a transaction is ROLLBACK, which means all the changes just made should be undone. At this point, for both COMMIT and ROLLBACK, all resources and locks used by the transaction are released.

As transactions become longer and more involved, locking becomes more problematic. The code below will create a simple example of this. We will simulate a long running transaction by not issuing a COMMIT; this will hold a lock on records a second statement will request. In SQL 2000, run the following TSQL from Query Analyzer:

USE pubs
GO
BEGIN TRANSACTION
  UPDATE authors
  SET au_fname = 'Don'

Now open a second instance of Query Analyzer and run this select:

SELECT au_fname
FROM authors

The select query will not get a response. Its waiting for the lock SQL automatically issued at the BEGIN TRANSACTION to be released. To release it, issue this key word from the first Query Analyzer session:

ROLLBACK

When you do, the update changes will be discarded and the second Query Analyzer session will get a dataset back.

Transactions and locks increase the complexity of database operations, but they guarantee valid data and query results in multi-user applications. This guarantee is expressed with the acronym ACID. ACID stands for Atomicity, Consistency, Isolation, and Durability.

For a transaction to be Atomic, all of the DML statements (Data Manipulation Language, INSERT, UPDATE, and DELETE) must either all commit, or rollback. Meaning a transaction cannot be left in a half done state. An example would be a transfer of funds from a savings account to checking. If the withdrawal from saving happens, the credit to checking must happen, or if the withdrawal fails, then the credit to checking should not happen. In either case, both must either succeed or fail together.

Consistency means a user should never see data changes in mid transaction. Their view should return the data as it appeared prior to beginning the transaction, or if the transaction is finished, then they should see the changed data. Using the above checking account example, a select run at the same time as the transfer transaction should not return the saving account balance debited, but the checking account not yet credited.

Isolation is at the heart of multi-user transactions. It means one transaction should not disrupt another. In the transfer example, if at the same time we are transferring money from savings to checking, our employer is making a payroll direct deposit into the same checking account, neither transaction should cause the other to corrupt data.

Durability implies that the changes made by the transaction are recorded permanently.

Lock Granularity

The simplest way to meet the ACID requirement would be for a transaction to create a lock that seizes the entire database being affected. Not letting anyone else touch or see any other data until the transaction completed. This solution would of course cause a great deal performance problems as other transactions tried to access data. The other extreme would be to lock on a single row. Alternatively, if the transaction needed a million rows, issue a million locks, one on each row. This would leave most of the database open to other transactions, but each lock issued takes up memory on the server. So even though locking on a row-by-row basis would leave most of the database open, locking a million rows would generate quit a bit of overhead and performance again would suffer. To avoid this, locks usually try to seize a group of something right sized for the transaction. A lock can be issued against a Database, Table, Extent, Page, or a Row. (A Page is usually an eight-kilobyte storage area on the hard drive. An Extent is a group of eight pages.) This type of right sizing will give the best performance with the least amount of restriction.

Conclusion

Locks are occurring on the database at all times. Every Select and DML statement issues a lock. Although SQL can control locks automatically, sometimes controlling them manually will lead to increased performance. Next moth, we will introduce a variety of different locks and methods for controlling or requesting them.

Introduction

In last month's article, Controlling Transactions and Locks in SQL 2000 and 2005 - Part 1, transactions (the smallest unit of TSQL work) were introduced. ACID, the acronym that governs transaction integrity, was used as the framework for transaction behavior. In order for a transaction to meet the requirements of ACID, locks are employed to insure data integrity and multi-user access. The scope, or number of rows held by a lock, is referred to as Lock Granularity. This month, we will begin by introducing several different types of lock modes employed by MS SQL.

Lock Modes

SQL utilizes several different types of locks and modes. The way in which a lock shares, or does not share records it is currently working on, are called Lock Modes. This article will focus on the Lock Modes most commonly found in daily SQL work, and the ones we will most likely wish to control during individual transactions.

Exclusive Lock X

This lock mode is very straightforward, a group of records are taken, (row, page, extent, table, or database), and are held exclusively by one transaction. When an Insert, Update, or Delete statement runs, an Exclusive Lock will be issued. No other operation of any kind, (read or DML), can use the records held by an X lock. If the transaction is very long running, such as a nightly update routine, then any reports trying to run while the Exclusive lock is in place will fail. Because of this, SQL will try to release these types of locks as soon as possible.

Shared Lock S

A Shared Lock is applied on records read by a select statement. It is designed to allow concurrent read access from other transactions, but none can modify the held records. This lock enables reads to comply with ACID by disallowing records to be changed at the same instant a read occurs. Only committed data can be read. Shared lock manipulation is one of the areas we will gain performance by controlling.

Deadlocking

Before moving on to the next lock type, a brief review of deadlocking is required. Deadlocking refers to the condition in which one resource is waiting on the action of a second, while that second action is waiting on the first. This is different from being blocked, or having to wait for a resource. Using the locks above, if a transaction had a shared lock, then you issued a delete on those same records held by the first lock, you would not be deadlocked. Instead, you would be blocked. When the shared lock was released, your delete statement would complete. Blocking implies some performance hit, but the transaction will complete. It simply has to wait for something else to finish first. A deadlock on the other hand, means there is no way to finish. Your transaction is stuck in a loop with some other transaction. At this point, the database system will usually pick one transaction to be killed so the other can complete.

Update Lock U

A common type of deadlocking can occur when statements that have Shared locks want to convert them to Exclusive locks. The example usually cited involves one transaction holding a shared lock, while a second transaction also receives a shared lock on the same records, or a subset of records. This causes no problems because more than one transaction can hold a shared lock at the same time on the same data. But now, the first transaction wants to delete some of the rows, so it requests a conversion to an Exclusive lock. The lock cannot be issued because the second transaction still has that shared lock. Therefore, it waits. As the first transaction waits, the second transaction requests an Exclusive lock to also delete or update some records. It cannot be granted because the first transaction still has a shared lock. So now, it also waits. Now, both transactions are waiting on each other to finish so its locks can be converted. Neither will ever finish because they are deadlocked, or stuck in a loop with each other. An Update Lock stops this type of deadlock from occurring. When a transaction signals intent to convert, an Update lock is requested. Only one transaction can obtain an Update lock on the selected resource at a time. When the records are actually modified, the Update lock will be converted to an Exclusive lock. By only allowing one transaction at a time to obtain an Update lock, the deadlocking on conversion requests is eliminated. This type of lock can be understood by looking at an update statement with a WHERE clause. Before the update can complete, the records meeting the WEHRE clause must be selected. Rather than use a shared lock for this initial select, an Update lock will be requested.

Other Locks

SQL employs several other lock types as well, but they will not be used for lock optimization and control in this series. We will briefly introduce them. Schema locks (Sch*), are usually used when a table is being modified, such as column being added. Bulk Update locks (BU), are used for bulk update statements. Intent locks (I*) are used internally by SQL to increase performance.

SP_LOCK

The mode code mentioned after each lock type (such as S for Shared, X for Exclusive), mirror the built in SQL command SP_LOCK. This procedure will return a list of information about locks that have been issued. As an example, we will begin a transaction but not complete it, then run SP_LOCK to view our transaction information. On SQL 2000, run this statement:

USE pubs
GO
BEGIN TRAN
 UPDATE jobs
 SET job_desc = 'something'


EXEC sp_lock

The following lock information is returned.

Now issue a roll back to end and cancel the transaction:

ROLLBACK

Several of the modes mentioned appear, S for Shared, X for Exclusive. For a complete description of the values returned, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sp_la-lz_6cdn.asp

Conclusion

Lock modes provide SQL with a method of controlling data integrity. With an understanding of the modes, next month we will begin to manipulate them for performance gains.

Introduction

In the preceding articles of this series, Lock Granularity, Transactions, and ACID were introduced. Common lock types, such as Shared, Exclusive, and Update were explored, as well as using SP_Lock to obtain current system lock information. In this article, the normal internal SQL locking methods will be manipulated using Lock Hints in order to obtain finer lock control.

Why?

Microsoft SQL does a very good job of self-monitoring and self-adjusting lock control. The engine is smart enough to change from row locks to page locks if the number of records involved increases to a point where individual row locks would no longer be optimal. It will also select a lock type specific to the type of statement being issued. So why bother with manual lock manipulation? There are some common situations where fine lock control will be desirable. One is a long running batch job where you as the developer or DBA, know that the lock you will need at the end is not a lock type SQL could have guessed looking at the beginning statements. The other situation is where the SQL server is doing simultaneous dual roles, such as receiving order entry input and doing reporting at the same time. At low transaction counts, this isn't a problem, but as the number of users and transactions increase, performance degradation occurs. This is especially common in real time production systems, where in addition to typical OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing), the SQL server is issuing control instructions to some type of machine where performance hesitation cannot be tolerated. At the same time, it is unacceptable to have long delays in executive reporting. Ideally, these roles would be split between several different physical machines, but that is not always achievable due to labor or cost restraints. In these situations, manual lock manipulation may solve some of the issues.

Lock Hints

By default, SQL Server 2000 and 2005 both employ a lock type called Read Committed. The purpose of this lock is to ensure that any record set returned by a select only contains committed, accurate, real data. An example of non-real data, or inaccurate data would be if a user began a transaction to change an employee name from Mike to Jim, while this transaction is running, but before it commits, a second user issues a Select on employee name. Because the first transaction is not committed, SQL does not know what the real value of employee should be. Is it Mike, or Jim? It will be Jim if a commit is issued, but Mike if a rollback occurs. Rather than guess at a response, SQL's Read Committed says only committed data will be returned, therefore, the select statement will be blocked until the first transaction either commits or rollbacks. This type of action is also known as "Writers always block Readers." It ensures only accurate data is returned. The downside is the delay encountered by the select statement. In our example, the delay would be extremely brief, but if the database were large and our update affecting many rows, the delay may be unacceptable. Especially if the select is for some type of real time report that users are expecting to be returned almost instantaneously.

One of the first questions to ask in the above scenario is whether the data being returned by the select needs to be one hundred percent accurate. Many real time reports that display running counts, such as "widgets made this hour," need only to be approximate. The nature of the report is to provide a quick count and will be refreshed in a short amount of time. In this situation, there is an easy solution in both SQL 2000 and SQL 2005 called Read Uncommitted.

NOLOCK

Read Uncommitted, also known as No Lock, is the opposite of the Read Committed lock described above. With No Lock, all rows are read, regardless of any exclusive locks that may be placed on them. In the above employee name example, it is possible to receive either name back, Mike or Jim back, depending on the timing of the select, commit, and rollback. The benefit of this is a very fast read that will not be blocked by exclusive locks. For the real time report example, it is a perfect fit. NO LOCK is enabled using the keyword WITH.

SELECT SUM(qty)AS RunningQty
FROM sales WITH (NOLOCK)

No Lock also has secondary benefit. Not only does it ignore Exclusive locks on rows, but it does not issue a Shared Lock on the records it reads. Therefore, it will not delay or block a transaction trying to write, a situation known as "Readers block writers." This can be very important in dual role systems where the database is responsible for simultaneous reporting and data input.

READPAST

Another similar option is called READPAST. With this hint, any rows being locked will be skipped. It is specified using a WITH command like NOLOCK.

SELECT au_lname
FROM authors WITH (READPAST)

Unlike NOLOCK, READPAST will skip only row locks, so if a long running transaction has a page lock, a select issued with READPAST will be blocked until the blocking transaction frees. In addition, READPAST will issue a Shared lock, so this reader will block writers.

Other Hints

There are thirteen different hints and isolation levels. Many are designed for the opposite purpose discussed in this article; they are used to issue blocking locks in order to guarantee a long running transaction has consistent data for the life of a batch.

For a complete list of Lock hints, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/acdata/ac_8_con_7a_1hf7.asp .

Conclusion

NO LOCK fit our example perfectly. It gave a quick read by ignoring others locks. However, what if our select data needs to be Commit accurate, we do not want to skip any rows, and we still don't want to wait on others locks to free. In SQL 2000, this is not possible, but new in SQL 2005 is Snapshot. This was one of the hot topics at the PASS Community Summit in Dallas last month. In the next article, we will focus on this new isolation level.