It would be simple if improving the hardware solved the problem, but
unfortunately increasing hardware horsepower on its own in this case doesn't
buy a lot for end users. The simple truth is that the end user perception of
performance often comes down to a single database record, which several
processes may be trying to update and several others read at the same time.
Performance and scalability often require special design tricks on the database
side—and very often such tricks are database-engine specific.
From an architectural point of view, there are several ways to improve
performance and scalability of database-driven applications, and you must
usually implement them all to achieve the acceptable results. One technique is
to change the database architecture to improve database performance in certain
areas. Such changes include database splits, data normalization and de-normalization,
table splits (horizontally and vertically), adding indexes and constraints, and
other similar techniques. I won't go into detail on these in this article.
Sometimes however, one requirement—for example a need for fast
updates—contradicts other requirements, such as a need for fast searches and
real-time reporting. The latter requirements mean that you're not free to
change the architecture of the database significantly; it must be preserved in
a certain stable state that satisfies all the requirements to the greatest
possible extent. If at the same time, you need to run a complex process on the
database side, that must perform and scale well, you must usually design it as
carefully crafted database stored procedure.
There are several reasons to write stored procedures. First, from the logical
point of view, if the back-end logic can be divided into business logic and
data logic, the latter should naturally reside on the database side, leading to
cleaner design and better maintainability. Secondly, database engines compile
stored procedures and save execution plans along with them, which improves
performance "for free" (from the developer's point of view). Finally,
and most importantly for this discussion, placing data manipulation logic into
the stored procedures lets you use several approaches and tricks that can
improve their performance and scalability significantly.
Some of these approaches are global, obvious, and easy to follow. Others may be
specific to a particular database engine. Several approaches require
non-standard, task-specific design and ideas to implement properly—and this is
where the true art of database design comes to play. All the database
design-specific approaches, however, usually end up with more complex script
than improper design, and that's the major reason why they are often not
followed.
In this article, I'll share a few tips that I have personally found to be the
most important when developing SQL Server stored procedues. These tricks
reflect both the outcome of my mano-a-mano fights with the database to achieve
better performance, and a summary of the experience of the extremely talented
software developers I've had the privilege to work with. I'll present the tips,
and then illustrate them by the stored procedure-based SUDOKU solution I have
designed for this purpose.
Tips for Writing High-performance SQL
These tips apply broadly when writing high-performance stored procedures.
Unfortunately, unlike some tips, you can't simply apply most of them without
first considering the nature and schema of the data you're querying.
Avoid using cursors (as well as other looping structures)
as much as possible. Cursors are inefficient, and database engines usually
don't have the best loop implementations in terms of performance.
- On the database side, you can
usually replace code involving cursors with aggregate SQL statements (SELECT, INSERT, and UPDATE)
that use vector tables. All database engines are heavily optimized for
aggregate statements, so even if a loop is unavoidable, it is always
better to execute a few aggregate statements in a loop with a small number
of iterations, than to create a cursor and execute simple statements over
a large number of iterations.
Even if initial performance tests, especially with a small amount of data, show cursors to be more efficient than a complex aggregate statement, it is worthwhile to try to optimize the operation by breaking it into smaller portions or using other approaches—unless you can guarantee that the data value will stay small. Cursor approaches will not scale.
- Filter
data wisely. One alternative to using cursors uses a fall-through
approach, filtering and aggregating data in multiple steps via a set of
data storages, which could be physical tables, temporary tables, or table
variables. It is usually best to include some aggregate filters into
aggregate statements to filter out the majority of data in one simple shot
whenever necessary, working on smaller amounts of data. Then you can
proceed with joining and filtering, making sure to keep the number of join
permutations under control at all times.
- It is
usually more efficient to execute multiple statements with one condition
than a single statement with multiple OR conditions
when executing UPDATE and DELETE
statements against permanent database tables that can be accessed by
multiple users simultaneously. This tip is especially important from the
scalability point of view; from the performance point of view the
difference is usually marginal. The major reason for the tip is the
locking of the database records and the lock escalations that occur behind
the scenes.
- Make wise
distinctions between temp tables and table variables. Table variables
are in-memory structures that may work from 2-100 times faster than temp
tables. But keep in mind that access to table variables gets slower as the
volume of data they contain grows. At some point, table variables will
overflow the available memory and that kills the performance. Therefore,
use table variables only when their data content is guaranteed not to grow
unpredictably; the breaking size is around several thousand records. For
larger data volumes, I recommend temp tables with clustered indexes.
Interestingly, I've found that a temp table with one clustered index is
often faster than having multiple simple indexes. In contrast, multiple
simple indexes with physical tables are often faster than one clustered
index.
- Make
careful distinctions between hard rules and assumptions. This is more
of a business design tip, which applies more to code design than to
performance and scalability design in general. In real life however,
performance and scalability are generally the first things to suffer from
improper design. When rules are implemented as assumptions, they usually
cause unnecessary calculations to be performed, affecting performance.
However, when assumptions are implemented as rules they tend to cause
errors and algorithm failures, which usually requires an urgent redesign.
That, in turn, is usually performed with business constraints and results
in inefficient final algorithms. That's because bad design decisions are
often corrected in a rush and without sufficient resources—sometimes under
pressure from customers whose businesses are usually in a critical stage
when problems are uncovered, but must continue operating during the
process.
- Pay
attention to join order. Using proper join order sometimes lets the
database engine generate hints that execute joins with an optimal amount
of records. Most database engines also support hard hints, but in most
cases you should avoid using hard hints and let the database engine figure
out the best way to do its job on its own.
- Be
careful when joining complex views to other views and database tables
in complex SELECT statements. When the database
contains a significant amount of data, SQL Server engine tends to
recalculate the execution plan of the resulting statement, which often
results in an inefficient execution plan and may kill the performance. The
most difficult part is that the behavior of SQL Server engine is
inconsistent in that respect, and heavily depends on the database size,
indexes, foreign keys, and other database structures and constraints. The
consistent work-around is to pre-select data from the view into a temp
table with the reasonable pre-filters, and then use that temp table in
place of the underlying view.
- Create
indexes on temp tables wisely. As mentioned in Tip 4, clustered
indexes are usually the best in terms of performance for temp tables;
however, there is a difference between creating the index before or after
inserting data into the temp table. Creating the index before the insert
complicates the insert, because the database engine must order the
selection. For complex selections such as those mentioned in Tip 7, the
extra ordering may overcomplicate the overall statement and drastically
degrade the performance. On the other hand, creating the index after the
insert forces the database engine to recalculate the execution plan of the
stored procedure every time it is called. Therefore, the decision is
always a trade-off and you should make it based on the relative costs of
the two possibilities.
- In general, try to avoid
execution plan recalculation. One common cause of recalculation occurs
when the stored procedure contains several paths that depend on values
passed in parameters. However, whether avoiding recalculation is possible
depends on the complexity of the stored procedure and on other
circumstances, such as those described in tip 8. When the engine does
recalculate execution, performance always suffers; however, recalculating
the execution plan of the caller does not force the execution plan
recalculation of the called procedure (or view or function). Therefore,
the workaround is to divide one stored procedure into multiple procedures
(depending on the passed-in parameters), and then call the children from
the parent conditionally. You should perform this subdivision very
carefully though, because it can be a maintenance nightmare—but sometimes
it seems to be the only way to achieve acceptable database performance and
scalability.
Finally, although this isn't either a performance or a
scalability tip, I urge you to format your stored procedure scripts legibly.
It's best to agree on common practices such as clause order and formatting
rules with your coworkers in advance. Not only does that help avoid errors, it
also clearly shows the logical structure of the statements and often aids in
figuring out faulty filters and joins.
This list of tips is certainly not exhaustive, but they probably cover the most
important performance and scalability factors. Still, there's nothing like an
example to drive home the point. The Sudoku solution described in the rest of
this article illustrates the techniques in the first six tips.
The Sudoku Solution
I decided to craft this stored-procedure based Sudoku solution example for
multiple reasons. First, Sudoku uses a 9x9 table, so thinking of this table as
a database table is pretty natural. Second, the database solution has a
relatively simple structure, while at the same time—if used properly—nicely
represents a high-performance implementation of the solution, using the first
several tips discussed in the preceding section. Finally, because it's
flexible, a database solution lets you look at the puzzle itself from a generic
perspective, showing ways you can extend the puzzle solution to more generic
tasks.
The solution consists of two
database tables: one that stores the initial puzzle data, and a second that
stores calculation results, the script(s) to fill the tables, and the stored
procedure that calculates the results. You can download the sample code to experiment with the solution
yourself.
The TEMP_GS_V1.sql script contains the table definitions.
In the first table, each record represents a Sudoku cell. In the second table,
each record represents the entire solution. The reason for structuring the
tables in this manner will become clearer as you walk through the solution; but
keep in mind that it's important that the structure be both flexible and avoid
intermixing rules with assumptions (see Tip 5).
|
|
|
|
Figure 1. Unsolved
A1Escargot Puzzle: The figure shows A1Escargot puzzle in its initial unsolved
state. |
|
Sudoku's solution rules state
that each row, column, and 3x3 square must contain a unique set of digits from
1 to 9. The assumption is that given the initial set of numbers, there is only
one resulting solution; otherwise the puzzle is considered invalid. This
assumption, if transferred to real-life applications, sounds unreasonable.
There is no way that developers can ensure a valid data state; therefore, I
wanted my Sudoku solution to reflect that—to work properly whether the puzzle
is valid or not—and for invalid puzzles, simply figure out all the possible
solutions.
I have not created an interface for data entry (to create puzzles), because
that's outside of the solution scope, so you have to enter data into the TEMP_GS table using a script. The Fill_GS.sql
script contains the data entry example, which corresponds to a very complex
Sudoku puzzle called "AlEscargot," which was once claimed to be the
most complex SUDOKU puzzle ever. Figure 1 shows the puzzle in its initial
unsolved state .
The CALCULATE_GS stored procedure (see the CALCULATE_GS_V3.sql script) contains the solution algorithm,
which contains a set of steps that illustrate the aforementioned SQL Server
tips. Note that the CALCULATE_GS procedure accepts a Puzzle Number parameter. This is just a habit to get into when
you're creating solutions suitable for multi-user environments: if puzzles have
distinct IDs, then multiple users can run the stored procedure at the same
time, using the same physical tables.
The CALCULATE_GS stored procedure contains the following
steps:
- Create a table variable,
filling it with numbers from 1 to 9. This table variable illustrates Tip 4,
because there are not that many records in the table, and it is heavily
used within the stored procedure, sometimes joining to itself three times.
In such cases, using a table variable is far more efficient than using a
temp table. The procedure fills in the TEMP_GS table
with all possible rows that correspond to possible cells using two
operations:
- First, it calculates
singletons—the cells which can contain only one possible value to meet the
Sudoku rules. This evaluation gets performed multiple times, because new
cells may become singletons as a result of previous calculations. This
operation does not use a cursor; instead, it executes aggregate statements
until singletons exist, which minimizes the number of possible statements.
This approach illustrates Tip 1.
- Second, when no more
singletons exist, the rest of the cells are filled with multiple possible
digits, again in one operation (see tip 1).
- Next, the procedure creates
and fills in a temporary table that corresponds to possible Sudoku rows.
In this case, because the number of records is unpredictable in complex
scenarios, it's best to use a temp table. It's worth noting that solving
the solution using table variables may be faster in many instances. In
real life, such situations may necessitate two separate stored
procedures—one optimized for low volumes, and another for high volumes.
The creation of the intermediate table over the aggregation steps
illustrates Tip 2.
This table eliminates a lot of invalid combinations based on a very simple
aggregation condition: the sum of cells within each row must be equal to
45. This condition is simple enough (compared to the requirement to have
unique digits within each row) to not slow down the aggregate statement,
yet powerful enough to eliminate the majority of invalid combinations in a
single step.
- Create and fill in a
temporary table that corresponds to possible Sudoku columns, following the
same paradigm as in step 2. This approach illustrates Tip 2:
even though logically you need only the table that corresponds to valid
rows, joining it to itself at a later point may result internally in
billions of combinations after the joins and before filtering, which kills
performance. In this and the next step, creating the extra tables to join
lets you eliminate the possibility of too many intermediate combinations
during joins.
- Create and fill in a temporary
table that corresponds to possible Sudoku squares, following the same
paradigm as in steps 2 and 3.
- Delete the records that do
not adhere to Sudoku's rules from the three temp tables created in steps
2- 4. The tables were pre-filtered using aggregate conditions when the
records were inserted into them, so now deleting the extra rows using
precise conditions is fast because it operates on a small subset of data.
In addition, deleting records separately based on each violation is faster
than using an OR clause (the violations are
represented by the rows where any value is repeated in any pair of
columns). I combined all the delete statements into one loop using dynamic
SQL to save some space. Because the statements are simple and don't
require complex execution plans, using dynamic SQL does not affect
performance much. However, if the statements were complex, it might be
wiser to avoid dynamic SQL. This step illustrates Tip 3,
though as explained earlier, it is not terribly important for temp
tables—it's far more important for permanent database tables.
- After completing step 5, all
three temp tables contain only valid Sudoku structures (rows, columns, and
squares); therefore, simply joining them must result in valid solutions
(note that for some puzzles, multiple solutions are possible). Joining
operations are always fast, as long as the joined tables are small or
indexed, and as long as the intermediate steps do not result in too many rows.
The database engine is usually smart in optimizing the joins; however,
specifying the proper join order helps the engine sometimes. Therefore,
this step joins the three tables in palette order, starting with a row,
then a column, then another row, another column, etc., involving squares
at some point, so that each extra join will potentially result in the
minimal number of virtual rows. The result of the join is inserted into
the TEMP_GS_RESULT table, which contains the
solutions, as mentioned earlier.
For puzzles that have only one
solution, the FinalSelect.sql script selects from the TEMP_GS_RESULT table and outputs the solution in tabular form.
The represented solution is reasonably fast. It solves the
"AlEscargot" puzzle from Figure 1 in five minutes and
twenty seconds on a 2.4 GHz machine with 1 GB of RAM. The solution (output in
text form) is shown below:
1 6 2 8 5 7 4 9 35 3 4 1 2 9 6 7 87 8 9 6 4 3 5 2 14 7 5 3 1 2 9 8 69 1 3 5 8 6 7 4 26 2 8 7 9 4 1 3 53 5 6 4 7 8 2 1 92 4 1 9 3 5 8 6 78 9 7 2 6 1 3 5 4
Apart from the efficiency
however, the solution, being both database-based and properly designed,
represents the following additional advantages:
- It does not assume that
puzzles have only a single solution; it finds all the possible solutions
if more than one is available.
- It has a very clean structure
that prevents mistakes.
The clean structure provides clear paths to extensibility. For example, because all the steps and joins are pretty obvious, it's very easy to rewrite the solution using dynamic SQL, extending it to higher dimensions. The standard Sudoku puzzle is based on a 32 x 32 square. Using a similar clean stored procedure structure and dynamic SQL, it is very easy to extend the solution to solve N2 x N2 square puzzles—and the resulting solution is guaranteed to perform well. However, for big Ns you should redesign the final result table to be cell-based rather than solution based, otherwise it may easily exceed SQL Server's column number and row length limitations. That's a rather simple enhancement though, and doesn't affect the efficiency of the major algorithm. In other words, the solution is scalable, and extended versions of the solution will be scalable as well, and that clearly differentiates it from the other possible solutions.
