The Baker’s Dozen: 26 Productivity Tips for Optimizing SQL Server Queries (Part 1 of 2)
There’s an old programmer adage: “First you make it work, then you make it work fast.” Well, when writing T-SQL queries, you can do both, if you have some knowledge about how the SQL Server optimizer works. This will be a two-part article. In part one, I’ll start with fairly basic optimization tips and techniques for writing SQL queries. In part two, I’ll cover more advanced techniques. What’s on the Menu? Here are the different performance optimization tips and topics for this article: - Basic query optimization fundamentals
- An example of Hash Match Aggregate versus Stream Aggregate
- The Baker’s Dozen Spotlight: how the SQL Server 2012 Columnstore index can help
- More information on the Columnstore index
- Queries using dates - what works well, what doesn’t
- Trying to outsmart the optimizer with NULL checks
- Queries that are search-argument optimized and queries that aren’t
- Are correlated subqueries bad?
- New capabilities in SQL Server 2012 - are they faster?
- Recursive queries versus loops
- APPLY and Table-valued functions versus other approaches
- Inserting and updating multiple rows in one procedure, using INSERT and UPDATE
- Inserting and updating multiple rows in one procedure, using MERGE
SQL Server 2012 Released! While 10 of the 13 tips in this article will apply to SQL Server 2008, three of the tips cover new features in SQL Server 2012, which Microsoft released in the spring of 2012. If you have the opportunity to try out SQL Server 2012, I strongly encourage you to do so. The performance of the new Columnstore index (in Tips 3 and 4) is worth the effort alone! Most of the example listings in this article use the AdventureWorks2008R2 demo database, which you can find on CodePlex. Tip 1: Basic Indexing Fundamentals Have you ever created an index and wondered why SQL Server didn’t appear to utilize it? Consider the code in Listing 1, which does the following: - Creates a table from the existing Purchasing.PurchaseOrderHeader table, and then builds two indexes: a clustered index on the PurchaseOrderID column (presumably for a Primary Key) and a non-clustered index on the VendorID.
- Retrieves a handful of columns based on a specific VendorID.
So here’s the question: will the query use the non-clustered index on the VendorID, and perform an Index Seek (one of the more efficient operators)? The answer is no, which might surprise some people. The SQL Server execution plan shows that the optimizer performs an Index Scan through the clustered index. So why does SQL Server take what seems to be the “long way around?” The answer is because the non-clustered index wouldn’t be as “short” a path as you might think. In Listing 1, the code runs the query again, forcing SQL Server to use the non-clustered index. While the execution plan shows that SQL Server performs an Index Seek, the optimizer also performs a Key Lookup into the clustered index, to retrieve all the “other columns” you want to return, and that winds up being far more costly than just scanning through the clustered index to begin with. How costly? If you run the two queries together in a batch, the query that uses the clustered Index Scan costs about 27% of the batch, while the non-clustered Index Seek/Key Lookup costs 73%! So SQL Server is electing to use the clustered Index Scan. So how can you modify the scenario so that SQL Server will use the more efficient Index Seek - and more important, NOT use the inefficient Key Lookup to retrieve the other columns you want? The answer is with a covering index - where you can store the columns you frequently wish to retrieve “in” the index. The columns (such as the OrderDate, TotalDue, ShipMethodID, etc.) will not be part of the index key, but are instead columns that “come along for the ride.” That way, SQL Server only needs to read the non-clustered index, and doesn’t need to “cross-reference” to the clustered index (or the original heap table) to retrieve all the other columns you want to bring back. | " | So how can you modify the scenario so that SQL Server will use the more efficient Index Seek - and more important, NOT use the inefficient Key Lookup to retrieve the other columns you want? The answer is with a covering index - where you can store the columns you frequently wish to retrieve “in” the index.
| " |
Accordingly, Listing 1 creates a covering index to include the additional columns. If you test all three scenarios (the clustered index, the non-clustered index on just the VendorID, and the covering index on the VendorID that also includes the other columns you want to return), you’ll see that the query costs are 21%, 76%, and 2% respectively. So clearly, a covering index will yield superior performance! Obviously, covering indexes come at a cost: they are larger, and SQL Server must maintain the additional entries when you add rows. So developers and DBAs should use them judiciously. | & | | 
By: Kevin S Goff Kevin S. Goff, a Microsoft SQL Server MVP, has been writing “The Baker’s Dozen” productivity series in CoDe Magazine since 2004. Kevin hosts a weekly webcast on SQL Server and Business Intelligence topics, at http://www.BDBIRadio.com.
Kevin has been an applications developer, architect, and technology mentor since 1987. Kevin has worked in many lines of business, including insurance, manufacturing, health care, consumer packaged goods, accounting and finance, advertising, and many others. Along the way, Kevin has won several awards in both the public and private sector for applications development. Additionally, Kevin has taught SQL Server, Business Intelligence, and Data Warehousing classes to hundreds of students over a period of five years and has built numerous custom courseware modules along the way.
Kevin has authored one book on reporting development and contributed chapters on MDX programming to a second book. He is currently developing a set of commercial video training courses for SQL Server and BI topics. He has been a SQL Server MVP since 2010 and was previously a .NET/C# MVP from 2005 to 2009.
Kevin is a frequent speaker at User Group and other community events (SQL Saturday, SharePoint Saturday, Code Camp) in the Mid-Atlantic region, and speaks occasionally at conferences. In 2012 Kevin worked with Microsoft TechNet Radio on a 13 week webcast series on new features in SQL Server 2012.
For more information, check out Kevin’s main site at http://www.KevinSGoff.net, or email him at kgoff@kevinsgoff.net
kgoff@commongroundsolutions.net Article Project File
You can find the entire source code and project file on my website at www.KevinSGoff.net in the download area. |
| Listing 1: Basic example of index usage and optimization | USE AdventureWorks2008R2 GO
IF ( OBJECT_ID('dbo.TempPurchaseOrderHeader' ) is not NULL) DROP TABLE dbo.TempPurchaseOrderHeader GO
-- Create a table from the contents of PurchaseOrderHeader SELECT * INTO TempPurchaseOrderHeader FROM Purchasing.PurchaseOrderHeader
-- Create a clustered index on the PurchaseOrderID -- and also a non-clustered index
CREATE CLUSTERED INDEX [TPOH_ClusteredIndex] ON TempPurchaseOrderHeader ( PurchaseOrderID)
CREATE NONCLUSTERED INDEX [TPOH_NCLIndex] ON TempPurchaseOrderHeader ( VendorID)
-- Run the following query - does a CLUSTERED Index Scan -- Why doesn't it use the non-clustered index and a seek? SET STATISTICS TIME ON SET STATISTICS IO ON
SELECT PurchaseOrderID, ShipMethodID, OrderDate, TotalDue FROM TempPurchaseOrderHeader WHERE VendorID = 1492
-- Non-clustered index takes about 73% of the cost!!! -- Why? SQL Server must perform ""Key Lookup"" to retrieve -- non-indexed columnns (shipmethodid, OrderDate, TotalDue, etc.) SELECT PurchaseOrderID, ShipMethodID, OrderDate, TotalDue FROM TempPurchaseOrderHeader WITH (INDEX ( [TPOH_NCLIndex])) WHERE VendorID = 1492
-- now create a covering index! CREATE NONCLUSTERED INDEX [TPOH_NCLIndex_Covering] on TempPurchaseOrderHeader (VendorID) INCLUDE (PurchaseOrderID, ShipMethodID, OrderDate, TotalDue) -- clustered index takes 26% of the batch
SELECT PurchaseOrderID, ShipMethodID, OrderDate, TotalDue FROM TempPurchaseOrderHeader WITH (INDEX (0)) WHERE VendorID = 1492 -- non-clustered index takes 71% of the batch SELECT PurchaseOrderID, ShipMethodID, OrderDate, TotalDue FROM TempPurchaseOrderHeader WITH (INDEX ( [TPOH_NCLIndex])) WHERE VendorID = 1492
-- covering index takes 2% of the batch SELECT PurchaseOrderID, ShipMethodID, OrderDate, TotalDue FROM TempPurchaseOrderHeader WITH (INDEX ( [TPOH_NCLIndex_Covering] )) WHERE VendorID = 1492
|
|