SQL Server 2008 Under the Hood: Compression Technologies
Microsoft released SQL Server 2008 to the public this past July. There are many major new features that are striking, such as the Resource Governor and Policy-Based Management; features that are high-profile and have received a lot of attention.
But when you look under the hood, behind these high-visibility features and into the SQL Server engine, some other features also stand out as very notable: database encryption, an entirely new auditing framework, and the topic of this article: compression.
SQL Server 2008 uses compression in three prominent places:
- Data compression allows you to compress database tables and indexes
- Backup compression adds the capability of compressing database backups
- Database mirroring log stream compression, which compresses the log data between the principal and mirror servers in a database mirroring session
Each of these features improves performance, but perhaps the one that will deliver the biggest gains is data compression.
You might initially think that data compression just refers to the way data is stored on disk, but in SQL Server 2008 it means more than that. Data compression actually refers to the way the contents of data and index pages are stored in the page; the resulting compressed data pages are then stored in their compressed format on disk.
You can compress data for tables, indexes, and indexed views. You load database data primarily into tables. But there are two types of subordinate database objects that also contain partial copies of a table’s data. Indexes contain selected subsets of a table’s data in B-tree structures, and indexed views contain portions of a table’s data based on the view definition. All three of these objects can benefit from data compression.
You can compress data at the row or page level. For row compression, the row’s data is compressed in such a way that SQL Server can reconstruct the full row based on the contents of the compressed row. As a result, the compressed row can move from page to page without changing the compressed image of the row. When you compress at the page level, row-level compression is enhanced with a dictionary-type lookup: each data page has a lookup table for repeating patterns in the rows of the page. If a row has to move from one page to another, its page-level compression has to be decoded from the one page and then recoded into the new page, so the row movement is a little more expensive, and so you may see some increased CPU usage on updates to compressed data and indexes.
The benefit of data compression is that the number of I/O’s (logical and physical) to access data in the page or index is reduced because the data is more compact. Because compression of a table at either the row or page level still keeps the same data, indexes on the table behave essentially the same as with uncompressed data, so you don’t need to worry about changing indexes because you’ve compressed some data.
Data compression has great potential for dramatic increases in performance and scalability. It can significantly reduce the number of I/Os required to access data, making it possible to store larger numbers of rows in a table. Meanwhile, existing tables will benefit from better query performance due to the reduction in I/O.
For example, I recently had the opportunity to test out data compression on a 1-million row, 20GB table. Compressing it at the page level reduced it by 60% in size, and accordingly reduced our benchmark query duration by a similar 60%. This was due to a 70% reduction in the number of logical I/Os required for the query. On the other hand, CPU usage for the query increased by 36%.
There is a catch: data compression is slated to be a feature of SQL Server 2008 Enterprise Edition only. If you compress tables or indexes in a SQL Server 2008 database that contains compressed tables or indexes, you will probably need the Enterprise Edition (or Developer Edition) to restore the database. For example, in SQL Server 2005, if you back up a database using table partitioning, an Enterprise Edition feature only, you cannot restore the database to a Standard, Workgroup, or SQL Express edition. We’ll have to wait and see whether data compression has the same restriction.
There are several third-party products that will compress SQL Server backups, and they’ve been available for several years. Perhaps the best known is Quest Software’s SQL LiteSpeed, but there are others available as well, such as Red Gate’s SQL Backup.
SQL Server 2008 introduces native backup compression, and like data compression it is also an Enterprise Edition feature. However, unlike data compression (and table partitioning), any other SQL Server 2008 Edition can restore a compressed backup: the only restriction is that the Standard, Workgroup, and SQL Express Editions cannot actually create compressed backups.
Backup compression is different from data compression because it occurs only at the storage level. That is, the compression occurs when you create the backup, and the result is just a compressed backup file or image. Like data compression, backup compression has the potential to increase performance of backups. For example, in my own tests the SQL Server 2008 AdventureWorks2008 sample database backup file reduces in size from 176MB to 40.5MB, a 75+% reduction in size. The time taken for the backup also reduces because of the lower amount of I/O to disk required. On my laptop, it reduced from 15 seconds uncompressed to 10 seconds compressed. Again CPU usage is higher for the backup process because of the compression activity.
Database Mirroring and Compression of Log Stream
A third area where SQL Server 2008 makes use of compression is in database mirroring. When mirroring between two SQL Server 2008 instances, SQL Server will compress the transaction log records that are streamed from the principal server to the mirror server. This compression is automatic and SQL Server 2008 Books Online states that it will work out of the box and not require the Enterprise Edition.
The Microsoft SQL Server Customer Advisor Team (CAT team) ran some tests on the benefits of compressing log records and published their results at
They found that the benefit was greatest on networks that had lower bandwidths. Again, they also report an increase in CPU usage cost due to the compression, but on balance the gain in transfer speed would seem to outweigh that cost. (Incidentally, their report notes the use of the SQL Server 2008 trace flag 1462, which disables database mirroring log compression.)
SQL Server 2008 Compression
Data compression is likely going to be the heavy hitter among the compression features because it has the potential to improve performance on queries that access large data warehouse fact tables. Even OLTP systems with very large tables will likely benefit, provided the tables are read-mostly. In fact, some people informally have called data compression the “killer feature” of SQL Server 2008.
But all three forms of compression are likely to be beneficial, and a welcome addition to the performance tuning array of tools. The benefit for compression is faster queries and the ability to scale up to even larger data sets, while the cost is an increase in CPU usage. You can see where this is going: it will only reinforce the use of quad-core CPU servers that have recently become widely available.