The Baker’s Dozen: 13 Productivity Tips for Building OLAP Databases with Microsoft SQL Server 2008 Analysis Services
A rewarding experience of writing and speaking is taking a seemingly complex topic and making it more understandable and accessible. In this article, I’ll show how to create and use OLAP databases and cubes using SQL Server Analysis Services 2008 (SSAS 2008). The benefits of OLAP are significant, even monumental - but like most technologies, reaping the benefits means considerable research and effort into leveraging the tools. In the case of OLAP databases, developers need to learn the differences between OLAP databases and relational databases, and how to use the tools that SSAS provides. In this article, I’ll walk through how to create an OLAP database and how to use the tools in Analysis Services to enhance the OLAP database. By the end, you’ll see why businesses and other organizations see the value of OLAP databases. I’ll also briefly cover some of the plans Microsoft has announced for the next generation of OLAP tools.
OLAP and Analysis Services
One of my primary objectives in this article is to “demystify” SSAS and OLAP database technology. I teach SSAS, OLAP, and MDX technologies to hundreds of students, and I understand the complexities of the learning curve with these tools.
In learning all the tools of the Microsoft Business Intelligence platform (also called the “BI stack”), I make the following analogy: that if learning the BI stack were like medical school, SSIS is like learning the bones of the human body and learning SSAS is like learning the brain and central nervous system. SSAS is like the central nervous system of business intelligence applications that use Microsoft OLAP.
Every year, I run into developers at Code Camp and other events who tell me that their employer is just now moving to OLAP. OLAP is a major paradigm shift for those who have worked for years with OLTP relational databases. So the need for walkthroughs that provide the right balance of theory and practice is essential. Too much theory and I’ll sound like a boring college professor - too much “click here, do this, do that” and it’s merely learning by rote.
What’s on the Menu?
In Sergeant Joe Friday-style (i.e., “the facts, just the facts”), here are the 13 items on the menu for this article:
- Defining a data source for an OLAP database
- Identifying fact table and dimension relationships (direct, referenced, many-to-many)
- Using the Cube Wizard to create an OLAP cube
- Using the dimension editor to create a clear presentation of business entities for end users
- Modifying fact table/dimension relationships
- Creating OLAP partitions to physically segregate active data that users query most frequently
- Creating OLAP aggregations to optimize data retrieval
- Creating OLAP calculated members with MDX
- Building graphical KPIs (Key Performance Indicators) using OLAP data, for view in different output tools that read OLAP information
- Using XMLA script to perform certain processing actions against OLAP databases
- Understanding the differences between MOLAP and ROLAP
- Implementing SSAS report actions to access reports against underlying transactional data.
- Implementing data security using SSAS roles
Some Basic OLAP Definitions
I’d like to provide some basic definitions for OLAP technology with Analysis Services. OLAP is a big culture shift for relational database experts. Those who know the old joke about two guys struggling to change a tire and acknowledge, “this stuff isn’t rocket science” will appreciate what I tell my student in my OLAP classes: that there are rocket scientists walking around saying, “you know, this rocket science stuff isn’t exactly OLAP technology.” So I’ll provide some definitions, in a narrative-format that follows this flow of the examples in this article.
I can use Visual Studio to create an SSAS project, which I’ll ultimately deploy to an Analysis Services OLAP database. I can deploy the OLAP database to a particular SSAS Server (localhost, production server, etc.). The source of the OLAP database is usually some type of SQL Server relational database, which could be normalized or de-normalized. In Tip 1, I’ll cover the process of defining the data source for an OLAP project.
The OLAP database usually contains one or more OLAP cubes. Figure 2 shows an example of an OLAP cube. Generally speaking, a cube is a geometric object with many sides (dimensions). In the case of databases, an OLAP cube contains fact tables and business dimensions.
Figure 2: OLAP overview (need to indicate the two different database engines).
OK, so what are fact tables and dimensions? Fact tables contain facts (“measures”), such as revenue, costs, spending, quantity sold, etc. Dimensions are the business entities that provide context for the facts; loosely-speaking, they are the master files from an OLTP system, such as an account master, a product master, date master, cost center master, etc. Virtually every OLAP cube will have a date dimension - it is almost a universal dimension. (Imagine an OLAP cube where users DIDN’T want to slice numbers by year, quarter, week ending date, etc.)
The cube automatically ties (relates) the facts to the dimensions, regardless of whether the relationships are direct, referenced (indirect) or even many-to-many. In Tips 2 - 4, I’ll cover these issues.
Internally, the OLAP cubes contain bitmap indexes that (loosely speaking) “auto-join” facts and dimensions so that end users can “slice and dice” measures by business dimensions, without needing to write any code.
At this point, it’s VERY important to understand that each fact table has a statement of granularity - in other words, the dimension level at which the facts are stored. So an OLAP cube might have one fact table built at the reseller account/product/date level, and a budget fact table at the employee and quarter level. (To give you some historical appreciation - Analysis Services 2000 only permitted one fact table pure OLAP cube.)
Each fact table has a statement of granularity - in other words, the dimension level at which the facts are stored. So an OLAP cube might have one fact table built at the reseller account/product/date level, and a budget fact table at the employee and quarter level.
In Tips 2 - 5, I’ll also cover several dimension components and how they relate to fact tables. Each dimension contains attributes, which are the equivalent of relational table columns. Loosely speaking, attributes are the “by” in business user language (e.g. “I want to look at sales by product color, by geography market, etc.”).
Attributes can form parent-child relationships, such as markets rolling up to regions, regions rolling up to zones, products rolling up to categories, months rolling up to years, etc. These relationships are known as hierarchies, and help end users to traverse these parent-child relationships through rollup and drilldown activities. It should be noted that not all attributes form parent-child hierarchies - some attributes (product color, for instance) might not be related to any other dimension attribute.
When fact tables get very large, I can split each cube fact table into multiple OLAP partitions. I can build my own partitioning scheme, such as creating an active partition for the last full year of data and one or more archived partitions for prior years of data. In Tip 6, I’ll show how to create OLAP partitions.
Additionally, I can also build OLAP aggregations inside each OLAP fact table partition. OLAP aggregations are sometimes one of the more misunderstood features of Analysis Services. Because OLAP users are able to roll up data in OLAP cubes very quickly (e.g. roll up sales from day to year), many assume that OLAP cubes somehow store every possible combination of subtotals for each measure.
This assumption is false - by default, SSAS OLAP databases do not store calculated subtotals. Instead, the Analysis Services engine automatically performs “on-the-fly” aggregations when users want to roll up measures, such as sales by country, brand, and year. Even critics of Microsoft products acknowledge that Analysis Services performs on-the-fly aggregations very rapidly. However, at some point, a fact table partition will still grow in size to the point where pre-stored calculations at higher dimension levels will help with performance. So Analysis Services provides aggregation functionality that allows OLAP developers to see what queries have been run by users (and how long they took), and also to build aggregations based on specific dimension attributes. I’ll look at aggregations in Tip 7.
The programming language to query OLAP databases is MDX (Multidimensional Expressions). MDX initially appears similar to T-SQL - but is actually very different. Suppose I want to see sales summarized by month, along with sales for the same time period a year ago. I can write an MDX query, or I can use one of the OLAP query designers in an Excel OLAP PivotTable, SSRS report, or a PerformancePoint Services dashboard to generate the MDX code. One could write a book on MDX, but I’ll at least cover some MDX basics in Tip 8.
Another important topic in business intelligence applications is KPIs, which stands for Key Performance Indicators. KPIs are business performance metrics (% of sales goal, % damages, % returns, % shortages, % good customer reviews) that developers can render as graphic visualizations (a green light or smiley face or similar image for a “good” status, and a red light for a “bad” status). Some KPIs are “trend-based” - for instance, a company might have a sales goal to increase revenue by at least 5% every year. Developers often write KPIs in MDX. I’ll cover some KPI examples in Tip 9. Note that some executives view KPIs as one of the most important components of an OLAP/Business Intelligence solution.
When OLAP developers are ascending the SSAS curve (and sometimes believing along the way that rocket science is easier!), they often wonder how to programmatically update OLAP cubes. In the course of a week, the business may have ten new accounts, three new products, and a week’s worth of sales. While the easy approach might be to load the SSAS Project in Visual Studio and reprocess/redeploy the OLAP database from the source data, this approach will be very impractical on an on-going basis. And so in Tip 10, I’ll cover some SSAS processing tasks using XMLA script that developers can write in Integration Services, to programmatically (and incrementally) update OLAP fact/dimension contents.
Another topic in the world of OLAP is storage persistence modes. There are three: MOLAP, ROLAP and HOLAP. Tip 11 will cover MOLAP and ROLAP. This topic is very difficult to cover in a few sentences without delving deep into specifics, but for now, here are two pieces of information. First, in MOLAP, data from a relational data store is physically synthesized into an OLAP cube structure. By contrast, in ROLAP, the cube is basically a virtual definition, which means MDX queries to retrieve OLAP data are passed back to the relational source (as SQL queries). HOLAP is a hybrid approach. Second, the vast majority of OLAP implementations use MOLAP (which is the default storage mode). Conservative estimates are that 80% of OLAP installations use MOLAP, while other estimates place it at over 90%.
Database teams sometimes face a challenge on whether to store transaction level data into an OLAP cube. OLAP cubes are traditionally for analysis at higher business dimension levels, where the need to drill down to transaction details is only necessary when one sees some business anomaly. In this instance, SSAS developers can build drill-out report actions from the cube, to “hook” into SSRS reports that might exist in an OLTP or Data Warehouse situation. This allows OLAP developers to extend the analytic path from an OLAP cube to an external report, and provide the end-user with one-click access to an external SSRS report while browsing an OLAP cube in Excel. In Tip 12, I’ll cover an example of this outstanding (but sometimes underutilized) feature.
Finally, in many corporate environments, associating users with segments of the database is extremely critical. A brand manager should only see data for his/her brand, a region manager might only have access to specific accounts, etc. In SQL Server relational databases, developers need to build a variety of solutions such as creating authorization tables that link end users to specific data, creating database views, and/or adding logic to stored procedures. While the relational database engine never “stands in your way” of doing this, it also doesn’t provide anyone with out-of-the-box capabilities for associating users with specific dimension rows. Fortunately, in OLAP databases, we can build SSAS database roles into OLAP cubes. This feature allows developers to associate Windows users (or user groups) directly with specific dimension data - the role feature itself leverages how SSAS exposes dimension data in a way that might remind .NET developers of typed datasets. In Tip 13 I’ll show how SSAS roles work.
So that’s an overview of OLAP terminology and an overview of how I intend to cover these terms in this article. But there’s still one important question - why is OLAP so important? Can’t we simply provide OLAP-style functionality without going through the steps of creating OLAP databases using SSAS?
By: Kevin S Goff
Kevin S. Goff, a Microsoft SQL Server MVP, has been writing “The Baker’s Dozen” productivity series in CoDe Magazine since 2004. Kevin hosts a weekly webcast on SQL Server and Business Intelligence topics, at http://www.BDBIRadio.com.
Kevin has been an applications developer, architect, and technology mentor since 1987. Kevin has worked in many lines of business, including insurance, manufacturing, health care, consumer packaged goods, accounting and finance, advertising, and many others. Along the way, Kevin has won several awards in both the public and private sector for applications development. Additionally, Kevin has taught SQL Server, Business Intelligence, and Data Warehousing classes to hundreds of students over a period of five years and has built numerous custom courseware modules along the way.
Kevin has authored one book on reporting development and contributed chapters on MDX programming to a second book. He is currently developing a set of commercial video training courses for SQL Server and BI topics. He has been a SQL Server MVP since 2010 and was previously a .NET/C# MVP from 2005 to 2009.
Kevin is a frequent speaker at User Group and other community events (SQL Saturday, SharePoint Saturday, Code Camp) in the Mid-Atlantic region, and speaks occasionally at conferences. In 2012 Kevin worked with Microsoft TechNet Radio on a 13 week webcast series on new features in SQL Server 2012.
For more information, check out Kevin’s main site at http://www.KevinSGoff.net, or email him at firstname.lastname@example.org