Expert Cube Development with Microsoft SQL Server 2008 Analysis Services

ExpertCubeDesign

Overview

I have had my eye on this book for awhile as I have been looking for a good analysis services book. After it was recommended to me a few times at the PASS Summit, and after looking at it I saw that Brian Knight also recommended it in the reviews on Amazon, I knew I had to grab it. So, when the book arrived on my doorstep it wasn’t very long before I had it open in front of me. I finished this book in a week (I started it 2011-10-30 and finished 2011-11-05) and can say that it was a relatively easy read. There were a few chapters I would say which were more in depth than others, but overall it felt very approachable.

Note that this is going to be a fairly lengthy review. This is as much because I am / was using this as notes from my read-through as anything else. I’m not sure if this is boring or not and will potentially make sure to prune future reviews lengths if this appears to long to people. Let me know what you think.

Preface

This short introduction went into what to expect from the book, some of the conventions used throughout the book to denote tips and tricks or extra resources and other information. It also discussed who would be best suited to read the book.

Chapter 01: Designing the Data Warehouse for Analysis Services

This chapter discussed some very basic knowledge of what a data warehouse is versus what a data mart is, some of the common interpretations of the two, and the importance of building up the database from which the cube will be based. A brief discussion on fact tables versus dimension; star schema versus snow-flakes; Junk / SCD / Degenerate dimension and bridge tables; Snapshot and transaction fact tables are all also covered.

Essentially, we should be building a data warehouse which will house all of the information, preferably in a star schema, which will then be used to feed the data marts which will conform to the cubes structure. All of the data necessary for the cube should be in this mart, because, although it is possible to pull data from different sources, it is inefficient. It is further mentioned that it is possible to build the cubes directly from the OLTP systems, although it is strongly suggested that this never be the route that is taken.

Finally, the author (s) make an argument to use views in the relational database as opposed to creating the views in the data source view.

This was a very strong introductory level chapter, if a bit low level for a book which claims to be for users that already have working knowledge of SSAS.

Chapter 02: Building Basic Dimensions and Cubes

This chapter went through the entry level information for putting together your first cube. We walk through the different editions of SSAS, the method for creating the data source and data source views, creating a few simple dimensions with a couple of simple user defined hierarchies and user friendly names, and then creating the cube by walking through the deployment and processing steps.

This again seems a little strange for a more advanced book, but they do sprinkle some good information throughout. Such as, it is often good to go ahead and deploy incrementally; build out your first dimension and then deploy to ensure that it is acting the way you think it should; set your project to deploy but not process; etc.

Chapter 03: Designing More Complex Dimensions

It seems almost like they didn’t know quite where to put some of the information in this chapter so they just broke it off. Perhaps a common theme? In any case, we go through the theory of Grouping and Banding to create artificial hierarchies or buckets and why it is that you probably want to do this in your ETL process rather than using the built in bucketing functions; next, the authors explain SCD type 1, 2 and 3 dimensions and how it is that they are dealt with, as well as some of their peculiarities; rigid aggregations versus flexible aggregations are explained; Junk dimensions are described next; finally, we take a look at ragged hierarchies (Parent / Child).

Of particular note in my opinion is the trick for SCD type 2 dimensions in which we expose the attributes as properties of the Business Key as opposed to the surrogate key where appropriate (Or even creating a composite key between the key and attribute).

Chapter 04: Measures and Measure Groups

This chapter begins with talk about some of the useful properties on the measures, such as the format strings (and their corresponding pitfalls) and the display folders. Next, we go into the aggregation types (Sum, Count, Distinct Count, None) and the semi-additive types (AverageOfChildren, FirstChild, LastChild, FirstNonEmpty, LastNonEmpty). There is an explanation on how the semi-additive aggregations are related to the time dimension and of how you might roll your own semi-additive aggregations.

After this, we look at the seldom used “By Account” feature which allows you to set the aggregation type by account type (Usually used in financial systems). Then we look at the unary operators and the weights as well as how you can apply custom member formulas. There is a comparison of Custom Member Formulas and MDX Script assignments. There is a look at non-aggregatable measures, creating multiple measure groups, creating measure groups from the dimension tables, handling different dimensionality and granularity. There is a discussion about using linked dimensions and measure groups (and why you should probably avoid them). We look at role-playing dimensions and whether we want to use different views for these or use the built in role-playing features. Finally, we look at how we define the relationships between dimensions and measure groups, including the bridge tables. There is a brief mention of data mining dimensions.

In my mind, this chapter starts to get into the slightly more advanced topics of the book. There is a LOT of really good information here. I am sure that this is a chapter I will look over several times as a reference for the topics at hand.

Chapter 05: Adding Transactional Data such as Invoice Line and Sales Reason

This chapter is really broken down into two subjects, drillthrough or related information for the measures and many-to-many relationships. I’m not quite sure how these two are related and it seems like this is another chapter where they didn’t quite know where to fit things in at.

Drillthrough actions are explained, along with the warning that these actions are presented to the client but that some clients don’t have the capability to deal with them (Excel 2003). It is noted that the drillthrough actions are really just a list of metadata which defines the columns that will be used to build the drillthrough statement to retrieve the data. There are some hacks presented for organizing the data from the drillthrough. Next we look a little at the drillthrough data model and whether we should use a transaction details dimension, whether we should use ROLAP dimensions to drill through (rule of thumb, don’t do it!) or whether we should use an alternate fact table that we point to in order to gather the information.

Next we look at the many-to-many dimension relationships and some basic use cases (notable, sales reasons). We take a look at how we can define the relationship between dimensions on an attribute other than the surrogate key to allow for mixing the granularities of these relationships. Finally, we look at some of the performance implications for using the many-to-many relationships.

Chapter 06: Adding Calculations to the Cube

The authors go out of their way to state that this is not a book you would use to learn MDX. For that there are other books (MDX Solutions, Fast track to MDX; Microsoft SQL Server 2008 MDX Step by Step). Instead, in this chapter we are first shown the different types of calculated members: Query Scoped Calculated Members (defined in the WITH clause of the MDX query), Session Scoped Calculated Members (client tool CREATE MEMBER statement) and Global Scoped Calculated Members (Created in the cube’s MDX Script). We find out that each of these has implications on where the output can be cached and reused and that we really should be striving to use Global Scoped Calculated Members to avoid duplication of work, better caching, etc.

After explaining the theory of the different types of calculated members, we are walked through lots of good stuff, from some simple scripts, to the methods of referencing cell values, aggregations, Year To Date functionality, Ratios over a Hierarchy, and comparing the current member to previous periods and to the same time last year. We then examine ranks and formatting calculated members before reaching my favorite section of the chapter, and one I’m sure I will be referencing again, Calculated Dimensions. Calculated dimensions are a method of providing a little more flexibility to calculated members by effectively allowing the calculated member to be applied to the current measure, regardless of which measure it is, by applying the calculations against this pseudo dimension. However, calculated dimensions are not without their own pitfalls and problems, which are also discussed. Finally, this chapter ends with a discussion of named sets and dynamic named sets.

This seems to me to be another of the more in depth chapters of the book. There is just loads of good information here that is really easy to get through. I am sure that I will return several times over to implement the calculated dimensions.

Chapter 07: Adding Currency Conversion

This chapter goes over the various forms of currency conversion and how each might be implemented as well as asking important questions relating to conversion rates and how they would be applied. This depends in large part on the type of currency conversion necessary. There are a few different varieties of this problem: multiple currencies per measurement to be represented as one currency; One currency per measurement to be represented by multiple currencies; and finally multiple currencies per measurement represented as multiple currencies. We walk through the various steps you would use to work with each of the situations discussed above in the “Add Business Intelligence Wizard”. Finally, the direct slice property is mentioned, along with the advice not to use it, as well as an introduction to writeback functionality.

All in all I would say that this is a very good primer for dealing with currency conversion. It is easy to follow and gives lots of good advice.

Chapter 08: Query Performance Tuning

This chapter goes over some useful information on things you can do to make sure that your cube meets your performance needs. There is a quick blurb about the two major components of Analysis Services that are used when processing a query – The Formula Engine and The Storage Engine – before the chapter dives into the meat of the subject with information on partitioning your measure groups and building aggregations (particularly with usage based optimizations for the aggregations). We then take a look at how you can begin to tune your MDX calculations and queries (Make sure to do block computation vs cell based calculations if possible), and how you can use the cache to your advantage. Finally, the chapter looks at some scale-out options for SSAS.

I don’t know how I felt about the chapter. It seemed like there was some good information here, but that at the same time it fell a little flat. I think that may have as much to do with the breadth of the subject as anything else. Furthermore, the authors DID state that it should be read in conjunction with “The Analysis Services 2008 Performance Guide” whitepaper. Really, there were a lot of links sprinkled this chapter to other whitepapers, blogs and books for more information on various subjects. So, I suppose that if the rest of this information is taken into account this chapter is really, really good. I just now need to take the time to go read everything they told me to here.

Chapter 09: Securing the Cube

Security is usually the pink elephant in the room when discussing a tool like Analysis Services, which is meant to aggregate and then expose as much information as possible to enable users to properly access and study data for trends and areas of improvements. This chapter does a good job at driving home the fact that Analysis Services cubes will do MUCH better if we think about security from the design process forward, instead of as an after thought. The chapter discusses roles, membership, securable objects, data security both fixed and dynamic, visual totals, the problems with dynamic security and parent / child ragged hierarchies, and how you might access Analysis Services from outside of the domain. Furthermore, it is mentioned – briefly but importantly – how using some of these security features can impact your cube’s performance. Overall this is a very important read BEFORE you start to build your cube if at all possible.

Chapter 10: Productionization

This chapter picks up some of the odds and ends for handling your cube once it has exited the development portion of the lifecycle and has moved on into maintenance. The first topic of discussion, although brief, is that of how changes are made to a cube once it has already gone into production. The next topic, which goes into some length, relates to how you can manage partitions and is complete with examples. Afterwards, there is a very well put together primer on processing – also with examples of processing and the impacts of each type. This is followed by some basic information on proactive caching, backups and synchronization between Analysis Services instances.

Chapter 11: Monitoring Cube Performance and Usage

Some basic information is imparted on the user relating to the SSAS service and how you can collect data from such items as the services pane, task manager, performance monitor, sql profiler, and resource manager. We get a discussion focusing on memory and how it is that SSAS uses it, as well as how the machine may be interacting with other programs or processes that may also need memory. We also get a similar treatment for basic information on I/O. There is a VERY basic and generic framework for looking at monitoring the processing with performance monitor counters, the DMVs and trace data. After this brief treatment we get an even briefer blurb on monitoring query performance and then also on monitoring usage through the same tools.

Again, I’m not sure exactly how I felt about the chapter; but again, there were a lot of links to other papers, blogs and in this case tools or tool suites. I believe that the links to some of these tool suites will make all of the difference in the world for a place to start on monitoring the SSAS instances. So, although the chapter itself may have left some to be desired, I think it will have pointed me in exactly the direction I needed to go. And that may very well have been worth the price of the book on its’ own.

Advertisements
This entry was posted in Books, Data Warehousing and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s