Book Review: SQL Server Hardware

Overview

First off, I really enjoyed this book. I have been following Glenn Berry’s posts for a few years now, and when I heard that he was finally going to write up his musings on hardware as it relates to SQL Server I was excited. The book is a great introduction to the subject and consolidates a lot of the information I have been receiving piecemeal from Glenn’s blogs for the past several years. I would highly recommend this to a junior or mid-level DBA who is looking to enhance their knowledge of hardware and of how the choices you make when building out your server can affect your SQL Server instance.

Chapter 01: Processors and Associated Hardware

The beginning chapter goes into detail on how you might evaluate processors by looking at cache sizes, clock speeds, hyper-threading, sockets, cores, logical cores, etc. We spend some time on the various generations of both Intel and AMD processors, as well as on evaluating motherboards. Although the main theme of the book seemed to be bigger, better, faster, this chapter went out of its way to explain that a newer two-socket machine can easily outperform an older four-socket (or larger) machine at drastically cheaper licensing costs.

Chapter 02: The Storage Subsystem

This chapter goes over the other major component of your server: the disk subsystem. Here we look at the various types of disks, how the server attaches to and communicates with them, and how they might be arranged in RAID arrays. The chapter does a decent job of tying it all together and relaying the information needed by the junior or mid-level DBA who has not had to learn this material in the past.

Chapter 03: Benchmarking Tools

This chapter introduces us to a lot of really good tools for understanding how your hardware is performing. We first take a brief look at the standard application benchmarks: TPC-C, TPC-E and TPC-H. After that we investigate other tools like SPEC, Geekbench, HDTune, CrystalDiskMark, SQLIO, SQLStress and SQLSim. These give us some general metrics with which we can compare CPU, memory, disk subsystem and overall application scores. None of the tools are investigated in great depth, but it is a really good place to discover tools which might greatly help you out.

Chapter 04: Hardware Discovery

We take a brief look at some other tools – CPU-Z, MSINFO32, The Task Manager, and the properties dialog – which are used to investigate the type of hardware on which you are currently running.
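
The book focuses on external tools; as a complement (this query is mine, not from the book), much of the same information can be pulled from inside SQL Server itself:

```sql
-- Illustrative only (not from the book): basic hardware and instance discovery from inside SQL Server.
SELECT
    cpu_count,            -- logical processors visible to SQL Server
    hyperthread_ratio,    -- logical processors per physical processor package
    max_workers_count,    -- size of the worker thread pool
    sqlserver_start_time  -- last service start (available on newer versions)
FROM sys.dm_os_sys_info;

SELECT
    SERVERPROPERTY('MachineName')    AS MachineName,
    SERVERPROPERTY('Edition')        AS Edition,
    SERVERPROPERTY('ProductVersion') AS ProductVersion,
    SERVERPROPERTY('ProductLevel')   AS ServicePackLevel;
```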

Chapter 05: Operating System Selection and Configuration

The chapter investigates the differences between Windows 2003, 2003 R2, 2008 and 2008 R2. There is some discourse on support, power saving modes, configurations, and service packs. All in all, interesting material that boils down to wanting the latest and greatest.

Chapter 06: SQL Server Version and Edition Selection

The chapter goes through versions of SQL Server from 2005 to 2008 R2 (with a few hints of what is to come in Denali, a.k.a. SQL 2012), along with the improvements and differences between versions and editions. I thought it did a really good job of walking through these differences as well as crafting a relatively good argument for why you want the latest and greatest.

Chapter 07: SQL Server Installation and Configuration

This chapter was quick to read, as it included a tutorial with lots of good screenshots walking through an installation, which is probably a pretty good start for users without much experience installing SQL Server. It also included a pre-installation checklist and some hints on settings you might want to change post-installation, which are good for users of all levels of experience.
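
For flavor, this is the sort of post-installation change those hints are getting at. This is an illustrative sketch rather than the book’s checklist, and the values are placeholders you would size for your own server:

```sql
-- Illustrative post-installation settings; the values are placeholders, not recommendations from the book.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'max server memory (MB)', 28672;       -- leave headroom for the OS on a 32 GB box
EXEC sp_configure 'cost threshold for parallelism', 25;  -- raise from the default of 5
EXEC sp_configure 'optimize for ad hoc workloads', 1;    -- reduce plan cache bloat from single-use plans
RECONFIGURE;
```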

Appendix A: Intel and AMD Processors and Chipsets

I really enjoyed this appendix. Not so much because it is terribly helpful, but because it is an interesting walk down memory lane through somewhat recent processors, how they have evolved, and their approximate release dates. I’m not sure how useful it would be for determining which processor and chipset you would want to purchase, but I don’t think that was the idea of the appendix anyway.

Appendix B: Installing a SQL Server 2008 R2 Cumulative Update

An appendix that walks through the steps you would need to find, download and install a cumulative update (CU). It is relatively verbose and probably makes for very good instructions for a beginner.

Appendix C: Abbreviations

Two or three pages listing out some abbreviations used in the industry. Might be helpful for a beginner who is trying to learn the alphabet soup.


Pool League Project: Post 1 Introduction

Overview

Our pool league generally plays once a week (we miss some weeks for holidays and events).  To this point there have been 9 “seasons”, only the past two or three of which I have been a part.  However, we have the records going back to the beginning.

My intent is to go back to the first session and enter the information into a database, make it easy to enter new information (probably through Excel), and then report off of this information both at the relational level and as an Analysis Services cube.

Pool League Description

In the league, we have had four or five teams per session, with four players per team per match. It should be noted that players have been known to switch teams between sessions, and to fill in for other players if their team has a “bye” week. Furthermore, teams can have more than four players in a single session, but only four (or fewer) will be active in a particular week.

During each match, two teams will play each other, with each person competing in eight games: two games against each of the four opponents. It is not unheard of for players to have to “double up” if the team does not have all four players, so it is possible for one person to play double the normal amount. Sometimes, although very infrequently, two players will play twice per team. If a single player would have to play all four rounds, the match is a forfeit.

The league is played as a “handicap” league: each player’s average number of victories per eight games is used to determine which team “should” win, and the other team is given a number of games equal to the difference between the two totals in an effort to even things out. If a player has not played at least two games, they have no handicap and the other team is able to remove one of its players’ handicaps (to this point, it has always been the top handicap). During each game of the match we track who broke, who won, and whether or not the nine ball was sunk on the break (an automatic win). At the end of the match, the games won are totaled along with the handicap, and a winner is decided. If there is a tie, the team with no handicap wins. If there was no handicap, the team that won the most rounds (sets of games between opponents) wins. If the rounds were all ties, or if each team won one round and tied the other two, the match is a tie.

The players’ handicaps are held in an Excel spreadsheet where we keep track of the total number of games played and the total number of victories. The spreadsheet does not currently hold the game-level details behind these aggregate numbers.
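
As a first pass, the sort of schema I have in mind looks something like the sketch below. The table and column names are placeholders rather than the final design, and handicaps would be derived from the game history rather than stored:

```sql
-- Minimal sketch of the core tables (names are placeholders, not the final design).
CREATE TABLE dbo.Player (PlayerId INT IDENTITY PRIMARY KEY, PlayerName VARCHAR(100) NOT NULL);
CREATE TABLE dbo.Season (SeasonId INT IDENTITY PRIMARY KEY, SeasonName VARCHAR(50)  NOT NULL);
CREATE TABLE dbo.Team   (TeamId   INT IDENTITY PRIMARY KEY, TeamName   VARCHAR(100) NOT NULL);

-- Players can switch teams between sessions, so team membership is tracked per season.
CREATE TABLE dbo.TeamRoster
(
    SeasonId INT NOT NULL REFERENCES dbo.Season,
    TeamId   INT NOT NULL REFERENCES dbo.Team,
    PlayerId INT NOT NULL REFERENCES dbo.Player,
    PRIMARY KEY (SeasonId, TeamId, PlayerId)
);

-- One match per pair of teams per week; the individual games hang off the match.
CREATE TABLE dbo.Match
(
    MatchId    INT IDENTITY PRIMARY KEY,
    SeasonId   INT  NOT NULL REFERENCES dbo.Season,
    MatchDate  DATE NOT NULL,
    HomeTeamId INT  NOT NULL REFERENCES dbo.Team,
    AwayTeamId INT  NOT NULL REFERENCES dbo.Team
);

CREATE TABLE dbo.Game
(
    GameId          INT IDENTITY PRIMARY KEY,
    MatchId         INT NOT NULL REFERENCES dbo.Match,
    HomePlayerId    INT NOT NULL REFERENCES dbo.Player,
    AwayPlayerId    INT NOT NULL REFERENCES dbo.Player,
    BreakerPlayerId INT NOT NULL REFERENCES dbo.Player,  -- who broke
    WinnerPlayerId  INT NOT NULL REFERENCES dbo.Player,  -- who won
    NineOnBreak     BIT NOT NULL DEFAULT 0               -- nine ball sunk on the break (automatic win)
);
```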

Example


Example Match Score Sheet: “The Fuglies” beat the mighty “Trololo”.

As you can see, the handicaps meant that this should have been an even match (15 to 15). However, as there are 32 games per match, both teams were slated to win less than one half of the games (15 vs. the 16 that would be half). So, even if both teams met their handicaps exactly, someone would have to go over and win (32 – 30 = 2 games beyond the combined handicaps). This shows that, in essence, the handicaps are just used to get a relative measure of the skill of each team. No nines were sunk on the break (an asterisk would be placed next to the 1 on the sheet if that were the case).


Book Review–Database Benchmarking

Database Benchmarking by Mike Ault, Kevin Kline, Dr. Bert Scalzo, Claudia Fernandez, Donald K. Burleson

Overview


I will give this book one thing: it was relatively easy to read. I finished it in just a few days while working on a request at work to help determine standards for benchmarking databases and performance testing prior to major code changes or new implementations.

In essence, I’ve been told if you don’t have anything good to say, don’t say anything at all.  So, I’ll try to find something good to say…

The book did remind me about the TPC-C and TPC-H database standards that are often used for benchmarking. It caused me to look up other blog posts that introduced me to Hammerora, which appears to be a decent tool that I’ll be using to produce database load while testing monitoring software. The book introduced the reader to the concepts of benchmarking and had a few measurements that I would consider reasonable and good for tracking. It told the user that tracking measurements over time is good and showed various examples of graphs trending that information. It even had some pretty pictures and silly cartoons.

With that said (see, I said something nice!), the book should have been labeled “Database Benchmarking with Quest Benchmark Factory”. It felt like a marketing book for this software instead of a good book on theory. The writing was inconsistent, and it felt like the authors didn’t really talk to one another (and the editor missed big time), in that several of the concepts were briefly explained, again and again, in each chapter. There was only a minor portion of one chapter which mentioned how you would actually benchmark a non-standardized database. This caused the book to feel like database benchmarking is really only applicable for testing new hardware or new RDBMS software. And that is not at all correct.

My advice would be to find another source of information for this topic.

A big thanks to Kevin Kline, from whom I got this autographed copy at one of the SQL Saturdays a year or so back. Even though it wasn’t what I was hoping for, it was good to have.

I read this book 2011-12-31 to 2012-01-02.


2012–Goals

A Look Forward

This year I think that my goals will be focused a little differently, probably a little more on home life with some of the events I have planned for this year. Some of these will be boring as they will sound a lot like what I did last year. However, accomplishing the same thing isn’t necessarily a bad thing if what you are doing is enriching your life. None of these goals will list things like playing games with my friends or getting better at pool for my nine ball league, although I guess I could list those. Instead, I’ll try to focus on what I feel will be important to accomplish. I feel like this list might be a little more ambitious than last year’s, but I’ll attempt to revisit it every quarter and readjust if other things come up, or tighten it down to what is really important if things are slipping.

Finances

  • I want to buy a house and ensure that I am able to be comfortable and happy there. I don’t want to put myself so far into debt that I won’t be able to spend money to fix it up, or that it puts too much pressure on other aspects of life.
  • I want to celebrate paying off my student loans. Paying off the loans isn’t really a goal as it will happen this summer one way or the other. But, it is a major milestone and one that I want to look back on.
  • I’d like to end the year with at least three months of expenses in savings.

Exercise and Health

I would like to get into shape. I’m not 100% positive how I’ll do this, as it often seems that I fall off the wagon after a while. So, here are some things that I can actually measure. The last one should ensure that I am at least reasonably in shape, with the others helping me get there.

  • Run at least 200 miles.
  • Total at least 500 miles between running and walking.
  • Go on at least six hikes of ten or more miles.
  • I would like to run a 5k in under 30 minutes (this doesn’t have to be an actual race, but if it is all the better).
  • I would like to be back in the “normal BMI” range, which according to the CDC BMI calculation is somewhere between 140–144 and 189–194 pounds at my height (I’m between 6’1” and 6’2”). I’ll shoot for 189 or lower. In reality I’d rather be sitting at 180.

Technical Learning

  • I’d like to read another six or more technical books.
    • I’d like to again make sure that I read at least fifteen non-technical books.
  • I’d like to attend at least three SQL Saturdays.
    • I would enjoy going to the PASS Summit again; however, that probably won’t happen this year unless I’m speaking.
  • I’d like to watch at least twenty-six of the sessions from last year’s PASS Summit that I didn’t get to see while I was there (I just got the DVDs in a few days ago).
  • I’d like to get one more of the SQL 2008 MCITP certifications.  Right now I am leaning towards the 70-452 Certification. I’m theoretically getting back into the Business Intelligence role at work so going for this certification makes sense again.
  • SQL 2012 Certification: There are going to be at least seven tests for SQL 2012. I’m not certain how they will all work and what will qualify people as a Technical Specialist versus a Technical Professional, or if they are even making that distinction any more. I believe that everyone will have to take the Querying Microsoft SQL Server exam, so I would like to take at least that one. In reality I would like to complete at least one other, but with the 70-452 I’m not sure how realistic that will or won’t be.

Technical Teaching

  • I would like to write up at least one new presentation.
  • I’d like to speak at three or more SQL Saturdays (I’m not sure which ones, as those requiring out-of-town travel might get cut back some due to budget and time constraints).
  • I’d like to speak at JSSUG again.
  • I’d like to write technical blog posts at least twice a month. I don’t want these to be book reviews. And, I’d like to keep these more regular instead of having a lot some months and not many others.

Community Involvement

This is another hard one to write up. I used to think of community involvement more as just helping people out. But I think that my mind is changing on this one. It is just as important to me to attend the mixers at conferences and user group meetings, interact with peers online, and all in all have good relationships with people who can help me out and whom I can help out in a pinch. However, I really think that the goals for community involvement will be hard to measure. These are more like your typical resolutions… So, I guess I have a resolution to continue to interact with people via Twitter, G+, Google Hangouts, the Midnight DBA show, etc., and to feel like I have a connection with other members of the community.


2011–A look back

A Look Back

As has been my habit for the past few years, I wrote out goals at the beginning of 2011. I try to no longer make resolutions, as I am not sure that resolutions are really a good thing. Goals I can work towards. I can track progress. I feel like I accomplish something when I make the goal; and that is important, at least to me.

I feel like this year was a good one.  I was able to keep track of a lot of my goals, and I feel like both my professional life and my home life are in good shape.  Both could probably use a tweak here or there, but on the whole I’m very happy.  As I’m writing this, I’ve been spending time with my family and will continue to do so as we bring in the new year. 

2011 Goals

Goal 1 – Exercise

This goal was to exercise at least three times a week.  This did not happen.  There were a few different times where I was able to do this far more than the goal over the course of a month or more, but on the whole this is the only goal I feel that I failed at (even if it is possible that it averaged out to a half hour three times a week, which I doubt).

Goal 2 – Read Novels

I was able to meet my goal of reading fifteen or more non-technical books.  I enjoy reading for pleasure.  I think it calms me and it jump starts my imagination. 

This year, I read: The Water Method Man (John Irving); The 158 Pound Marriage (John Irving); Home Fires (Gene Wolfe); The Girl with the Dragon Tattoo (Stieg Larsson); The Girl Who Played with Fire (Stieg Larsson); The Girl Who Kicked the Hornets Nest (Stieg Larsson); A Game of Thrones (George R R Martin – A Reread); A Clash of Kings (George R R Martin – A Reread); A Storm of Swords (George R R Martin – A Reread); A Feast For Crows (George R R Martin – A Reread); A Dance with Dragons (George R R Martin); The Island of Doctor Death and Other Stories and Other Stories (Gene Wolfe); When Gravity Falls (George Effinger); A Fire in the Sun (George Effinger); The Exile Kiss (George Effinger); The Fourth Hand (John Irving); The Fire in His Hands (Glen Cook – A Reread); With Mercy Towards None (Glen Cook – A Reread).  I also read several graphic novels, which some might count and others not – Walking Dead TPB 9 – 14; 30 Days of Night TPB 1 – 3; Spawn Collection Volume 1.

Goal 3 – Save Money

The goal was to save three months’ worth of living expenses. I’m relatively certain that I have done so. I did not keep track of my budget as closely as I could have, but I feel like I have a decent handle on things. With that said, I think that I’ll be resetting this back towards zero shortly, as I expect we will probably be attempting to buy a house soon.

Goal 4 – Training

I feel like I potentially overachieved on this one. I was able to finish my 70-451 blog post series and pass the MCITP test that accompanies it. I have received good feedback from several people saying that it was helpful to them. I was also able to complete watching all of the MCM videos. I was able to attend several conferences, from SQL Saturdays to SQL Rally to the PASS Summit. I attended every user group meeting. And, finally, I was able to complete my goal of reading at least six tech books.

In 2011 I read the following tech books: SQL 2008 Internals; Guide to SQL Server Team-Based Development; SQL Anti-Patterns; DBA Survivor; Performance Tuning With SQL Server Dynamic Management Views; SQL Server Statistics; Data Warehouse Lifecycle Toolkit – 2nd ed.; Expert Cube Development; Troubleshooting SQL Server: A Guide for the Accidental DBA.

Goal 5 – Community Involvement

I feel like I partially completed this goal. With that said, I didn’t have a good idea of what the goal meant. I did present at SQL Saturday Tampa, Orlando, Pensacola, Jacksonville and Atlanta. I also presented for JSSUG and at SQL Rally. I attended all of the user group meetings and the mixers for the user group meetings. I answered questions here and there on Twitter, although not much at all. But I do feel like I talk back and forth with people in the community a lot more these days. I don’t know that I logged in to the MSDN forums to answer a question at all. I also tried to do as much as I could at work helping out my teammates and other co-workers. And, finally, I was able to keep up with the blog pretty well for the first half of the year, although I feel like I have been overlooking it the majority of the time in the second half.

Goal 6 – Don’t Overdo It

I don’t really feel like I burned out too badly at any point this year. And, as I mentioned above, I feel like my home life is going pretty well. So, I consider that a success. I do feel like in the months where I pay attention to my hobbies I don’t do as much with the technical community, but that is okay.


Troubleshooting SQL Server: A Guide for the Accidental DBA

Overview

From the moment I heard that Jonathan Kehayias and Ted Krueger were writing a book together (and that Gail Shaw was going to be doing the technical editing) I knew that I would be grabbing it as soon as possible. As it happens, they released this one at the PASS Summit, so I was able to talk to all three after its release (albeit briefly with Jonathan and Gail). It did, however, allow me to act quite silly and run up to Ted like a gushing fan-boy asking for his autograph (this was based on the advice of Jes Borland-Shultz); his response was great in that he turned bright red and started to groan and ask “are you kidding me?”. Quite fun.

All kidding aside though, this book was a good primer on some of the more important topics that a DBA will face.  I have already started to recommend it to some of my co-workers and to the local user group.  One of the only things I have to nit-pick about the book is the title.  I don’t know that I would have put in “Accidental DBA” as this will surely help out lots of people who have willingly taken on the title, and it might cause some people who would get lots of good info here to steer clear.

Another thing to note about the book is that you can grab an electronic copy for free! (http://www.simple-talk.com/books/sql-books/troubleshooting-sql-server-a-guide-for-the-accidental-dba/). Also, note the zip location for the corresponding code (http://www.simple-talk.com/RedGateBooks/JonathanKehayias/TroubleshootingSQLServer_Code.zip).

Chapter 1: A Performance Troubleshooting Methodology

In this chapter the reader is exposed to a series of steps that can be used to drill down into what problems the instance might currently be experiencing.  This starts by looking at a combination of the wait statistics, the virtual file statistics, several performance counters and the plan cache to get a feel for the state of the instance.  All of these areas have an explanation as to what the results will signify, and, more importantly, where we might dig in a little deeper to expose the root of the problems.
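
As a taste of the kind of query involved (a generic example, not necessarily the script from the book), the wait statistics piece boils down to something like this:

```sql
-- Generic example (not necessarily the book's script): top waits since the last restart or stats clear.
SELECT TOP (10)
    wait_type,
    wait_time_ms / 1000.0        AS wait_time_sec,
    signal_wait_time_ms / 1000.0 AS signal_wait_sec,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                        N'SQLTRACE_BUFFER_FLUSH', N'WAITFOR', N'REQUEST_FOR_DEADLOCK_SEARCH')
ORDER BY wait_time_ms DESC;
```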

Chapter 2: Disk I/O Configuration

In this chapter, the authors present lots of great background information on storage.  There is a very good section presenting the basic information on the different RAID levels, the differences between direct attached storage and SAN, as well as some basic SAN terminology.  We are further exposed to some basic information on disk speeds, seek time, Random versus Sequential reads and writes, the patterns for data files and log files (as well as the special patterns for tempdb) and the concept of IOPS.  In addition, the authors mention a few different tools that might be helpful in measuring I/O Performance.  Finally, the chapter concludes with a few sections on diagnosing disk I/O issues and some common disk performance problems. 
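
One standard way to see this from inside SQL Server (again, a generic example rather than the book's code) is to look at per-file latency through the virtual file stats DMV:

```sql
-- Generic example (not from the book): average read/write latency per database file.
SELECT
    DB_NAME(vfs.database_id)                             AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```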

Chapter 3: High CPU Utilization

Ted does a good job of giving us some quick hints for checking on high CPU utilization, such as using Performance Monitor, capturing trace information and examining the dynamic management views. However, the real meat of the chapter is the extensive list of topics covered which might be the cause of high CPU: missing indexes, outdated statistics, search argument usability, implicit conversions, parameter sniffing, non-parameterized queries, inappropriate parallelism, and Windows / BIOS power saving modes. In addition there are a lot of great links to get more background information on these and other items.
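
For illustration (my own generic example, not Ted’s code), the DMV approach to finding CPU-heavy queries usually looks something like this:

```sql
-- Generic example (not from the book): cached statements ranked by total worker (CPU) time.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```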

Chapter 4: Memory Management

Jonathan goes into a good amount of detail explaining how the SQL engine allocates memory and, more importantly, how it deals with memory pressure. He then goes into insane detail on the differences between 32-bit memory and 64-bit memory and what the implications of each entail. In fact, I would have to say that he perhaps goes a bit too far into the 32-bit memory, as I hope that most people these days left their 32-bit processors behind sometime in the mid-2000s.

After this explanation we are shown a few pointers for memory relating to trace flags and operating system settings before diving into various performance counters and DMVs that will  be useful in diagnosing memory problems.  Finally, we are shown several common memory related problems as well as their root causes (or it is noted that the symptoms reported are not really a problem).
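
As a generic starting point (not taken from the chapter), a few of the memory counters SQL Server exposes about itself can be checked with:

```sql
-- Generic example (not from the book): a handful of memory-related performance counters.
SELECT pc.counter_name, pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.counter_name IN (N'Total Server Memory (KB)',
                          N'Target Server Memory (KB)',
                          N'Page life expectancy',
                          N'Memory Grants Pending');
```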

Chapter 5: Missing Indexes

The chapter starts out with some basic information on indexes, key column ordering and the use of included columns before moving on to the tools that one might find useful in identifying indexes which will help the workload of the server. This includes information on running traces, using the Database Engine Tuning Advisor, looking at the missing index DMVs and examining query plans and the plan cache. With that said, this chapter should perhaps just have been named “Indexes”, as in addition to lots of great information on determining which indexes may be missing, we are also shown how to identify where we have indexes which might not be necessary.
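
The missing index DMVs the chapter covers can be queried along these lines (a generic sketch, not the book’s script; the impact formula is just one common rough heuristic):

```sql
-- Generic example (not the book's script): missing index suggestions ranked by a rough impact estimate.
SELECT TOP (10)
    migs.avg_user_impact * migs.avg_total_user_cost * (migs.user_seeks + migs.user_scans) AS estimated_impact,
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups      AS mig ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details     AS mid ON mid.index_handle       = mig.index_handle
ORDER BY estimated_impact DESC;
```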

Chapter 6: Blocking

The second of Ted’s chapters is a good introduction to some of the elements that every DBA will need to learn about: isolation levels, locking and blocking. After we have gone through these basics we are shown how we can monitor for blocking within the database. We start with backwards-compatible processes for SQL 2000, then move on to the dynamic management views (do we sense a pattern with the DMVs here? Yeah, they are extremely helpful in administering your systems…), and then on to Performance Monitor. Next we see some automated methods for capturing this information, from trace flags, to traces, to event notifications and extended events.
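
For reference, the simplest of the DMV checks for blocking looks something like this (a generic example rather than the book’s code):

```sql
-- Generic example (not from the book): sessions that are currently blocked, and who is blocking them.
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time,
    r.status,
    t.text AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```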

Finally, we get to the heart of the matter: how do we resolve these issues? Unfortunately, this book must succumb, like all of the others, to the fact that there are no silver bullets. Instead, we come to the understanding that resolving these issues will more than likely take a combination of many items: from database design (or redesign), to index tuning, to watching our isolation levels, and using query hints. And, since every database is built just a little bit differently, on different hardware, with differing front-end applications, your experiences at tuning them will be different every time. Which is part of the fun of being a DBA, right?

Chapter 7: Handling Deadlocks

This chapter builds on the previous chapter relating to blocking, particularly in light of the fact that a deadlock is really just a special form of blocking. As such, a lot of the techniques for detecting and dealing with these issues are going to be the same. However, we do get a good explanation of what deadlocks are, along with some great explanations of the methods used to gather information specific to deadlocks, in particular obtaining the deadlock graphs (and how to read them). Finally, we look at some of the common causes of deadlocks and how we can alleviate them.
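
For example, on SQL Server 2008 and later the deadlock graphs captured by the default system_health Extended Events session can be pulled back with a query along these lines (a commonly published pattern, not necessarily the book’s version):

```sql
-- Generic example (not necessarily the book's code): recent deadlock graphs from the system_health session.
SELECT evts.XEvent.query('(event/data/value/deadlock)[1]') AS deadlock_graph
FROM (
    SELECT XEventData.XEvent.query('.') AS XEvent
    FROM (
        SELECT CAST(st.target_data AS xml) AS target_data
        FROM sys.dm_xe_session_targets AS st
        JOIN sys.dm_xe_sessions        AS s ON s.address = st.event_session_address
        WHERE s.name = N'system_health'
          AND st.target_name = N'ring_buffer'
    ) AS src
    CROSS APPLY src.target_data.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData(XEvent)
) AS evts;
```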

One of the key takeaways from this chapter, and possibly from this book, is in the source code that is available as a download for the book: the event notifications that can be used to capture deadlock information. This is similar to one of the sessions that I have seen Jonathan give at a SQL Saturday event a few years back; the slides and code for that session are probably worth your time to look over as well (http://sqlsaturday.com/viewsession.aspx?sat=32&sessionid=1069).

Chapter 8: Large or Full Transaction Log

This chapter once again starts with the basics; this time we learn about the transaction log, virtual log files, sizing and growing your log files, and understanding log truncation. We look into why the transaction log might be growing, from index rebuilds and reorganizations to long-running and uncommitted transactions. Finally we look at how we can handle the “transaction log full” errors and, perhaps more importantly, how we SHOULDN’T handle these errors.
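
For reference, two quick checks that go along with this chapter (generic examples, not the book’s scripts): how full each log currently is, and what is preventing log truncation.

```sql
-- How full each database's log is, as a percentage.
DBCC SQLPERF (LOGSPACE);

-- Why the log cannot currently be truncated (ACTIVE_TRANSACTION, LOG_BACKUP, REPLICATION, etc.).
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases;
```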

Chapter 9: Truncated Tables, Dropped Objects and Other Accidents Waiting to Happen

This chapter walks through a few different methods of restoring databases – using marked transactions, point-in-time restores and full restores – along with some of the various options the restore process can use to aid in examining the contents of the database. These are all important as they relate to recovering data or dropped objects. The chapter also briefly mentions source control, log recovery tools and determining who might have been the cause of the issues at hand. Finally there is a brief mention of security, along with the principle of least privilege, and the implementation of triggers to help prevent inadvertent data and / or schema modifications.
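
As a sketch of the point-in-time piece (illustrative only; the database, logical file and backup names here are made up), the restore sequence looks something like this:

```sql
-- Illustrative point-in-time restore to a copy database, stopping just before an accidental change.
RESTORE DATABASE SalesCopy
    FROM DISK = N'C:\Backups\Sales_Full.bak'
    WITH MOVE N'Sales_Data' TO N'C:\Data\SalesCopy.mdf',
         MOVE N'Sales_Log'  TO N'C:\Data\SalesCopy.ldf',
         NORECOVERY, REPLACE;

RESTORE LOG SalesCopy
    FROM DISK = N'C:\Backups\Sales_Log.trn'
    WITH STOPAT = N'2011-11-25T14:30:00',  -- just before the accident
         RECOVERY;
```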

Appendix: What to Do When All Else Fails

A very quick look through forums, blogs, the Microsoft CSS, and Consultants.  There are probably some people that will not have heard of some of the fine blogs or forums listed, particularly if they are new to the DBA area.

I read this book between 2011-11-05 and 2011-11-25.


Expert Cube Development with Microsoft SQL Server 2008 Analysis Services


Overview

I have had my eye on this book for a while, as I have been looking for a good Analysis Services book. After it was recommended to me a few times at the PASS Summit, and after I saw that Brian Knight also recommended it in the reviews on Amazon, I knew I had to grab it. So, when the book arrived on my doorstep it wasn’t very long before I had it open in front of me. I finished this book in a week (I started it 2011-10-30 and finished 2011-11-05) and can say that it was a relatively easy read. There were a few chapters, I would say, which were more in-depth than others, but overall it felt very approachable.

Note that this is going to be a fairly lengthy review. This is as much because I am / was using this as notes from my read-through as anything else. I’m not sure if this is boring or not, and I will potentially make sure to prune the length of future reviews if this appears too long to people. Let me know what you think.

Preface

This short introduction went into what to expect from the book, some of the conventions used throughout the book to denote tips and tricks or extra resources and other information. It also discussed who would be best suited to read the book.

Chapter 01: Designing the Data Warehouse for Analysis Services

This chapter discussed some very basic knowledge of what a data warehouse is versus what a data mart is, some of the common interpretations of the two, and the importance of building up the database on which the cube will be based. Fact tables versus dimension tables; star schemas versus snowflakes; junk, SCD and degenerate dimensions and bridge tables; and snapshot and transaction fact tables are all also briefly covered.

Essentially, we should be building a data warehouse which will house all of the information, preferably in a star schema, which will then be used to feed the data marts which will conform to the cube’s structure. All of the data necessary for the cube should be in this mart because, although it is possible to pull data from different sources, it is inefficient. It is further mentioned that it is possible to build the cubes directly from the OLTP systems, although it is strongly suggested that this never be the route that is taken.

Finally, the authors make an argument to use views in the relational database as opposed to creating the views in the data source view.

This was a very strong introductory-level chapter, if a bit low-level for a book which claims to be for users who already have a working knowledge of SSAS.

Chapter 02: Building Basic Dimensions and Cubes

This chapter went through the entry level information for putting together your first cube. We walk through the different editions of SSAS, the method for creating the data source and data source views, creating a few simple dimensions with a couple of simple user defined hierarchies and user friendly names, and then creating the cube by walking through the deployment and processing steps.

This again seems a little strange for a more advanced book, but they do sprinkle some good information throughout, such as: it is often good to deploy incrementally; build out your first dimension and then deploy to ensure that it is acting the way you think it should; set your project to deploy but not process; etc.

Chapter 03: Designing More Complex Dimensions

It seems almost like they didn’t know quite where to put some of the information in this chapter so they just broke it off. Perhaps a common theme? In any case, we go through the theory of grouping and banding to create artificial hierarchies or buckets, and why you probably want to do this in your ETL process rather than using the built-in bucketing functions; next, the authors explain SCD type 1, 2 and 3 dimensions and how they are dealt with, as well as some of their peculiarities; rigid aggregations versus flexible aggregations are explained; junk dimensions are described next; finally, we take a look at ragged (parent / child) hierarchies.

Of particular note, in my opinion, is the trick for SCD type 2 dimensions in which we expose the attributes as properties of the business key as opposed to the surrogate key where appropriate (or even create a composite key between the key and the attribute).

Chapter 04: Measures and Measure Groups

This chapter begins by talking about some of the useful properties on the measures, such as the format strings (and their corresponding pitfalls) and the display folders. Next, we go into the aggregation types (Sum, Count, Distinct Count, None) and the semi-additive types (AverageOfChildren, FirstChild, LastChild, FirstNonEmpty, LastNonEmpty). There is an explanation of how the semi-additive aggregations are related to the time dimension and of how you might roll your own semi-additive aggregations.

After this, we look at the seldom used “By Account” feature which allows you to set the aggregation type by account type (Usually used in financial systems). Then we look at the unary operators and the weights as well as how you can apply custom member formulas. There is a comparison of Custom Member Formulas and MDX Script assignments. There is a look at non-aggregatable measures, creating multiple measure groups, creating measure groups from the dimension tables, handling different dimensionality and granularity. There is a discussion about using linked dimensions and measure groups (and why you should probably avoid them). We look at role-playing dimensions and whether we want to use different views for these or use the built in role-playing features. Finally, we look at how we define the relationships between dimensions and measure groups, including the bridge tables. There is a brief mention of data mining dimensions.

In my mind, this chapter starts to get into the slightly more advanced topics of the book. There is a LOT of really good information here. I am sure that this is a chapter I will look over several times as a reference for the topics at hand.

Chapter 05: Adding Transactional Data such as Invoice Line and Sales Reason

This chapter is really broken down into two subjects: drillthrough, or related information for the measures, and many-to-many relationships. I’m not quite sure how these two are related, and it seems like this is another chapter where they didn’t quite know where to fit things in.

Drillthrough actions are explained, along with the warning that these actions are presented to the client but that some clients don’t have the capability to deal with them (Excel 2003). It is noted that the drillthrough actions are really just a list of metadata which defines the columns that will be used to build the drillthrough statement to retrieve the data. There are some hacks presented for organizing the data from the drillthrough. Next we look a little at the drillthrough data model and whether we should use a transaction details dimension, whether we should use ROLAP dimensions to drill through (rule of thumb, don’t do it!) or whether we should use an alternate fact table that we point to in order to gather the information.

Next we look at the many-to-many dimension relationships and some basic use cases (notably, sales reasons). We take a look at how we can define the relationship between dimensions on an attribute other than the surrogate key to allow for mixing the granularities of these relationships. Finally, we look at some of the performance implications of using the many-to-many relationships.

Chapter 06: Adding Calculations to the Cube

The authors go out of their way to state that this is not a book you would use to learn MDX. For that there are other books (MDX Solutions; Fast Track to MDX; Microsoft SQL Server 2008 MDX Step by Step). Instead, in this chapter we are first shown the different types of calculated members: query-scoped calculated members (defined in the WITH clause of the MDX query), session-scoped calculated members (a client tool CREATE MEMBER statement) and global-scoped calculated members (created in the cube’s MDX Script). We find out that each of these has implications on where the output can be cached and reused, and that we really should be striving to use global-scoped calculated members to avoid duplication of work, get better caching, etc.

After explaining the theory of the different types of calculated members, we are walked through lots of good stuff, from some simple scripts, to the methods of referencing cell values, aggregations, Year To Date functionality, Ratios over a Hierarchy, and comparing the current member to previous periods and to the same time last year. We then examine ranks and formatting calculated members before reaching my favorite section of the chapter, and one I’m sure I will be referencing again, Calculated Dimensions. Calculated dimensions are a method of providing a little more flexibility to calculated members by effectively allowing the calculated member to be applied to the current measure, regardless of which measure it is, by applying the calculations against this pseudo dimension. However, calculated dimensions are not without their own pitfalls and problems, which are also discussed. Finally, this chapter ends with a discussion of named sets and dynamic named sets.

This seems to me to be another of the more in depth chapters of the book. There is just loads of good information here that is really easy to get through. I am sure that I will return several times over to implement the calculated dimensions.

Chapter 07: Adding Currency Conversion

This chapter goes over the various forms of currency conversion and how each might be implemented, as well as asking important questions relating to conversion rates and how they would be applied. This depends in large part on the type of currency conversion necessary. There are a few different varieties of this problem: multiple currencies per measurement to be represented as one currency; one currency per measurement to be represented as multiple currencies; and finally multiple currencies per measurement represented as multiple currencies. We walk through the steps you would use to work with each of the situations discussed above in the “Add Business Intelligence Wizard”. Finally, the direct slice property is mentioned, along with the advice not to use it, as well as an introduction to writeback functionality.

All in all I would say that this is a very good primer for dealing with currency conversion. It is easy to follow and gives lots of good advice.

Chapter 08: Query Performance Tuning

This chapter goes over some useful information on things you can do to make sure that your cube meets your performance needs. There is a quick blurb about the two major components of Analysis Services that are used when processing a query – the Formula Engine and the Storage Engine – before the chapter dives into the meat of the subject with information on partitioning your measure groups and building aggregations (particularly with usage-based optimization for the aggregations). We then take a look at how you can begin to tune your MDX calculations and queries (make sure to use block computation rather than cell-by-cell calculation where possible), and how you can use the cache to your advantage. Finally, the chapter looks at some scale-out options for SSAS.

I don’t know how I felt about this chapter. It seemed like there was some good information here, but at the same time it fell a little flat. I think that may have as much to do with the breadth of the subject as anything else. Furthermore, the authors DID state that it should be read in conjunction with “The Analysis Services 2008 Performance Guide” whitepaper. Really, there were a lot of links sprinkled throughout this chapter to other whitepapers, blogs and books for more information on various subjects. So, I suppose that if the rest of this information is taken into account this chapter is really, really good. I just now need to take the time to go read everything they told me to here.

Chapter 09: Securing the Cube

Security is usually the pink elephant in the room when discussing a tool like Analysis Services, which is meant to aggregate and then expose as much information as possible to enable users to properly access and study data for trends and areas of improvement. This chapter does a good job of driving home the fact that Analysis Services cubes will do MUCH better if we think about security from the design process forward, instead of as an afterthought. The chapter discusses roles, membership, securable objects, data security both fixed and dynamic, visual totals, the problems with dynamic security and parent / child ragged hierarchies, and how you might access Analysis Services from outside of the domain. Furthermore, it is mentioned – briefly but importantly – how using some of these security features can impact your cube’s performance. Overall this is a very important read BEFORE you start to build your cube, if at all possible.

Chapter 10: Productionization

This chapter picks up some of the odds and ends for handling your cube once it has exited the development portion of the lifecycle and has moved on into maintenance. The first topic of discussion, although brief, is that of how changes are made to a cube once it has already gone into production. The next topic, which goes into some length, relates to how you can manage partitions and is complete with examples. Afterwards, there is a very well put together primer on processing – also with examples of processing and the impacts of each type. This is followed by some basic information on proactive caching, backups and synchronization between Analysis Services instances.

Chapter 11: Monitoring Cube Performance and Usage

Some basic information is imparted to the user relating to the SSAS service and how you can collect data from such items as the services pane, Task Manager, Performance Monitor, SQL Profiler, and the resource manager. We get a discussion focusing on memory and how SSAS uses it, as well as how the machine may be interacting with other programs or processes that may also need memory. We also get a similar treatment of basic information on I/O. There is a VERY basic and generic framework for monitoring processing with Performance Monitor counters, the DMVs and trace data. After this brief treatment we get an even briefer blurb on monitoring query performance, and then also on monitoring usage through the same tools.

Again, I’m not sure exactly how I felt about this chapter; but again, there were a lot of links to other papers, blogs and, in this case, tools or tool suites. I believe that the links to some of these tool suites will make all of the difference in the world as a place to start on monitoring the SSAS instances. So, although the chapter itself may have left something to be desired, I think it will have pointed me in exactly the direction I needed to go. And that may very well have been worth the price of the book on its own.


SQL Saturday #85 Orlando, FL

So, this post is long overdue.  I was able to go down to SQL Saturday #85 in Orlando on 24 September 2011 and I had a bunch of fun doing so. 

Speaker’s Party

Well, we got rained out a little, as the plan was to use the porch at Liam Fitzpatrick’s. However, we were able to cram into a very small room in the back of the restaurant and chat for several hours. Lots of good conversation with good friends and a chance to see some new faces.

Registration / Breakfast

Everything went very smoothly here. No long lines, and doughnuts, coffee, soda, etc. were plentiful. The vendors were already set up and happy to talk, with swag available. All in all, it went very smoothly. This was a really good sign since we were in a new building this year (and under new leadership for the event organizers).

Session 1 (with Vendors)

I skipped the first session to spend some time talking to the vendors. Mainly I was looking into monitoring tools and wanted some one-on-one time to talk about why they thought their tool was the tool of choice and to get some demos. There were a few sessions I would have liked to attend (anything Buck Woody does is great, and I would have liked to see Erin Stellato’s session as well), but this was something I thought I ought to do for work, so I skipped out.

Session 2 – SSIS Data Flow Buffer Breakdown (Eric Wisdahl)

This one was, in my mind, a disaster.  I was using a different machine than normal, and even though I had run through the demos two nights prior, I couldn’t find the version of the demos that was most up to date and specific to this machine.  I was still able to walk through all of the presentation; and I believe I did a decent job with the demos that I had available, but I don’t like it when things go wrong. 

However, the evaluations I got back were still very high for the most part, and I’m happy that it seems like most people still got a lot out of the session.

Session 3 (Recovering)

I spent most of this session just going over what I could have done differently in my head.  I also spent some time visiting with other speakers and volunteers who were in the vendor area. 

Lunch

Lunch was awesome.  All of the speakers this year got their chef’s apron, which was wholly inappropriate but funny.  The event served catered bbq and the speakers actually helped to dish it out.  The food was really good, particularly the bbq sauces.

Session 4 – Page & Row Compression (Bradley Ball)

I have meant to go to this session several times over the past few years.  Bradley did a good job of explaining the differences between page and row compression, how it is that each are enacted, and why you might want to use them.  I highly recommend this session if you get the chance.

Session 5 – Becoming DAX: An Introduction (William Pearson)

This was a very low-level introduction to DAX, which was perfect for me as I had never really seen it before. DAX is the language used with PowerPivot, the Excel add-in for analysis. It is very similar to the Excel functions, but expands the capabilities. The reason this is more important than it might have been in the past is that DAX will also be used with the Business Intelligence Semantic Model (BISM) and Vertipaq in the next release of Analysis Services.

This appears to be something I should look into more in the near future as I will be implementing a data warehouse program shortly.

Session 6 – SQL Smackdown: SSIS vs T-SQL (Mike Davis / Adam Jorgenson / Bradley Ball)

A 100-level, light-hearted session that explains the pros and cons of both SSIS and T-SQL for doing integration and data warehouse loads.

Closing Ceremony

The closing ceremony was not quite the same without the normal courtyard that I’ve become used to over the past three years.  However, there WAS still prize tossing by Andy Warren, one of my favorite parts of these events.  For those of you who haven’t had the opportunity, Andy stands up on the balcony and chucks prizes into the audience.  Quite fun.  This was a good opportunity to thank the vendors, the event organizers and the attendees.

After Party

A little bit better luck this evening. This time we were able to use the patio at Liam Fitzpatrick’s. I think about thirty people came out to the event. I ordered the shepherd’s pie, which I am a complete sucker for… I love me some Irish food.

I have lots of fun talking with the people in the community.  There are quite a few people whom I consider to be friends at this point who I get to see every few months because of these events.  In my opinion, some of the conversations you get and the contacts you make are as important as the sessions you attend.  If you see I’m going to be speaking and you have any inclination, come out to the after party and introduce yourself. 


SQL Saturday #89 Atlanta, GA Recap


Yet another great event has come and gone.  Even though the drive was somewhat long and painful to get up to Atlanta (Alpharetta), I still had a blast.  There were so many great presentations going on that it was extremely hard to pick which ones to go to. 

Speaker Dinner

We ate at Bahama Breeze, which I had eaten at for the first time a month or two ago. The food was pretty good. There were probably thirty or forty people who showed up for the dinner, and it was interesting because, for the most part, over the past few years I have been around the same basic areas and so had known most of the people present. I still knew quite a few of the people, but there were definitely as many if not more people whom I had not yet met. The atmosphere was good, as were the conversations. In fact, the wait staff eventually had to remind everyone that we could actually talk while sitting and that we might like to order soon. I ended up chatting away with both Mr. Nelsons, John Welch, Julie Smith and many others.

We next set off to find Taco Mac for the after-dinner party that was being put on by the good folks at SQL Sentry. Unfortunately I relied on GPS and ended up going to the wrong one. Happily I was not the only one; Grant Fritchey and I soon realized our mistake and set off for the correct location. There were far fewer people here, but those of us from out of town were more than willing to sit around talking for a few hours.

Check In / Breakfast

I had no trouble, and saw no trouble, with people checking in to the event. The breakfast was interesting in that it was catered (bagels, energy bars, juice, coffee, etc.) and they decided to put the breakfast table in the center of the room with the vendors. I think this is probably a good way to get some early-morning interaction between attendees and the sponsors. I’m not sure that it would work for all venues, but it seemed to do pretty well here.

Master Data Services (Eric Melcher)

I dropped in on the Master Data Services session as it is a portion of the SQL Server toolset which I have not had any chance to look into until this point.  After sitting through the session, which was at the perfect level for me, I feel like I have a pretty good feel for what it is that the tool provides.  I’m not certain that it is something that the company I work for will get any good use out of, but I can see where it might be useful.  I’m just not certain whether or not it will be worth my time to learn more about it until I have a better use case.  Eric did a great job at providing the information that was necessary to explain everything and seemed very knowledgeable.

SQL Server Parallelism and Performance (Adam Machanic)

This was one of the sessions that I was most interested in once I saw the published list of sessions. This was, I believe, the first time I had the chance to meet Adam, and his session did not disappoint. I learned a good amount about a subject I thought I was pretty good at: reading query plans. He was very clear and concise in explaining how the query engine breaks down parallel queries, what the plans look like, how you can investigate them to understand what is happening, and a few tips on how you might optimize the query structure and underlying tables and indexes. Overall a very informative session.

Lunch

Lunch was one of the only points that seemed to have some problems.  Apparently, during the pre-con the day before there had been some issues, so they decided to change some things up.  Unfortunately, they didn’t convey this throughout the team.  Also, the caterer was a bit late with the food and the corresponding set-up.  However, once these issues were worked through the lunch itself was pretty good.  It was a boxed lunch type deal with sandwiches, cookies, etc.  Not too bad.

Also, during lunch, there were a few sessions.  I’ve heard that the SSIS vs ETL Smackdown was a success, with the line to get in out the door.  However, I was feeling anxious and found a corner to look over my presentation and demos.

Inside SQL Server Wait Types (Bob Ward)

I am pretty sure that I have sat through the recording of this session from PASS Summit 2009. However, it was more than worth sitting through again. Bob Ward has so much knowledge about the underlying product and he is just amazing at conveying information. If you have the opportunity to attend one of his sessions, I would highly suggest you take it up. Seeing how the product passes the wait information to the SQL OS, as well as watching Bob use the debugger, was very interesting. Hearing him explain how he often has to tell customers that, if you are seeing a wait of type X, it is something with your application, not with the SQL Server product, was amusing and informative. The most annoying part of Bob’s sessions was that I had to skip the next one because I was presenting in the same timeslot.

SSIS Data Flow Buffer Breakdown (Eric Wisdahl)

The room was relatively full and there were lots of great questions even before the session started.  Unfortunately, as seems to be the case about half the time, I ran out of time prior to getting through all of the demos and slides.  I really have to take a look at perhaps cutting some of the material.  Perhaps I will do so after the session next week.  All in all, I still feel that I was able to convey lots of good information to the people who attended.  The room seemed engaged throughout.  After talking to a few of the people who were in the room at the after party, I feel like I did a decent job.  I guess I can wait for the survey results whenever I get them in.

Closing

The closing of the event was the only other point that I would say was potentially a problem. There just wasn’t a good place to put all of the people (and there were a lot). Still, there were plenty of good prizes, the event organizers got their messages out and the vendors seemed happy.

After Party

We went to the Five Seasons, a brewery / restaurant that was only a block or two from the event site. I was surprised to see how few people showed up compared to the number who attended the event. However, I really enjoyed hanging out with all of those who did make it. I had a good chance to talk to a few of the people who attended my session, answer their questions and give them some advice. I also got my new favorite shirt (for tonight anyhow) with the slogan: I work with models. The women in the group got the corresponding t-shirt: Data Model. I love it!


Data Warehouse Lifecycle Toolkit (2nd Edition)

I believe I bought this book a week or two after it came out (January 2008) with the intention of reading it immediately. Unfortunately, life happens and I have no idea what came up, but I’m sure it was good. And, to be honest, I’m somewhat happy that whatever it was did come up, because this turned out to be a good time for me to read through it.

I started reading this book on 20 August 2011 and finished it 11 September 2011. Oddly enough, I started this just prior to being told that the data warehousing project we are planning at my current employer may be coming along a bit more quickly than originally thought. As such, I’m ramping up on some refresher material and this was perfect for that need. (I’m also thinking that I’ll be refreshing my SSRS / SSAS knowledge and that I might take a stab at the MCITP 70-452 Business Intelligence Developer exam.)

As some background information, I have read several other Kimball Group books, including:

  • The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition)
    • which is thoroughly dog-eared and highlighted; a rather rare occurrence for me.
  • The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming and Delivering Data
  • The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset.

If you have read more than one of these books you will realize that there is a lot of material that overlaps between them, but that each has its own focus. The focus of the Lifecycle Toolkit seemed to me to be on the project manager, or at least on giving an overview of the process as a whole. It seemed to me to be more of a “whet the appetite” book, with the ability to dive in more expansively through some of the other books referenced above. What The Data Warehouse Lifecycle Toolkit provided, more than any of the others listed above, was a good list of all of the tasks that will be required (or at least recommended) to complete a Data Warehouse project. I believe that a good portion of the text was set aside to drive home how much work is really involved and that this is not a trivial task. It continuously mentioned bringing in the business, and having the project manager read through what is required in each of the major steps so that this could be conveyed back up the chain.

With that said, this book was a somewhat generic introduction to the topic at hand and no one topic was pursued in depth to the point that you would have all of the knowledge necessary to be an expert.  However, it provided lots of great information relating to the relevant portions of the project, from preparing the project definition, to gathering the key players, to what is really necessary for the business requirements and how you can begin to put them together, etc.  There is also lots of information relating to the dimensional modeling, the ETL subsystems, data profiling, data quality and the B.I. presentation layers. 

As such, I’d recommend The Data Warehouse Lifecycle Toolkit to anyone looking to get involved in a Data Warehouse Project, particularly if you are going to be expected to be in a leadership role on the project.  If you are to be involved as a data modeler or an ETL Developer and are a bit strapped for time it might be more advantageous to skip this volume and go straight to The Data Warehouse Toolkit followed by The Data Warehouse ETL Toolkit.  (Alternately, you can start with the Microsoft Toolkit if that is the technology stack you will be using).
