Design scalable database solutions.
This objective may include but is not limited to: scale up vs. scale out, federated databases, distributed partitioned views, scalable shared databases, replication, offloading read-only query (database mirroring)
In this section we will very briefly take a look at the age old question of scale out vs scale up and what that really means with SQL Server. After this discussion, we’ll take a look at some of the options of distributing the load among other servers. I am a little surprised that we aren’t really talking about service broker here. As an aside, prior to this I don’t remember even hearing of scalable shared databases as they related to sql server and had only really heard of federated databases as they related to a DB2 setup that isn’t anything at all like what the sql server version (or at least our implementation wasn’t).
Coincidentally, I just purchased Pro SQL Server 2008 Mirroring by Robert L Davis and Ken Simmons… So, I’ll probably know a lot more about the subject after reading through (when I get the chance, right now about fourth in line for upcoming books to read).
Scale up vs Scale out
When the database that you are currently using begins to hit the limits of operation due to resource constraints, it is time to make more resources available. When doing so, you have the choice of scaling up, meaning that you are buying a bigger single server (or multiple servers if involved in high availability) that will handle the whole load, or scaling out, meaning that you are purchasing many smaller boxes that will share the load. However, this decision might not be quite as simple as it sounds. Scaling out could mean significant changes to the underlying architecture and the application. I think that this is the main reason that most companies choose to instead scale up. It’s simple, you just throw more hardware at the problem, and you’re done. There is no cost to re-architect the application or database and no cost associated with investigating the right solution for scaling out as well as the particular options within those options. We will look in a little more detail at some of the options for scaling out in SQL Server below. However, some methods that aren’t mentioned include using a Service Oriented Architecture (SOA) with Service Broker, and using a data dependent routing mechanism within the application to route the inquiries and updates to the appropriate server (with no underlying relationship between the various servers).
Some interesting links:
- Scaling out SQL Server 2005 (Not the most current version, but all of the options apply and this is probably the most relevant white paper available)
- Planning, Implementing and Administering Scale out Solutions with SQL Server 2005
- Scaling Up vs. Scaling Out: Hidden Costs (Jeff Atwood) – Comments have interesting discussion
- Sql Server Scale Out
In sql server, Federated Databases are essentially a large number of servers that collectively, as a single “federation”, are used to process the load (another term for this is a database “shard“). Although each server in the federation is managed independently, it is presented to the applications as a single database server. In these systems, the data is horizontally partitioned. This type of architecture is generally associated with distributed computing and uses distributed partitioned views, as explained below. See the article Designing Applications to Use Federated Database Servers for more information about how the back end should probably be set up as it relates to the systems which will be accessing the data.
Distributed Partitioned Views
A Distributed Partitioned View is a partitioned view wherein at least one of the participating tables resides on another server. For more information on Partitioned Views, refer back to the post on Designing Views. As a reminder, there are special rules for Modifying Data in Partitioned Views. In the discussion of Federated Database Servers Planning Guidelines, there was some talk of the differences between Symmetric Partitions and Asymmetric Partitions. Essentially, A symmetric partition is one in which it is possible to divide out the data in a fashion which would lead to approximately 80% of the queries against the data on a server being able to reference the data contained in all of the other tables which are involved in a referential integrity relationship with the distributed partitioned view without having to rely on the other member servers.
When looking into this subject, we’ll want to take a look at some of the guidelines for Designing Distributed Partitioned Views as well as how the server deals with Resolving Distributed Partitioned Views.
Special note of the day: Partitioned views are one of the few places where “Select *” is recommended in a production system!
Scalable Shared Databases
Scalable shared databases are an architecture of using common read-only “reporting” databases which are built upon a read only volume and then mounted on multiple reporting servers. This allows for the ability to scale out the workload of your reporting servers by supplying an identical view of the underlying data to all of the servers without the cost of having to maintain the data separately. It also allows for the workload on these servers to be contained such that a single query will not negatively impact the whole of the reporting environment since each database is on it’s own instance, with its own resources (i.e. memory, cpu, i/o, tempdb, etc).
For maximum performance with the scalable shared databases, you will probably want to have two separate volumes that can be alternated, since the databases will have to be detached and reattached any time that you will be updating the data (and in order to update, the mounted volume will need to be set to writeable).
In my mind, I see SQL Server Replication being covered in more detail from the administrative side of the house (see the skills sections for MCTS 40-432 and MCITP 40-450). However, it should be considered when designing a scale out solution since it is an efficient way to distribute data across multiple servers. It does this by copying and distributing the data from one database to another and synchronizing these databases to maintain consistency. So, in effect, you are going to have a full copy of the data on each server. This data will be in a consistent state
Replication often uses metaphors associated with the publishing business in order to describe the role that each of the members will play. Among the roles are the following:
- Publishers – make the data available to the other locations
- Distributors – store the data for the publishers and runs the ‘replication agents’
- Subscribers – consume the data from the publisher (Can consume from one or more publishers)
- Articles – the database object that is involved in the publication (views, tables, sprocs, etc)
- Subscription – A request of the publication from the publisher to the subscirber
Since I don’t think that we will really need to understand everything that there is to know about replication, particularly as only a single bullet point of the designing scalable database solutions skillset, I think it is safe to leave this off with just the knowledge of approximately what it is that replication is and how it works. For a fairly straightforward read, without having to click from article to article like you do in B.O.L. for this topic, you can see the SQL Server Replication Crib Sheet for more information.
Offloading Read-only Query (Database Mirroring)
With Database Mirroring the data once again resides in multiple places. In this case, there is a principal database, where the original data resides, and the mirror server, where the copy of the data is sent. This copy is sent by “redoing” every data modification action performed on the principal on mirroring partner by sending the transaction log records for that action from the principal to the mirroring partner. Mirroring can take place with one of two options, high-performance mode or high-safety mode. In high-performance mode, the mirroring takes place asynchronously wherein the principal returns information to the client as soon as the command is sent to the mirroring partner. In high-safety mode, the mirroring takes place in a synchronous fashion in that the principal must wait for confirmation from the mirroring partner prior to returning to the client. With the high-safety mode you have the option to enable automatic failover (requires a third server as the “witness”).
Some important items to note, with database mirroring the database must be in full recovery mode in order to function and with mirroring there can only be two servers which participate (however, the next version of sql server, codename denali, will apparently increase this limitation along with many other improvement with HADRON or High Availability Disaster Recovery Always On).