70-451–MCITP–Designing Queries for Performance–Evaluate the use of Row-based Operations vs. Set-based Operations

Requirements

Evaluate the use of row-based operations vs. set-based operations.
This objective may include but is not limited to: row-based logic vs. set-based logic, batching, splitting implicit transactions

My Take

In this section we will go over some of the reasons why set-based processing is usually more efficient than row-based processing (also lovingly referred to as RBAR – Row by agonizing row). We will then list some of the valid areas where row-based processing should be considered. 

Note: I have no clue if the definitions I have given for “batching” and “Splitting Implicit Transactions” are correct (I couldn’t find anything with this terminology in B.O.L.).  However, I believe that even without the particulars for these topics, it is useful to realize that there is a time and a place for row-based logic.  Just probably not as frequently as it appears.

Row-Based vs. Set-Based Logic

In general, SQL Server will be optimized when presented with set-based logic as compared to row-based logic. This is because SQL itself (really all RDBMSs languages) are based off of relational algebra, set theory and predicate logic. It is a fairly radical change in thinking going from logic that consists of operating against a single object at a time to logic which considers sets, or groups of items, at a time.

Some people will naturally think that row-based vs set-based logic can be distilled to whether or not all of the logic is done in a query, or whether or not the logic uses a cursor or a loop. Although these statements are generally a good identifier of what constitutes row-based logic, they are not the only items (and in fact, can be misleading as set based logic might be used over multiple steps using stored results). See Hidden RBAR: Triangular Joins as an example of an apparent set-based query that is causing row-by-row processing. Also, see Subquery Fundamentals.

With that said, there are several valid scenarios where row-based logic is correct. For instance, it is useful for many administrative tasks to scroll through a cursor to query or update every database / schema / table / index / user, to manipulate ordered data, to perform running aggregations, to break up large updates or deletions, etc.

For a greater understanding of this logic, I would recommend Chapter 2 – Set Theory and Predicate Logic, Chapter 3 – The Relational Model and Chapter 5 – Classic Algorithms and Complexity of Inside Microsoft SQL Server 2008: T-SQL Querying (link to amazon) and Chapter 8 – Cursors of Inside Microsoft SQL Server 2008: T-SQL Programming.

Batching

One meaning that this objective might take describes batches of sql code. This would be talking about several statements, separated by a semi colon (;) that are completed when the batch is submitted or when a GO (or other batch terminator) is presented. Presenting code to the database engine in this fashion will prevent unnecessary round trips and network traffic.

Another meaning that this objective might take would describe the act of breaking query results into smaller, more selective, sizes. For instance, grabbing only the records which will be presented on the current web page as opposed to returning the whole set and scrolling through until you find the set to display within the returned set.

Splitting Implicit Transactions

For this subject, I will assume that splitting an implicit transaction is used to describe the act of splitting up a large operation, such as an update or delete. An operation of this size would have the potential to lock a table, to cause the log file to grow at a large rate or to have the potential of causing a rollback that would not complete in a timely fashion. Recently I have had to deal with this issue at work when I would need to batch a job that deleted old information from a table. This table was vital to running the application on the front end and when it locked the application would go belly up. To deal with this situation we had to ensure that the table would not lock and were able to implement this by using a while loop and deleting the records in smaller batches (~4k at a time if I remember correctly, just enough to get under the escalation of the locks to the table level).

About these ads
This entry was posted in SQL and tagged , , . Bookmark the permalink.

One Response to 70-451–MCITP–Designing Queries for Performance–Evaluate the use of Row-based Operations vs. Set-based Operations

  1. Pingback: MCITP 70-451 Links Page | Destination: Change

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s