Interesting topic today:
On a busy production environment there are always several processing reading, updating, deleting & inserting records into different sections of the same table’s B-Tree. This is part of normal operations. To a large extent B-Trees allow these operations to occur as efficiently as possible minimizing overhead or contention. On occasions there will be overlaps that results in transient blocking; Where Process A’s UPDATE is blocking Process B’s SELECT operation (if there is some overlap of records between both operations).
While this is unfortunate, this is a natural outcome at higher concurrency levels. Good thing is this is just transient. Sql Server is very efficient in juggling these processes so all operations complete as fast as possible, while maintaining integrity of the data (ACID properties).
In newer versions of Sql Server, starting 2008, there are several ways to minimize this overlap or contention. Isolation Levels, Table Partitioning, Efficient code constructs, Best practices for code, etc. Today, in this post, we’ll touch on a specific example in such busy VLDB systems.
In our production system, we have a busy large table with billions of records. A lot of processes are simultaneously trying to SELECT, UPDATE, DELETE & INSERT into this table every second of the day; Obviously, this results in some occasional, transient contention and short bursts of blocking. In worst case, there are events of extended blocking or repetitive extended blocking or deadlocks. This results in heavy latency in application performance.
Adding to this, we have a new requirement to DELETE records that meet certain conditions; all through the day. Now this gets tricky. If it were daily purges (at night), we could handle them through partitioning. But these do not align with partitioned scheme, so we needed a new solution.
Introducing READPAST table hint.
In short, when we use this table hint, Sql Server when attempting to read records, if they are locked by a different process, it moves on. It just does not read them. So, row-level locks are skipped. In default behavior, when qualifying records are locked by a different process, our process waits until they are released or ‘LOCK_TIMEOUT’ occurs.
Note: This behavior is not good for general queries. Ex: When you want to get a SUM of all order amounts, you want correct number. Not a ball park figure. So, it these situations it does not apply.
For us, we just needed a way to DELETE as many records as possible, when possible. It does not matter if we DELETE all or some. Just deleting as many as possible without causing any disruptive overhead is important.
We did something like this:
Just a code snippet copied from a larger code section, so it is not complete.
-- In a loop keep deleting data
WHILE @RunAgain = 1
-- Reset loop flag
SELECT @RunAgain = 0
DELETE TOP (1000)
FROM dbo.SomeTable WITH (ROWLOCK, READPAST)
WHERE Deleted = 1
-- Gather status information after execution
SELECT @Err = @@ERROR
, @TotalPurged = @TotalPurged + @@ROWCOUNT
, @RC = @@ROWCOUNT
, @Iteration = DATEDIFF(SECOND, @LoopTime, GETDATE())
SELECT @ProgressReport = @ProgressReport + 'Touch point at : ' + CONVERT(VARCHAR, @RC) + ' deletes took ' + CONVERT(VARCHAR, @Iteration) + ' seconds (rounded down)' + CHAR(13) + CHAR(10)
, @LoopTime = GETDATE()
-- Keep purging in a loop, only if both conditions are met
-- 1) We actually purged 1000 records
-- 2) The purge has been running for less than 5 minutes
IF (@RC = 1000) AND (DATEDIFF(SECOND, @StartTime, GETDATE()) < 580)
SELECT @RunAgain = 1