Early this week, I had an interesting challenge. One of our clients has a large database environment: a farm of SQL Server database servers, with databases on each crossing the 1-terabyte mark.
Side bar: Terabyte-sized databases are common these days, but what is not common is a supporting, scalable storage architecture for VLDBs. It is important that VLDBs are architected appropriately to accommodate the size and load, so they can meet the performance metrics in the SLAs. Databases everywhere have been growing continuously for several years now, and a lot of them are hitting (and crossing) the 1-terabyte mark.
But I digress.
This is one such client.
We have two tables (Table_Main & Table_Archive), both with close to a billion records. As part of a daily purge, records from the main table are moved to the archive table and then deleted from the main table. The purge logic is not simple 'date-based' archiving, but, at the same time, it's not too complicated either. The ideal way to do this would have been to implement 'Table Partitioning' and let it play its magic every time we need to purge data. But it's not implemented, yet. So, for now, we need a lightweight alternative to get past the current backlog and gain some breathing room before we implement a long-term plan.
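For context, here is a rough sketch of what the partitioning-based purge could eventually look like. This is only an illustration under assumptions, not the client's actual design: it assumes Table_Main is partitioned on the purge key and Table_Archive is built on the same partition scheme with an identical schema, both of which SWITCH requires.

```sql
-- Hypothetical sketch: "partition 1" stands in for the oldest slice
-- that is eligible for purge. SWITCH moves the whole partition from
-- Table_Main to Table_Archive as a metadata-only operation.
ALTER TABLE dbo.Table_Main
    SWITCH PARTITION 1 TO dbo.Table_Archive PARTITION 1;
```

Because SWITCH only updates metadata, it completes almost instantly regardless of row count, which is why partitioning is the right long-term answer here.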
The current purge process is a convoluted logic, but primarily does four things:
- Get a total count of records
- Using UPDATE, flag qualifying records for purge. Not all at once, just a small batch at a time, using SET ROWCOUNT 1800 and a ROWLOCK hint.
- The reasoning was to avoid lock escalation to a table lock, which could lock the table for a long time, preventing any other queries from reading it.
- Another side bar: In SQL Server, query processing starts with the least amount of locking needed to complete a given task. As needed, locks are escalated to successfully complete the job. Acquiring row locks on 10 rows is different from acquiring row locks on 100 million rows in a 200-million-row table. After a point, the cost of maintaining individual locks goes up so high that it makes sense to acquire a table lock instead (lock escalation). SQL Server makes this judgement call efficiently, so I am not a particular fan of query hints; I use them occasionally, only after careful analysis. The Query Optimizer is very good at figuring out the best way to retrieve data. But I digress.
- INSERT flagged records into an Archive table.
- DELETE flagged records (again using SET ROWCOUNT 1800)
There is a lot of fluff code around the above steps to navigate through the process, but I think most of it could be done away with. Essentially, Table_Main is touched four times for every iteration of the purge. Obviously, this is not very efficient.
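The four steps above can be sketched roughly as follows. The PurgeFlag column and the purge criteria here are placeholders of my own, not the client's actual logic:

```sql
-- Step 1: get a total count of records
SELECT COUNT(*) FROM dbo.Table_Main;

-- Step 2: flag a small batch of qualifying records for purge
SET ROWCOUNT 1800;
UPDATE dbo.Table_Main WITH (ROWLOCK)
SET    PurgeFlag = 1
WHERE  PurgeEligible = 1;   -- stand-in for the actual purge criteria
SET ROWCOUNT 0;

-- Step 3: copy the flagged records into the archive table
INSERT INTO dbo.Table_Archive ( LocalID /* , ... */ )
SELECT LocalID /* , ... */
FROM   dbo.Table_Main
WHERE  PurgeFlag = 1;

-- Step 4: delete the flagged records from the main table
DELETE FROM dbo.Table_Main
WHERE  PurgeFlag = 1;
```

Every step except the count touches Table_Main with a write or a large read, which is where the four-touches-per-iteration cost comes from.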
Using the OUTPUT clause, I modified the DELETE statement to perform most of the above steps in one go. Below is the modified version of the code:
```sql
DELETE TOP (1800) dbo.Table_Main
OUTPUT
    DELETED.LocalID,
    ((DELETED.QtyEarned - DELETED.QtyUsed) * DELETED.Value * (1)),
    ...
INTO dbo.Table_Archive ( LocalID, ...
-- the remaining OUTPUT columns, archive column list, and purge
-- criteria did not survive in the original post
```
Breaking it down:
- Just one command that does most of the tasks, all in one shot:
It DELETEs records based on the same logic and, as they are being deleted, archives them to Table_Archive. It also performs some column manipulations along the way, and since we can list the individual columns we want to archive, the schemas of the main and archive tables do not need to match.
The next step is to identify the number of records that can be DELETEd without causing undue overhead on the server. The number 1800 was identified as the 'sweet spot' for the old logic (with the ROWLOCK hint). With the new logic, I believe that number could be different. So the next step, for me, is to run some tests and see where the new 'sweet spot' is. Hopefully, the new number will be greater than 1800, while still minimizing the overall impact on server resources.
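One way to run those tests is to wrap the new DELETE in a loop and vary the batch size. This is only a sketch: @BatchSize is the value under test, and PurgeEligible stands in for the actual purge criteria, which are not shown in this post:

```sql
DECLARE @BatchSize int = 1800;   -- hypothetical value under test

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) dbo.Table_Main
    OUTPUT DELETED.LocalID      -- plus the other archived columns
    INTO   dbo.Table_Archive ( LocalID )
    WHERE  PurgeEligible = 1;   -- stand-in for the actual purge criteria

    -- stop once the backlog is cleared
    IF @@ROWCOUNT < @BatchSize BREAK;

    -- breathing room between batches, so other queries get a turn
    WAITFOR DELAY '00:00:01';
END
```

Timing each run and watching lock waits and log growth at different @BatchSize values should show where the new sweet spot sits.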
This is pretty exciting!
Hope this helps,