Feeds:
Posts
Comments

Posts Tagged ‘Indirect Checkpoints’

Interesting one today:

A few months ago, we saw a point on CHECKPOINT and its counterparts. Now, lets dig a little deeper into it.

Starting Sql Server 2012, we have a new advanced option called TARGET_RECOVERY_TIME; It helps with setting Indirect Checkpoints to change the recovery time after a crash.

Automatic CheckPoints are the default (system level setting in sys.configurations) settings that decide how frequently the dirty pages in buffer pool are written to disk. Usually it is once every minute (generalization), but there are some nunaces to it (see NOTE below).   This helps in reducing the amount of time it takes to bring the system back to working after a crash.

Note:1 minute is just a general guideline, the actual interval depends on the amount of traffic on the database system. For higher traffic systems, there will be a lot of transactions each second, so there will be more dirty pages. So, the CHECKPOINT (background writer) kicks off more frequently than once a minute.

With new Indirect Checkpoints, a database level setting, we could configure a custom checkpoint settings to enable faster & predictable recovery times after crash.

TARGET_Recovery_Time_Setting

Context:

When we UPDATE/INSERT data into Sql Server, it is written to buffer pool, not disk. Only when (automatic/default) CHECKPOINT occurs, all the dirty pages in buffer pool are written to disk. This occurs at one minute intervals (varies based on workload, but 1 min in a good general guideline). So, approximately every minute, at the 60th second, you’ll see a HUGE spike in I/O to the MDF/NDF files as all the dirty pages are being written to disk. Then it waits for another ~60 seconds for the next CHECKPOINT, to write all the dirty pages to disk again. So, you see a pattern here.

The entire dirty page workload is being written to disk in one shot —  then wait (sit idle) for the next 60 seconds; And then again write the next workload to disk in one shot. As you can see, the I/O subsystem will be more active during these CHECKPOINT periods than at anytime in between.

If your storage is designed to handled, let’s say, 100 MB/sec and you have 1000 MB worth of dirty pages since the last checkpoint (1 min), it might take storage subsystem more than 10 seconds to fully process the workload. This results in unnecessary spikes in I/O metrics.

See the image below, where it shows the Maximum reading on the amount of dirty pages written to disk.

Default_CheckPoint_1min_interval

This presents an incorrect picture that there is something wrong with your storage. While staying idle the remaining 50 (to 45) seconds of the minute.

--
--	Change TARGET_RECOVERY_TIME
--
ALTER DATABASE SampleDB
SET TARGET_RECOVERY_TIME = 30 SECONDS;
GO

Advantages

  • I/O bottlenecks:
    • Now, if we could write more frequently, then the same workload could be accomplished without triggering off any false positive metrics (and also reducing recovery time after a crash).
    • In our above example, if the same 1000 MB dirty page workload per minute, could be written 2 times within a minute, we’ll have ~500MB workload every 30 seconds.
    • Now, the same storage metrics will show much better picture.
    • Then we could tweak the design of storage to the requirements of Sql Server dirty page workload.

TARGET_Recovery_Time_30seconds

  • Indirect Checkpoints enable you to control recovery time after a crash to fit within your business requirements

Caution:

  • Adjustment Period:
    • When we first change this setting, Sql Server makes some changes to the amount of dirty pages in memory. This may result, initially, in sending some pages to disk immediately to achieve the new Recovery time.
    •  During this adjustment period, we’ll see the above disk activity as Sql Server is going through the adjustment period.
    • Once the adjustment period is completed, Sql Server will go back to a predictable schedule in its attempts to send pages to disk.
    • So, do not be alarmed during this adjustment period.
  • Preparation:
    • For OLTP workloads, sometimes this setting could result in performance degradation. Looks like the background writer, that writes dirty pages to disk, increases total write workload for server instance.
    • If different databases have different settings, the instance ends up doing more work, which might result in performance degradation.
    • So, this setting needs to be tested in performance environment before enabling it in Production environments.
In the next post, we’ll see the interaction between ‘recovery interval‘ & TARGET_RECOVERY_TIME setting.
Hope this helps,
_Sqltimes
Advertisements

Read Full Post »