Archive for the ‘High Availability’ Category

Quick one today:

In replication, transactions that come into the system are first stored in the Distribution database; from there, they are replicated to each Subscriber. The data stays there for several hours, and how long is determined by the ‘Transaction Retention‘ setting on the Distributor.

Similarly, as the data is replicated, log entries are made in the Distribution database. These entries also have a retention policy, which can be set using the ‘History Retention‘ setting.

Go to the Distributor instance: Replication >> Right click >> Distributor Properties

Distributor Retention Properties

Under the General tab, we see these settings. Change them as needed to achieve longer or shorter retention of both data and history.

Retention Policy
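The same settings can also be changed programmatically. A minimal sketch, assuming the distribution database has the default name 'distribution' and using illustrative retention values (in hours):

```sql
-- Run on the Distributor; values shown here are illustrative only
EXEC sp_changedistributiondb
      @database = N'distribution'
    , @property = N'max_distretention'   -- transaction retention (max)
    , @value    = N'72'

EXEC sp_changedistributiondb
      @database = N'distribution'
    , @property = N'history_retention'   -- history retention
    , @value    = N'48'
```

There is also a 'min_distretention' property for the minimum transaction retention, which the GUI sets from the same screen.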

Hope this helps,

Read Full Post »

Interesting one today:

On one of our production machines, we recently added a new LUN to a SQL cluster. A task like this is a team effort: sysadmins perform some steps and DBAs carry out the rest. In this article, the main focus is on the steps after the LUN is added to the OS & Sql Cluster by the sysadmins. For context, we’ll start with high-level steps before going into details.

Sysadmins steps

  1. Add new storage to the machine/OS as an available storage
  2. Format the available drive with appropriate settings (cluster size) and add it as a new drive
  3. Make drive available to the Cluster using “Add Disk” screen in FailOver Cluster Management tool.

DBAs steps

  1. Add available storage to Sql Cluster
  2. Configure dependency (and check the report before & after)
  3. Add data file on the new cluster storage drive

Here we’ll cover the DBA steps in detail:

Some of these steps were covered in a different recent article as part of dealing with an error message, but here we’ll cover it as a task by itself (which it is).

Add New Storage

Once sysadmins have made the new storage an ‘available storage’ to the OS Cluster, it needs to be added as a new storage location to the SQL Cluster.

In FailOver cluster manager, go to Sql Server Resource Group for this SQL Cluster and right click for detailed options and choose “Add Storage” (see image below)


Once successful, go to Storage\Disks in FailOver Cluster Manager to confirm the availability. See image below:


Configure Dependency

Adding the storage is an important step; equally important is adding the new drive to the Sql Cluster Dependency Chain. The Dependency Chain informs Sql Server “how to act” when any resource in the Cluster becomes unavailable. Some resources automatically trigger cluster failover to the other node; some resources do not. This decision is made based on the configuration of the Dependency Chain.


Critical: The data drive/LUN that holds database files is critical for the availability of the Sql Cluster. So, if it becomes unavailable, failing over to another available node is imperative to keep the cluster available.

Non-Critical: In some scenarios, Sql Server Agent is not considered Critical. So, if it stops for some reason, the Cluster will make multiple attempts to restart it on the same node, but this may not necessarily cause a failover.

This is a business decision. All these “response actions” will be configured in Cluster settings.

Now, check the dependency report (before): we can see that the new drive exists in the Cluster, but is not yet added to the Dependency Chain.


To Configure Dependency Chain, go to the Sql Server Resource Group under Roles in FailOver Cluster Manager. See the image below for clarity:

Then go to the bottom section for this Resource Group, where all the individual resources that are part of this Resource Group are displayed.

Under “Other Resources“, right click on Sql Server Resource and choose properties.



In the “Sql Server Properties” window, we can see the existing resources already added to dependency chain logic.


Now, go to the end of the logic list and choose “AND” for condition and pick the new Cluster Storage to be included. See image below for clarity:


After saving the settings, regenerate the Dependency Chain report. Now, we’ll see the new drive as part of the logic.
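The same dependency change can also be scripted with the FailoverClusters PowerShell module. A hedged sketch; the resource names "SQL Server (MSSQLSERVER)" and "Cluster Disk 5" below are assumptions, so adjust them to your cluster (Get-ClusterResource lists the actual names):

```powershell
Import-Module FailoverClusters

# Add the new disk as an AND dependency of the Sql Server resource
Add-ClusterResourceDependency -Resource "SQL Server (MSSQLSERVER)" -Provider "Cluster Disk 5"

# Regenerate the dependency report for the Sql Server role
Get-ClusterResourceDependencyReport -Group "SQL Server (MSSQLSERVER)"
```

Add-ClusterResourceDependency ANDs the new dependency onto the existing expression, which matches the “AND” condition chosen in the GUI above.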


Add Database Data File to New Cluster Storage

Now that the new drive is ready, we can easily add a new data file in the new location.

-- Add data file to new storage location (database name assumed from the file names)
USE [master]
GO
ALTER DATABASE [SampleDB]
ADD FILE (
		  NAME 			= 	N'SampleDB_Data3'
		, FILENAME 		= 	N'U:\MSSQL\Data\SampleDB_Data3.NDF'
		, SIZE 			= 	3500GB
		, FILEGROWTH 	= 	100GB
		, MAXSIZE 		= 	3900GB
)
GO
Hope this helps,

Read Full Post »

Interesting one today:

MSDTC is one of the popular software components present on all Windows systems. It is one of the Windows Operating System components that Sql Server relies on to perform some crucial tasks (when needed).

What does it do?

MSDTC, the Microsoft Distributed Transaction Coordinator, is essentially, as the name suggests, a coordinator/manager that handles transactions distributed over multiple machines. Let’s say we start a transaction in which one step queries data from a different Sql Server instance on a different physical machine; MSDTC comes into action for such tasks, which need transaction coordination across physical machines. It executes the section of code that is supposed to run on the remote machine and brings the results back to the local Sql instance. If any issue on the remote machine results in a rollback, MSDTC makes sure the original transaction on the local machine also rolls back safely.

How does it do it?

MSDTC comes with the necessary Operating System controls and memory structures to carry out these operations independently of the Sql instances, while keeping the integrity of the transaction across the multiple physical Sql machines, i.e., the complete two-phase distributed commit protocol and the recovery of distributed transactions.

Where does Sql Server use it?

The key point here is that these need to be Sql Instances on different physical machines. Queries that request data across different instances on the same physical box do not go through MSDTC.

MSDTC is used by query activities like

  • Linked Servers
  • RPC (Remote Procedure Calls)
  • Distributed queries (OPENQUERY, etc.)
  • etc…

So, every time we run SQL queries that use the above techniques, they rely on MSDTC to carry out the operation while maintaining transaction integrity.
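As a hedged illustration (the linked-server, database, and table names below are made up), a linked-server query inside an explicit distributed transaction is the classic case where MSDTC steps in:

```sql
-- [RemoteSrv] is an assumed linked-server name; both INSERTs commit or
-- both roll back, coordinated by MSDTC via two-phase commit.
BEGIN DISTRIBUTED TRANSACTION

    INSERT INTO dbo.Orders (OrderID) VALUES (1001)                      -- local instance
    INSERT INTO [RemoteSrv].SalesDB.dbo.Orders (OrderID) VALUES (1001)  -- remote instance

COMMIT TRANSACTION
```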

Who else uses it?

MSDTC is an Operating System resource that is used by applications other than Sql Server to perform distributed transaction activities; for example, eXtended Architecture (XA) applications.

Is MSDTC required?

MSDTC is not required for Sql Server installation or operation. If you are only going to use the Database Engine, then it is not required or used. If your Sql Server uses any of the above-mentioned query techniques (Linked Server, OPENQUERY, etc.), or SSIS or Workstation Components, then MSDTC is required.

If you are installing only the Database Engine, the MSDTC cluster resource is not required. If you are installing the Database Engine and SSIS, Workstation Components, or if you will use distributed transactions, you must install MSDTC. Note that MSDTC is not required for Analysis Services-only instances.

What about Sql Cluster?

The same rules as above apply to Sql Clusters as well, with one additional rule: if you have two instances on the same machine (that are clustered across different physical machines), then you’ll need MSDTC, since the Cluster could fail over to a remote machine at any time.

Let’s take an example:

Let’s say Instance1 is on physical machines A & B, with B as active node. Instance2 is on machines B & C, with B as active node. A query going from Instance1 to Instance2 will need MSDTC (even if both the instances are active on the same physical machine B at that given point in time.).

This is because there is no guarantee that they will remain on the same physical machine; they might fail over to other machines, resulting in the instances being on physically different machines. So MSDTC is required (when distributed operations are performed).

Also, recent Sql Server versions do not require MSDTC during Sql Server installation.

Other points in a Clustered Environment

We could have multiple instances of MSDTC as different clustered resources (along with the default MSDTC resource).

In a scenario with multiple MSDTCs, we could configure each Sql Cluster resource to use a dedicated MSDTC instance. If no such mapping exists, it automatically falls back to the default MSDTC resource.

Hope this helps,

Read Full Post »

Interesting article today on troubleshooting replication errors.

A few weeks ago, on production, we received an alert on replication failures. Upon further inspection, the error looks like this:

Command attempted:
if @@trancount > 0 rollback tran
(Transaction sequence number: 0x001031C200001B06000700000000, Command ID: 1)

Error messages:
The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598
The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598

Identify Root Cause:

To understand the course of action, we first need to understand the underlying issue. Go to Replication Monitor >> open the Details Window on the subscription that has errors. Go to the ‘Distributor to Subscriber’ tab for more details. See the image below:


Now we see that replication is attempting to perform an action on the Subscriber, but the row does not exist. So, let’s find out more.

Find the exact command that is being replicated (or executed on the Subscriber as part of replication) and throws this error. Use the replication procedure sp_browsereplcmds for that.

Query the Distribution agent ID from dbo.MSdistribution_agents and use it in the query below.
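For example, something along these lines (run in the distribution database) lists the agents with their ids, so you can pick the one for the failing subscription:

```sql
-- Locate the Distribution Agent id for the failing subscription
SELECT id, name, publisher_db, subscriber_db
FROM dbo.MSdistribution_agents
```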

-- Uncover the replication command throwing error
EXEC sp_browsereplcmds @xact_seqno_start = '0x001031C200001A620004000000000000'
                     , @xact_seqno_end = '0x001031C200001A620004000000000000'
                     , @agent_id = 49
                     , @publisher_database_id = 3

You’ll see something like this:


Under the command column, we’ll see the exact command that is running into this error.

--  Error 20598 occurring in
{CALL [mtx_rpldel_ReportCriterion] (908236,71357,250,-1)}

Now, let’s go to that stored procedure ‘mtx_rpldel_ReportCriterion’ and see which tables are involved in the manipulation. In my scenario, the table ReportCriterion does not have the record with ID = 908236.


Once we understand the root cause, we have a few options.

  1. Data Integrity: Looks like we have synchronization issues between Publisher and Subscriber. If it is a non-production environment or an environment where reinitializing is an option, then we could take that route to sync up the data first.
    1. Once data integrity issues are resolved, all subsequent replication commands will succeed.
  2. Manual fix: Manually insert the missing record at Subscriber and then allow replication to perform its operations on the new record.
    1. With this option, the more records we uncover as missing, the more manual work is required. It’s not ideal, but it is a workaround to get things going again.
  3. Ignore, for now: In some situations, until further solution is identified, we may need to ignore this one record and move forward with rest of the replication commands.
    1. Take necessary precautions to make sure there are no more such missing records. Or gather a list of all missing ones.
    2. Configure replication to ignore error 20598 using the -SkipErrors parameter. There are a couple of ways to achieve this; here we’ll look at one.
    3. Go to the Agent Profile for this particular Distributor Agent. One of the profiles allows us to skip certain errors. See the image below.
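Alternatively, the same effect can be had by appending the -SkipErrors parameter to the Distribution Agent’s job-step command. A sketch only; the server and database names below are placeholders:

```
-Publisher [PUBSRV] -PublisherDB [PubDB] -Distributor [DISTSRV]
    -Subscriber [SUBSRV] -SubscriberDB [SubDB] -Continuous -SkipErrors 20598
```

For reference, the built-in agent profile that skips errors ("Continue on data consistency errors") works the same way under the covers.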

For more information, please refer to Microsoft Support article on similar issue.

Hope this helps,

Read Full Post »

Quick one today:

In a previous post, we covered one of the techniques used to generate a text version of cluster log files using the command prompt. Today, we’ll cover another technique, a more common one going forward: using PowerShell.


In Windows Server 2012, it looks like the cluster.exe command-prompt interface is not installed by default when you install FailOver Clustering.



So, we’ll use PowerShell cmdlets to generate these cluster logs.

#  Generate cluster log from SampleCluster and save in temp folder.
Get-ClusterLog -Cluster SampleCluster -Destination "c:\temp\"
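On a busy cluster the full log can get large. Assuming default settings, these optional parameters narrow it down (both are standard Get-ClusterLog parameters):

```powershell
# Only the last 30 minutes of entries, with timestamps converted to local time
Get-ClusterLog -Cluster SampleCluster -Destination "C:\temp\" -TimeSpan 30 -UseLocalTime
```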

When you run it in a PowerShell window, the response looks something like this:


Hope this helps,

Read Full Post »

Quick one today:

When we install Sql Server software on a Windows machine, sometimes we need to install Service Packs (SP): SP1, SP2, SP3, etc. As service packs are cumulative, this can be kept down to a two-step process (RTM install, then the latest SP). In some scenarios, even this two-step process is not possible, due to version conflicts that do not allow RTM Sql Server versions on later versions of the OS.

SlipStream allows us to combine these two steps into one: combine the RTM version with the Service Pack (SP) and install them together.

Benefits:
  • Reduce or remove need for multiple restarts
  • Reduce install duration times
  • Avoid version conflicts mentioned above (RTM Sql Server versions may not work on later versions of OS)

Preparatory Steps:

  • Install any prerequisites like .Net 3.5
  • Install Windows Installer 4.5
  • Download and extract the SP file to a local drive, e.g., C:\Temp\SP1\
  • Run the Service Pack (SP) first to install Setup & Support files. This prevents any surprises when actual install is performed.

SlipStream Install:

  • Open Command prompt as Administrator
  • Go to the location where install DVD is located.
  • Use this command to run install in SlipStream mode.
    • Setup.exe /PCUSource=C:\Temp\SP1
  • After a second or two, the Sql Installer Wizard opens
    • Walk through the normal install steps.
  • When you get to the “Ready to Install” screen, it indicates that SlipStream is engaged during this install (see in the image below).
SlipStream SqlCluster Install

SlipStream SqlCluster Install

  • Perform restart if you need to.

Please note that this is just one of the techniques. On Microsoft Support, they have other options detailed with troubleshooting techniques.

Hope this helps,

Read Full Post »

Interesting one today:

In replication, there are several amazing features & configurations that make it robust, dependable & high-performing. These settings need to be leveraged correctly to get the best performance for each environment. Today, we’ll cover a popular setting called NOT FOR REPLICATION on IDENTITY columns.


In short, when NOT FOR REPLICATION is enabled on an IDENTITY column (or on other constraints), the IDENTITY value is not incremented by INSERTs that occur due to replication traffic. All other direct application traffic increments the IDENTITY value as usual.

Imagine a Sql Server Publisher, let’s say P, that is publishing data to a Sql Server Subscriber, let’s say S. Both P & S have a table called SampleTable with an IDENTITY column called ID. To make the difference easy to see, let’s make their IDENTITY definitions different at each location (P & S).

  • At Publisher, the IDENTITY value is defined as (1,10).
    • So, its values will be 1, 11, 21, 31, 41, etc.
  • At Subscriber, it is defined as (2, 10).
    • So, its values will be 2, 12, 22, 32, 42, etc.

The Set Up

With the above points, let’s create the tables and set up replication between P & S. Following is some of the code used to create the table at Publisher (P).

At Publisher

CREATE TABLE dbo.SampleTable(
     ID     INT  IDENTITY(1,10)  NOT NULL  CONSTRAINT PK_SampleTable PRIMARY KEY
   , Name   VARCHAR(20)  NULL      DEFAULT('A')
)

At Subscriber:

Similarly, on the Subscriber, create a similar table with a different IDENTITY definition (and NOT FOR REPLICATION).

CREATE TABLE dbo.SampleTable(
     ID     INT  IDENTITY(2,10)  NOT FOR REPLICATION  NOT NULL  CONSTRAINT PK_SampleTable PRIMARY KEY
   , Name   VARCHAR(20)  NULL      DEFAULT('B')
)

So, there is no overlap between IDENTITY values generated at P & S.

Now let’s watch their behavior as data is INSERTed into both servers.

  1. When data is INSERTed directly into each location (P & S)
  2. When data is INSERTed indirectly into S due to replication traffic from P

Below is some more code used to check IDENTITY values, insert new data, etc. in these experiments.

-- Query the data
SELECT *
FROM dbo.SampleTable

-- Check the current value of the IDENTITY column at each step
DBCC CHECKIDENT('dbo.SampleTable', NORESEED)

-- Insert data directly into P (IDENTITY generates the ID)
INSERT INTO dbo.SampleTable (Name) VALUES ('A')

-- Manually insert data to introduce interesting scenarios
SET IDENTITY_INSERT dbo.SampleTable ON
INSERT INTO dbo.SampleTable (ID) VALUES (201)
SET IDENTITY_INSERT dbo.SampleTable OFF

Run Experiments

With the above setup, let’s run through some scenarios and observe Sql Server’s behavior in each situation.

Scenario 1:

When data is INSERTed directly into P:

  • The IDENTITY values increment with each insert as 1, 11, 21, 31, etc.
  • Subsequently, those records are replicated to S, with same IDENTITY values.
  • But in all of this, the IDENTITY value at S, stays at 2
    • Since NOT FOR REPLICATION is set on the IDENTITY column on S.

When data is INSERTed directly to S:

  • The IDENTITY values are incrementing as per definition to 2, 12, 22, etc
  • Irrespective of the replication traffic from P, the IDENTITY at S only depends on the records INSERTed directly into S.
  • Table at S, has records from both P & S.
    • S will look something like: 1, 2, 11, 12, 21, 22, 31, 32, etc
    • Table at P will look like: 1, 11, 21, 31, etc


Scenario 2: Manual INSERTs with IDENTITY_INSERT

When a manual entry is made at P (using IDENTITY_INSERT) with a new IDENTITY value that does not match the pattern of the IDENTITY definition, subsequent IDENTITY values at P are based on the highest entry in the table. It uses the same INCREMENT definition, but increments from the current highest value in the table.

At Publisher:

  • Let’s say the SampleTable, at P, has entries like 1, 11, 21, 31 with next IDENTITY value as 41.
  • Now, if a new record is entered manually using IDENTITY_INSERT with a new value of 26, it is successfully INSERTed.
    • Next IDENTITY value still remains at 41.
  • We can keep repeating these steps with different values like 7, 9, 13, 15, 17, 25, 28, 29 (as long as they are below 31).
    • INSERTs will be successful with no impact to next IDENTITY value, which is still at 41.
  • Now, if you perform a regular INSERT, the new record will get IDENTITY value as 41.

At Subscriber:

  • At S, all new entries, 26, 7, 9, 13, 15, 41, etc, are successfully INSERTed with no impact to IDENTITY definition at S.
    • At S, the next identity value is still set to 42
  • Any new direct INSERTs at S will get IDENTITY values consistent with its previous behavior, i.e., 42, 52, etc

Scenario 3: PRIMARY KEY Violation

Now, let’s make a manual entry at P that matches the next IDENTITY value at S.

  • For this, let’s assume that the highest value at P is 41, with next IDENTITY value as 51
  • At S, the current highest value is 52, with next IDENTITY value as 62.

Introduce problems:

  • At P, perform a manual INSERT (with IDENTITY_INSERT), with ID value as 62.
    • INSERT is successful at P; And it is replicated to S successfully.
  • After above operation, next IDENTITY value
    • At P is set to 72 (62+10).
    • At S, it is still at 62 (even though a new record is INSERTed with 62). Since NOT FOR REPLICATION is set, replication traffic does not influence IDENTITY increments at S.
  • Now, when a new record is directly INSERTed into S, the next IDENTITY value will be computed as 62, which results in PRIMARY KEY violation.
    • Violation of PRIMARY KEY constraint 'PK_SampleTable'. Cannot insert duplicate key in object 'dbo.SampleTable'
    • Interestingly, the next IDENTITY value for S, is incremented to 72.
    • Subsequent direct INSERTs into S will be 72, 82, etc

Vicious cycle:

  • In the above test, the next IDENTITY value at P is still at 72.
  • Similarly, the next IDENTITY value at S, is also set to 72.
  • So any new inserts at P, will be replicated to S with 72, 82, 92, etc.
    • If there are any existing records, at S, with same identity values, then replication traffic (from P to S) will fail with primary key violation.
    • But if S does not have any records with those identity values (from P), then replication traffic (i.e., 82, 92, 102) from P is successfully INSERTed into S
    • Any new traffic, directly at S, will run into PRIMARY KEY violation.
  • So, the summary is, one BAD entry is all it takes to screw up the IDENTITY definition with NOT FOR REPLICATION.


  • When this happens, just RESEED the IDENTITY value at P to a non-overlapping value that is consistent with its expected behavior.
    • Something like 151 or 201, to give it a fresh start with no overlaps with existing records at P or S.
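A minimal sketch of that reseed, using the sample numbers above:

```sql
-- At P: reseed so the next generated value is past every existing ID at P and S
DBCC CHECKIDENT ('dbo.SampleTable', RESEED, 201)
-- The next directly-INSERTed row at P gets 211 (201 + the increment of 10)
```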
Hope this helps,

Read Full Post »

Older Posts »