Interesting one today:
In replication, there are several amazing features & configurations to make it robust, dependable & highly performing. These settings need to be correctly leveraged to squeeze out the best performance needed or applicable for each environment. Today, we’ll cover a popular setting called NOT FOR REPLICATION on IDENTITY columns.
Concept:
In short, when NOT FOR REPLICATION is enabled on IDENTITY columns (or other constraints), the IDENTITY value is not incremented when INSERTs occur due to replication traffic. But all other direct application traffic will increment IDENTITY value.
Imagine a Sql Server Publisher, let’s say P, that is publishing data to a Sql Server Subscriber, let’s say S. Now, both P & S have table called SampleTable with an IDENTITY column called ID. To make it easy to see the difference, let’s make their IDENTITY definition different at each location (P & S).
- At Publisher, the IDENTITY value is defined as (1,10).
- So, its values will be 1, 11, 21, 31, 41, etc.
- At Subscriber, it is defined as (2, 10).
- So, its values will be 2, 12, 22, 32, 42, etc.
The Set Up
With the above points, let’s create the table and set up replication between P & S. Followins some of the code used to create table at Publisher (P).
At Publisher
--
-- CREATE TABLE with IDENTITY set for NOT FOR REPLICATION
--
CREATE TABLE dbo.SampleTable(
ID INT NOT NULL IDENTITY(1,10) NOT FOR REPLICATION PRIMARY KEY CLUSTERED
, Name VARCHAR(20) NULL DEFAULT('A')
)
GO
At Subscriber:
Similarly, on Subscriber, create a similar table with different IDENTITY definition.
--
-- CREATE TABLE with IDENTITY set for NOT FOR REPLICATION
--
CREATE TABLE dbo.SampleTable(
ID INT NOT NULL IDENTITY(2,10) NOT FOR REPLICATION PRIMARY KEY CLUSTERED
, Name VARCHAR(20) NULL DEFAULT('B')
)
GO
So, there is no overlap between IDENTITY values generated at P & S.
Now let’s watch their behavior, as data in INSERTED into both servers.
- When data in INSERTED directly into each location (P & S)
- When data is indirectly INSERTED into S due to replication traffic from P
Below is some more code used to check IDENTITY values, Insert new data, etc. in these expirements.
--
-- Query the data
--
SELECT *
FROM dbo.SampleTable
ORDER BY ID ASC
--
-- Check the value of IDENTITY column at each step
--
SELECT IDENT_CURRENT('SampleTable')
--
-- Insert data directly into P
--
INSERT INTO dbo.SampleTable DEFAULT VALUES
GO
--
-- Manually insert data to introduce interesting scenarios
--
SET IDENTITY_INSERT dbo.SampleTable ON
INSERT INTO dbo.SampleTable (ID) VALUES(201)
SET IDENTITY_INSERT dbo.SampleTable OFF
GO
Run Experiments
With the above set up, lets run through some scenarios and observe Sql Server behavior in each situation.
Scenario 1:
When data in INSERTed directly into P:
- The IDENTITY values increment with each insert as 1, 11, 21, 31, etc.
- Subsequently, those records are replicated to S, with same IDENTITY values.
- But in all of this, the IDENTITY value at S, stays at 2
- Since NOT FOR REPLICATION is set on the IDENTITY column on S.
When data is INSERTed directly to S:
- The IDENTITY values are incrementing as per definition to 2, 12, 22, etc
- Irrespective of the replication traffic from P, the IDENTITY at S only depends on the records INSERTed directly into S.
- Table at S, has records from both P & S.
- S will look something like: 1, 2, 11, 12, 21, 22, 31, 32, etc
- Table at P, will look at 1, 11, 21, 31, etc
Scenario 2: IDENTITY_INSERT
When manual entry is made at P (using IDENTITY_INSERT) to a new IDENTITY value that does not match with the pattern of IDENTITY definition, subsequent IDENTITY values, at P, are based on the highest entry in the table. It uses the same INCREMENT definition, but it is incremented based on the current highest entry value in the table.
At Publisher:
- Let’s say the SampleTable, at P, has entries like 1, 11, 21, 31 with next IDENTITY value as 41.
- Now, if a new record is entered manually using IDENTITY_INSERT, with new value as 26. It is successfully INSERTed.
- Next IDENTITY value still remains at 41.
- We can keep repeating these steps with different values like 7, 9, 13, 15, 17, 25, 28, 29 (as long as they are below 31).
- INSERTs will be successful with no impact to next IDENTITY value, which is still at 41.
- Now, if you perform a regular INSERT, the new record will get IDENTITY value as 41.
At Subscriber:
- At S, all new entries, 26, 7, 9, 13, 15, 41, etc, are successfully INSERTed with no impact to IDENTITY definition at S.
- At S, the next identity value is still set to 42
- Any new direct INSERTs at S, will get IDENTITY values consistent with its previous behavior a.k.a. 42, 52, etc
Scenario 3: PRIMARY KEY Violation
Now, lets make a manual entry at P that matches with the next IDENTITY value at S.
- For this, let’s assume that the highest value at P is 41, with next IDENTITY value as 51
- At S, the current highest value is 52, with next IDENTITY value as 62.
Introduce problems:
- At P, perform a manual INSERT (with IDENTITY_INSERT), with ID value as 62.
- INSERT is successful at P; And it is replicated to S successfully.
- After above operation, next IDENTITY value
- At P is set to 72 (62+10).
- At S, it is still at 62 (even though a new record in INSERTed with 62). Since NOT FOR REPLICATION is set, replication traffic does not influence IDENTITY increments at S.
- Now, when a new record is directly INSERTed into S, the next IDENTITY value will be computed as 62, which results in PRIMARY KEY violation.
Viscious cycle:
- In the above test, the next IDENTITY value at P is still at 72.
- Similarly, the next IDENTITY value at S, is also set to 72.
- So any new inserts at P, will be replicated to S with 72, 82, 92, etc.
- If there are any existing records, at S, with same identity values, then replication traffic (from P to S) will fail with primary key violation.
- But if S does not have any records with those identity values (from P), then replication traffic (a.k.a. 82, 92, 102) from P is successfully INSERTed into S
- Any new traffic, directly at S, will run into PRIMARY KEY violation.
- So, the summary is, one BAD entry is all it takes to screw up the IDENTITY definition with NOT FOR REPLICATION.
Solution:
- When this happens, just RESEED, Identity values at P to a non-overlapping value that is consistent with its expected behavior.
- Something like 151 or 201. To give it a fresh start with no overlaps with P or S existing records.
Read Full Post »