Feeds:
Posts
Comments

Posts Tagged ‘In short’

Interesting one today:

In replication, there are several amazing features & configurations to make it robust, dependable & highly performing. These settings need to be correctly leveraged to squeeze out the best performance needed or applicable for each environment. Today, we’ll cover a popular setting called NOT FOR REPLICATION on IDENTITY┬ácolumns.

Concept:

In short, when NOT FOR REPLICATION is enabled on IDENTITY columns (or other constraints), the IDENTITY value is not incremented when INSERTs occur due to replication traffic. But all other direct application traffic will increment IDENTITY value.

Imagine a Sql Server Publisher, let’s say P, that is publishing data to a Sql Server Subscriber, let’s say S. Now, both P & S have table called SampleTable with an IDENTITY column called ID. To make it easy to see the difference, let’s make their IDENTITY definition different at each location (P & S).

  • At Publisher, the IDENTITY value is defined as (1,10).
    • So, its values will be 1, 11, 21, 31, 41, etc.
  • At Subscriber, it is defined as (2, 10).
    • So, its values will be 2, 12, 22, 32, 42, etc.

The Set Up

With the above points, let’s create the table and set up replication between P & S. Followins some of the code used to create table at Publisher (P).

At Publisher

--
-- CREATE TABLE with IDENTITY set for NOT FOR REPLICATION
--
CREATE TABLE dbo.SampleTable(
     ID     INT          NOT NULL  IDENTITY(1,10)  NOT FOR REPLICATION   PRIMARY KEY   CLUSTERED
   , Name   VARCHAR(20)  NULL      DEFAULT('A')
)
GO

At Subscriber:

Similarly, on Subscriber, create a similar table with different IDENTITY definition.

--
-- CREATE TABLE with IDENTITY set for NOT FOR REPLICATION
--
CREATE TABLE dbo.SampleTable(
     ID     INT          NOT NULL  IDENTITY(2,10)  NOT FOR REPLICATION   PRIMARY KEY     CLUSTERED
   , Name   VARCHAR(20)  NULL      DEFAULT('B')
)
GO

So, there is no overlap between IDENTITY values generated at P & S.

Now let’s watch their behavior, as data in INSERTED into both servers.

  1. When data in INSERTED directly into each location (P & S)
  2. When data is indirectly INSERTED into S due to replication traffic from P

Below is some more code used to check IDENTITY values, Insert new data, etc. in these expirements.

--
-- Query the data
--
SELECT *
FROM dbo.SampleTable
ORDER BY ID ASC

--
-- Check the value of IDENTITY column at each step
--
SELECT IDENT_CURRENT('SampleTable')

--
-- Insert data directly into P
--
INSERT INTO dbo.SampleTable DEFAULT VALUES
GO

--
-- Manually insert data to introduce interesting scenarios
--
SET IDENTITY_INSERT dbo.SampleTable ON
INSERT INTO dbo.SampleTable (ID) VALUES(201)
SET IDENTITY_INSERT dbo.SampleTable OFF
GO

Run Experiments

With the above set up, lets run through some scenarios and observe Sql Server behavior in each situation.

Scenario 1:

When data in INSERTed directly into P:

  • The IDENTITY values increment with each insert as 1, 11, 21, 31, etc.
  • Subsequently, those records are replicated to S, with same IDENTITY values.
  • But in all of this, the IDENTITY value at S, stays at 2
    • Since NOT FOR REPLICATION is set on the IDENTITY column on S.

When data is INSERTed directly to S:

  • The IDENTITY values are incrementing as per definition to 2, 12, 22, etc
  • Irrespective of the replication traffic from P, the IDENTITY at S only depends on the records INSERTed directly into S.
  • Table at S, has records from both P & S.
    • S will look something like: 1, 2, 11, 12, 21, 22, 31, 32, etc
    • Table at P, will look at 1, 11, 21, 31, etc

Scenario 2: IDENTITY_INSERT

When manual entry is made at P (using IDENTITY_INSERT) to a new IDENTITY value that does not match with the pattern of IDENTITY definition, subsequent IDENTITY values, at P, are based on the highest entry in the table. It uses the same INCREMENT definition, but it is incremented based on the current highest entry value in the table.

At Publisher:

  • Let’s say the SampleTable, at P, has entries like 1, 11, 21, 31 with next IDENTITY value as 41.
  • Now, if a new record is entered manually using IDENTITY_INSERT, with new value as 26. It is successfully INSERTed.
    • Next IDENTITY value still remains at 41.
  • We can keep repeating these steps with different values like 7, 9, 13, 15, 17, 25, 28, 29 (as long as they are below 31).
    • INSERTs will be successful with no impact to next IDENTITY value, which is still at 41.
  • Now, if you perform a regular INSERT, the new record will get IDENTITY value as 41.

At Subscriber:

  • At S, all new entries, 26, 7, 9, 13, 15, 41, etc, are successfully INSERTed with no impact to IDENTITY definition at S.
    • At S, the next identity value is still set to 42
  • Any new direct INSERTs at S, will get IDENTITY values consistent with its previous behavior a.k.a. 42, 52, etc

Scenario 3: PRIMARY KEY Violation

Now, lets make a manual entry at P that matches with the next IDENTITY value at S.

  • For this, let’s assume that the highest value at P is 41, with next IDENTITY value as 51
  • At S, the current highest value is 52, with next IDENTITY value as 62.

Introduce problems:

  • At P, perform a manual INSERT (with IDENTITY_INSERT), with ID value as 62.
    • INSERT is successful at P; And it is replicated to S successfully.
  • After above operation, next IDENTITY value
    • At P is set to 72 (62+10).
    • At S, it is still at 62 (even though a new record in INSERTed with 62). Since NOT FOR REPLICATION is set, replication traffic does not influence IDENTITY increments at S.
  • Now, when a new record is directly INSERTed into S, the next IDENTITY value will be computed as 62, which results in PRIMARY KEY violation.
    • Violation of PRIMARY KEY constraint 'PK_SampleTable'. Cannot insert duplicate key in object 'dbo.SampleTable'
    • Interestingly, the next IDENTITY value for S, is incremented to 72.
    • Subsequent direct INSERTs into S will be 72, 82, etc

Viscious cycle:

  • In the above test, the next IDENTITY value at P is still at 72.
  • Similarly, the next IDENTITY value at S, is also set to 72.
  • So any new inserts at P, will be replicated to S with 72, 82, 92, etc.
    • If there are any existing records, at S, with same identity values, then replication traffic (from P to S) will fail with primary key violation.
    • But if S does not have any records with those identity values (from P), then replication traffic (a.k.a. 82, 92, 102) from P is successfully INSERTed into S
    • Any new traffic, directly at S, will run into PRIMARY KEY violation.
  • So, the summary is, one BAD entry is all it takes to screw up the IDENTITY definition with NOT FOR REPLICATION.

Solution:

  • When this happens, just RESEED, Identity values at P to a non-overlapping value that is consistent with its expected behavior.
    • Something like 151 or 201. To give it a fresh start with no overlaps with P or S existing records.
Hope this helps,
_Sqltimes
Advertisements

Read Full Post »