Interesting one today:

This article is Part 2 of a series on Advanced Query Tuning Concepts that are good to be familiar with. The full list is here.

Merge Join

When both inputs are fairly large and are sorted on the join columns (for example, because those columns are indexed), Merge Join is very efficient at returning matching records. See the image below:


In this example, we have 2 similarly sized tables (this is important) that are indexed, and hence sorted (this is also important). When they are joined on the indexed ID columns (ON B.ID = S.ID), Sql Server uses a Merge Join.

Simple Explanation:

From the sorted inputs, Sql Server takes one record from each table and compares them. If they match, the row is returned. If not, the row with the lower value is discarded and the next row from that same table is read for the next comparison. It keeps iterating until one of the inputs is exhausted.

  • Both tables need to be similar in size.
  • The join columns in both data sets need to be sorted, typically by being indexed.
    • This is important for efficient processing of the records.
    • If they are not already sorted, the optimizer adds a Sort step before the Merge Join, which results in longer processing times and makes it inefficient.
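
A minimal sketch to observe this in a test database. It assumes two hypothetical tables, dbo.TableB and dbo.TableS (named to match the B and S aliases above), each with 100,000 rows and clustered on ID. Both inputs are sorted on ID and are similar in size, so the optimizer will typically choose a Merge Join. SET STATISTICS PROFILE ON returns the actual plan as extra result rows, which lets you check the operator without the graphical plan.

```sql
-- Hypothetical test tables: both clustered on ID, so each input is already sorted on the join column
CREATE TABLE dbo.TableB (ID int NOT NULL PRIMARY KEY CLUSTERED, Payload varchar(20) NULL);
CREATE TABLE dbo.TableS (ID int NOT NULL PRIMARY KEY CLUSTERED, Payload varchar(20) NULL);

-- Load 100,000 sequential IDs into TableB (the catalog-view cross join is only a row source)
WITH N AS (
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.TableB (ID, Payload)
SELECT n, 'B' FROM N;

-- Give TableS the same rows, so both inputs are similarly sized
INSERT INTO dbo.TableS (ID, Payload)
SELECT ID, 'S' FROM dbo.TableB;

-- Return the actual plan as extra result rows; look for the Merge Join operator
SET STATISTICS PROFILE ON;

SELECT B.ID, B.Payload, S.Payload
FROM dbo.TableB AS B
INNER JOIN dbo.TableS AS S
    ON B.ID = S.ID;

SET STATISTICS PROFILE OFF;
```

To illustrate the last bullet, rebuild one of the tables without the clustered key so it is no longer sorted. The plan will then typically either show a Sort ahead of the Merge Join or switch to a different join operator.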



Hope this helps,

Interesting one today:

This is Part 1 of a series on Advanced Query Tuning Concepts that are good to be familiar with. Today we’ll cover Nested Loop Joins. The full list is here.

Nested Loop Join

Nested Loop Join is used when one input of a join is a small data set (for example, fewer than 10 records) and the other is a large data set that is indexed on the columns used in the join.


Simple Explanation:

For each record in SmallTable (the outer input), it looks up the matching records in LargeTable (the inner input), and it repeats this for every record in SmallTable. At first glance this seems like an inefficient method, but for this combination of inputs it is the most efficient one.

  • Since SmallTable is the outer input, the number of iterations is limited to the number of rows in SmallTable.
  • Since LargeTable is indexed on the join column, a quick Index Seek returns the matching rows for each value from SmallTable (i.e. ON B.ID = S.ID).
    • This is why the plan shows an Index Scan on SmallTable and an Index Seek on LargeTable.
    • The other way round would be very inefficient (scanning LargeTable and seeking on SmallTable).

In this combination of Small vs. Large data sets, Nested Loop Join is the most efficient operator.
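
A minimal sketch of this shape of data. It assumes a hypothetical dbo.SmallTable with five rows and a hypothetical dbo.LargeTable with 100,000 rows, both clustered on ID. For inputs like these, the plan will typically show Nested Loops with a Clustered Index Scan on SmallTable as the outer input and a Clustered Index Seek on LargeTable as the inner input.

```sql
-- Hypothetical tables: a tiny outer input and a large inner input indexed on the join column
CREATE TABLE dbo.SmallTable (ID int NOT NULL PRIMARY KEY CLUSTERED);
CREATE TABLE dbo.LargeTable (ID int NOT NULL PRIMARY KEY CLUSTERED, Payload varchar(20) NULL);

-- Five rows in the small input
INSERT INTO dbo.SmallTable (ID) VALUES (10), (500), (7500), (42000), (99999);

-- 100,000 rows in the large input (the catalog-view cross join is only a row source)
WITH N AS (
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.LargeTable (ID, Payload)
SELECT n, 'row ' + CAST(n AS varchar(10)) FROM N;

-- Actual plan as extra result rows: expect Nested Loops, a scan on SmallTable, a seek on LargeTable
SET STATISTICS PROFILE ON;

SELECT S.ID, B.Payload
FROM dbo.SmallTable AS S
INNER JOIN dbo.LargeTable AS B
    ON B.ID = S.ID;

SET STATISTICS PROFILE OFF;
```

To compare costs, you can append a hint such as OPTION (HASH JOIN) to the same query. Hints override the optimizer, so they are best kept to experiments like this one.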



Hope this helps,

Interesting one today:

Usually, during query performance tuning, we look at the query plan to see where most of the time is spent during query execution. Microsoft uses different Graphical Execution Plan Icons to convey the query execution steps and the related cost of each step. Today we’ll look at three of those that are crucial in distilling large data sets and returning only the pertinent records. They are essentially Join Operators, used to compare multiple large data sets and retrieve the necessary records.

  1. Nested Loop Join
  2. Merge Join
  3. Hash Join
    • In-memory hash join
    • Grace hash join
    • Recursive hash join

These are part of Advanced Query Tuning Concepts that are good to be familiar with.

In this 3-part series, we’ll cover each one in an individual post with an example.

  1. Nested Loop Join: Best when one of the data sets is small
  2. Merge Join: Best when both the data sets are of similar sizes
  3. Hash Join: Can efficiently handle large data sets, either sorted or not.
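
One way to see the three operators side by side is to force each one with a query-level join hint, then compare the plans and the STATISTICS IO/TIME output. This is a sketch that assumes two hypothetical tables, dbo.TableA and dbo.TableB, sharing an ID column. Hints override the optimizer's own choice, so they are meant for experimentation rather than production code.

```sql
-- Report logical reads and CPU/elapsed time for each query
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Force a Nested Loop Join
SELECT A.ID
FROM dbo.TableA AS A
INNER JOIN dbo.TableB AS B ON A.ID = B.ID
OPTION (LOOP JOIN);

-- Force a Merge Join (a Sort is added if an input is not already ordered on ID)
SELECT A.ID
FROM dbo.TableA AS A
INNER JOIN dbo.TableB AS B ON A.ID = B.ID
OPTION (MERGE JOIN);

-- Force a Hash Join
SELECT A.ID
FROM dbo.TableA AS A
INNER JOIN dbo.TableB AS B ON A.ID = B.ID
OPTION (HASH JOIN);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```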


Hope this helps,

Interesting one today:

This is Part 2 on blogs related to Replication Setup. Full list is here.

In a recent post, we walked through the steps for setting up replication using T-SQL commands. Today, we’ll look at the T-SQL commands to remove/drop replication.

Essentially, there are 3 major steps to dropping replication, and all of them are executed at the Publisher instance. Where needed, these steps communicate with the Subscriber & Distributor to remove the relevant artifacts at each step.

Main Steps:

  1. Remove definitions for Publication, Subscription & all relevant articles
  2. Change database settings
  3. Change Distributor settings

As expected, each of these major steps could have multiple sub-steps, which we’ll get into in future posts.

NOTE: All steps are carried out at Publisher instance

Replication step: T-SQL stored procedure

  1. Remove Publication, Subscriptions, etc. (run at Publisher instance)
    a. Remove the subscription to each subscriber: sp_dropsubscription
    b. Remove the subscription with its articles: sp_dropsubscription
    c. Remove the articles associated with the publication: sp_droparticle
    d. Finally, remove the Publication: sp_droppublication
  2. Change database settings (run at Publisher)
    a. Disable the database for publishing: sp_replicationdboption
    b. Remove associations with all Subscribers: sp_dropsubscriber
  3. Distributor settings (run at Publisher)
    a. Remove the association with the Distributor: sp_dropdistributor
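
Put together, the drop sequence might look like the sketch below, run at the Publisher. It assumes a transactional publication with one article and one push subscription. Every object name (SalesDB, SalesPublication, the Customers article, SUBSCRIBER01, SalesDB_Replica) is a hypothetical placeholder.

```sql
-- Run at the Publisher. All names below are hypothetical placeholders.
USE SalesDB;  -- the publication database

-- 1a/1b. Drop the subscription; @article = 'all' covers every article for this subscriber
EXEC sp_dropsubscription @publication = N'SalesPublication'
                       , @article = N'all'
                       , @subscriber = N'SUBSCRIBER01'
                       , @destination_db = N'SalesDB_Replica';

-- 1c. Drop each article from the publication
EXEC sp_droparticle @publication = N'SalesPublication'
                  , @article = N'Customers'
                  , @force_invalidate_snapshot = 1;

-- 1d. Drop the publication itself
EXEC sp_droppublication @publication = N'SalesPublication';

USE master;

-- 2a. Disable transactional publishing on the database
EXEC sp_replicationdboption @dbname = N'SalesDB'
                          , @optname = N'publish'
                          , @value = N'false';

-- 2b. Remove the Subscriber registration
EXEC sp_dropsubscriber @subscriber = N'SUBSCRIBER01';

-- 3a. Remove this Publisher's association with the Distributor
EXEC sp_dropdistributor @no_checks = 0;
```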

In a future post, we’ll get into the next set of details.

Hope this helps,

Interesting one today:

This is part of the series on Replication set up using T-SQL. The full list is here.

Today, we’ll go over setting up Distributor using T-SQL. This is the first step in our replication configuration process.

The major steps in setting up Distributor are:

  1. Configure an instance as Distributor
  2. Create Distributor database
  3. Add the instances that will use this instance as a distributor

For this example, we’ll use a remote Distributor, i.e. the Publisher, Subscriber & Distributor are each on dedicated instances.

To accomplish the above steps, we use the following T-SQL procedures:

  1. sp_adddistributor
  2. sp_adddistributiondb
  3. sp_adddistpublisher

1. sp_adddistributor

Setting up the Distributor is the first step in configuring replication. Go to the Distributor instance and enable it as a Distributor.

-- Enable the instance as Distributor
use master
exec sp_adddistributor    @distributor = N'InstanceName'
                        , @password = N'distributor_admin password'

2. sp_adddistributiondb

The next step is to create the Distribution database on the Distributor instance to hold all the replication traffic.

-- Create Distribution database
use master
exec sp_adddistributiondb @database = N'SalesDistribution'
			, @data_folder = N'E:\MSSQL\Data'
			, @data_file = N'SalesDistribution.mdf'
			, @data_file_size = 4096
			, @log_folder = N'E:\MSSQL\Data'
			, @log_file = N'SalesDistribution.LDF'
			, @log_file_size = 2048

			, @min_distretention = 0
			, @max_distretention = 120
			, @history_retention = 120

			, @security_mode = 1

Most of the parameters are self-explanatory, so we’ll look at brief descriptions.

The following parameters specify the name of the Distribution database and the location and initial size of its data & log files.

    • @database
    • @data_folder
    • @data_file
    • @data_file_size
    • @log_folder
    • @log_file
    • @log_file_size

The next set of parameters specifies how long (in hours) replication traffic (transactions & commands) and history log entries are retained.

    • @min_distretention
    • @max_distretention
    • @history_retention

The last parameter specifies the authentication mechanism used to connect to the Distributor.

  • @security_mode : 1 (default) indicates Windows Authentication; 0 indicates Sql Server Authentication

3. sp_adddistpublisher

Now that the Distribution database is configured, let’s register with the Distributor the Publishers that will rely on it.

-- Associate Distributor with Publishers
exec sp_adddistpublisher  @publisher = N'PublisherName'
			, @distribution_db = N'SalesDistribution'
			, @publisher_type = N'MSSQLSERVER'
			, @working_directory = N'E:\MSSQL\ReplData'

			, @security_mode = 1
			, @thirdparty_flag = 0

The first four parameters specify the name of the Publisher instance, the name of the Distribution database, whether it is a Sql Server Publisher (or Oracle, etc.), and the working directory where the publication's data and schema files are stored.

    • @publisher
    • @distribution_db
    • @publisher_type
    • @working_directory

The last two parameters are:

    • @security_mode : the authentication mechanism that Replication Agents at the Distributor use to connect to the Publisher (used for Queued Updating Subscriptions and non-Sql Server Publishers); 1 indicates Windows Authentication
    • @thirdparty_flag : Indicates whether the Publisher is a Sql Server instance (0) or a non-Sql Server instance (1), i.e. Oracle, etc.

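To confirm that the three steps took effect, you can run the following read-only procedures at the Distributor. This is a sketch that reuses the example names from above (SalesDistribution, PublisherName).

```sql
use master

-- Is this instance configured as a Distributor, and is a distribution database installed?
exec sp_get_distributor

-- Properties of the distribution database created with sp_adddistributiondb
exec sp_helpdistributiondb @database = N'SalesDistribution'

-- Publisher registered with this Distributor through sp_adddistpublisher
exec sp_helpdistpublisher @publisher = N'PublisherName'
```
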
Hope this helps,

Interesting one today:

This is Part 1 on blogs related to Replication Setup. Full list is here.

Quite often, we set up replication on our lab machines to replicate production environment settings. We have one “Gold” version of scripts that is used every time. Today, we’ll cover the fundamental steps & their related stored procedures:

Create Replication

Essentially, there are 4 fundamental steps to creating replication. Some steps are run against the Distributor instance and some against the Publisher instance, but no steps are run against the Subscriber instance.

  1. Configure an instance as Distributor
  2. Configure an instance as Publisher
  3. Configure a database as Publisher & Create Publication
  4. Configure Subscription

As you can imagine, there are several sub-steps in each of these. We’ll get into details about the sub-steps in a new post.

Replication step: T-SQL stored procedure

  1. Configure Distributor (run at Distributor instance)
    a. Configure an instance as Distributor: sp_adddistributor
    b. Create the Distribution database: sp_adddistributiondb
    c. Add the instances that will use this instance as a distributor: sp_adddistpublisher
  2. Configure Publisher (run at Publisher instance)
    a. Point the instance to its Distributor: sp_adddistributor
    b. Configure the Subscribers: sp_addsubscriber
  3. Configure Publisher Database & Publication (run pointing to Publisher database)
    a. Enable the database as a Publisher: sp_replicationdboption
    b. Create the LogReader Agent: sp_addlogreader_agent
    c. Configure the publication: sp_addpublication
    d. Assign permissions on this Publication: sp_grant_publication_access
    e. Add articles (tables, SPs, etc.): sp_addarticle
  4. Configure Subscription (run pointing to Publisher database)
    a. Add subscribers for this publication: sp_addsubscription
    b. Create the distribution agent: sp_addpushsubscription_agent
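
Steps 2 through 4 might be strung together at the Publisher as in the sketch below. It assumes a transactional publication with a single push subscription and Windows-authenticated agent jobs. All names are hypothetical placeholders: DIST01, SUBSCRIBER01, SalesDB, SalesPublication, the Customers table, SalesDB_Replica, and the DOMAIN\ReplAgent account.

```sql
-- Run at the Publisher. All names below are hypothetical placeholders.
USE master;

-- 2a. Point this Publisher at the remote Distributor (password set with sp_adddistributor there)
EXEC sp_adddistributor @distributor = N'DIST01'
                     , @password = N'distributor_admin password';

-- 2b. Register the Subscriber
EXEC sp_addsubscriber @subscriber = N'SUBSCRIBER01';

-- 3a. Enable the database for transactional publishing
EXEC sp_replicationdboption @dbname = N'SalesDB'
                          , @optname = N'publish'
                          , @value = N'true';

USE SalesDB;  -- the remaining steps run in the publication database

-- 3b. Create the Log Reader Agent job for this database
EXEC sp_addlogreader_agent @job_login = N'DOMAIN\ReplAgent'
                         , @job_password = N'agent password'
                         , @publisher_security_mode = 1;

-- 3c. Create an active, continuously replicating publication
EXEC sp_addpublication @publication = N'SalesPublication'
                     , @status = N'active'
                     , @repl_freq = N'continuous';

-- 3d. Allow a login to access the publication
EXEC sp_grant_publication_access @publication = N'SalesPublication'
                               , @login = N'DOMAIN\ReplAgent';

-- 3e. Add a table as an article
EXEC sp_addarticle @publication = N'SalesPublication'
                 , @article = N'Customers'
                 , @source_owner = N'dbo'
                 , @source_object = N'Customers';

-- 4a. Add a push subscription covering all articles
EXEC sp_addsubscription @publication = N'SalesPublication'
                      , @subscriber = N'SUBSCRIBER01'
                      , @destination_db = N'SalesDB_Replica'
                      , @subscription_type = N'Push'
                      , @article = N'all';

-- 4b. Create the Distribution Agent job for the push subscription
EXEC sp_addpushsubscription_agent @publication = N'SalesPublication'
                                , @subscriber = N'SUBSCRIBER01'
                                , @subscriber_db = N'SalesDB_Replica'
                                , @job_login = N'DOMAIN\ReplAgent'
                                , @job_password = N'agent password';
```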

In a future post, we’ll get into the next set of details.

Hope this helps,

Interesting one today:

As part of the series on Replication, we’ll cover Tracer Tokens today. Tracer tokens are one technique for measuring latency in a replication topology, and a unique and powerful way to check the health and latency of a replication setup.


In replication, we have a Publisher, Distributor & Subscriber. The Publisher has a Publication as the source of data to be replicated to the Subscriber(s). The Distributor helps get data from the Publisher to the Subscribers. In this topology, data is constantly flowing from the Publisher to the Distributor and eventually to all the Subscribers. At every step, as data flows through the topology, there is latency. Tracer tokens help measure this latency at each step.

Tracer tokens are dummy replication traffic inserted at the Publisher; As it flows through the topology, it captures the time it takes to arrive at each step (Distributor) and eventually to the destination (Subscriber). This BoL article has more details on this concept.

T-SQL to Insert Tracer Tokens

There are 4 main T-SQL procedures for managing Tracer Tokens:

  1. Insert tracer token at the Publisher
  2. Get a list of all tracer tokens
  3. Gather details on a given tracer token
  4. Delete tracer token history

Insert Tracer Token at Publisher

Connect to the Publisher and point to the publication database. Then run the ‘sp_posttracertoken’ procedure with the appropriate parameters to insert a token into this particular publication. See the example below and the attached result.

@tokenID is the OUTPUT variable that returns the ID of the token after it is successfully inserted at the Publisher.

-- Insert token at publisher
DECLARE @tokenID int;

EXEC sp_posttracertoken   @publication		= 'SamplePublication'
						, @tracer_token_id	= @tokenID OUTPUT;

SELECT @tokenID AS [TokenID];

[Image: Insert Tracer Token]

Get a list of all tracer tokens

When the token ID is not readily available, we can query for a list of all the tokens inserted, along with their IDs.

In SSMS, connect to the Distributor instance, point to the Distribution database, and run the ‘sys.sp_helptracertokens’ procedure with the relevant parameters. See below:

-- Get the list of tokens already inserted
EXEC sys.sp_helptracertokens @publication	= 'SamplePublication'
							, @publisher	= 'ABC_Instance'
							, @publisher_db = 'SampleDatabase'

[Image: List of Tracer Tokens]


Gather details on a given tracer token

This is the important procedure that shows us the latency numbers at each step of the replication topology. Open SSMS, connect to the Distributor instance, point to the Distribution database, and run the ‘sys.sp_helptracertokenhistory’ procedure with the pertinent parameters. See below:

-- Query latency number gathered by a particular tracer token
EXEC sys.sp_helptracertokenhistory	  @publication = 'SamplePublication'
									, @publisher = 'ABC_Instance'
									, @publisher_db = 'SampleDatabase'
									, @tracer_id = -2147483574

[Image: Latencies gathered from Tracer Token]


Delete tracer token history

Finally, we can remove tokens from the metadata. Sql Server provides the ‘sp_deletetracertokenhistory’ procedure to delete a given token from a publication. See below:

-- Delete a particular token
EXEC sp_deletetracertokenhistory  @publication = 'SamplePublication'
								, @publisher = 'ABC_Instance'
								, @publisher_db = 'SampleDatabase'
								, @tracer_id = -2147483573

[Image: Delete Tracer Token]
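
The history and delete procedures can also be run at the Publisher, in the publication database. That means the steps above can be chained into one script there, reusing the token ID from the OUTPUT variable directly. The sketch below assumes that setup and reuses SamplePublication from the examples above. The 30-second wait is an arbitrary choice, and a NULL latency typically means the token has not reached that point yet.

```sql
-- Run at the Publisher, in the publication database
DECLARE @tokenID int;

-- 1. Post a new tracer token and capture its ID
EXEC sp_posttracertoken @publication = N'SamplePublication'
                      , @tracer_token_id = @tokenID OUTPUT;

-- 2. Give the token time to flow through the topology (arbitrary 30 seconds)
WAITFOR DELAY '00:00:30';

-- 3. Distributor, Subscriber and overall latency recorded for this token
EXEC sys.sp_helptracertokenhistory @publication = N'SamplePublication'
                                 , @tracer_id = @tokenID;

-- 4. Remove this token's history once it has been reviewed
EXEC sp_deletetracertokenhistory @publication = N'SamplePublication'
                               , @tracer_id = @tokenID;
```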

Hope this helps,