
Archive for the ‘DBA Interview’ Category

Interesting one today:

This article is Part 2 in a series on Advanced Query Tuning Concepts that are good to be familiar with. The full list is here.

Merge Join

When both inputs are fairly large, indexed, and sorted data sets, a Merge Join is very efficient at returning the matching records. See the image below:

MergeJoin_QueryPlan

In this example, we have two similarly sized tables (this is important) that are indexed, and hence sorted (this is also important). When they are joined on their ID columns (ON B.ID = S.ID), which are indexed, Sql Server uses a Merge Join.

Simple Explanation:

From the two sorted inputs, Sql Server takes one record from each table and compares them. If they match, the row is returned. If not, the row with the lower value is discarded and the next row from that same input is read for the next comparison. This keeps iterating until one of the inputs is exhausted.

  • Both tables need to be of similar size.
  • The join columns in both data sets need to be indexed & sorted.
    • This is important for efficient processing of the records.
    • If the inputs are not already sorted, the plan adds a sorting step, resulting in longer processing times and making the Merge Join inefficient.
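To make this concrete, below is a minimal sketch with hypothetical table names. The OPTION (MERGE JOIN) hint is added purely to force the operator for illustration; with two large, similarly sized inputs that are both indexed on the join column, the optimizer typically chooses a Merge Join on its own.

--
-- Two similarly sized tables, each with a clustered index on ID (hence sorted)
--
CREATE TABLE dbo.BigTableA (ID INT NOT NULL PRIMARY KEY CLUSTERED, Payload VARCHAR(100));
CREATE TABLE dbo.BigTableB (ID INT NOT NULL PRIMARY KEY CLUSTERED, Payload VARCHAR(100));
GO

--
-- Join on the indexed ID columns; the hint forces a Merge Join for illustration
--
SELECT B.ID, B.Payload AS PayloadA, S.Payload AS PayloadB
FROM dbo.BigTableA AS B
INNER JOIN dbo.BigTableB AS S
    ON B.ID = S.ID
OPTION (MERGE JOIN);
GO

Since both inputs are already ordered by ID, the resulting plan shows two clustered index scans feeding the Merge Join, with no extra Sort operators.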

 

 

Hope this helps,
_Sqltimes


Interesting one today:

This is Part 1 of a series on Advanced Query Tuning Concepts that are good to be familiar with. Today we’ll cover Nested Loop Joins. The full list is here.

Nested Loop Join

A Nested Loop Join is used when one input in a join is a small data set (fewer than 10 records) and the other is a large data set that is indexed on the columns used in the join.

NestedLoopJoin_Plan

Simple Explanation:

For each record in the SmallTable, Sql Server searches the entire LargeTable for matching records, and it keeps iterating like this for every record in the SmallTable. At first glance this seems like an inefficient method, but for this combination of inputs it is the most efficient one.

  • Since the SmallTable is used as the outer input, it limits the number of times we need to loop.
  • Since the LargeTable is indexed, a quick Index Seek returns the matching rows for each value coming from the SmallTable (i.e. ON B.ID = S.ID).
    • Hence the Index Scan on the SmallTable, paired with an Index Seek on the LargeTable.
    • The other way around would be very inefficient (scanning the LargeTable and seeking on the SmallTable).

In this combination of Small vs. Large data sets, Nested Loop Join is the most efficient operator.
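A minimal sketch of this scenario is below; the table and column names are hypothetical, and the OPTION (LOOP JOIN) hint is there only to force the operator for illustration. With a genuinely small outer input and an index on LargeTable.ID, the optimizer usually picks a Nested Loop Join on its own.

--
-- Hypothetical tables: SmallTable holds a handful of rows,
-- LargeTable is large and indexed on ID
--
SELECT S.ID, B.SomeColumn
FROM dbo.SmallTable AS S
INNER JOIN dbo.LargeTable AS B
    ON B.ID = S.ID
OPTION (LOOP JOIN);
GO

In the resulting plan, the SmallTable appears as the outer (top) input with a scan, and each of its rows drives an Index Seek into the LargeTable.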

 

 

Hope this helps,
_Sqltimes


Interesting one today:

Usually, during query performance tuning, we look at the query plan to see where most of the time is spent during query execution. Microsoft uses different Graphical Execution Plan Icons to help convey the query execution steps and the related cost at each step. Today we’ll look at three of those that are crucial in distilling large data sets and returning only the pertinent records. They are essentially the Join Operators used to compare multiple large data sets and retrieve the necessary records.

  1. Nested Loop Join
  2. Merge Join
  3. Hash Join
    • In-memory hash join
    • Grace hash join
    • Recursive hash join

These are part of Advanced Query Tuning Concepts that are good to be familiar with.

We’ll cover each one in an individual post with an example, in this 3-part series.

  1. Nested Loop Join: Best when one of the data sets is small
  2. Merge Join: Best when both data sets are of similar size
  3. Hash Join: Efficiently handles large data sets, whether sorted or not
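To see which of these operators a particular query actually gets, the plan can be inspected without running the query. Below is a minimal sketch using SET SHOWPLAN_XML; the table names are hypothetical.

--
-- Return the estimated plan as XML instead of executing the query,
-- so the join operator (Nested Loops / Merge Join / Hash Match) is visible
--
SET SHOWPLAN_XML ON;
GO

SELECT B.ID
FROM dbo.BigTable AS B
INNER JOIN dbo.SmallTable AS S
    ON B.ID = S.ID;
GO

SET SHOWPLAN_XML OFF;
GO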

 

Hope this helps,
_Sqltimes


Interesting one today:

This is Part 2 in the series of blogs related to Replication Setup. The full list is here.

In a recent post, we walked through the steps for setting up replication using T-SQL commands. Today, we’ll look at the T-SQL commands to remove/drop replication.

Essentially, there are 3 major steps to dropping replication, and all of them are executed at the Publisher instance. As and when needed, these steps communicate with the Subscriber & Distributor to remove the relevant artifacts at each step.

Main Steps:

  1. Remove definitions for Publication, Subscription & all relevant articles
  2. Change database settings
  3. Change Distributor settings

As expected, each of these major steps could have multiple sub-steps, which we’ll get into in future posts.

NOTE: All steps are carried out at Publisher instance

Replication Step                                          T-SQL (run at the Publisher instance)

1. Remove Publication, Subscriptions, etc.
   a. Remove subscription to each subscriber              sp_dropsubscription
   b. Remove subscription with articles                   sp_dropsubscription
   c. Remove articles associated with the publication     sp_droparticle
   d. Finally, remove the Publication                     sp_droppublication

2. Change database settings
   a. Disable database from Publishing                    sp_replicationdboption
   b. Remove associations with all Subscribers            sp_dropsubscriber

3. Change Distributor settings
   a. Remove association with Distributor                 sp_dropdistributor
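Putting the table above into script form, below is a minimal sketch of the drop sequence, run at the Publisher. All names ('SamplePublication', 'SampleArticle', 'SUBSCRIBER_Instance', 'SampleDatabase') are placeholders, and the exact parameters depend on the topology, so treat this as an outline rather than a ready-to-run script.

--
-- Sketch only: run at the Publisher, on the publication database
--
USE SampleDatabase;
GO

-- 1. Remove subscriptions, articles and finally the publication
EXEC sp_dropsubscription @publication = 'SamplePublication',
                         @article     = 'all',
                         @subscriber  = 'SUBSCRIBER_Instance';

EXEC sp_droparticle @publication = 'SamplePublication',
                    @article     = 'SampleArticle',
                    @force_invalidate_snapshot = 1;

EXEC sp_droppublication @publication = 'SamplePublication';

-- 2. Database settings: disable publishing on the database
EXEC sp_replicationdboption @dbname  = 'SampleDatabase',
                            @optname = N'publish',
                            @value   = N'false';

EXEC sp_dropsubscriber @subscriber = 'SUBSCRIBER_Instance';

-- 3. Distributor settings: drop the Publisher's association with the Distributor
EXEC sp_dropdistributor;
GO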

In a future post, we’ll get into the next set of details.

Hope this helps,
_Sqltimes


Interesting one today:

As part of the series on Replication, we’ll cover the Tracer Tokens topic today. Tracer Tokens are one of the techniques for measuring latency in a replication topology; they are a unique and powerful way to measure the health and latency of a replication setup.

Concept:

In replication, we have a Publisher, Distributor & Subscriber. The Publisher has a Publication as the source of the data to be replicated to the Subscriber(s). The Distributor helps in getting the data from the Publisher to the Subscribers. In this topology, data is constantly flowing from the Publisher to the Distributor and eventually to all the Subscribers. At every step, as data flows through the topology, there is latency. Tracer Tokens help in measuring this latency at each step.

Tracer tokens are dummy replication traffic inserted at the Publisher; as a token flows through the topology, it captures the time it takes to arrive at each step (Distributor) and eventually at the destination (Subscriber). This BoL article has more details on this concept.

T-SQL to Insert Tracer Tokens

There are 4 main T-SQL procedures for managing Tracer Tokens:

  1. Insert tracer token at the Publisher
  2. Get a list of all tracer tokens
  3. Gather details on a given tracer token
  4. Delete tracer token history

Insert Tracer Token at Publisher

Connect to the Publisher and point to the publication database. Then run the 'sp_posttracertoken' procedure with the appropriate parameters to insert a token into this particular publication. See the example below and the attached result.

@tokenID is the OUTPUT variable that returns the ID of the token after it is successfully inserted at the Publisher.

--
-- Insert token at publisher
--
DECLARE @tokenID AS INT;

EXEC sp_posttracertoken @publication     = 'SamplePublication',
                        @tracer_token_id = @tokenID OUTPUT;

SELECT @tokenID AS [TokenID];
GO
Insert Tracer Token

Get a list of all tracer tokens

In situations where we do not have the token ID readily available, we can query to get a list of all the tokens inserted, along with their IDs.

In SSMS, connect to the Distributor instance, point to the distribution database, and run the 'sys.sp_helptracertokens' procedure with the relevant parameters. See below:

--
-- Get the list of tokens already inserted
--
EXEC sys.sp_helptracertokens @publication  = 'SamplePublication',
                             @publisher    = 'ABC_Instance',
                             @publisher_db = 'SampleDatabase';
GO
List of Tracer Tokens

 

Gather details on a given tracer token

Now this is the important procedure, the one that shows us the latency numbers at each step of the replication topology. Open SSMS, connect to the Distributor instance, point to the distribution database, and run the 'sys.sp_helptracertokenhistory' procedure with the pertinent parameters. See below:

--
-- Query latency numbers gathered by a particular tracer token
--
EXEC sys.sp_helptracertokenhistory @publication  = 'SamplePublication',
                                   @publisher    = 'ABC_Instance',
                                   @publisher_db = 'SampleDatabase',
                                   @tracer_id    = -2147483574;
GO
Latencies gathered from Tracer Token

 

Delete tracer token history

Finally, we remove the tokens from the metadata. Sql Server provides the 'sp_deletetracertokenhistory' procedure to delete a given token from a publication. See below:

--
-- Delete a particular token
--
EXEC sp_deletetracertokenhistory @publication  = 'SamplePublication',
                                 @publisher    = 'ABC_Instance',
                                 @publisher_db = 'SampleDatabase',
                                 @tracer_id    = -2147483573;
GO
Delete Tracer Token

Hope this helps,
_Sqltimes


Quick one today:

On a regular basis, on production machines, selected perfmon metrics are captured into local files. Each day’s metrics are captured into an individual file, making it easier to analyze the data as and when needed.

Sometimes, to uncover patterns, we need to combine a few days’ worth of files into one BLG file. This is rare, but needed. Microsoft provides a command to achieve this: enter the relog command. This command can do a lot of things, but today we’ll look at file concatenation.

relog SqlCounters_08112017_48.blg SqlCounters_08122017_49.blg -f BIN -o C:\PerfLogs\Sql2014Counters\1\s.blg

Explanation:

The following Perfmon files

  • SqlCounters_08112017_48.blg
  • SqlCounters_08122017_49.blg

are combined into a single binary output file, the one specified with the -o flag (C:\PerfLogs\Sql2014Counters\1\s.blg in the command above).

  • The -f flag indicates the format of the output (concatenated) file; BIN is the native binary BLG format.
  • The -o flag indicates the path of the output file.

The following image shows the output when you run it from the command prompt.

Concatenate_Perfmon_BLG
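As a side note, relog can also convert a BLG file into other formats for ad-hoc analysis. Below is a minimal sketch converting the combined file to CSV; the paths are placeholders carried over from the example above.

relog C:\PerfLogs\Sql2014Counters\1\s.blg -f CSV -o C:\PerfLogs\Sql2014Counters\1\Combined.csv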

Hope this helps,
_Sqltimes


Interesting one today:

Lately, Access Methods performance metrics have been helpful in troubleshooting some recent issues. The Access Methods object has several important counters that measure how logical data is accessed internally within Sql Server.

Here we’ll look at 3 of them:

  1. \SQLServer:Access Methods\FreeSpace Scans/sec
  2. \SQLServer:Access Methods\Table Lock Escalations/sec
  3. \SQLServer:Access Methods\Workfiles Created/sec

FreeSpace Scans

Objects in Sql Server are written to database pages. A group of 8 of these pages is called an Extent, and space is allocated in units of extents. Extents are of two types: Mixed & Uniform extents.

Usually, small tables (and sometimes Heap tables) are written to Mixed extents. So, when the data in those tables/objects grows, Sql Server needs to find more free space to accommodate the growth. In these situations, Sql Server performs Free Space Scans. This counter measures how many times these scans occur every second, hence FreeSpace Scans/sec.

I’m not sure what Microsoft’s recommended range for this metric is, but in our environment it stays low, i.e. under 5 or 10. So, as a rule of thumb, let’s say that as long as the number is below 20 we are okay. Anything higher for extended periods of time might need some attention.

So the best approach is to gather a baseline first; then you’ll know what is normal and what is out of the ordinary.

Table Lock Escalations

When a query reads rows from a table, it uses the smallest locks possible to maintain concurrency. On some rare (but not uncommon) occasions, these locks get escalated to a higher level, typically the table (or partition) level. While this reduces concurrency on that table, it improves the efficiency of that particular query’s execution.

Issuing thousands of locks costs a lot of memory, so it is cheaper to issue a single table lock and read as much data as needed. The downside is that it will prevent other connections from reading this table.

For more, read this previous post on Lock Escalations.

This counter measures the number of such escalations that occur per second. While this is a common occurrence, higher numbers for extended periods of time are not good. So, look into optimizing the queries so that they read only the exact amount of data they need (a.k.a. use better JOINs and WHERE clause conditions).
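If one particular table turns out to be behind most of the escalations, its escalation behavior can be reviewed and adjusted at the table level. Below is a minimal sketch; dbo.SampleTable is a hypothetical name, and changing the setting is a trade-off, not a blanket fix.

--
-- Check the current lock escalation setting for a table (hypothetical name)
--
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'SampleTable';

--
-- TABLE (default) escalates to a full table lock; AUTO allows partition-level
-- escalation on partitioned tables; DISABLE mostly prevents escalation
--
ALTER TABLE dbo.SampleTable SET (LOCK_ESCALATION = AUTO);
GO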

Workfiles Created

When a query that handles large data sets is executed, sometimes the intermediate data sets (or virtual tables) are written to disk. This helps with efficient processing of the data; Sql Server reads them back into memory as and when needed and completes the query processing.

This counter measures how many workfiles are created each second. On a busy system that runs queries manipulating large data sets, many Workfiles & Worktables are created each second. So, this number needs to be considered in context: capture a baseline first, then measure any aberrations from the baseline and look into possible reasons.

Usually, when a query manipulates large data sets, Sql Server uses Hash Joins to find the matches. So, if you have a lot of queries that perform Hash Joins or Hash Aggregates, this counter spikes up.
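To spot-check these three counters without opening perfmon, they can also be read from the sys.dm_os_performance_counters DMV. A minimal sketch is below; note that these ‘/sec’ counters are cumulative in the DMV, so two samples taken a known interval apart are needed to compute a true per-second rate.

--
-- Read the three Access Methods counters from the DMV.
-- The object_name is 'SQLServer:Access Methods' on a default instance
-- and 'MSSQL$<InstanceName>:Access Methods' on a named instance.
--
SELECT RTRIM(object_name)  AS [Object],
       RTRIM(counter_name) AS [Counter],
       cntr_value          AS [CumulativeValue]
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Access Methods%'
  AND counter_name IN ( 'FreeSpace Scans/sec',
                        'Table Lock Escalations/sec',
                        'Workfiles Created/sec' );
GO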

 

Hope this helps,
_Sqltimes

