Archive for the ‘Operating System’ Category

Quick one today:

On a regular basis, on production machines, selected perfmon metrics are captured into local files. Each day’s metrics are captured into an individual file, making it easier to analyze the data as and when needed.

Sometimes, to uncover patterns, we need to combine a few days’ worth of files into one BLG file. This is rare, but needed. Microsoft provides a command to achieve this: enter the relog command. It can do a lot of things, but today we’ll look at file concatenation.

relog SqlCounters_08112017_48.blg SqlCounters_08122017_49.blg -f BIN -o C:\PerfLogs\Sql2014Counters\1\Combined.blg


The following Perfmon files

  • SqlCounters_08112017_48.blg
  • SqlCounters_08122017_49.blg

are combined into a final binary file called Combined.blg.

  • -f flag indicates the format of the output (concatenated) file
  • -o flag indicates the path of the output file
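Beyond concatenation, relog can also trim a capture down to a time window or convert it to CSV for analysis elsewhere. A quick sketch; the file names and times here are hypothetical:

```shell
:: List the counters and the time range stored in a BLG file
relog Combined.blg -q

:: Extract just a time window and convert it to CSV
:: (-b = begin time, -e = end time; both values are made up for illustration)
relog Combined.blg -b "08/11/2017 09:00:00" -e "08/11/2017 17:00:00" -f CSV -o BusyWindow.csv
```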

The following image shows the output when you run the command from a command prompt.


Hope this helps,

Read Full Post »

Interesting one today:

During a Sql Cluster setup in our lab environment, we ran into this interesting (and common) error message.

The Sql Server failover cluster instance name '' already exists as a clustered resource. 
Specify a different failover cluster name.
Sql Cluster Set up Error



Turns out, the sysadmins had created a Resource Group with the same name and allocated all the cluster disks to it. So, when we attempted to create a new Resource Group (for Sql Server resources), setup threw this error.


SqlCluster Install Resolution



Solution: Once the dummy resource group was removed, the cluster setup wizard progressed without any errors.

The moral of the story: before creating a resource group, make sure a resource group with the same name does not already exist.



Hope this helps,

Read Full Post »

Interesting one today:

Lately, Access Methods performance metrics have been helpful in troubleshooting some recent issues. The Access Methods object has several important counters that measure how Sql Server accesses logical data internally.

Here we’ll look at 3 of them:

  1. \SQLServer:Access Methods\FreeSpace Scans/sec
  2. \SQLServer:Access Methods\Table Lock Escalations/sec
  3. \SQLServer:Access Methods\Workfiles Created/sec

FreeSpace Scans

Objects in Sql Server are written to database pages. A group of eight of these pages is called an extent, and space is allocated in units of extents. Extents come in two types: Mixed and Uniform.

Usually, small tables (and sometimes Heap tables) are written to Mixed extents. So, when the data in those tables/objects grows, Sql Server needs to find more free space to accommodate the growth. In these situations, Sql Server performs Free Space Scans. This counter measures how many times these occur every second, hence FreeSpace Scans/sec.

Not sure what Microsoft’s recommended range on this metric is, but in our environment, this metric stays low, i.e. under 5 or 10. So, as a rule of thumb, let’s say as long as the number is below 20 we are okay. Anything higher for extended periods of time might need some attention.

So the best approach is to gather a baseline first; then you’ll know what is normal and what is out of the ordinary.
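One way to gather that baseline from the command line is with typeperf, which ships with Windows. A sketch, assuming a default Sql Server instance (named instances expose these counters under \MSSQL$InstanceName:Access Methods instead):

```shell
:: Sample the three counters every 15 seconds for 24 hours (5760 samples)
:: and write them to a binary perfmon log for later analysis with relog
typeperf "\SQLServer:Access Methods\FreeSpace Scans/sec" ^
         "\SQLServer:Access Methods\Table Lock Escalations/sec" ^
         "\SQLServer:Access Methods\Workfiles Created/sec" ^
         -si 15 -sc 5760 -f BIN -o AccessMethods_Baseline.blg
```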

Table Lock Escalations

When a query reads rows from a table, it uses the smallest lock possible to maintain concurrency. On some rare (but not uncommon) occasions, this lock gets escalated to a higher level, either Page or Table. While this reduces concurrency on that table, it improves the efficiency of that particular query’s execution.

Issuing thousands of locks costs a lot of memory, so it is cheaper to issue a table lock and read as much data as needed. The downside is that it prevents other connections from reading the table.

For more, read this previous post on Lock Escalations.

This counter measures the number of such escalations per second. While this is a common occurrence, higher numbers for extended periods of time are not good. So, look into optimizing queries so they only read the exact amount of data they need (a.k.a. use better JOIN and WHERE clause conditions).
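If one table keeps showing up as the source of escalations, its escalation behavior can also be tuned per table (Sql Server 2008 and later). A hedged sketch via sqlcmd; the database and table names here are hypothetical:

```shell
:: AUTO lets partitioned tables escalate per partition instead of to the
:: whole table; DISABLE turns escalation off entirely (use with care)
sqlcmd -S . -E -d SalesDb -Q "ALTER TABLE dbo.OrderDetails SET (LOCK_ESCALATION = AUTO);"
```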

Workfiles Created

When a query that handles large data sets is executed, sometimes the intermediate data sets (or virtual tables) are written to disk. This helps with efficient processing of data. Sql Server reads this data back into memory as and when needed and completes query processing.

This counter measures how many workfiles are created each second. On a busy system that handles large data set manipulation queries, many WorkFiles & WorkTables are created each second. So, this number needs to be considered in context. Capture a baseline first; then measure any aberrations from the baseline and look into possible reasons.

Usually, when a query manipulates large data sets, Sql Server uses Hash Joins to find matches. So, if you have a lot of queries that perform Hash Joins or Hash Aggregates, this counter spikes up.


Hope this helps,

Read Full Post »

Interesting one today:

Earlier this month, during a Sql Server Cluster setup on a new set of VMs, we ran into this interesting warning message.

Network binding Order generated a warning.

The domain network is not the first bound network. This will cause domain operations to run slowly and can cause timeouts that result in failures. Use the Windows network advanced configurations to change the binding order.
Cluster Setup Network Binding Order


Upon further investigation, it became clear that the NIC that connects to the Domain network was not given the highest priority (as needed for Sql Cluster).


In Clustered environments, it is recommended to have the network interfaces properly ordered for maximum efficiency.

Go to “Network connections” and open Advanced Settings. See the image below:

Network Connection - Advanced Settings


In the resulting window, under the Adapters and Bindings tab, make sure the network interfaces are ordered according to the recommendation: the Domain network on top, then the Heartbeat network, then Remote Access Connections. See the image below for the recommended order.

Network Binding Proper Order


After saving the new order, go back to “Install Failover Cluster Rules” and re-run the checks. This blog has more info if you’re interested in the rest of the cluster setup.

Hope this helps,

Read Full Post »

Quick one today:

Every now and then, when good Sql Server resources are available, they are posted here for more people to benefit from. Continuing that tradition, today we have a gold mine: Microsoft has released many, many e-books to the public to download for free. The list includes several Sql Server books, along with BI, Windows, Office, SharePoint, etc.

Happy learning !!


Hope this helps,

Read Full Post »

Interesting one today:

A few months ago, we saw a post on CHECKPOINT and its counterparts. Now, let’s dig a little deeper into it.

Starting with Sql Server 2012, we have a new advanced option called TARGET_RECOVERY_TIME; it enables Indirect Checkpoints, which change the recovery time after a crash.

Automatic CheckPoints are the default (a system-level setting in sys.configurations) that decides how frequently the dirty pages in the buffer pool are written to disk. Usually it is once every minute (a generalization), but there are some nuances to it (see the note below). This helps reduce the amount of time it takes to bring the system back to a working state after a crash.

Note: 1 minute is just a general guideline; the actual interval depends on the amount of traffic on the database system. Higher-traffic systems process many transactions each second, so there will be more dirty pages, and the CHECKPOINT (background writer) kicks off more frequently than once a minute.

With the new Indirect Checkpoints, a database-level setting, we can configure custom checkpoint settings to enable faster and more predictable recovery times after a crash.
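Setting it is a one-line ALTER DATABASE. A sketch via sqlcmd against a local instance; the database name and the 60-second target here are hypothetical:

```shell
:: Enable indirect checkpoints with a 60-second recovery target
sqlcmd -S . -E -Q "ALTER DATABASE [SalesDb] SET TARGET_RECOVERY_TIME = 60 SECONDS;"

:: Verify: 0 in this column means a database still uses automatic checkpoints
sqlcmd -S . -E -Q "SELECT name, target_recovery_time_in_seconds FROM sys.databases;"
```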



When we UPDATE/INSERT data into Sql Server, it is written to the buffer pool, not disk. Only when the (automatic/default) CHECKPOINT occurs are all the dirty pages in the buffer pool written to disk. This occurs at roughly one-minute intervals (it varies with workload, but 1 minute is a good general guideline). So, approximately every minute, at the 60th second, you’ll see a HUGE spike in I/O to the MDF/NDF files as all the dirty pages are written to disk. Then it waits another ~60 seconds for the next CHECKPOINT to write all the dirty pages to disk again. So, you see a pattern here.

The entire dirty page workload is written to disk in one shot; then the system sits idle for the next 60 seconds, and then again writes the next workload to disk in one shot. As you can see, the I/O subsystem will be more active during these CHECKPOINT periods than at any time in between.

If your storage is designed to handle, let’s say, 100 MB/sec and you have 1000 MB worth of dirty pages since the last checkpoint (1 min), it may take the storage subsystem 10 seconds or more to fully process the workload. This results in unnecessary spikes in I/O metrics.
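The arithmetic above can be sketched out quickly (these figures are the hypothetical ones from this example, not measurements):

```shell
# Hypothetical figures from the example above
dirty_mb=1000        # dirty pages accumulated between checkpoints
throughput_mb_s=100  # what the storage can sustain, MB/sec

# One checkpoint per minute: a single long I/O burst, then idle
echo "1 flush/min : $((dirty_mb / throughput_mb_s))s burst, ~50s idle"

# Two checkpoints per minute: half the backlog per burst
echo "2 flushes/min: $((dirty_mb / 2 / throughput_mb_s))s burst every 30s"
```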

See the image below, which shows the maximum reading on the amount of dirty pages written to disk.


This presents the incorrect picture that there is something wrong with your storage, even though it stays idle for the remaining ~50 seconds of the minute.



  • I/O bottlenecks:
    • Now, if we could write more frequently, the same workload could be accomplished without triggering any false-positive metrics (and also reducing recovery time after a crash).
    • In the example above, if the same 1000 MB per-minute dirty page workload could be written 2 times within a minute, we’d have a ~500 MB workload every 30 seconds.
    • Now, the same storage metrics will show a much better picture.
    • Then we could tweak the storage design to match the requirements of Sql Server’s dirty page workload.


  • Indirect Checkpoints enable you to control recovery time after a crash to fit within your business requirements


  • Adjustment Period:
    • When we first change this setting, Sql Server adjusts the amount of dirty pages it keeps in memory. This may initially result in some pages being sent to disk immediately to achieve the new recovery time.
    • During this adjustment period, we’ll see the elevated disk activity described above.
    • Once the adjustment period is complete, Sql Server goes back to a predictable schedule for sending pages to disk.
    • So, do not be alarmed during this adjustment period.
  • Preparation:
    • For OLTP workloads, this setting can sometimes result in performance degradation. It appears the background writer, which writes dirty pages to disk, increases the total write workload for the server instance.
    • If different databases have different settings, the instance ends up doing more work, which might result in performance degradation.
    • So, this setting needs to be tested in a performance environment before enabling it in Production.
In the next post, we’ll see the interaction between the ‘recovery interval’ setting and TARGET_RECOVERY_TIME.

Hope this helps,

Read Full Post »

Older Posts »