
Archive for October, 2013

Quick one today:

Sometimes when I run manual backup processes, I want to check how far along it has progressed and its current status. There is an easy way to do this. As you know, the DMV ‘sys.dm_exec_requests’ has information about current requests to the database. If we query it and join it with ‘sys.dm_exec_sql_text’, we can capture all the necessary information.

This is the query:

SELECT DB_NAME(R.database_id) AS [Database_Name]
 , R.Command AS [Command]
 , R.status AS [Status]
 , R.start_time AS [Start_Time]
 , R.percent_complete AS [Percent_Complete]
 , R.total_elapsed_time/(1000*60) AS [Duration_in_Minutes]
 , R.estimated_completion_time/(1000*60) AS [Estimated_Completion_Time_in_Minutes]
 , T.text AS [Sql_Text]
 , R.wait_type AS [Wait_Type]
FROM sys.dm_exec_requests AS R
CROSS APPLY sys.dm_exec_sql_text(R.sql_handle) AS T
WHERE R.command in ( 'RESTORE DATABASE'
 , 'BACKUP DATABASE'
 , 'RESTORE LOG'
 , 'BACKUP LOG'
 )

The above query returns the progress percentage and estimated completion time, along with some other pertinent information.

Backup_Progress_and_Status

Hope this helps,
_Sqltimes

Read Full Post »

We have a lot of virtual machines (VMs) in our environment, and every day the push is more and more to run database servers as VMs as well. When our infrastructure team creates new VMs, they use templates; from their perspective, it’s a lot easier to work from a VM template.

You need a new web server? Sure, clone from this web-server template and in a few minutes you have a brand new web server running. They have templates for web-server, app-server, Oracle-DB-server, sql-server-vm, etc.

This creates a problem for us DBAs.

When you install Sql Server on a machine, it captures some properties of the operating system and bakes them into its installation.

Example:

  • New Windows groups are created, and the Sql Server service account is added to one of them to grant the necessary privileges.
  • The Sql Server default instance takes its name from the machine name.
  • etc.

The new VM they hand us still carries these configurations from the old VM (the template). How do we change that?

We’ll look at one such method:

  • For the default instance, after the computer name is changed:
sp_dropserver 'old_name';
GO
sp_addserver 'new_name', local
GO

Restart the instance of SQL Server.

  • For a named instance, after the computer name is changed:
sp_dropserver 'old_name\instancename';
GO
sp_addserver 'new_name\instancename', local
GO

Restart the instance of SQL Server.
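Either way, after the restart you can verify that the stored name now matches the machine name. A quick sanity check (@@SERVERNAME returns the name stored in sys.servers, while SERVERPROPERTY picks up the current computer name at startup):

-- Compare the name Sql Server has stored with the current server name.
-- If these two differ, run sp_dropserver/sp_addserver as above and restart.
SELECT @@SERVERNAME                 AS [Stored_Server_Name]
     , SERVERPROPERTY('ServerName') AS [Current_Server_Name]
GO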

Hope this helps,
_Sqltimes

Read Full Post »

Quick one today.

Occasionally, in my development environment, when I run CHECKDB, I run into this error:

ERROR: Operating system error 665(The requested operation could not be completed due to a file system limitation) encountered.

The error message is pretty vague and does not give me any leads to troubleshoot the issue. But after some head-scratching, here is what I realized.

When we run CHECKDB, Sql Server internally runs CHECKALLOC, CHECKTABLE and CHECKCATALOG on all the objects. This is a pretty resource-intensive operation: it needs a lot of CPU, memory and I/O. And there is one more thing: disk. Yes, it needs some disk space to run its checks.

From what I read from Paul Randal, when you run CHECKDB, Sql Server creates a database snapshot of the database and runs the checks against that snapshot. This snapshot needs some disk space. If you have a VLDB with a lot of traffic coming in making changes, the snapshot takes up more space. How big does it get? I can’t tell, but if there are many changes (UPDATEs/INSERTs) happening on the tables, the snapshot database grows due to ‘copy-on-write’.

Just make sure you have enough disk space on the drive that holds the original database’s data files.
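If provisioning extra disk space is not an option in a development environment, one possible workaround (a sketch, using a hypothetical database name) is to run CHECKDB with TABLOCK, which skips the internal snapshot at the cost of taking locks:

-- TABLOCK makes CHECKDB use table locks instead of an internal database
-- snapshot, so no extra snapshot disk space is needed. The trade-off:
-- concurrent modifications are blocked and some checks (e.g. CHECKCATALOG)
-- are skipped.
DBCC CHECKDB ('MyDatabase') WITH TABLOCK
GO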

Hope this helps,
_Sqltimes

Read Full Post »

This is really awesome !! You can split a partition (filegroup) with data without resulting in any data movement.

Imagine a partitioned table where each month of a calendar year goes into its own partition on a dedicated filegroup. Each month’s data goes into its own partition, and as you move forward, the sliding window opens up empty partitions on one end and removes (merges) old partitions on the other.

Table Partition & FileGroups

Now, if for some reason you do not run the monthly sliding window, the new month’s data starts accumulating in the same partition (and filegroup) as the previous month’s. Essentially, data from two different months ends up in the same partition.

When you introduce a partition boundary for the current month at this point, it results in a lot of data movement: the current month’s data stored in the previous month’s partition needs to be moved to its own partition. This I/O may not always be a welcome activity on VLDBs.

So, you have two options now:

  1. Create the new partition boundary at the start of the current month, or
  2. Create the new partition boundary at a future date; something like a few hours in the future, to prevent any data movement.

Obviously, option 1 results in a lot of data movement, but to my surprise option 2 also results in data movement. In option 2, though we create the new partition at a future date, Sql Server still scans the old partition to be sure there is no data that needs to be moved. Sometimes this can take a long, long time.

But there is an option that results in zero data movement. Microsoft folks have suggested this way to split a partition with no data movement.

Follow these steps:

  1. Create an archive table partitioned on the same partition function and partition scheme.
  2. On the main table, SWITCH out the partition holding more than one month’s data into the corresponding empty partition of the archive table. No data movement.
  3. Now the main table has two empty partitions at the end.
  4. Since the partition is empty, we can SPLIT it at today’s date. No data movement.
  5. Now SWITCH the partition back from the archive table into the main table’s penultimate partition. No data movement again.
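The steps above can be sketched in T-SQL roughly like this (table, partition function/scheme and filegroup names are hypothetical; the partition number depends on your partition function):

-- Assumes dbo.MainTable and dbo.ArchiveTable are partitioned on the same
-- partition function (PF_Monthly) and scheme (PS_Monthly), and that
-- partition 12 of the main table holds the mixed two-month data.

-- Step 2: switch the non-empty partition out to the archive table (metadata-only).
ALTER TABLE dbo.MainTable SWITCH PARTITION 12 TO dbo.ArchiveTable PARTITION 12;

-- Step 4: the partition is now empty, so the SPLIT causes no data movement.
ALTER PARTITION SCHEME PS_Monthly NEXT USED [FG_NewMonth];
ALTER PARTITION FUNCTION PF_Monthly() SPLIT RANGE ('20131101');

-- Step 5: switch the data back into the main table's penultimate partition.
ALTER TABLE dbo.ArchiveTable SWITCH PARTITION 12 TO dbo.MainTable PARTITION 12;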

Now we have created the new partition on the main table without any data movement.

Voila !! no data movement at all.

Hope this helps,
_Sqltimes

Read Full Post »

In a VLDB environment, diagnosing performance issues is both an art and a science. There are several correct ways to identify the factors contributing to a performance issue and to define its scope. Sql Server is resource-intensive software; not because of Sql Server itself, but because of what it, as a database engine, is designed to honor: several thousands of requests coming in from several web/app servers simultaneously.

There are several components, working together, that contribute to good or bad performance.

Primarily,

  • I/O
  • CPU
  • Memory
  • Network latency
  • etc

Luckily, there are several operating system performance counters available that enable us to rule out some factors and focus only on the ones directly contributing to the problem. Gather these performance metrics to analyze and arrive at a conclusion based on empirical evidence.

Avg. Disk sec/Transfer

This counter allows us to see if the physical disks are contributing to slow I/O. A drive (LUN) attached to a machine could have several physical disks assigned to it; together they help retrieve data when requested. This counter shows whether the disks are performing at the level expected for Sql Server’s optimal performance. If there is a lot of traffic coming into the drive, causing high ‘Avg. Disk Queue Length’ numbers, that is a different problem. But when a request comes to a physical disk, it is expected to respond within a certain range; if it goes beyond that, it’s time to look into the physical disks.

The standard I measure against is this:

Range                 Interpretation
0-15 milliseconds     Excellent
15-30 milliseconds    Good
30-40 milliseconds    Okay
40-50 milliseconds    Poor
> 50 milliseconds     Bad Disk

With these ranges, we know how good the disks are, so we can eliminate them as a contributing factor to the performance issues.

If you see 15-30 millisecond response times and are still experiencing performance issues, then the problem lies with something else, not the disks aligned to this LUN (drive) where the database files are located.

But if you see 40-50 millisecond response times, or more than 50 milliseconds, then you know this is a contributing factor. It may not be the sole cause of all the performance issues, but it is one factor that you know, with evidence, to be part of the problem.

Avg. Disk Queue Length

If you see a large Avg. Disk Queue Length, then maybe there is a lot of traffic coming into this drive. See if you can add more disks to this LUN (talk to your SAN administrator), or see if you can split the traffic across separate drives by adding (or moving) database files to a new drive. But if you see this high (> 2 per disk) along with poor performance (40-50 ms) on ‘Avg. Disk sec/Transfer‘, then both are contributing to the problem.
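Besides Perfmon, a rough per-file equivalent of ‘Avg. Disk sec/Transfer’ can be computed from a DMF. These numbers are cumulative since the instance started, so treat them as a long-run average rather than a point-in-time reading:

-- Average read/write latency in milliseconds, per database file,
-- accumulated since the Sql Server instance last started.
SELECT DB_NAME(VF.database_id)                            AS [Database_Name]
     , VF.[file_id]                                       AS [File_ID]
     , VF.io_stall_read_ms  / NULLIF(VF.num_of_reads, 0)  AS [Avg_Read_Latency_ms]
     , VF.io_stall_write_ms / NULLIF(VF.num_of_writes, 0) AS [Avg_Write_Latency_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS VF
GO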

Hope this helps,
_Sqltimes

Read Full Post »

Another quick one today.

The counter “Process(sqlservr)\% Processor Time“ is captured as part of our regular monitoring to see how SQL Server is consuming CPU resources. This helps put things in perspective as to how existing resources are being utilized, and whether usage is efficient or needs any tweaks.

But on a multi-core machine, the captured number can go above 100%. It is an aggregate of the CPU utilization (%) on each core, so when the per-core values are added up, the total can exceed 100%, even though in reality no single core is above 100%.

Example:

If you have a quad-core machine, with the following CPU usage on each core:

  • Core 1 – 40%
  • Core 2 – 30%
  • Core 3 – 5%
  • Core 4 – 50%

The total is 125%.

So you see 125% as the captured metric for this counter, but in reality it’s an aggregate value, not a true percentage.

From MSDN:

…Now this Processor time counter under the Process Object gives the amount of CPU this Process is taking individually. This value is calculated over the base line of (No of Logical CPUS * 100), so this is going to be calculated over a baseline of more than 100.
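So, to normalize the captured value into a true 0-100% figure, divide it by the number of logical CPUs, which Sql Server itself exposes:

-- Number of logical CPUs visible to Sql Server; divide the raw
-- "Process(sqlservr)\% Processor Time" value by this count.
-- e.g. 125% on a quad-core machine: 125 / 4 = 31.25% overall.
SELECT cpu_count AS [Logical_CPU_Count]
FROM sys.dm_os_sys_info
GO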

Hope this helps,
_Sqltimes

Read Full Post »

Quick one today:

This is something I use frequently in our performance lab environment to reset the wait & latch stats counters before every volume test (performance test).

DBCC SQLPERF('sys.dm_os_wait_stats',CLEAR)
GO
DBCC SQLPERF('sys.dm_os_latch_stats',CLEAR)
GO

Surprisingly, DBCC SQLPERF does not reset all the DMVs, only these two. Maybe this is by design.

Hope this helps,
_Sqltimes

Read Full Post »

A couple of days ago, I ran into an interesting error message. As I was preparing some diagnostic queries to add to our Query Library, I was playing with some DMVs & DMFs (Dynamic Management Views & Functions) to gather some under-the-hood details about our production Sql Server instance. When I JOINed a system view with a DMF, I got this error message saying that this is not allowed. It’s interesting because I remember being able to do this successfully in the past. Not this exact query, but something like it, and it worked. Not this time.

SELECT	  @@SERVERNAME																						AS [Server_Name]
		, Db_name(Db_id())																					AS [Database Name]
		, name																								AS [File Name] 
		, F.[file_id]																						AS [File_ID] 
		, physical_name																						AS [Physical Name]
		, size / 128.0																						AS [Total Size in MB]
		, size / 128.0 - Cast(Fileproperty(name, 'SpaceUsed') AS INT) / 128.0								AS [Available Space In MB] 
		, num_of_reads																						AS [num_of_reads]
		, num_of_writes																						AS [num_of_writes]
		, io_stall_read_ms																					AS [io_stall_read_ms]
		, io_stall_write_ms																					AS [io_stall_write_ms]
		, Cast(100. * io_stall_read_ms / ( io_stall_read_ms + io_stall_write_ms ) AS DECIMAL(10, 1))		AS [IO Stall Reads Pct] 
		, Cast(100. * io_stall_write_ms / ( io_stall_write_ms + io_stall_read_ms ) AS DECIMAL(10, 1))		AS [IO Stall Writes Pct] 
		, ( num_of_reads + num_of_writes )																	AS [Writes + Reads]
		, num_of_bytes_read																					AS [num_of_bytes_read]
		, num_of_bytes_written																				AS [num_of_bytes_written]
		, Cast(100. * num_of_reads / ( num_of_reads + num_of_writes ) AS DECIMAL(10, 1))					AS [# Reads Pct]
		, Cast(100. * num_of_writes / ( num_of_reads + num_of_writes ) AS DECIMAL(10, 1))					AS [# Write Pct]
		, Cast(100. * num_of_bytes_read / (num_of_bytes_read + num_of_bytes_written ) AS DECIMAL(10, 1))	AS [Read Bytes Pct]
		, Cast(100. * num_of_bytes_written / (num_of_bytes_read + num_of_bytes_written) AS DECIMAL(10, 1))	AS [Written Bytes Pct] 
FROM   sys.database_files AS F
CROSS APPLY sys.Dm_io_virtual_file_stats(Db_id(), F.file_id) AS VF
OPTION (recompile); 
GO

Error Message:

Msg 413, Level 16, State 1, Line 1
Correlated parameters or sub-queries are not supported by the inline function “sys.Dm_io_virtual_file_stats”.

So, I played with it a little to see whether it was a syntactic or a semantic issue, but made no progress. Then I looked up the error message, but did not find anything directly useful. Looks like this is a rare error, and from what I gather on MSDN Connect, there is no immediate fix for it.

But there are workarounds. Connect shows one from Simon Sabin, but below is my (I think) easier workaround: I use a CTE to wrap the DMF and then JOIN it with any table or system view. And it works !!

	WITH CTE_VFS (Database_ID, [File_ID], sample_ms, num_of_reads, num_of_bytes_read, io_stall_read_ms, num_of_writes, num_of_bytes_written, io_stall_write_ms, io_stall, size_on_disk_bytes, file_handle)
	AS
		(
			SELECT Database_ID, [File_ID], sample_ms, num_of_reads, num_of_bytes_read, io_stall_read_ms, num_of_writes, num_of_bytes_written, io_stall_write_ms, io_stall, size_on_disk_bytes, file_handle 
			FROM sys.Dm_io_virtual_file_stats(Db_id(), NULL) AS VF
		)
	SELECT		  @@SERVERNAME																								AS [Server_Name]
				, Db_name(Db_id())																							AS [Database Name]
				, F.name																									AS [File Name] 
				, F.[file_id]																								AS [File_ID] 
				, F.physical_name																							AS [Physical Name]
				, F.size / 128.0																							AS [Total Size in MB]
				, F.size / 128.0 - Cast(Fileproperty(name, 'SpaceUsed') AS INT) / 128.0										AS [Available Space In MB] 
				, VF.num_of_reads																							AS [num_of_reads]
				, VF.num_of_writes																							AS [num_of_writes]
				, VF.io_stall_read_ms																						AS [io_stall_read_ms]
				, VF.io_stall_write_ms																						AS [io_stall_write_ms]
				, Cast(100. * VF.io_stall_read_ms / ( VF.io_stall_read_ms + VF.io_stall_write_ms ) AS DECIMAL(10, 1))		AS [IO Stall Reads Pct] 
				, Cast(100. * VF.io_stall_write_ms / ( VF.io_stall_write_ms + VF.io_stall_read_ms ) AS DECIMAL(10, 1))		AS [IO Stall Writes Pct] 
				, ( VF.num_of_reads + VF.num_of_writes )																	AS [Writes + Reads]
				, VF.num_of_bytes_read																						AS [num_of_bytes_read]
				, VF.num_of_bytes_written																					AS [num_of_bytes_written]
				, Cast(100. * VF.num_of_reads / ( VF.num_of_reads + VF.num_of_writes ) AS DECIMAL(10, 1))					AS [# Reads Pct]
				, Cast(100. * VF.num_of_writes / ( VF.num_of_reads + VF.num_of_writes ) AS DECIMAL(10, 1))					AS [# Write Pct]
				, Cast(100. * VF.num_of_bytes_read / (VF.num_of_bytes_read + VF.num_of_bytes_written ) AS DECIMAL(10, 1))	AS [Read Bytes Pct]
				, Cast(100. * VF.num_of_bytes_written / (VF.num_of_bytes_read + VF.num_of_bytes_written) AS DECIMAL(10, 1))	AS [Written Bytes Pct] 
	FROM CTE_VFS AS VF
	INNER JOIN sys.database_files AS F
		ON VF.File_ID = F.file_id
	GO

Hope this helps,
_Sqltimes

Read Full Post »