Feeds:
Posts
Comments

Archive for September, 2014

Recently, during review of some T-SQL code, I ran into the usage of “not equal to (<>)” operator in one of the queries with LEFT OUTER JOIN. This is interesting, because the general perception of ‘not equal to’ is different from the way Sql Server processes it.

Let’s take an example:
Two tables, Table1 & Table2, with similar structure.

--
-- Create tables
--
CREATE TABLE dbo.Table1(
   ID         INT         NOT NULL IDENTITY(1,1)
 , PatientID  INT         NULL
 , SomeCode   VARCHAR(10) NULL
)
GO

CREATE TABLE dbo.Table2(
   ID           INT         NOT NULL IDENTITY(1,1)
 , PatientID    INT         NULL
 , SomeCode     VARCHAR(10) NULL
)
GO

--
-- Load dummy data
--
INSERT INTO dbo.Table1(PatientID, SomeCode)
VALUES (1, 'Code1')
     , (1, 'Code2')
     , (1, 'Code3')
     , (2, 'Code1')
     , (2, 'Code2')
     , (3, 'Code1')
GO&lt;/pre&gt;
&lt;pre&gt;
INSERT INTO dbo.Table2(PatientID, SomeCode)
VALUES (1, 'Code1')
     , (1, 'Code2')
     , (1, 'Code3')
     , (2, 'Code1')
     , (2, 'Code3')
     , (2, 'Code4')
     , (3, 'Code1')
GO

Here, in Table1, Patient No.1 has 3 codes i.e. Code1, 2, 3 & Patient 2 has 2 codes i.e Code1, 2; Patient3, etc

Requirement Definition

In plain English, if we say, we want to see all records from Table1, that do not have a matching record in Table1 (a.k.a. LEFT OUTER JOIN scenario).

You want to see something like this:

Dataset and manual comparision

Dataset and manual comparision

 

The data set in green is the matching and red is the non-matching. So you only want to see the ones in red box (but only from Table1 – left hand side).

Options

This general English language query could be converted into these two SQL queries


--

-- Option 1: Using &lt;&gt; operator

--

SELECT A.PatientID, A.SomeCode, B.PatientID, B.SomeCode
FROM dbo.Table1 AS A
LEFT OUTER JOIN dbo.Table2 AS B
    ON A.PatientID = B.PatientID
    AND A.SomeCode &lt;&gt; B.SomeCode
GO

This results in more data than what we expect. For  every record in Table1, it cartesian-join’s every record in Table2 (after PatientID condition) and return every combination of mis-match.

See the image below:

Not Equal Operator Output

Not Equal Operator Output

 


--

-- Option 2: Using &quot;WHERE SomeCode IS NULL&quot;

--

SELECT A.PatientID, A.SomeCode, B.PatientID, B.SomeCode
FROM dbo.Table1 AS A
LEFT OUTER JOIN dbo.Table2 AS B
    ON A.PatientID = B.PatientID
    AND A.SomeCode = B.SomeCode
WHERE B.SomeCode IS NULL

GO

This returns the correct dataset that we are expecting i.e. only the records from Table1 that do not have matching record in Table2.

Using WHERE CLAUSE

Using WHERE CLAUSE

 

Summary

In the first version, when we say ‘<>’ (A.SomeCode <> B.Somecode, though in plain English we seem to be asking the correct question, in SQL the meaning changes. This returns a lot more data than what we’d like to see.

It takes each record from Table1 and returns every record from Table2, that does not match it (after PatiendID condition is applied), like a cartesian join

In general terms, stay away from ‘<>’ operator, unless it is the exact logic you need. Also, it is inefficient for general use cases.

Hope this helps,
_Sqltimes

Advertisements

Read Full Post »

Quick one today:

Starting Sql Server 2012, looks like there has been a change to restarting options with Sql Server when connected from remote SSMS (a.k.a. when connecting to a Sql Server instance on a lab/prod server using SSMS on your laptop).

SqlAgent Restart is Disabled

SqlAgent Restart is Disabled

Looks like this is part of the new security features. In a typical scenario, you rarely (or almost never) would perform restarts on Sql Server instance or Sql Server Agent on a production instance from your local SSMS. During a production maintenance window, the usual practice is to, log into the server (i.e. RDP into the box) and then use Sql Server Configuration Manager to perform such operation. This sounds like a better/safer practice than performing similar operations from my local SSMS. Though it may seem inconvenient, it is better this way to prevent an inadvertent downtime.

As always, there seem to be some techniques to bypass this limitation. Though there has not been a successful outcome when these techniques were attempted, one of the popular suggestions is to open SSMS using ‘Administrator’ privileges.

Other techniques are listed in the blog here. Please not that I do not recommend attempting any of them.

 
Hope this helps,
_Sqltimes

Read Full Post »

More often these days, there have been many conversations around the topic of handling unicode data and UTF-* type data in Sql Server. Some details are below:

Basics

  • ASCII means American Standard Code for Information Interchange – based on English alphabet
    • As you can see above, ASCII is only for English
  • Unicode is a standard for representing characters of most of world’s languages (written)
    • Just like ASCII, but covers more languages.
  • UTF-8 means UCS Transformation Format – 8 bit (1 byte).
    • UTF-8 is the most used standard for web and has closest compatibility with ASCII
    • UCS means Universal Character Set defined by ISO
    • UTF-16 uses 2 bytes
    • UTF-32 uses 4

How this relates to Sql Server:

  • Sql Server uses COLLATION setting to determine the techniques to compare and sort character data.
  • In Sql Server we have COLLATION at instance level, database level and column level.
    • Instance collation could be configured during installation
    • Installation collation is inherited for database and table columns, when no other collation is explicitly specified.
  • .Net string is unicode encoded in UTF-16
  • Java string is unicode encoded in UTF-16
  • In previous post, we discussed ways to store unicode data.
    • When you store unicode data in non-unicode column, Sql Server finds the closest match. Example : ‘À’ to ‘A’
    • But when a close match is not available, it replaces it with ‘?’. See this example.
  • Use windows collation where possible (not SQL collation)
    • Windows collations are updated when there are new characters (or alphabet), where as SQL collations are not going to up updated going forward.
    • SQL collations are for backwards compatibility and when one Sql Server needs to communicate directly with other Sql Servers in the environment.
      • Either through SSIS
      • Replication
      • etc
  • In a multi-language environment, use the collation with best overall support for the languages used.
  • Performance is not (and should not be) a factor in deciding collation.

Several posts have helped gather the information above. I’ll list some of them:

  1. BoL
  2. Article on UTF
  3. Dan Guzman
  4. MSDN Blog

Hope this helps,
_Sqltimes

Read Full Post »