SQL Server Performance - what it takes for a newbie DBA in resolving performance issues?

Published 05 February 08 01:57 AM | SQL Master 

SQL Server Performance Tuning, sounds familiar on a DBA perspective and though it sounds like an easy words but when you start analyzing the problem it is far deeper inside.

When you refer to the web resources such as forums/newsgroups then a common question will be fulfill of improving performance on a relatively busy production server. Increase performance means reduce the number of READS when a query is executed using TSQL and all it depends on the columns used in WHERE clause that should be SARGable. Using SQL Server techniques such as running PROFILER and make use of INdex Tuning Wizard (SQL2000) and Database Tuning Advisor (SQL2005) will have suggestions on the indexing strategy. So taking out the number of queries that are executed often on the server will help you, such as referred in these blog posts

At times I visit forums & newsgroups that will get more insight on problems that a user face on SQL Server arena, recently visitng SQL Server Central site forums and gone through this interesting and awesome explanation by a forum member, R.P.Rozema given much of insight for a User (beginner to Professional) to see what they can do in fine tuning a query performance.

I fully agree that you should first identify the most called/heaviest queries in your database and concentrate mostly on optimising those. Your objectives can be any combination of: less cpu-load, less IO or improved concurrency, it just depends on where the bottleneck is for your application(s). Each of these objectives may need their own adjustments. But in any case you need to have a look at the T-SQL only after you've identified your poor performers first.

Of course it's best to always write your code with performance as one of your objectives. For this purpose I've compiled a non-extensive list with hints on how to get well performing T-SQL. The list was originally written with MS SQL 2000 in mind, though I'm sure most of this still holds for 2005. I think this list will help you too improving performance of your poor performers. I would like to invite more people to share their tips too and/or give comments on mine.

The list



Reduce the number of columns returned to a minimum



The less data needs to be retrieved, the more optimal the queries will be. Even better, try to reduce the columns fetched (not just the columns in the select clause add to this, also the where and join criteria must be included) to just columns from exactly one index. If a (partial) query can be answered by just reading an index, that will save the additional IO for reading the entire records from the table itself.


Avoid OR like the plague



Very often OR results in index or even table scans. Scans are bad for performance. Different approaches can be used to avoid using OR, ranging from using 'union all' and the use of nullif, isnull and such in join queries.


Specify table qualifiers



Specifying dbo. for all tables in your queries will simplify the job for the query interpretator, there by speeding up creating the query plan for your queries. More importantly: in future versions of MS SQL specifying the qualifier will be mandatory.


How to compare a mandatory column with an optional column



IMPORTANT NOTE: the original hint -shown below in italics, (I can't find strike-through)- does NOT improve performance, in fact it even it hurts performance as Scott pointed out. The where clause as I've presented it below hides the fact that an 'or' is in the conditions and even making it worse by applying nullif. The query will result in a table scan because the arguments get obscured for the optimizer by the nullif operator. A next attempt for an example to compare a mandatory column with an optional column, please -anyone- correct me if I'm wrong again
select 1  
from dbo.Table t
where t.mandatorycol = t.optionalcol

DO NOT FOLLOW THIS ORIGINAL HINT:
Optional columns may contain null values, comparing null with any value may result in unexpected results. To properly compare optional columns you would need to use an OR operator to check for either of the operands being null plus a = (or <> ) operator to check for equal (or non-equal) values. By using the nullif operator this sort of comparison can be implemented more easily. The nullif operator returns null if both operands have the same value and it returns the value of the first operand if they are not the same. If either of the columns to compare is mandatory, to compare the values it is sufficient to use a single nullif operator.

For example, to check for equal values in a mandatory column and an optional column:
select 1  
from dbo.Table t
where nullif( t.mandatorycol, t.optionalcol) is null



How to compare 2 optional columns



If both columns are optional, the nullif operator will return null if the values are the same, but it will also return null if the values are not the same and the first operand is null. You can combine the coalesce operator with the nullif operator to check 2 optional columns for (in)equality. Note that by reversing the both operands in the second nullif, either of the both nullifs will return a non-null value if exactly 1 of the both columns is null.
select 1  
from dbo.Table t
where coalesce( nullif( t.optionalcol1, t.optionalcol2), nullif( t.optionalcol2, t.optionalcol1)) is null

Reduce the amount of rows to work on a.s.a.p.



This is a very generic rule that can be applied in many ways. Examples follow. (sub queries, join on sub queries, selection criteria as join criteria, etc). Reducing the number of rows to work on helps us in many ways: less IO, less locks and less memory.


Check for existence



select 1 outperforms select * in many situations and select top 1 1 often performs even better because it allows the query optimiser to reduce the amount of data as soon as possible.

Example:
if exists (  
select top 1 1
from aTable
)
begin
...

Check for existence(2)



If you need to find rows in table a for which at least 1 row exists in table b, you can use an inner join combined with a sub select on table b grouping table b by the joining key. This of course works best if an index exists on aTableB.id.

Example:

   select a.col1
from aTableA a
inner join (
select
b.id
from aTableB b
group by
b.id
) b on (b.id = a.id)

Check for non-existence



If you need to find rows in table a for which no row exists in table b, you can use a left outer join combined with a where clause checking that b's joined column is null. Of course this only works for mandatory columns in table b. If the join key is a combined key, you only need to check for one of these to be null. You should always look at the query plan and use the first column in the underlying index to check for null. This of course works best if an index exists on aTableB.id.

Example:

   select a.col1
from aTableA a
left outer join aTableB b on (b.id = a.id)
where b.id is null -- not exists


Check for duplicates



It is sufficient to check for the existence of just one duplicate instead of looking for all duplicates. i.e. checks for duplicates can often be implemented as a check for existence of 2 different rows with identical properties. Never use select count(1) having count(1) > 0
instead use something like:
if exists (  
select top 1 1
from aTable a
inner join aTable b on (b.someproperty = a.someproperty and b.keyproperty <> a.keyproperty)
)
begin
...

Group by in sub query



One example of reducing the intermediate amount of data is when grouping data from joined tables. If possible, try to group the data from one table first in a sub query and then join the 2nd table onto the sub qeury's result. The performance is gained mostly because the query optimiser can more easily identify the correct indexes to use. Another important aspect here is that both the query plan for and the intermediate table results of the sub query can be re-used (cached) among more queries/connections, whereas for more monolithic queries the chances of getting a cache hit for either are much less.
select fls.flsJFHillebrandRef, cnt.NumContainers  
from dbo.Files fls
left outer join (
select
cnt.flsGUID,
count(1) as NumContainers,
cnt.cntDeleteDate
from dbo.Containers cnt
group by cnt.flsGUID, cnt.cntDeleteDate
) cnt on (cnt.flsGUID = fls.flsGUID and nullif( cnt.cntDeleteDate, fls.flsDeleteDate) is null)

Use left joins for 'find' queries



When a query is potentially used with many different where clauses (typically this sort of queries is used in view definitions, but it may apply to others too) and the search conditions may include columns from joined-in tables, it can be advantageous to specify left joins for most of the join conditions while specifying in the select clause the value from the left-joined table. Performance is gained because only if the value is used as a selection criterium, the left-join will reduce into an inner join, and the left-joined table needs to be searched for the specified value. However if in a call the column is not specified as a selection criterium, the left join needs not be considered when selecting the rows to search through. i.e. the intermediate results are significantly smaller. Of course care must be taken to not turn inner joins that functionally filter unmatched rows into lef-joins.
select cnt.cntJFHillebrandRef, emqt.eqmtID, eqmt.eqmtDescription  
from dbo.Containers cnt
left outer join dbo.EquipmentTypes eqmt on (eqmt.eqmtID = cnt.eqmtID)

Aggregated value is only valid if another field has the same values



In some queries you need an aggregated value (sum, avg, count, etc) which is only to be shown if another field in the same row has exactly 1 value. For example order detail summaries are to be summed only if the currency on those details all have the same currency, in all other cases no order total can be calculated. This is easy to implement in one query:
select  
odtl.ordGUID,
odtl.odtlDeleteDate,
case
when coalesce( nullif( max(odtl.crncCode), min(odtl.crncCode)),
nullif(min(odtl.crncCode), max(odtl.crncCode))) is null
then max(odtl.crncCode)
else null
end as crncCode,
case
when coalesce( nullif( max(odtl.crncCode), min(odtl.crncCode)),
nullif(min(odtl.crncCode), max(odtl.crncCode))) is null
then sum(odtl.odtlProductValue)
else null
end as odtlProductValue
from dbo.OrderDetails odtl
group by odtl.ordGUID, odtl.odtlDeleteDate

No type conversions



If a query requires 2 values to be compared, try to keep the both values of the same data type. If a conversion is required before the values can be matched, this will usually result in a index or even a table scan. If a type conversion can not be avoided, put in an explicit type conversion on the value that is expected to deliver the least rows.

For example: LookupValues.luvValue is of type varchar, EquipmentTypes.eqmtGUID is of type uniqueidentifier. LookupValues has many more rows than EquipmentTypes, i.e. we should convert eqmt.eqmtGUID:
select 1  
from dbo.LookupValues luv
left outer join dbo.EquipmentTypes eqmt on (convert(varchar(36), eqmt.eqmtGUID) = luv.luvValue)
where eqmt.eqmtGUID is null -- not exists in eqmt
and luv.ifgrCode = 'FLOW'
and luv.luenCode = 'eqmt'

Use a temporary configuration table



Sometimes you need a specific set of decisions taken depending on the data found in a table. Such decisions can be implemented using if constructions in a procedure. However these if constructions usually lead to several queries being fired. If there is only a limited set of decisions to be taken you can also opt to declare a memory table and fill that with appropriate data so that join-ing your data with that configuration table will result in the correct decisions in one query.

An example from a report procedure where 4 copies were to be printed if the package volume was pallet and 1 copy for boxes:
declare @tblConfig (  
pckvID T_KEYSEQUENCE not null,
nCopies int not null,
primary key (pckvID)
)

insert @tblConfig( pckvID, nCopies)
select 'PL', 4
union all
select 'BX', 1

select odtl.prdCode, t.nCopies
from dbo.OrderDetails odtl
inner join @tblConfig t on (t.pckvID = odtl.pckvID)

Checks before insert, delete and update



Insert, delete and update are very 'expensive' operations compared to select statements. They usually involve a lot of IO and in a multi user environment they will lock down certain data for at best a short period of time. If in a procedure or trigger inserts, deletes or udpates are more often not fired than that they are fired, you should check whether the statement will have any effect on beforehand by adding an if-exists-statement around the statement. The existence check is much faster and does not put as many locks or as high a level of locks on the data as the statement itself.
if exists (  
select top 1 1
from Inserted i
left outer join dbo.Table tbl on (tbl.key = i.key)
where tbl.key is null -- not exists in tbl
)
begin
insert dbo.Table( key, col2)
select i.key, i.col2
from Inserted i
left outer join dbo.Table tbl on (tbl.key = i.key)
where tbl.key is null -- not exists in tbl
end

if exists (
select top 1 1
from Inserted i
inner join dbo.Table tbl on (tbl.key = i.key)
where nullif( tbl.col2, i.col2) is not null
)
begin
update tbl
set
col2 = i.col2
from Inserted i
inner join dbo.Table tbl on (tbl.key = i.key)
where nullif( tbl.col2, i.col2) is not null
end

Optional joined table pairs



Sometimes you need a certain row from table A only when there is a matching row in table B and you need neither of them if there is no such match. you need to specify that the inner join needs to be evaluated before the left outer join on the pair. To do this is actually just a matter of putting additional parenthesis '(' in your query, however syntactically this can be challenging and so often, this is implemented as 2 left joins plus an additional where clause testing that both are non-null. This however requires the query engine to evaluate the both tables without the additional join condition first, to later on filter out all unmatched rows. i.e. a lot of unneeded IO and intermediate storage is involved. Here's an example that shows the syntax and performs much better:
select top 10 fls.flsJFHillebrandRef, ctpr.ctprFullName as EHOContact  
from dbo.Files fls
left outer join (dbo.Users usr
inner join dbo.ContactPersons ctpr on (ctpr.cmpnGUID = usr.cmpnGUID and ctpr.ctprGUID = usr.ctprGUID)
) on (usr.usrGUID = fls.usrGUIDEHO)



Transactions must be kept short



Transactions are important to avoid race conditions. However while inside a transaction you are cummulating all locks along the way, thereby possibly blocking other users from accessing the objects you've touched upon until you've finished your transaction. i.e. other users needing to access (even just reading depending on the locks applied) need to wait until your transaction is finished. You should take great care to optimise the actions inside any explicit transaction. i.e. collect as much as possible data before beginning the transaction and only do the actual updates/inserts/deletes inside the transaction.

with (nolock)



Adding the with (nolock) clause can give considerable performance increase. It makes the queries results unreliable though. with (nolock) should therefore be used with care. I deliberatly do not give an example here as I personally feel with (nolock) should be avoided.

Comments

# SSQA.net - SqlServer-QA.net said on February 5, 2008 2:07 AM:

SQL Server Performance Tuning, sounds familiar on a DBA perspective and though it sounds like an easy

# Other SQL Server Blogs around the Web said on February 5, 2008 2:44 AM:

SQL Server Performance Tuning, sounds familiar on a DBA perspective and though it sounds like an easy

# Log Buffer #83: a Carnival of the Vanities for DBAs said on February 8, 2008 11:40 AM:

PingBack from http://www.pythian.com/blogs/817/log-buffer-83-a-carnival-of-the-vanities-for-dbas

# TrackBack said on February 9, 2008 5:14 AM:
# SQL Server Security, Performance & Tuning (SSQA.net) said on March 18, 2008 3:08 AM:

First part of question is a pure newbie FAQ to know about, and the simple answer would be it is one of

# SQL Server Knowledge Sharing Network (SqlServer-qa.net) said on May 15, 2008 2:36 AM:

One of the intersting post in forums and replies to them: I recently took a new position as the DBA for

Anonymous comments are disabled

About SQL Master

**__________________________________** SQL Server MVP, Sr. DBA & industry expert. - Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it. It is also a power and you will gain by sharing it.

Search

Go

This Blog

«February 2008»
SMTWTFS
272829303112
3456789
10111213141516
17181920212223
2425262728291
2345678

Syndication