Follow SQLMaster on Twitter

SQL Server Performance patterns of a UDF with datetime parameters

Published 11 May 09 12:10 AM | SQL Master 

There was an interesting discussion about User Defined Fuctions (UDF) performance and patterns, the actual phenomenon is that UDF has got parameters to pass on during the execution.

Just to talk about introduction of such options in the SQL Server history, stored procedures and views have been available in SQL much longer than UDFs, but each of these objects has their niche in SQL Server development. Stored procedures are great for processing complex SQL logic, securing and controlling access to data, and returning a rowset to a calling routine whether that routine is a VB or any other frontend program or another TSQL batch. Unlike views, stored procedures are compiled, making them ideal candidates to represent and process frequently run SQL statements. Views are great for controlling access to data, but they do it differently than stored procedures. Views are limited to only certain columns and rows from the underlying SELECT statement that generated the view. Thus a view is often used to represent a commonly used SELECT statement that may join several tables, employ a WHERE clause, and expose specific columns. Views are often found in the FROM clause of a SQL statement joined to other tables and views.

Scalar value-returning UDFs are most similar to what many programming languages refer to as functions. They return a single value consisting of a scalar datatype such as integer, varchar(n), char(n), money, datetime, bit, and so on. UDFs can also return user-defined datatypes (UDDTs) if they are based on a scalar datatype. With UDFs that return either inline or multistatement tables, a rowset can be returned via the table datatype. However, not all datatypes can be returned from UDFs. For example, a UDF cannot return a value of any of these datatypes: text, ntext, image, cursor, or timestamp.

The default behaviour of scalar UDFs will be like single statement execution, not like batch one where the difference between them is optimizer will have to pickup the best plan to execute. Due to this SQL Server must execute each function on every row, using any function incurs a cursor like performance penalty.  One way UDFs are simple to complete the task for that reason and can be coverted into single-expression, the actual execution of Scalar UDF will have the performance issue whatever way you may try to optimize the list of TSQL statements. So the actual execution time will vary due to fetching the large number of rows, in a sense 1000 or more. As it works like cursor mechanism, during the followup of each row the UDF is executed which will cause usual performance loss. To add more fuel to the problem if the WHERE clause on that UDF includes any filtering then complete execution will have to move slowly as it must invoke each UDF on each row where it is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing.

Coming to the subjective discussion the 2 logic scenarios to be followed as declare local variables, set values to the variables & issue a query referring to the variables in the query filter. So the execution of batch will have the first 2 steps and then actual part of execution needs to be optimized which is in vaccum as the variable values are unknown which ends up using a hard coded guess (which happens to be 30% selectivity for a range filter). On the other scenario (2) with specified constants in the filter, SQL Server expands the definitions of of the UDFs, substituting the parameter values with the specified constants, and the optimizer will optimize the expanded query with knowledge of the values in the filter. So the actual execution part of UDF will be far better on the second scenario of specified constants will be better as indexes are better used. To construct above I have referred following TSQL statements from Microsoft technet links (revisiting old material):

use northwind;
go

create function dbo.FuncWV(@d as datetime) returns table
as return
select orderid, orderdate, customerid, employeeid
from dbo.orders
where orderdate >= @d;
go

So here if the code has the declaration of variable to a value as follows::

declare @myd as datetime;
set @myd = '19980506';

select orderid, orderdate, customerid, employeeid from dbo.f1(@myd) as t;

So look at the execution plan of above query where you will see the predicate identified the optimize which is the ORDERDATE variable:
orderdate >= @myd

So the optimizer is required to fine tune the query where this value in the predicate is unknown at optimization time:

select orderid, orderdate, customerid, employeeid from dbo.orders where orderdate >= ?;

Scalar datatype-returning UDFs can be used in various situations to make the code more maintainable, reusable, and less complex. This can be very useful when the same segment of T-SQL code is used in several places, perhaps by several stored procedures and batch SQL statements.  Every now and often I see that UDF execution involves full table scan (clustered index too) causing performance loss as per the range filter causing the hard-code guess of rows selectivity. As the same discussed above on scenario 2 and to demonstrate the same the above code has been modified to use a constant:

select orderid, orderdate, customerid, employeeid from dbo.f1('19980506') as t;

So the above query's execution plan  will have to predicate identified as per the optimizer, which is ORDERDATE >=.
As per this optimizer already takes on the value in the predicate which helps to fetch selectivity estimate to nearest possible with better performance, such as:

select orderid, orderdate, customerid, employeeid from dbo.orders where orderdate >= '19980506';

When you look at the execution plan the lookup on the orderdate column, calculate the selectivity of the predicate with an option of using index on orderdate column. Also in some cases the optimizer will not use previously cached plan, due to the query statements against the function use similar selectivity ranges. Back in SQL 2000 days UDFs are seen as evil, it seems that SQL Server’s query optimisation fails to deal with UDFs properly. Basically it optimises and builds the execution plan without ever looking inside the UDFs to find out what they’re doing, so in all the optimisation isn’t worth the cycles it consumes.

In some cases when you have nesting within UDF then one catch with nesting functions is that built-in functions that are nondeterministic, such as the getdate function, cannot be nested inside of another UDF (otherwise, a SQL Server error is raised). A nondeterministic function is one which may not return the same result when called multiple times with exactly the same parameters. The getdate function falls into this category since every time it is called, it returns the new current date and time. Another commonly used nondeterministic built-in function is the NewID function. It is also nondeterministic as it always returns a unique GUID and, as such, the NewID function is not allowed to be nested within a UDF. SO as per the BOL discussion talking about table-valued UDFs: Within the category of table-valued UDFs there are two sub-types: inline table value-returning UDFs and multistatement table value-returning UDFs. UDFs that return inline tables return a rowset via the SQL Server table datatype. They are defined with a single SELECT statement making up the body of the function. Inline table value-returning UDFs cannot contain additional T-SQL logic outside of the SQL SELECT statement that defines the table it will return. However, they are simpler to create than UDFs that return multistatement tables since they do not have to define the exact table structure to be returned. UDFs that return inline tables extrapolate the structure of the rowset from the SELECT statement itself.

To sumup the discussion assume that there is a UDF that performs a SQL SELECT statement that takes one second to execute. If this UDF is used in a SELECT or a WHERE clause it will be executed for every row. Thus the time the main query takes to execute could increase drastically depending on such factors as the number of rows evaluated and returned and the types of indices in place. Before using a UDF in this type of situation, carefully weigh the options and do some performance testing.  There are good references on the blogs about the UDF performance, that I would like to refer:

An Introduction to SQL Server Scalar UDFs

Alexander Kuznetsov : Not all UDFs are bad for performance.

Are UDFs Harmful to SQL Server Performance?

Using schema binding to improve SQL Server UDF performance

 

Comments

# Other SQL Server Blogs around the Web said on May 11, 2009 1:13 AM:

There was an interesting discussion about User Defined Fuctions (UDF) performance and patterns, the actual

# SQL Server Security, Performance & Tuning (SSQA.net) : SQL Server Performance patterns of a UDF with datetime parameters said on May 11, 2009 1:51 AM:

PingBack from http://sqlserver-qa.net/blogs/perftune/archive/2009/05/11/5383.aspx

Anonymous comments are disabled

About SQL Master

**__________________________________** SQL Server MVP, Sr. DBA & industry expert. - Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it. It is also a power and you will gain by sharing it.

Search

Go

This Blog

«May 2009»
SMTWTFS
262728293012
3456789
10111213141516
17181920212223
24252627282930
31123456

Syndication