-
Back until SQL Server 2005 days there was a high demand for third party tools and scripts to collect the data based upon the requirement, analyze and produce a report.
SQL Server 2008 gets you Management Data Warehouse, a relational database that contains the data that is collected from a server using the new SQL Server 2008 data collection mechanism - BOL is your best friend to get to know more on introduction & How-To in specific to Management Data Warehouse.
As it referred it only works from SQL Server 2008 version onwards, where you can monitor & install the management data warehouse on the same instance of SQL Server that runs the data collector. However, if server resources or performance is an issue on the server being monitored, you can install the management data warehouse on a different computer. As the collection is stored in the database there is a different schema used for them which is called/pre-fixed as 'core', these objects are shared among all the data tables created for individual collector types. This schema is locked and can only be modified by the owner of the MDW database. See BOL explanation:
| Table name |
Description |
|
core.performance_counter_report_group_items |
Stores information about how the management data warehouse reports should group and aggregate performance counters. |
|
core.snapshots_internal |
Identifies each new snapshot. A new row is inserted into this table whenever an upload package starts uploading a new batch of data. |
|
core.snapshot_timetable_internal |
Stores information about the snapshot times. The snapshot time is stored in a separate table because many snapshots can happen at nearly the same time. |
|
core.source.info_internal |
This table stores information about the data source. This table is updated whenever a new collection set starts uploading data to the data warehouse. |
|
core.supported_collector_types_internal |
Contains the IDs of registered collector types that can upload data to the management data warehouse. This table is only updated when the schema of the warehouse is updated to support a new collector type. When the management data warehouse is created, this table is populated with the IDs of the collector types provided by the data collector. |
|
core.wait_categories |
Contains the categories used to group wait types according to wait_type characteristic. |
|
core.wait_types |
Contains the wait types recognized by the data collector. |
|
core.purge_info_internal |
Indicates that a request has been made to stop the removal of data from the management data warehouse. |
Here are few gotchas that you need to follow/monitor while you setup the MDW for your SQL Server 2008 instances, SQL Server AGent service plays important role in this collector process. So you need to ensure that SQLAgent is configured to run using one of the System service accounts (Local System, Network Service, or Local Service), and the management data warehouse is created on a different instance from the data collector, you must configure collection sets to use a proxy for uploading data to the management data warehouse. There should an alert in place to ensure SQLAgent is up and running all the times.
Also the suggestion is not to modify any of the metada that is collected by MDW, it is required only if you are adding a new collector type by having your own code. The 3 different types of data collectors are " Generic T-SQL Query Collector Type", "Generic SQL Trace Collector Type" & "Query Activity Collector Type". There are few stored procedures & functions available specific to data collector & MDW information :
|
sp_syscollector_create_collection_item |
sp_syscollector_set_cache_window |
|
sp_syscollector_create_collection_set |
sp_syscollector_set_warehouse_database_name |
|
sp_syscollector_create_collector_type |
sp_syscollector_set_warehouse_instance_name |
|
sp_syscollector_delete_collection_item |
sp_syscollector_start_collection_set |
|
sp_syscollector_delete_collection_set |
sp_syscollector_stop_collection_set |
|
sp_syscollector_delete_collector_type |
sp_syscollector_run_collection_set |
|
sp_syscollector_delete_execution_log_tree |
sp_syscollector_update_collection_item |
|
sp_syscollector_disable_collector |
sp_syscollector_update_collection_set |
|
sp_syscollector_enable_collector |
sp_syscollector_update_collector_type |
|
sp_syscollector_set_cache_directory |
sp_syscollector_upload_collection_set |
Before they are executed you must ensure that proper parameters are passed through, and last week I had an opportunity to work along with PSS engineer to monitor a performance problem on a SQL Server 2008 R2 machine (to be upgraded). During this exercise I have been given few scripts to see under the bonnet of MDW (later I found them on PSS blog too).
Among all the SNAPSHOTS.PERFORMANCE_COUNTER_VALUES is the key aspects which stores the data related to performance counters for those specific instances. You can use this query to see what are the formatted values:
select spcv.formatted_value as 'Formatted Value', spcv.collection_time as 'Collection Time'
from snapshots.performance_counter_values spcv order by spcv.collection_time desc
Also you can run following query to obtain path, data from different databases & name of counters:
select spci.path as 'Counter Path', spci.object_name as 'Object Name',
spci.counter_name as 'counter Name', spci.instance_name,
spcv.formatted_value as 'Formatted Value',
spcv.collection_time as 'Collection Time',
csii.instance_name as 'SQL Server Instance'
from snapshots.performance_counter_values spcv,
snapshots.performance_counter_instances spci,
msdb.dbo.syscollector_collection_sets_internal scsi,
core.source_info_internal csii,
core.snapshots_internal csi
where spcv.performance_counter_instance_id = spci.performance_counter_idand
scsi.collection_set_uid=csii.collection_set_uid and
csii.source_id = csi.source_id and csi.snapshot_id=spcv.snapshot_id and
scsi.name = 'Disk Performance and SQL CPU'
order by spcv.collection_time desc
At times we have seen problem of deadlocks on the instance where MDW is configured, in order to avoid that the job schedules for collection_set_3_upload & collection_set_2_upload has been changed. Also at times the upload job was failing with no information, so a periodic restart of SQLAgent service would help to avoid any such problems where you will not miss any operation to capture by MDW.
I will post further gotchas as and when they are experienced & collected as a part of my DBA journey.
-
PROFILER is very handy tool to trace the events on SQL Server, like wise SQL Server 2008 gets the new feature Extended Events which is a light-weight & less intrusive tool.
Still there are number of DBAs out there are really scared to invoke PROFILER on a production environment, the impact of running this tool for longer time is high and it is also very important tool for DBA to see what exactly is happening on server side. This is where a server side trace is very handyperform a server-side trace of the events. You can use the system SPs such as sp_trace_create, sp_trace_setevent, sp_trace_setfilter & sp_trace_setstatus like it is mentioned here.
The reason of impact is important for every DBA when the environment is flaky and application is highly dependable on SQL Server, to get to know the performance issues the server side trace is very handy and at times you can enable them to monitor the platform efficiently and for such actions there are few good third party tools available.
On the subject line, at one of the cleint's place I had this problem when the PROFILER trace has been attempted just for 15 minutes that too on a development enviornment. It was a confusing situation for a moment and for further information KBA929728 clarifies that:
When you run SQL Profiler against an instance of SQL Server 2000 or of SQL Server 2005, each active SPID of the instance requires an exclusive latch (LATCH_EX) to write the event to SQL Profiler. The SPID writes the event to the local memory buffer first. Then, the event is transferred to SQL Profiler. If the local memory buffer is full, the SPID discards the event. After the events in the local memory buffer are transferred to SQL Profiler, the SPID can write new events to the local memory buffer. If other SPIDs are waiting to write events to SQL Profiler, these SPIDs wait in a queue.
In addition to this activity Profiler buffers the trace data in temporary files on the server and by default these temporary files are stored on C: drive causing the error: system is low on disk space on drive 'C'. All SQL Server profiler functions are temporarily disabled. To ensure further I have checked that all these temporary files to buffer from profiler trace were stored on the directory C:\documents and settings\<username>local settings\temp\1 with the name Pref*.tmp and all these files are locked by Profiler so you cannot delete them, they will disappear as soon as you exit from the SQL profiler.
Also I have seen many complaints further stating when you run PROFILER the SQL instance will be very slow, by design the SQL Profiler is a process that is separate from the SQL Server process, transferring each event to SQL Profiler takes some time. As a good practice DBAs would always run the Profiler on a different computer that causing a slow action, because of network latency and because of the performance of the computer that is running SQL Profiler.
In such situations I strongly recommend PSSDiag data collection utility which is a very useful & low intrusive tool to collect trace data files, see the link on how to use the tool. Before closing this topic here is a good tip I would like to share (you may have known by this time), say if you are using SQL Server 2008 client tools and Profiler then you can easily export the trace definition (see below). Once you have set the trace definition Click on File --> Export --> Script Trace Definition, here is where you can choose to export to 2005 or 2008 version and seperate link to 2000 version too.

Here is the best resource about SQL Profiler tool a ebook "Mastering SQL Server Profiler" from Brad Mcgeehee, SQL Server MVP, even I do go through this book whenever I wanted to know on few tips & tricks.
Also forgot to mention that you can take help of Process Monitor which is a ex-SysInternal tool (very handy):
Process Monitor is an advanced monitoring tool for Windows that shows real-time file system, Registry and process/thread activity. It combines the features of two legacy Sysinternals utilities, Filemon and Regmon, and adds an extensive list of enhancements including rich and non-destructive filtering, comprehensive event properties such session IDs and user names, reliable process information, full thread stacks with integrated symbol support for each operation, simultaneous logging to a file, and much more. Its uniquely powerful features will make Process Monitor a core utility in your system troubleshooting and malware hunting toolkit.
-
Here is an interesting 'sql security' discussion I had during the SASSUG usergroup interaction with regard to BUILTIN\Administrator role from SQL Server 2008 onwards.
Since SQL Server 2005 the instance has been classified as 'secured by design/default' by cutting down the elevated access to the SQL Server. From the operating system side login privilege, by default SQL adds the group BUILTIN\Administrators to the sysadmin fixed server role, meaning that any local NT administrator is automatically a SQL Server's sysadmin. That gives a right to access server's local administration group logins on the SQL Server, which contradicts the security policy of least-privileged on an enterprise network.
There is always a certain level of advantage of having BUILTIN\Administrator access to SQL in case of any lockout of SA (privileged) logins to fine tune the problem on SQL Server, but not a real-time lock down in terms of security best practices. Say in the case of Windows Authentication usage and there is a problem when using a SA privileged login then you must have to switch the authentication to use MIXED mode by using SA password that you had setup during the installation. Further fuel to the problem if the SA password has been lost/forgotten then it is a absolute nightmare to continue the access of SQL Server, a complete lock down of databases! As a best practice you must use role assignments and workflow processes to ensure that only trustworthy logins are allowed on SQL Server.
Here is what a comfort level of news for DBAs that SQL Server 2008 no longer adds the BUILTIN\Administrators as members of the sysadmin fixed server role! In terms of lock down policies of SQL Server environment this enhancement reduces the surface/attack area for instances & databases by implementing the policy of 'least privileged access' with a clear indication of OS administrators & SQL adminsitrators. Further to topic the local Windows Group BUILTIN\Administrator is no longer provisioned as a login in the SQL Server sysadmin fixed server role by default at SQL Server setup install, that causes server administrators cannot login to the new SQL Server 2008 and SQL Server 2008 R2 instance by default.
Its worth referring to provision of sysadmin fixed server role during the setup of SQL instance, remember service SIDs (security IDs) from Operating System that are available from Windows vista onwards and enhanced from Windows Server 2008 onwards to enable a service isolation provision, which is a way to access/grant specific objects without having to either run in a high-privilege account or weaken the security protection of the object. To accomplish this feature you can design your SQL Server service to use such identity to restrict access to its resources by other services or applications. Referring from BOL notes that such feature of "service isolates an object for its exclusive use by securing the resource by using an access control entry that contains a service SID. This per-service SID is derived from the service name and is unique to that service". Translating the same to actual process to deep dive of registry access on server that a registry key in HKEY_LOCAL_MACHINE\SOFTWARE would usually be available only to services that have administrative credentials. By adding the per-service SID to the access control list of the key, the service can run in a lower-privilege account, but still have access to the key.
You may remember the SQL Server setup from 2008 version onwards that creates a service group for each component of SQL Server. On Windows Server 2003 and Windows XP, service accounts for SQL Server services are added to the local service groups and function on these operating systems as they have in previous SQL Server releases; SQL Server services connect to an instance of SQL Server as members of the local group. On Windows Server 2008 and Windows Vista, the service SID is added to the local security group instead of the SQL Server service account. The new installation methods lets the administrators to control the windows principals that are required to be provisioned in the SQL Server sysadmin fixed server role, so the default duty for admins to provision the logins 'due consideration' to the sysadmin server role membership as this is a high-privileged server role.
Coming back to subject line that it is true that BUILTIN\Administrators are removed from elevated access to SQL Server, but not in the case of emergency situation. Such as if you need to use Dedicated Administrator console (DAC) method or starting the service in single-user mode that lets the local administrator group logins able to login to SQL instance. So that you can decide (or as per enterprise security policy) on standards of letting the specific logins/group to be a sysadmin privilege during the SQL setup, use it wisely at the button which basically says "make me a sysadmin" which will add the account of the user doing the install as a sysadmin. To close the topic in case if you are wondering how to control the situation in SQL 2005 mode then refer to this How to make unwanted access to SQL Server 2005 by an operating system administrator more difficult article.
-
Do you watch/review the informationa messages that are generated in SQL Server error log?
Do you have a alert mechanism to capture informational/warning/error messages within your data platform?
The first question itself catered as informational, I believe at ceratin messages are supressed that are generated from application. But you should not attempt to supress such messages generated from SQL Server. For instance the one on subject line "SQL Server has encountered '%' occurrence(s) of cachestore flush for the '%' cachestore" which is an important aspect for DBA to ensure that server's cache is not flushed repeatedly.
Going back to basics, SQL Server has a pool of memory that is used to store both execution plans and data buffers. The percentage of the pool allocated to either execution plans or data buffers fluctuates dynamically, depending on the state of the system. The part of the memory pool that is used to store execution plans is referred to as the procedure cache. Such an important aspect of performance tuning you should not ignore the message, as and when any SQL statement is executed in SQL Server 2005, the relational engine first looks through the procedure cache to verify that an existing execution plan for the same SQL statement exists. SQL Server memory settings reuses any existing plan it finds, saving the overhead of recompiling the SQL statement. If no existing execution plan exists, SQL Server generates a new execution plan for the query.
As the reasons are clear that if you have executed the DBCC FREEPROCCACHE or DBCC FREESYSTEMCACHE commands to flush procedure cache then the messages will be logged to SQL Server error log. If you haven't passed on any parameters by running : DBCC FREEPROCCACHE WITH NO_INFOMSGS; it will clear all the plans from the cache. Rather than that you can also clear a specific plan from cache using the following example (source: BOL):
USE AdventureWorks;
GO
SELECT * FROM Person.Address;
GO
SELECT plan_handle, st.text
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
WHERE text LIKE N'SELECT * FROM Person.Address%';
GO
-- Remove the specific plan from the cache.
DBCC FREEPROCCACHE (value of plan_handel from above query);
GO
Not only this there are certain activities of database may also cause the cache clear action, such as having AUTO_CLOSE ON for the database or a database snapshot is dropped. Also the other configuration settings such as database state to OFFLINE or ONLINE and restore the backup for that database willhave the action of cache clear. In addition to this the cache is cleared if one of the following server options is changed by the RECONFIGURE statement:
-
- cross db ownership chaining
- index create memory (KB)
- remote query timeout (s)
- user options
- max text repl size (B)
- cost threshold for parallelism
- max degree of parallelism
- min memory per query (KB)
- query wait (s)
- min server memory (MB)
- max server memory (MB)
- query governor cost limit
- Procedure cache will not be cleared if the actual value does not change or if the new value for the max server memory server option is set to 0.
Also you can take help of PERFMON to capture this type of actions by uisng the Proces & SQLServer: Plan_cache performance objects See below:
- Performance object: Process
Counter: %Processor Time
Instance: sqlservr
The value of this counter will increase because of increased CPU activity. Essentially, the whole procedure cache is cleared if this issue occurs. Therefore, subsequent requests must generate new plans to be cached. This behavior will slightly increase CPU activity.
- Performance object: SQLServer:Plan Cache
Counter: Cache Object Counts
Instance: _Total
Performance object: SQLServer:Plan Cache
Counter: Cache Pages
Instance: _Total
The values of these counters will suddenly decrease.
Note For a named instance of SQL Server 2005, the performance object is named MSSQL$InstanceName:Plan Cache.
- Performance object: SQLServer:SQL Statistics
Counter: SQLCompilations/sec
The value of this counter will significantly increase after this incident.
Note For a named instance of SQL Server 2005, the performance object is named MSSQL$InstanceName: SQL Statistics.
-
No matter how best you adopt the indexing practices, few times you have to tweak the resources to obtain optimized performance. This is where a degree of parallelism needs to be applied for your complex/large resulted queries on your SQL Server, having a multiple CPU based server is a quite common criteria.
By design SQL database engine with more than CPU detects the best degree of parallelism, that is number of processors employed to run a single statement, for each parallel plan execution. Unless you are well aware of your code about resulting the data based on conditions you should use/control the max degree of parallelism (MAXDOP) option to limit the number of processors to use in parallel plan execution. By default the value is 0 which means all the CPUs will be used for execution process, and if a value greater than the number of available processors is specified, the actual number of available processors is used. A question has been raised within the usergroup what if the server has only 1 CPU, the answer is if the computer has only one processor, the max degree of parallelism value is ignored.
ON the other side it is always debatable that number of processors that are used by the query optimizer typically provides optimal performance. But in the case of heavy operations such as CREATE/REBUILD/DROP processes of indexes (large) will have a negative impact as resource intensive and can cause insufficient resources for other applications and database operations for the duration of the index operation. This is where DBAs get pounded with complaints against SQL instance is down, so you have to carefully cater the MAXDOP tomanually configure the number of processors that are used to run the index statement by specifying the MAXDOP index option and limiting the number of processors to use for the index operation.
Here is a old reference about a TSQL to obtain a list of current execution of Parallel Plans (MAXDOP) on your environment, but a slight change in SQL Server 2008 version onwards, whether are not SQL instance has been setup on SMP/NUMA/HyperThreading processors you will need to configure MAXDOP appropriately. As SQL Server 2008 brings Resource Governor to control the partial aspect of the need you should not use MAXDOP hints or 'affinity mask' unless it has been tested thorougly. If you got Enterprise Edition and performing Parallel Index operation then make sure you re-visit the good features of Resource Governor in this case, here is a reference from BOL on parallel index execution and the MAXDOP index option apply to the following Transact-SQL statements:
- CREATE INDEX
- ALTER INDEX REBUILD
- DROP INDEX (This applies to clustered indexes only.)
- ALTER TABLE ADD (index) CONSTRAINT
- ALTER TABLE DROP (clustered index) CONSTRAINT
While I was at Tech-Ed NA conference I happened to be hearing a discussion from a user & SQL Dev.team (DB engine) about guidelines on how & when this MAXDOP should be configured. The response is say if the server consist 8 (or less) CPU then you can configure the MAXDOP = 0 to 8 (the ideal number is 4) ensuring other resource intensive operaitons do not clash with parallel index operation or if it is more than 8 CPUs then value of max degree of parallelism = 8. Here is the continuation to that interesting discussion, if the server is NUMA configured then should not exceed MAXDOP value that are assigned to each NUMA node with value capped to 8, for more information on NUMA see How It Works: Soft NUMA, I/O Completion Thread, Lazy Writer , Configuring SQL Server 2005 for Soft NUMA - Slava Oks's WebLog & How to: Configure SQL Server to Use Soft-NUMA related threads. The general recommendation on HyperThreading is to disable as a best practice, just in case if you use then again the value should not exceed the number of physical processors. Also the tip on what kind of DMVs you can use for NUMA based information, use the Dynamic Management views sys.dm_os_sys_info and sys.dm_os_nodes *see BOL.
The value 8 seems to revolve around the default value on the default 2 QUAD core based processors having typical workload (OLTP & DB maintenance), so to continue (clarify my doubts) on further recommendations I have asked any specific reference on guidelines about when to use/configure MAXDOP and in case if the CPUs are more than 64 (it is possible). Here are those links for SQL Server 2005 and SQL Server 2008 recommendations see KBA General guidelines to use to configure the MAXDOP option in specific to SQL Server 2008 R2 version recommendations and guidelines see Best Practices for Running SQL Server on Computers That Have More Than 64 CPUs for obvious reasons of number of processors support in R2 version. Also see the further recommendation for servers running on HyperThreading enabled processors another KBA SQL Server support in a hyper-threaded environment.
-
In the recent times I had been through a peculiar performance issue where the application has been designed with Jave & Websphere application has been used having SQL Server as backend to store data.
The problem is a mixture of performance and deadlock issues within SQL Server causing more chaos, going back to basics the deadlock is a condition that occurs on the system with multiple threads and database systems are no stranger. During this time the system resources are used heavily and sometimes memory is blocked causing a downtime on server side too. So if the resource being acquired is currently owned by another thread, the first thread may have to wait until the target resource releases. So by relating such conditions on the SQL Server environment the deadlocks are identified from SQL Server error log information:
Your transaction (process ID #52) was deadlocked on {lock | communication buffer | thread} resources with another process and has been chosen as the deadlock victim. Rerun your transaction.
Back to the problem resolution that whenever you see such deadlock information it is ideal to check the isolation settings of SQL Server such as Read_Committed_Snapshot and Allow_Snapshot_Isolation values. These 2 isolation levels do offer some level of restrictive locks whenever updates are process, so the exceptions are SNAPSHOT and READ_Committed is on then the levels do no acquire shared locks on the rows durng the READ operatons. Further on I had a thought on check the IMPLICIT_TRANSACTION settings which sets the implicit transaction mode. The problem with this setting is transactions that are automatically opened as the result of this setting being ON must be explicitly committed or rolled back by the user at the end of the transaction. Otherwise, the transaction and all of the data changes it contains are rolled back when the user disconnects. After a transaction is committed, executing one of the statements above starts a new transaction, this causes the longer time for locking and creates a chain of actions within the process. So having the involvement of JDBC & third party driver to connect to SQL Server, it is ideal to check what kind of settings are obtained by default on those drivers.
As explained in the BOL and other documentation, in order to find out the deadlock as a graph you can take help of Profiler that provides deadlock wait-for graph contains process nodes, resource nodes, and edges representing the relationships between the processes and the resources. By default the SQL DB Engine typically performs periodic deadlock detection only, having the number of deadlocks encountered in the system is usually small, periodic deadlock detection helps to reduce the overhead of deadlock detection in the system. The default plan of action will be after a deadlock is detected, the Database Engine ends a deadlock by choosing one of the threads as a deadlock victim. The Database Engine terminates the current batch being executed for the thread, rolls back the transaction of the deadlock victim, and returns a 1205 error to the application. Rolling back the transaction for the deadlock victim releases all locks held by the transaction. This allows the transactions of the other threads to become unblocked and continue. The 1205 deadlock victim error records information about the threads and resources involved in a deadlock in the error log.
Now comes actual troubleshooting part of the deadlocks, once you get the identification of SPID that is causing issue the first option is to check how good the indexes are being used when the queries are executed and underline fillfactor for those indexes. Also for the SELECT based queries it is always best to use NOLOCK hint and for smaller transactions that are performing DML activities you can take help of REPEATABLE READ hint, so if you opt to use ROWLOCK or UPDLOCK that acquire row-level locks may place locks on index keys rather than the actual data rows. For example from BOL, if a table has a nonclustered index, and a SELECT statement using a lock hint is handled by a covering index, a lock is acquired on the index key in the covering index rather than on the data row in the base table.
So taking back the actual configuration on the environment I was more inclined to check what kind of isolation level used by JDBC in particular as the application has been developed using Java. So the initial thought was to look at the version of JDBC driver used which is version 1.2 that has some limitations. In order to eliminate the deadlocks & performance problem the initial indexing & fillfactor based checkup has improved the performance but not the deadlock situation. So to avoid this permanently the application has been tested by a download of Microsoft SQL Server JDBC Driver 2.0 and this driver supports features introduced in the JDBC 4.0 API, including:
· national character set data types: NCHAR, NVARCHAR, LONGNVARCHAR, NCLOB
· SQLXML data type
· Wrapper interface to access SQL Server JDBC Driver specific methods
· client info properties
· new database metadata methods
· LOB creator methods
The 2.0 driver also adds:
· default adaptive response buffering behavior
· support for SQL Server 2008 collations
· enhanced tracing, including public method entry and exit traces
· performance improvements and bug fixes
Here are few reasons that I would like to refer about JDBC 2 driver issue, the connectivity is fine with down-level client and when such conversions are allowed/performed teh application can pass thru the queries such as update to data on SQL Server. There is a high chance that you might be using a older data type and new data types within SQL Server 2008 such as time,date,datetime2 will need to have latest driver. So here is another reference for you to working with SQL Server 2008 Data types and Filestream usage and also the related information about down-level compatibility for the new data types
see Using Date and Time Data (http://go.microsoft.com/fwlink/?LinkId=145211) and FILESTREAM Support (http://go.microsoft.com/fwlink/?LinkId=145212) links.
Finally when you perform such driver changes it is also recommended to perform an end to end testing and if you stlll using legacy based drivers here is the information from IBM about connecting pooling issues.
-
Going back to basics 'parameter sniffing' is a process where SQL Optimizer helps (sniffs) the execution process any current parameter values during compilation or reccompilation process. It will attain a pass-along values to the query optimizer so that the values can be used to generate a potential faster query execution plan/policy. That helps in some cases but at not all of the execution fashion, such as huge volume of rows are returned.
It is nothing but SQL optimizer replaces the constants 'literal' that are appearing in a statement within the parameters (as @p1 and @p2), once compiled plan is finished then it is placed on cache in plan cache in parameterized form which is used whenever such subsequent statements differs on the values of those constants can be reused.
By default SQL Server 2000 and 2005 versions DB engine sniffs similar fashion during compilation or recompilation for the following types of batches:
Additional to the above types 2005 engine behaves for queries that are submitted using query hint OPTION(RECOMPILE), it can be SELECT, INSERT, UPDATE, or DELETE operations for both the parameter values and the current values of local variables are sniffed. The values sniffed (of parameters and local variables) are those that exist at the place in the batch just before the statement with the OPTION(RECOMPILE) hint.
As per the BOL to identify and obtain the RECOMPILATION causes you can take help of Profiler by starting a new profiler trace and select the following events under Stored Procedures event class. (To reduce the amount of data generated, it is recommended that you de-select any other events.)
-
SP:Starting
-
SP:StmtStarting
-
SP:Recompile
-
SP:Completed
In addition, to detect statistics-update-related recompilations, the "Auto Stats" event under "Objects" class can be selected.
Further to SQL 2005 onwards DMVs can help to obtain all the possible recompilation reasons reported for SP:Recompile event using below code:
select v.subclass_name, v.subclass_value
from sys.trace_events e inner join sys.trace_subclass_values v
on e.trace_event_id = v.trace_event_id
where e.name = 'SP:Recompile'
Again BOL helps to notify the various tools and commands that exist in obtaining/debugging recompilation based schemas:
sys.syscacheobjects virtual table that exists only in master database, review in BOL more about this table.
DBCC FREEPROCCACHE - refer to the list of blog posts here.
DBCC FLUSHPROCINDB(db_id) and refer to Recompile all the stored procedures & How to find that a query could benefit from an index? posts.
On the other side during a problem I was looking towards disable/enable such sniffing process for a huge-result based query and every time it
is executed the estimated execution plan shows sniffing problem, while on that its a coincedence that I found one of the SQL Server 2008
SP1 cumulative update 7 : introduces trace flag 4136, which can be used to control the "parameter sniffing" process.
When you enable trace flag 4136, the "parameter sniffing" process is disabled. See that KBA for more information.
Here is the closeout tip from TECHNET article, the following list enumerates the statement types for which SQL Server 2005 does not auto-parameterize.
Queries with IN clauses are not auto-parameterized. For example:
WHERE ProductID IN (707, 799, 905)
BULK INSERT statement.
UPDATE statement with a SET clause that contains variables. For example:
UPDATE Sales.Customer
SET CustomerType = N'S'
WHERE CustomerType = @a
A SELECT statement with UNION.
A SELECT statement with INTO clause.
A SELECT or UPDATE statement with FOR BROWSE clause.
A statement with query hints specified using the OPTION clause.
A SELECT statement whose SELECT list contains a DISTINCT.
A statement with the TOP clause.
A WAITFOR statement.
A DELETE or UPDATE with FROM clause.
When FROM clause has one of the following:
When a SELECT query contains a sub-query
When a SELECT statement has GROUP BY, HAVING, or COMPUTE BY
Expressions joined by OR in a WHERE clause.
Comparison predicates of the form expr <> non-null-constant.
Full-text predicates.
When the target table in an INSERT, UPDATE, or DELETE is a table-valued function.
Statements submitted via EXEC string.
Statements submitted via sp_executesql, sp_prepare, and sp_prepexec without parameters are auto-parameterized under TF 447.
When query notification is requested.
When a query contains a common table expression list.
When a query contains FOR UPDATE clause.
When an UPDATE contains an ORDER BY clause.
When a query contains the GROUPING clause.
INSERT statement of the form: INSERT INTO T DEFAULT VALUES.
INSERT ... EXEC statement.
When a query contains comparison between two constants. For example:
WHERE 20 > 5
If by doing auto-parameterization, more than 1000 parameters can be created.
-
SQL Server query optimization and keepup of performance is not a black-art, it can be achieved in a fair manner if you treat your queries with plans with a are reused or recompiled plans that can help you design well-performing applications.
There are ways to add a plan or data to the SQL Server cache or to remove a plan or data from SQL Server cache, but these techniques are only recommended for testing or troubleshooting purposes. The key point is more you understand about optimal query plans, and how different actual values and cardinalities require different plans, the more you can determine when recompilation is a useful thing.
Here is the script I use for SQL Server 7 & 2000 version installations (source: mssqltips)
| Script # 1: Get SQL Server plan cache contents and their usage for SQL Server 2000 and onwards |
USE Master GO
SELECT
UseCounts, RefCounts,CacheObjtype, ObjType, DB_NAME(dbid) as DatabaseName, SQL
FROM syscacheobjects
ORDER BY dbid,usecounts DESC,objtype
GO |
For SQL Server 2005 and forward, DMVs have been introduced to get this information. So to get the contents of the plan cache along with usage statistics, you may use the following DMV script.
| Script # 2: Get SQL Server plan cache contents and their usage for SQL Server 2005 and onwards |
USE master GO
SELECT UseCounts,RefCounts, Cacheobjtype, Objtype,
ISNULL(DB_NAME(dbid),'ResourceDB') AS DatabaseName, TEXT AS SQL
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
ORDER BY dbid,usecounts DESC;
GO |
Coming back to the recompiles of stored procedures and on top of that when the SQL Server is hammered with unnecessary recompiles, or when SQL Server is not recompiling when you think it should, your troubleshooting efforts will be easier the more you know about how plans are managed internally. Technet & MSDN has tons of articles and tips to go through this topic and here are some I have added from my experience too.
As you are aware SQL Server dynamically manages the memory, unless the server configuration settings are dictated with MIN & MAX values. So most of memory used by SQL Server is allocated to the Buffer Pool, which is used to store data pages. SQL Server steals a proportion of this memory for use in caching query plans. The overall amount of memory available to SQL Server depends upon the amount of memory installed on the server, the architecture of the server, the version and edition of SQL Server and the amount of memory pressure being experienced by SQL Server. This pressure can be internal (SQL Server resources need memory) or external (operating system needs memory). As per the documentation SQL Server’s plan cache is made up of four separate memory areas, called cache stores. There are actually other stores in SQL Server’s memory, which can be seen in the Dynamic Management View (DMV) called sys.dm_os_memory_cache_counters, but there are only four that contain query plans.
Further digging into DMV sys.dm_os_memory_cache_counters from below you can observer the values that can be seen in the type column of :
- Object Plans (CACHESTORE_OBJCP) for stored procedures, functions, and triggers
- SQL Plans (CACHESTORE_SQLCP) for adhoc cached plans, autoparameterized plans, and prepared plans.
- Bound Trees (CACHESTORE_PHDR) : Bound Trees are the structures produced by SQL Server’s algebrizer for views, constraints, and defaults.
- Extended Stored Procedures (CACHESTORE_XPROC): Extended Procs (Xprocs) are predefined system procedures, like sp_executeSql and sp_tracecreate, that are defined using a DLL, not using Transact-SQL statements. The cached structure contains only the function name and the DLL name in which the procedure is implemented.
From BOL here is the script that helps to find out the number of buckets for each of the plan cache stores using the following query:
SELECT type as 'plan cache store', buckets_count
FROM sys.dm_os_memory_cache_hash_tables
WHERE type IN ('CACHESTORE_OBJCP', 'CACHESTORE_SQLCP',
'CACHESTORE_PHDR', 'CACHESTORE_XPROC');
So once you obtain the results you need to find the plan in cache, by combining the values from another DMV the plan_handle needs to obtained; To obtain that plan_handle values use the view sys.dm_exec_cached_ plans. It can get the sql_handle value that corresponds to a particular plan_handle from the sys.dm_exec_plan_attributes function that we looked at earlier. Here is the same query we discussed earlier to return attribute information and pivot it so that three of the attributes are returned in the same row as the plan_handle value.
SELECT plan_handle, pvt.set_options, pvt.object_id, pvt.sql_handle
FROM (SELECT plan_handle, epa.attribute, epa.value
FROM sys.dm_exec_cached_plans
OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
WHERE cacheobjtype = 'Compiled Plan'
) AS ecpa
PIVOT (MAX(ecpa.value) FOR ecpa.attribute
IN (“set_options”, “object_id”, “sql_handle”)) AS pvt;
Whenever any complaints about slow query performance it is best to start from sys.dm_exec_cached_plans view which is the one I use most often for troubleshooting query plan recompilation issues, this DMV has other useful information about the cached plans, including:
- size_in_bytes: number of bytes consumed by this cache object.
- cacheobjtype: type of the cache object, that is, if it’s a compiled plan, or a Parse Tree or an Extended Proc.
- memory_object_address: memory address of the cache object, which can be used to get the memory breakdown of the cache object.
BOL and Kalen Delaney's blog gives insight of obtain further information from SQLText that is associated with each compiled plan, we’ve seen that we can find it by passing the plan_handle to the sys.dm_exec_sql_text function. We can use the query below to retrieve the text, usecounts, and size_in_bytes of the compiled plan and cacheobjtype for all the plans in cache. The results will be returned in order of frequency, with the batch having the most use showing up first:
SELECT st.text, cp.plan_handle, cp.usecounts, cp.size_in_bytes,
cp.cacheobjtype, cp.objtype
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
ORDER BY cp.usecounts DESC
Finally to obtain the information from memory cache use the DMV sys.dm_os_memory_cache_entries to show the current and original cost of any cache entry, as well as the components that make up that cost.
SELECT text, objtype, refcounts, usecounts, size_in_bytes,
disk_ios_count, context_switches_count,
pages_allocated_count, original_cost, current_cost
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
JOIN sys.dm_os_memory_cache_entries e
ON p.memory_object_address = e.memory_object_address
WHERE cacheobjtype = 'Compiled Plan'
AND type in ('CACHESTORE_SQLCP', 'CACHESTORE_OBJCP')
ORDER BY objtype desc, usecounts DESC;
-
Every now and then I see this big question from SQL Server users that when to use a stored procedure?
Well there is a second part to this question as well, think about caching of such stored procedures There is no such database management software that can guarentee you flawless query execution over a period of time. Performance problems caused by misuse or mismanagement of plan cache, or inappropriate recompilation, can manifest themselves as simply a decrease in throughput or an increase in query response time. Problems with caching can also show up as out-of-memory errors or connection time-out errors, which can be caused by all sorts of different conditions, this is a common error you will see when a paticilar query is executed many times. A wealth of information available in SQL Server books online and other blog websites etc. few I would like to present here from my experience.
Coming back to original question on when to use, there are few guidelines that I would like to propose when you are deciding whether to use stored procedures or one of the other mechanisms:
- Stored procedures These objects should be used when multiple connections are executing batches in which the parameters are known. They are also useful when you need to have control over when a block of code is to be recompiled.
Since SQL Server 2005 version, many administrative and informational activities can be performed by using system stored procedures. The system stored procedures are grouped into the categories shown in the following table (from BOL):
| Category |
Description |
|
Active Directory Stored Procedures |
Used to register instances of SQL Server and SQL Server databases in Microsoft Windows 2000 Active Directory. |
|
Catalog Stored Procedures |
Used to implement ODBC data dictionary functions and isolate ODBC applications from changes to underlying system tables. |
|
Cursor Stored Procedures |
Used to implements cursor variable functionality. |
|
Database Engine Stored Procedures |
Used for general maintenance of the SQL Server Database Engine. |
|
Database Mail and SQL Mail Stored Procedures |
Used to perform e-mail operations from within an instance of SQL Server. |
|
Database Maintenance Plan Stored Procedures |
Used to set up core maintenance tasks that are required to manage database performance. |
|
Distributed Queries Stored Procedures |
Used to implement and manage Distributed Queries. |
|
Full-Text Search Stored Procedures |
Used to implement and query full-text indexes. |
|
Log Shipping Stored Procedures |
Used to configure, modify, and monitor log shipping configurations. |
|
Automation Stored Procedures |
Enable standard Automation objects to be used within a standard Transact-SQL batch. |
|
Notification Services Stored Procedures |
Used to manage SQL Server 2005 Notification Services. |
|
Replication Stored Procedures |
Used to manage replication. |
|
Security Stored Procedures |
Used to manage security. |
|
SQL Server Profiler Stored Procedures |
Used by SQL Server Profiler to monitor performance and activity. |
|
SQL Server Agent Stored Procedures |
Used by SQL Server Agent to manage scheduled and event-driven activities. |
|
Web Task Stored Procedures |
Used for creating Web pages. |
|
XML Stored Procedures |
Used for XML text management. |
|
General Extended Stored Procedures |
Provide an interface from an instance of SQL Server to external programs for various maintenance activities. |
- Adhoc caching This option is beneficial only in limited scenarios. It is not dependable enough for you to design an application expecting this behavior to correctly control reuse of appropriate plans.
Consider the same procedure being called dozens or hundreds of times. Remember that SQL Server DB engine optimizer will cache the adhoc shell query that includes the actual parameter for each individual call to the procedure, even though there may be only one cached plan for the procedure itself. As SQL Server starts experiencing memory pressure, the work to insert the entry for each individual call to the procedure can begin to cause excessive waits resulting in a drop in throughput or even out-of-memory errors.
Also other kind of caching issues such as, with prepared queries, you actually specify the parameter datatype, so it’s easier to make sure you are always using the same type. When SQL Server parameterizes, it makes its own decisions as to datatype. If you look at the parameterized form of your queries of type Prepared, you’ll see the datatype that SQL Server assumed.
- Autoparameterization This option can be useful for applications that cannot be easily modified. However, it is preferable when you initially design your applications that you use methods that explicitly allow you to declare what your parameters are and what are their datatypes.
If the recompile is caused by a change in a SET option, the SQL Trace text data for TSQL statements immediately preceding the recompile event can indicate which SET option changed. It’s best to change SET options when a connection is first made, and avoid changing them after you have started submitting statements on that connection, or inside a store procedure.
In this scenario I have seen that Optimizer hints can also be used to force SQL Server to come up with a new plan in those cases in which it might be using an existing plan. Although there are dozens of hints that you can use in your Transact-SQL code to affect the plan that SQL Server comes up with, we will cover about PLAN GUIDES on seperate post or look within Kalen Delaney's blog.
- Execution of those Stored Procedures then look at the two suggestions below:
- The sp_executesql procedure This procedure can be useful when the same batch might be used multiple times and when the parameters are known.
- The prepare and execute method These methods are useful when multiple users are executing batches in which the parameters are known, or when a single user will definitely use the same batch multiple times.
Mixing both parameterization & execution, if you try to force using sp_executesql or Prepare/Execute, all the statements in the batch must be parameterized for the plan to be reusable. If a batch has some parameterized statements and some using constants, each execution of the batch with different constants will be considered distinct, and there will be no value to the parameterization in only part of the batch. Finally as more queries are run, the amount of memory used for data page caching should increase along with the amount of memory used for plan cache.
A hint of how you can monitor such activity there are two main tools for detecting excessive compiles and recompiles. Keep in mind that compiling and recompiling are not the same thing. Recompiling is done when an existing module or statement is determined to be no longer valid or no longer optimal. All recompiles are considered compiles, but not vice versa. For example, when there is no plan in cache, or when executing a procedure using the WITH RECOMPILE option, or executing a procedure that was created WITH RECOMPILE, SQL Server considers this a compile but not a recompile. You can use either System Monitor (PERFMON) or SQL Trace (PROFILER) to detect compilations and recompilations.
More to come....
-
The subject line of this post is a common factor that every DBA might have (once in lifetime).
Performance Monitoring is a big subject when it comes to practice, many factors include such as what to monitor, how often, and 'basic' rules of thumb. For the sake of performance you cannot simply compromise any SLA or Availability of application, there needs to be an effective approach to monitor and manage Database Maintenance issues in specific to DBCC, backup and spotlight on tempdb and Transaction log backup performance. Let us spot light on few common issues that you will face in the DB maintenance scenario.
The key areas are:
- Memory Available to SQL Server
- CPU Utilization
- Disk I/O
- Response Time for Commonly Run Queries
- SQL Error Logs
There are big links (inter-operability) between these issues, if you see one issue then another component is also affected. It is a fact that if a system suffers the percentage of free memory available to be used by any applicaiton (SQL Server, in this case) is consistently low (20% or less) it will significantly degrade the performance of SQL Server. By default leaving the memory to DYNAMIC is preferrable unless otherwise if a thorough (end-to-end) data testing has been done on your SQL Server platform, in general this will be suggested by Microsoft PSS/CAT team if you have raised the issue to support. If adjusting how memory is allocated to SQL Server does not free up enough memory for the SQL Server instance to function properly, the physical memory for the server may need to be increased, or the concurrent application load on the server may need to be decreased.
In general kind of memory issues are sufferred when the server executing any resource intensive process, such as DBCC CHECKDB or DBREINDEX on an active database. It is a good practice to monitor the SQL Server error log even though when you don't see any problem or alerts on the instances, in this case you may see the checkdb that reported a error in tempdb as below:
Table error: Object ID 1109769969, index ID 0, partition ID 72057595983888384, alloc unit ID 72057596039200768 (type LOB data). The off-row data node at page (6:459), slot 0, text ID 296853831680 is not referenced.
DBCC CHECKDB (tempdb) executed by MSSQLService found 32 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 3 seconds.
As an experienced DBA, you could translate the description of above error with BOL error 8974 that streams possibility of hardware failure. So periodically you must run hardware diagnostics to correct any problems on disks. Not only that you must also examine the Windows system and application logs and the SQL Server error log to see whether the error occurred as the result of hardware failure. Alerting mechanism should be in place to If you have persistent data corruption problems, try to swap out different hardware components to isolate the problem. Check to make sure that the system does not have write-caching enabled on the disk controller. If you suspect write-caching to be the problem, contact your hardware vendor. I would not suggest to switch to new hardware system unless the existing hardware is redundant/obsolete, as it includes reformatting the disk drives and reinstalling the operating system.
If the scheduled database maintenance jobs such as DBCC CHECKDB reports errors then you have couple of ways to clean the date, if a particular table has been reported as corrupted then either you re-create a new table by scripting existing table (along with primary/foreign keys & constraints) then export the rows out to this new table, drop the old table & rename new to old one. This sort of exercise definetly needs a testing and downtime to your application. Also if no clean backup is available, run DBCC CHECKDB without a REPAIR clause to determine the extent of the corruption. By default the CHECKDB operation results will recommend a REPAIR clause to use, jus a warning that REPAIR clause may suffer data loss to correct the pages.
During these hardware and memory issues you must concentrate on CPU usage as well, it will have knock-on affect even if a small query is executed against the database. If it is a hardware related issue then definetly the system will generate a stackdump that will be presented during the DBCC execution results within the SQL Server error log (see below).
" failed with the following error: "Table error: Object ID 3109769969, index ID 0, partition ID 7205752395433888384, alloc unit ID 720575962329200768 (type LOB data). The off-row data node at page (6:459), slot 0, text ID 2968233811680 is not referenced.
Table error: Object ID 1109769969, index ID 0, partition ID 72057595983888384, alloc unit ID 72057596039200768 (type LOB data). The off-row data node at page (6:459), slot 1, text ID 296853897216 is not referenced.
CHECKDB found 4 allocation errors and 0 consistency errors in database 'mssqlsystemresource'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
CHECKING DATABASE tempdb
DBCC results for 'tempdb'.
.... removed lines ...
CHECKDB found 0 allocation errors and 82 consistency errors in table '#4225BEF1' (object ID 1109769969).
**Dump thread - spid = 107, PSS = 0x00000000EE497BC0, EC = 0x00000000EE497BD0
***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump1029.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
* 12/01/09 02:00:08 spid 10327
*
* DBCC database corruption
Talking about consistent high CPU utilization rates, over 90%, may indicate a poorly-tuned or designed application, or an underpowered system. The metrics include to look at size of the processor queue. If you see high CPU usage and a processor queue that exceeds two over time, it means that processes consistently have to wait for the CPU – a sure sign of a bottleneck. At the same time you have to concentrate on blocking & locking of SQL Server processes which are obvious source for amount of CPU time usage. Again this leads to pattern of monitoring the response time of commonly run queries over time is an effective way of gauging the relative performance of SQL Server. If the response time for the same queries increases to unacceptable levels, the resources on the server may need to be increased.
If a simple query (which used to work quick enough) is taking longer than usual time then you must consider the following questions:
- Has anything changes recently like the ones listed below ?
- Before and after any SQL Server service pack or hotfix installations.
- Before and after any major database application upgrades.
- Before and after any Windows operating system service pack or hotfix installations or upgrade.
- Any hardware or system maintenance such as drivers, firmware updates, new hardware installation, or replacement of hardware components.
Do not attempt to repair or restore any databases without having a complete backup, in my experience I haven't touched (restore or repair) the TEMPDB due to the nature of its usage in SQL Server since 2005 version onwards. Again rounding up this issue towards memory, if you are using STANDARD edition of SQL Server then there is a chance of MemToLeave area becoming fragmented so that there is insufficient contiguous space to allocate the buffers required for the backups. Technet and other best practices articles refers to resolve this issue to use the start-up parameter '-g' with SQL Server. The default -g setting is 128Mb, it is recommended that -g256 is used, if this does not resolve the issue increase the value to -g512. Changing these settings does require SQL Server to be restarted for the changes to take affect and in production environments this is not always possible. Refer to this excellent reference from Microsoft perf.engineers about What's swimming in your bufferpool? post.
The default behaviour of SQL optimizer and Query engine to execute a query estimates how much memory it needs to run and tries to reserve this amount of memory from the buffer pool, if the reservation succeeds the query is executed immediately. If there is not enough memory readily available from the buffer pool, then the query is put into a queue with a timeout value, where the timeout value is guided by the query cost. The basic rule is (source Technet): higher the estimated cost is, larger the time out value is. For the obvious reasons when the waiting time of this query exceeds the timeout value, a time out error is thrown and the query is removed from the queue and you will see below error messages on SQL Server error log:
[State:42000 Error:8645] [Microsoft][SQL Native Client][SQL Server]A time out occurred while waiting for memory resources to execute the query. Rerun the query.
In this case you have to concentrate on WAIT queus and SQL_HANDLE values by using Dynamic Management Views (DMVs) to know the memory status. Coming back to timeout and query execution tactics, if the memory is enough for a newly submitted query but there are queries in waiting queues, this query is put into a queue. Queries in waiting queues are “sorted” based on their cost and waiting time. A basic awareness on ranking by SQL engine is dynamic and will have a change frequently as the memory usage will always fluid based on the number of users/processes on the server, unless the usage is absolutely nil. So the query with the highest rank will run if there is enough free memory and if the memory is insufficient, then no other queries will run. The system internal process will not attempt to check if the free memory is enough to run other queries, hence you will see the timeout messages on the application and log. You can use the following TSQL from DMVs for values that are required:
select * from sys.dm_exec_query_memory_grants where is_next_candidate is not null
[use the value in the plan_handle column to retrieve the showplan from sys.dm_exec_query_plan and the sql_handle column to retrieve the SQL text from sys.dm_exec_sql_text].
Since 2005 version of SQL Server there are few ways in which the application can take advantage of SQL DB core engine internal mechanism, as compared to earlier versions. There is benefit in moving/upgrade the platform to 64-bit SQL Server's (if you haven't yet) on the basis of performance on x64 AMD Opteron and Intel Xeon architectures. If there is no scope or chance or requirement for your application/platform then ignore this part of architecture. On 64 bit environment given the price and density of RAM, and great new x64 CPUs available, this is a perfect time to take a closer look at the performance of your database servers, and consider doing fresh deployments, or migrations, on x64 SQL Server, which can use the additional RAM for everything, not just the buffer cache.
Finally when monitoring any service or application on any system, there is always a trade-off of executing and monitoring the system usage and more concentration on usage of tools. By hand and experience there is no such rule of thumb that you must monitor the system using a remote server or run monitoring exercises locally. Depending upon the server situation and SLAs of application there are benefits and advantages of using tools. As always the performance data should always be analyzed with complete understanding of your enviornment and results of any configuration changes with the amount of data that is gathered. Think about running a remote monitoring of SQL Server when it is highly accessed and already stress with performance, network limitations & data that needs to be transfered on wire. Further when you are troubleshooting any specific issues, the use of SYSMON (PERFMON) tool is usually confined to a relatively short time of capturing the counters. Do not mix your capacity planning & trend analysis along with performance monitoring exercises, network is key in all aspects of monitoring tasks.
-
There are various methods you could apply to secure SQL Server platform, that includes the areas of physical hardware and networking systems connecting clients to the database servers, and the binary files that are used to process database requests!
It is not that easy to implement the strict measures of security when the databases are spread out in an enterprise-wide network. There are tons of information about best practices for physical security strictly limit access to the physical server and hardware components. But it is not possible in all the occassions. Implementing physical network security starts with keeping unauthorized users off the network, a single-step to ensure the level of access is restricted. Next level is surface area reduction, a security measure that involves stopping or disabling unused components. This helps improve security by providing fewer avenues for potential attacks on a system and key to limiting the surface area of SQL Server includes running required services that have "least privilege" by granting services and users only the appropriate rights. Most of this topic is available and I refer to Books Online and suggest to go through the pages of information on Security topic and also various blogs on web.
Next in the picture comes about client connection and authentication to SQL Server instance, instantly the TCP/IP protocol comes into consideration. By default, clients are configured to try all protocols until one of them works. If the TCP/IP protocol is disabled, clients continue with the next protocol. If TCP/IP is enabled but the endpoint is stopped, the connection attempt is not rejected, so the client does not try other protocols, but the stopped connection cannot be used. In this case, you must explicitly connect to a active endpoint and any additional user-defined endpoints behave the same way as default endpoints. Also BOL refers that When an endpoint is created for an IP address (or all IP addresses, by using IP_ANY) and a specific TCP port, permission to connect to the endpoint is granted to users in a process called provisioning. They retain the provisioning regardless of whether or not the server is actually listening on the IP address/TCP port combination.
You can restrict access to an endpoint to EVERYONE group, which is a good practice using DENY CONNECT statement when the SQL Server instance is accessed over internet or between enterprise domains. The best method is to grant permission to specific individuals or roles using GRANT CONNECT statment, but do not attempt to use GRANT CONNECT permission to the PUBLIC group unless you are assured about the accessibility and privileges.
Same in the series of GRANT and DENY permissions you can control the SQL Server login accessibility and enforce the password policies, say when your SQL Server has not be enabled with NT authentication which is by default. It is a bonus that SQL Server does provide such an option of enforcing password policy for SQL logins too. Password complexity policies are designed to deter brute force attacks by increasing the number of possible passwords. When password complexity policy is enforced, the new password must meet these changes, also when SQL is running on Windows Server 2003 or later, SQL Server can use Windows password policy mechanisms.. Say when creating a SQL Login you can specify CHECK_POLICY=ON and again BOL refers certain conditions to enforce this policy:
- When CHECK_POLICY is changed to ON, the following behaviors occur:
- CHECK_EXPIRATION is also set to ON unless it is explicitly set to OFF.
- The password history is initialized with the value of the current password hash.
- When CHECK_POLICY is changed to OFF, the following behaviors occur:
- CHECK_EXPIRATION is also set to OFF.
- The password history is cleared.
- The value of
lockout_time is reset.
Some combinations of policy options are not supported.
- If MUST_CHANGE is specified, CHECK_EXPIRATION and CHECK_POLICY must be set to ON. Otherwise, the statement will fail.
- If CHECK_POLICY is set to OFF, CHECK_EXPIRATION cannot be set to ON. An ALTER LOGIN statement that has this combination of options will fail.
As per the BOL server operating system (Windows 2003 and later) takes care of policy, and in the case of Windows XP & 2000 systems the windows password policy only checks if the password is complex enough. As the policy of minimum length and history (based on security policy to use if the same password as previous N passwords), password minimum life and maximum life can be check within Windows 2003 and later systems, where CHECK_EXPIRATION policy will take care and in such cases the login can be locked out by default when there 3 wrong attempts. By default both of these options are off on SQL Server and you can enforce whenever required by using ALTER LOGIN statement. Just a note from BOL (again) about known issue in Windows 2003 operating system that it prevent the bad password count from resetting after the Account Lockout threshold has been reached. This could cause an immediate lockout on subsequent failed login attempts. You can manually reset the bad password count by briefly setting CHECK_POLICY = OFF, followed by CHECK_POLICY = ON. For more information about the Account Lockout threshold, see Your User Account May Be Prematurely Locked Out article.
How about finding the vulnerabilities and security issues from the code?
Microsoft Source Code Analyzer for SQLInjection tool can help to find the SQL injection vulnerabilites in ASP code and refer to SQL injection vulnerabilities in ASP to reduce these issue, it is best to use parameterized SQL queries. Also refer to MS security advisory link about the issue that contains addition security related information.
Next on the list of tools to identify IIS related vulnerabilities, such as a tool that acts as site filter by blocking certain HTTP requests that helps to block malicious requests used in an attack. You can take help of URLScan 3.0 from MS on IIS realted sites.
Finally HP provides a Scrawlr scanning tool that scans websites looking for SQL Injection vulnerabilities in URL parameters, which is referred on security advisory note above too.
On the logging side of servers you can take help of Auditing in SQL Server 2008, which is an enhanced feature and this White-Paper on taking help of Audit feature, but also refer to performance impact of using audit feature in SQL 2008. In addition to this I suggest SQL Server to address compliance needs of your environment & SQL Server Lockdown - Step by Step procedure posts here for further help.
-
You may think on the subject line why TEMPDB is a key factor, a simple term is the size and physical placement of the tempdb database can affect the performance of a system.
Even a simple query requires the access/process to TEMPDB, which is a part of the system-processing load may be taken up with autogrowing tempdb to the size required to support the workload every time you restart the instance of SQL Server.
So if the SQL Server instance TEMPD is spread across multiple files then you need to adjust this percentage based on the speed of the I/O subsystem on which the tempdb files are located. This is required to avoid any sort of latch timeout or file initialization you need to limit the AUTOGROW operation depending upon your disk setup/configuration. As you may be aware in SQL Server engine/architecture the data files can be initialized instantaneously. This allows for fast execution of the previously mentioned file operations, as such it is an advantage of instant file initialization reclaims used disk space without filling that space with zeros.
This is where a DBA must pre-allocate space for all tempdb files by setting the file size to a value large enough to accommodate the typical workload in the environment, how can you achieve this task? You have to monitor the queries that are getting exeuted, monitor the TEMPDB periodically during any changes to the database maintenance scheduled jobs. As any such changes to such processes can trigger the files expansion and pre-sizing will help tempdb from expanding too frequently, which can affect performance. Also there is a good reason to leave the TEMPDB database setting to AUTOGROW as you don't know when a smalll process can even trigger the file expansion that should be used to increase disk space for unplanned exceptions.
To monitor the growth or usage of tempdb SQL Server provides Dynamic Management Views (DMVs) that can provide lot of information with an additional planning of collecting performance counters in Windows System Monitor. Books OnLine provides much of information on such DMV that can help to monitor the TEMPDB:
sys.dm_db_file_space_usage. Returns space usage information for each file in the database. For more information, see sys.dm_db_file_space_usage (Transact-SQL).
sys.dm_db_session_space_usage. Returns page allocation and deallocation activity by session for the database. For more information, see sys.dm_db_session_space_usage (Transact-SQL).
sys.dm_db_task_space_usage. Returns page allocation and deallocation activity by task for the database. For more information, see sys.dm_db_task_space_usage (Transact-SQL).
sys.dm_tran_top_version_generators. Returns a virtual table for the objects producing the most versions in the version store. It groups the top 256 aggregated record lengths by database_id and rowset_id. Use this function to find the largest consumers of the version store. For more information, see sys.dm_tran_top_version_generators (Transact-SQL).
sys.dm_tran_version_store. Returns a virtual table that displays all version records in the common version store. For more information, see sys.dm_tran_version_store (Transact-SQL).
sys.dm_tran_active_snapshot_database_transactions. Returns a virtual table for all active transactions in all databases within the SQL Server instance that use row versioning. System transactions do not appear in this DMV. For more information, see sys.dm_tran_active_snapshot_database_transactions (Transact-SQL).
sys.dm_tran_transactions_snapshot. Returns a virtual table that displays snapshots taken by each transaction. The snapshot contains the sequence number of the active transactions that use row versioning. For more information, see sys.dm_tran_transactions_snapshot (Transact-SQL).
sys.dm_tran_current_transaction. Returns a single row that displays row versioning-related state information of the transaction in the current session. For more information, see sys.dm_tran_current_transaction (Transact-SQL).
sys.dm_tran_current_snapshot. Returns a virtual table that displays all active transactions at the time the current snapshot isolation transaction starts. If the current transaction is using snapshot isolation, this function returns no rows. sys.dm_tran_current_snapshot is similar to sys.dm_tran_transactions_snapshot, except that it returns only the active transactions for the current snapshot. For more information, see sys.dm_tran_current_snapshot (Transact-SQL).
As you may be aware that SQL Server 2005 started to offer such an helpful statistics with the help of DMVs, not only that with the service pack releases there has been few changes about transaction processing. One of them is the introduction of version store concept and snapshot isolation transaction isolation level. these two subjects are completely related. If you are aware about Oracle's feature of undo tablespace, which is a place where the previous version of data is stored. Similar to this feature SQL Server offers the transaction isolation level SNAPSHOT which is a picture of the actual situation is taken and the transaction asking for reading this data is redirected to the version store. So all this information is stored in TEMPDB, which is another good step to keepup the space of TEMPDB in order to get an optimum statistics when they are required.
Lastly, winding up to the PERFMON counters you must followup the SQL Server related counters for system performance impacted by SQL Server processes. You will find these counters under SQLServer:Transactions performance object, further to Free Space in tempdb (KB) to monitor the KB offree space in the tempdb database which will give you a snapshot of requirement of enough free space in tempdb to handle the version store that supports snapshot isolation. One of the Technet article provides the formula of providing estimate of size of version store:
[size of common version store] = 2 * [version store data generated per minute] * [longest running time (minutes) of the transaction]
The above will provide a rough estimate of the size of the version store and in case of long running transactions you must monitor the the above counter, also you should rule out the online index rebuild information (if enabled/exists) that such longest running transactions will not consists as such online index builds use a separate version store. The approximate size of the online index build version store equals the amount of data modified in the table, including all indexes, while the online index build is active.
Finally to keepup the performance of your SQL Server you must take care of TEMPDB database and create as many files as needed to maximize disk bandwidth. It is evident that using multiple files reduces tempdb storage contention and yields significantly better scalability. At the same time do not simply create too many files because this can reduce performance and increase management overhead. One of the best practice is to create one data file for each CPU on the server (accounting for any affinity mask settings) and then adjust the number of files up or down as necessary, in case of dual-core CPU then consider it as 2 CPUs.
It has been over 6 months for this post about TEMPDB importance, recently I had to upgrade my understanding of such concept with the help of book "Microsoft SQL Server 2008 Internals" by our own Kalen Delaney. This SQL 2008 bible talks that in order to achieve your goal of TEMPDB management you must have a consistent view of the database which is necessary, because DBCC CHECKDB must analyze all allocated pages in the database and check the various links between structures on multiple pages. By all means the process will be handled within TEMPDB while the pages being analyzed (that is, the whole database) cannot change while the consistency checks are running;
The following are very useful to know why TEMPDB is useful for every SQL instance:
-
Old drawing board for every database, temporary objects
-
1 TEMPDB used a single filegroup for data
-
Do not take chances or assumptions on sizes, always set a size
-
A server restart does not persist the previous size
-
Don't rely upon object caching, reduce the contention
-
Common problems and practices to the TEMPDB issues resolution:
- If it blows then you will see errors: 1101, 1105 (common one), 3958
- Have a history for Performance (SYSMON_) counters specific to monitor Tempdb
- TEMPDB DMVs – comes very handy to obtain the largest objects
- If you find I/O issues – then look at DMV and memory issues that will stress upon IO
- Hardware - consume responsibly:
- RAID 1+0 will provide better throughput for write-intensive applications. The amount of performance gained will vary based on the HW vendor’s RAID implementations. Most common alternative to RAID 1+0 is RAID 5. Generally, RAID 1+0 provides better write performance than any other RAID level providing data protection, including RAID 5.
To wrap up on disk consideration you must have faster spindles for better performance, a good practice to en
sure that you have an adequate number of spindles to support your IO requirements with an acceptable latency. As usual option on filegroups for administration requirements such as backup / restore, partial database availability, etc. is required to recover the databases when any issue strikes. Unless you understand the application very well avoid trying to over optimize the IO by selectively placing objects on separate spindles. Make sure to give thought to the growth strategy up front. As your data size grows, how will you manage growth of data files / LUNs / RAID groups? It is much better to design for this up front than to rebalance data files or LUN(s) later in a production deployment.
-
I always trust how good your resources are to find out a root cause than how bad the problem is?
It is one of the best point of disaster recovery position I wanted to be to get to work on solution on quicker basis than worrying about the impact, the business will have their say about availability or scalability for their application! Well we cannot work on both of the requirements at the same time and anytime a DBA must be hands-on to solve the problem in a flexible manner than in panic.
Lately, it has been much of Architecture and Solution design tasks for me rather than working on a real-time performance problem and honest opinion is I miss much of those good old 'DBA' days. One of the best features that I consider in SQL Server 2008 version is 'Multiserver Administration' taking the help of SQL Agent that takes advantage of the self-tuning features of SQL Server, does it work that way! But you should rather concern on normal conditions of the platform and additional job tuning is not necessary. In any case the performance between client and server suffers if the network is not good enough and loads increase when you run jobs, generate alerts, and notify operators.
Back to the old school and in order to determine whether your SQL Server system is performing optimally, take performance measurements at regular intervals over time, even when no problems occur, to establish a server performance baseline. Compare each new set of measurements with those taken earlier.
The following areas affect the performance of SQL Server:
- System resources (hardware)
- Network architecture
- The operating system
- Database applications
- Client applications
The minimum requirement of basline measurement is essential to determine:
- Peak and off-peak hours of operation.
- Production-query or batch-command response times.
- Database backup and restore completion times.
The format of Baseline and Benchmarking methods in SQL Server which helps the DBA to establish the performance baseline and helps to compare the baseline statistics to current server performance. Numbers far above or far below your baseline are candidates for further investigation, bigger or smaller is not the question and the important task is how good is your system and even smaller ones may indicate areas in need of tuning or reconfiguration. For example, if the amount of time to execute a set of queries increases, examine the queries to determine if they can be rewritten, or if column statistics or new indexes must be added.
So the first step to add on the tasks is identify the processes that are blocked for more than 10 seconds:
SELECT * FROM sys.sysprocesses WHERE blocked<>0 AND waittime > 10000
This brings a good old question by a SQL user that my application is designed that do not typically requests locks directly, then why do I see locking on my SQL Server? The answer is usual that locks are managed internally by a part of the Database Engine called the lock manager. When an instance of the Database Engine processes a TSQL statement, the Database Engine query processor determines which resources are to be accessed. The query processor determines what types of locks are required to protect each resource based on the type of access and the transaction isolation level setting. The query processor then requests the appropriate locks from the lock manager. The lock manager grants the locks if there are no conflicting locks held by other transactions, this is irrespective of how the application queries are managed and at best leave it to SQL Server than using your own hints.
So coming to the handy TSQL code that I use to query that will display lock information :
SELECT resource_type, resource_associated_entity_id,
request_status, request_mode,request_session_id,
resource_description
FROM sys.dm_tran_locks
WHERE resource_database_id = <dbid>
And this query which returns object information by using resource_associated_entity_id from the previous query. This query must be executed while you are connected to the database that contains the object.
SELECT object_name(object_id), *
FROM sys.partitions
WHERE hobt_id=<resource_associated_entity_id>
(you might observe that both the above code is available on BOL to get handy)
Also I would like to get the resource usage which returns information that associates a session ID with a Windows thread ID. This is essential to know how the Windows operating system manages the threads and how it helps on performance of the thread which can be monitored in the PERFMON tool and be aware that this query does not return session IDs that are currently sleeping.
SELECT STasks.session_id, SThreads.os_thread_id
FROM sys.dm_os_tasks AS STasks
INNER JOIN sys.dm_os_threads AS SThreads
ON STasks.worker_address = SThreads.worker_address
WHERE STasks.session_id IS NOT NULL
ORDER BY STasks.session_id;
GO
My next step of setup is to find how much memory has been allocation to the SQL Server from the operating system, though the general best practice is to leave it to DYNAMIC:
SELECT object_name as SQL_Server_Instance, cntr_value as Memory_in_KB FROM sys.dm_os_performance_counters
WHERE counter_name = 'Total Server Memory (KB)'
Finally it is best to see if there any deadlocks on the SQL Server using the good old DMVs and also obtain the older user session to see what is causing such a delay in performance:
SELECT * FROM sys.dm_os_performance_counters WHERE counter_name = 'Number of Deadlocks/sec' AND cntr_value <> 0
SELECT login_name as WHO_IS_HE, status as WHAT_IS_HE_DOING, program_name as WHAT_CLIENT_IS_HE_USING, login_time as WHEN_DID_HE_LOGIN, last_request_start_time WHEN_DID_HE_LAST_INTERACT FROM sys.dm_exec_sessions
WHERE is_user_process = 1 ORDER BY login_time ASC
--Web resource (credit to originator)
So handy process of using DMVs in SQL Server and PERFMON tool can provide information about SQL Server manages on individual resource types. It is not possible to prevent concurrent use of resources by different transactions. For example, if an exclusive (X) lock is held on a row within a table by a transaction, no other transaction can modify that row until the lock is released. Minimizing locks increases concurrency, which can improve performance. As usual the DMVs can be used by using two-part, three-part, or four-part names. They are my favourite as these Dynamic management functions on the other hand can be referenced in TSQL statements by using either two-part or three-part names.
There are many good tools available with SQL Server 2008 such as Management Data Warehouse and Extended Events to obtain the inforamtion quickly before you can find how to react when a problem strikes!
-
When you talk about Performance problems then fragmentation & database integrity issues are dearest friends of DBA!
As data is modified in the databases, the tables and indexes will become fragmented, which means the data is scattered everywhere in the database. So talking in DBA sense the moral is as more fragmented a clustered or non-clustered index becomes, it is nothing but database engine requires to add more pages to fulfil the same query request. So it is a required practice to defragment these indexes on a regular basis using your own methods or industry best practices. So to check the allocation of all database pages and internal structures based on the regular activity on the database, delete/insert/modify processes and instant statement pops up is DBCC CHECKALLOC. When it is executed an internal database snapshots are created to maintain the transactional consistency during the operation of checkout. To get consistent results I have seen that some of the adhoc scripts specifies TABLOCK then creation of database snapshot may not occur causing exclusive database lock during the execution of this command which will potentially affect the database concurrency.
Ok, enough talk on index fragmentation and now I'm more concerned on disk fragmentation as a DBA you are more responsiblefor server hardware as well. You must understand the value that professional grade disk defragmentation software brings to their servers. Storage servers can experience high levels of disk thrashing (the constant writing and rewriting of small amounts of data) caused from excessive file fragmentation for database MDF & LDF files. When we refer the operating system based fragmentation then you will see network and application issues have a much more visible impact on the performance of network-based services, especially when problems with those functions are encountered.
Optimal disk performance translates into better ROI, but how can you assure that present hardware can bring it all?
Testing the current hardware disks or involve the hardware vendor to get certfied on the testing pattern it is followed. As you can easily obtain results for impact of server disk defragmentation by looking at common tasks that network servers, both physical and virtual, encounter, ranging from maintenance tasks such as server backup and anti-virus scans.Many out there think that involving the tests of a basic user tasksscha s opening files stored on the host server and virtual machines, and manipulating email are not included, but they must be included too. They will help to add on more fuel to the rigorous database testing such as database queries, index creation, and bulk updates.
So doing such a tests requirement I always depend upon the fantastic tool from Diskeeper which can produce fantastic reports about the file fragmentation on to the root level on the servers. Using this tool you need to perform a benchmark tests involving the hardware and I have performed such benchmarking tests using HP's DL380 G5 equipped with dual quad-core 2.83 GHz Xeon processors, each with a 2x8MB L2 cache, 32 GB of RAM and eleven 72 GB 10,000 RPM SCSI drives attached to an HP Smart Array P400 controller that has a 256 MB cache and that supports both serial attached SCSI and SATA drives. The disk volums varies from 30GB to 170 GB due to the nature of the application data I have to go on such range that keeps the database sizes to a minimum of 500 GB within 7200 RPM locally attached SATA drives and 4 x 500GB for backup purpose only. So this Diskeeper tool helped to achieve such a results by performing 3 levels of fragmentations low, medium and high with Diskcrusher fragmentation utility to createfragmented files and directories. So the results that are produced are phenominal on the level of fragmentation you’ll encounter in usual production environments and pretty much dependent upon the level of use and types of applications the server deals with.
So to refer about complete testing on SQL Server side, we used 2008 version with a bulk insert of 500,000 rows of data. The bulk insert is often the fastest
method of getting data into a SQL Server database. Obviously captured the resource usage of the server by using PERFMON (SYSMON) for CPU, Memory and disk related counters. Also the testing of table key creation times are directly related to how much data SQL Server had to touch, and the level of fragmentation that had to be dealt with. SQL Server 2008 does a very good job of managing its databases, but defragmentation shows appreciable
improvement in the performance of tasks such as this with a performance improvement of over 11 percent in the most fragmented environments. Coming to the query execution that the test involved differ primarily due to the amount of data returned by SQL, in response to the query. The tests depict the effects of
manipulating the data on a fragmented drive with peak performance improvements of approximately 21%.
SQL Servers benefit from defragmentation; reading and writing data with either application simply works better when the files are not fragmented. Which means whenever a DBA follows the regular practice of defragmenting the indexes then underlying file defrag methods are easy to manage and you will instantly see the result as is improved performance during the busy times at the server. I see that many such solutions are simply de-tour'd by suggesting to throw more storage resources (hardware) at a problem and it should be the last resort, because it only masks the potential problems that intelligent disk
defragmentation addresses. The reason I mention on Diskeeper is which it reported that server storage levels are consistently exceeding 75 percent or so, due to the affect of transactional and usage of users about the data off of the servers causing the fragmentation which isn’t a direct result of reduced capacity, the chances for fragmentation increase as free storage space decreases and the operating system is forced to write data into an everincreasing
number of non-contiguous spaces. The tool comes with the option of manual or auto defragmentation method, by using the automated the same disk volume sees absolutely minimal fragmentation even though it is in continual use by applications and users.
To wrap up the discussion it is better to defragment the operating system files too on regular basis or whenever there is a low server activity on the SQL Server, specifically.
-
There was an interesting discussion about User Defined Fuctions (UDF) performance and patterns, the actual phenomenon is that UDF has got parameters to pass on during the execution.
Just to talk about introduction of such options in the SQL Server history, stored procedures and views have been available in SQL much longer than UDFs, but each of these objects has their niche in SQL Server development. Stored procedures are great for processing complex SQL logic, securing and controlling access to data, and returning a rowset to a calling routine whether that routine is a VB or any other frontend program or another TSQL batch. Unlike views, stored procedures are compiled, making them ideal candidates to represent and process frequently run SQL statements. Views are great for controlling access to data, but they do it differently than stored procedures. Views are limited to only certain columns and rows from the underlying SELECT statement that generated the view. Thus a view is often used to represent a commonly used SELECT statement that may join several tables, employ a WHERE clause, and expose specific columns. Views are often found in the FROM clause of a SQL statement joined to other tables and views.
Scalar value-returning UDFs are most similar to what many programming languages refer to as functions. They return a single value consisting of a scalar datatype such as integer, varchar(n), char(n), money, datetime, bit, and so on. UDFs can also return user-defined datatypes (UDDTs) if they are based on a scalar datatype. With UDFs that return either inline or multistatement tables, a rowset can be returned via the table datatype. However, not all datatypes can be returned from UDFs. For example, a UDF cannot return a value of any of these datatypes: text, ntext, image, cursor, or timestamp.
The default behaviour of scalar UDFs will be like single statement execution, not like batch one where the difference between them is optimizer will have to pickup the best plan to execute. Due to this SQL Server must execute each function on every row, using any function incurs a cursor like performance penalty. One way UDFs are simple to complete the task for that reason and can be coverted into single-expression, the actual execution of Scalar UDF will have the performance issue whatever way you may try to optimize the list of TSQL statements. So the actual execution time will vary due to fetching the large number of rows, in a sense 1000 or more. As it works like cursor mechanism, during the followup of each row the UDF is executed which will cause usual performance loss. To add more fuel to the problem if the WHERE clause on that UDF includes any filtering then complete execution will have to move slowly as it must invoke each UDF on each row where it is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing.
Coming to the subjective discussion the 2 logic scenarios to be followed as declare local variables, set values to the variables & issue a query referring to the variables in the query filter. So the execution of batch will have the first 2 steps and then actual part of execution needs to be optimized which is in vaccum as the variable values are unknown which ends up using a hard coded guess (which happens to be 30% selectivity for a range filter). On the other scenario (2) with specified constants in the filter, SQL Server expands the definitions of of the UDFs, substituting the parameter values with the specified constants, and the optimizer will optimize the expanded query with knowledge of the values in the filter. So the actual execution part of UDF will be far better on the second scenario of specified constants will be better as indexes are better used. To construct above I have referred following TSQL statements from Microsoft technet links (revisiting old material):
use northwind;
go
create function dbo.FuncWV(@d as datetime) returns table
as return
select orderid, orderdate, customerid, employeeid
from dbo.orders
where orderdate >= @d;
go
So here if the code has the declaration of variable to a value as follows::
declare @myd as datetime;
set @myd = '19980506';
select orderid, orderdate, customerid, employeeid from dbo.f1(@myd) as t;
So look at the execution plan of above query where you will see the predicate identified the optimize which is the ORDERDATE variable: orderdate >= @myd
So the optimizer is required to fine tune the query where this value in the predicate is unknown at optimization time:
select orderid, orderdate, customerid, employeeid from dbo.orders where orderdate >= ?;
Scalar datatype-returning UDFs can be used in various situations to make the code more maintainable, reusable, and less complex. This can be very useful when the same segment of T-SQL code is used in several places, perhaps by several stored procedures and batch SQL statements. Every now and often I see that UDF execution involves full table scan (clustered index too) causing performance loss as per the range filter causing the hard-code guess of rows selectivity. As the same discussed above on scenario 2 and to demonstrate the same the above code has been modified to use a constant:
select orderid, orderdate, customerid, employeeid from dbo.f1('19980506') as t;
So the above query's execution plan will have to predicate identified as per the optimizer, which is ORDERDATE >=.
As per this optimizer already takes on the value in the predicate which helps to fetch selectivity estimate to nearest possible with better performance, such as:
select orderid, orderdate, customerid, employeeid from dbo.orders where orderdate >= '19980506';
When you look at the execution plan the lookup on the orderdate column, calculate the selectivity of the predicate with an option of using index on orderdate column. Also in some cases the optimizer will not use previously cached plan, due to the query statements against the function use similar selectivity ranges. Back in SQL 2000 days UDFs are seen as evil, it seems that SQL Server’s query optimisation fails to deal with UDFs properly. Basically it optimises and builds the execution plan without ever looking inside the UDFs to find out what they’re doing, so in all the optimisation isn’t worth the cycles it consumes.
In some cases when you have nesting within UDF then one catch with nesting functions is that built-in functions that are nondeterministic, such as the getdate function, cannot be nested inside of another UDF (otherwise, a SQL Server error is raised). A nondeterministic function is one which may not return the same result when called multiple times with exactly the same parameters. The getdate function falls into this category since every time it is called, it returns the new current date and time. Another commonly used nondeterministic built-in function is the NewID function. It is also nondeterministic as it always returns a unique GUID and, as such, the NewID function is not allowed to be nested within a UDF. SO as per the BOL discussion talking about table-valued UDFs: Within the category of table-valued UDFs there are two sub-types: inline table value-returning UDFs and multistatement table value-returning UDFs. UDFs that return inline tables return a rowset via the SQL Server table datatype. They are defined with a single SELECT statement making up the body of the function. Inline table value-returning UDFs cannot contain additional T-SQL logic outside of the SQL SELECT statement that defines the table it will return. However, they are simpler to create than UDFs that return multistatement tables since they do not have to define the exact table structure to be returned. UDFs that return inline tables extrapolate the structure of the rowset from the SELECT statement itself.
To sumup the discussion assume that there is a UDF that performs a SQL SELECT statement that takes one second to execute. If this UDF is used in a SELECT or a WHERE clause it will be executed for every row. Thus the time the main query takes to execute could increase drastically depending on such factors as the number of rows evaluated and returned and the types of indices in place. Before using a UDF in this type of situation, carefully weigh the options and do some performance testing. There are good references on the blogs about the UDF performance, that I would like to refer:
An Introduction to SQL Server Scalar UDFs
Alexander Kuznetsov : Not all UDFs are bad for performance.
Are UDFs Harmful to SQL Server Performance?
Using schema binding to improve SQL Server UDF performance