SQL Server 2005 Database Mirroring choosing optimum value for auto-failover timeout?
Choosing a better plan to reduce outage to your database application is a beginner to provide High Availability to your application, irrespective to database or server. You may be aware that SQL Server 2005 version provides the database mirroring which is an advantage to avoid the costs and pre-requisite for Clustering, as it comes when you think about High Availability.
So when you setup the Database Mirroring the partners within this session will be Principal, Mirror and witness, just to talk about witness instance requirement that it should be used only when you intend to use high-safety mode with automatic failover. In high-performance mode, for which a witness is never required, why and all subject I will cover within another blog post J.
For the subject purpose assume only HIGH SAFETY mode is used where you have an option of choosing Automatic failover mode within the database mirroring sessions having witness aside for quorum purpose. As it is one of the best implementation I choose high-safety mode with automatic failover as database is synchronized, if the principal database becomes unavailable, an automatic failover occurs. During this action the mirror server to take over the role of principal server and bring its copy of the database online as the principal database, like-to-like with 'no (virtually)' data-loss. I believe this is one of the best practice to provide the database be synchronized prevents data loss during failover, because every transaction committed on the principal database is also committed on the mirror database.
So think about how the automatic failver should take into affect, as a part of role switch, the amount of time that database mirroring will be out of service depends on the type of role switching and the cause of the role switch. The technical documentation for SQL Server refers that:
- For automatic failover, two factors contribute to the time service is interrupted: the time required for the mirror server to recognize that the principal server instance has failed, that is error detection, plus the time required to fail over the database, that is failover time.
- For a forced-service operation, though a failure has occurred, detecting and responding to the failure depends on human responsiveness. However, estimating the potential interruption of service is limited to estimating the time for the mirror server to switch roles after the forced service command is issued.
This is where the communication between Principal and Mirror with a Witness instance plays important role, which is a default timeout value of 10 secionds. That means if any of the partners are not ping-able for 10 seconds it can throw a timeout message and the database can failover. At the end of the day you should consider the real-problem wide of failover and not with any other errors from the operating system level that can cause failures such as, Network errors or I/O errors or Process (memory/drivers) errors, etc. So take advantage of causing such an auto-failover of default timeout value, you can make changes by configuring the partner timeout value as shown in the following T-SQL statement:
Use Master;
ALTER DATABASE [Database_Name] Set Partner TIMEOUT [Number_of_Seconds] ;
So here comes the big question what would be optimum value to set, the answer is it depends on your application method to control the committed transactions, where keeping the minimum set of transactions to commit for better availability & performance. Within a scenario of highly-transactional application such as stock exchange trading application, I setup with 25 seconds as the instance (witness) must receive a ping on that connection within the time-out period defined by the mirroring time-out value, plus the time required to send one more ping. Receiving a ping during the time-out period indicates that the connection is still open and that the server instances are communicating over it. On receiving a ping, a server instance resets its time-out counter on that connection.
Just to cover up the difference between running asynchronous & synchronous sessions, where the asynchronous will have default time-out value of 10 seconds that cannot be changed and within synchronous sessions, you can control the time-out period as mentioned above. But always make sure you have tested the failover methods and transaction checking within your environment before taking my word, another best practice.
**__________________________________**
SQL Server MVP, Sr. DBA & industry expert.
-
Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it. It is also a power and you will gain by sharing it.