SQL Server replication FAQ and important measures to consider when replication performance is sufferred
Here we are going to highlight the majority of tasks you need to perform whenever user complains or you observe about SQL Server Replication performance is affected, adversly.
For any performance tuning issues there will be other factors involved including SQL Server, such as in replication the performance can be measured along the dimensions latency, throughput, synchronization, concurrency and server resources.
The latency takes the higher importance as the amount of time it takes for a data change to be propagated between nodes (Publisher & Subscriber) in a replication scenario irrespective of whether it is Transactional, Merge or Snapshot. Then it comes to the throughput where the replication activity can be measured over a period of time and how long it can keepup with such performance. The part of synchronization can be dealt by using periodic subscription if there is a huge data refresh and keeping up the concurrency where it executes along with other resource intensive processes on the SQL Server. Among all of these the main point is also look for the consumption of the resources on the server, such as hardware and network resources used as a result of replication processing.
As we talked latency and throughput are import to transactional replication, because systems built on transactional replication generally require low latency and high throughput. The other 2 factors of concurrency and synchronization (duration) are most relevant to merge replication, because systems built on merge replication often have a large number of Subscribers, and a Publisher can have a significant number of concurrent synchronizations with these Subscribers. All of these 4 can be dependant on the consumption on the server and how busy the server is managing such processes.
So it is always better to develop a baseline in performance as mentioned in Replication Monitor and System Monitor helps to determine typical numbers for the above dimensions of replication performance. To determine what else can affect is server/network hardware, distributor configuration with publication design and options. Make sure to go through the subscription/snapshot options with agent parameters. Do not run or execute replication tasks whenever database maintenance tasks are scheduled such as DBCC REINDEX or CHECKDB statments.
In a typcial scenario of merge replication verify that you are using the appropriate filtering options and that columns in the filter are indexed. Within the replication monitor from the top-level error reported by the agent is not enough to find the root cause of the problem. This top-level error is likely to be followed by a more specific replication error such as a server-side error reported by the SQL Server database engine or by the operating system. Also for the conflict make sure ot use Advanced Merge Replication Conflict Detection and Resolution setup. Whereas, within the transactional replication and experience of high latency when performing batch operations on published tables consider replicating the execution of a stored procedure to perform the batch operation at the Subscriber, as the server and network factors are very important to reduce the latency.
The usual set of practices for best placment of proper allocation of database data files and log files. Use a separate disk drive for the transaction log for all databases involved in replication, as the log is hammered whenever a simple replication process is executed. You need to monitor the schedule of replication whenever a set the minimum and maximum amount of memory allocated to SQL Server services, as it may react slowly due to such settings and with a proper load testing should prove better results, if possible try adding memory to servers used in replication, particularly the Distributor (edition specific).
After going through the server/network based concepts it is important to walk-thru the subscription options within the replication, say when you have large number of subscribers better to use pull subscriptions and also ensure to reinitialize the subscription if the data updates are lagging behind as compared to the publishers. Ensure to place the snapshot folder on local drives of the Distributor and by keeping seperately from database data/log files drive. Do not run or schedule the snapshot agent only during busy times on the server and if possible use at off-peak times. Create a seperate agent when the multiple publications exist on the same publisher and ensure to run the agent 'continuously' instead of 'on-demand or very frequent' schedule.
Finally, you can make use of SYSMON (PERFMON) that provides a number of counters associated with replication agents and can be used to troubleshoot replication performance. Along with SYSMON, PROFILER is helpful too for further digout on the performance by a capture process and save data about server events to a file or table to analyze later. For example, you can monitor a replicated database for stored procedures or other processes that are taking a long time to complete.