SQL Server ETL Performance during Data Load Optimization
Do you have a large ETL process to finish on day-to-day basis?
Is your ETL process is suffering with performance loss during the BULK insert or export task?
Are you using SQL Server 2005?
If it is YES for all the above then you can take advantage of SQL 2005 in speeding up the ETL process to perform better. Also it is one of the best method I suggest to fine tune the performance. Say if you have a huge table and every night the same table is populated with another BULK INSERT task and on top of this if that table has indexes for query tuning, then obviously your ETL process will suffer the issues while inserting such a huge load. In any case you should go BULK INSERT route to perform better and take care of transaction log by using BULK-LOGGED recovery model on that database.
Within SQL Server 2000 you can disable foreign-key constraints. When you re-enable them, SQL Server verifies that the data comply to the constraint. But you cannot disable primary-key or unique constraints, nor indexes.
In this regard within SQL Server 2005 the new feature of disabling INDEX would help to some extent, as documentation refers:
Disabling an index prevents user access to the index, and for clustered indexes, to the underlying table data. The index definition remains in metadata and index statistics are kept on nonclustered indexes. Disabling a nonclustered index or clustered index on a view physically deletes the index data. Disabling a clustered index on a table prevents access to the data; the data still remains in the table, but is unavailable for DML operations until the index is dropped or rebuilt. To rebuild and enable a disabled index, use the ALTER INDEX REBUILD statement or the CREATE INDEX WITH DROP_EXISTING statement.
But there is a catch if you have a replication as the BOL refers If a table is in a transactional replication publication, you cannot disable any indexes that are associated with primary key columns. These indexes are required by replication. To disable an index, you must first drop the table from the publication. Another good reason to disable index is if disk space is limited, it may be helpful to disable the nonclustered index before rebuilding it. In this case you can easily disable any constraints and re-enable them without having to rebuild them. Also in terms of optimizing load times, the general things to consider would be to load all data "as is" into staging tables and then to perform all transformations in the staging table before loading it into the final destination. Most will depend on the specific environment you find yourself in, as well as the structure and volume of the data. In any case you have to test this within your development platform to ensure it is working as expected as it can make significant difference. You can always enable the constraints back with a simple ALTER TABLE.
**__________________________________**
SQL Server MVP, Sr. DBA & industry expert.
-
Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it. It is also a power and you will gain by sharing it.