Welcome to SSQA.net Sign in | Join | Help

SQL Server Storage Engine & Tools (SSQA.net)

SQL Server Tools includes storage engine that includes a complete set of graphical tools and command prompt utilities that allow users, programmers, and administrators. SSMS, SSRS, SSIS & SSAS are most commonly used tools.
SQL Server 2005 Partitioned Tables and Indexes - learning curve

When do you need data partitioning?

The data you see in relational database may need such a strategy if the volume is huge, for instance the data to store event-log information having the schema such as:  [Id]  INT (make this primary key clustered), [Unit_Id]  UNIQUEIDENTIFIER, [EventType_Id]    UNIQUEIDENTIFIER, [TimeStamp]  DateTime, Description nvarchar and with lots of other fields. So in the realtime this table can easily have a scale of 100 million rows in it (based on the events occurrence on the system). The usual pattern of data volume will be simply inserted, not with any update or deletion. For performance sake I have shown to specify the ID as an identity clustered primary key to prevent nasty page splits, etc. So then coming to process of data insertion it is performed with a single-insert process in continuous manner. As it refers the unique constraint can be built on ID, EventType_ID in addition to the TimeStamp column, so looking at the WHERE clauses it might include the conditions alone ID, ID and EventType or event ID, EventType and TimeStamp. So it is better to have another non-clustered index on TimeStamp for the sake of range matches on the date column.

Had enough on design aspect, now coming to the query performance on the SARGable columns atleast consider make sure to use unique index than with a unique constraint that are included in covering columns. Do not ignore the lookups from the Execution Plan and that too on a clustered indexes will prove expensive, so talkin at 2% of million rows means more than 1000 rows that are extracted with every query. Here it comes the data partitioning that is specific to the number of rows, having the performance with partition keys and always make sure to have column with filter-on which will be a good candidate for partition key. When you have the queries that includes joins then make sure to have composite index in place on a persisted column(s), so that any form of expression or conversion on a column used in a where/join will prevent index seeks for that filter. Also within the partitioned tables say the target table has any FOREIGN KEY constraints, the source table  must have the same foreign keys defined on the corresponding columns, and these foreign keys must reference the same primary key as those of the target table.

So having such a huge relational data and simulating the data warehouse methods will have a  comprehensive source of data and an infrastructure for building Business Intelligence kind of  solutions that are very useful to build better performed reporting applications with complex ad-hoc SQL queries. So when you talk about BI you will have data warehouse consists of dimension tables and fact tables, that are used usually smaller in size than fact tables and they provide details about the attributes that explain the facts. For instance dimensions are item, store, and time and fact tables represent business recordings such as item sales information for all the stores. Other factors to keep in mind while building a strategy such as volume, data loading (inserts), index maintenance, data consistency and backup/archive strategy.

So taking this into relational world data/index partitioning facilitate the effective management of highly available databases, that can influence several factors such as partitioning strategies, design and implementation along with data management consideration for your database environment. The storgate plays importatnt role in achieving performance that should be optimally configured and I encourage to work closely with you hardware vendors when configuring storage for a SQL Server implementation when you such a huge data platform. This includes the design  for any data warehouse, there exists a balance between isolating data at the physical level and arranging the design such that the hardware is fully used.

The placements of data such as active partition and archive partition must be placed on common physical spindle but seperated on different logical volumes for better management. For instance an active partition can be created on logical volume L1 and an archive partition can be created on logical volume L2, that talks about these two partitions are logically isolated, they share the same set of physical spindles. Say if you do not have such an option then make sure to isolate the active data from archived data on physical spindel level, that will decrease number of rounds on disk read/write factors.

This is a huge subject that needs a careful consideration, best resources are available on the web in addition to latest SQL Server Books Online such as to get better understanding on best practices and building partition strategies:

 

Using Partitions in a Microsoft SQL Server 2000 Data Warehouse
http://msdn.microsoft.com/library/default.asp?URL=/library/techart/PartitionsInDW.htm.

SQL Server 2000 Incremental Bulk Load Case Study
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx

SQL Server 2005 Partitioned Tables and Indexes by Kimberly L. Tripp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql2k5partition.asp

 http://www.sqlskills.com/blogs/conor

 

 

 

 

 

 

 


 

Posted: Wednesday, March 12, 2008 1:01 PM by SQL Master

Comments

Other SQL Server Blogs around the Web said:

When do you need data partitioning? The data you see in relational database may need such a strategy

# March 12, 2008 2:48 AM

thienqnguyen said:

Is it possible to move a partition from one server to another as part of archival process? From all the reading it does not seem like it, but has anyone been successful in doing that?

Thanks.

# March 19, 2008 9:05 AM
Anonymous comments are disabled