SQL Server BI - Managing Large Datawarehouse environment, guidelines

Published 17 September 08 01:24 AM | SQL Master 

What will be the main challenge for a DBA/User to manage a large Data Warehouse (DW) environment, size does matter and also think about downtime you need to deploy whenever any patches are required.

Before that you need to be perfect or well-versed with Data partitioning & Data lifecycle within that DW environment.  Also buidling up the backup and restore strategy is most important as you need to maintain the health of this DW management system. So you might ask what do you mean by Data lifecycle management of DW, the initial aspect is inserting new data, maintain the aggregations and archiving the data in addition to the usual backup/restore & data integrity tasks. You must be aware of not to slip the SLAs within the data environment as such large databases, often multi-terabyte will have limited batch windows, with much to be done.

Now a days the data access is globalized rather than a localized aspect, as it can further decrease batch window due to the differential time zones. So you need to be very careful to achieve the SLA with the predefined batch windows (SLAs) and not to mention about the performance and availability should be assumed. You will be the incharge of data update/modification as they are predicatable when it comes to the lifecycle management and also you need to deploy the methods of updating only small subsets of data are actually volatile.

As we referred about Performance tuning it is good that most data warehouse queries are designed to follow a star schema and can process hundreds of millions of rows in a single query. By default, the SQL query optimizer detects queries against star schemas and creates efficient query plans for them. One method the optimizer can use to generate an efficient plan is to use bitmap filtering. Technet and books online refers that using bitmap filter will  compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. Essentially, the filter performs a semi-join reduction; that is, only the rows in the second table that qualify for the join to the first table are processed. Not only that using such an optimized bitmap filters have the following advantages: (source BOL)

  • Filtering from several dimension tables is supported.
  • Multiple filters can be applied to a single operator.
  • Optimized bitmap filters can be applied to more operator types. These include exchange operators such as the Distribute Streams and Repartition Streams operators, table or index scan operators, and filter operators.
  • Filtering is applicable to SELECT statements and the read-only operators used in INSERT, UPDATE, DELETE, and MERGE statements.
  • Filtering is applicable to the creation of indexed views in the operators used to populate the index.
  • The optimizer uses cardinality and cost estimates to determine if optimized bitmap filtering is appropriate.
  • The optimizer can consider more plans.
     

Now comes another aspect of management is data partitioning, as it will help to seperate data into sections based on the values in a single column, that will enable to place the partitions selectively on the disk such as using filegroups. As you know you can partition the tables and indexes, also the data can be easily added and removed from partitioned tables via a SWITCH operation (refer to updated Books oNline for more information). You will agree that most management activities can be performed at the partition level and using such a feature will reduce the problems to address and FYI it is available only in Enterprise Edition.

Choosing the right partition key is a KEY, as it will help to know how the data is inserted and removed and having the disk layout in front of you will keepup the performance of the DW environment. The main advantages as we spoke earlier in using partition is it will enable the backup / restore flexibility, using a  piecemeal restore method with a consistency checking flexibility to manage the performance, administration & file fragmentation of databases. So if using SSAS, use MDX to identify unused dimension members, as such dimensions in the RDW can be flagged independently of removing from the cubes (dimension and cube re-processing).

Then it comes to the guidelines on the backup mechanism you can adopt, using the  partitioning method you can only backup partitions that have changed and usage of piecemeal restore to most used data online first and create offline area for resource or time intensive operations (consistency checking, index fragmentation). When saying backup the partition you need to be aware that you cannnot perform the backup on partition level rather you need to keep that partition on to a seperate filegroup when you can perform the filegroup backups easily to control the granularity and with such a way backup operation can be simplified to backup read/write in a single operation. Ok enough talk on backup then how about the restore flexibility! Using such backup mechanism obvisouly you will have the easy process to perform  restore Primary (data file) + most highly queried partitions (filegroups) first to get users online quickly that will give you more peaceful time to segregate management activities, managing the data filegroup such as READONLY & READWRITE you can perform the restore only read/write filegroups to a separate server for database health analysis.

So to sump on the backup/restore mechanism activities you can develop an overnight plan to toggle DB to read / write, then perform the load updates to dimension tables &  fact tables. Having the data partition method you can easily update summary tables or cubes based on the partitions that are updated. Then perform the filegroup backup that are associated with the updated partitions and then change the database to read-only mode. This will also help you to keep the performance of our Reporting mechanism with optimum performance as the query to data cache will have higher advantage, as appropriate. Intermittently you can also build & implement a plan to backup and restore the primary and read-write filegroups on a secondary server and you must script necessary index maintenance of read-write filegroups. For index maintenance & optimization take advantage of using  DBCC CHECKFILEGROUP on read-write filegroups as using DBCC CHECKTABLE will have slower performance issues on larger tables over multiple weekends, if the weekend batch cycle is inadequate to run them all.  Finally, you can manage the less volatile partitions by performing the de-fragment indexes and  database consistency checking by making the partition filegroup read-only with a final backup of the filegroup.  Just to addup you must perform the backup after you enable the filegroup to READONLY as it will be a final copy of that filegroup.

The last one is to referring the links and web-resource on the above topic:

SQL Server 2008: Data Warehouse Query Performance

SQL Server TechCenter - Downloads, documentation and more - webcasts and demos

Data Warehousing Microsoft SQL Server 2008 and its integration ...

Comments

# Other SQL Server Blogs around the Web said on September 17, 2008 2:10 AM:

What will be the main challenge for a DBA/User to manage a large Data Warehouse (DW) environment, size

# All about Business Intelligence (SSQA.net) : SQL Server BI - Managing Large Datawarehouse environment, guidelines said on September 17, 2008 2:50 AM:

PingBack from http://sqlserver-qa.net/blogs/bi/archive/2008/09/17/sql-server-bi-managing-large-datawarehouse-environment-guidelines.aspx

Anonymous comments are disabled

About SQL Master

**__________________________________** SQL Server MVP, Sr. DBA & industry expert. - Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it. It is also a power and you will gain by sharing it.

Search

Go

This Blog

«September 2008»
SMTWTFS
31123456
78910111213
14151617181920
21222324252627
2829301234
567891011

Syndication