The Importance of Intelligence

Optimal database performance is perhaps the biggest challenge managers and administrators face as they work to ensure delivery of business intelligence (BI) to data warehouse (DW) users. No one likes to wait for answers. Squeezing every last bit of performance out of the hardware and software to make those DW queries fly is more important than ever. This is especially true as businesses increasingly manage BI systems as mission-critical applications and move toward "real-time" DW for some applications.

As businesses push to reduce the data latency between the DW and operational systems, the DW begins to look more like a transactional system. For a DW to deliver near real-time information, you either have to update it more frequently or access data directly from operational systems. Either way, the push to reduce latency changes the nature of the database performance needed to support the DW. DWs were created precisely to separate resource-intensive analytic processing from shorter-duration, high-frequency transaction processing. If the two worlds now come back together, the contention between those workloads will put significant pressure on the database system.

But even if the same data is accessed by both transactional and analytic applications, real-time goals will force different performance metrics onto DW databases. Database metrics evaluate transactions based on speed; that is, their ability to "get in," "do their work," and "get back out again." DW queries can't be measured the same way. The analytic nature of BI means that DW queries perform large and sometimes complex operations to uncover previously unknown correlations in the data, so speed isn't as important as accuracy. A better metric for such queries is repeatability: does the same query against the same data at the same time of day show similar performance characteristics? Managing performance in this way requires a different mindset.

Another challenge is that many DW queries must access aggregated data. With a separate DW environment, DBAs can denormalize the physical database structures to create aggregate tables containing redundant data summarized from other tables. Aggregate tables are supposed to optimize performance and increase data availability, both noble goals. However, these tables add to the size of the environment and to the complexity of managing it.
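
As a sketch of the idea, consider a hypothetical detail table named sales and a summary table named sales_by_region built from it; every name and column here is illustrative rather than drawn from any particular product:

    -- Hypothetical aggregate table holding redundant data summarized from "sales"
    CREATE TABLE sales_by_region (
        region       VARCHAR(32)   NOT NULL,
        total_amount DECIMAL(15,2) NOT NULL,
        order_count  INTEGER       NOT NULL
    );

    -- The DBA must populate it, and rerun this load whenever it goes stale
    INSERT INTO sales_by_region (region, total_amount, order_count)
    SELECT region, SUM(amount), COUNT(*)
    FROM sales
    GROUP BY region;

Queries needing regional totals can now read the small summary table instead of scanning all of sales, but report writers must know the summary table exists, and the DBA owns the refresh job.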

Modern database systems are beginning to support aggregation in a more usable, performance-oriented manner. DBAs can create materialized query tables (MQTs) and materialized views that essentially turn a query into a physical table. But MQTs are automated and optimizer-aware: when DBAs use MQTs, queries accessing the base tables don't need to be rewritten. Instead, the database optimizer understands the relationship between the MQT and the "real" tables accessed to create it, and the DBMS itself decides when and how to use the MQT versus the base tables. Automated refresh and management commands simplify MQT implementation. MQTs bolster a DBA's ability to deliver a real-time DW because they provide better performance even against normalized database implementations.
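
To make that concrete, here is a minimal DB2 sketch that recasts the earlier summary as an MQT. The DATA INITIALLY DEFERRED, REFRESH DEFERRED, and ENABLE QUERY OPTIMIZATION clauses are genuine DB2 syntax; the table and column names remain hypothetical:

    -- The query becomes a physical table whose derivation DB2 understands
    CREATE TABLE sales_by_region_mqt AS
        (SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count
         FROM sales
         GROUP BY region)
        DATA INITIALLY DEFERRED      -- populated by REFRESH, not at CREATE time
        REFRESH DEFERRED             -- refreshed on demand rather than on commit
        ENABLE QUERY OPTIMIZATION;   -- lets the optimizer route queries to it

    -- A single command repopulates the MQT from the base table
    REFRESH TABLE sales_by_region_mqt;

After the refresh, a query written as SELECT region, SUM(amount) FROM sales GROUP BY region can be routed by the optimizer to sales_by_region_mqt without any change to the query text.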

IBM's DB2 uses the MQT term (these objects were previously known as "automatic summary tables") for this functionality. Oracle offers similar functionality under the name "materialized views" (previously known as "snapshots"), and Microsoft SQL Server offers "indexed views." The IBM term, though, is a bit more accurate because a view is virtual, whereas a table is physical. Each of these DBMSs provides varying degrees of support for query rewrite as well.
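
For comparison, the Oracle counterpart of the sketch above is a materialized view. ENABLE QUERY REWRITE is the actual Oracle clause that permits the optimizer to substitute the materialized view for matching queries; the object names are, again, hypothetical:

    -- Oracle materialized view over the same illustrative "sales" table
    CREATE MATERIALIZED VIEW sales_by_region_mv
        BUILD IMMEDIATE              -- populate at creation time
        REFRESH COMPLETE ON DEMAND   -- rebuilt when the DBA requests it
        ENABLE QUERY REWRITE         -- optimizer may substitute it for "sales"
    AS
        SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count
        FROM sales
        GROUP BY region;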

-Craig S. Mullins

Article provided by Intelligent Enterprise. All rights reserved.
