Chapter 1. Introduction

Database creation is a complex task that involves tuning many parameters. It can be done using Oracle’s database creation wizard [15], but if many databases need to be created, the wizard must be run again for each new database. This is cumbersome, especially when only a small number of essential parameters differ between databases.

Data warehouses [8] are databases that are loaded with subsets of relevant data from a source database. These warehouses may contain informational data extracted from operational data in the source database. The tables in a warehouse database are based on the tables of the source database, so the structures of the source database must be transformed into structures for the warehouse. Currently, this mapping is explored and created manually, a process that is both tedious and time-consuming. Users also need technical training to perform this task.

The current approach has a few other shortcomings. In the warehouse schema, users may add new attributes to tables; these new attributes are aggregates of attributes in the master database. As a result, when data is copied from the master database to the warehouse database, the values of these aggregate attributes must be computed at run time during the update, which adds delay. While the update is in progress, applications accessing the warehouse do not see accurate data, which leads to a lack of synchronization.

1.1. Objectives

These problems form the basis and the motivation for this thesis. We present SagaMap (Semi Automatic Schema Generation and Mapping), a tool that provides an interface for collecting from users all the information required to generate a new database, and that then creates an empty data warehouse. For a given source database, the tool aims to arrive at an appropriate mapping from which a structurally related warehouse can be created. After the mapping has been formalized, the tables for the new warehouse are created. Relevant data is then transported automatically from the source database to the newly created warehouse.
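The flow from a formalized mapping to created tables and transported data can be illustrated with a minimal sketch. It is not SagaMap's implementation: the mapping format, the table and column names, and the helper functions are all hypothetical, and Python's built-in sqlite3 module stands in for Oracle. For brevity the source and warehouse tables live in the same in-memory database.

```python
import sqlite3

# Hypothetical mapping format (an assumption, not SagaMap's):
# warehouse table -> (source table, {warehouse column: (source column, SQL type)})
MAPPING = {
    "wh_customer": (
        "customer",
        {"cust_id": ("id", "INTEGER"), "cust_name": ("name", "TEXT")},
    ),
}


def create_table_sql(target, columns):
    """Generate the CREATE TABLE statement for one warehouse table."""
    cols = ", ".join(f"{name} {sqltype}" for name, (_, sqltype) in columns.items())
    return f"CREATE TABLE {target} ({cols})"


def copy_data_sql(target, source, columns):
    """Generate the statement that copies the mapped columns from the source."""
    targets = ", ".join(columns)
    sources = ", ".join(src for src, _ in columns.values())
    return f"INSERT INTO {target} ({targets}) SELECT {sources} FROM {source}"


def build_warehouse(conn, mapping):
    """Create every warehouse table in the mapping, then load its data."""
    for target, (source, columns) in mapping.items():
        conn.execute(create_table_sql(target, columns))
        conn.execute(copy_data_sql(target, source, columns))
    conn.commit()


def demo():
    """Build a tiny source table, apply the mapping, and read the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Ada", "555-0100"), (2, "Lin", "555-0101")],
    )
    build_warehouse(conn, MAPPING)
    return conn.execute(
        "SELECT cust_id, cust_name FROM wh_customer ORDER BY cust_id"
    ).fetchall()
```

In this sketch the user only edits the mapping; the SQL scripts for schema creation and data transport are derived from it, which is the kind of automation the tool is meant to provide.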

A framework has been built to facilitate automatic updates of data warehouses. It has been designed so that there can be multiple copies of the warehouse database, where each copy is an image of the warehouse database. Copies that need to be updated are taken offline, and applications that need to access the warehouse database can then access any of the other images. SagaMap’s switching application, Image Switcher, switches between databases in a way that is transparent to applications, so that they are unaware that multiple warehouse databases exist.
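The switching idea can be sketched as follows. This is only an illustration under stated assumptions: the class name, its methods, and the image names are hypothetical and do not describe Image Switcher's actual interface. Applications ask the switcher which image to use and never name an image themselves.

```python
class ImageSwitcher:
    """Hypothetical sketch: routes applications to an online warehouse image."""

    def __init__(self, images):
        if len(images) < 2:
            raise ValueError("need at least two warehouse images")
        self._online = list(images)  # images that applications may use
        self._offline = []           # images currently being updated

    def active(self):
        """Return the image that applications should connect to."""
        return self._online[0]

    def take_offline(self, image):
        """Remove an image from service so that it can be updated."""
        if image not in self._online:
            raise ValueError(f"{image} is not online")
        if len(self._online) == 1:
            raise RuntimeError("cannot take the last online image offline")
        self._online.remove(image)
        self._offline.append(image)

    def bring_online(self, image):
        """Return an updated image to service."""
        self._offline.remove(image)  # raises ValueError if it was not offline
        self._online.append(image)
```

For example, with images "wh_a" and "wh_b", calling take_offline("wh_a") makes active() return "wh_b" while "wh_a" is updated; bring_online("wh_a") then returns it to the pool without changing what applications currently see.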

As a result, the end user can use SagaMap to create the desired warehouse schema directly. A major advantage of the tool is that it automates the generation of SQL scripts for schema creation and data management. This leaves the user more time to design the schema accurately and efficiently, rather than writing the code itself.

1.2. Overview

The remaining chapters are organized as follows:

Chapter 2 discusses the important concepts of an automated data warehouse and presents an example that illustrates the significance of this project. Chapter 3 explains the functional architecture, the system architecture, and the design decisions. Chapter 4 describes the creation of a data warehouse and the steps users need to follow to do this with SagaMap. Chapter 5 discusses how subsets of relevant data are mapped from the source database to the target database. Chapter 6 discusses how the tool performs automatic updates of data warehouses. Chapters 7, 8 and 9 describe the additional components developed for Automated Site Generation for the Benchmarking Engine. Chapter 10 discusses the contribution of this project and avenues for future work on this tool.