A Database Patch Management Framework in Perl

Brent S.A. Cowgill, B.A. Sc.

[zardoz@infoserve.net http://www.oocities.org/gurucoder]


Contents


CRC Model Cards
Source Listings:
  1. config.pm
  2. DBA.pm
  3. DBPatch.pm
  4. go.pl
  5. patches::patch1.pm
  6. patchdb.pl
  7. patches::patch2.pm
Bundled Source Code

Raison d'être.


A database patch management system eases the ongoing development of a live database application.

Traditionally, databases have been maintained by using SQL scripts to  create, alter or drop tables and to migrate data or populate new tables with information required by the underlying application.  SQL is adequate for the task of altering database structures but lacks the power of a true programming language to manipulate or import complex data easily.

This article presents a database patch management framework which combines the simplicity of SQL with the power of Perl.  Using Perl as the patch management utility makes it easy to import data from many file formats; patches can even automate database imports where the source data is fetched from a URL over the Internet.  If your application program is also written in Perl, you gain re-usability because your database patches can make use of your application objects when populating or performing operations on the database.
 

If you fail to plan, you plan to fail.


The typical (or perhaps atypical) development cycle for a modern web-based application consists of initial development and launch, followed by ongoing improvements to the application and underlying database structure.  Once launched, the application needs to be online as much as possible.  When major changes are made to the application, the developers must ensure that the live site's HTML, CGI and database structure can be upgraded without any risk of prolonged downtime.  To support this development process, it is useful for the development team to have their own server installed with a copy of the live database, HTML and CGI scripts.  Developers improve, test and upgrade the database and CGI scripts on their development server and, once all the bugs are worked out, deploy their changes on the live server.  Using a concurrent version control system (like CVS) can ease the task of upgrading the HTML and CGI scripts, but does not help in the area of synchronizing database structure.  Using CVS along with a patch management system enables developers to bundle a set of code and database changes together into a release which can be installed while the site is live, or with minimal downtime.

Before getting into the details of the implementation, let us lay out what the patch management system must do.

It must be stressed that the purpose of this patch management system is to aid the development of web-based database applications.  It allows developers to quickly and easily configure their own personal copy of the live database so that it will work with the version of the code they are working on.  As other team members add new features, a developer can quickly bring their database configuration up to date when they update to the latest code.  Conversely, if a developer needs to back up to a previous version of the code to test something, they can back out the appropriate database patches to test the old code against a proper database configuration.  To coordinate these kinds of changes, it is useful to keep a version-controlled file listing the patches needed by each module.  Developers update this file as new code is developed which relies on particular database patches.

There are difficulties involved in patching a live database.  If the patch is simple and involves only creating new tables and populating them with initial, static data; dropping unused columns or tables; altering tables to add columns or indices; or changing the contents of application read-only tables, then there is no great difficulty in applying it to a live database.  If, however, the patch must manipulate the data in tables that user applications are continuously modifying, then great care must be taken when writing such a patch.  If the patch must be applied live, the patch author will have to lock all the tables involved in the update process.  If the patch is a time-consuming operation, it will present a delay to all users of the system while they block, waiting for table locks to be released.  Alternatively, row locks and a two-phase mark and sweep procedure can be used to minimize interference with normal operations.  In such situations, it may be better to put the system into a maintenance mode and inform web users that the system is down for scheduled maintenance.

The patch management system presented here is built around a base package (class) that implements the core patch management functions; individual patches are derived from this base package.  In addition, there is a database layer with a few support functions for operating on the database.  Finally, there is the patch manager script which cements everything together into a complete system.  The patch manager is based on the DBI module and is designed to operate against a MySQL database; however, a few changes to the database layer are all that are needed to use it with a different database.  Figure 1 presents a Class, Responsibilities, Collaborators (CRC) model for the completed patch management system.

Let's crack 'er open and look under the hood.

The focus of this article is on the patch management system; however, we must have a way of connecting to the database in question.  We use a pair of modules, config.pm and DBA.pm, to configure and connect to our database.  These modules are presented in Listing 1 and Listing 2, but we defer an explanation of them to the Appendix for those who are interested.  At this point we will only comment that using a Perl module for configuration information gives us flexibility.  In the future, we can decide to place site configuration information in shared memory, in ASCII text files or in tables in the database.  The Perl interface can remain the same and clients will not be affected by the change.
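To give a feel for how the two modules fit together before we get to the appendix, here is a minimal usage sketch.  The configuration key names shown below are assumptions made for illustration; the actual keys live in Listing 1, and we rely only on the fact that config::get() returns a hash while DBA::new() expects a hash reference.

use strict;
use config;
use DBA;

# config::get() returns a hash of connection parameters (Listing 1);
# DBA::new() expects a hash reference, so we wrap the result in braces.
my $conf = { config::get() };
my $dba  = DBA->new($conf);        # wraps DBI->connect() internally

# From here on, the application (and every patch) talks to the database
# through the DBA object:
$dba->do('CREATE TABLE Example (id INT NOT NULL PRIMARY KEY)');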

Better put a sweater on, it's cold outside. (on to the next layer...)

Having laid our foundation, we can proceed to the implementation of DBPatch.pm, which will serve as the base class for all patches. Listing 3 presents this module.  Let's consider what this class needs to do.  First, it must be able to record and check information about patches.  Second, it needs to provide a framework for applying and backing out patches, and for checking patch dependencies.  Finally, it needs to create the patch management table and, for completeness, we give it the power to destroy that table, should we ever decide we can make do without it.  But mostly because that makes automated testing easier.

We begin, in lines 10-13, with the standard fare for new(): simply stashing away the database connection we are given in our self hash for safekeeping.  Following that, we provide a method dba() to give our derived classes access to our database connection.  Then we implement is_initialized(), which checks for the existence of the patch management table and returns true if such is the case.
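If you want to picture those first few methods without flipping to Listing 3 just yet, a minimal sketch might read as follows.  The patch management table name, patch_manager, is an assumption of the sketch, not necessarily what the listing uses.

package DBPatch;
use strict;
use Carp;

# Stash the DBA database connection for safekeeping.
sub new {
    my ( $class, $dba ) = @_;
    return bless { dba => $dba }, $class;
}

# Give derived patch classes access to the database layer.
sub dba { return $_[0]->{dba} }

# True if the patch management table already exists.
sub is_initialized {
    my ($self) = @_;
    return $self->dba->table_exists('patch_manager');
}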

Followers of the Extreme Programming methodology are probably screaming for some unit tests by now.  Rather than doing the whole use Test and use Test::Harness routine, we will use a simple script, humbly called go.pl, which is presented in Listing 4.  Lines 10-12 of go.pl are responsible for loading the application configuration, connecting to the database and creating an instance of a new DBPatch object to begin testing.  We immediately begin by testing the is_initialized() method, which should return false at this point.  In lines 13-14, we call drop_patch_table() and then create_patch_table() to ensure the patch table is in a known, empty state.

Meanwhile, back in DBPatch, Listing 3, we skip forward to lines 154-167, wherein we define these methods which bring the patch management table into existence and oblivion, respectively.  Note that we simply feed a 'here' document containing the SQL CREATE statement directly to our DBA::do() method to create the table.  Likewise, we use do() to issue our DROP TABLE statement in drop_patch_table().
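Continuing the sketch of DBPatch begun above, the pair might look something like this; the table and column names are assumptions, but the here-document fed to DBA::do() follows the approach just described.

# (continuing the DBPatch sketch; table and column names are assumptions)
sub create_patch_table {
    my ($self) = @_;
    $self->dba->do(<<"SQL");
CREATE TABLE patch_manager (
    patch_number INT NOT NULL PRIMARY KEY,
    description  TEXT,
    applied      TINYINT NOT NULL DEFAULT 0,
    applied_on   DATETIME
)
SQL
    return 1;
}

sub drop_patch_table {
    my ($self) = @_;
    $self->dba->do('DROP TABLE patch_manager');
    return 1;
}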

Having brought our table into existence, we need to set patch entries, back them out and check for their existence.  The script go.pl performs tests on these methods in lines 16-25 of Listing 4.  First, we call the methods which check for a patch, and check whether a patch is backed out, when no patches exist; then we set a patch, change its description and call the checking methods once more.  We finish up by backing out the patch and checking its status one last time.

The methods of DBPatch tested above are implemented in lines 26-54 and 187-229 of Listing 3.  At this point you will notice a simple notation which has been adopted for this module to separate methods which are intended to be private, inherited and public.  Private methods are prefixed with two underscore characters.  Methods which are intended to be implemented by derived classes are prefixed with a single underscore.  Public functions are free of such adornments.  Of course, Perl doesn't restrict callers from invoking the private or inherited methods, but the underscore should make you think twice before calling these functions, which is all the notation can really hope to accomplish.  It also sorts the functions nicely if you use a class browser (bonus!)

The check_patch(), backout_patch() and __patch_entry() methods make use of our DBA::one_row_array() method to check the status of a patch number by issuing a simple SQL SELECT statement.  The __set_patch() and __backout_patch() methods use REPLACE INTO and UPDATE SQL statements to create or update the row corresponding to the numbered patch.  These SQL statements are delivered via DBA::do().  You will note that, in each of these methods, we first check that the patch management table exists and croak with an error if this is not the case.  Also, in backout_patch() we croak with an error if the patch is not currently applied.
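In case the shape of those statements is hard to picture, here is a rough sketch of two of them, continuing our imaginary version of DBPatch.  The table and column names, and the assumption that one_row_array() and do() accept bind values after the SQL, are ours, not necessarily the listing's.

# (continuing the DBPatch sketch)
sub check_patch {
    my ( $self, $patch_number ) = @_;
    croak "patch management table does not exist" unless $self->is_initialized;
    my $row = $self->dba->one_row_array(
        'SELECT applied FROM patch_manager WHERE patch_number = ?',
        $patch_number );
    return ( $row && $row->[0] ) ? 1 : 0;
}

sub __set_patch {
    my ( $self, $patch_number, $description ) = @_;
    croak "patch management table does not exist" unless $self->is_initialized;
    $self->dba->do(
        'REPLACE INTO patch_manager (patch_number, description, applied, applied_on)
         VALUES (?, ?, 1, NOW())',
        $patch_number, $description );
    return 1;
}

# __backout_patch() is the mirror image: an UPDATE that clears the applied
# flag, after croaking if the patch is not currently applied.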

The next methods we prepare to test in go.pl are patch_info() and patch_list().  These methods are implemented in lines 57-108 of DBPatch in Listing 3.  In patch_info() we read in the information about the patch by issuing an SQL statement and return a hash reference.  If there is no patch entry in the database, we construct a hash ourselves and fill it with information about a nonexistent patch.  In patch_list() we make use of DBA::each_row_hash() to loop through all patches in the database and store them in a hash, keyed by the patch number.
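A sketch of patch_list(), assuming the each_row_hash() interface sketched later in the appendix (SQL first, then a code reference that sees each row in $_):

# (continuing the DBPatch sketch)
sub patch_list {
    my ($self) = @_;
    my %patches;
    $self->dba->each_row_hash(
        'SELECT patch_number, description, applied FROM patch_manager',
        sub {
            my $row = $_;                          # hash reference for this row
            $patches{ $row->{patch_number} } = $row;
        } );
    return \%patches;
}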

We're almost finished with DBPatch; all that remains is to implement apply(), backout() and check_dependencies().  But first we discuss how the derived classes, which implement the actual patches, will be structured.  In lines 171-176 of Listing 3 we have implemented the methods that derived classes must provide.  These methods simply croak with an error message on the off chance that a derived class fails to implement them.  The _patch_number() and _patch_description() methods should return the number of the patch and a detailed description of what features or fixes the patch applies to the database.

The _patch_can_undo() method should return 0 or 1.  If the operations performed on the database by the patch can be completely reversed, leaving the database as it was before the patch was applied, then this method is allowed to return 1.  If such is not the case, then the derived class should return 0.  This can be used in the future to give notice that it may be wise to back up the database before applying the patch.  By way of example, if a patch is simply creating a new table, then _patch_can_undo() should return 1, as a simple DROP TABLE restores the database to the same state it was in before the patch.  However, if the patch drops a column and does not store the contents anywhere else, the patch cannot be undone completely and therefore _patch_can_undo() needs to return 0.  Again, we stress that this implementation is a development tool.  Perfect undo, in the face of patching a live database with continuing user access, requires careful programming by the patch author.

The _patch_dependencies() method needs to return an array reference  containing the list of patch numbers which must be applied to the database before a particular patch can be applied.  If a patch has no pre-requisites, this method needs to return an empty array reference: []

The final two inherited methods: _do_patch() and _undo_patch() do the actual work of the patch.  The first method will usually involve issuing some SQL statements to modify the database, or it could make use of your application objects to perform operations on the database.  The second method is then responsible for reversing those changes as completely as possible.  If a complete undo cannot be performed, then at least it must be possible to execute _do_patch() after a call to _undo_patch().

Lines 112-132 present a straight-forward implementation of apply() and backout().  All that needs to be done is to call _do_patch() or _undo_patch() and then call __set_patch() or __backout_patch().  However, before we do that, we do some sanity checking to ensure that apply() or backout() is allowed to be called given the current state of the patch management table.
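Continuing the sketch, apply() and backout() might be as simple as this; the particular error messages and sanity checks are assumptions.

# (continuing the DBPatch sketch)
sub apply {
    my ($self) = @_;
    my $number = $self->_patch_number;
    croak "patch management table does not exist" unless $self->is_initialized;
    croak "patch $number is already applied" if $self->check_patch($number);
    $self->_do_patch;
    $self->__set_patch( $number, $self->_patch_description );
    return 1;
}

sub backout {
    my ($self) = @_;
    my $number = $self->_patch_number;
    croak "patch $number is not currently applied"
        unless $self->check_patch($number);
    $self->_undo_patch;
    $self->__backout_patch($number);
    return 1;
}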

Finally, check_dependencies(), in lines 136-151, performs the check for pre-requisite patch numbers.  This method first retrieves the hash of patch information for the patch number, then adds a 'patch_status' key to that hash, which is in turn a hash reference containing an entry of patch information for each dependent patch number.  During the lookup of dependent patch information, we set a flag if any of the dependent patches are not applied.  Finally, we add an 'ok_to_apply' key to the hash just before returning it to the caller.
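Sketched out, check_dependencies() could look like this; the 'patch_status' and 'ok_to_apply' keys come from the description above, while the helper calls and their exact arguments are assumptions.

# (continuing the DBPatch sketch)
sub check_dependencies {
    my ($self) = @_;
    my $number = $self->_patch_number;
    my $info   = $self->patch_info($number);
    my $ok     = 1;

    my %status;
    for my $dependency ( @{ $self->_patch_dependencies } ) {
        $status{$dependency} = $self->patch_info($dependency);
        $ok = 0 unless $self->check_patch($dependency);   # prerequisite missing
    }
    $info->{patch_status} = \%status;
    $info->{ok_to_apply}  = $ok;
    return $info;
}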

That wraps it up for the DBPatch module and we're ready to write our first patch and test it in go.pl.
 

I AM A DBPATCH.

In order to organize all our patches, we will place them in a directory called patches, and we must ensure that the directory containing it is included in our PERL5LIB environment variable so that Perl can find the patches:: modules.  For the moment, we use go.pl to test our new patch, but soon we will introduce a utility script which makes applying and backing out patches a very simple process.

Listing 5 presents the patches::patch1 module, implemented in patches/patch1.pm.  This module begins by using the DBPatch module and then declaring on line 6 that it is indeed a DBPatch object.  Then it proceeds with simple implementations of the core inheritable methods of DBPatch.  As can be gleaned from _do_patch(), this patch is going to create a new table called Contact in order to store some contact information.  The undo method fittingly destroys this table and therefore the _patch_can_undo() method is programmed to return 1.
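If you would rather not flip to the listing, a sketch of such a patch module might look like the following; the Contact table's columns are invented for the sketch, but the inherited methods are the ones described above.

package patches::patch1;
use strict;
use base qw(DBPatch);             # patch1 is-a DBPatch

sub _patch_number       { 1 }
sub _patch_description  { 'Create the Contact table' }
sub _patch_can_undo     { 1 }     # DROP TABLE fully reverses this patch
sub _patch_dependencies { [] }    # no prerequisite patches

sub _do_patch {
    my ($self) = @_;
    $self->dba->do(<<"SQL");
CREATE TABLE Contact (
    id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(80) NOT NULL,
    email VARCHAR(80)
)
SQL
    return 1;
}

sub _undo_patch {
    my ($self) = @_;
    $self->dba->do('DROP TABLE Contact');
    return 1;
}

1;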

Back in Listing 4, go.pl finishes up by creating an instance of a patches::patch1 object, applying it, verifying the existence of the Contact table and then backing out the patch and looking again for the Contact table, in vain.
 

I love it when a plan comes together.

In order to make this patch management system usable we have, in Listing 6, patchdb.pl.  This script consists of: a basic command line parser, with a usage message displayed if your fingers found the wrong keys; an abstract factory method which dynamically loads the Perl module corresponding to the patch number; a few methods to apply, back out and list all the patches in the database; and a method to initialize the patch management table.

As can be inferred from lines 7-37, the command 'patchdb.pl -i' will initialize the patch management system by creating the patch manager table.  Using the '-l' option will list all the patches in the database.  The '-c' option checks the status of a patch and reports information about missing dependent patches if there are any.  The '-c', '-b' and '-a' options need to be followed by a patch number to indicate which patch needs to be checked, backed out or applied.

In this script, the interesting part is lines 122-133, where the abstract factory method loads the correct patch module for use by the apply_patch() and backout_patch() methods.  In line 126, we use the patch module at run time, rather than at compile time, with our old friend eval.  On line 129, we use eval again to create an instance of a new patch object, passing it our existing database connection.  Lastly, as a simple check, we test the _patch_number() method to ensure we have loaded the patch we expected.  This is prudent, as many patches will be constructed by copying an existing patch file and making changes to it; updating the actual patch number may be overlooked in this clone-and-mutate process.  If we encounter any problems loading the module (syntax errors, etc.) we report them with a die to the caller.
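The factory is worth sketching, since the same eval idiom is useful in many places.  The subroutine name load_patch() and the error messages below are inventions of the sketch, but the run-time use, the construction inside eval and the patch number check follow the description above.

# Hypothetical sketch of the factory; load_patch() is an invented name.
sub load_patch {
    my ( $dba, $number ) = @_;
    my $package = "patches::patch$number";

    eval "use $package";                        # load the module at run time
    die "could not load $package: $@" if $@;

    my $patch = eval { $package->new($dba) };   # construct with our DB handle
    die "could not construct $package: $@" if $@;

    # Guard against clone-and-mutate mistakes where the copied module still
    # carries the old patch number.
    die "$package claims to be patch ", $patch->_patch_number, "\n"
        unless $patch->_patch_number == $number;

    return $patch;
}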

Finally, in Listing 7, we present another patch, patches::patch2, which depends on the presence of patch1.  In this patch we begin to see the power of a patch manager written in a programming language instead of pure SQL.  This patch creates a Phone table to go with the Contact table and uses our DBA layer to pre-populate the Phone table with a default entry for each entry already existing in the Contact table.
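The interesting part of such a patch might read roughly as follows; the Phone table's columns are assumptions, but the point is the loop over existing Contact rows through the DBA layer.

# (a sketch of the heart of patches/patch2.pm)
sub _patch_dependencies { [1] }      # patch 1 must already have created Contact

sub _do_patch {
    my ($self) = @_;
    my $dba = $self->dba;

    $dba->do(<<"SQL");
CREATE TABLE Phone (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    contact_id INT NOT NULL,
    number     VARCHAR(32) NOT NULL DEFAULT 'unknown'
)
SQL

    # Pre-populate: one default Phone row for every existing Contact.
    $dba->each_row_array( 'SELECT id FROM Contact', sub {
        my ($contact_id) = @{$_};
        $dba->do( 'INSERT INTO Phone (contact_id) VALUES (?)', $contact_id );
    } );
    return 1;
}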
 

Do not go gentle into that good night.

This article has presented a framework for a database patch management system.  These modules can serve as a foundation for your next web-based database application in Perl.  Some ideas for future expansion could include adding a CGI interface so you can remotely check, apply or back out patches from an appropriately password-protected URL.  This would ease site upgrades tremendously, especially if coupled with a web-based CVS client to update the site content and CGI application programs.  Another possibility is the addition of a 'can_patch_live' column to the patch management database.  This would allow the patch author to indicate that the patch must not be applied live, but instead requires the system to be brought down so that user activity does not interfere with the patching process.  Simple patches, like creating new tables and populating them with initial, static data, dropping unused columns or tables, or changing the contents of application read-only tables, would not require the system to be brought down.  More complex patches, which manipulate the data in tables that user applications are continuously modifying, require the tables to be locked or must otherwise prevent user access during the application of the patch.
 

Appendix

This appendix provides an explanation of the configuration module and the database access module used by the patch management system.  Listing 1 presents config.pm, a module to provide access to application configuration information.  We export a single method, get(), which returns a hash containing all the parameters needed to open a connection to a MySQL database.  Using a module in this fashion leaves us open to future development.  We could move some or all of this configuration information into a database, dbm file or an ASCII text file; at that point, we merely change the operation of get() to support the new configuration method.  As long as the object returned by get() behaves like a hash (perhaps it is tied to shared memory or something), clients will not notice the change.

Building on top of the configuration object, we add the database layer, DBA.pm in Listing 2.
We begin with method new(), which accepts a few parameters in the form of a hash reference containing keys for the database name, host machine, port number, user name and password.  These can conveniently be provided to new() by calling config::get().  After constructing the Data Source Name that DBI expects, we call DBI->connect() to connect to the database and stash the result in our self hash for safekeeping.
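A minimal sketch of the constructor and destructor follows; the configuration key names and the exact data source name are assumptions for a MySQL database.

package DBA;
use strict;
use DBI;

sub new {
    my ( $class, $conf ) = @_;
    my $dsn = "DBI:mysql:database=$conf->{db};"
            . "host=$conf->{host};port=$conf->{port}";
    my $dbh = DBI->connect( $dsn, $conf->{user}, $conf->{password} )
        or die "could not connect to $dsn: " . DBI->errstr;
    return bless { dbh => $dbh }, $class;
}

# Drop the connection as soon as the object goes out of scope.
sub DESTROY {
    my ($self) = @_;
    $self->{dbh}->disconnect if $self->{dbh};
}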

Life begins quite well for our DBA object, and we must remember to handle its death just as gracefully.  In the DESTROY() method we disconnect using the stashed DBI object.  This ensures that the connection will be dropped as soon as our database object goes out of scope in the client application.  If we create an instance of a DBA object at the start of our program, we can keep it around and re-use it throughout the application, disconnecting when we finally exit.

Lines 24-25 implement a few short methods to report any errors that DBI has accumulated.  In lines 28-38 we implement the method table_exists(), which will be used to determine whether the patch management table has been created.  In this method, we issue the SQL statement 'SHOW TABLES' and scan the resulting rows for the name of the table we have been asked to locate.  We make use of the each_row_array() method, defined in lines 89-110, which loops through all rows matching the SQL statement and invokes the code reference provided.  In this case, our anonymous subroutine merely commits suicide if it sees the table we are looking for.  Fortunately, an eval block around the each_row_array() call rescues us before we go splat in the street.  Having caught something, we check whether it was the table name or some other error and return 1 or 0 accordingly.
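Sketched out, the die-to-stop-early trick looks something like this; as before, the method body is an assumption rather than the listing itself.

# (continuing the DBA sketch)
sub table_exists {
    my ( $self, $table ) = @_;
    eval {
        $self->each_row_array( 'SHOW TABLES', sub {
            my ($name) = @{$_};
            die "$table\n" if $name eq $table;   # bail out of the row loop early
        } );
    };
    # Our own die means the table was spotted; anything else means it was not.
    return ( $@ && $@ =~ /\Q$table\E/ ) ? 1 : 0;
}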

The important thing to notice here is lines 101-104 where we use braces to begin a local scope and use local $_ to ensure we don't clobber the caller's copy of $_.

Lines 41-44 implement what is likely to be the most heavily called function in the DBA layer.  It is a simple chain of command, forwarding its call on to DBI::do(), which executes an SQL statement immediately with possible bind parameters.
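In sketch form, do() amounts to a one-line forward; the optional bind values after the SQL are an assumption about the interface.

# (continuing the DBA sketch)
sub do {
    my ( $self, $sql, @bind ) = @_;
    return $self->{dbh}->do( $sql, undef, @bind );
}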

Next we have a pair of twins, one_row_array() and one_row_hash().  These functions, in lines 48-84, prepare and execute a single SQL statement which is designed to return a single row of data.  We make use of another eval block here to ensure that the statement handle is properly finished in the event that the caller passed in an ill-formed SQL statement.  After fetching a solitary row of data, we check whether there are any additional rows and report an error if so.  This catches well-formed SQL statements which actually return more than one row of data.  There's nothing worse than making a slight change to an SQL statement that was designed to return one row and suddenly having it return 10,000 rows while your program silently ignores the extras.  The only difference between the twin methods is how they return the row of data to the caller: one returns it as an array reference and the other as a hash reference.
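Here is a sketch of one_row_array(); one_row_hash() would differ only in using fetchrow_hashref().  The error wording and the bind-value handling are assumptions.

# (continuing the DBA sketch)
sub one_row_array {
    my ( $self, $sql, @bind ) = @_;
    my $sth = $self->{dbh}->prepare($sql)
        or die 'prepare failed: ' . $self->{dbh}->errstr;
    my $row;
    eval {
        $sth->execute(@bind) or die 'execute failed: ' . $sth->errstr;
        $row = $sth->fetchrow_arrayref;
        $row = [ @{$row} ] if $row;              # copy: DBI re-uses this buffer
        die "more than one row returned by: $sql\n" if $sth->fetchrow_arrayref;
    };
    my $error = $@;
    $sth->finish;                                # always release the handle
    die $error if $error;
    return $row;
}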

In lines 89-136, we come upon a slight déjà vu in the implementation of the twin methods each_row_array() and each_row_hash().  We touched upon them lightly before, but now we will illuminate them some more.  Once again, we pull out our eval block to protect the statement handle and then we proceed to loop through all the rows matching the SQL statement provided.  During each loop iteration, we make a call to the provided code reference, $code, arranging for $_ to contain the fetched row as an array or hash reference, using the foreach statements in lines 103 and 129.  Notice how we accumulate the return value from these subroutine calls in $results for return to the caller, in case it is of interest.  By virtue of the eval block, the subroutine provided in $code can terminate the fetching of rows early by simply issuing a die.  In this case, after finishing with the statement handle on lines 107 and 133, we simply send the die message back to the caller.  This serves to report runtime errors as well as planned premature termination in the $code reference.
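Finally, a sketch of each_row_array(); each_row_hash() would again differ only in the fetch call.  The localized $_, the accumulated results and the re-thrown die follow the description above; everything else is an assumption.

# (continuing the DBA sketch)
sub each_row_array {
    my ( $self, $sql, $code, @bind ) = @_;
    my $sth = $self->{dbh}->prepare($sql)
        or die 'prepare failed: ' . $self->{dbh}->errstr;
    my @results;
    eval {
        $sth->execute(@bind) or die 'execute failed: ' . $sth->errstr;
        while ( my $row = $sth->fetchrow_arrayref ) {
            local $_;                            # don't clobber the caller's $_
            foreach ( [ @{$row} ] ) {            # alias $_ to a copy of the row
                push @results, $code->();
            }
        }
    };
    my $error = $@;
    $sth->finish;                                # finish even if $code died
    die $error if $error;                        # hand the die back to the caller
    return \@results;
}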

Finally, in line 137 we are at one with the universe and broadcast that fact back to all users of our module.