Database Two Phase Commit

We recommend Three Tier Software Architectures as prerequisite reading for this technology description.

Purpose and Origin

Since the 1980s, two phase commit technology has been used to automatically control and monitor commit and/or rollback activities for transactions in a distributed database system. Two phase commit technology is used when data updates need to occur simultaneously at multiple databases within a distributed system. Two phase commits are done to maintain data integrity and accuracy within the distributed databases through synchronized locking of all pieces of a transaction. Two phase commit is a proven solution when data integrity in a distributed system is a requirement. Two phase commit technology is used for hotel and airline reservations, stock market transactions, banking applications, and credit card systems.

Technical Detail

As shown in Figure 1 , applying two phase commit protocols ensures that execution of data transactions are synchronized, either all committed or all rolled back (not committed) to each of the distributed databases.

Figure 1: Distributed Databases When Two Phase Commit Happens Simultaneously Through the Network

When dealing with distributed databases, such as in the client/server architecture, distributed transactions need to be coordinated throughout the network to ensure data integrity for the users. Distributed databases using the two phase commit technique update all participating databases simultaneously.

Unlike non-distributed databases (see Figure 2), where a single change is or is not made locally, all participating databases must all commit or all rollback in distributed databases, even if there is a system or network failure at any node. This is how the two phase commit process maintains system data integrity.

Figure 2: Non-Distributed Databases Make Only Local Updates

Two phase commit has two distinct processes that are accomplished in less than a fraction of a second:

The Prepare Phase, where the global coordinator (initiating database) requests that all participants (distributed databases) will promise to commit or rollback the transaction. (Note: Any database could serve as the global coordinator, depending on the transaction.)
The Commit Phase, where all participants respond to the coordinator that they are prepared, then the coordinator asks all nodes to commit the transaction. If all participants cannot prepare or there is a system component failure, the coordinator asks all databases to roll back the transaction.

Should there be a machine, network, or software failure during the two phase commit process, the two phase commit protocols will automatically and transparently complete the recovery with no work from the database administrator. This is done through use of pending transaction tables in each database where information about distributed transaction is maintained as they proceed through the two phase commit. Information in the pending transaction table is used by the recovery process to resolve any transaction of questionable status. This information can also be used by the database administrator to override automated recovery procedures by forcing a commit or a rollback to available participating databases.

Usage Considerations

Two phase commit protocols are offered in all modern distributed database products. However, the methods for implementing two phase commits may vary in the degree of automation provided. Some vendors provide a two phase commit implementation that is transparent to the application. Other vendors require specific programming of the calls into an application, and additional programming would be needed should rollback be a requirement; this situation would most likely result in an increase to program cost and schedule.

Maturity

The two phase commit protocol has been used successfully since the 1980s for hotel and airline reservations, stock market transactions, banking applications and credit card systems.

Costs and Limitations

There have been two performance issues with two phase commit:

If one database server is unavailable, none of the servers gets the updates. This is correctable if the software administrator forces the commit to the available participants, but if this is a recurring problem the administrator may not be able to keep up, thus causing system and network performance will deteriorate.
There is significant demand in network resources as the number of database servers to which data must be distributed increases. This is correctable through network tuning and correctly building the data distribution through database optimization techniques.

Currently, two phase commit procedures are vendor proprietary. There are no standards on how they should be implemented. X/Open has developed a standard that is being implemented in several transaction processing monitors, but it has not been adopted by the database vendors. Two phase commit proprietary protocols have been published by several vendors.

Alternatives

An alternative to updating distributed databases with a two phase commit mechanism is to update multiple servers using a transaction queuing approach where transactions are distributed sequentially. Distributing transactions sequentially raises the problem of users working with different version of the data. In military usage, this could result in planning sorties for targets that have already been eliminated.