Concurrency, Transactions, and Locking

Background

In routine practice, one must deal with concurrency and transaction management in the following contexts: 1) Relational databases; 2) Offline Concurrency, where the concurrency may span multiple database transactions, for example in a Web application; 3) Managing state in a multi-threaded environment.

Concurrency Problems

There are two basic kinds of concurrency problems: 1) Lost Updates; 2) Inconsistent Reads.

Lost Update Example: Two people are editing the same file. The person who saves last overwrites the changes made by the person who saved first. The first person's updates are lost.
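The lost update above can be sketched in a few lines (a toy simulation, not real file I/O): two editors read the same record, edit their private copies, and write back in turn.

```python
# A minimal sketch of a lost update: two editors read the same record,
# change their private copies, and write back. The second save silently
# discards the first editor's change.

record = {"text": "original"}

a_copy = record["text"]   # editor A reads
b_copy = record["text"]   # editor B reads the same value

a_copy += " (A's edit)"
b_copy += " (B's edit)"

record["text"] = a_copy   # A saves first
record["text"] = b_copy   # B saves last, unaware of A's edit

print(record["text"])     # A's edit is lost
```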

Inconsistent Read Example: You are counting the total number of files in two directories. Say there are 20 files in the honeymoon folder and 30 files in the wedding folder to start. You count the files in the honeymoon folder, then take a break. In the meantime, your wife adds 5 pictures to the honeymoon folder and 3 pictures to the wedding folder. You come back and count 33 pictures in the wedding folder. Your total is 20 + 33 = 53. Before your wife added the pictures the true total was 50, and afterward it was 58, so 53 is an inconsistent read, caused by reading from the folders while someone else was concurrently writing to them.
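The same inconsistent read can be sketched as a toy simulation: the reader samples the two folders at different times, and a concurrent writer updates both in between.

```python
# A minimal sketch of an inconsistent read: the reader's two reads
# straddle a concurrent writer's updates, so the total it computes
# corresponds to no state the data was ever actually in.

folders = {"honeymoon": 20, "wedding": 30}

total = folders["honeymoon"]   # reader counts honeymoon: 20

folders["honeymoon"] += 5      # writer adds pictures to both folders
folders["wedding"] += 3

total += folders["wedding"]    # reader counts wedding: 33

print(total)  # 53 -- matches neither 50 (before) nor 58 (after)
```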

Optimistic and Pessimistic Concurrency Control

An optimistic lock is really a means of conflict detection. With optimistic locking, two processes may modify the same piece of data concurrently. However, if a process tries to commit modifications that would overwrite changes committed by another process since the data was read, the system detects the conflict and forces a merge (or rejects the commit).
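A common way to implement this detection is a version number checked at commit time. Here is a minimal sketch; the row layout, field names, and VersionConflict exception are all illustrative, not any particular library's API.

```python
# Optimistic concurrency via a version number: a commit is rejected
# if the row's version no longer matches the version that was read.

class VersionConflict(Exception):
    pass

row = {"value": "start", "version": 1}

def commit(read_version, new_value):
    # Reject the write if anyone else committed since our read.
    if row["version"] != read_version:
        raise VersionConflict("row changed since it was read")
    row["value"] = new_value
    row["version"] += 1

a_version = row["version"]            # process A reads at version 1
commit(row["version"], "B's change")  # process B reads and commits first

conflict = False
try:
    commit(a_version, "A's change")   # A's stale write is caught
except VersionConflict:
    conflict = True                   # A must re-read and merge

print(conflict)  # True
```

In a relational database the same check is often expressed as an UPDATE whose WHERE clause includes the version that was read, applying the change only if no one has committed in between.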

A pessimistic lock, by contrast, is a means of conflict prevention: each process must acquire a lock on the data before modifying it, blocking other processes until the lock is released.
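An in-process analogue is a mutex around a read-modify-write cycle (a minimal sketch; a database pessimistic lock works analogously at the row or table level).

```python
# Pessimistic locking in miniature: every writer must acquire the
# lock before touching the shared counter, so no updates are lost.
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:          # acquire before modifying; release after
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 -- no lost updates
```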

Inconsistent Reads

James adds a call to Customer::getOrders to a class in his application. Meanwhile, Michelle modifies the getOrders method signature. James compiles and checks in his code. Michelle compiles and checks in her code (but she doesn't know about James's new call to getOrders). Each build succeeded in isolation, yet the integrated code base is now broken: James was working from an inconsistent read of the code base.

With a pick list of products, it doesn't matter if a new product appears after you start making changes, but a list of charges you're summarizing for a bill matters a great deal. In some cases, careful analysis of how the data is used is required. For example, a zip code may seem unimportant, but if where a person lives determines a tax calculation, then the address has to be controlled for concurrency.

Read/Write Lock

Temporal Reads

Deadlocks

ACID

Isolation Levels:

The ANSI/ISO SQL standard defines four levels of transaction isolation in terms of three phenomena that must be prevented between concurrent transactions. These undesirable phenomena are: dirty reads, non-repeatable reads, and phantom reads.

Read Committed:

Read Committed means that a transaction will only read data that has been committed by some other transaction. Say a transaction W is in the process of updating a data field from "OLD" to "NEW" but has not yet committed. Another transaction R that reads this value will still see "OLD". With Read Committed, you are guaranteed that you won't read dirty data, that is, data that has not been committed. However, if transaction R reads the value "OLD" and then reads the same field again after W has committed, it will get "NEW" the second time around. This is a non-repeatable read.
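This behaviour can be modeled with a toy write transaction that keeps its changes in a private buffer until commit (illustrative only, not how any particular database implements it).

```python
# Toy model of Read Committed: a writer's uncommitted change lives in
# a private buffer, so readers only ever see committed values -- but a
# re-read after the commit can return a different value (a
# non-repeatable read).

committed = {"field": "OLD"}

class Tx:
    """A hypothetical write transaction with a private buffer."""
    def __init__(self, store):
        self.store = store
        self.pending = {}
    def write(self, key, value):
        self.pending[key] = value      # invisible to readers until commit
    def commit(self):
        self.store.update(self.pending)

w = Tx(committed)
w.write("field", "NEW")

first_read = committed["field"]   # "OLD": W has not committed yet
w.commit()
second_read = committed["field"]  # "NEW": same field, different value

print(first_read, second_read)
```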

Repeatable Read:

The Repeatable Read isolation level solves this problem of non-repeatable reads. Once a particular data value has been read within a transaction, its value will not be affected by other transactions, even ones that commit. Using the same example, even after W commits, R will still see "OLD" the second time it queries that value.

With Repeatable Read, we know that a value that is read at the beginning of a transaction will be the same no matter how many times it is subsequently re-read during the course of that transaction, even if another transaction commits a change to that value (or, for that matter, deletes it). However, newly committed insertions may still cause inconsistent reads. Suppose transaction R executes a query that looks for all the red cars in a given table. Initially this query may return, say, 3 records. The values of those records won't change if the transaction re-reads them, and if another transaction deletes some of them, a re-read will still not be affected -- all 3 records will still show up. However, there is still a potential for inconsistent reads. If transaction W inserts and commits two more red cars, then re-running the query in R will now return 5 cars instead of 3! These extra two records are called phantoms, because as far as the R transaction is concerned, there should still only be 3 cars.
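The phantom phenomenon itself can be sketched as a toy simulation (this illustrates why the row *set* changes even when individual row values are stable; it does not model a real database's Repeatable Read machinery):

```python
# Toy illustration of phantom reads: a concurrent insert changes the
# set of rows matching R's query, even though no existing row changed.

cars = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "red"},
    {"id": 3, "color": "red"},
    {"id": 4, "color": "blue"},
]

def red_cars():
    return [c for c in cars if c["color"] == "red"]

first = red_cars()                        # transaction R sees 3 red cars
cars.append({"id": 5, "color": "red"})    # transaction W inserts and
cars.append({"id": 6, "color": "red"})    # commits two more red cars
second = red_cars()                       # re-running R's query: 5 rows

print(len(first), len(second))  # 3 5 -- the 2 extra rows are phantoms
```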

Serializable:

The ultimate isolation level is Serializable. This level of isolation guarantees that phantoms will not be produced. If R reads 3 red cars at the beginning of a transaction, it will continue to read 3 red cars no matter what transaction W does. Thus all transactions are completely isolated. The term "serializable" refers to the fact that the behaviour of concurrently executed transactions is the same as if they were executed one after another, with no overlap. The disadvantage of Serializable is that it can cause performance problems or timeouts if the transactions are long.

Optimistic Offline Lock

Pessimistic Offline Lock

Coarse Grained Lock