Optimistic and Multi-version Concurrency Control Practice Problems

For each of the following protocols, describe aspects of practical applications that would lead you to suggest using the protocol, and aspects that would suggest not using the protocol:¹
- Two-phase locking
- Optimistic concurrency control
- Multi-version using optimistic concurrency control
- Multi-version using two-phase locking
Explain the phantom read phenomenon. Why may this phenomenon lead to an incorrect concurrent execution despite the use of the two-phase locking protocol?²
There are some situations (like an audit of bank transactions) where we would want to perform time-travel queries. That is, run a query on the database as it existed at some point in the past. What protocol would we want out database to be using this in case?³
Give two examples of OLTP workloads, one where optimistic concurrency control would be appropriate and one where two-phase locking would be the better choice. What kind of concurrency control would be best-suited for a typical OLAP workload?⁴
Explain why the read-committed isolation level ensures that schedules are cascade-free.⁵

Footnotes:

Two-phase locking: Use for applications with a mix of different kinds of transactions where some conflicts are expected. If there are large read-only transactions, other protocols would do better. Also, if deadlocks must be avoided at all costs, the two-phase locking is not a good choice.
Optimistic concurrency control: If the probability that two concurrently executing transactions conflict is low, this protocol an be used advantageously to get better concurrency and good response times with lower overheads (especially on lower isolation levels where unrepeatable reads are allowed). Not suitable under high contention, when a lot of wasted work will be done.
Multi-version using OCC: Use if OCC is appropriate but it is desirable for read requests to never wait. Shares the other disadvantages of the OCC protocol.
Multi-version using two-phase locking: This protocol allows read-only transactions to always commit without ever waiting. Update transactions follow 2PL, thus allowing schedules with conflicts solved by waiting rather than roll back. But the problem of deadlocks comes back, though read-only transactions cannot get involved in them. Keeping multiple versions adds space and time overheads though, therefore plain 2PL may be preferable in low-conflict situations.

The phantom phenomenon arises when, due to an insertion or deletion, two transactions logically conflict despite not locking any data items in common. The insertion case is described in the book. Deletion can also lead to this phenomenon. Suppose \(T_i\) deletes a tuple from a relation while \(T_j\) scans the relation. If \(T_i\) deletes the tuple and then \(T_j\) reads the relation, \(T_i\) should be serialized before \(T_j\). Yet there is no tuple that both \(T_i\) and \(T_j\) conflict on. An application of 2PL as just locking the accessed tuples in a relation could be incorrect. There is also an index or a relation data that has information about the tuples in the relation. This information is read by any transaction that scans the relation and modified by transactions that update, or insert into, or delete from the relation. Hence locking must also be performed on the index or relation data, and this will avoid the phantom phenomenon.

Multi-versioning is a good protocol to support time-travel queries, as it efficiently structures and maintains multiple previous versions of database objects.

⁴

OCC would be appropriate for an OLTP workload where users are updating their account information. Transactions would involve disjoint data (i.e., separate user accounts), and thus conflicts would be rare. For an OLTP workload of bank transactions between different accounts, 2PL would be a better choice. The same account could have multiple simultaneous transfers, and so conflicts could be more common. Since OLAP queries are typically read-only, OCC would be a good choice.

⁵

The read-committed isolation level ensures that a transaction reads only the committed data. A transaction \(T_i\) cannot read a data item \(X\) which has been modified by a yet uncommitted concurrent transaction \(T_j\). This makes \(T_i\) independent of the success or failure of \(T_j\). Hence, the schedules which follow read-committed isolation level become cascade-free.