Distributed DBMS - Controlling Concurrency

Concurrency control techniques allow multiple transactions to execute simultaneously while maintaining the ACID properties of the transactions and the serializability of the schedules.

In this chapter, we will study the various approaches for concurrency control.

Locking Based Concurrency Control Protocols

Locking-based concurrency control protocols use the concept of locking data items. A lock is a variable associated with a data item that determines whether read/write operations can be performed on that data item. Generally, a lock compatibility matrix is used, which states whether a data item can be locked by two transactions at the same time.
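As an illustration, a lock compatibility matrix can be sketched as a small lookup table. The two lock modes used here, "S" (shared, for reads) and "X" (exclusive, for writes), and the function name can_grant are assumptions made for this example; the text above does not prescribe particular lock modes.

```python
# Hypothetical lock modes: "S" = shared (read), "X" = exclusive (write).
# COMPATIBLE[(held, requested)] says whether both locks may coexist on one item.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every lock
    that other transactions already hold on the same data item."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

Under these assumptions, any number of readers can share an item, while a writer must wait until no other transaction holds a lock on it.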

Locking-based concurrency control systems can use either one-phase or two-phase locking protocols.

One-phase Locking Protocol

In this method, each transaction locks an item before use and releases the lock as soon as it has finished using it. This locking method provides for maximum concurrency but does not always enforce serializability.

Two-phase Locking Protocol

In this method, all locking operations precede the first lock-release or unlock operation. The transaction comprises two phases. In the first phase, a transaction only acquires the locks it needs and does not release any lock. This is called the expanding or the growing phase. In the second phase, the transaction releases the locks and cannot request any new locks. This is called the shrinking phase.

If every transaction follows the two-phase locking protocol, the resulting schedule is guaranteed to be serializable. However, this approach provides low parallelism between two conflicting transactions.
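The two-phase rule can be checked mechanically: once a transaction has released any lock, it must not acquire another. The following is a minimal sketch; the operation encoding as ("lock" | "unlock", item) pairs and the function name are assumptions made for this example.

```python
def follows_two_phase_locking(operations):
    """Check one transaction's lock/unlock sequence against the two-phase rule.

    `operations` is a list of (action, item) pairs in issue order,
    where action is "lock" or "unlock" (a hypothetical encoding).
    """
    shrinking = False  # becomes True after the first unlock
    for action, _item in operations:
        if action == "unlock":
            shrinking = True
        elif action == "lock" and shrinking:
            return False  # a lock was requested in the shrinking phase
    return True
```

A sequence such as lock a, lock b, unlock a, unlock b passes the check, while lock a, unlock a, lock b, unlock b does not, because it behaves like the one-phase protocol.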

Timestamp Concurrency Control Algorithms

Timestamp-based concurrency control algorithms use a transaction’s timestamp to coordinate concurrent access to a data item and so ensure serializability. A timestamp is a unique identifier assigned by the DBMS to a transaction that represents the transaction’s start time.

These algorithms ensure that transactions commit in the order dictated by their timestamps. An older transaction should commit before a younger transaction, since the older transaction enters the system before the younger one.

Timestamp-based concurrency control techniques generate serializable schedules such that the equivalent serial schedule is arranged in order of the age of the participating transactions.

Some timestamp-based concurrency control algorithms are −

Basic timestamp ordering algorithm.

Conservative timestamp ordering algorithm.

Multiversion algorithm based upon timestamp ordering.

Timestamp-based ordering follows three rules to enforce serializability −

Access Rule − When two transactions try to access the same data item simultaneously, for conflicting operations, priority is given to the older transaction. This causes the younger transaction to wait for the older transaction to commit first.

Late Transaction Rule − If a younger transaction has written a data item, then an older transaction is not allowed to read or write that data item. This rule prevents the older transaction from committing after the younger transaction has already committed.

Younger Transaction Rule − A younger transaction can read or write a data item that has already been written by an older transaction.
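The late transaction rule and the younger transaction rule can be sketched with per-item timestamps. This is a simplified version of basic timestamp ordering, not a complete implementation: timestamps are plain integers, rejected operations simply return False instead of restarting the transaction, and the read-timestamp check in try_write is the usual textbook addition rather than something stated in the rules above. The names Item, try_read and try_write are hypothetical.

```python
class Item:
    """A data item that remembers the youngest reader and writer so far."""

    def __init__(self):
        self.read_ts = 0   # largest timestamp that has read the item
        self.write_ts = 0  # largest timestamp that has written the item


def try_read(item, ts):
    """Allow the read unless a younger transaction has already written."""
    if ts < item.write_ts:
        return False  # late transaction rule: the reader is too old
    item.read_ts = max(item.read_ts, ts)
    return True


def try_write(item, ts):
    """Allow the write unless a younger transaction has read or written."""
    if ts < item.write_ts or ts < item.read_ts:
        return False  # late transaction rule: the writer is too old
    item.write_ts = ts
    return True
```

A smaller timestamp means an older transaction, so an operation succeeds only when it does not arrive after a younger transaction has already touched the item in a conflicting way.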

Optimistic Concurrency Control Algorithm

In systems with low conflict rates, the task of validating every transaction for serializability may lower performance. In these cases, the test for serializability is postponed to just before commit. Since the conflict rate is low, the probability of aborting transactions which are not serializable is also low. This approach is called the optimistic concurrency control technique.

In this approach, a transaction’s life cycle is divided into the following three phases −

Execution Phase − A transaction fetches data items to memory and performs operations upon them.

Validation Phase − A transaction performs checks to ensure that committing its changes to the database passes the serializability test.

Commit Phase − A transaction writes the modified data items from memory back to the disk.

This algorithm uses three rules to enforce serializability in the validation phase −

Rule 1 − Given two transactions T_i and T_j, if T_i is reading the data item which T_j is writing, then T_i’s execution phase cannot overlap with T_j’s commit phase. T_j can commit only after T_i has finished execution.

Rule 2 − Given two transactions T_i and T_j, if T_i is writing the data item that T_j is reading, then T_i’s commit phase cannot overlap with T_j’s execution phase. T_j can start executing only after T_i has already committed.

Rule 3 − Given two transactions T_i and T_j, if T_i is writing the data item which T_j is also writing, then T_i’s commit phase cannot overlap with T_j’s commit phase. T_j can start to commit only after T_i has already committed.
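A validation check in the spirit of Rule 2 can be sketched as follows. It covers only the read-write case: a transaction fails validation if another transaction that wrote one of the items it read committed after it started executing. The logical integer times, the Txn class and the validate function are assumptions made for this example, and Rules 1 and 3 are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    start: int  # logical time at which the execution phase began
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(txn, committed):
    """Return True if `txn` may proceed to its commit phase.

    `committed` is a list of (commit_time, write_set) pairs describing
    transactions that have already committed (a hypothetical log format).
    """
    for commit_time, write_set in committed:
        overlapped = commit_time > txn.start     # commit fell inside txn's execution
        conflicts = bool(write_set & txn.read_set)
        if overlapped and conflicts:
            return False
    return True
```

A transaction that started after the conflicting commit, or that read only unrelated items, passes validation.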

Concurrency Control in Distributed Systems

In this section, we will see how the above techniques are implemented in a distributed database system.

Distributed Two-phase Locking Algorithm

The basic principle of distributed two-phase locking is the same as that of the basic two-phase locking protocol. However, in a distributed system there are sites designated as lock managers. A lock manager controls lock acquisition requests from transaction monitors. In order to enforce coordination between the lock managers at various sites, at least one site is given the authority to see all transactions and detect lock conflicts.

Depending upon the number of sites that can detect lock conflicts, distributed two-phase locking approaches can be of three types −

Centralized two-phase locking − In this approach, one site is designated as the central lock manager. All the sites in the environment know the location of the central lock manager and obtain locks from it during transactions.

Primary copy two-phase locking − In this approach, a number of sites are designated as lock control centers. Each of these sites has the responsibility of managing a defined set of locks. All the sites know which lock control center is responsible for managing the locks of which data table or fragment.

Distributed two-phase locking − In this approach, there are a number of lock managers, where each lock manager controls locks of data items stored at its local site. The location of the lock manager is based upon data distribution and replication.
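The difference between the three approaches is mainly a question of which site a lock request is sent to. The sketch below illustrates this routing decision. The site names, the primary_of and replicas_of catalog dictionaries, and the function lock_sites are all hypothetical, and the sketch assumes that under the fully distributed approach a replicated item is locked at every site that stores a copy.

```python
def lock_sites(item, scheme, central_site, primary_of, replicas_of):
    """Return the sites whose lock managers must be contacted for `item`.

    primary_of:  item -> lock control center responsible for it (assumed catalog)
    replicas_of: item -> sites that store a copy of it (assumed catalog)
    """
    if scheme == "centralized":
        return [central_site]            # one lock manager for everything
    if scheme == "primary_copy":
        return [primary_of[item]]        # the center responsible for this item
    if scheme == "distributed":
        return list(replicas_of[item])   # each site that stores the item
    raise ValueError(f"unknown scheme: {scheme}")
```

For an item x with primary site S2 and copies at S1 and S3, the three schemes send the request to the central site, to S2, or to both S1 and S3, respectively.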

Distributed Timestamp Concurrency Control

In a centralized system, the timestamp of any transaction is determined by the physical clock reading. But in a distributed system, a site’s local physical or logical clock reading cannot be used as a global timestamp on its own, since two sites may produce the same reading. So, a timestamp consists of a combination of the site ID and that site’s clock reading.
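One simple way to represent such a timestamp is as a pair of clock reading and site ID. In the sketch below, which uses a hypothetical helper make_timestamp, Python’s tuple comparison orders timestamps by clock reading first and uses the site ID only to break ties, so two sites with the same reading still produce distinct, totally ordered timestamps.

```python
def make_timestamp(local_clock, site_id):
    """Build a globally unique timestamp from a site's clock reading
    and its site ID (both assumed to be integers here)."""
    return (local_clock, site_id)


# Two sites read the same clock value, a third one reads an earlier value.
stamps = [make_timestamp(10, 2), make_timestamp(10, 1), make_timestamp(9, 3)]
ordered = sorted(stamps)  # oldest transaction first
```

Putting the clock reading first keeps the ordering close to real time; putting the site ID first would instead always favor one site.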

For implementing timestamp ordering algorithms, each site has a scheduler that maintains a separate queue for each transaction manager. During a transaction, a transaction manager sends a lock request to the site’s scheduler. The scheduler places the request in the corresponding queue in increasing timestamp order. Requests are processed from the front of the queues in the order of their timestamps, i.e. the oldest first.
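The queueing behaviour described above can be sketched with one priority queue per transaction manager. The class name Scheduler, its methods, and the request strings are assumptions made for this example. The sketch also takes a shortcut: it serves the oldest request currently available, whereas a conservative scheduler would additionally wait until every queue holds at least one request.

```python
import heapq


class Scheduler:
    """Per-site scheduler with one timestamp-ordered queue per transaction manager."""

    def __init__(self):
        self.queues = {}  # transaction manager ID -> heap of (timestamp, request)

    def submit(self, tm_id, timestamp, request):
        """Insert a request into the queue of the sending transaction manager."""
        heapq.heappush(self.queues.setdefault(tm_id, []), (timestamp, request))

    def next_request(self):
        """Pop the oldest request found at the front of any queue, or None."""
        nonempty = [q for q in self.queues.values() if q]
        if not nonempty:
            return None
        oldest_queue = min(nonempty, key=lambda q: q[0])  # q[0] is the heap's front
        return heapq.heappop(oldest_queue)
```

Timestamps here are (clock reading, site ID) pairs, so requests from different sites never compare as equal.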

Conflict Graphs

Another method is to create conflict graphs. For this, transaction classes are defined. A transaction class contains two sets of data items called the read set and the write set. A transaction belongs to a particular class if the transaction’s read set is a subset of the class’ read set and the transaction’s write set is a subset of the class’ write set. In the read phase, each transaction issues its read requests for the data items in its read set. In the write phase, each transaction issues its write requests.
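The class membership test is a pair of subset checks, which can be sketched directly with Python sets. The function name belongs_to and the sample item names used with it are hypothetical.

```python
def belongs_to(txn_read, txn_write, class_read, class_write):
    """Return True if a transaction with the given read and write sets
    belongs to the class with the given read and write sets."""
    return set(txn_read) <= set(class_read) and set(txn_write) <= set(class_write)
```

For a class with read set {a, b, c} and write set {a}, a transaction that reads a and b and writes a belongs to the class, while one that writes b does not.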

A conflict graph is created for the classes to which active transactions belong. This contains a set of vertical, horizontal, and diagonal edges. A vertical edge connects two nodes within a class and denotes conflicts within the class. A horizontal edge connects two nodes across two classes and denotes a write-write conflict among different classes. A diagonal edge connects two nodes across two classes and denotes a write-read or a read-write conflict among two classes.

The conflict graphs are analyzed to ascertain whether two transactions within the same class or across two different classes can be run in parallel.

Distributed Optimistic Concurrency Control Algorithm

The distributed optimistic concurrency control algorithm extends the optimistic concurrency control algorithm. For this extension, two rules are applied −

Rule 1 − According to this rule, a transaction must be validated locally at all sites when it executes. If a transaction is found to be invalid at any site, it is aborted. Local validation guarantees that the transaction maintains serializability at the sites where it has been executed. After a transaction passes the local validation test, it is globally validated.

Rule 2 − According to this rule, after a transaction passes the local validation test, it should be globally validated. Global validation ensures that if two conflicting transactions run together at more than one site, they commit in the same relative order at all the sites where they run together. This may require a transaction to wait for the other conflicting transaction after validation but before commit. This requirement makes the algorithm less optimistic, since a transaction may not be able to commit as soon as it is validated at a site.
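The global validation condition in Rule 2 can be sketched as a check that every pair of transactions appears in the same relative commit order at each site where both run. This is a simplification made for illustration: it compares all pairs of transactions rather than only conflicting ones, and the input format (a dictionary from site name to local commit order) and the function name are assumptions.

```python
from itertools import combinations


def consistent_commit_order(site_orders):
    """Return True if all sites agree on the relative commit order
    of every pair of transactions they have in common.

    site_orders: site name -> list of transaction IDs in local commit order.
    """
    first_of_pair = {}  # sorted (a, b) pair -> the one seen committing first
    for order in site_orders.values():
        # combinations() keeps list order, so `first` precedes `second` locally.
        for first, second in combinations(order, 2):
            pair = tuple(sorted((first, second)))
            if first_of_pair.setdefault(pair, first) != first:
                return False  # another site committed this pair the other way round
    return True
```

If the check fails for a proposed order, one of the two transactions has to wait at some site until the other has committed there, which is the waiting described in Rule 2.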