Transactions

The basic idea behind transactions is to provide something that is


It is also important to be deadlock free, but since that doesn't help spell A.C.I.D. we list that seperately

How to achieve these things is interesting....

Begin/Commit/Abort
 
All transactions must begin with a begin-transaction statement.  This can be explicit, or it can implicitly happen when doing the database connection or at the time of the first database statement.
All successful transactions must have a 'commit' step.  This can be explicit, or it can happen automatically when the application does an 'exit'.
All unsuccessful transaction must have an 'abort' step.  This can happen explicity, when a transaction decides that everything should revert back to how it was (credit card application where credit limit is exceeded.)  Or it can happen implicity due to things like segmentation fault.  Or it can happen because the database system wanted to do so (deadlock detection)


Atomic

This can be done several ways.  Here's one:

Have a log file.  Any  time a change is made in a database entry write 'Changed entry XXX from YYY to ZZZ' in the log.  If a data item is deleted write 'Deleted entry XXX used to be YYY'.  If the transaction ever aborts, go back through the log and reverse all the entries.
 

Consistent
There are at least three ways.

1) Require that every application writer follow the business rules you set down.  The database just trusts that every transaction is correct.
2) Augment the database scheme with a set of 'triggers'.  Each trigger is a requirement that the database must follow.  For example, 'select A.name from employee A, division B where A.name = B.name'  cannot be null.  This would make sure that every employee has a division.  Cause an abort if when the transaction tries to call commit and the database violates these rules.  Most large databases support triggers but there is no standard way of expressing them.
3) Have a piece of software known as Middleware.  All transaction go to this deamon, and it can check for anything it wants.  It can also do other things like switch servers to load balance or deal with downed servers.  Generally you don't use middleware just for consistency checking, but if you need middleware for the other features you can add consistency checking easily


Independent

Be aware what the rule is.  The rule is 'There must be some ordering such that running the transactions one at a time results in the same I/O'.  The rule is not 'this ordering may not suprise the programmer.   In other words, if a programmer runs T1, T2, and T3 all at the same time, there must be some ordering of T1, T2, and T3 running one at a time that produces the same result.  However, ANY ordering is good enough.

There are at least two ways to do this.
1) Timestamping:  Every transaction gets a timestamp when it starts.  Every data item has the timestamp of the transaction that last changed it.  Every time a transaction accesses a data item, it checks the timestamp.  If the time stamp is further in the future than the transaction, it aborts.  Either the transaction was about to read data from the future, or it was about to erase data from the future (by overwriting it).

Timestamping is intellectualy easy, never causes deadlock, and results in transactions ordered in the way you might expect.  It sometimes causes needless deadlocks.  It's not used in practice.

2) Two Phase Locking:  Associated with each data item (or row or table) is a lock.  When a transaction wants to read a data item, it must get a read lock for the data item.  Any number of transactions can hold the read lock all sharing the same item.  When a transaction wants to get change a data item it muct get an exclusive lock (sometimes known as a write lock).  No transaction may release any lock until the transaction is ready to commit.  If the lock you want is not available, you must wait.

Two phase locking is harder, and can cause deadlocks.  Transactions can appear out of order.  It never causes needless aborts.  It is what almost all commercially succcessful databases use.

Durable
Durable is easy.  Any change to be made is written to a log file.  There are two kinds of log files.

Roll forward logs work like this.  Any time a data item is to be written, the write is delayed.  Instead 'Transaction WWW wants to change XXX to YYY' is placed in the log.  When the transaction wants to commit, write 'Transaction WWW commits'.  Then begin to make all the changes that transaction WWW wanted made.  During this step the system might die.  If so, when you reboot go through the log and for every transaction that has committed do all the changes it wanted.

Rollback logs work the other way.  Any time data is to be written, write it.  But before you do so in the log place 'Transaction WWW changed XXX from YYY to ZZZ'.  When the transaction commits, write 'Transaction WWW commits'.  If the system ever crashes, go through the log.  For every transaction that changed data items but did not commit, reverse the change.

For either type of log, the log must be on a different device than the actual database.  There must be a backup copy of the database available that is no older than the log file (The log file must go back at least as far as when the backup was made.)

The commit function cannot return until the log data for that transaction is written to permenant storage.  This is a HUGE slowdown.

Example of Two phase lock where the transaction apear out of order

T1 starts
T1 reads Scott's balance
gets $100
T2 Starts
T2 Reads Scotts balance
Gets $100
T2 Reads Mom's balance
Gets $100
T2 Writes Mom's balance
Writes $50
T2 Writes Scotts balance
Writes $150
T2 

 

MULTIPLE DATABASES