Thursday, April 28, 2011

Concurrency control and locking


The purpose of concurrency control is to prevent two different users (or two different connections by the same user) from trying to update the same data at the same time. Concurrency control can also prevent one user from seeing out-of-date data while another user is updating the same data.

The following examples explain why concurrency control is needed. For both examples, suppose that your checking account contains $1,000. During the day you deposit $300 and spend $200 from that account. At the end of the day your account should have $1,100.

Example 1: No concurrency control
At 11:00 AM, bank teller #1 looks up your account and sees that you have $1,000. The teller subtracts the $200 check but is not able to save the updated account balance ($800) immediately.
At 11:01 AM, teller #2 looks up your account and still sees the $1,000 balance. Teller #2 then adds your $300 deposit and saves your new account balance as $1,300.
At 11:09 AM, teller #1 returns to the terminal and finishes entering and saving the updated value, which was calculated as $800. That $800 value overwrites the $1,300.
At the end of the day, your account has $800 when it should have had $1,100 ($1,000 + $300 - $200).
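To make the lost update concrete, here is a minimal Java sketch that replays Example 1 step by step on a single thread. The `LostUpdateDemo` class and its local `balance` variable are hypothetical stand-ins for the account row; a real system would read and write a database, but the read-modify-write interleaving is the same.

```java
// Hypothetical replay of Example 1: two tellers do read-modify-write on the
// same balance with no concurrency control, in the order described above.
class LostUpdateDemo {
    static int replayWithoutConcurrencyControl() {
        int balance = 1_000;                 // the shared account row

        int teller1Read = balance;           // 11:00 teller #1 reads $1,000
        int teller1New = teller1Read - 200;  // subtracts the $200 check, not saved yet

        int teller2Read = balance;           // 11:01 teller #2 still reads $1,000
        balance = teller2Read + 300;         // saves $1,300

        balance = teller1New;                // 11:09 teller #1 saves $800 over $1,300
        return balance;
    }

    public static void main(String[] args) {
        System.out.println(replayWithoutConcurrencyControl()); // 800, not 1100
    }
}
```

The $300 deposit is lost because teller #1's write was computed from a value that was already stale when it was saved.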
Example 2: Concurrency control
When teller #1 starts working on your account, a lock is placed on the account.
When teller #2 tries to read or update your account while teller #1 is updating it, teller #2 is denied access and gets an error message.
After teller #1 has finished the update, teller #2 can proceed.
At the end of the day, your account has $1,100 ($1,000 - $200 + $300).
In Example 1, the two updates are interleaved rather than done in sequence, so one update overwrites the other. In Example 2, the system uses a concurrency control mechanism to prevent two users from updating the data simultaneously (and potentially overwriting each other's updates).
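Example 2 can be sketched the same way with a lock. In this hypothetical `LockedAccountDemo`, a `java.util.concurrent.locks.ReentrantLock` plays the role of the account lock, and teller #2 uses `tryLock()` so that it is refused immediately (the "error message") instead of waiting. The sketch does not model lock waits or timeouts.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account guarded by a lock, replaying Example 2.
class LockedAccountDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private int balance = 1_000;
    private boolean teller2Refused;

    // Applies a change only if the lock is free right now; otherwise refuses.
    private boolean tryApply(int delta) {
        if (!lock.tryLock()) {
            return false;                    // account is locked: report an error
        }
        try {
            balance += delta;
            return true;
        } finally {
            lock.unlock();
        }
    }

    static LockedAccountDemo replay() {
        LockedAccountDemo account = new LockedAccountDemo();
        account.lock.lock();                 // teller #1 starts working: lock placed
        try {
            // Teller #2 runs on another thread while teller #1 holds the lock.
            Thread teller2 = new Thread(() -> account.teller2Refused = !account.tryApply(300));
            teller2.start();
            teller2.join();
            account.balance -= 200;          // teller #1 saves the $200 check
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            account.lock.unlock();           // teller #1 is done
        }
        account.tryApply(300);               // teller #2 retries (on this thread, for brevity)
        return account;
    }

    int balance() { return balance; }
    boolean teller2WasRefused() { return teller2Refused; }

    public static void main(String[] args) {
        LockedAccountDemo account = replay();
        System.out.println(account.teller2WasRefused()); // true
        System.out.println(account.balance());           // 1100
    }
}
```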

Concurrency control mechanisms

The pessimistic concurrency control mechanism is based on locking. A lock is a mechanism for limiting other users' access to a piece of data. When one user holds a lock on a record, the lock prevents other users from changing (and in some cases reading) that record. The optimistic concurrency control mechanism does not place locks; instead, it prevents data from being overwritten by using timestamps.

Pessimistic vs. optimistic concurrency control

Locks and lock modes
A lock is a mechanism for preventing two or more users from performing conflicting operations at the same time. Operations conflict if at least one of the operations involves updating the data (via UPDATE, DELETE, INSERT, ALTER TABLE, and so on). If all the operations are read-only operations (such as SELECT), then there is no conflict.
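The conflict rule above fits in a single function. This is a hypothetical sketch: the `Op` enum lists only the statements named in the text, and it treats every statement other than SELECT as an update, which is a simplification of what a real engine tracks.

```java
// Hypothetical model of the rule: two operations on the same data conflict
// if at least one of them updates the data.
class ConflictRule {
    enum Op { SELECT, INSERT, UPDATE, DELETE, ALTER_TABLE }

    static boolean isUpdate(Op op) {
        return op != Op.SELECT;              // only SELECT is read-only here
    }

    static boolean conflicts(Op a, Op b) {
        return isUpdate(a) || isUpdate(b);
    }

    public static void main(String[] args) {
        System.out.println(conflicts(Op.SELECT, Op.SELECT)); // false: both read-only
        System.out.println(conflicts(Op.SELECT, Op.UPDATE)); // true
        System.out.println(conflicts(Op.DELETE, Op.INSERT)); // true
    }
}
```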

Pessimistic concurrency control (or pessimistic locking) is called "pessimistic" because the system assumes the worst: it assumes that two or more users will want to update the same record at the same time, and prevents that possibility by locking the record, no matter how unlikely conflicts actually are.
The locks are placed as soon as any piece of the row is accessed, making it impossible for two or more users to update the row at the same time. Depending on the lock mode (shared, exclusive, or update), other users might be able to read the data even though a lock has been placed. For more details on the lock modes, see Lock modes: shared, exclusive, and update.
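Java's `ReentrantReadWriteLock` offers a rough in-memory analogue of two of these modes: its read lock behaves like a shared lock and its write lock like an exclusive lock. It has no equivalent of the update mode, so this hypothetical `LockModesDemo` shows only shared versus exclusive: while one thread holds a shared lock on a row, another thread can still take a shared lock but not an exclusive one.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical row lock: readLock() ~ shared mode, writeLock() ~ exclusive mode.
class LockModesDemo {
    // While this thread holds a shared lock on the row, can another thread
    // acquire the row in the requested mode without waiting?
    static boolean otherThreadCanLock(boolean exclusive) {
        ReentrantReadWriteLock rowLock = new ReentrantReadWriteLock();
        AtomicBoolean acquired = new AtomicBoolean(false);
        rowLock.readLock().lock();           // this thread: shared lock on the row
        try {
            Thread other = new Thread(() -> {
                Lock wanted = exclusive ? rowLock.writeLock() : rowLock.readLock();
                if (wanted.tryLock()) {      // do not wait, just report
                    acquired.set(true);
                    wanted.unlock();
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            rowLock.readLock().unlock();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println(otherThreadCanLock(false)); // true: shared + shared
        System.out.println(otherThreadCanLock(true));  // false: shared blocks exclusive
    }
}
```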

Optimistic concurrency control (or optimistic locking) assumes that although conflicts are possible, they will be very rare. Instead of locking every record every time that it is used, the system merely looks for indications that two users actually did try to update the same record at the same time. If that evidence is found, then one user's updates are discarded and the user is informed.
For example, if User1 updates a record and User2 only wants to read it, then User2 simply reads whatever data is on the disk and then proceeds, without checking whether the data is locked. User2 might see slightly out-of-date information if User1 has read the data and updated it, but has not yet committed the transaction.
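A minimal sketch of the optimistic approach, assuming a hypothetical in-memory `OptimisticAccount` in which a version counter stands in for the timestamp. Readers take a snapshot without locking anything; a write is accepted only if the version has not changed since that snapshot, otherwise it is rejected and the caller has to re-read. The short `synchronized` methods only make each individual check-and-write atomic in memory; no lock is held between reading and writing, which is what makes the scheme optimistic. The `main` method replays the timeline of Example 1.

```java
// Hypothetical optimistic account: a version number plays the role of the timestamp.
class OptimisticAccount {
    // What a reader saw: the balance and the version it was read at.
    static final class Snapshot {
        final int balance;
        final long version;
        Snapshot(int balance, long version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private int balance;
    private long version;

    OptimisticAccount(int balance) {
        this.balance = balance;
    }

    synchronized Snapshot read() {
        return new Snapshot(balance, version);
    }

    // Accept the write only if nobody committed since the snapshot was taken.
    synchronized boolean tryWrite(Snapshot basedOn, int newBalance) {
        if (basedOn.version != version) {
            return false;                    // conflict: discard and inform the user
        }
        balance = newBalance;
        version++;
        return true;
    }

    synchronized int balance() {
        return balance;
    }

    public static void main(String[] args) {
        OptimisticAccount account = new OptimisticAccount(1_000);
        Snapshot teller1 = account.read();                                    // reads $1,000
        Snapshot teller2 = account.read();                                    // reads $1,000
        System.out.println(account.tryWrite(teller2, teller2.balance + 300)); // true
        System.out.println(account.tryWrite(teller1, teller1.balance - 200)); // false: stale
        Snapshot retry = account.read();                                      // re-reads $1,300
        System.out.println(account.tryWrite(retry, retry.balance - 200));     // true
        System.out.println(account.balance());                                // 1100
    }
}
```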

Choosing a concurrency control mechanism
In most scenarios, optimistic concurrency control is more efficient and offers higher performance. When choosing between pessimistic and optimistic locking, consider the following:

Pessimistic locking is useful if there are many updates and a relatively high chance of users trying to update the same data at the same time.
For example, if each operation can update a large number of records at a time (the bank might add interest earnings to every account at the end of each month), and two applications run such operations at the same time, they are likely to conflict.

Pessimistic concurrency control is also more appropriate in applications that contain small tables that are frequently updated. In the case of these so-called hotspots, conflicts are so probable that optimistic concurrency control wastes effort in rolling back conflicting transactions.
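To see why hotspots hurt the optimistic approach, here is an in-memory analogy rather than a database example: `AtomicInteger.compareAndSet` acts as the optimistic check on one shared balance, and every failed check means the work is thrown away and redone. The hypothetical `HotspotRetryDemo` counts attempts; with many threads hammering the same value, the attempt count typically exceeds the number of successful deposits, and the difference is the wasted effort. The exact count varies from run to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical hotspot: many threads apply optimistic updates to one value.
class HotspotRetryDemo {
    private final AtomicInteger balance = new AtomicInteger();
    private final AtomicLong attempts = new AtomicLong();

    // Optimistic update: read, compute, commit only if nothing changed; else redo.
    void deposit(int amount) {
        while (true) {
            attempts.incrementAndGet();
            int seen = balance.get();
            if (balance.compareAndSet(seen, seen + amount)) {
                return;
            }
            // Another thread committed first: this attempt's work is discarded.
        }
    }

    static HotspotRetryDemo run(int threads, int depositsPerThread) {
        HotspotRetryDemo hotspot = new HotspotRetryDemo();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < depositsPerThread; i++) {
                    hotspot.deposit(1);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hotspot;
    }

    int balance() { return balance.get(); }
    long attempts() { return attempts.get(); }

    public static void main(String[] args) {
        HotspotRetryDemo result = run(8, 100_000);
        System.out.println("balance  = " + result.balance());  // always 800000
        System.out.println("attempts = " + result.attempts()); // at least 800000, varies per run
    }
}
```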

Optimistic locking is useful if the possibility of conflicts is very low: there are many records but relatively few users, or very few updates and mostly read-type operations.


Precision math in Java



Fate has it that I am currently working on some statistical software. One thing you need when working on statistical software is precision in data computation, and precision with floating-point numbers in Java is not so straightforward. I understand why: you are squeezing numbers with infinitely many decimals into a finite number of places (the classic example being 1/3 in floating-point arithmetic). Still, at the end of the day, I just wanted 19.99 + 18.47 == 38.46 to hold.
But you don't always get what you wish for. Instead of 38.46, Java gave me back 38.459999999999994, which is obviously not equal to 38.46. To make the equality work, I used the very nice and friendly BigDecimal class from the java.math package, like this:
import java.math.BigDecimal;
double sum = BigDecimal.valueOf(19.99d)
         .add(BigDecimal.valueOf(18.47d)).doubleValue(); // sum == 38.46 is now true
After this, I was the happy owner of a working addition on doubles. Alas, the boss was just bored of my solution (quote: "Just make it work!!").
Check the java.math package for other stuff: http://java.sun.com/j2se/1.5.0/docs/api/java/math/package-summary.html
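Here is a small, self-contained version of that comparison, written as a hypothetical `PrecisionDemo` class that uses only `java.math.BigDecimal`. `BigDecimal.valueOf(double)` goes through `Double.toString(double)`, which is why 19.99 becomes exactly 19.99 in the fix above, whereas the `new BigDecimal(double)` constructor keeps the full binary approximation. Converting back with `doubleValue()` yields the double closest to 38.46, which is what makes the comparison work; if more arithmetic is done on that double, the same rounding issues can come back, so keeping the values as `BigDecimal` throughout is the more robust option.

```java
import java.math.BigDecimal;

class PrecisionDemo {
    // Plain double arithmetic: both inputs are binary approximations, so is the sum.
    static double doubleSum(double a, double b) {
        return a + b;
    }

    // The fix from the post: valueOf(double) uses Double.toString, so 19.99 -> "19.99".
    static double bigDecimalSum(double a, double b) {
        return BigDecimal.valueOf(a).add(BigDecimal.valueOf(b)).doubleValue();
    }

    // Staying in BigDecimal (built from strings) avoids the double round trip entirely.
    static BigDecimal exactSum(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(doubleSum(19.99, 18.47) == 38.46);     // false
        System.out.println(bigDecimalSum(19.99, 18.47) == 38.46); // true
        System.out.println(exactSum("19.99", "18.47"));           // 38.46
        // new BigDecimal(double) keeps the binary approximation of 19.99.
        System.out.println(new BigDecimal(19.99).compareTo(new BigDecimal("19.99")) == 0); // false
    }
}
```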

Object.wait() vs. Thread.sleep()

Another day at the office, another thread dump to analyze. This one had a pretty weird pattern: a thread was waiting on a monitor, but the dump also showed it as holding the lock on the same Object. Welcome to the fascinating world of deadlocks!
Scanning the code, everything seemed fine. Unless... Object.wait(time) was giving up the lock taken by the synchronized block. Searching through the Java docs, we found:

"This method causes the current thread (call it T) to place itself in the wait set for this object and then to relinquish any and all synchronization claims on this object."

So the bloody wait method really does give up the monitor taken by the synchronized block, and re-acquires it before returning. To put it into a simple scenario: we have a synchronized block surrounding the login section, to prevent the same User object from being locked more than once. But the user clicks the login button repeatedly and impatiently. The first thread takes ownership, creates the User object and exits. The next threads try to lock the object, can't, and call wait(). Since wait() gives up ownership of the monitor, yet another thread may enter the synchronized region, thus resulting in deadlock.
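The Javadoc behaviour is easy to confirm with a small experiment. In this hypothetical `WaitReleasesMonitor` class, one thread enters `synchronized (monitor)` and calls `wait()`; the main thread can then enter a block on the same monitor even though the waiter has not yet left its own block. The `while (!released)` loop around `wait()` is the form the `Object.wait` Javadoc recommends, since wakeups can be spurious.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical experiment: can another thread enter a block on the same monitor
// while the first thread is still inside its block, calling wait()?
class WaitReleasesMonitor {
    private final Object monitor = new Object();
    private boolean released;         // guarded by monitor
    private boolean waiterLeftBlock;  // guarded by monitor

    static boolean otherThreadGetsIn() {
        WaitReleasesMonitor d = new WaitReleasesMonitor();
        CountDownLatch waiterInside = new CountDownLatch(1);
        Thread waiter = new Thread(() -> {
            synchronized (d.monitor) {
                waiterInside.countDown();
                while (!d.released) {          // loop guards against spurious wakeups
                    try {
                        d.monitor.wait();      // relinquishes the monitor while waiting
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                d.waiterLeftBlock = true;      // set just before leaving the block
            }
        });
        waiter.start();
        try {
            waiterInside.await();              // waiter is inside its synchronized block
            boolean waiterStillInside;
            synchronized (d.monitor) {         // only possible because wait() let go
                waiterStillInside = !d.waiterLeftBlock;
                d.released = true;
                d.monitor.notifyAll();
            }
            waiter.join();
            return waiterStillInside;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(otherThreadGetsIn()); // true
    }
}
```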
The immediate solution to this problem is to use Thread.sleep(time). The sleep method keeps the lock on the synchronized block, so no other thread can take over the region in the meantime. :)
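The same experiment with `Thread.sleep` in place of `wait`, again as a hypothetical sketch: the main thread cannot enter the synchronized block until the sleeping thread has left it, because `sleep` keeps the monitor. The flip side is that every other thread needing that monitor stays blocked for the whole sleep.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical experiment: same setup as with wait(), but the first thread sleeps.
class SleepKeepsMonitor {
    private final Object monitor = new Object();
    private boolean sleeperLeftBlock;  // guarded by monitor

    static boolean otherThreadGetsIn() {
        SleepKeepsMonitor d = new SleepKeepsMonitor();
        CountDownLatch sleeperInside = new CountDownLatch(1);
        Thread sleeper = new Thread(() -> {
            synchronized (d.monitor) {
                sleeperInside.countDown();
                try {
                    Thread.sleep(200);         // keeps holding the monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                d.sleeperLeftBlock = true;     // set just before leaving the block
            }
        });
        sleeper.start();
        try {
            sleeperInside.await();             // sleeper is inside its synchronized block
            boolean sleeperStillInside;
            synchronized (d.monitor) {         // blocks until the sleeper leaves
                sleeperStillInside = !d.sleeperLeftBlock;
            }
            sleeper.join();
            return sleeperStillInside;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(otherThreadGetsIn()); // false
    }
}
```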