"Consistency is the property that guarantees that all parties interested in a given piece of information get updated each time it changes. In the context of massive distributed systems that now operate on a world scale, and looking into cloud computing in particular, consistency needs to be traded for other important properties such as performance and availability." This paper discusses the various trade-offs, the resulting consistency models, the extent to which humans can tolerate these inconsistencies and how various systems cope with them.
As cloud computing applications grow and become massively distributed, data is stored on many servers. Achieving high availability and scalable performance requires some form of data replication. However, data replication brings challenges of its own, especially with regard to consistency: data needs to be updated in several locations, and a problem arises when one or more of these locations are temporarily inaccessible. Section 2 provides various definitions of consistency and Section 3 discusses the concept of cloud computing. Section 4 goes into detail on consistency trade-offs and describes different consistency models and associated levels of inconsistency. Section 5 looks at the human perspective on inconsistencies: what can be tolerated and to what degree. Section 6 describes how systems deal with and handle inconsistencies, looking specifically at two examples - Amazon Dynamo and Yahoo! PNUTS. We finally present our conclusion in Section 7.
Consistency - Definition
From a database point of view, "Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed that violates the database consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules". Basically, this means that the output of a transaction is committed once the transaction gives up the right to undo the changes that produced that output, thereby making the new value available to all other transactions.
However, the context of this paper is massively distributed systems, where data is generally replicated to achieve high availability and improve performance. Here, the collection of replicas is said to be consistent if all the replicas are the same. This means that a read operation carried out at any of the replicas will return the same result, and that if a replica is modified or updated, all other replicas will be updated as well, no matter which replica the operation originated from. This type of consistency is often referred to as strong or strict consistency. Other types are discussed in subsequent sections.
Cloud Computing - An Overview
"Cloud Computing is a style of computing in which IT related capabilities are provided "as a service", allowing users to access technology enabled services from the internet(referred to as the cloud) without knowledge of, expertise with, or control over the technology infrastructure that supports them." The focus is on sharing data and computations over a scalable network of nodes which are end users, data centers and web services. The main idea is to use the existing infrastructure in order to bring all feasible services to the cloud and make it possible to access those services regardless of time and location.
Based on the functionality offered, there are three main types of cloud computing services [x]:
- Software-as-a-Service (SaaS) - This offers fully developed, already launched applications. Users just need a computer or server to download the application, and internet access to run it, instead of purchasing the necessary hardware or software. Examples include Google Calendar.
- Platform-as-a-Service (PaaS) - This offers the facilities required for the complete development and delivery of web applications and services through the internet. It basically offers an API that application developers can build against. Examples include Google App Engine.
- Infrastructure-as-a-Service (IaaS) - This provides an environment for the delivery of infrastructure: rather than purchasing servers, hardware, storage and networking components, a customer obtains the required resources through an outsourced service. Examples include Amazon EC2.
Generally, the main drivers of cloud computing are economies of scale, reduction of total IT spend without compromising quality and gaining flexibility and speed in implementation.
Consistency Trade-offs and Levels
As cloud computing applications grow and become massively distributed, the number of servers storing the same data grows. Data is replicated to enhance availability and reliability, so that it can still be accessed even when one or more systems are unavailable, and to improve performance, as multiple copies of data help scale a system to larger numbers of clients and geographic areas. However, high availability and performance come at a price: once data is updated at any replica, all the others have to be updated as well. Trade-offs therefore need to be made.
Brewer stipulates, in what he calls the CAP (Consistency, Availability, Partition tolerance) theorem, that it is impossible for a distributed system to be simultaneously consistent, available and partition tolerant (tolerant to network connectivity losses); at any given point in time one can achieve at most two of these properties. It is important to note that in large scale distributed systems network partitions are inevitable, so the real trade-off is between consistency and availability. We are thus left with two choices: make consistency a priority, which means the system will be unavailable under certain conditions, or relax consistency so that the system remains highly available, which means consistency cannot be guaranteed under certain conditions.
A consistency model is basically a set of rules to be obeyed by processes while accessing data. There are two basic types of consistency models [x], namely data centric and client centric.
Data Centric Model: This type of model looks at consistency from a system wide perspective of the storage system. It assumes that concurrent processes may be updating the storage system. It can be further broken into two types: models that do not use synchronization operations and those that do. Examples include strict consistency, causal consistency and sequential consistency.
Client Centric Model: This type of model looks at consistency from a single client's perspective with respect to the data stored by that client. It assumes that concurrent updates can be easily resolved or are not being made at all. Examples include monotonic reads, monotonic writes, read your writes and writes follow reads.
Consistency can also be seen from two perspectives: the client/developer view, which concerns how updates are observed, and the server view, which concerns the flow of updates through the system and the consistency guarantees made with respect to those updates.
- Strict Consistency - This is the strictest possible model and it does not use synchronization operations. It guarantees that whenever data is accessed, the value returned corresponds to the result of the most recent update to that data. However, this is impossible to implement in a distributed system because it relies on an absolute time ordering of all shared accesses.
- Weak Consistency - This model uses synchronization operations to synchronize all local copies of the storage system. It relies on synchronization variables (which are seen by all processes in the same order) to propagate writes to and from a machine at appropriate points. It does not guarantee that subsequent accesses will return the updated value, as this depends on a number of conditions that need to be met. Basically, access to a synchronization variable is not allowed until all pending write operations are completed, and no new read/write operation is allowed while a synchronization operation is in progress.
- Eventual Consistency - This is a specific form of weak consistency. It guarantees that if no new updates are made to a particular data item, eventually all accesses to that item will return the last updated value. Eventual consistency works just fine for replicated data if clients always access the same replica, but poses a problem when different replicas are involved. This problem can, however, be alleviated by introducing client centric consistency models, of which the two most desirable are monotonic reads and read your writes. Monotonic reads guarantee that if a process reads the value of a data item, subsequent reads by that process will return the same value or a more recent one, while read your writes guarantees that if a process updates a data item, subsequent accesses by the same process will always return the updated value.
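The two client-centric guarantees above can be sketched as a session that tracks the highest version it has observed and refuses to accept anything older. This is a minimal illustration, not the implementation used by any particular system; the `Replica` and `Session` names are invented for the example.

```python
# Sketch of client-centric session guarantees over versioned replicas.
# A session remembers the highest version it has read or written: reads
# never go backwards (monotonic reads) and always reflect the session's
# own writes (read your writes).

class Replica:
    def __init__(self):
        self.value, self.version = None, 0

    def write(self, value, version):
        # Accept only updates newer than what this replica holds.
        if version > self.version:
            self.value, self.version = value, version

class Session:
    def __init__(self):
        self.min_version = 0   # highest version this client has seen

    def write(self, replica, value):
        self.min_version += 1
        replica.write(value, self.min_version)

    def read(self, replicas):
        # Only accept a replica at least as new as what we already saw.
        for r in replicas:
            if r.version >= self.min_version:
                self.min_version = r.version
                return r.value
        return None  # no replica can satisfy the session guarantee yet

r1, r2 = Replica(), Replica()
s = Session()
s.write(r1, "v1")          # the update reaches r1 only
seen = s.read([r2, r1])    # r2 is too stale, so the session reads r1
```

Here an eventually consistent read could have returned the stale value from `r2`; the session guarantee forces the read over to `r1` instead.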
First, a few definitions:
- N = the number of nodes that store replicas of the data
- W = the number of replicas that need to acknowledge the receipt of the update before the update completes
- R = the number of replicas that are contacted when a data object is accessed through a read operation
Strong consistency can be guaranteed if W+R > N, as this means there is always an overlap between the write and read sets. On the other hand, weak or eventual consistency arises if W+R <= N; here the system is vulnerable to reading from nodes that have not yet received updates. The chosen configuration depends on the focus of the system: if the focus is fault tolerance, the configuration is usually set to N=3, W=2 and R=2, whereas if the focus is consistency, W is set to N for updates.
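The quorum arithmetic above is simple enough to state directly in code. A small sketch, using only the N, W and R definitions from this section:

```python
# Quorum overlap check: with N replicas, W write acknowledgements and
# R read contacts, the read and write sets are guaranteed to overlap
# (and hence a read sees the latest write) exactly when W + R > N.

def consistency_of(n, w, r):
    return "strong" if w + r > n else "weak/eventual"

# Fault-tolerant configuration mentioned in the text: read and write
# quorums of 2 out of 3 replicas always intersect.
ft = consistency_of(3, 2, 2)     # -> "strong"

# Read-one/write-one favours latency: the sets need not intersect, so
# a read may hit a replica that has not yet received the update.
fast = consistency_of(3, 1, 1)   # -> "weak/eventual"
```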
Tolerating Inconsistencies - Human Perspective
The semantics of accessing cloud based services, as perceived by developers and end users, is governed to a large extent by consistency. Consistency issues are particularly relevant to distributed computing infrastructure services such as data storage. The most stringent consistency semantics, known as serializability or strong consistency, globally orders the service requests and presents them as occurring in a single global sequence. This can be illustrated with the following example. Suppose an account has an initial balance of 0. Robin deposits 10 and May deposits 20, and the two deposits happen concurrently. If Dan checks the account balance twice and finds it first to be 20 and then 30, then no user will ever see 10 as a valid balance, since May's deposit was clearly sequenced before Robin's. Strong consistency requires maintaining global agreement on the ordering of these service requests to achieve this imaginary global sequence.
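The bank example can be restated as a serialized log: under strong consistency every reader observes a prefix of the same balance history, so once the order (May, Robin) is fixed, the only observable balances are 0, 20 and 30. A minimal sketch of that reasoning:

```python
# Strong consistency as a single global log of requests. Every reader
# replays some prefix of the agreed order, so the set of observable
# balances is exactly the set of prefix sums - 10 never appears once
# May's deposit is sequenced before Robin's.

deposits = [("May", 20), ("Robin", 10)]   # the agreed global order

balances, total = [0], 0
for _, amount in deposits:
    total += amount
    balances.append(total)    # balance after each committed deposit

observable = set(balances)    # {0, 20, 30}; 10 is never observable
```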
Since the focus of cloud computing services is more on availability, reaching such global agreement may not be feasible. We are thus left with the option of managing replicated data by embracing inconsistency, using relaxed or weak consistency models to allow services to remain highly available even as partitions occur. In his article, Vogels gave two reasons why inconsistency has to be tolerated, namely:
- The improvement of read and write performance under highly concurrent conditions
- The handling of partitions in scenarios where part of the system is rendered unavailable even though all nodes are running
He emphasized, however, that whether or not these inconsistencies are acceptable depends on the user. Thus the question arises: what levels of inconsistency can we, as humans, tolerate? We definitely need to consider how user experience is affected. With web applications such as customer shopping carts, email search and social network queues, a certain amount of inconsistency may be tolerated to speed up the user's perceived time of completing an operation. For instance, if a Facebook user posts new pictures on a friend's wall, or a Yahoo user changes an avatar or invites several friends to connect, little or no harm is done (though user experience is no doubt affected) if the post is not immediately visible, or if the new avatar is not yet visible to one friend. Another example is a new mail that only becomes visible after the email program is refreshed.
However, the case is very different with financial and medical applications where the need for strong consistency is paramount. Financial applications are highly transactional and automated and as such transactional consistency semantics need to be applied. Medical applications are safety critical and can lead to the loss of lives if the right information is not obtained. What would happen if withdrawals were allowed in the initial bank account example or if an incorrect medical data record were to be used when providing diagnosis for a patient is left to one's imagination. This would definitely be an inconsistency users would not tolerate.
Tolerating Inconsistencies - System Perspective
In the previous section, we discussed the human perspective to inconsistencies. In this section, we discuss how systems cope with and handle inconsistencies. We specifically look at two big players at the fore front of cloud computing (Amazon Dynamo and Yahoo! PNUTS) and compare their respective consistency models.
Dynamo is a highly available storage system that provides persistent data storage to services running on Amazon's worldwide e-commerce platform. With respect to the CAP theorem discussed earlier, Amazon decided to weaken their consistency requirements because availability was more important, and chose the eventual consistency model for Dynamo. The problem with eventual consistency is ensuring that all replicas apply the same updates in a consistent order. The difficulty lies chiefly in concurrent updates, those that have not completed before a new one is started, as there is no way to tell which update was issued first. Amazon's design decision was thus to allow inconsistent values from concurrent updates to enter the system and be stored side by side, and then force the client application to reconcile them and write the result back to Dynamo.
DeCandia further describes the shopping cart service. From a business view, it is essential that additions to the cart do not get lost, as they impact sales, but less important if deletions are lost, as users would most likely correct those themselves. Reconciliation of conflicting states is left to the cart application and is done by merging the states, which guarantees that no addition is lost. Where such an automatic reconciliation cannot be made, Dynamo makes use of vector clocks. The idea is that every replica keeps track of the number of updates it receives from the other replicas. Whenever an update is made, the replica that handles it increases its own update counter in the vector clock and sends the new clock value to the other replicas. When a replica receives concurrent updates from two replicas, it compares the vector clocks to see whether one update precedes the other. Dynamo also ensures that all versions committed to the system are returned when read, which is made possible by a technique known as quorum assembly.
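The vector clock comparison described above can be sketched as follows. This is an illustrative implementation of the general technique, not Dynamo's actual code; clocks are modelled as dicts mapping replica ids to update counters.

```python
# Vector clock comparison: one version descends from another if its
# clock is component-wise >= the other's. Clocks that are incomparable
# mark concurrent updates, which must be kept side by side and
# reconciled by the application.

def descends(a, b):
    """True if clock a includes every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if a == b:
        return "identical"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"   # conflict: neither update saw the other

older = {"r1": 1}
newer = {"r1": 2, "r2": 1}    # saw older's update plus one via r2
conflict = {"r2": 1}          # independent update handled by r2

rel1 = compare(newer, older)      # newer strictly dominates older
rel2 = compare(older, conflict)   # neither dominates: concurrent
```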
PNUTS is a massively parallel and distributed database system for Yahoo!'s web applications. Like Dynamo, PNUTS has traded off consistency for availability, but has chosen to implement a model known as per-record timeline consistency, which sits between strong consistency and eventual consistency. Yahoo! is of the view that eventual consistency is too weak. This is reflected in the example given in their paper, where a user updates his profile twice in quick succession: the first update removes his mother from his photo access list, and the second uploads his spring break photos. In an eventually consistent scenario, the second update could be applied to some replicas before the first, because, as stated earlier, there is no guarantee that updates will be seen in the order in which they were issued. The mother might thus mistakenly be privy to photos her son did not want her to see. The obvious solution is to find a way of totally ordering updates, so that an update is guaranteed to be applied only after the previous ones have completed.
This is what Yahoo! guarantees with its timeline consistency model, which it implements with the help of the Yahoo! Message Broker (YMB). The YMB guarantees reliable and totally ordered delivery of messages, although only within a geographic location; as such, updates to two separate records might still arrive in different orders at different replicas. Basically, it works as follows: for each record, one replica is chosen as the master. Updates go only to the master and are asynchronously propagated to all other replicas, while reads can go to the replicas or to the master. Every record is also versioned, and clients can work with different versions through API calls offering varying consistency guarantees for reads and writes. These include:
- Read-any: returns a possibly stale version of the record, i.e. any valid version from the record's history.
- Read-critical(required_version): returns the required version of the record or a newer one.
- Read-latest: returns the latest copy of the record, reflecting all writes that have succeeded.
- Write: offers the same ACID guarantees as a transaction with a single write operation. It is useful for blind updates such as a user's profile status.
- Test-and-set-write(required_version): performs the write only if the record's current version matches the required version. This ensures proper serialization of concurrent read-modify-write transactions.
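The semantics of these calls can be sketched against a single record's version timeline. This is a hypothetical, simplified model written for this paper, not the PNUTS API itself: one master serializes writes, each write bumps the record's version, and the read variants differ only in which point of the timeline they may return.

```python
# Per-record timeline sketch: the master serializes writes into a
# (version, value) history, and each read call constrains which entry
# of that history it is allowed to return.

class Record:
    def __init__(self):
        self.history = []   # (version, value) timeline, oldest first

    def write(self, value):
        version = len(self.history) + 1   # master assigns versions
        self.history.append((version, value))
        return version

    def read_any(self):
        # May return any (possibly stale) version; here, the oldest.
        return self.history[0][1] if self.history else None

    def read_latest(self):
        return self.history[-1][1] if self.history else None

    def read_critical(self, required_version):
        # Return the required version or a newer one.
        for version, value in self.history:
            if version >= required_version:
                return value
        return None

    def test_and_set_write(self, required_version, value):
        # Write only if the record is still at required_version.
        current = self.history[-1][0] if self.history else 0
        return self.write(value) if current == required_version else None

rec = Record()
v1 = rec.write("profile-v1")
v2 = rec.write("profile-v2")
stale = rec.read_any()                          # may be "profile-v1"
latest = rec.read_latest()                      # "profile-v2"
rejected = rec.test_and_set_write(v1, "stale")  # None: v1 is outdated
```

The rejected test-and-set at the end shows how concurrent read-modify-write sequences are serialized: a writer holding an outdated version number must re-read before retrying.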
In general, Yahoo!'s timeline consistency model is stronger than Amazon's eventual consistency model, which allows conflicting data to exist side by side. With PNUTS's consistent ordering of updates, the only inconsistency that may be observed is staleness of data, which users can control by requesting either the most recent version or a potentially stale one.
- Brewer, E.A., Towards robust distributed systems (abstract), PODC '00: Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, 2000, ACM, 7, 2000.
- DeCandia, G., et al, Dynamo: Amazon's highly available key-value store, SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, 2007, ACM, 205-220, 2007.
- Vogels, W., Eventually consistent, Commun.ACM, 52(1), 40-44, 2009.
- Karger, D., et al, Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web, STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, 1997, ACM, 654-663, 1997.
- Tanenbaum, A.S., and van Steen, M., Distributed Systems: Principles and Paradigms [e-book], Upper Saddle River, NJ, Prentice Hall, 2002.
- Gray, J., and Reuter, A., Isolation concepts, in Transaction Processing: Concepts and Techniques, chap. 7, Morgan Kaufmann, 1993.
- Cooper, B.F., et al, PNUTS: Yahoo!'s hosted data serving platform, Proceedings of the VLDB Endowment, 1(2), 1277-1288, 2008.
- Fouquet, M., Niedermayer, H. and Carle, G., Cloud computing for the masses, U-NET '09: Proceedings of the 1st ACM workshop on User-provided networking: challenges and opportunities, 2009, ACM, 31-36, 2009.