Bridging the Gap: Opportunities in Coordination-Avoiding Databases
22 Apr 2014
Background: I recently co-organized the Principles and Practice of Eventual Consistency workshop at EuroSys, where I also gave a talk on lessons from some of our recent research. This post contains a summary (in the form of my talk proposal). This is joint work with Alan Fekete, Ali Ghodsi, Mike Franklin, Joe Hellerstein, and Ion Stoica.
My slides are available on Speaker Deck.
Abstract: Weakly consistent systems surface a controversial tension between, on the one hand, availability, latency, and performance, and, on the other, programmability. We propose the concept of coordination avoidance as a unifying, underlying principle behind the former and discuss lessons from our recent experiences mitigating the latter.
Trouble in Paradise
Faced with the task of operating “always on” services and lacking sufficient guidance from the literature regarding alternative distributed designs and algorithms, many Internet service architects and engineers throughout the 2000s discarded traditional database semantics and transactional models in favor of weaker, less principled models: eventual consistency, few if any multi-object, multi-operation (i.e., transactional) guarantees, and ad-hoc application-specific compensation—collectively, much of “NoSQL” [1]. From a research perspective, this space of weaker models has proven to be a fertile area: new (or re-discovered), often esoteric, and almost always nuanced semantics present many opportunities for new systems design, optimizations, and formal study.
Unfortunately, these weaker models come with serious usability disadvantages: programmability suffers. Understanding the implications of non-serializable isolation models for end-user applications is difficult. Programmers have little practical guidance as to how to choose an appropriate model for their applications, and understanding the differences between models effectively requires graduate-level training in distributed systems and/or database theory [2]. Some members of the Internet services industry that birthed the resurgence of interest in these semantics have begun to push back against them: one recent and prominent industrial account unequivocally claims that “designing applications to cope with concurrency anomalies in their data is…ultimately not worth the performance gains” [15]. Statements like these (which, in our experience, enjoy some popularity among practitioners and considerable acceptance in the database community [17]) suggest that, as a research community centered on weak consistency, we are failing to communicate and demonstrate the benefits achievable with these semantics, have underestimated the burden placed on programmers, or both.
Coordination-Free Execution: A Unifying Principle Behind “AP” Benefits
In tribute to the CAP Theorem [11] that widely popularized these trade-offs, much of the dialogue around weakly consistent models concerns the availability of operations under failures. Availability is an important property, but, in our opinion, a sole focus on availability undervalues the benefits of weak semantics. Daniel Abadi has successfully argued that, while “availability” is primarily relevant in the presence of failures, weakly consistent (“AP”) systems can also offer low latency [1]. Any replica can respond to any request, avoiding many communication delays—in our experience, average LAN round-trip times can be up to 720x lower than WAN round-trip times.
We would take Abadi’s position even further: weak consistency also allows aggressive scale-out, even at the level of a single data item—more servers can be added without communication between them. Modern, strongly consistent “NewSQL” systems can indeed provide horizontal scale-out using shared-nothing techniques popularized in the 1980s [16]. However, especially for worst-case accesses, these systems are far from “as scalable as NoSQL” [15] systems offering weak isolation. In recent research, we have examined the throughput penalties associated with these “strong” models: in modern LAN and WAN networks, respectively, distributed serializable transactions face worst-case throughput limits of approximately 1200 and 12 read-write transactions per item per second, independent of implementation strategy [6]. Recent systems [10, 15] are no exception: operations over disjoint data items can proceed concurrently, increasing throughput, but conflicting operations over non-disjoint data items are limited by network latency. In contrast, appropriate implementations of weak consistency face no such overheads.
The three properties above—availability, low latency, and scale-out—are consequences of a more fundamental principle underlying weakly consistent systems: a lack of synchronous communication, or coordination, between concurrent operations. If operations can execute coordination-free, they can run concurrently, on any available resources, without communicating with or otherwise stalling concurrent operations. The cost of coordination is easily and simultaneously cast in the form of (minority) unavailability, latency (minimum 1 RTT), and throughput (maximum 1/RTT). Moreover, and more importantly, the concept of coordination-free execution is portable to a range of system architectures: whereas traditional formulations of availability are inherently tied to physical replication, coordination-freedom is a property of the execution strategy and is independent of physical deployment or topology. For example, a system providing clients with snapshot reads can effectively act as a coordination-free “replicated” system even if implemented by a set of linearizable multi-versioned masters [7]. Using coordination judiciously, only where correct application execution requires it, is what we call coordination avoidance [6]: the use of as little coordination as possible.
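To make the latency and throughput costs concrete, here is a back-of-the-envelope sketch of the 1/RTT per-item bound. The round-trip times are illustrative assumptions chosen to roughly reproduce the 1200 and 12 transactions-per-item-per-second figures quoted above; they are not measurements from [6].

```python
# Worst-case per-item throughput under coordination: conflicting operations on
# a single item can complete at most once per round trip, so throughput is
# bounded by 1/RTT. The RTT values below are illustrative assumptions.

def max_conflicting_txns_per_item_per_sec(rtt_seconds: float) -> float:
    """Upper bound on conflicting transactions per item per second: 1/RTT."""
    return 1.0 / rtt_seconds

for label, rtt_s in [("LAN, ~0.83 ms RTT", 0.00083),
                     ("WAN, ~83 ms RTT", 0.083)]:
    bound = max_conflicting_txns_per_item_per_sec(rtt_s)
    print(f"{label}: at most ~{bound:.0f} txns/item/s")
```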
Bridging the Gap: Experiences Applying Coordination Avoidance
As F1’s authors highlight above, the decision to consider coordination-avoiding algorithms or not requires a cost-benefit judgment [8]: will performance, availability, or latency benefits outweigh the cost of ascertaining whether weak models are sufficient? In a sober assessment, many applications will likely be able to (over-)pay for “strong consistency”: single-site operations are inexpensive [16], while improvements in datacenter networks [19] lower the cost for non-geo-replicated systems. Yet, a large class of applications—for example, non-partitionable applications [9], applications with high mutation rates (i.e., write contention) [18], and geo-replicated applications [20]—will continue to be sensitive to extraneous coordination costs and will likely necessitate further study.
Identifying and serving this latter class of applications is paramount to ensuring the future adoption of coordination-avoiding algorithms. While this represents a difficult task, we offer three examples from our recent research:
- High-value, well-specified applications are ripe for optimization. We have found success in coordination-avoiding optimization of high-value workloads. As an example, we recently combined our work on RAMP transactions with our results on necessary and sufficient conditions for coordination-free execution to attain a 25x improvement in throughput on the traditionally challenging TPC-C OLTP benchmark [6, 18]. As the industry- and academic-standard benchmark for transactional performance, TPC-C is accompanied by a rigorous specification for compliance, which acted as a correctness specification in our coordination analysis and was instrumental in guiding our implementation strategy. Beyond TPC-C, we have encountered few database workloads and integrity constraints that require coordination for all queries: rather, many queries are amenable to coordination-free execution, while a handful (like TPC-C New-Order ID assignment) require coordination for correctness. The resulting challenge is two-fold: first, identify the operations that do require coordination (ideally few or none), and second, determine an appropriate coordination-free execution plan for those that do not.
- Applications on ACID databases (surprisingly often) do not use ACID transactions. Traditional databases also have an equivalent of weak consistency: although they were not explicitly developed for distributed environments, databases provide a range of “weak isolation” models, such as Read Committed and Repeatable Read. Many “ACID” databases today provide these weak isolation guarantees by default (in a recent survey we performed, only 3 of 18 databases offered serializability by default) and sometimes as the strongest level supported (e.g., Oracle 11g) [5]. Existing applications deployed on these systems are necessarily either tolerant of these weak semantics or must otherwise compensate for them, hinting at opportunities for optimization. Moreover, while many of these weak semantics are not typically implemented in a coordination-free manner (e.g., relying on locking or validation protocols)—a remnant of their single-node origins—they are often, in fact, implementable without resorting to coordination. We recently classified commonly-deployed isolation models according to whether or not they are achievable as Highly Available Transactions [5]; a rough rendering of that classification appears after this list. Existing applications deployed on “HAT” isolation models are excellent candidates for study in coordination-free environments.
- ACID databases are not built using ACID transactions. High-performance database internals are maintained using specialized, highly optimized algorithms that are carefully designed to maximize safe concurrency [12] (e.g., in the case of indexing, exotic data structures like B-link trees [13]). The database designer does not use serializable access to the database data structures for at least two reasons. First, doing so would be prohibitively expensive, and, second, the expert designer does not need to: she has a well-defined specification (e.g., secondary index lookup behavior) that she can use to ensure correctness (without end-user intervention).
In a distributed database system, coordination-avoiding techniques are similarly applicable to internal data structure implementation. As experts in both databases and fast distributed algorithms, we can ensure that the anomalies of our weakly consistent but fast algorithms do not interfere with application-level correctness; we can encapsulate the side effects of weak semantics behind well-defined (and existing) interfaces. Our recent work on Read Atomic Multi-Partition (RAMP) transactions was developed in this context and, as motivating use cases, focuses on foreign key constraint maintenance, distributed secondary indexing, and materialized view maintenance [7]. Our focus on coordination avoidance yielded algorithms that consistently outperformed alternatives, especially those based on synchronous coordination such as locking. A simplified sketch of a RAMP-style read and write path appears just after this list.
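Below is a minimal, single-process sketch in the spirit of the RAMP algorithms in [7]: multi-key writes attach their write set as metadata, and reads use that metadata to detect and repair “fractured” reads in a second round. This is a toy rendering for intuition only; the class name and structure are ours, and partitions, failures, concurrency control, and garbage collection are all omitted (see [7] for the actual algorithms).

```python
# A toy, single-process sketch in the spirit of RAMP transactions [7]:
# multi-key writes attach their write set as metadata, and two-round reads
# use that metadata to repair fractured reads. Not the paper's implementation.
import itertools

class RampStyleStore:
    def __init__(self):
        self._ts = itertools.count(1)
        self.versions = {}      # (key, ts) -> value
        self.metadata = {}      # (key, ts) -> sibling keys written in the same txn
        self.last_commit = {}   # key -> highest committed ts

    def put_all(self, writes):
        """Install a multi-key write set with read-atomic visibility."""
        ts = next(self._ts)
        keys = set(writes)
        # Phase 1 (prepare): versions exist but are not yet the latest committed.
        for k, v in writes.items():
            self.versions[(k, ts)] = v
            self.metadata[(k, ts)] = keys - {k}
        # Phase 2 (commit): advance per-key commit pointers.
        for k in keys:
            self.last_commit[k] = max(self.last_commit.get(k, 0), ts)
        return ts

    def get_all(self, keys):
        """Two-round read: fetch latest versions, then repair fractured reads."""
        result, ts_of = {}, {}
        for k in keys:
            ts = self.last_commit.get(k, 0)
            result[k] = self.versions.get((k, ts))
            ts_of[k] = ts
        # A read of key k is fractured if a returned sibling version names k in
        # its write set at a higher timestamp than the version returned for k.
        required = {}
        for k in keys:
            for sib in self.metadata.get((k, ts_of[k]), ()):
                if sib in ts_of and ts_of[k] > ts_of[sib]:
                    required[sib] = max(required.get(sib, 0), ts_of[k])
        # Second round: fetch the specific missing versions by timestamp.
        for k, ts in required.items():
            result[k] = self.versions[(k, ts)]
        return result

# Example: maintain a secondary index entry atomically with the base record.
store = RampStyleStore()
store.put_all({"user:1:name": "Ada", "index:name:Ada": "user:1"})
print(store.get_all({"user:1:name", "index:name:Ada"}))
```

The example mirrors the secondary-indexing use case: the base record and its index entry are installed together, so a reader never observes one without the other.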
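Returning to the second example above (existing applications on weak isolation), the sketch below gives a rough, partial rendering of the classification from our Highly Available Transactions work [5]: which isolation guarantees are achievable without coordination and which are not. The table is simplified and incomplete, and the function name is ours; [5] has the precise definitions and caveats.

```python
# Partial, simplified classification of isolation guarantees as achievable
# without coordination (HAT-compliant) or not, paraphrasing [5].
HAT_ACHIEVABLE = {
    "Read Uncommitted": True,
    "Read Committed": True,
    "Monotonic Atomic View": True,
    "Cursor Stability": False,    # must prevent lost updates
    "Snapshot Isolation": False,  # must prevent lost updates
    "Serializability": False,
    "Linearizability": False,     # recency guarantees require coordination
}

def coordination_free_candidate(default_isolation_level: str) -> bool:
    """Applications already running at a HAT-compliant level tolerate its
    anomalies, making them candidates for coordination-free execution.
    Unknown levels are conservatively treated as requiring coordination."""
    return HAT_ACHIEVABLE.get(default_isolation_level, False)

print(coordination_free_candidate("Read Committed"))      # True
print(coordination_free_candidate("Snapshot Isolation"))  # False
```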
Overall, we have found success in i.) focusing on existing applications and ii.) incorporating existing specifications to enable coordination-avoiding execution. Towards these goals, our ongoing research directly incorporates application-level invariants (derived from real-world applications and the SQL language) for analysis under a necessary and sufficient condition for coordination-free execution [6], as in the rough sketch below. The use of application-level invariants is key to safely maximizing concurrency without requiring programmer expertise in weak isolation models. By focusing on existing invariants and specifications as above, we can provide tangible improvements without necessarily affecting end-user experience.
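To give a flavor of that analysis, the sketch below pairs a few representative invariant and operation combinations with whether they can be maintained without coordination, loosely paraphrasing the classification in [6], and splits a workload accordingly (the two-fold challenge noted in the TPC-C example above). The entries and helper function are illustrative, not the paper’s full taxonomy.

```python
# Illustrative pairing of invariant/operation combinations with whether they
# can be maintained coordination-free, loosely paraphrasing the analysis in
# [6]. Unknown combinations are conservatively assumed to need coordination.
COORDINATION_FREE = {
    ("foreign key constraint", "insert referencing row"): True,
    ("secondary index maintenance", "insert/update"): True,
    ("materialized view maintenance", "insert/update"): True,
    ("uniqueness constraint", "insert"): False,
    ("sequential ID assignment", "insert"): False,  # e.g., TPC-C New-Order IDs
    ("non-negative stock (x >= 0)", "decrement"): False,
    ("non-negative stock (x >= 0)", "increment"): True,
}

def split_workload(operations):
    """Separate operations that can run coordination-free from those that
    must coordinate; only the latter pay the 1/RTT per-item penalty."""
    free = [op for op in operations if COORDINATION_FREE.get(op, False)]
    coordinated = [op for op in operations if not COORDINATION_FREE.get(op, False)]
    return free, coordinated

free, coordinated = split_workload([
    ("foreign key constraint", "insert referencing row"),
    ("sequential ID assignment", "insert"),
])
print("coordination-free:", free)
print("requires coordination:", coordinated)
```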
A Coordinated Future
The continued success of weakly consistent systems requires a demonstration of and focus on utility. Delivering an understanding of exactly why these weakly consistent semantics can provide (in a fundamental sense) greater availability, lower latency, and higher throughput is paramount; our proposed focus on coordination avoidance is our attempt at providing a unified answer. By applying the lens of coordination avoidance to a range of existing, well-defined and ideally high-value domains, we have the opportunity to demonstrate exactly when weak consistency is adequate and, equally importantly, when it is not. Without a full specification, language techniques [3, 4] and library-based optimizations [14] are helpful to programmers. However, with a full specification and increased knowledge of application semantics [6, 7], we can fully realize the benefits of coordination avoidance while further mitigating programmer burden. While coordination cannot always be avoided, we are bullish on a continued ability to effectively manage it.
References
[1] D. J. Abadi. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. IEEE Computer, 45(2):37–42, 2012.
[2] P. Alvaro, P. Bailis, N. Conway, and J. M. Hellerstein. Consistency without borders. In SoCC 2013.
[3] P. Alvaro, N. Conway, J. M. Hellerstein, and D. Maier. Blazes: Coordination analysis for distributed programs. In ICDE 2014.
[4] P. Alvaro, N. Conway, J. M. Hellerstein, and W. Marczak. Consistency analysis in Bloom: a CALM and collected approach. In CIDR 2011.
[5] P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Highly Available Transactions: Virtues and Limitations. In VLDB 2014.
[6] P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Coordination-Avoiding Database Systems. arXiv:1402.2237, 2014.
[7] P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Scalable Atomic Visibility with RAMP Transactions. In SIGMOD 2014.
[8] P. Bailis and A. Ghodsi. Eventual Consistency Today: Limitations, extensions, and beyond. ACM Queue, 11(3), 2013.
[9] N. Bronson et al. Tao: Facebook’s distributed data store for the social graph. In USENIX ATC 2013.
[10] J. C. Corbett et al. Spanner: Google’s globally-distributed database. In OSDI 2012.
[11] S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 33(2):51–59, 2002.
[12] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[13] P. L. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM TODS, 6(4):650–670, 1981.
[14] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. A comprehensive study of convergent and commutative replicated data types. Technical Report 7506, INRIA, 2011.
[15] J. Shute et al. F1: A distributed SQL database that scales. In VLDB 2013.
[16] M. Stonebraker. The case for shared nothing. IEEE Database Engineering Bulletin, 9(1):4–9, 1986.
[17] M. Stonebraker. Why enterprises are uninterested in NoSQL. ACM Queue Blog, September 2010.
[18] TPC Council. TPC-C Benchmark Revision 5.11. 2010.
[19] Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail: avoiding long tails in the cloud. In NSDI 2013.
[20] M. Zawirski, A. Bieniusa, V. Balegas, S. Duarte, C. Baquero, M. Shapiro, and N. Preguiça. SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client machine. arXiv:1310.3107, 2013.