Without Conflicts, Serializability Is Free
14 Apr 2014
Common pitches for modern, serializable databases include claims that they are “as scalable as NoSQL,” they “combine the speed and scale advantages of NoSQL systems with ACID guarantees,” or they demonstrate that “the scalability, fault-tolerance, and performance of NoSQL databases are still achievable with [serializable] transactions.” These claims are somewhat misleading, and here’s why:
Any two operations on the same data, at least one of which is a write, can compromise serializability (the illusion of a sequential execution) if executed concurrently. So, when executing transactions that read and write the same data, a database has to stall some of them in order to preserve serializability. If a workload bottlenecks on this read/write synchronization, adding physical resources like extra servers won't improve throughput.
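To make that definition of a conflict concrete, here is a minimal sketch in Python. The `Txn` type and the `conflicts` helper are hypothetical names made up for illustration, and the sketch assumes each transaction's read and write sets are known up front, which real databases generally cannot assume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction summary: just the keys it reads and writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a: Txn, b: Txn) -> bool:
    """True if a and b share a key and at least one of them writes it."""
    return bool(
        a.writes & b.writes      # both write the same key
        or a.writes & b.reads    # a writes a key that b reads
        or a.reads & b.writes    # a reads a key that b writes
    )


# Illustrative transactions (assumed for this example only).
increment_x = Txn("increment_x", reads=frozenset({"x"}), writes=frozenset({"x"}))
read_x = Txn("read_x", reads=frozenset({"x"}))
increment_y = Txn("increment_y", reads=frozenset({"y"}), writes=frozenset({"y"}))
```

Here `increment_x` and `read_x` conflict because one writes what the other reads, while `increment_x` and `increment_y` touch different keys and do not. Two read-only transactions on the same key never conflict.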
In contrast, a NoSQL system like Riak or Cassandra offering “weak” consistency can avoid these synchronization bottlenecks. Additional servers can process additional requests in parallel, without communicating. This provides availability, low latency, and scalability—even for single-record accesses—allowing literally unbounded throughput. Of course, there’s no free lunch: these scalable systems provide weaker guarantees that can—but do not always—compromise application-level consistency.
However, for operations over disjoint data (that is, for transactions with no read-write or write-write conflicts), serializable databases can perform as well as weakly consistent systems. Under these workloads, operations need no synchronization and can safely execute concurrently. This is why I say that without conflicts, serializability is free.
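A toy scheduler illustrates the difference between the two kinds of workload. This is a sketch under simplifying assumptions, not how any particular database works: transactions are hypothetical `(reads, writes)` pairs of key sets known in advance, and `schedule_rounds` is an invented helper that places each transaction in the earliest "round" after every earlier transaction it conflicts with. Transactions in the same round never conflict, so they could run in parallel without coordinating.

```python
def conflicts(a, b):
    """a and b are (reads, writes) pairs of sets; True if they share a key
    and at least one of the two transactions writes it."""
    a_reads, a_writes = a
    b_reads, b_writes = b
    return bool(a_writes & b_writes or a_writes & b_reads or a_reads & b_writes)


def schedule_rounds(txns):
    """Greedily assign each transaction to the earliest round that comes
    after all earlier transactions it conflicts with.

    Returns a list of rounds, each a list of indices into txns. Conflicting
    transactions keep their input order, so the result is equivalent to
    running the transactions serially in that order.
    """
    placed = []   # placed[i] is the round index chosen for txns[i]
    rounds = []
    for i, txn in enumerate(txns):
        r = 0
        for j in range(i):
            if conflicts(txns[j], txn):
                r = max(r, placed[j] + 1)
        placed.append(r)
        while len(rounds) <= r:
            rounds.append([])
        rounds[r].append(i)
    return rounds


# Four read-modify-write transactions, each on its own key.
disjoint = [({k}, {k}) for k in ("a", "b", "c", "d")]
# Four read-modify-write transactions, all on one hot key.
hot_key = [({"x"}, {"x"}) for _ in range(4)]

print(schedule_rounds(disjoint))  # [[0, 1, 2, 3]]
print(schedule_rounds(hot_key))   # [[0], [1], [2], [3]]
```

The disjoint workload fits in a single round, so extra servers translate directly into extra throughput. The hot-key workload needs one round per transaction no matter how many servers are available.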
Will exploiting this disjoint data parallelism result in a quantum leap in distributed database design? Mike Stonebraker would probably say “no”. Database system designs have optimized for data-parallel access patterns for decades: “shared nothing” serializable databases provide excellent programmability and perform well—just not for all workloads.
Anyone providing strong semantics and claiming absolute performance, latency, or availability parity with “NoSQL” is either confused about database isolation, isn’t running workloads with conflicts, or is just trying to sell you a database. In practice, your mileage may vary: understand your read-write conflict patterns, and plan accordingly.