Without Conflicts, Serializability Is Free

14 Apr 2014

Common pitches for modern, serializable databases include claims that they are “as scalable as NoSQL,” they “combine the speed and scale advantages of NoSQL systems with ACID guarantees,” or they demonstrate that “the scalability, fault-tolerance, and performance of NoSQL databases are still achievable with [serializable] transactions.” These claims are somewhat misleading, and here’s why:

Any two operations on the same data, at least one of which is a write, can compromise serializability (the illusion of a sequential execution) if they execute concurrently. So, when executing transactions that read and write the same data, a database has to stall some of those transactions in order to preserve serializability. If a workload bottlenecks on this read/write synchronization, adding physical resources like extra servers won’t improve throughput.
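To make that definition concrete, here is a minimal sketch in Python. It assumes each transaction can be summarized by the set of keys it reads and the set it writes. The `Txn` class and the `conflicts` function are names made up for illustration, not any database's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Txn:
    """A transaction summarized by the keys it reads and writes (hypothetical model)."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def conflicts(a: Txn, b: Txn) -> bool:
    """True if a and b touch a common key and at least one of them writes it."""
    return bool(
        a.writes & b.writes      # both write the same key
        or a.writes & b.reads    # a writes a key that b reads
        or a.reads & b.writes    # b writes a key that a reads
    )


t1 = Txn("t1", reads=frozenset({"x"}), writes=frozenset({"x"}))
t2 = Txn("t2", reads=frozenset({"x"}), writes=frozenset({"y"}))
t3 = Txn("t3", reads=frozenset({"x"}))
t4 = Txn("t4", reads=frozenset({"z"}), writes=frozenset({"z"}))

print(conflicts(t1, t2))  # True: t1 writes x, t2 reads x
print(conflicts(t2, t3))  # False: both only read x
print(conflicts(t1, t4))  # False: the key sets are disjoint
```

Two readers of the same key never conflict, and neither do transactions over disjoint keys. Only the first pair would force a serializable database to order one transaction before the other.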

In contrast, a NoSQL system like Riak or Cassandra offering “weak” consistency can avoid these synchronization bottlenecks. Additional servers can process additional requests in parallel, without communicating. This provides availability, low latency, and scalability, even for accesses to a single record, so throughput can keep growing as servers are added. Of course, there’s no free lunch: these scalable systems provide weaker guarantees that can (but do not always) compromise application-level consistency.
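As a rough illustration of "can, but do not always," here is a toy model in Python. It assumes two replicas that accept writes without talking to each other and later reconcile with a last-writer-wins merge on timestamps. This is a deliberate simplification for the sake of the example, not a description of how Riak or Cassandra is implemented. `lww_merge` and `increment` are hypothetical helpers.

```python
def lww_merge(a, b):
    """Merge two replica states key by key, keeping the value with the higher timestamp."""
    merged = dict(a)
    for key, (value, ts) in b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged


def increment(replica, key, ts):
    """Read-modify-write on one replica, with no coordination with other replicas."""
    value, _ = replica.get(key, (0, 0))
    replica[key] = (value + 1, ts)


# Conflicting case: both replicas increment the same counter concurrently.
r1, r2 = {"hits": (0, 0)}, {"hits": (0, 0)}
increment(r1, "hits", ts=1)
increment(r2, "hits", ts=2)
same_key = lww_merge(r1, r2)
print(same_key["hits"][0])  # 1, although two increments were accepted

# Disjoint case: each replica increments a different counter.
r1, r2 = {}, {}
increment(r1, "hits:a", ts=1)
increment(r2, "hits:b", ts=2)
disjoint = lww_merge(r1, r2)
print(disjoint)  # both increments survive the merge
```

In the first case an update is silently lost. In the second, the lack of coordination costs nothing, because the two writes never touched the same data.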

However, for operations over disjoint data (that is, for transactions with no read-write or write-write conflicts), serializable databases can perform as well as weakly consistent systems. Under these workloads, there’s no need for synchronization between operations, which can safely execute concurrently. This is why I say that without conflicts, serializability is free.
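One way to see this is a small sketch in Python. It assumes each transaction is given as a `(reads, writes)` pair of key sets, and that a hypothetical `schedule` function places every transaction in the earliest "wave" that comes after all earlier transactions it conflicts with. Transactions in the same wave can run in parallel, and the number of waves is the number of steps that must happen one after another no matter how many servers are available. This is an illustration, not how any particular database schedules work.

```python
def conflicts(a, b):
    """a and b are (reads, writes) pairs of key sets."""
    (ra, wa), (rb, wb) = a, b
    return bool(wa & wb or wa & rb or ra & wb)


def schedule(txns):
    """Group transaction indices into waves; conflicting transactions keep their input order."""
    wave_of = []
    for i, txn in enumerate(txns):
        # Waves of all earlier transactions that this one conflicts with.
        earlier = [wave_of[j] for j in range(i) if conflicts(txns[j], txn)]
        wave_of.append(max(earlier) + 1 if earlier else 0)
    waves = [[] for _ in range(max(wave_of, default=-1) + 1)]
    for i, w in enumerate(wave_of):
        waves[w].append(i)
    return waves


# Four transactions, each reading and writing its own key.
disjoint = [({k}, {k}) for k in "abcd"]
# Four transactions that all read and write the same hot key.
hot = [({"x"}, {"x"}) for _ in range(4)]

print(schedule(disjoint))  # [[0, 1, 2, 3]]
print(schedule(hot))       # [[0], [1], [2], [3]]
```

The disjoint workload fits in a single wave, so four servers could each run one transaction with no coordination. The hot-key workload needs four waves, and extra servers would sit idle.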

Will exploiting this disjoint data parallelism result in a quantum leap in distributed database design? Mike Stonebraker would probably say “no”. Database system designs have optimized for data-parallel access patterns for decades: “shared nothing” serializable databases provide excellent programmability and perform well—just not for all workloads.

Anyone providing strong semantics and claiming absolute performance, latency, or availability parity with “NoSQL” is either confused about database isolation, isn’t running workloads with conflicts, or is just trying to sell you a database. In practice, your mileage may vary: understand your read-write conflict patterns, and plan accordingly.
