MSR Silicon Valley Systems Projects I Have Loved

19 Sep 2014

Microsoft confirmed yesterday that it’s shuttering its Silicon Valley lab, home to 75 brilliant Computer Science researchers including 2013 Turing Award winner Leslie Lamport. Others have more and wiser things to say about this decision. However, I want to highlight some of the fantastic work that’s come out of MSR Silicon Valley in the recent past in my research area of databases and distributed systems:

Dryad and DryadLINQ were hugely influential systems that challenged the then-popular notions that high-performance distributed dataflow had to be (i) limited to simple map and reduce tasks and (ii) hard to program. Ideas such as Dryad’s general-purpose DAG execution model, flexible and lightweight data transfers, and lineage-based recovery model can be found in almost every later distributed dataflow system, from Microsoft SCOPE to Apache Tez and Spark. DryadLINQ provided language-integrated access to the Dryad engine, which set the stage for abstractions like RDDs and the return of automatic and online query optimizers. Both are now open source on GitHub.
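
To make the "general-purpose DAG" idea concrete, here’s a toy sketch in Python. It is not Dryad’s actual API: `run_dag`, the vertex names, and the single-process execution are all assumptions made for illustration. The point is only that each vertex runs once all of its upstream outputs are ready, so the job can have any acyclic shape (here, two inputs merged and then fanned out to two consumers) rather than a fixed map-then-reduce pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(vertices, edges, inputs):
    """Run a DAG of functions in dependency order (hypothetical helper).

    vertices: name -> function taking a list of upstream outputs
    edges:    name -> list of upstream names (list order = argument order)
    inputs:   name -> precomputed value for the source nodes
    """
    results = dict(inputs)
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: edges.get(name, []) for name in vertices}
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # source data, nothing to compute
        upstream = [results[u] for u in edges.get(name, [])]
        results[name] = vertices[name](upstream)
    return results


vertices = {
    "tokenize_a": lambda ins: ins[0].split(),
    "tokenize_b": lambda ins: ins[0].split(),
    "merge": lambda ins: ins[0] + ins[1],
    "count": lambda ins: len(ins[0]),
    "distinct": lambda ins: sorted(set(ins[0])),
}
edges = {
    "tokenize_a": ["doc_a"],
    "tokenize_b": ["doc_b"],
    "merge": ["tokenize_a", "tokenize_b"],
    "count": ["merge"],
    "distinct": ["merge"],
}
inputs = {"doc_a": "to be or", "doc_b": "not to be"}

out = run_dag(vertices, edges, inputs)
print(out["count"], out["distinct"])
```

A real engine would also place vertices on machines, move data between them over files, pipes, or the network, and re-run failed vertices; none of that is modeled here.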

Papers: Dryad @ EuroSys 2007; DryadLINQ @ OSDI 2008; Optimus @ EuroSys 2013

Naiad is a more recent system for streaming, cyclic, distributed dataflow. Like Dryad, think of it as MapReduce generalized to arbitrary task graphs, but one that also combines low-latency incremental processing with large-scale batch operation. At Naiad’s core is a new dataflow abstraction called timely dataflow that allows cyclic computations to safely proceed in parallel. The team’s also been working on some exciting extensions such as graph processing. This research won Best Paper at SOSP 2013, one of the highest honors in the systems community, and is open source on GitHub. The team’s earlier work on differential dataflow illustrated the potential of this style of efficient fixpoint processing.

Papers: Naiad @ SOSP 2013; Differential Dataflow @ CIDR 2013

CORFU (Clusters of Raw Flash Units) is a system that exposes a cluster of servers loaded with flash drives as a high-throughput shared log abstraction. This is, in itself, a challenging distributed systems problem that the team solved elegantly via a mix of fast sequencing and clever protocol design. However, CORFU’s power is perhaps better demonstrated by Tango, a system the researchers built on top. Tango demonstrates how to build fault-tolerant, high-performance distributed data structures such as trees and maps, along with serializable transactions over them, by making efficient use of the shared log abstraction. This architecture is not only creative, but it’s a great use of modern hardware with excellent empirical results to boot.
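
If you haven’t seen the shared-log idea before, here’s a rough sketch of how a map can live entirely in a log. The class and method names (`SharedLog`, `LogBackedMap`, `read_from`) are hypothetical and are not CORFU’s or Tango’s API. Everything is in-memory and single-process, so there is no sequencer, replication, or flash here, and transactions are left out. The sketch only shows the core trick: writes are appends, and each client rebuilds its view by replaying the log in order.

```python
class SharedLog:
    """In-memory stand-in for a shared log (hypothetical API)."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        # In this toy version the list index plays the role of the
        # globally agreed log position.
        self._entries.append(entry)
        return len(self._entries) - 1

    def read_from(self, position):
        # Return a copy of every entry at or after `position`.
        return self._entries[position:]


class LogBackedMap:
    """A map whose only authoritative state is the shared log."""

    def __init__(self, log):
        self._log = log
        self._view = {}      # local, rebuildable cache of the map
        self._applied = 0    # next log position this client must apply

    def put(self, key, value):
        self._log.append(("put", key, value))

    def delete(self, key):
        self._log.append(("delete", key, None))

    def get(self, key, default=None):
        self._sync()
        return self._view.get(key, default)

    def _sync(self):
        # Replay everything appended since the last sync, in log order.
        for op, key, value in self._log.read_from(self._applied):
            if op == "put":
                self._view[key] = value
            else:
                self._view.pop(key, None)
            self._applied += 1


log = SharedLog()
client_a = LogBackedMap(log)
client_b = LogBackedMap(log)

client_a.put("x", 1)
client_b.put("y", 2)
client_a.delete("x")

print(client_b.get("x"), client_b.get("y"), client_a.get("y"))
```

Because both clients apply the same entries in the same order, their views agree once they have caught up to the same log position, without the clients ever talking to each other directly.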

Papers: CORFU @ NSDI 2012; Tango @ SOSP 2013

MSR Silicon Valley has been on the bleeding edge of distributed data serving. Doug Terry (of Bayou fame) and others have built a system called Pileus that allows fine-grained control over consistency, expressed as SLAs, in geo-replicated storage systems. Marcos Aguilera and others have similarly been working on fast, wide-area transaction execution, from snapshot isolation in Walter to serializability (leveraging one of my favorite ideas in concurrency control: transaction chopping) via transaction chains in Lynx. Both of these lines of work are highly relevant to the ongoing redesign of large-scale cloud databases; it’s great to see services like Microsoft Azure DocumentDB adopt ideas like Pileus’s tunable consistency.
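
For a feel of what a consistency SLA might look like to a client, here’s a deliberately simplified sketch. It assumes an SLA is a list of (consistency, latency bound, utility) entries and that the client picks the replica that satisfies the highest-utility entry. The names (`SubSLA`, `Replica`, `choose`) and the two-level "strong"/"eventual" model are made up for illustration, and latencies are treated as known numbers, whereas a real system would have to work from estimates.

```python
from dataclasses import dataclass


@dataclass
class SubSLA:
    consistency: str       # "strong" or "eventual" in this sketch
    max_latency_ms: float
    utility: float         # how much the application values this outcome


@dataclass
class Replica:
    name: str
    latency_ms: float      # assumed known round-trip time from the client
    is_primary: bool       # assumption: only the primary serves strong reads


def choose(sla, replicas):
    """Return the (replica, sub_sla) pair with the highest utility that
    can be met, or None if no replica satisfies any entry."""
    best = None
    for sub in sla:
        for replica in replicas:
            consistent = replica.is_primary or sub.consistency == "eventual"
            fast_enough = replica.latency_ms <= sub.max_latency_ms
            if consistent and fast_enough:
                if best is None or sub.utility > best[1].utility:
                    best = (replica, sub)
    return best


sla = [
    SubSLA("strong", 100, 1.0),     # ideal: fresh data within 100 ms
    SubSLA("eventual", 50, 0.5),    # acceptable: possibly stale but quick
    SubSLA("eventual", 1000, 0.1),  # last resort: any answer at all
]

# A client far from the primary but close to a secondary.
far_client = [Replica("primary", 150, True), Replica("local", 10, False)]
# A client close to the primary.
near_client = [Replica("primary", 20, True), Replica("local", 10, False)]

far_choice = choose(sla, far_client)
near_choice = choose(sla, near_client)
print(far_choice[0].name, far_choice[1].consistency)
print(near_choice[0].name, near_choice[1].consistency)
```

The same SLA yields a strong read for the nearby client and a fast, possibly stale read for the distant one, which is the kind of per-request tunability described above.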

Papers: Pileus @ SOSP 2013; Walter @ SOSP 2011; Lynx @ SOSP 2013

There’s too much to list. But here are a few more anyway: Nectar (OSDI 2010) automates the caching and reuse of intermediate results in data-parallel compute systems. Dandelion (SOSP 2013) provides a language-integrated automated runtime for running applications on both data-parallel compute clusters and GPUs. Quincy (SOSP 2009) pioneered the study of fair cluster scheduling algorithms. Dahlia Malkhi has done and continues to do amazing work on distributed algorithms (e.g., PODC 2014) in addition to working on systems projects such as CORFU. And, of course, Leslie Lamport’s recent work on TLA+ continues to make waves, for example at Amazon.

You’ve probably heard of many of the brilliant folks behind this work before: Doug Terry, Dahlia Malkhi, Martin Abadi, Michael Isard, Derek Murray, Frank McSherry, Marcos Aguilera, Yuan Yu, and the list goes on. And, as I’ve said, there have been many others and many other exciting projects (and entire groups outside distributed systems) at MSR Silicon Valley.

Fortunately, MSR still has other branches—for example, many of the researchers studying core database issues are in Redmond. However, the above projects help illustrate why MSR Silicon Valley was such a research powerhouse and a welcome industrial neighbor to the west.

You can follow me on Twitter here.


Peter Bailis :: Highly Available, Seldom Consistent
