Slide 1

Slide 1 text

Lessons Learned and Questions Raised (from building distributed systems) Andy Gross <@argv0> Basho Technologies Thursday, May 16, 13

Slide 2

Slide 2 text

my story

Slide 3

Slide 3 text

Which one of these is not like the others?

Slide 4

Slide 4 text

“Andy, I studied with Eric Brewer. I know Eric Brewer. Eric Brewer is a friend of mine. Andy, you're no Eric Brewer.”

Slide 5

Slide 5 text


Slide 6

Slide 6 text

LOAD "RICON", 8, 1
PRESS PLAY ON TAPE
LOADING
READY.

Slide 7

Slide 7 text

My Story
C64 Wardialer (~1990)
Inventory Management System (MS Access + VBA) (~1998)
Large network deployment tools (1999)
Content storage system (2000)
IP Overlay Network (2005)
Distributed File System (2006)
Riak (2007-)
Riak CS (2011-)

Slide 8

Slide 8 text

The Free Lunch is Over
2005 paper by Herb Sutter
“A fundamental turn towards concurrency in software”
Today: A fundamental turn towards distributed systems

Slide 9

Slide 9 text

The Distributed Systems Renaissance
We’re all distributed systems people now
Reasons:
  Larger problems
  Increased expectations
Problems:
  This stuff is hard

Slide 10

Slide 10 text

“Thank goodness we don't have only serious problems, but ridiculous ones as well.” - Dijkstra

Slide 11

Slide 11 text

Revival and Renewal
Of interest in formal specification and verification
Of interest in consensus protocols
Of new programming languages and paradigms to deal with the complexity of distributed systems
Of databases!

Slide 12

Slide 12 text

On Abstractions
They’re the means by which we reason about complicated things
They’re the means by which we make progress in software
But...

Slide 13

Slide 13 text

... ad nauseam

Slide 14

Slide 14 text

Where’s my libPaxos?
Modern operating systems should have consensus capabilities in the OS
VMS had DLM (Distributed Lock Manager)
We’ve regressed!
If Linux can have 50 toy filesystems, why can’t we have Paxos?

Slide 15

Slide 15 text

Where’s my libARIES?
Write-ahead logging should also be a reusable primitive
Historically hard to implement in a layered fashion
Stasis (http://code.google.com/p/stasis) is a good start

Slide 16

Slide 16 text

Riak Core
Dynamo Abstracted
Not specific to databases
Reasonably successful, despite sparse documentation:
  Multiple large production deployments (Yahoo, OpenX, StackMob)
  Used in a few university systems classes

Slide 17

Slide 17 text

Riak Architecture (diagram labels)
Erlang/OTP Runtime
Riak KV
Client APIs: HTTP, Protocol Buffers, Erlang local client
Request Coordination: get, put, delete, map-reduce
Riak Core: membership, consistent hashing, handoff, node-liveness, gossip, buckets
vnode master, vnodes, storage backend, JS Runtime
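
The "consistent hashing" and "vnodes" labels in the diagram can be illustrated with a minimal sketch. It assumes a SHA-1 hash space split into equal partitions that are handed to nodes round-robin; this is not Riak Core's actual claim algorithm, and every name here (build_ring, key_hash, preference_list) is hypothetical.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 160  # size of the SHA-1 output space


def build_ring(nodes, partitions=8):
    """Split the hash space into equal partitions, assigned round-robin.

    Returns a list of (partition_start, owner_node) sorted by start.
    Round-robin assignment is an assumption made for illustration.
    """
    step = RING_SIZE // partitions
    return [(i * step, nodes[i % len(nodes)]) for i in range(partitions)]


def key_hash(bucket, key):
    """Map a bucket/key pair to an integer position on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big")


def preference_list(ring, bucket, key, n=3):
    """Return the n partitions that follow the key's position, wrapping around."""
    starts = [start for start, _ in ring]
    idx = bisect_right(starts, key_hash(bucket, key)) % len(ring)
    return [ring[(idx + i) % len(ring)] for i in range(n)]


ring = build_ring(["n1", "n2", "n3", "n4"], partitions=8)
replicas = preference_list(ring, "users", "andy")
```

Because a key's position depends only on its hash, any node holding the ring can compute the same replica set without asking a coordinator.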

Slide 18

Slide 18 text

On Testing
We can prove the correctness of distributed algorithms; can we prove the correctness of existing distributed systems?
Unit tests are grossly insufficient for large distributed systems
QuickCheck is an improvement
“testing only shows the presence, not the absence of bugs” - Dijkstra

Slide 19

Slide 19 text

QuickCheck
Write high-level assertions (“properties”) that a function should fulfill
QuickCheck generates millions of test cases to try to falsify the property
Code coverage vs. quality of coverage
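
The idea of a property plus generated test cases can be sketched without any library. The following is a hand-rolled stand-in, not QuickCheck itself: it has no shrinking and far fewer trials, and all names (check_property, gen_two_sorted_lists, buggy_merge) are made up for illustration.

```python
import random


def check_property(prop, gen, trials=1000, seed=0):
    """Try to falsify prop on random inputs.

    Returns the first counterexample found, or None if every trial passed.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if not prop(*case):
            return case
    return None


def gen_two_sorted_lists(rng):
    """Generate two short sorted lists of small integers."""
    a = sorted(rng.randint(-5, 5) for _ in range(rng.randint(0, 6)))
    b = sorted(rng.randint(-5, 5) for _ in range(rng.randint(0, 6)))
    return a, b


def merge_sorted(a, b):
    """Standard two-pointer merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def buggy_merge(a, b):
    """Deliberately wrong: silently drops duplicate values."""
    return sorted(set(a) | set(b))


def merges_like_sort(impl):
    """Property: merging two sorted lists equals sorting their concatenation."""
    return lambda a, b: impl(a, b) == sorted(a + b)


ok = check_property(merges_like_sort(merge_sorted), gen_two_sorted_lists)
bad = check_property(merges_like_sort(buggy_merge), gen_two_sorted_lists)
```

Here ok stays None while bad holds a failing input pair. A hand-picked unit test with distinct values would pass buggy_merge, which is the gap between code coverage and quality of coverage.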

Slide 20

Slide 20 text

Case Study: Poolboy
Poolboy: Erlang connection pool library
Seemed to work fine: unit tests passed, Riak integration tests passed
A day’s worth of QuickCheck testing revealed bugs in every major piece of functionality

Slide 21

Slide 21 text

PULSE: Spoof the Scheduler
Riak Core Issue #298

Slide 22

Slide 22 text

Problems Remain
QuickCheck is complex, requires training and practice
Code evolves separately from tests
Large up-front effort, tests decay over time
See: “Hansei: Property-based Development of Concurrent Systems” by Joe Blomstedt of Basho
Unify model and production code with annotations
McErlang does exhaustive state-space exploration

Slide 23

Slide 23 text

Testing vs. Verification
How do we narrow the conceptual gap between a formal specification (“what”) and its implementation (“how”)?
Languages to the rescue?

Slide 24

Slide 24 text

On Languages
Resurgence of functional, declarative programming
C++ has closures now!
Dings in the armor of OO
Explosion of new languages
Let’s use existing tools like compilers, static analysis to verify our programs

Slide 25

Slide 25 text


Slide 26

Slide 26 text

On Monitoring
What should we monitor and how?
We know little about the emergent properties of networks
Current open-source options all mostly suck

Slide 27

Slide 27 text

SELECT sys.ip ip, procname, rss, pid
FROM sys, processes
WHERE sys.ip = processes.ip
  AND (rss*100)/sys.memtotal > 75
  AND sys.ip IN (SELECT ip FROM machinerole WHERE role='dns');

Akamai “Query” System
* Keeping Track of 70,000+ Servers: The Akamai Query System

Slide 28

Slide 28 text

Emergent Property: TCP Incast
“You can’t pour two buckets of manure into one bucket” - Scott Fritchie’s Grandfather
“microbursts” of traffic sent to one cluster member
Coordinator sends request to three replicas
All respond with large-ish result at roughly the same time
Switch has to either buffer or drop packets
Result: throughput collapse
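
The buffer-or-drop step can be made concrete with a toy queue model. This is a sketch of the arithmetic only, not a TCP simulation: it assumes every sender offers one packet per tick to a single switch port, and it ignores retransmission timeouts, which are what turn drops into throughput collapse in practice. The function and parameter names are hypothetical.

```python
def simulate_burst(senders, packets_each, buffer_slots, drain_per_tick):
    """Toy model of synchronized senders sharing one switch output port.

    Each tick, every sender with data left offers one packet. A packet is
    queued if the buffer has room and dropped otherwise. The port then
    forwards up to drain_per_tick packets.
    Returns (delivered, dropped).
    """
    if drain_per_tick < 1:
        raise ValueError("drain_per_tick must be at least 1")
    queue = delivered = dropped = 0
    remaining = [packets_each] * senders
    while any(remaining) or queue:
        for i, left in enumerate(remaining):
            if left:
                remaining[i] -= 1
                if queue < buffer_slots:
                    queue += 1
                else:
                    dropped += 1
        sent = min(queue, drain_per_tick)
        queue -= sent
        delivered += sent
    return delivered, dropped


# One replica responding alone fits within the port's drain rate.
solo = simulate_burst(senders=1, packets_each=10, buffer_slots=4, drain_per_tick=1)

# Three replicas responding at the same moment overflow the same buffer.
trio = simulate_burst(senders=3, packets_each=10, buffer_slots=4, drain_per_tick=1)
```

With identical switch settings, the single responder loses nothing while the three synchronized responders lose more than half of what they send. No individual node is misbehaving, which is why the effect is emergent.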

Slide 29

Slide 29 text

On Teaching
Are there better ways to explain complicated things, like Paxos?
Or are they just fundamentally complex and we need to deal with it?
Do other disciplines have anything to teach us about new/richer models?

Slide 30

Slide 30 text

A Laundry List of Hard Problems

Slide 31

Slide 31 text

Admission Control / Overload Prevention

Slide 32

Slide 32 text

Multi-tenancy

Slide 33

Slide 33 text

Security

Slide 34

Slide 34 text

Dynamic Membership

Slide 35

Slide 35 text

Garbage Collection

Slide 36

Slide 36 text

the burning question....

Slide 37

Slide 37 text

q: do i even know how vector clocks work?
a: kinda, but should i have to?
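
For reference, the bookkeeping behind the question fits in a few lines. This is a minimal sketch assuming a clock is a plain dict from node name to counter; the function names are illustrative and are not Riak's API.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Element-wise maximum: the smallest clock that descends from both."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def descends(a, b):
    """True if a has seen every event that b has seen."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def concurrent(a, b):
    """True if neither clock descends from the other (a conflict)."""
    return not descends(a, b) and not descends(b, a)


base = increment({}, "x")        # first write, coordinated by node x
via_a = increment(base, "a")     # one client updates through node a
via_b = increment(base, "b")     # another updates through node b
resolved = merge(via_a, via_b)   # clock for a value that reconciles both
```

Here via_a and via_b are concurrent siblings, and resolved descends from both. The mechanics are small; the hard part the slide points at is that applications still have to decide how to reconcile the sibling values.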

Slide 38

Slide 38 text

In Summary
The free lunch is over, again!
It’s an amazing time to be part of this community
Let’s sharpen our tools and build new ones
I love all of you! <3 <3 <3

Slide 39

Slide 39 text

Thanks!
@argv0
http://www.basho.com
http://github.com/basho
http://docs.basho.com