Slide 1

Slide 1 text

Building Scalable, Highly Concurrent & Fault-Tolerant Systems: Lessons Learned Jonas Bonér CTO Typesafe Twitter: @jboner

Slide 2

Slide 2 text

I will never use distributed transactions again

Slide 3

Slide 3 text

Lessons Learned through...

Slide 4

Slide 4 text

Lessons Learned through... Agony

Slide 5

Slide 5 text

Lessons Learned through... Agony and Pain, lots of Pain

Slide 6

Slide 6 text

Agenda • It’s All Trade-offs • Go Concurrent • Go Reactive • Go Fault-Tolerant • Go Distributed • Go Big

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

It’s all Trade-offs

Slide 9

Slide 9 text

Performance vs Scalability

Slide 10

Slide 10 text

Latency vs Throughput

Slide 11

Slide 11 text

Availability vs Consistency

Slide 12

Slide 12 text

Go Concurrent

Slide 13

Slide 13 text

Shared mutable state

Slide 14

Slide 14 text

Shared mutable state Together with threads...

Slide 15

Slide 15 text

Shared mutable state Together with threads... ...leads to

Slide 16

Slide 16 text

Shared mutable state Together with threads... ...leads to ...code that is totally INDETERMINISTIC

Slide 17

Slide 17 text

Shared mutable state Together with threads... ...leads to ...code that is totally INDETERMINISTIC ...and the root of all EVIL

Slide 18

Slide 18 text

Shared mutable state Together with threads... ...leads to ...code that is totally INDETERMINISTIC ...and the root of all EVIL Please, avoid it at all cost

Slide 19

Slide 19 text

Shared mutable state Together with threads... ...leads to ...code that is totally INDETERMINISTIC ...and the root of all EVIL Please, avoid it at all cost Use IMMUTABLE state!!!

Slide 20

Slide 20 text

The problem with locks • Locks do not compose • Locks break encapsulation • Taking too few locks • Taking too many locks • Taking the wrong locks • Taking locks in the wrong order • Error recovery is hard
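One of the pitfalls listed above, taking locks in the wrong order, can be sketched in a few lines. This is only an illustration: the `Account` class and `transfer` function are hypothetical names, and the fix shown (a global lock order by account id) is one common convention, not something the slides prescribe. Note how the ordering rule leaks into every caller, which is the "locks do not compose" point.

```python
import threading

class Account:
    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Assumes src is not dst. Always lock the account with the smaller id
    # first, so two opposite transfers can never each hold one lock while
    # waiting for the other.
    first, second = sorted((src, dst), key=lambda a: a.account_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

def run_transfers():
    a, b = Account(1, 100), Account(2, 100)
    threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
    threads += [threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

If `transfer` instead locked `src` first and `dst` second, the two opposite directions could deadlock.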

Slide 21

Slide 21 text

You deserve better tools • Dataflow Concurrency • Actors • Software Transactional Memory (STM) • Agents

Slide 22

Slide 22 text

Dataflow Concurrency • Deterministic • Declarative • Data-driven • Threads are suspended until data is available • Lazy & On-demand • No difference between: • Concurrent code • Sequential code • Examples: Akka & GPars
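A rough sketch of the dataflow idea, assuming a hypothetical single-assignment `DataflowVariable` built on a standard-library `threading.Event`. It is not the Akka or GPars API; it only shows that a reader is suspended until the data is available and that the result does not depend on scheduling.

```python
import threading

class DataflowVariable:
    """Single-assignment variable: readers block until a value is bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        with self._lock:
            if self._bound.is_set():
                raise ValueError("dataflow variable is already bound")
            self._value = value
            self._bound.set()

    def get(self):
        self._bound.wait()  # suspend until the data is available
        return self._value

def compute_sum():
    x, y, z = DataflowVariable(), DataflowVariable(), DataflowVariable()
    # z depends on x and y; the thread suspends in get() until both are bound.
    adder = threading.Thread(target=lambda: z.bind(x.get() + y.get()))
    adder.start()
    y.bind(2)   # the binding order does not affect the result
    x.bind(40)
    adder.join()
    return z.get()
```

Because each variable can be bound only once, the outcome is the same however the threads are interleaved.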

Slide 23

Slide 23 text

Actors •Share NOTHING •Isolated lightweight event-based processes •Each actor has a mailbox (message queue) •Communicates through asynchronous and non-blocking message passing •Location transparent (distributable) •Examples: Akka & Erlang
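The mailbox idea can be sketched as follows. `CounterActor` and its message names are made up for illustration, and a real actor runtime such as Akka or Erlang multiplexes many lightweight actors over few threads, whereas this sketch simply uses one thread per actor.

```python
import queue
import threading

class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # private state, touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif isinstance(message, tuple) and message[0] == "get":
                message[1].put(self._count)  # reply through the caller's queue
            elif message == "stop":
                break

def demo_actor():
    actor = CounterActor()
    for _ in range(1000):
        actor.tell("increment")
    reply = queue.Queue()
    actor.tell(("get", reply))
    result = reply.get(timeout=5)
    actor.tell("stop")
    return result
```

No lock appears anywhere: the state is never shared, only messages are.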

Slide 24

Slide 24 text

Software Transactional Memory (STM) • See the memory as a transactional dataset • Similar to a DB: begin, commit, rollback (ACI) • Transactions are retried upon collision • Rolls back the memory on abort • Transactions can nest and compose • Use STM instead of abusing your database with temporary storage of “scratch” data • Examples: Haskell, Clojure & Scala STM
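To make "transactions are retried upon collision" concrete, here is a deliberately naive sketch of optimistic software transactional memory. `Ref`, `atomically` and the single global commit lock are assumptions chosen for brevity; the STMs in Haskell, Clojure and Scala are far more sophisticated. The update function must be pure, since it may run more than once.

```python
import threading

class Ref:
    """A versioned memory cell."""

    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # guards only the short snapshot and commit steps

def atomically(refs, update):
    """Apply update(values) -> new_values to refs, retrying on collision."""
    while True:
        with _commit_lock:
            snapshot = [(r.value, r.version) for r in refs]
        new_values = update([value for value, _ in snapshot])
        with _commit_lock:
            if all(r.version == ver for r, (_, ver) in zip(refs, snapshot)):
                for r, value in zip(refs, new_values):
                    r.value = value
                    r.version += 1
                return new_values
        # Another transaction committed first: discard the result and retry.

def demo_stm():
    a, b = Ref(100), Ref(0)

    def move_one(values):
        return [values[0] - 1, values[1] + 1]

    threads = [threading.Thread(target=atomically, args=([a, b], move_one))
               for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.value, b.value
```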

Slide 25

Slide 25 text

Agents • Reactive memory cells (STM Ref) • Send an update function to the Agent, which 1. adds it to an (ordered) queue, to be 2. applied to the Agent asynchronously • Reads are “free”, just dereference the Ref • Cooperates with STM • Examples: Clojure & Akka Agents
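A minimal sketch of the agent idea, assuming a hypothetical `Agent` class that is not the Clojure or Akka API: update functions are queued in order and applied by a single background thread, while reads just return the current value. Error handling is omitted; an update function that raises would stop the worker thread in this sketch.

```python
import queue
import threading

class Agent:
    def __init__(self, initial):
        self._value = initial
        self._updates = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, fn):
        """Queue an update function; it is applied asynchronously, in order."""
        self._updates.put(fn)

    def get(self):
        return self._value  # a read is just a dereference

    def await_updates(self):
        """Block until every update queued so far has been applied."""
        self._updates.join()

    def _run(self):
        while True:
            fn = self._updates.get()
            self._value = fn(self._value)
            self._updates.task_done()

def demo_agent():
    agent = Agent(0)
    for _ in range(100):
        agent.send(lambda v: v + 1)
    agent.await_updates()
    return agent.get()
```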

Slide 26

Slide 26 text

If we could start all over...

Slide 27

Slide 27 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core

Slide 28

Slide 28 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming

Slide 29

Slide 29 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming • Dataflow

Slide 30

Slide 30 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming • Dataflow 2. Add Indeterminism selectively - only where needed

Slide 31

Slide 31 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming • Dataflow 2. Add Indeterminism selectively - only where needed • Actor/Agent-based Programming

Slide 32

Slide 32 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming • Dataflow 2. Add Indeterminism selectively - only where needed • Actor/Agent-based Programming 3. Add Mutability selectively - only where needed

Slide 33

Slide 33 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming • Dataflow 2. Add Indeterminism selectively - only where needed • Actor/Agent-based Programming 3. Add Mutability selectively - only where needed • Protected by Transactions (STM)

Slide 34

Slide 34 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming • Dataflow 2. Add Indeterminism selectively - only where needed • Actor/Agent-based Programming 3. Add Mutability selectively - only where needed • Protected by Transactions (STM) 4. Finally - only if really needed

Slide 35

Slide 35 text

If we could start all over... 1. Start with a Deterministic, Declarative & Immutable core • Logic & Functional Programming • Dataflow 2. Add Indeterminism selectively - only where needed • Actor/Agent-based Programming 3. Add Mutability selectively - only where needed • Protected by Transactions (STM) 4. Finally - only if really needed • Add Monitors (Locks) and explicit Threads

Slide 36

Slide 36 text

Go Reactive

Slide 37

Slide 37 text

Never block • ...unless you really have to • Blocking kills scalability (and performance) • Never sit on resources you don’t use • Use non-blocking IO • Be reactive • How?

Slide 38

Slide 38 text

Go Async Design for reactive event-driven systems 1. Use asynchronous message passing 2. Use Iteratee-based IO 3. Use push not pull (or poll) • Examples: • Akka or Erlang actors • Play’s reactive Iteratee IO • Node.js or JavaScript Promises • Server-Sent Events or WebSockets • Scala’s Futures library
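"Push not pull" can be illustrated with a future and a completion callback from the Python standard library. `fetch_price` is a stand-in for slow I/O and the whole example is an assumption made for illustration; the point is that the result is pushed to `on_complete` when it is ready rather than polled for. The `Event` exists only so the demo can return the result.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def fetch_price(symbol):
    # Stand-in for a slow network call (hypothetical data).
    return {"ACME": 42}.get(symbol, 0)

def demo_push():
    results = []
    done = threading.Event()

    def on_complete(future):
        # Invoked when the value is ready: the result is pushed to us.
        results.append(future.result())
        done.set()

    with ThreadPoolExecutor(max_workers=2) as pool:
        future = pool.submit(fetch_price, "ACME")
        future.add_done_callback(on_complete)
    done.wait(timeout=5)
    return results
```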

Slide 39

Slide 39 text

Go Fault-Tolerant

Slide 40

Slide 40 text

Failure Recovery in Java/C/C# etc.

Slide 41

Slide 41 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control

Slide 42

Slide 42 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control • If this thread blows up you are screwed

Slide 43

Slide 43 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control • If this thread blows up you are screwed • So you need to do all explicit error handling WITHIN this single thread

Slide 44

Slide 44 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control • If this thread blows up you are screwed • So you need to do all explicit error handling WITHIN this single thread • To make things worse - errors do not propagate between threads so there is NO WAY OF EVEN FINDING OUT that something has failed

Slide 45

Slide 45 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control • If this thread blows up you are screwed • So you need to do all explicit error handling WITHIN this single thread • To make things worse - errors do not propagate between threads so there is NO WAY OF EVEN FINDING OUT that something has failed • This leads to DEFENSIVE programming with:

Slide 46

Slide 46 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control • If this thread blows up you are screwed • So you need to do all explicit error handling WITHIN this single thread • To make things worse - errors do not propagate between threads so there is NO WAY OF EVEN FINDING OUT that something has failed • This leads to DEFENSIVE programming with: • Error handling TANGLED with business logic

Slide 47

Slide 47 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control • If this thread blows up you are screwed • So you need to do all explicit error handling WITHIN this single thread • To make things worse - errors do not propagate between threads so there is NO WAY OF EVEN FINDING OUT that something has failed • This leads to DEFENSIVE programming with: • Error handling TANGLED with business logic • SCATTERED all over the code base

Slide 48

Slide 48 text

Failure Recovery in Java/C/C# etc. • You are given a SINGLE thread of control • If this thread blows up you are screwed • So you need to do all explicit error handling WITHIN this single thread • To make things worse - errors do not propagate between threads so there is NO WAY OF EVEN FINDING OUT that something has failed • This leads to DEFENSIVE programming with: • Error handling TANGLED with business logic • SCATTERED all over the code base We can do better!!!

Slide 49

Slide 49 text

Just Let It Crash

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

The right way 1. Isolated lightweight processes 2. Supervised processes • Each running process has a supervising process • Errors are sent to the supervisor (asynchronously) • Supervisor manages the failure • Same semantics local as remote • For example the Actor Model solves it nicely
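The supervision idea above can be sketched like this. `supervise` and `make_flaky_task` are hypothetical helpers, and threads stand in for the isolated lightweight processes of a real actor runtime. The business logic in `task` contains no error handling of its own: a failure is reported to the supervisor as a message, and the supervisor decides whether to restart.

```python
import queue
import threading

def supervise(task, max_restarts=3):
    """Run task() in a child thread; restart it when it crashes."""
    reports = queue.Queue()  # failures reach the supervisor as messages

    def child():
        try:
            reports.put(("ok", task()))
        except Exception as exc:  # let it crash, then tell the supervisor
            reports.put(("failed", exc))

    restarts = 0
    while True:
        threading.Thread(target=child).start()
        status, payload = reports.get()
        if status == "ok":
            return payload, restarts
        if restarts == max_restarts:
            raise RuntimeError("giving up after repeated failures") from payload
        restarts += 1

def make_flaky_task(failures_before_success):
    """Build a task that fails a fixed number of times before succeeding."""
    attempts = {"n": 0}

    def task():
        attempts["n"] += 1
        if attempts["n"] <= failures_before_success:
            raise ConnectionError("transient failure")
        return "done"

    return task
```

The restart policy lives in one place instead of being scattered through the business code.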

Slide 52

Slide 52 text

Go Distributed

Slide 53

Slide 53 text

Performance vs Scalability

Slide 54

Slide 54 text

How do I know if I have a performance problem?

Slide 55

Slide 55 text

How do I know if I have a performance problem? If your system is slow for a single user

Slide 56

Slide 56 text

How do I know if I have a scalability problem?

Slide 57

Slide 57 text

How do I know if I have a scalability problem? If your system is fast for a single user but slow under heavy load

Slide 58

Slide 58 text

(Three) Misconceptions about Reliable Distributed Computing - Werner Vogels 1. Transparency is the ultimate goal 2. Automatic object replication is desirable 3. All replicas are equal and deterministic Classic paper: A Note On Distributed Computing - Waldo et al.

Slide 59

Slide 59 text

Fallacy 1 Transparent Distributed Computing • Emulating Consistency and Shared Memory in a distributed environment • Distributed Objects • “Sucks like an inverted hurricane” - Martin Fowler • Distributed Transactions • ...don’t get me started...

Slide 60

Slide 60 text

Fallacy 2 RPC • Emulating synchronous blocking method dispatch - across the network • Ignores: • Latency • Partial failures • General scalability concerns, caching etc. • “Convenience over Correctness” - Steve Vinoski

Slide 61

Slide 61 text

Instead

Slide 62

Slide 62 text

Instead Embrace the Network Use Asynchronous Message Passing and be done with it

Slide 63

Slide 63 text

Delivery Semantics • No guarantees • At most once • At least once • Once and only once Guaranteed Delivery

Slide 64

Slide 64 text

It’s all lies.

Slide 65

Slide 65 text

It’s all lies.

Slide 66

Slide 66 text

It’s all lies. The network is inherently unreliable and there is no such thing as 100% guaranteed delivery

Slide 67

Slide 67 text

Guaranteed Delivery

Slide 68

Slide 68 text

Guaranteed Delivery The question is what to guarantee

Slide 69

Slide 69 text

Guaranteed Delivery The question is what to guarantee 1. The message is - sent out on the network?

Slide 70

Slide 70 text

Guaranteed Delivery The question is what to guarantee 1. The message is - sent out on the network? 2. The message is - received by the receiver host’s NIC?

Slide 71

Slide 71 text

Guaranteed Delivery The question is what to guarantee 1. The message is - sent out on the network? 2. The message is - received by the receiver host’s NIC? 3. The message is - put on the receiver’s queue?

Slide 72

Slide 72 text

Guaranteed Delivery The question is what to guarantee 1. The message is - sent out on the network? 2. The message is - received by the receiver host’s NIC? 3. The message is - put on the receiver’s queue? 4. The message is - applied to the receiver?

Slide 73

Slide 73 text

Guaranteed Delivery The question is what to guarantee 1. The message is - sent out on the network? 2. The message is - received by the receiver host’s NIC? 3. The message is - put on the receiver’s queue? 4. The message is - applied to the receiver? 5. The message is - starting to be processed by the receiver?

Slide 74

Slide 74 text

Guaranteed Delivery The question is what to guarantee 1. The message is - sent out on the network? 2. The message is - received by the receiver host’s NIC? 3. The message is - put on the receiver’s queue? 4. The message is - applied to the receiver? 5. The message is - starting to be processed by the receiver? 6. The message is - completely processed by the receiver?

Slide 75

Slide 75 text

Ok, then what to do? 1. Start with 0 guarantees (0 additional cost) 2. Add the guarantees you need - one by one

Slide 76

Slide 76 text

Ok, then what to do? 1. Start with 0 guarantees (0 additional cost) 2. Add the guarantees you need - one by one Different USE-CASES Different GUARANTEES Different COSTS

Slide 77

Slide 77 text

Ok, then what to do? 1. Start with 0 guarantees (0 additional cost) 2. Add the guarantees you need - one by one Different USE-CASES Different GUARANTEES Different COSTS For each additional guarantee you add you will either: • decrease performance, throughput or scalability • increase latency

Slide 78

Slide 78 text

Just

Slide 79

Slide 79 text

Just Use ACKing

Slide 80

Slide 80 text

Just Use ACKing and be done with it
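A sketch of what ACKing can look like: the sender retries until it sees an acknowledgement (at-least-once delivery), and the receiver de-duplicates by message id so that a retry after a lost ACK is not processed twice. `LossyChannel`, `Receiver` and `send_with_ack` are hypothetical names, and a `None` reply stands in for a timeout on a real network.

```python
class LossyChannel:
    """Simulated network that loses messages or ACKs according to a script."""

    def __init__(self, faults):
        self._faults = iter(faults)

    def deliver(self, handler, message):
        fault = next(self._faults, "none")
        if fault == "drop_message":
            return None  # the message never arrives
        reply = handler(message)
        return None if fault == "drop_ack" else reply

class Receiver:
    def __init__(self):
        self.seen = set()
        self.processed = []

    def handle(self, message):
        msg_id, payload = message
        if msg_id not in self.seen:  # ignore redelivered duplicates
            self.seen.add(msg_id)
            self.processed.append(payload)
        return ("ack", msg_id)

def send_with_ack(channel, receiver, message, max_attempts=5):
    """Resend until an ACK arrives; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        reply = channel.deliver(receiver.handle, message)
        if reply == ("ack", message[0]):
            return attempt
    raise TimeoutError("no ACK received")

def demo_ack():
    channel = LossyChannel(["drop_message", "drop_ack"])
    receiver = Receiver()
    attempts = send_with_ack(channel, receiver, (1, "hello"))
    return attempts, receiver.processed
```

In `demo_ack` the first attempt loses the message and the second loses the ACK, so the third attempt succeeds while the payload is processed only once.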

Slide 81

Slide 81 text

Latency vs Throughput

Slide 82

Slide 82 text

You should strive for maximal throughput with acceptable latency

Slide 83

Slide 83 text

Go Big

Slide 84

Slide 84 text

Go Big Data

Slide 85

Slide 85 text

Big Data Imperative OO programming doesn't cut it • Object-Mathematics Impedance Mismatch • We need functional processing, transformations etc. • Examples: Spark, Crunch/Scrunch, Cascading, Cascalog, Scalding, Scala Parallel Collections • Hadoop has been called the: • “Assembly language of MapReduce programming” • “EJB of our time”
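As a small illustration of "functional processing, transformations", here is a word count written as a map step followed by a reduce step over immutable inputs. It uses only the Python standard library and is not the API of Spark, Scalding or any of the tools named above; `word_count` is a made-up name.

```python
from collections import Counter
from functools import reduce

def word_count(lines):
    # Map: turn each line into its own partial count.
    per_line = map(lambda line: Counter(line.lower().split()), lines)
    # Reduce: merge the partial counts. Counter addition is associative,
    # so the merge could be split across machines in any grouping.
    return reduce(lambda a, b: a + b, per_line, Counter())
```

For example, `word_count(["a b a", "b c"])` counts `a` twice, `b` twice and `c` once.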

Slide 86

Slide 86 text

Big Data Batch processing doesn't cut it • À la Hadoop • We need real-time data processing • Examples: Spark, Storm, S4 etc. • Watch “Why Big Data Needs To Be Functional” by Dean Wampler

Slide 87

Slide 87 text

Go Big DB

Slide 88

Slide 88 text

When is a RDBMS not good enough?

Slide 89

Slide 89 text

Scaling reads to a RDBMS is hard

Slide 90

Slide 90 text

Scaling writes to a RDBMS is impossible

Slide 91

Slide 91 text

Do we really need a RDBMS?

Slide 92

Slide 92 text

Do we really need a RDBMS? Sometimes...

Slide 93

Slide 93 text

Do we really need a RDBMS?

Slide 94

Slide 94 text

Do we really need a RDBMS? But many times we don’t

Slide 95

Slide 95 text

Atomic Consistent Isolated Durable

Slide 96

Slide 96 text

Availability vs Consistency

Slide 97

Slide 97 text

Brewer’s CAP theorem

Slide 98

Slide 98 text

You can only pick 2 at a given point in time: Consistency, Availability, Partition tolerance

Slide 99

Slide 99 text

Centralized system • In a centralized system (RDBMS etc.) we don’t have network partitions, i.e. the P in CAP • So you get both: Consistency Availability

Slide 100

Slide 100 text

Distributed system • In a distributed (scalable) system we will have network partitions, i.e. the P in CAP • So you get to only pick one: Consistency Availability

Slide 101

Slide 101 text

Basically Available Soft state Eventually consistent

Slide 102

Slide 102 text

Think about your data Then think again • When do you need ACID? • When is Eventual Consistency a better fit? • Different kinds of data have different needs • You need full consistency less than you think

Slide 103

Slide 103 text

How fast is fast enough? • Never guess: Measure, measure and measure • Start by defining a baseline • Where are we now? • Define what is “good enough” - i.e. SLAs • Where do we want to go? • When are we done? • Beware of micro-benchmarks

Slide 104

Slide 104 text

How fast is fast enough? • Never guess: Measure, measure and measure • Start by defining a baseline • Where are we now? • Define what is “good enough” - i.e. SLAs • Where do we want to go? • When are we done? • Beware of micro-benchmarks ...or, when can we go for a beer?
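A sketch of what "define a baseline" and "define good enough" might look like in practice: collect latency samples for an operation, then check a percentile against an SLA limit. `measure_latencies` and `meets_sla` are hypothetical helpers and any thresholds are placeholders. In line with the warning about micro-benchmarks, this only means something when the operation is a realistic end-to-end request under realistic load.

```python
import statistics
import time

def measure_latencies(operation, runs=200):
    """Time repeated calls to operation() and return the samples in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return samples

def meets_sla(samples, percentile, limit_seconds):
    """True if the given percentile (1-99) of the samples is within the limit."""
    cut_points = statistics.quantiles(samples, n=100)  # 99 cut points
    return cut_points[percentile - 1] <= limit_seconds
```

A call such as `meets_sla(samples, 99, 0.2)` then answers "when are we done?" for a 200 ms p99 target.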

Slide 105

Slide 105 text

To sum things up... 1. Maximizing a specific metric impacts others • Every strategic decision involves a trade-off • There's no "silver bullet" 2. Applying yesterday's best practices to the problems faced today will lead to: • Waste of resources • Performance and scalability bottlenecks • Unreliable systems

Slide 106

Slide 106 text

SO

Slide 107

Slide 107 text

GO

Slide 108

Slide 108 text

...now home and build yourself Scalable, Highly Concurrent & Fault-Tolerant Systems

Slide 109

Slide 109 text

Thank You Web: typesafe.com Twitter: @jboner