Slide 1

Slide 1 text

Without Resilience Nothing Else Matters Jonas Bonér CTO Lightbend @jboner

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

This Is Fault Tolerance “But it ain’t how hard you’re hit; it’s about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward. That’s how winning is done.” - Rocky Balboa

Slide 4

Slide 4 text

Resilience Is Beyond Fault Tolerance

Slide 5

Slide 5 text

Resilience “The ability of a substance or object to spring back into shape. The capacity to recover quickly from difficulties.” -Merriam Webster

Slide 6

Slide 6 text

Antifragility “Antifragility is beyond resilience and robustness. The resilient resists shock and stays the same; the antifragile gets better.” - Nassim Nicholas Taleb Antifragile: Things That Gain from Disorder - Nassim Nicholas Taleb

Slide 7

Slide 7 text

“We can model and understand in isolation. But, when released into competitive, nominally regulated societies, their connections proliferate, their interactions and interdependencies multiply, their complexities mushroom. And we are caught short.” - Sidney Dekker Drift into Failure - Sidney Dekker

Slide 8

Slide 8 text

Software Systems Today Are Incredibly Complex Netflix Twitter

Slide 9

Slide 9 text

We need to study Resilience in Complex Systems

Slide 10

Slide 10 text

Complicated System

Slide 11

Slide 11 text

Complex System

Slide 12

Slide 12 text

Complicated ≠ Complex

Slide 13

Slide 13 text

“Complex systems run in degraded mode.” “Complex systems run as broken systems.” - Richard Cook How Complex Systems Fail - Richard Cook

Slide 14

Slide 14 text

“Counterintuitive. That’s [Jay] Forrester’s word to describe complex systems. Leverage points are not intuitive. Or if they are, we intuitively use them backward, systematically worsening whatever problems we are trying to solve.” - Donella Meadows Leverage Points: Places to Intervene in a System - Donella Meadows

Slide 15

Slide 15 text

“Humans should not be involved in setting timeouts.” “Human involvement in complex systems is the biggest source of trouble.” - Ben Christensen, Netflix

Slide 16

Slide 16 text

Humans Generally Make Things Worse

Slide 17

Slide 17 text

Operating at the Edge of Failure [Diagram labels: Economic Failure Boundary, Unacceptable Workload Boundary, Accident Boundary, Operating Point, FAILURE] ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Slide 18

Slide 18 text

Operating at the Edge of Failure [Diagram labels: Economic Failure Boundary, Unacceptable Workload Boundary, Accident Boundary, Management Pressure Towards Economic Efficiency, Gradient Towards Least Effort, Counter Gradient For More Resilience] ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Slide 19

Slide 19 text

Operating at the Edge of Failure [Diagram labels: Economic Failure Boundary, Unacceptable Workload Boundary, Accident Boundary, Marginal Boundary, Error Margin] ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Slide 20

Slide 20 text

Operating at the Edge of Failure [Diagram labels: Accident Boundary, Marginal Boundary, ?] ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Slide 21

Slide 21 text

Operating at the Edge of Failure [Diagram labels: Accident Boundary, Marginal Boundary] ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Slide 22

Slide 22 text

Embrace Failure

Slide 23

Slide 23 text

Resilience is by Design Photo courtesy of FEMA/Joselyne Augustino

Slide 24

Slide 24 text

“Autonomy makes information local, leading to greater certainty and stability.” - Mark Burgess In Search of Certainty - Mark Burgess

Slide 25

Slide 25 text

Promise Theory Think in Promises Not Commands

Slide 26

Slide 26 text

Promise Theory Promises converge towards a definite outcome from unpredictable beginnings ⇒ improved Stability Commands diverge into unpredictable outcomes from definite beginnings ⇒ decreased Stability

Slide 27

Slide 27 text

Resilience in Biological Systems

Slide 28

Slide 28 text

Meerkats Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk

Slide 29

Slide 29 text

“In three words, in the animal kingdom, simplicity leads to complexity which leads to resilience.” - Nicolas Perony Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk

Slide 30

Slide 30 text

Resilience in Social Systems

Slide 31

Slide 31 text

Dealing in Security Understanding vital services, and how they keep you safe 1 INDIVIDUAL (you) 6 ways to die 3 sets of essential services 7 layers of PROTECTION Dealing in Security - Mike Bennet, Vinay Gupta

Slide 32

Slide 32 text

What we can learn from Resilience in Biological and Social Systems 1. Feature Diversity and redundancy 2. Inter-Connected network structure 3. Wide distribution across all scales 4. Capacity to self-adapt & self-organize Toward Resilient Architectures 1: Biology Lessons - Michael Mehaffy, Nikos A. Salingaros Applying resilience thinking: Seven principles for building resilience in social-ecological systems - Reinette Biggs et al.

Slide 33

Slide 33 text

Resilience in Computer Systems

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

We Need To Manage Failure Not Try To Avoid It

Slide 36

Slide 36 text

Let It Crash

Slide 37

Slide 37 text

Crash Only Software Crash-Only Software - George Candea, Armando Fox Stop = Crash Safely Start = Recover Fast

Slide 38

Slide 38 text

Recursive Restartability Turning the Crash-Only Sledgehammer into a Scalpel Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox

Slide 39

Slide 39 text

Traditional State Management Object Critical state that needs protection Client Thread boundary Synchronous dispatch Thread boundary ? Utterly broken

Slide 40

Slide 40 text

“Accidents come from relationships not broken parts.” - Sidney Dekker Drift into Failure - Sidney Dekker

Slide 41

Slide 41 text

Requirements for a Sane Failure Model Failures need to be 1. Contained—Avoid cascading failures 2. Reified—as messages 3. Signalled—Asynchronously 4. Observed—by 1-N 5. Managed—Outside failed Context

Slide 42

Slide 42 text

Bulkhead Pattern

Slide 43

Slide 43 text

Enter Supervision

Slide 44

Slide 44 text

We need a way out of the State Tar Pit Critical • Input Data • Derived Data Out of the Tar Pit - Ben Moseley, Peter Marks

Slide 45

Slide 45 text

We need a way out of the State Tar Pit Essential State Essential Logic Accidental State and Control Out of the Tar Pit - Ben Moseley, Peter Marks

Slide 46

Slide 46 text

The Vending Machine Pattern

Slide 47

Slide 47 text

Think Vending Machine Coffee Machine Programmer Inserts coins Gets coffee Add more coins

Slide 48

Slide 48 text

Think Vending Machine Programmer Service Guy Inserts coins Gets coffee Out of coffee beans failure Adds more beans Out of coffee beans error WRONG Coffee Machine

Slide 49

Slide 49 text

Think Vending Machine Service Client Supervisor Request Response Validation Error Application Failure Manages Failure

Slide 50

Slide 50 text

Error Kernel Pattern Onion-layered state & Failure management Making reliable distributed systems in the presence of software errors - Joe Armstrong On Erlang, State and Crashes - Jesper Louis Andersen

Slide 51

Slide 51 text

Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary Supervision

Slide 52

Slide 52 text

Demo Time Let’s model a resilient vending machine, in Akka

Slide 53

Slide 53 text

Demo Runner

object VendingMachineDemo extends App {

  val system = ActorSystem("vendingMachineDemo")
  val coffeeMachine = system.actorOf(Props[CoffeeMachineManager], "coffeeMachineManager")
  val customer = Inbox.create(system) // emulates the customer
  … // test runs
  system.shutdown()
}

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 54

Slide 54 text

Test Happy Path

// Insert 2 coins and get an Espresso
customer.send(coffeeMachine, Coins(2))
customer.send(coffeeMachine, Selection(Espresso))
val Beverage(coffee1) = customer.receive(5.seconds)
println(s"Got myself an $coffee1")
assert(coffee1 == Espresso)

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 55

Slide 55 text

Test User Error

// Insert only 1 coin (price is 2) and expect a validation error
customer.send(coffeeMachine, Coins(1))
customer.send(coffeeMachine, Selection(Latte))
val NotEnoughCoinsError(message) = customer.receive(5.seconds)
println(s"Got myself a validation error: $message")
assert(message == "Please insert [1] coins")

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 56

Slide 56 text

Test System Failure

// Insert 1 coin (had 1 before) and try to get my Latte
// Machine should:
// 1. Fail
// 2. Restart
// 3. Resubmit my order
// 4. Give me my coffee
customer.send(coffeeMachine, Coins(1))
customer.send(coffeeMachine, TriggerOutOfCoffeeBeansFailure)
customer.send(coffeeMachine, Selection(Latte))
val Beverage(coffee2) = customer.receive(5.seconds)
println(s"Got myself a $coffee2")
assert(coffee2 == Latte)

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 57

Slide 57 text

Protocol

// Coffee types
trait CoffeeType
case object BlackCoffee extends CoffeeType
case object Latte extends CoffeeType
case object Espresso extends CoffeeType

// Commands
case class Coins(number: Int)
case class Selection(coffee: CoffeeType)
case object TriggerOutOfCoffeeBeansFailure

// Events
case class CoinsReceived(number: Int)

// Replies
case class Beverage(coffee: CoffeeType)

// Errors
case class NotEnoughCoinsError(message: String)

// Failures
case class OutOfCoffeeBeansFailure(customer: ActorRef,
                                   pendingOrder: Selection,
                                   nrOfInsertedCoins: Int) extends Exception

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 58

Slide 58 text

CoffeeMachine

class CoffeeMachine extends Actor {
  val price = 2
  var nrOfInsertedCoins = 0
  var outOfCoffeeBeans = false
  var totalNrOfCoins = 0

  def receive = { … }

  override def postRestart(failure: Throwable): Unit = { … }
}

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 59

Slide 59 text

CoffeeMachine

def receive = {
  case Coins(nr) =>
    nrOfInsertedCoins += nr
    totalNrOfCoins += nr
    println(s"Inserted [$nr] coins")
    println(s"Total number of coins in machine is [$totalNrOfCoins]")

  case selection @ Selection(coffeeType) =>
    if (nrOfInsertedCoins < price)
      // Validation error: reply to the customer (wording matches the User Error test)
      sender.tell(NotEnoughCoinsError(s"Please insert [${price - nrOfInsertedCoins}] coins"), self)
    else {
      if (outOfCoffeeBeans)
        // Application failure: thrown so that the supervisor handles it, not the customer
        throw new OutOfCoffeeBeansFailure(sender, selection, nrOfInsertedCoins)
      println(s"Brewing your $coffeeType")
      sender.tell(Beverage(coffeeType), self)
      nrOfInsertedCoins = 0
    }

  case TriggerOutOfCoffeeBeansFailure =>
    outOfCoffeeBeans = true
}

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 60

Slide 60 text

CoffeeMachine

override def postRestart(failure: Throwable): Unit = {
  println(s"Restarting coffee machine...")
  failure match {
    case OutOfCoffeeBeansFailure(customer, pendingOrder, coins) =>
      nrOfInsertedCoins = coins
      outOfCoffeeBeans = false
      println(s"Resubmitting pending order $pendingOrder")
      context.self.tell(pendingOrder, customer)
  }
}

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 61

Slide 61 text

Supervisor

class CoffeeMachineManager extends Actor {
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case e: OutOfCoffeeBeansFailure =>
        println(s"ServiceGuy notified: $e")
        Restart
      case _: Exception =>
        Escalate
    }

  // to simplify things he is only managing 1 single machine
  val machine = context.actorOf(Props[CoffeeMachine], name = "coffeeMachine")

  def receive = {
    case request => machine.forward(request)
  }
}

https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Slide 62

Slide 62 text

So... Are We Done? Sorry... but Not really.

Slide 63

Slide 63 text

We cannot keep putting all eggs in the same basket

Slide 64

Slide 64 text

We need to Maintain Diversity and Redundancy

Slide 65

Slide 65 text

The Network is Reliable NOT Really

Slide 66

Slide 66 text

Here, We are living in the Looming Shadow of Impossibility Theorems CAP: Consistency is impossible FLP: Consensus is impossible

Slide 67

Slide 67 text

Towards Resilient Distributed Systems Isolation • Autonomous Microservices • Resilient Protocols • Virtualization Data Resilience • Eventual & Causal Consistency • Event Logging • Flow Control / Feedback Control Self-healing • Decentralized Architectures • Gossip Protocols • Failure Detection Embrace the Network •Asynchronicity •Location Transparency

Slide 68

Slide 68 text

Microservices 1. Autonomy 2. Isolation 3. Mobility 4. Single Responsibility 5. Exclusive State

Slide 69

Slide 69 text

An autonomous Service can only promise its own behavior Apply Promise Theory

Slide 70

Slide 70 text

We need to decompose the system using Consistency Boundaries

Slide 71

Slide 71 text

Inside Data Our current present—state Outside Data Blast from the past—facts Between Services Hope for the future—commands Data on the inside vs Data on the outside - Pat Helland

Slide 72

Slide 72 text

WITHIN the Consistency Boundary we can have STRONG CONSISTENCY

Slide 73

Slide 73 text

BETWEEN Consistency Boundaries it is a ZOO

Slide 74

Slide 74 text

We need Systems that are Decoupled in Time and Space

Slide 75

Slide 75 text

Embrace the Network • Go Asynchronous • Make distribution first class • Learn from the mistakes of RPC, EJB & CORBA • Leverage Location Transparency • Actor Model does it right

Slide 76

Slide 76 text

Location Transparency One communication abstraction across all dimensions of scale Core ⇒ Socket ⇒ CPU ⇒ Container ⇒ Server ⇒ Rack ⇒ Data Center ⇒ Global

Slide 77

Slide 77 text

Resilient Protocols are tolerant to • Message loss • Message reordering • Message duplication Embrace ACID 2.0 • Associative • Commutative • Idempotent • Distributed Depend on • Asynchronous Communication • Eventual Consistency
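
One way to picture the ACID 2.0 properties listed above is a grow-only counter whose replicas are merged with a per-node maximum. This is only a sketch under assumptions: the `GCounter` object and its method names are hypothetical and come from neither the talk nor Akka. Because the merge is associative, commutative and idempotent, replicas can receive state in any order, or more than once, and still converge.

```scala
// Hypothetical sketch: a grow-only counter with an ACID 2.0 style merge.
object GCounter {
  // node id -> number of increments observed for that node
  type State = Map[String, Long]

  def increment(s: State, node: String): State =
    s.updated(node, s.getOrElse(node, 0L) + 1L)

  // Per-node maximum: associative, commutative and idempotent,
  // so reordered or duplicated merges give the same result.
  def merge(a: State, b: State): State =
    (a.keySet ++ b.keySet).map { k =>
      k -> math.max(a.getOrElse(k, 0L), b.getOrElse(k, 0L))
    }.toMap

  def value(s: State): Long = s.values.sum
}
```

Merging the same state twice, or merging in a different order, leaves the result unchanged, which is what makes message duplication and reordering tolerable.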

Slide 78

Slide 78 text

“To make a system of interconnected components crash-only, it must be designed so that components can tolerate the crashes and temporary unavailability of their peers. This means we require: [1] strong modularity with relatively impermeable component boundaries, [2] timeout-based communication and lease-based resource allocation, and [3] self- describing requests that carry a time-to-live and information on whether they are idempotent.” - George Candea, Armando Fox Crash-Only Software - George Candea, Armando Fox

Slide 79

Slide 79 text

"Software components should be designed such that they can deny service for any request or call. Then, if an underlying component can say No, apps must be designed to take No for an answer and decide how to proceed: give up, wait and retry, reduce fidelity, etc.” - George Candea, Armando Fox Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox

Slide 80

Slide 80 text

Services need to learn to accept NO for an answer
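
A minimal sketch of a caller that takes No for an answer, assuming a hypothetical `Reply` type and `callWithFallback` helper (neither is from the talk or from any library). It retries a bounded number of times and then reduces fidelity by returning a fallback value; a real client would probably also wait between attempts.

```scala
// Hypothetical sketch: a caller that accepts "No" and decides how to proceed.
object TakeNoForAnAnswer {
  sealed trait Reply[+A]
  final case class Ok[A](value: A) extends Reply[A]
  case object No extends Reply[Nothing]

  // Retry up to `attempts` times; if the service keeps saying No,
  // give up and return the reduced-fidelity `fallback` instead.
  @annotation.tailrec
  def callWithFallback[A](attempts: Int, fallback: A)(request: () => Reply[A]): A =
    if (attempts <= 0) fallback
    else request() match {
      case Ok(v) => v
      case No    => callWithFallback(attempts - 1, fallback)(request)
    }
}
```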

Slide 81

Slide 81 text

Decentralized Epidemic Gossip Protocols Gossip of Membership, Data & Meta Data Failure Detection Heartbeat [Diagram: cluster of ten Member Nodes]
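
Heartbeat-based failure detection can be sketched with a fixed timeout, as below. This is an illustration under assumptions, not the detector any particular cluster library uses: `HeartbeatFailureDetector` and its methods are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```scala
// Hypothetical sketch: suspect a node when no heartbeat arrived within the timeout.
final class HeartbeatFailureDetector(timeoutMillis: Long) {
  private var lastSeen = Map.empty[String, Long]

  // Record that `node` was heard from at time `nowMillis`.
  def heartbeat(node: String, nowMillis: Long): Unit = {
    lastSeen = lastSeen.updated(node, nowMillis)
  }

  // Nodes whose latest heartbeat is older than the timeout.
  def suspected(nowMillis: Long): Set[String] =
    lastSeen.collect { case (node, t) if nowMillis - t > timeoutMillis => node }.toSet
}
```

A suspicion is only a guess: a later heartbeat from the same node removes it from the suspected set again.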

Slide 82

Slide 82 text

STRONG Consistency Is the wrong default

Slide 83

Slide 83 text

“Two-phase commit is the anti-availability protocol.” - Pat Helland Standing on Distributed Shoulders of Giants - Pat Helland

Slide 84

Slide 84 text

We have to rely on Eventual Consistency But relax, it’s how the world works

Slide 85

Slide 85 text

But I really need Transactions

Slide 86

Slide 86 text

“In general, application developers simply do not implement large scalable applications assuming distributed transactions.” - Pat Helland Life Beyond Distributed Transactions - Pat Helland

Slide 87

Slide 87 text

Use a protocol of Guess. Apologize. Compensate.

Slide 88

Slide 88 text

“The truth is the log. The database is a cache of a subset of the log.” - Pat Helland Immutability Changes Everything - Pat Helland

Slide 89

Slide 89 text

CRUD is DEAD

Slide 90

Slide 90 text

Event Logging • Work with Facts—immutable values • Event Sourcing • DB of Facts—Keep all history • Just replay on failure • Free Auditing, Debugging, Replication • Single Writer Principle • Avoids OO-Relational impedance mismatch • CQRS—Separate the Read & Write Model
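
The "just replay on failure" idea can be sketched without any persistence library: state is a left fold over an immutable log of facts. `CoinsReceived` is the event from the talk's protocol, while `BeverageServed`, `MachineState` and `replay` are hypothetical names introduced only for this illustration.

```scala
// Hypothetical sketch: rebuilding state by replaying an immutable event log.
object EventLogDemo {
  sealed trait Event
  final case class CoinsReceived(number: Int) extends Event
  final case class BeverageServed(price: Int) extends Event // assumed event

  final case class MachineState(insertedCoins: Int, totalCoins: Int)
  val empty: MachineState = MachineState(0, 0)

  // Pure transition function: current state + fact => next state.
  def applyEvent(s: MachineState, e: Event): MachineState = e match {
    case CoinsReceived(n) =>
      s.copy(insertedCoins = s.insertedCoins + n, totalCoins = s.totalCoins + n)
    case BeverageServed(p) =>
      s.copy(insertedCoins = s.insertedCoins - p)
  }

  // Recovery after a crash is a replay of the whole log.
  def replay(log: Seq[Event]): MachineState = log.foldLeft(empty)(applyEvent)
}
```

Since the log is never mutated, the same fold also serves auditing and debugging: replaying a prefix of the log gives the state at that point in history.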

Slide 91

Slide 91 text

Let’s model a resilient & Event Logged vending machine, in Akka Demo Time

Slide 92

Slide 92 text

Event Logged CoffeeMachine

// Events
case class CoinsReceived(number: Int)

class CoffeeMachine extends PersistentActor {
  val price = 2
  var nrOfInsertedCoins = 0
  var outOfCoffeeBeans = false
  var totalNrOfCoins = 0

  override def persistenceId = "CoffeeMachine"

  override def receiveCommand: Receive = {
    case Coins(nr) =>
      nrOfInsertedCoins += nr
      println(s"Inserted [$nr] coins")
      persist(CoinsReceived(nr)) { evt =>
        totalNrOfCoins += nr
        println(s"Total number of coins in machine is [$totalNrOfCoins]")
      }
    …
  }

  override def receiveRecover: Receive = {
    case CoinsReceived(coins) =>
      totalNrOfCoins += coins
      println(s"Total number of coins in machine is [$totalNrOfCoins]")
  }
}

https://gist.github.com/jboner/1db37eeee3ed3c9422e4

Slide 93

Slide 93 text

“An escalator can never break: it can only become stairs. You should never see an Escalator Temporarily Out Of Order sign, just Escalator Temporarily Stairs. Sorry for the convenience.” - Mitch Hedberg

Slide 94

Slide 94 text

Graceful Degradation

Slide 95

Slide 95 text

Circuit Breaker
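
A minimal sketch of the circuit breaker idea, assuming hypothetical names (`SimpleCircuitBreaker`, `maxFailures`, `resetTimeoutMillis`) and an injectable clock for determinism. It is not thread-safe and is not the API of any library: after `maxFailures` consecutive failures it fails fast without calling the protected code, and once the reset timeout has elapsed it lets the next call through as a trial.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch of a circuit breaker; single-threaded use only.
final class SimpleCircuitBreaker(
    maxFailures: Int,
    resetTimeoutMillis: Long,
    clock: () => Long = () => System.currentTimeMillis()) {

  private var failures = 0
  private var openedAt: Option[Long] = None // Some(t) once the breaker has tripped

  // Open while the reset timeout has not yet elapsed since tripping.
  def isOpen: Boolean = openedAt.exists(t => clock() - t < resetTimeoutMillis)

  // Runs `body` unless the circuit is open, in which case it fails fast.
  def call[A](body: => A): Try[A] =
    if (isOpen) Failure(new IllegalStateException("circuit open: failing fast"))
    else Try(body) match {
      case ok @ Success(_) =>
        failures = 0
        openedAt = None
        ok
      case err @ Failure(_) =>
        failures += 1
        if (failures >= maxFailures) openedAt = Some(clock())
        err
    }
}
```

Failing fast while the circuit is open gives the struggling downstream service room to recover, and lets the caller degrade gracefully instead of queuing up behind timeouts.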

Slide 96

Slide 96 text

Little’s Law L = λW Queue Length = Arrival Rate * Response Time W = L/λ Response Time = Queue Length / Arrival Rate W: Response Time L: Queue Length
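
As a worked example of the two forms of Little's Law above, the sketch below plugs in illustrative numbers. The `LittlesLaw` object and the figures in the comments are assumptions for illustration, not measurements from the talk.

```scala
// Hypothetical helper for the two forms of Little's Law shown on the slide.
object LittlesLaw {
  // L = λ * W: average number of requests in the system
  def queueLength(arrivalRate: Double, responseTime: Double): Double =
    arrivalRate * responseTime

  // W = L / λ: average time a request spends in the system
  def responseTime(queueLength: Double, arrivalRate: Double): Double =
    queueLength / arrivalRate
}

// Illustrative numbers: at 100 requests/second and 0.5 seconds per request,
// LittlesLaw.queueLength(100.0, 0.5) gives 50 requests in flight on average.
```

Read the other way, the law bounds latency: if the arrival rate is fixed, capping the queue length caps the response time.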

Slide 97

Slide 97 text

Flow Control Always Apply BackPressure
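
One simple way to picture backpressure is a bounded buffer that refuses new elements when full, so the producer gets an explicit signal to slow down instead of the queue growing without limit. `BoundedBuffer` below is a hypothetical, single-threaded sketch; it is not the Reactive Streams protocol, which signals demand from the consumer rather than rejection.

```scala
import scala.collection.mutable

// Hypothetical sketch: a bounded buffer that pushes back on the producer.
final class BoundedBuffer[A](capacity: Int) {
  private val queue = mutable.Queue.empty[A]

  // Returns false when full: the producer must slow down, retry later or shed load.
  def offer(a: A): Boolean =
    if (queue.size >= capacity) false
    else { queue.enqueue(a); true }

  // Consumer side: taking an element frees capacity for the producer.
  def poll(): Option[A] =
    if (queue.isEmpty) None else Some(queue.dequeue())

  def size: Int = queue.size
}
```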

Slide 98

Slide 98 text

Feedback Control

Slide 99

Slide 99 text

The Feedback Principle “Continuously compare the actual output to its desired reference value; then apply a change to the system inputs that counteracts any deviation of the actual output from the reference.” - Philipp K. Janert Feedback Control for Computer Systems - Philipp K. Janert
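
The feedback principle quoted above can be sketched as a proportional controller: each step compares the actual output with the reference and nudges the input against the deviation. Everything here is assumed for illustration: the `FeedbackLoop` object, the gain, and the toy `plant` (output = 2 × input) stand in for a real system.

```scala
// Hypothetical sketch: a proportional feedback loop around a toy system.
object FeedbackLoop {
  // Assumed system under control: output is twice the input.
  def plant(input: Double): Double = 2.0 * input

  // One control step: change the input so that it counteracts the deviation.
  def step(input: Double, actual: Double, reference: Double, gain: Double): Double =
    input + gain * (reference - actual)

  // Run the loop for `steps` iterations and return the final output.
  def simulate(steps: Int, reference: Double, gain: Double): Double = {
    var input = 0.0
    for (_ <- 1 to steps) {
      val actual = plant(input)
      input = step(input, actual, reference, gain)
    }
    plant(input)
  }
}
```

With this plant and a gain of 0.25 the deviation halves on every step, so the output settles on the reference; a gain that is too large for the system would overshoot or oscillate instead.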

Slide 100

Slide 100 text

Feedback Control

Slide 101

Slide 101 text

Influencing a Complex System

Slide 102

Slide 102 text

Places to Intervene in a Complex System 1. The constants, parameters or numbers 2. The sizes of buffers relative to their flows 3. The structure of material stocks and flows 4. The lengths of delays, relative to the rate of system change 5. The strength of negative feedback loops 6. The gain around driving positive feedback loops 7. The structure of information flows 8. The rules of the system 9. The power to add, change, evolve, or self-organize structure 10. The goals of the system 11. The mindset or paradigm out of which the system arises 12. The power to transcend paradigms Leverage Points: Places to Intervene in a System - Donella Meadows:

Slide 103

Slide 103 text

Triple Loop Learning Loop 1: Follow the rules Loop 2: Change the rules Loop 3: Learn how to learn Triple Loop Learning - Chris Argyris

Slide 104

Slide 104 text

Testing

Slide 105

Slide 105 text

What can we learn from Arnold? Blow things up

Slide 106

Slide 106 text

Shoot Your App Down

Slide 107

Slide 107 text

Pull the Plug …and see what happens

Slide 108

Slide 108 text

No content

Slide 109

Slide 109 text

Executive Summary

Slide 110

Slide 110 text

“Complex systems run as broken systems.” - Richard Cook How Complex Systems Fail - Richard Cook

Slide 111

Slide 111 text

Resilience is by Design Photo courtesy of FEMA/Joselyne Augustino

Slide 112

Slide 112 text

Without Resilience Nothing Else Matters

Slide 113

Slide 113 text

References
Drift into Failure - http://www.amazon.com/Drift-into-Failure-Components-Understanding-ebook/dp/B009KOKXKY
How Complex Systems Fail - http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
Leverage Points: Places to Intervene in a System - http://www.donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/
Going Solid: A Model of System Dynamics and Consequences for Patient Safety - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/
Resilience in Complex Adaptive Systems: Operating at the Edge of Failure - https://www.youtube.com/watch?v=PGLYEDpNu60
Puppies! Now that I’ve got your attention, Complexity Theory - https://www.ted.com/talks/nicolas_perony_puppies_now_that_i_ve_got_your_attention_complexity_theory
How Bacteria Becomes Resistant - http://www.abc.net.au/science/slab/antibiotics/resistance.htm
Toward Resilient Architectures 1: Biology Lessons - http://www.metropolismag.com/Point-of-View/March-2013/Toward-Resilient-Architectures-1-Biology-Lessons/
Dealing in Security - http://resiliencemaps.org/files/Dealing_in_Security.July2010.en.pdf
What is resilience? An introduction to social-ecological research - http://www.stockholmresilience.org/download/18.10119fc11455d3c557d6d21/1398172490555/SU_SRC_whatisresilience_sidaApril2014.pdf
Applying resilience thinking: Seven principles for building resilience in social-ecological systems - http://www.stockholmresilience.org/download/18.10119fc11455d3c557d6928/1398150799790/SRC+Applying+Resilience+final.pdf
Crash-Only Software - https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - http://roc.cs.berkeley.edu/papers/recursive_restartability.pdf
Out of the Tar Pit - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.8928
Bulkhead Pattern - http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html
Making Reliable Distributed Systems in the Presence of Software Errors - http://www.erlang.org/download/armstrong_thesis_2003.pdf
On Erlang, State and Crashes - http://jlouisramblings.blogspot.be/2010/11/on-erlang-state-and-crashes.html
Akka Supervision - http://doc.akka.io/docs/akka/snapshot/general/supervision.html
Release It!: Design and Deploy Production-Ready Software - https://pragprog.com/book/mnee/release-it
Feedback Control for Computer Systems - http://www.amazon.com/Feedback-Control-Computer-Systems-Philipp/dp/1449361692
The Network is Reliable - http://queue.acm.org/detail.cfm?id=2655736
Data on the Outside vs Data on the Inside - https://msdn.microsoft.com/en-us/library/ms954587.aspx
Life Beyond Distributed Transactions - http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pdf
Immutability Changes Everything - http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Standing on Distributed Shoulders of Giants - https://queue.acm.org/detail.cfm?id=2953944
Thinking in Promises - http://shop.oreilly.com/product/0636920036289.do
In Search Of Certainty - http://shop.oreilly.com/product/0636920038542.do
Reactive Microservices Architecture - http://www.oreilly.com/programming/free/reactive-microservices-architecture-orm.csp
Reactive Streams - http://reactive-streams.org
Vending Machine Akka Supervision Demo - https://gist.github.com/jboner/d24c0eb91417a5ec10a6
Persistent Vending Machine Akka Supervision Demo - https://gist.github.com/jboner/1db37eeee3ed3c9422e4

Slide 114

Slide 114 text

Thank You

Slide 115

Slide 115 text

Without Resilience Nothing Else Matters Jonas Bonér CTO Lightbend @jboner