- The question is: how do we handle it?
  - Will we recover?
  - Will our system end up inconsistent?
  - Will we lose availability?
- Handling failure is probably the hardest part of building a robust distributed system.
- We have all built distributed systems.
  - If you have built a Rails app, then you have built a distributed system.
  - The browser talks to the server, which talks to the database.
  - Failure can happen at any stage.
- Consider a signup form: what happens if the user hits submit and the request fails?
  - Did the request reach the Rails app?
  - Did the Rails app start processing it?
  - At what point did the request fail?
    - Was it before writing to the DB?
    - Was it after?
- If the user attempts to sign up again, what will happen?
  - Will the user be successful?
  - Will the user get an error stating that there is already an account with the given email address?
- How can we, as developers, prevent this from happening?
- I bring up such a “simple” case because it only gets more complex from here.

Monday, April 28, 14
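To make the signup ambiguity concrete, here is a minimal Python sketch of the scenario. It is a toy in-memory simulation, not Rails code; `SignupServer`, the `fail_at` knob, and `submit_then_retry` are hypothetical names invented for illustration. It only shows that the browser sees the same failure in both cases, while the retry behaves differently depending on whether the database write happened.

```python
class RequestFailed(Exception):
    """What the browser sees: the request failed, with no detail on where."""


class DuplicateEmail(Exception):
    """Raised when an account with this email already exists."""


class SignupServer:
    """Toy stand-in for the app server plus database (assumed design)."""

    def __init__(self):
        self.accounts = set()  # stands in for the accounts table

    def signup(self, email, fail_at=None):
        # fail_at is a hypothetical failure-injection knob:
        #   "before_write": the request dies before the DB write
        #   "after_write":  the DB write succeeds, but the response is lost
        if fail_at == "before_write":
            raise RequestFailed()
        if email in self.accounts:
            raise DuplicateEmail(email)
        self.accounts.add(email)
        if fail_at == "after_write":
            raise RequestFailed()
        return "ok"


def submit_then_retry(fail_at):
    """Submit once with an injected failure, then retry without one."""
    server = SignupServer()
    try:
        server.signup("user@example.com", fail_at=fail_at)
    except RequestFailed:
        pass  # the user only knows that something went wrong
    try:
        return server.signup("user@example.com")
    except DuplicateEmail:
        return "error: account already exists"
```

Calling `submit_then_retry("before_write")` lets the retry succeed, while `submit_then_retry("after_write")` leaves the user facing a duplicate-account error for an account they never saw confirmed.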