The Overnight Failure

This talk is based on a true horror story.

It is very likely that you, too, have caused a big problem in production at some point in your career, whether by introducing a bug or by running the wrong command. Here I share the story of how I did it this time and the lessons I learned from the experience.

Presented at: Euruko 2017

Sebastian Sogamoso

September 30, 2017

Transcript

  1. !

  2. A B

  3. A B

  4. A B

  5. A B

  6. Recap • Users carpooled every day • The payment process ran once a week • Passengers were charged • Drivers were paid
  7. Black Saturday • 06:00 Weekly process was run • 06:25 User couldn’t pay for breakfast • 06:34 Users reported bug
  8. Boss: Hey, sorry to call you this early, but we have a problem with payments in production and a lot of customers are complaining about it.
  9. Black Saturday • 06:00 Weekly process was run • 06:25 User couldn’t pay for breakfast • 06:34 Users reported bug • 06:43 Manager woke me up
  10. Black Saturday • 06:00 Weekly process was run • 06:25 User couldn’t pay for breakfast • 06:34 Users reported bug • 06:43 Manager woke me up • 07:28 Problem contained
  11. Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID:

    9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 Passenger: UserID: 9 Driver: User ID: 100 $10.00 0 0 0
  12. Black Saturday • 06:00 Weekly process was run • 06:25 User couldn’t pay for breakfast • 06:34 Users reported bug • 06:43 Manager woke me up • 07:28 Problem contained • 22:50 Deployed a fix to production
  13. Black Saturday • 06:00 Weekly process was run • 06:25 User couldn’t pay for breakfast • 06:34 Users reported bug • 06:43 Manager woke me up • 07:28 Problem contained • 22:50 Deployed a fix to production • 22:55 Started looking for a new job
  14. Black Saturday • 06:00 Weekly process was run • 06:25 User couldn’t pay for breakfast • 06:34 Users reported bug • 06:43 Manager woke me up • 07:28 Problem contained • 22:50 Deployed a fix to production
  15. Thousands of users affected by the bug • Users were charged up to 200 times • A single user was charged over $5k • Maxed out credit cards • Emptied bank accounts
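
As a rough illustration of the impact described in slides 11 and 15, the Python sketch below counts repeated charge records and totals the excess amount per passenger. The `Charge` type, the function names, and the sample data are hypothetical and not taken from the talk. It also assumes that two records with the same passenger, driver, and amount are duplicates; a real system would need a ride or billing-period identifier to confirm that.

```python
from collections import Counter
from decimal import Decimal
from typing import NamedTuple


class Charge(NamedTuple):
    """Hypothetical charge record, modeled on the fields shown in slide 11."""
    passenger_id: int
    driver_id: int
    amount: Decimal


def find_duplicate_charges(charges):
    """Map each repeated charge to the number of extra copies beyond the first."""
    counts = Counter(charges)
    return {charge: n - 1 for charge, n in counts.items() if n > 1}


def overcharged_by_passenger(charges):
    """Total amount each passenger was charged beyond the first copy of each charge."""
    totals = {}
    for charge, extra in find_duplicate_charges(charges).items():
        previous = totals.get(charge.passenger_id, Decimal("0"))
        totals[charge.passenger_id] = previous + charge.amount * extra
    return totals


# Made-up sample: passenger 9 charged three times for the same $10.00 trip,
# passenger 7 charged once (assumed to be legitimate).
charges = [Charge(9, 100, Decimal("10.00"))] * 3 + [Charge(7, 100, Decimal("4.50"))]

for passenger_id, excess in overcharged_by_passenger(charges).items():
    print(f"Passenger {passenger_id} overcharged by ${excess}")
```

Amounts use `Decimal` rather than floats so that repeated additions of currency values do not accumulate rounding error.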