Embracing failure to prevent Meltdown

Harisankar P S

April 27, 2018

Transcript

  1. I am the founder and CEO of a Ruby on Rails dev shop based in Kochi, India. We love working in Ruby on Rails and the Ruby community.
  2. { "name" => "Harisankar P S", "email" => "[email protected]", "twitter" => "coderhs", "facebook" => "coderhs", "github" => "coderhs", "linkedin" => "coderhs" }
     I have been the organiser of @keralarb meetups since 2012, Rails Girls Kochi, and Open Source Saturday Kochi. I am a Ruby, Elixir and Crystal developer, and I have volunteered as a local organiser of Ruby Conf India 2016 & 2017.
  3. Interesting fact about India: India has 22 official languages, 1,653 spoken languages and over 50,000 dialects. And I know 3 of them.
  4. I love being in #. It's my second visit here: the place is good, the people are good, the Ruby community is great, the is amazing, and the food is delicious.
  5. Me being a fan of programming, I instantly said software, because it's easy to control and extend using algorithms.
  6. Yes, when someone hacks the system we can use software and try blocking them using algorithms, filters, etc. Or ...
  7. The reason I told this story is to explain that many of the things we build into our software are based on real world entities. That's what object oriented programming is about. So I am going to talk about two such patterns today: the Timeout pattern and the Circuit Breaker pattern.
  8. ▸ Wrap a method call inside a circuit breaker object.
     ▸ This object monitors the success rate of your piece of code.
     ▸ If it fails often, one request after the other, it breaks the circuit.
     ▸ The mantra here is to fail FAST! (A sketch follows below.)
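     A minimal sketch of that idea in Ruby. The class name, the threshold and the RankChecker.fetch call are illustrative assumptions, not any particular gem's API:

       # Illustrative circuit breaker: wraps a block, counts consecutive
       # failures, and fails fast once the circuit is open.
       class SimpleCircuitBreaker
         class OpenCircuitError < StandardError; end

         def initialize(failure_threshold: 5)
           @failure_threshold = failure_threshold
           @failure_count = 0
           @state = :closed
         end

         def call
           raise OpenCircuitError, "circuit open, failing fast" if @state == :open

           result = yield
           @failure_count = 0            # a success resets the counter
           result
         rescue OpenCircuitError
           raise                         # do not count our own fail-fast error
         rescue StandardError
           @failure_count += 1
           @state = :open if @failure_count >= @failure_threshold
           raise
         end
       end

       breaker = SimpleCircuitBreaker.new(failure_threshold: 5)
       breaker.call { RankChecker.fetch("ruby on rails") }  # hypothetical slow API call

     In a real app you would more likely reach for an existing gem (for example circuitbox or stoplight) than roll your own.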
  9. ▸ It's basically about being bold when it comes to handling failures.
     ▸ Our app will fail, it will break down, or the API service we use might break down.
     ▸ There is no 100%.
  10. APPLICATION → 3RD PARTY RANK CHECKING TOOL
      Every day you send the tool 100,000 keywords to check their rank, one by one. You get back a response when they are ready.
  11. APPLICATION → 3RD PARTY RANK CHECKING TOOL
      But today the service has a faulty hard drive and its application is slow, and you still send it more requests. You get back nothing.
  12. IN MICROSERVICES IT PREVENTS THE SNOWBALL EFFECT
      ▸ One service slows down by 200 ms.
      ▸ A second service slows down by another 400 ms.
      ▸ By the time it reaches the final place in the chain, it has slowed down by 1000-5000 ms.
  13. WHAT CONSTITUTES A FAILURE
      ▸ Not performing the required action.
      ▸ Giving a result different from the one expected.
      ▸ Taking more than the acceptable time to get a response.
      Here we take a break to talk about timeouts.
  14. TIMEOUT
      ▸ We need to have checks and balances throughout the software.
      ▸ We usually think about it when we break our app into microservices, where we hand a specification to another developer.
      ▸ But we need to think about it inside our own software as well (a sketch follows below):
        ▸ in a DB query
        ▸ in an HTTP request (Ruby waits for 2 seconds by default)
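     For example, explicit timeouts on an HTTP call with Ruby's Net::HTTP, plus a PostgreSQL statement_timeout on the query side; the host, path and values are illustrative assumptions:

       require "net/http"

       http = Net::HTTP.new("api.example.com", 443)
       http.use_ssl = true
       http.open_timeout = 1   # seconds to wait for the connection to open
       http.read_timeout = 2   # seconds to wait for each read of the response
       response = http.get("/rankings")

       # With ActiveRecord on PostgreSQL, cap slow queries at the database level
       # (value is in milliseconds; assumes a Rails app is loaded):
       ActiveRecord::Base.connection.execute("SET statement_timeout = 500")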
  15. ▸ Timeouts improve fault isolation.
      ▸ Timeouts force us to use our resources properly.
      ▸ Timeouts are mandatory if your mantra is Fail Fast.
  16. CLOSED STATE
      ▸ This is when the system lets everything pass; the encapsulated code is executed.
      ▸ There are no failures happening, or too few to cause you a problem.
      ▸ We are all happy.
      Just summarising again.
  17. OPEN STATE
      ▸ When we cross a limit on the number of failures, we open the circuit.
      ▸ Which means no more requests will reach the service or underlying code.
      ▸ The system will provide an alternate route instead (a sketch follows below):
        ▸ a cache entry
        ▸ add it to a retry queue
        ▸ just show an appropriate error
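     As a rough sketch of those alternate routes, building on the breaker from the earlier sketch. RetryQueue and RankUnavailableError are hypothetical names; Rails.cache is the standard Rails cache store:

       def rank_for(keyword)
         breaker.call { RankChecker.fetch(keyword) }
       rescue SimpleCircuitBreaker::OpenCircuitError
         cached = Rails.cache.read("rank/#{keyword}")
         return cached if cached          # serve a stale rank from the cache

         RetryQueue.push(keyword)         # or queue the keyword for later (assumed job queue)
         raise RankUnavailableError, "rank temporarily unavailable"
       end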
  18. HALF OPEN
      ▸ This state is when the system is trying to see whether the underlying code is fixed.
      ▸ After a threshold time it sends one request through; if it succeeds, it tries a few more requests (or an existing failed request), then finally closes the circuit and sends all requests through.
      ▸ So the system is, in a sense, self-healing. (A sketch of the state transitions follows below.)
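     Extending the earlier sketch (repeated in full so it runs on its own), the half-open behaviour can be approximated with a timestamp: after a cool-off period the breaker lets one trial call through, closes on success and re-opens on failure. This simplified version closes after a single successful probe; the names and timings are assumptions:

       class SimpleCircuitBreaker
         class OpenCircuitError < StandardError; end

         def initialize(failure_threshold: 5, retry_after: 30)
           @failure_threshold = failure_threshold
           @retry_after = retry_after      # seconds to stay open before probing
           @failure_count = 0
           @state = :closed
           @opened_at = nil
         end

         def call
           if @state == :open
             if Time.now - @opened_at >= @retry_after
               @state = :half_open         # let one request probe the service
             else
               raise OpenCircuitError, "circuit open, failing fast"
             end
           end

           result = yield
           @failure_count = 0
           @state = :closed                # the probe (or normal call) succeeded
           result
         rescue OpenCircuitError
           raise
         rescue StandardError
           @failure_count += 1
           if @state == :half_open || @failure_count >= @failure_threshold
             @state = :open
             @opened_at = Time.now
           end
           raise
         end
       end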
  19. QUANTIFIABLE ANALYTICS
      ▸ "100 ms - 600 ms, # of queries" means nothing to your non-technical team members.
      ▸ Measure in number of failures per minute / per hour / per day instead (a sketch follows below).
      ▸ We can analyse code improvements by measuring the change in the number of code failures in production.
      ▸ We can deploy to one server out of many, monitor the performance on that server, and then roll back or mass deploy automatically.
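     One rough way to get those per-minute numbers is to bump a counter keyed by the current minute whenever the breaker records a failure. Redis is just one possible backend here, an assumption:

       require "redis"

       # Count failures under a per-minute key so the dashboard can show
       # "failures per minute" instead of raw latencies.
       def record_failure(redis, service)
         key = "failures:#{service}:#{Time.now.strftime('%Y-%m-%d %H:%M')}"
         redis.incr(key)
         redis.expire(key, 24 * 60 * 60)   # keep a day of history
       end

       record_failure(Redis.new, "rank_checker")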
  20. IMPROVE PERFORMANCE: TIMEOUT + CIRCUIT BREAKERS
      ▸ Microservices with timeouts that break circuits: 80 ms, 100 ms, 150 ms, 75 ms.
      ▸ Reduce the timeouts (80 ms, 100 ms, 125 ms, 75 ms), see how much the failure rate increases, and start fine tuning.
  21. ▸ Circuit breakers give you a configurable policy.
      ▸ No resource build up: at 10K requests/second with a 5 second timeout, that's 50K requests sitting in the pipeline (see the arithmetic below).
      ▸ We can build monitoring systems on top of this and see what's breaking.
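     The "no resource build up" point is just arrival rate multiplied by how long each request is allowed to hang around, using the numbers from the slide:

       requests_per_second = 10_000
       timeout_seconds     = 5
       in_flight = requests_per_second * timeout_seconds    # => 50_000 requests piled up

       # Failing fast with a breaker (or a shorter timeout) shrinks that pile:
       in_flight_with_1s_timeout = requests_per_second * 1  # => 10_000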