Embracing failure to prevent Meltdown

Harisankar P S

April 27, 2018

Transcript

  1. I am the founder and CEO of a Ruby on Rails dev shop based in Kochi, India. We love working in Ruby on Rails and the Ruby community.
  2. { "name" => "Harisankar P S", "email" => "[email protected]", "twitter" => "coderhs", "facebook" => "coderhs", "github" => "coderhs", "linkedin" => "coderhs" }
     I have been the organiser of @keralarb meetups since 2012, Rails Girls Kochi, and Open Source Saturday Kochi. I am a Ruby, Elixir and Crystal developer, and I have volunteered as a local organiser of Ruby Conf India 2016 & 2017.
  3. Interesting fact about India: India has 22 official languages, 1,653 spoken languages and over 50,000 dialects. And I know 3 of them.
  4. I love being in #. It's my second visit here: the place is good, the people are good, the Ruby community is great, the is amazing, and the food is delicious.
  5. Me being a fan of programming, I instantly said software, because it's easy to control and extend using algorithms.
  6. Yes, when someone hacks the system we can use software and try blocking them using algorithms, filters, etc. Or ...
  7. The reason I told this story is to explain that many of the things we build into our software are based on real world entities. That's what object oriented programming is about. So I am going to talk about two such patterns today: the Timeout pattern and the Circuit Breaker pattern.
  8. ▸ Wrap a method call inside a circuit breaker object.
     ▸ This object monitors the success rate of your piece of code.
     ▸ If it fails often, one request after the other, it breaks the circuit.
     ▸ The mantra here is to fail FAST! (A sketch follows below.)
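     A minimal sketch of that idea in Ruby. The class name, the threshold and the RankChecker.fetch call are illustrative assumptions, not any particular gem's API:

       # Illustrative circuit breaker: wraps a block, counts consecutive
       # failures, and fails fast once the circuit is open.
       class SimpleCircuitBreaker
         class OpenCircuitError < StandardError; end

         def initialize(failure_threshold: 5)
           @failure_threshold = failure_threshold
           @failure_count = 0
           @state = :closed
         end

         def call
           raise OpenCircuitError, "circuit open, failing fast" if @state == :open

           result = yield
           @failure_count = 0            # a success resets the counter
           result
         rescue OpenCircuitError
           raise                         # do not count our own fail-fast error
         rescue StandardError
           @failure_count += 1
           @state = :open if @failure_count >= @failure_threshold
           raise
         end
       end

       breaker = SimpleCircuitBreaker.new(failure_threshold: 5)
       breaker.call { RankChecker.fetch("ruby on rails") }  # hypothetical slow API call

     In a real app you would more likely reach for an existing gem (for example circuitbox or stoplight) than roll your own.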
  9. ▸ It's basically about being bold when it comes to handling failures.
     ▸ Our app will fail, it will break down, or the API service we use might break down.
     ▸ There is no 100%.
  10. APPLICATION → 3RD PARTY RANK CHECKING TOOL
      Every day you send the tool 100,000 keywords to check their rank, one by one. You get back a response when they are ready.
  11. APPLICATION → 3RD PARTY RANK CHECKING TOOL
      But today the service has a faulty hard drive and its application is slow, and you still send it more requests. You get back nothing.
  12. IN MICROSERVICES IT PREVENTS THE SNOWBALL EFFECT
      ▸ One service slows down by 200 ms.
      ▸ A second service slows down by another 400 ms.
      ▸ By the time it reaches the final place in the chain, it has slowed down by 1000-5000 ms.
  13. WHAT CONSTITUTES A FAILURE
      ▸ Not performing the required action.
      ▸ Giving a result different from the one expected.
      ▸ Taking more than the acceptable time to get a response.
      Here we take a break to talk about timeouts.
  14. TIMEOUT
      ▸ We need to have checks and balances throughout the software.
      ▸ We usually think about it when we break our app into microservices, where we hand a specification to another developer.
      ▸ But we need to think about it inside our own software as well (a sketch follows below):
        ▸ in a DB query
        ▸ in an HTTP request (Ruby waits for 2 seconds by default)
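     For example, explicit timeouts on an HTTP call with Ruby's Net::HTTP, plus a PostgreSQL statement_timeout on the query side; the host, path and values are illustrative assumptions:

       require "net/http"

       http = Net::HTTP.new("api.example.com", 443)
       http.use_ssl = true
       http.open_timeout = 1   # seconds to wait for the connection to open
       http.read_timeout = 2   # seconds to wait for each read of the response
       response = http.get("/rankings")

       # With ActiveRecord on PostgreSQL, cap slow queries at the database level
       # (value is in milliseconds; assumes a Rails app is loaded):
       ActiveRecord::Base.connection.execute("SET statement_timeout = 500")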
  15. ▸ Timeouts improve fault isolation.
      ▸ Timeouts force us to use our resources properly.
      ▸ Timeouts are mandatory if your mantra is Fail Fast.
  16. CLOSED STATE
      ▸ This is when the system lets everything pass; the encapsulated code is executed.
      ▸ There are no failures happening, or too few to cause you a problem.
      ▸ We are all happy.
      Just summarising again.
  17. OPEN STATE
      ▸ When we cross a limit on the number of failures, we open the circuit.
      ▸ Which means no more requests will reach the service or underlying code.
      ▸ The system will provide an alternate route instead (a sketch follows below):
        ▸ a cache entry
        ▸ add it to a retry queue
        ▸ just show an appropriate error
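     As a rough sketch of those alternate routes, building on the breaker from the earlier sketch. RetryQueue and RankUnavailableError are hypothetical names; Rails.cache is the standard Rails cache store:

       def rank_for(keyword)
         breaker.call { RankChecker.fetch(keyword) }
       rescue SimpleCircuitBreaker::OpenCircuitError
         cached = Rails.cache.read("rank/#{keyword}")
         return cached if cached          # serve a stale rank from the cache

         RetryQueue.push(keyword)         # or queue the keyword for later (assumed job queue)
         raise RankUnavailableError, "rank temporarily unavailable"
       end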
  18. HALF OPEN
      ▸ This state is when the system is trying to see whether the underlying code is fixed.
      ▸ After a threshold time it sends one request through; if it succeeds, it tries a few more requests (or an existing failed request), then finally closes the circuit and sends all requests through.
      ▸ So the system is, in a sense, self-healing. (A sketch of the state transitions follows below.)
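     Extending the earlier sketch (repeated in full so it runs on its own), the half-open behaviour can be approximated with a timestamp: after a cool-off period the breaker lets one trial call through, closes on success and re-opens on failure. This simplified version closes after a single successful probe; the names and timings are assumptions:

       class SimpleCircuitBreaker
         class OpenCircuitError < StandardError; end

         def initialize(failure_threshold: 5, retry_after: 30)
           @failure_threshold = failure_threshold
           @retry_after = retry_after      # seconds to stay open before probing
           @failure_count = 0
           @state = :closed
           @opened_at = nil
         end

         def call
           if @state == :open
             if Time.now - @opened_at >= @retry_after
               @state = :half_open         # let one request probe the service
             else
               raise OpenCircuitError, "circuit open, failing fast"
             end
           end

           result = yield
           @failure_count = 0
           @state = :closed                # the probe (or normal call) succeeded
           result
         rescue OpenCircuitError
           raise
         rescue StandardError
           @failure_count += 1
           if @state == :half_open || @failure_count >= @failure_threshold
             @state = :open
             @opened_at = Time.now
           end
           raise
         end
       end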
  19. QUANTIFIABLE ANALYTICS
      ▸ "100 ms - 600 ms, # of queries" means nothing to your non-technical team members.
      ▸ Measure in number of failures per minute / per hour / per day instead (a sketch follows below).
      ▸ We can analyse code improvements by measuring the change in the number of code failures in production.
      ▸ We can deploy to one server out of many, monitor the performance on that server, and then roll back or mass deploy automatically.
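     One rough way to get those per-minute numbers is to bump a counter keyed by the current minute whenever the breaker records a failure. Redis is just one possible backend here, an assumption:

       require "redis"

       # Count failures under a per-minute key so the dashboard can show
       # "failures per minute" instead of raw latencies.
       def record_failure(redis, service)
         key = "failures:#{service}:#{Time.now.strftime('%Y-%m-%d %H:%M')}"
         redis.incr(key)
         redis.expire(key, 24 * 60 * 60)   # keep a day of history
       end

       record_failure(Redis.new, "rank_checker")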
  20. IMPROVE PERFORMANCE: TIMEOUT + CIRCUIT BREAKERS
      ▸ Microservices with timeouts that break circuits: 80 ms, 100 ms, 150 ms, 75 ms.
      ▸ Reduce the timeouts (80 ms, 100 ms, 125 ms, 75 ms), see how much the failure rate increases, and start fine tuning.
  21. ▸ Circuit breakers give you a configurable policy.
      ▸ No resource build up: at 10K requests/second with a 5 second timeout, that's 50K requests sitting in the pipeline (see the arithmetic below).
      ▸ We can build monitoring systems on top of this and see what's breaking.
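     The "no resource build up" point is just arrival rate multiplied by how long each request is allowed to hang around, using the numbers from the slide:

       requests_per_second = 10_000
       timeout_seconds     = 5
       in_flight = requests_per_second * timeout_seconds    # => 50_000 requests piled up

       # Failing fast with a breaker (or a shorter timeout) shrinks that pile:
       in_flight_with_1s_timeout = requests_per_second * 1  # => 10_000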