An exploration of black holes: strange failure modes - Craft Conf - Tammy - Gremlin

In this presentation, Tammy will share the many wonders of black holes, not in space but in computer science. We'll explore what happens when requests never make it out of a black hole, how this impacts observability, and what we can learn from black-hole-related failures. Tammy will explain how you can use black holes to learn how to make systems more reliable. We'll send requests into black holes on purpose and observe the consequences in detail. This session will be a technical deep dive into strange failure modes that are unexpected and surprising.

Tammy Bryant Butow

June 02, 2021

Transcript

  1. What is a black hole? A region of a distributed system where gravity is so strong that nothing, neither requests nor transactions, can escape from it. All IP packets in this region are unable to escape. @tambryantbutow
  2. How can we create a black hole? Capture IP packets at the transport layer, targeted by supplied port and host arguments. Use existing traffic policing features in the Linux kernel to drop the targeted IP packets.
  3. Does blackholing a critical path service like the Balance Reader result in graceful degradation of the customer experience? https://4503-f37e5de5-39bf-4406-acbc-9c7f2abb0d16.cs-us-east1-wzxb.cloudshell.dev/home
  4. Does blackholing a critical path service like the Balance Reader result in graceful degradation of the customer experience? https://app.gremlin.com/attacks/new/kubernetes
  5. The balance appears as $---. This could make the user think they have no money in their account.
  6. The user is still able to make a deposit of $1000 while the Balance Reader service is in a blackhole.
  7. The user is unable to send payments. They will see an error that the payment failed due to Balance Reader.
  8. The user is unable to send payments. They will see an error that the payment failed due to Balance Reader.
  9. Free demo environment to learn about black holes:
    1. Use this link to install with minikube on Google Cloud Shell: https://ssh.cloud.google.com/cloudshell/editor?show=ide&cloudshell_git_repo=https://github.com/GoogleCloudPlatform/bank-of-anthos&cloudshell_workspace=.&cloudshell_tutorial=extras/cloudshell/tutorial.md
    2. Click minikube → start
    3. In the Cloud Shell terminal, run kubectl apply -f extras/jwt/jwt-secret.yaml
    4. Click <> Cloud Code → Run on Kubernetes
    5. To create black holes, create a namespace for Gremlin and install Gremlin as a Helm chart: https://github.com/gremlin/helm
  10. Does blackholing transaction history result in graceful degradation of the customer experience? https://4503-f37e5de5-39bf-4406-acbc-9c7f2abb0d16.cs-us-east1-wzxb.cloudshell.dev/home
  11. Does blackholing transaction history result in graceful degradation of the customer experience? https://4503-f37e5de5-39bf-4406-acbc-9c7f2abb0d16.cs-us-east1-wzxb.cloudshell.dev/home
  12. What can we do to mitigate a blackhole? Depending on the service, scaling replicas may work well: kubectl scale deployment transactionhistory --replicas=2
  13. There will be a very short outage and then the other pod will take over. Deployment Set: Transaction History (replicas=2). Pod 1: Transaction History. Pod 2: Transaction History. https://github.com/GoogleCloudPlatform/bank-of-anthos
  14. We are still able to see transaction history and no longer receive error messages. https://4503-f37e5de5-39bf-4406-acbc-9c7f2abb0d16.cs-us-east1-wzxb.cloudshell.dev/home
  15. Does blackholing a non-critical path service like the Ad Service result in graceful degradation of the customer experience?
  16. Graceful Degradation: Yes, our experiment was successful and our results were what we expected them to be. The blackhole did not negatively impact the customer experience or our overall SLOs.
  17. Micro, Stellar, Supermassive: We can experience and create black holes of all sizes. When creating black holes, start micro and gradually expand the blast radius.
  18. How can we use black holes to learn how to make systems more reliable?
  19. Thank you. Get a copy of the O’Reilly ebook Reducing MTTD for High-Severity Incidents: gremlin.com/talk/black holes
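
Slide 2 describes dropping IP packets that match a supplied host and port. The slides do not show how Gremlin implements this, so the sketch below swaps in plain iptables DROP rules as one possible stand-in. The function name `blackhole_rules`, the address 10.0.0.12, and port 8080 are hypothetical. The code only builds and prints the commands; running them would need root on a Linux host.

```python
def blackhole_rules(host, port, protocol="tcp"):
    """Build iptables argument lists that drop traffic to and from host:port.

    Assumption: iptables is used here for illustration only; it is not
    confirmed by the slides as the mechanism Gremlin uses.
    """
    return [
        # Drop outgoing packets addressed to the target host and port.
        ["iptables", "-A", "OUTPUT", "-p", protocol,
         "-d", host, "--dport", str(port), "-j", "DROP"],
        # Drop incoming packets that originate from the target host and port.
        ["iptables", "-A", "INPUT", "-p", protocol,
         "-s", host, "--sport", str(port), "-j", "DROP"],
    ]

# Hypothetical target: the address and port are placeholders.
rules = blackhole_rules("10.0.0.12", 8080)
for rule in rules:
    print(" ".join(rule))
```

Replacing `-A` with `-D` in the same argument lists would remove the rules again, which matters for ending an experiment cleanly.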
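
Slides 5 to 8 show what the user sees when the Balance Reader never answers. A minimal sketch of one way a caller could degrade more clearly, assuming a local listener that accepts connections but never replies as a rough stand-in for a black hole (a real black hole drops packets, so even the connection attempt would hang). The helper `read_balance` and its fallback text are hypothetical, not taken from Bank of Anthos.

```python
import socket

def start_silent_listener():
    """Open a local TCP listener that never replies (black hole stand-in)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)               # the kernel completes handshakes; we never accept
    return server

def read_balance(host, port, timeout_s=0.2):
    """Hypothetical client: return the reply, or an explicit fallback message."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.sendall(b"GET balance\n")
            reply = conn.recv(1024)  # waits for data until the timeout expires
            return reply.decode() if reply else "Balance unavailable"
    except OSError:  # socket timeouts and refused connections are both OSError
        return "Balance unavailable"

server = start_silent_listener()
host, port = server.getsockname()
result = read_balance(host, port)
server.close()
print(result)
```

An explicit message such as this avoids the ambiguity noted on slide 5, where `$---` could be read as an empty account.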
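
Slides 12 to 14 mitigate the black hole by scaling Transaction History to two replicas so the second pod can take over. The sketch below models only the effect, with a caller that tries replicas in order. In Kubernetes the traffic shift is handled by the platform rather than by client code, and the slides do not detail how; the names `call_with_failover`, `pod-1`, `pod-2`, and `account-42` are hypothetical.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful reply."""
    errors = []
    for name, handler in replicas:
        try:
            return name, handler(request)
        except TimeoutError as exc:
            errors.append(f"{name}: {exc}")  # remember why this replica failed
    raise RuntimeError("all replicas failed: " + "; ".join(errors))

def blackholed(request):
    """Simulated replica inside a black hole: the call never gets a response."""
    raise TimeoutError("no response")

def healthy(request):
    """Simulated replica outside the black hole."""
    return f"history for {request}"

replicas = [("pod-1", blackholed), ("pod-2", healthy)]
served_by, reply = call_with_failover(replicas, "account-42")
print(served_by, reply)
```

With a single replica in the list, the same call raises `RuntimeError`, which corresponds to the error messages seen before scaling.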
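
Slide 17 recommends starting micro and gradually expanding the blast radius. One way to sketch that idea is a staged rollout that stops at the first stage where a health check fails. The stage fractions, the host names, and the `is_healthy` callback are assumptions made for illustration; the slides do not define sizes for micro, stellar, or supermassive.

```python
def staged_targets(hosts, stages=(0.1, 0.5, 1.0)):
    """Yield a growing prefix of hosts for each stage fraction."""
    for fraction in stages:
        count = max(1, int(len(hosts) * fraction))  # always target at least one host
        yield hosts[:count]

def run_experiment(hosts, is_healthy, stages=(0.1, 0.5, 1.0)):
    """Expand stage by stage; stop at the first stage that fails the health check.

    Returns the target counts of the stages that passed.
    """
    completed = []
    for targets in staged_targets(hosts, stages):
        if not is_healthy(targets):
            break  # halt the experiment instead of widening the blast radius
        completed.append(len(targets))
    return completed

hosts = [f"host-{i}" for i in range(10)]
# Hypothetical health check: the system copes until every host is black-holed.
sizes = run_experiment(hosts, is_healthy=lambda targets: len(targets) < 10)
print(sizes)
```

Here the first two stages pass and the full-fleet stage is rejected, so the experiment ends before reaching the largest blast radius.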