Cloud Native Chaos Engineering with Litmus

The cloud-native approach has taken the DevOps world by pleasant surprise, with Kubernetes being adopted across all roles, from developers to SREs to VPs of digital transformation. As the huge mass of legacy applications moves to cloud-native platforms, an important problem arises: how do SREs make sure that the systems have no weaknesses and offer the required level of resilience? A well-thought-out chaos engineering methodology is the right answer. But for a large number of fast-changing applications and infrastructure components, finding the right set of chaos experiments, and determining whether the chaos has exposed a weakness in the system, is an almost impossible task.

In cloud-native chaos engineering, developers write chaos tests as an extension of the development process. These tests are defined as standard Kubernetes Custom Resources (CRs), so they are easy to adapt to each environment. The chaos experiments are refined in CI pipelines and finally published to the Chaos Hub, where they are available to SREs running the cloud-native applications in production. SREs use the chaos experiments of various microservices to schedule chaos at random and find weaknesses in their deployments, which leads to increased reliability.

Uma Mukkara

January 23, 2020
Transcript

  1. Conf42.com : Cloud Native Chaos Engineering

    About me

    • Uma Mukkara, Co-Founder & COO, @uma_mukkara
    • Open source projects
  2. Chaos Eng is ...

    • A need
    • A culture
    • A practice
    • I got all that.. But how do I start?
      ◦ Yes, I am on Kubernetes
  3. Agenda

    • What is Cloud-Native Chaos Engineering?
    • Principles of Cloud-Native Chaos Engineering
    • Introduction to Litmus
    • Chaos-Hub
    • Examples
    • What can you do?
    • Q&A
  4. Cloud Native Chaos Engineering (Introduction)

    • Chaos engineering for cloud native environments
    • Chaos engineering done cloud-native way
    • Chaos engineering done Kubernetes native way
  5. Cloud-Native environments

    • Credits: Taken from GitLab Commit conference slides; author is Dan Kohn, CNCF
  6. Cloud-Native environment: Your code vs entire stack

    [Figure: stack diagram with layers Linux, Kubernetes, Nodejs, Libraries and Mycode (50K); size labels 17M SLOC, 35M SLOC, 12M SLOC, 2.5M SLOC]

    • Kubernetes upgrades happen frequently
    • Associated applications and libraries will have updates as often as your code
  7. Cloud-Native environment: Your environment

    [Figure: same stack diagram as on the previous slide]

    • Is very dynamic
    • Needs continuous validation
  8. Cloud-Native environment: You need

    [Figure: same stack diagram as on the previous slide]

    • Chaos Engineering
  9. Cloud-Native environment: You need

    • Cloud-Native Chaos Engineering

    The other big differences in Cloud-Native environment are:

    • YAML manifests for intent (kubectl apply)
    • GitOps

    [Figure: equation-style diagram with the labels "Cloud Native Chaos Engineering =" and "Chaos Engineering"]
  10. Cloud-Native Chaos Engineering

    [Figure: Cloud-native Application]

    • For Development: Cloud Native APIs (POD, Deployment, PVC, Statefulset, SVC, CRDs)
    • For Chaos Testing: Cloud Native APIs ?
  11. Cloud-Native Chaos Engineering

    [Figure: Cloud-native Application]

    • For Development: Cloud Native APIs (POD, Deployment, PVC, Statefulset, SVC, CRDs)
    • For Chaos Testing: Cloud Native APIs, in the form of Chaos Resources (New CRDs)
  12. Cloud-Native Chaos Engineering: Chaos Resources

    • Chaos Operator
    • Chaos CRDs
    • Chaos Metrics
  13. Principles of Cloud-Native Chaos Engineering

    • Open Source
    • Chaos API/CRDs
    • Pluggable chaos
    • Community driven
  14. Principles of Cloud-Native Chaos Engineering

    • CNCF blog: http://bit.ly/cncf-chaos
  15. Litmus project - Introduction

    • Leading open source project for Chaos Engineering on Kubernetes
    • Apache 2.0 License
    • https://github.com/litmuschaos
    • 50+ contributors
    • 600+ stars
    • CNCF Landscape - https://landscape.cncf.io/selected=litmus
    • Chaos Hub - https://hub.litmuschaos.io/
    • CNCF Blog - https://bit.ly/cncf-chaos
  16. Litmus - Cloud Native

    • Open Source
    • Chaos API/CRDs
    • Pluggable chaos
    • Community driven
  17. Litmus - Cloud Native: Chaos API/CRDs

    • ChaosEngine
    • ChaosExperiment
    • ChaosResult
  18. Litmus - Cloud Native: Pluggable chaos

    • CRDs
    • Chaos Libraries: PowerfulSeal, Pumba, LitmusLib
    • Build your own chaos library
  19. Example of plugging new chaos library

    Pluggable chaos: PowerfulSeal

    • Chaos logic: build a Docker image
    • Create a new experiment: create a new CR and set powerfulseal as the ChaosLib
    • The Litmus chaos runner automatically takes care of calling the powerfulseal kill experiment, observes the results and updates the ChaosResult CR.

    [Figure labels: Create, PowerfulSeal, Deploy]
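The plug-in idea on slide 19 (a runner that calls whichever chaos library the experiment names, observes the outcome and records a result) can be sketched in a few lines. This is only an illustrative model under stated assumptions: the registry, the `register` decorator, `run_experiment` and the stand-in kill functions are hypothetical names, not part of Litmus or PowerfulSeal.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A chaos library is modelled as a function: target -> True if the app recovered.
ChaosLib = Callable[[str], bool]

# Hypothetical registry of chaos libraries, keyed by the name an experiment refers to.
CHAOS_LIBS: Dict[str, ChaosLib] = {}


@dataclass
class ChaosResult:
    """Simplified stand-in for the ChaosResult CR mentioned on the slide."""
    experiment: str
    chaos_lib: str
    verdict: str  # "Pass" or "Fail"


def register(name: str) -> Callable[[ChaosLib], ChaosLib]:
    """Decorator that plugs a chaos library into the registry under `name`."""
    def decorator(func: ChaosLib) -> ChaosLib:
        CHAOS_LIBS[name] = func
        return func
    return decorator


@register("powerfulseal")
def powerfulseal_kill(target: str) -> bool:
    # Stand-in for real chaos logic that would ship in a Docker image.
    print(f"[powerfulseal] killing pods matching {target}")
    return True


@register("litmuslib")
def litmuslib_kill(target: str) -> bool:
    # Second stand-in library, to show that the runner is library-agnostic.
    print(f"[litmuslib] killing pods matching {target}")
    return True


def run_experiment(experiment: str, chaos_lib: str, target: str) -> ChaosResult:
    """Call the selected library, observe the outcome, and record a result."""
    if chaos_lib not in CHAOS_LIBS:
        raise ValueError(f"unknown chaos library: {chaos_lib}")
    recovered = CHAOS_LIBS[chaos_lib](target)
    return ChaosResult(experiment, chaos_lib, "Pass" if recovered else "Fail")


result = run_experiment("pod-kill", "powerfulseal", "app=percona")
print(result)
```

Switching libraries only changes the `chaos_lib` argument; the runner and the result record stay the same, which is the point of making chaos pluggable.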
  20. Litmus - Cloud Native: Community driven

    • ChaosHub Experiments
    • Developers: Push
    • SREs: Pull
  21. How Litmus works

    • Steps: Install Litmus, Install Charts, Inject Chaos, Chaos Result
    • Install command: kubectl apply -f <litmus.yaml>

    [Figure labels: Chaos Libraries, hub.litmuschaos.io, Chaos Operator, Chaos Charts, App container, Chaos container]
  22. Cloud-Native Chaos Engineering - Example

    Cloud Native Developer: Create POD, Create PV, Inject Chaos

    Create POD:

        apiVersion: v1
        kind: Pod
        metadata:
          name: percona-pod
          labels:
            app: percona
        spec:
          containers:
            - name: percona
              image: percona:2.4

    Create PV:

        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: demo-vol1-claim
        spec:
          storageClassName: openebs-jiva-default
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi

    Inject Chaos:

        apiVersion: litmuschaos.io/v1alpha1
        kind: ChaosEngine
        metadata:
          name: engine-percona
        spec:
          appinfo:
            appns: default
            applabel: "app=percona"
          experiments:
            - name: replica-kill
              spec:
                components:   # values not shown on the slide
            - name: target-kill
              spec:
                components:   # values not shown on the slide
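Because the chaos intent on slide 22 is an ordinary Kubernetes resource, it can also be generated programmatically. The sketch below builds a ChaosEngine manifest as a plain dictionary using only the fields visible on the slide; the helper name `build_chaos_engine` is hypothetical, and it is an assumption that these fields are sufficient, since the CRD installed in a given cluster may require more.

```python
import json
from typing import Dict, List


def build_chaos_engine(name: str, app_namespace: str, app_label: str,
                       experiment_names: List[str]) -> Dict:
    """Return a ChaosEngine manifest as a dict.

    Only fields that appear on the slide are emitted. Experiment tunables
    (spec.components) are truncated on the slide, so they are left out here.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name},
        "spec": {
            "appinfo": {"appns": app_namespace, "applabel": app_label},
            "experiments": [{"name": exp} for exp in experiment_names],
        },
    }


# Values taken from the slide; only the first experiment is used in this sketch.
engine = build_chaos_engine("engine-percona", "default", "app=percona",
                            ["replica-kill"])
manifest_json = json.dumps(engine, indent=2)
print(manifest_json)
```

kubectl accepts JSON as well as YAML, so the printed manifest could be saved to a file and passed to `kubectl apply -f`, in the same way as the YAML on the slide.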
  23. ChaosHub - Generic experiments

    • Pod-Delete
    • Container-kill
    • pod-network-loss
    • pod-network-latency
    • pod-cpu-hog
    • node-cpu-hog
    • disk-fill
    • disk-loss
    • node-drain
    • pod-network-corruption
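The abstract mentions SREs scheduling chaos "in a random fashion". A minimal sketch of that idea, assuming the generic experiments from slide 23 as the pool: the function `pick_experiments` is a hypothetical helper, the names are lowercased here for consistency, and no claim is made that Litmus schedules experiments this way.

```python
import random
from typing import List, Optional

# Generic experiments as listed on the slide (lowercased for consistency).
GENERIC_EXPERIMENTS = [
    "pod-delete",
    "container-kill",
    "pod-network-loss",
    "pod-network-latency",
    "pod-cpu-hog",
    "node-cpu-hog",
    "disk-fill",
    "disk-loss",
    "node-drain",
    "pod-network-corruption",
]


def pick_experiments(count: int, seed: Optional[int] = None) -> List[str]:
    """Pick `count` distinct experiments at random.

    Passing a seed makes the choice reproducible, which helps when a chaos
    run has to be repeated to confirm a suspected weakness.
    """
    rng = random.Random(seed)
    return rng.sample(GENERIC_EXPERIMENTS, count)


chosen = pick_experiments(3, seed=42)
print(chosen)
```

Sampling without replacement avoids running the same experiment twice in one round; a different seed, or none, gives a different selection.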
  24. ChaosHub - Application specific experiments

    • Cloud Native Application: Application Pod(s), Application Service(s), Application Data (Pod)
    • New CR: Pre Checks, Pod-Delete (Generic Experiment), Post Checks
    • Result CR
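Slide 24 suggests that an application-specific experiment wraps a generic one (here Pod-Delete) with application-level pre checks and post checks, and then reports a result. A minimal sketch of that control flow follows; the function names, the result dictionary and the simulated application state are all hypothetical and are not the Litmus implementation.

```python
from typing import Callable, Dict

# Simulated application state: number of pods currently ready.
app_state = {"ready_pods": 1}


def app_is_healthy() -> bool:
    """Application-specific check, used both before and after chaos."""
    return app_state["ready_pods"] >= 1


def pod_delete() -> None:
    """Stand-in for the generic pod-delete experiment."""
    app_state["ready_pods"] -= 1
    # Simulate the controller rescheduling the deleted pod.
    app_state["ready_pods"] += 1


def run_app_experiment(pre_check: Callable[[], bool],
                       inject_chaos: Callable[[], None],
                       post_check: Callable[[], bool]) -> Dict[str, str]:
    """Run pre checks, inject chaos, run post checks, and return a verdict."""
    if not pre_check():
        # Do not inject chaos into an application that is already unhealthy.
        return {"phase": "pre-check", "verdict": "Fail"}
    inject_chaos()
    verdict = "Pass" if post_check() else "Fail"
    return {"phase": "post-check", "verdict": verdict}


outcome = run_app_experiment(app_is_healthy, pod_delete, app_is_healthy)
print(outcome)
```

Keeping the checks as plain callables means the same generic experiment can be reused for different applications by swapping in different health checks.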
  25. How can you contribute

    • Join #litmus channel on Kubernetes Slack community
    • Use Litmus and create new issues
    • Push new experiments to ChaosHub
  26. If you are practicing Chaos Engineering ..

    • ChaosHub Experiments
    • Developers: Push
    • SREs: Pull