Auto DevOps and Chaos Engineering for Stateful Applications on Kubernetes

Uma Mukkara
November 16, 2019

In production, the data patterns of stateful applications keep changing over time. For testing to be effective, CI pipelines need access to the latest data, or to data close to it. As enterprises and FinTechs adopt Kubernetes and microservices-based architectures, their DevOps teams want to solve two challenges that were already well known in the pre-Kubernetes era. The first is to automate the data lifecycle between production and testing. The second is to give developers instant access to the failed environment when a CI pipeline fails. Solving both on Kubernetes will make enterprise DevOps teams more productive.

The second part of this presentation is about hardening stateful applications on Kubernetes using Chaos Engineering. We introduce the need for Litmus and demonstrate injecting chaos into Kafka.

Transcript

  1. Building Auto DevOps for Stateful Applications on Kubernetes
    Uma Mukkara, COO, MayaData
    @uma_mukkara
    Visit us at KubeCon Booth #SE23
  2. About Me
    Uma Mukkara
    • Co-Founder and COO at MayaData Inc.
    • Works on OpenEBS and Litmus projects
    • Data Agility for Stateful Applications
    • Chaos Engineering for Kubernetes
    • Twitter: @uma_mukkara
  3. About MayaData
    Mission: Data Agility on Kubernetes by using Kubernetes as Data Plane
    • Founded OpenEBS project and donated to CNCF. Key contributor of the project.
    • Sponsor of Litmus project, a chaos Engineering framework for Kubernetes
    • OpenEBS Enterprise Platform - Delivering Data Agility for Enterprises using OpenEBS, Litmus and Director.
  4. Cloud_Native Rejects - NA’19
    • GitHub: https://github.com/openebs/openebs
    • Website: https://openebs.io/
    • Slack: https://slack.openebs.io
    • Twitter: https://twitter.com/openebs
    • Overall 350+ Code contributors (https://devstats.openebs.io/)
    • 1400+ Slack Members, 600+ Forks, 6000+ stars
    • 1.0 released in June
    • Deployed in 100s of clusters every week.
  5. MayaData this week
    BOOTH #SE23
    Talk to us at @openebs or @mayadata_inc about using Kubernetes as your data plane
  6. • Challenges in stateful applications on Kubernetes
      ◦ Data Agility and CI pipelines
        ▪ Data challenges in DevOps (Data DevOps)
    • Find the weaknesses quicker using Chaos Engineering
  7. Developers / Development, SREs / Operations, CI Pipelines
    Build the stateless app, Deploy the stateless app, Test the stateless app
    Build the stateful app, Deploy the stateful app, Test the stateful app
    Data is dynamic
  8. • Challenge 1 - Keeping the latest data pattern in the Pipelines
    Diagram labels: Production Deployment, Pseudonymisation, Stage Deployment, CI Pipelines, Update the data
  9. • Challenge 2 - Providing the data state to Developers when CI pipelines fail
  10. Diagram labels: PIPELINES, SPIN UP DEBUG ENVIRONMENT with the state of the data retained, ACCESS AND DEBUG, DEVELOPER, CODE MERGE
  11. CHALLENGES:
    1) Get the pipelines to run on latest data (or data that is closer to production)
    2) Give instant debug environment to developers to fix the issues in failed pipelines
    Typically can take days to perform the above tasks manually
  12. • Use Container Attached Storage architecture with Snapshots and Clones
    1) Get the pipelines to run on latest data
    2) Give instant debug environment to developers
    Snapshot the data, Move the data, Restore the data, Repeat, DMaaS
    Clone the data, Run the pipeline, Snapshot the data, Clone the data
  13. • Use Container Attached Storage architecture with Snapshots and Clones
    + Simple
    + Teams are autonomous
    + Additive to underlying systems or cloud volumes or JBODs
    + Target Users:
      ◦ SRE
      ◦ App Developer
      ◦ Storage Admin
    OpenEBS cStor - Best suited for Snapshots and Clones
  14. Diagram labels: STAGE CLUSTER, SEED DATA, AKS, DATA SNAPSHOTS (DMAAS through AWS S3), CI CLUSTER, SEED DATA, GKE, CI PIPELINES, DEBUG ENV for DEV
  15. • Failure testing in CI pipelines is not good enough
    "Failure testing breaks a system in some preconceived way, but doesn’t explore the wide open field of weird, unpredictable things that could happen" - Ali Basiri, Chaos Engineering Expert
    • Break things on purpose - In production
      ◦ Find weaknesses
      ◦ Fix them
      ◦ Repeat the process
  16. • Practice chaos engineering to increase resiliency
    Resiliency
    Achieved by CI Pipelines: Functional Tests + Failure Tests
    Achieved by Staging / Production: Good CI + Random Chaos
  17. • My code is 1%. Rest is not controlled by me.
    • Linux is the least dynamic stack
    • Rest is all microservices-based - highly dynamic
    CHAOS ENGINEERING
    Then, how to achieve Resilience?
  18. For Development: Cloud Native APIs (POD, Deployment, PVC, Statefulset, SVC, CRDs)
    For Chaos Testing: Cloud Native APIs ?
    Cloud-native Application
  19. For Development: Cloud Native APIs (POD, Deployment, PVC, Statefulset, SVC, CRDs)
    For Chaos Testing: Cloud Native APIs, New CRDs (Chaos Engine, Chaos Experiment, Chaos Result)
    Cloud-native Application
  20. Cloud Native Developer

    Create POD

        apiVersion: v1
        kind: Pod
        metadata:
          name: percona-pod
          labels:
            app: percona
        spec:
          containers:
          - name: percona
            image: percona:2.4

    Create PV

        kind: PersistentVolumeClaim
        apiVersion: v1
        metadata:
          name: demo-vol1-claim
        spec:
          storageClassName: openebs-jiva-default
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5G

    Inject Chaos

        apiVersion: litmuschaos.io/v1alpha1
        kind: ChaosEngine
        metadata:
          name: engine-percona
        spec:
          appinfo:
            appns: default
            applabel: "app=percona"
          experiments:
          - name: replica-kill
            spec:
              components:
          - name: read-only
            spec:
              components:
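
The snapshot-and-clone workflow from slides 8 to 12 can be sketched as a toy in-memory model. This is only an illustration of the idea, not OpenEBS or cStor code: `Volume`, `Snapshot`, `clone` and `run_pipeline` are hypothetical names invented here, the data is a plain dictionary, and the "move the data" step between clusters (DMaaS) is not modelled at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    """Point-in-time copy of a volume's contents (hypothetical type)."""
    source: str
    data: Dict[str, str]


@dataclass
class Volume:
    """Toy stand-in for a persistent volume; not an OpenEBS object."""
    name: str
    data: Dict[str, str] = field(default_factory=dict)

    def snapshot(self) -> Snapshot:
        # Copy the contents so later writes do not alter the snapshot.
        return Snapshot(source=self.name, data=dict(self.data))


def clone(snap: Snapshot, name: str) -> Volume:
    """Create a new writable volume from a snapshot."""
    return Volume(name=name, data=dict(snap.data))


def run_pipeline(seed: Snapshot, test: Callable[[Volume], bool]) -> Optional[Volume]:
    """Run one CI job against a clone of the seed data.

    Returns None when the test passes. When it fails, the working volume is
    snapshotted and cloned so a developer gets the data exactly as the
    failed run left it.
    """
    work = clone(seed, "ci-run")
    if test(work):
        return None
    return clone(work.snapshot(), "debug-env")


# Stage data, assumed to be already pseudonymised (challenge 1).
stage = Volume("stage", {"orders": "1000 rows"})
seed = stage.snapshot()


def failing_test(vol: Volume) -> bool:
    vol.data["orders"] = "corrupted by migration"  # the test mutates its own clone
    return False


# Challenge 2: a failed run hands back a debug volume with the state retained.
debug = run_pipeline(seed, failing_test)
print(debug.data["orders"])  # corrupted by migration
print(stage.data["orders"])  # 1000 rows
```

The point of the sketch is the isolation: every pipeline run writes only to its own clone, so the seed data stays reusable and the failed state can be handed to a developer without rerunning anything.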