Auto DevOps and Chaos Engineering for Stateful Applications on Kubernetes

Building Auto DevOps for Stateful Applications on Kubernetes
Uma Mukkara, COO, MayaData (@uma_mukkara)
Visit us at KubeCon Booth #SE23

About Me: Uma Mukkara
• Co-Founder and COO at MayaData Inc.
• Works on the OpenEBS and Litmus projects
◦ Data agility for stateful applications
◦ Chaos engineering for Kubernetes
• Twitter: @uma_mukkara

About MayaData
Mission: Data agility on Kubernetes, using Kubernetes as the data plane
• Founded the OpenEBS project, donated it to the CNCF, and remains a key contributor to it
• Sponsor of the Litmus project, a chaos engineering framework for Kubernetes
• OpenEBS Enterprise Platform: delivers data agility for enterprises using OpenEBS, Litmus, and Director

Cloud_Native Rejects - NA’19
• GitHub: https://github.com/openebs/openebs
• Website: https://openebs.io/
• Slack: https://slack.openebs.io
• Twitter: https://twitter.com/openebs
• 350+ code contributors overall (https://devstats.openebs.io/)
• 1400+ Slack members, 600+ forks, 6000+ stars
• 1.0 released in June
• Deployed in hundreds of clusters every week


MayaData this week: Booth #SE23
Talk to us at @openebs or @mayadata_inc about using Kubernetes as your data plane.

• Challenges in stateful applications on Kubernetes
◦ Data agility and CI pipelines
▪ Data challenges in DevOps (Data DevOps)
• Find weaknesses faster using chaos engineering

Developers / Development → CI Pipelines → SREs / Operations
Stateless app: build, deploy, and test the stateless app
Stateful app: build, deploy, and test the stateful app, but the data is dynamic

• Challenge 1: Keeping the latest data pattern in the pipelines
Production deployment → pseudonymisation → stage deployment → CI pipelines (update the data)

PIPELINES: the SRE/DevOps admin has to keep moving the seed data from the stage cluster to the CI cluster.

• Challenge 2: Providing the data state to developers when CI pipelines fail

PIPELINES → spin up a debug environment with the state of the data retained → developer accesses and debugs → code merge

CHALLENGES:
1) Get the pipelines to run on the latest data (or data that is closer to production)
2) Give developers an instant debug environment to fix the issues in failed pipelines
Done manually, these tasks typically take days.

• Use a Container Attached Storage architecture with snapshots and clones
1) Get the pipelines to run on the latest data: snapshot the data → move the data → restore the data → repeat (DMaaS)
2) Give developers an instant debug environment: clone the data → run the pipeline → snapshot the data → clone the data
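The snapshot-and-clone flow above can be sketched with the generic Kubernetes CSI volume snapshot API. This is a minimal sketch, not the setup used in the talk (the 2019-era cStor setup may have used a different snapshot mechanism): it assumes a CSI driver with snapshot support is installed, and the names `csi-cstor-snapshotclass`, `cstor-csi`, `seed-data-claim`, `seed-data-snap`, and `ci-run-data` are all hypothetical.

```yaml
# Step 1: take a point-in-time snapshot of the seed-data volume.
# Assumption: a snapshot-capable CSI driver and a VolumeSnapshotClass
# named "csi-cstor-snapshotclass" (hypothetical) exist in the cluster.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: seed-data-snap                         # hypothetical snapshot name
  namespace: default
spec:
  volumeSnapshotClassName: csi-cstor-snapshotclass
  source:
    persistentVolumeClaimName: seed-data-claim # hypothetical source PVC
---
# Step 2: clone the snapshot into a fresh PVC that one pipeline run mounts.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ci-run-data                            # hypothetical clone name
  namespace: default                           # must match the snapshot's namespace
spec:
  storageClassName: cstor-csi                  # hypothetical; same CSI driver as the snapshot
  dataSource:
    name: seed-data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                             # at least the snapshot's restore size
```

Giving each pipeline run its own clone keeps runs from mutating the shared seed data; on a copy-on-write backend the clone does not require copying the full dataset up front.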

• Use a Container Attached Storage architecture with snapshots and clones
+ Simple
+ Teams are autonomous
+ Additive to underlying systems, cloud volumes, or JBODs
+ Target users:
◦ SRE
◦ App developer
◦ Storage admin
OpenEBS cStor: best suited for snapshots and clones

STAGE CLUSTER (AKS): seed data → data snapshots (DMaaS through AWS S3) → CI CLUSTER (GKE): seed data → CI pipelines → debug environment for developers


Seed data → seed data snapshot → successful pipeline / failed pipeline → debug instance
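A debug instance for a failed pipeline can be sketched as a clone of the volume snapshot taken at the moment of failure, mounted into a fresh database pod. This is a hedged sketch using the generic Kubernetes CSI snapshot API, not the exact mechanism shown in the demo; the namespace `ci`, the snapshot `failed-run-snap`, the storage class `cstor-csi`, and the image tag `percona:5.7` are assumptions.

```yaml
# Clone the snapshot taken when the pipeline failed into a debug volume.
# Assumption: a VolumeSnapshot named "failed-run-snap" (hypothetical)
# already exists in the same namespace as this PVC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: debug-data                  # hypothetical
  namespace: ci                     # hypothetical; must match the snapshot's namespace
spec:
  storageClassName: cstor-csi       # hypothetical snapshot-capable CSI storage class
  dataSource:
    name: failed-run-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
# Database pod the developer attaches to; it sees the data exactly as
# it was when the pipeline failed.
apiVersion: v1
kind: Pod
metadata:
  name: percona-debug
  namespace: ci
spec:
  containers:
    - name: percona
      image: percona:5.7            # image tag is an assumption
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql # MySQL/Percona data directory
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: debug-data
```

Because the failing pipeline's own volume is left untouched, the pipeline can be rerun while the developer debugs against the clone.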

The resilience of stateful applications depends on many microservices that you consume externally.

• Failure testing in CI pipelines is not good enough
“Failure testing breaks a system in some preconceived way, but doesn’t explore the wide open field of weird, unpredictable things that could happen.” - Ali Basiri, Chaos Engineering Expert
• Break things on purpose, in production
◦ Find weaknesses
◦ Fix them
◦ Repeat the process

* Images and content authored by: Mark McBride, Turbine Labs

• Practice chaos engineering to increase resiliency
Resiliency = Good CI + Random chaos
◦ Good CI = functional tests + failure tests, achieved by CI pipelines
◦ Random chaos, achieved in staging / production

• My code is 1%. The rest is not controlled by me.
• Linux is the least dynamic part of the stack.
• The rest is all microservices-based and highly dynamic.
Then, how do you achieve resilience? CHAOS ENGINEERING

Cloud-native application
• For development: cloud-native APIs (Pod, Deployment, PVC, StatefulSet, SVC, CRDs)
• For chaos testing: cloud-native APIs?

Cloud-native application
• For development: cloud-native APIs (Pod, Deployment, PVC, StatefulSet, SVC, CRDs)
• For chaos testing: cloud-native APIs as new CRDs (ChaosEngine, ChaosExperiment, ChaosResult)

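Cloud Native Developer: create the Pod, create the PVC, inject chaos

# Create POD (mounts the PVC below as the MySQL data directory)
apiVersion: v1
kind: Pod
metadata:
  name: percona-pod
  labels:
    app: percona
spec:
  containers:
    - name: percona
      image: percona:2.4
      args:
        - "--ignore-db-dir"         # skip the lost+found dir of a freshly formatted volume
        - "lost+found"
      env:
        - name: MYSQL_ROOT_PASSWORD # required by the image on first start; placeholder value
          value: k8sDem0
      volumeMounts:
        - name: demo-vol1
          mountPath: /var/lib/mysql
  volumes:
    - name: demo-vol1
      persistentVolumeClaim:
        claimName: demo-vol1-claim
---
# Create PVC (the PV is provisioned dynamically by the storage class)
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol1-claim
spec:
  storageClassName: openebs-jiva-default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5G
---
# Inject Chaos (targets pods labelled app=percona)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-percona
spec:
  appinfo:
    appns: default
    applabel: "app=percona"
  experiments:
    - name: replica-kill
      spec:
        components:
    - name: read-only
      spec:
        components: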

www.litmuschaos.io

hub.litmuschaos.io

Kafka broker-0, Kafka broker-1, and Kafka broker-2, each on OpenEBS LocalPV, with producers writing to the brokers
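Kafka already replicates partitions across brokers, so each broker can use a node-local volume without storage-level replication. A minimal sketch of an OpenEBS Local PV hostpath StorageClass follows; it is modeled on the OpenEBS hostpath defaults, but the class name and `BasePath` are assumptions and should be checked against the OpenEBS version in use.

```yaml
# Node-local volumes for Kafka brokers via OpenEBS Local PV (hostpath).
# Assumption: OpenEBS with the Local PV provisioner is installed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath            # assumed class name
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: "hostpath"
      - name: BasePath
        value: "/var/openebs/local/"
provisioner: openebs.io/local
reclaimPolicy: Delete
# Delay volume creation until the broker pod is scheduled, so the
# volume is created on the node where that broker actually runs.
volumeBindingMode: WaitForFirstConsumer
```

Each broker's StatefulSet would then reference this class through `storageClassName` in its `volumeClaimTemplates`, giving broker-0, broker-1, and broker-2 one local volume each.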
