Sopra Steria - The Journey of Backups in Kubernetes

Slide 1

Slide 1 text

The Journey of Backups in Kubernetes
Michael Courcy, DevOps Architect

Slide 2

Slide 2 text

Sopra Steria’s Use Case
• Sopra Steria is a digital services and transformation company (similar to Capgemini or Atos) with more than 30,000 employees worldwide.
• A lot of customers, big accounts: banking, government, army, energy, insurance…
• Kubernetes is the tool we always dreamed of:
  • Contracts with many small apps are easy to resuscitate
  • Size matters when you want to reduce costs
  • All the usual benefits of Kubernetes and DevOps practices, of course
  • Provide common tools to all teams in a consistent manner (CI/CD, homemade tools, or those we choose to invest in)
• The adoption is recent (2 years) but the acceleration is impressive

Slide 3

Slide 3 text

Sopra Steria's Infrastructure
• OpenShift Container Platform
• On AWS and on premises
• EBS, GlusterFS and Ceph
• MongoDB, PostgreSQL, MySQL, Oracle, Elasticsearch
• 10/03 Test/Prod clusters
• 50 nodes per cluster

Slide 4

Slide 4 text

Disaster Recovery Requirements
• Business data: Mongo, PG, MySQL
• Namespace resources: Deployments, Secrets, Services …
• etcd: describes the whole cluster state
• Machines: root/OS filesystem + attached devices

Slide 5

Slide 5 text

Journey w/ Various Approaches

Slide 6

Slide 6 text

Our Initial Approach (“The Old Way”)
• Back up the bottom of the pyramid (the machines)
• No need to understand kube
• No need to understand the apps
• No need to understand the databases

But that left us with these issues:
• Expensive …
• Hard to rebuild: nodes and disks are always changing
• Need to shut down all the machines for complete consistency
• Going back in time means going back in time for all tenants …

Slide 7

Slide 7 text

Our Next Stop: EBS + Lambda
• Use Lambda to take EBS snapshots
• Use tags to choose which EBS volumes to snapshot
• The Lambda runtime can’t be a SPOF

But that left us with these issues:
• Only works for AWS, not for other clouds or on prem
• A snapshot of a hot database's EBS volume “may” not be consistent
• Orchestrating a DB stop turns your Lambda code into spaghetti
• Dev teams can’t be involved in the backup code
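
The tag-based selection can be sketched as follows. This is a minimal illustration, not the team's actual Lambda: the tag key `backup`, the value `true`, and the helper names are assumptions, and the volume dictionaries only mimic the shape returned by the EC2 DescribeVolumes API (a `VolumeId` plus a `Tags` list of `Key`/`Value` pairs). The snapshot call is injected as a function so the selection logic can run without AWS credentials.

```python
from typing import Callable, Dict, Iterable, List

# Assumed tagging convention; the slides do not say which tags were used.
BACKUP_TAG_KEY = "backup"
BACKUP_TAG_VALUE = "true"


def volumes_to_snapshot(volumes: Iterable[Dict]) -> List[str]:
    """Return the IDs of the volumes that carry the (assumed) backup tag."""
    selected = []
    for volume in volumes:
        # Untagged volumes may have no "Tags" key at all.
        tags = {t["Key"]: t["Value"] for t in volume.get("Tags", [])}
        if tags.get(BACKUP_TAG_KEY) == BACKUP_TAG_VALUE:
            selected.append(volume["VolumeId"])
    return selected


def snapshot_tagged_volumes(
    volumes: Iterable[Dict], create_snapshot: Callable[[str], str]
) -> Dict[str, str]:
    """Snapshot every tagged volume and return {volume_id: snapshot_id}."""
    return {vid: create_snapshot(vid) for vid in volumes_to_snapshot(volumes)}


if __name__ == "__main__":
    sample = [
        {"VolumeId": "vol-db", "Tags": [{"Key": "backup", "Value": "true"}]},
        {"VolumeId": "vol-cache", "Tags": [{"Key": "backup", "Value": "false"}]},
        {"VolumeId": "vol-untagged"},
    ]
    # Stand-in for the real snapshot call.
    print(snapshot_tagged_volumes(sample, lambda vid: "snap-for-" + vid))
```

In a real handler, `create_snapshot` would presumably wrap the boto3 EC2 client's `create_snapshot(VolumeId=...)` call. Note that nothing here quiesces the database, which is exactly the consistency issue listed above.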

Slide 8

Slide 8 text

Our Next Stop: Port Forward + Cron Server
• Use port-forward and run a dump tool (e.g. mongodump)
• Works in the cloud and on prem
• Manageable by each tenant

But that left us with these issues:
• The cron server becomes the SPOF
• Fragile: backups failed whenever port forwarding broke
• Two systems to manage: kube and the cron server
• Sharing secrets between the two systems may be a security issue
• Hard to share a backup blueprint between the different tenants
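
A rough sketch of what such a cron-driven script might look like, assuming `kubectl` and `mongodump` are installed on the cron server. The function names, the namespace, service and port values, and the fixed two-second wait are all illustrative assumptions, not details from the talk.

```python
import subprocess
import time
from typing import List


def port_forward_cmd(namespace: str, service: str,
                     local_port: int, remote_port: int) -> List[str]:
    """Build the kubectl command that tunnels a local port to a Service."""
    return ["kubectl", "port-forward", "-n", namespace,
            f"svc/{service}", f"{local_port}:{remote_port}"]


def mongodump_cmd(local_port: int, archive_path: str) -> List[str]:
    """Build the mongodump command that dumps through the local tunnel."""
    return ["mongodump", "--host", "127.0.0.1", "--port", str(local_port),
            f"--archive={archive_path}", "--gzip"]


def run_backup(namespace: str, service: str, archive_path: str,
               local_port: int = 27017, remote_port: int = 27017) -> None:
    """Open the tunnel, dump through it, and always close the tunnel."""
    tunnel = subprocess.Popen(
        port_forward_cmd(namespace, service, local_port, remote_port))
    try:
        # Crude wait for the tunnel to come up; if it drops mid-dump,
        # mongodump fails, which is the fragility noted above.
        time.sleep(2)
        subprocess.run(mongodump_cmd(local_port, archive_path), check=True)
    finally:
        tunnel.terminate()
        tunnel.wait()
```

A crontab entry on the cron server would then call `run_backup` for each tenant database. The database credentials also have to live on that server, which is the secret-sharing concern listed above.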

Slide 9

Slide 9 text

Our Next Stop: Kubernetes CronJob
• Build a backup image and call it from a CronJob
• Only one system: kube
• Secret sharing is easy within the same namespace
• No SPOF

But that left us with these issues:
• Single system and cloud-native, but hard to monitor for compliance
• Still left with custom engineering and maintenance
• Hard to get right at scale (security, snapshot costs without cleanups)
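
As an illustration of the CronJob approach, the sketch below builds a manifest as a plain Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The image name, Secret name, namespace, and schedule are placeholders, and `batch/v1` assumes Kubernetes 1.21 or later (older clusters used `batch/v1beta1`).

```python
import json
from typing import Dict


def backup_cronjob(name: str, namespace: str, image: str,
                   schedule: str, secret_name: str) -> Dict:
    """Build a CronJob manifest that runs a backup image on a schedule."""
    return {
        "apiVersion": "batch/v1",  # assumes Kubernetes >= 1.21
        "kind": "CronJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": schedule,
            # Do not start a new backup while the previous one still runs.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "OnFailure",
                "containers": [{
                    "name": "backup",
                    "image": image,
                    # The Secret lives in the same namespace as the job,
                    # so the credentials never leave the tenant.
                    "envFrom": [{"secretRef": {"name": secret_name}}],
                }],
            }}}},
        },
    }


if __name__ == "__main__":
    manifest = backup_cronjob(
        name="mongo-backup",
        namespace="team-a",                            # hypothetical tenant
        image="registry.example.com/backup:latest",    # hypothetical image
        schedule="0 2 * * *",                          # every day at 02:00
        secret_name="mongo-credentials",               # hypothetical Secret
    )
    print(json.dumps(manifest, indent=2))
```

The manifest covers scheduling only. Retention, cleanup of old dumps, and compliance reporting would still have to be built into the image, which matches the issues listed above.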

Slide 10

Slide 10 text

Our Next Stop: Kanister
Kanister: a Kubernetes-native data management framework
• Common infra/DB integrations, including our data services
• Open source and easy to extend via simple "recipes"
• Supports complex data management workflows

But that left us with these issues:
• We also wanted to protect the Kubernetes specs that make up our apps
• We wanted better scheduling and monitoring
• Multitenancy security: if all teams share a single Kanister controller, they can access secrets from different groups
• Does not support some OpenShift “idioms”
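
To give an idea of what a Kanister "recipe" (a Blueprint) looks like, the sketch below assembles one as a Python dictionary and prints it as JSON. It follows the general Blueprint shape (named actions made of phases, each phase calling a Kanister function such as `KubeExec`), but the blueprint name, container name, dump command, and output path are illustrative assumptions, not the blueprints used by the team.

```python
import json
from typing import Dict


def mongodump_blueprint(name: str, namespace: str, container: str) -> Dict:
    """Build a minimal Blueprint with one backup action (illustrative)."""
    return {
        "apiVersion": "cr.kanister.io/v1alpha1",
        "kind": "Blueprint",
        "metadata": {"name": name, "namespace": namespace},
        "actions": {
            "backup": {
                "phases": [{
                    "func": "KubeExec",  # run a command inside a pod
                    "name": "dumpDatabase",
                    "args": {
                        # Go-template parameters rendered by Kanister
                        # when the action runs against a StatefulSet.
                        "namespace": "{{ .StatefulSet.Namespace }}",
                        "pod": "{{ index .StatefulSet.Pods 0 }}",
                        "container": container,
                        # Placeholder command and destination path.
                        "command": [
                            "sh", "-c",
                            "mongodump --gzip --archive=/tmp/dump.gz",
                        ],
                    },
                }],
            },
        },
    }


if __name__ == "__main__":
    blueprint = mongodump_blueprint("mongo-bp", "kanister", "mongodb")
    print(json.dumps(blueprint, indent=2))
```

Because a Blueprint is only a resource definition, it can be shared between tenants, which addresses the "hard to share a backup blueprint" issue from the cron server stage. It does not address the shared-controller security concern listed above.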

Slide 11

Slide 11 text

Our Current Stop: Kasten K10
• Looking for an enterprise-grade solution

Why K10 may fit our requirements:
• Easy to use with a reactive GUI; CRDs under the hood
• Multitenancy security: APIs are namespaced for RBAC support
• Fits our requirements around scale, retention policies, encryption, DR
• Captures both application configuration and data
• Extensible with Kanister for complex data management workflows

Slide 12

Slide 12 text

Thank you to all of our sponsors