Slide 1

Cluster State Management
Michael Hausenblas, Red Hat, @mhausenblas
London, 2017-09-08

Slide 2

$ whoami
● Working on distributed systems for the past 20 years, and with containers for some 4+ years
● Web data & big data (research, MapR)
● Containers and container orchestrators (Mesosphere, Red Hat)
● Developer turned ops: C++, Java, Python, Node.js, and a Gopher since around 2014
@mhausenblas

Slide 3

Motivation

Slide 4

Kubernetes ops/dev use cases
● Saving money
● Troubleshooting
● Auditing
● Billing
● Capacity planning
● Upgrading
● Restore
● Disaster Recovery
Read more about use cases here.

Slide 5

Cluster state — state in the cluster

Slide 6

Some terms we’ll be using:
● State (static vs. dynamic)
● Artefacts (files, records, etc.)
● Levels (system vs. app)

Slide 7

Scope

Slide 8

Example write path

Slide 9

State of the art of backup & restore in Kubernetes land

Slide 10

The community view
● kubernetes/kubernetes#24229: Backup/migrate cluster?
● kubernetes/kubernetes#21582: Kubectl needs export and import commands
● Two schools of thought:
○ ‘Replay all from repo’
○ ‘Backups are necessary/useful’

Slide 11

Available solutions
● Initially, the only production-ready solution was backup & restore with etcdctl
● kubernetes-incubator/bootkube (control plane)
● pieterlange/kube-backup (resource state sync to Git, inspired by RANCID)
● heptio/ark: disaster recovery utility (cluster resources & persistent volumes)
● kaptaind/kaptaind: intra-cluster sync for specific resources
● ReShifter (more in a moment)

Slide 12

Cluster state snapshots: levels of abstraction
● Raw etcd data (WAL, log snapshots) → etcdctl backup
● etcd API → ReShifter
● Kubernetes API server → Heptio Ark, kaptaind

Slide 13

Challenges
● App-level backups/restores need to be handled separately
● Which system-level cluster state should be recovered?
● Multitenancy (for example, OpenShift Online)
● Disaster Recovery: RTO/RPO (recovery time and recovery point objectives)
● Low-level concerns: encryption, access rights, etc.
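One way to approach the question of which system-level state to recover is an explicit restore policy over etcd keys. The Go sketch below assumes the Kubernetes default etcd prefix /registry; the policy itself (restore everything under /registry/ except transient events under /registry/events/) and the names restorePolicy and selectKeys are hypothetical illustrations, not a recommendation or any existing tool's API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// restorePolicy is a hypothetical filter deciding which system-level keys get restored.
type restorePolicy struct {
	include []string // key prefixes eligible for restore
	exclude []string // prefixes skipped even if they match include
}

// hasAnyPrefix reports whether key starts with at least one of the prefixes.
func hasAnyPrefix(key string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return false
}

// selectKeys returns, sorted, the keys the policy allows to be restored.
func (p restorePolicy) selectKeys(keys []string) []string {
	var out []string
	for _, k := range keys {
		if hasAnyPrefix(k, p.include) && !hasAnyPrefix(k, p.exclude) {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	policy := restorePolicy{
		include: []string{"/registry/"},
		exclude: []string{"/registry/events/"},
	}
	keys := []string{
		"/registry/namespaces/default",
		"/registry/events/default/web-1.15f3a",
		"/registry/deployments/default/web",
		"/app/cache/session-42",
	}
	for _, k := range policy.selectKeys(keys) {
		fmt.Println(k)
	}
}
```

Keeping the policy as data rather than code makes it easy to review and audit, which matters once several tenants share one cluster.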

Slide 14

Case study: ReShifter (reshifter.info)

Slide 15

What is ReShifter?
● A library: github.com/mhausenblas/reshifter/pkg
● A CLI tool (rcli)
● A Web app (Kubernetes deployment + service + UI)

Slide 16

ReShifter architecture

Slide 17

ReShifter walkthrough

Slide 18

Next steps?

Slide 19

Where do we go from here?
● Review use cases, evaluate solutions, and provide feedback
● Let me know if you’re interested in contributing
● Maybe form a Kubernetes Incubator project?

Slide 20

Resources
● Kubernetes Deep Dive: API Server – Part 2
https://blog.openshift.com/kubernetes-deep-dive-api-server-part-2/
● ReShifter: Architecture, design considerations and prior art
https://github.com/mhausenblas/reshifter/blob/master/docs/architecture.md

Slide 21

@mhausenblas
reshifter.info
openshift.com
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews