Slide 1

Slide 1 text

Kubernetes Storage: Current Capabilities and Future Opportunities
September 25, 2018
Saad Ali & Nikhil Kasinadhuni, Google

Slide 2

Slide 2 text

Agenda
● Google & Kubernetes
● Kubernetes Volume Subsystem
● Container Storage Interface (CSI)
● Untapped Opportunities
● Q&A

Slide 3

Slide 3 text

Google & Kubernetes

Slide 4

Slide 4 text

“Google is living a few years in the future and sends the rest of us messages.” -- Doug Cutting, Hadoop founder, 2013
WWGD?

Slide 5

Slide 5 text

Humble Beginnings

Slide 6

Slide 6 text

Humble Beginnings: Google File System

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

● Compute: Compute Engine, App Engine, Container Engine, Container Registry, Cloud Functions
● Networking: Cloud Virtual Network, Cloud Load Balancing, Cloud CDN, Cloud Interconnect, Cloud DNS
● Big Data: BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Datalab, Cloud Pub/Sub, Genomics
● Storage and Databases: Cloud Storage, Cloud Bigtable, Cloud Datastore, Cloud SQL, Cloud Spanner, Persistent Disk
● Identity & Security: Cloud IAM, Cloud Resource Manager, Cloud Security Scanner, Key Management Service, BeyondCorp, Data Loss Prevention, Identity-Aware Proxy, Security Key Enforcement
● Machine Learning: Cloud Machine Learning, Cloud Vision API, Cloud Speech API, Cloud Natural Language API, Cloud Translation API, Cloud Jobs API

Slide 9

Slide 9 text

Cattle Not Pets

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text


Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Kubernetes Storage Layer

Slide 15

Slide 15 text

What do these words mean and how do they fit together?
Flex, CSI, In-tree, Out-of-tree, Persistent Volumes, Persistent Volume Claims, Local, Storage Classes, Dynamic Provisioning, Driver, Plugin, Volume, Block, File, Object, Remote, Ephemeral, Stateful, Stateless

Slide 16

Slide 16 text

Kubernetes Principle: Workload Portability

Slide 17

Slide 17 text

Kubernetes: Workload Portability
Kubernetes goal:
● Abstract away cluster details
● Decouple apps from infrastructure
To enable users to:
● Write once, run anywhere (workload portability!)
● Avoid vendor lock-in

Slide 18

Slide 18 text

Kubernetes: Workload Portability
Diagram: a Kubernetes cluster of three nodes (Node 1, Node 2, Node 3), each with its own hardware and kernel/OS, running App 1 through App 4.

Slide 19

Slide 19 text

Kubernetes: Workload Portability
Diagram: the same Kubernetes cluster running on three GCE instances (GCE Instance 1, 2, 3), each with its own hardware and kernel/OS, running App 1 through App 4.

Slide 20

Slide 20 text

Kubernetes: Workload Portability
Diagram: the same Kubernetes cluster running on three EC2 instances (EC2 Instance 1, 2, 3), each with its own hardware and kernel/OS, running App 1 through App 4.

Slide 21

Slide 21 text

Kubernetes: Workload Portability
Diagram: the same Kubernetes cluster running on three bare-metal machines (Bare Metal 1, 2, 3), each with its own hardware and kernel/OS, running App 1 through App 4.

Slide 22

Slide 22 text

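Kubernetes: Workload Portability
Diagram: the three-node cluster running App 1 through App 4, with this ReplicaSet submitted to it:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 2
  # apps/v1 requires a selector that matches the pod template's labels
  # (the "tier: frontend" label is chosen for illustration)
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google_samples/gb-frontend:v3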

Slide 23

Slide 23 text

Kubernetes: Workload Portability
Diagram: the three-node cluster running App 1 through App 4, now also running Frontend Pod Replica 1 and Frontend Pod Replica 2.

Slide 24

Slide 24 text

Problem with Containers and State
What about stateful apps? Pod and ReplicaSet abstract compute and memory, but:
1. Containers are ephemeral: there is no way to persist state.
  ○ Container termination or crashes result in loss of data.
  ○ Can't run stateful applications.
2. Containers can't share data with each other.
Diagram labels: Content Manager, File Puller, Web Server, Consumers, Pod

Slide 25

Slide 25 text

Challenges with Abstracting Storage
So many different types of storage:
● File Storage
  ○ NFS, SMB, etc.
● Block Storage
  ○ GCE PD, AWS EBS, iSCSI, Fibre Channel, etc.
● File on Block Storage
● Object Stores
  ○ AWS S3, Google Cloud Storage (GCS), etc.
● SQL Databases
  ○ MySQL, SQL Server, Postgres, etc.
● NoSQL Databases
  ○ MongoDB, Elasticsearch, etc.
● Pub/Sub Systems
  ○ Apache Kafka, Google Cloud Pub/Sub, AWS SNS, etc.
● Time Series Databases
  ○ InfluxDB, Graphite, etc.
● And more!
What do we focus on?

Slide 26

Slide 26 text

What do we focus on?
Out of scope:
● Object Stores
  ○ AWS S3, Google Cloud Storage (GCS), etc.
● SQL Databases
  ○ MySQL, SQL Server, Postgres, etc.
● NoSQL Databases
  ○ MongoDB, Elasticsearch, etc.
● Pub/Sub Systems
  ○ Apache Kafka, Google Cloud Pub/Sub, AWS SNS, etc.
● Time Series Databases
  ○ InfluxDB, Graphite, etc.
● etc.
In scope:
● File Storage
  ○ NFS, SMB, etc.
● Block Storage
  ○ GCE PD, AWS EBS, iSCSI, Fibre Channel, etc.
● File on Block Storage

Slide 27

Slide 27 text

What do we focus on?
In scope (data path standardized: POSIX, SCSI):
● File Storage
  ○ NFS, SMB, etc.
● Block Storage
  ○ GCE PD, AWS EBS, iSCSI, Fibre Channel, etc.
● File on Block Storage
Out of scope (data path not standardized, yet):
● Object Stores
  ○ AWS S3, Google Cloud Storage (GCS), etc.
● SQL Databases
  ○ MySQL, SQL Server, Postgres, etc.
● NoSQL Databases
  ○ MongoDB, Elasticsearch, etc.
● Pub/Sub Systems
  ○ Apache Kafka, Google Cloud Pub/Sub, AWS SNS, etc.
● Time Series Databases
  ○ InfluxDB, Graphite, etc.
● etc.

Slide 28

Slide 28 text

Kubernetes Volume Plugins
● A way to reference a block device or mounted filesystem (possibly with some data in it)
● Accessible by all containers in the pod
● Volume plugins specify:
  ○ How the volume is set up in the pod
  ○ The medium that backs it
● The lifetime of a volume is the same as the pod's, or longer
Diagram labels: Content Manager, File Puller, Web Server, Consumers, Pod

Slide 29

Slide 29 text

Kubernetes Volume Plugins
Kubernetes has many volume plugins:
Remote Storage
● GCE Persistent Disk
● AWS Elastic Block Store
● Azure File Storage
● Azure Data Disk
● Dell EMC ScaleIO
● iSCSI
● Flocker
● NFS
● vSphere
● GlusterFS
● Ceph File and RBD
● Cinder
● Quobyte Volume
● FibreChannel
● VMware Photon PD
Ephemeral Storage
● EmptyDir
● Expose Kubernetes API
  ○ Secret
  ○ ConfigMap
  ○ DownwardAPI
Local
● HostPath
● Local Persistent Volume (Beta)
Out-of-Tree
● Flex (exec a binary)
● CSI (Beta)
● Other

Slide 30

Slide 30 text

Volume Plugin: EmptyDir
Ephemeral Storage
● Temporary scratch file space from the host machine
● Data exists only for the lifecycle of the pod
● Can only be referenced “in-line” in the pod definition, not via PV/PVC
Diagram labels: Content Manager, File Puller, Web Server, Consumers, EmptyDir, Pod

Slide 31

Slide 31 text

Volume Plugin: EmptyDir
Ephemeral Storage
● Temporary scratch file space from the host machine
● Data exists only for the lifecycle of the pod
● Can only be referenced “in-line” in the pod definition, not via PV/PVC

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - image: k8s.gcr.io/container1
    name: container1
    volumeMounts:
    - mountPath: /shared
      name: shared-scratch-space
  - image: k8s.gcr.io/container2
    name: container2
    volumeMounts:
    - mountPath: /shared
      name: shared-scratch-space
  volumes:
  # both containers mount this same scratch directory at /shared
  - name: shared-scratch-space
    emptyDir: {}

Slide 32

Slide 32 text

Ephemeral Storage
Built on top of EmptyDir:
● Secret Volume
● ConfigMap Volume
● DownwardAPI Volume
These populate Kubernetes API objects as files into an EmptyDir.
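As a rough illustration, here is a minimal sketch of how these API-backed volumes might appear in a pod spec. It assumes that a Secret named my-secret and a ConfigMap named my-config already exist in the pod's namespace. Those two names, the pod label, and the mount paths are all hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-volumes-pod          # hypothetical pod name
  labels:
    app: demo                    # hypothetical label, exposed via the downward API below
spec:
  containers:
  - name: app
    image: gcr.io/google_containers/busybox
    command: ["sleep", "6000"]
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
      readOnly: true
    - name: config
      mountPath: /etc/config
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  # each key of the Secret becomes a file under /etc/creds
  - name: creds
    secret:
      secretName: my-secret      # hypothetical Secret
  # each key of the ConfigMap becomes a file under /etc/config
  - name: config
    configMap:
      name: my-config            # hypothetical ConfigMap
  # the pod's own labels are written to /etc/podinfo/labels
  - name: podinfo
    downwardAPI:
      items:
      - path: labels
        fieldRef:
          fieldPath: metadata.labels
```

The pod does not need to call the Kubernetes API. It reads ordinary files, which is the "meet the user where they are" idea applied to configuration.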

Slide 33

Slide 33 text

Kubernetes Principle: Meet the user where they are

Slide 34

Slide 34 text


Slide 35

Slide 35 text

Remote Storage
● Data persists beyond the lifecycle of any pod
● Referenced in a pod either in-line or via PV/PVC
● Examples:
  ○ GCE Persistent Disk
  ○ AWS Elastic Block Store
  ○ Azure Data Disk
  ○ iSCSI
  ○ NFS
  ○ GlusterFS
  ○ Cinder
  ○ Ceph File and RBD
  ○ And more!

Slide 36

Slide 36 text

Remote Storage
Kubernetes will automatically:
● Attach the volume to the node
● Mount the volume into the pod

apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
spec:
  volumes:
  - name: data
    gcePersistentDisk:
      pdName: panda-disk
      fsType: ext4
  containers:
  - name: sleepycontainer
    image: gcr.io/google_containers/busybox
    command:
    - sleep
    - "6000"
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: false

Slide 37

Slide 37 text


Slide 38

Slide 38 text

Kubernetes Principle: Workload Portability

Slide 39

Slide 39 text

Remote Storage
Pod YAML is no longer portable across clusters!!

apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
spec:
  volumes:
  - name: data
    # GCE-specific: this pod only works where the PD "panda-disk" exists
    gcePersistentDisk:
      pdName: panda-disk
      fsType: ext4
  containers:
  - name: sleepycontainer
    image: gcr.io/google_containers/busybox
    command:
    - sleep
    - "6000"
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: false
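To make the portability problem concrete, here is a sketch of the same pod written for AWS. Only the volume stanza changes, but that change is enough to tie the YAML to a single provider. The EBS volume ID is a hypothetical placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
spec:
  volumes:
  - name: data
    # AWS-specific volume source; the GCE version uses gcePersistentDisk instead
    awsElasticBlockStore:
      volumeID: vol-0123456789abcdef0   # hypothetical EBS volume ID
      fsType: ext4
  containers:
  - name: sleepycontainer
    image: gcr.io/google_containers/busybox
    command:
    - sleep
    - "6000"
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: false
```

Everything except the volume source is identical. Moving the workload between clouds means editing every pod that embeds a provider-specific volume.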

Slide 40

Slide 40 text

Persistent Volumes & Persistent Volume Claims
The PersistentVolume and PersistentVolumeClaim abstraction decouples storage implementation from storage consumption.

Slide 41

Slide 41 text

PersistentVolume

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv1        # object names must be lowercase (DNS-1123)
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    fsType: ext4
    pdName: panda-disk
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv2
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    fsType: ext4
    pdName: panda-disk2

Slide 42

Slide 42 text

PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: testns
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Slide 43

Slide 43 text

PV to PVC Binding

$ kubectl create -f pv.yaml
persistentvolume "pv1" created
persistentvolume "pv2" created
$ kubectl get pv
NAME   CAPACITY   ACCESSMODES   STATUS      CLAIM          REASON   AGE
pv1    10Gi       RWO           Available                           1m
pv2    100Gi      RWO           Available                           1m
$ kubectl create -f pvc.yaml
persistentvolumeclaim "mypvc" created
$ kubectl get pv
NAME   CAPACITY   ACCESSMODES   STATUS      CLAIM          REASON   AGE
pv1    10Gi       RWO           Available                           3m
pv2    100Gi      RWO           Bound       testns/mypvc            3m

mypvc requests 100Gi, so only pv2 can satisfy it; pv1 stays Available.

Slide 44

Slide 44 text

Remote Storage
Volume referenced via PVC: Pod YAML is portable across clusters again!!

apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
  namespace: testns   # a pod can only use PVCs in its own namespace
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mypvc
  containers:
  - name: sleepycontainer
    image: gcr.io/google_containers/busybox
    command:
    - sleep
    - "6000"
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: false

Slide 45

Slide 45 text

Dynamic Provisioning
● Having a cluster admin pre-provision PVs is painful and wasteful.
● Dynamic provisioning creates new volumes on demand (when requested by a user).
● This eliminates the need for cluster administrators to pre-provision storage.

Slide 46

Slide 46 text

Dynamic Provisioning
● Dynamic provisioning is “enabled” by creating a StorageClass.
● A StorageClass defines the parameters used during volume creation.
● StorageClass parameters are opaque to Kubernetes, so storage providers can expose any number of custom parameters for the cluster admin to use.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

Slide 47

Slide 47 text

Dynamic Provisioning
● Users consume storage the same way: via a PVC.
● “Selecting” a storage class in the PVC triggers dynamic provisioning.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: testns
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: fast

Slide 48

Slide 48 text

Dynamic Provisioning

$ kubectl create -f storage_class.yaml
storageclass "fast" created
$ kubectl create -f pvc.yaml
persistentvolumeclaim "mypvc" created
$ kubectl get pvc --all-namespaces
NAMESPACE   NAME    STATUS   VOLUME                                     CAPACITY   ACCESSMODES   AGE
testns      mypvc   Bound    pvc-331d7407-fe18-11e6-b7cd-42010a8000cd   100Gi      RWO           6s
$ kubectl get pv pvc-331d7407-fe18-11e6-b7cd-42010a8000cd
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM          REASON   AGE
pvc-331d7407-fe18-11e6-b7cd-42010a8000cd   100Gi      RWO           Delete          Bound    testns/mypvc            13m
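The Delete reclaim policy in the output above is the default for dynamically provisioned volumes. The PV inherits its policy from the StorageClass, and that class's reclaimPolicy field defaults to Delete.

Below is a minimal sketch of a class that keeps the underlying disk after the claim is deleted. The class name fast-retain is hypothetical.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-retain              # hypothetical class name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
# PVs provisioned from this class keep their GCE PD when the PVC is deleted;
# omitting this field gives the default, Delete
reclaimPolicy: Retain
```

With Retain, an admin has to clean up or reuse the released disk by hand. That is the trade-off against the "no wasted storage" goal of dynamic provisioning.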

Slide 49

Slide 49 text

Dynamic Provisioning
Volume referenced via PVC

apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
  namespace: testns   # a pod can only use PVCs in its own namespace
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mypvc
  containers:
  - name: sleepycontainer
    image: gcr.io/google_containers/busybox
    command:
    - sleep
    - "6000"
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: false

Slide 50

Slide 50 text

HostPath Volumes
● Expose a directory on the host machine to a pod
● What happens if your pod is moved to a different node?
● Don't use hostPath (unless you know what you are doing)!!
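For reference, a hostPath volume is declared in-line, as in the minimal sketch below. The pod name and paths are illustrative assumptions. What the pod sees under the mount depends entirely on which node it is scheduled to, which is the problem raised above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pod             # hypothetical pod name
spec:
  containers:
  - name: app
    image: gcr.io/google_containers/busybox
    command: ["sleep", "6000"]
    volumeMounts:
    - name: host-logs
      mountPath: /host/var/log
      readOnly: true             # read-only limits the damage to the node
  volumes:
  - name: host-logs
    hostPath:
      path: /var/log             # directory on whichever node runs the pod
      type: Directory            # fail instead of creating the path if it is missing
```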

Slide 51

Slide 51 text

Local Persistent Volumes
● Expose a local block device or filesystem as a PersistentVolume
● Reduced durability
● Useful for building distributed storage systems
● Useful for high-performance caching
● Kubernetes takes care of data gravity (pods using the volume are scheduled onto the node where it lives)
● Referenced via PV/PVC, so workload portability is maintained
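Here is a minimal sketch of how a local PV might be declared. It assumes a disk is already mounted at /mnt/disks/ssd1 on a node named node-1; the path, the node name, and the object names are hypothetical.

The nodeAffinity on the PV is what lets the scheduler handle data gravity. WaitForFirstConsumer delays binding until a pod using the claim is scheduled, so the volume choice can take the pod's other constraints into account.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage            # hypothetical class name
provisioner: kubernetes.io/no-provisioner   # local PVs are pre-created, not dynamically provisioned
volumeBindingMode: WaitForFirstConsumer      # bind only once a consuming pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv1                # hypothetical PV name
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1        # hypothetical mount point on the node
  nodeAffinity:                  # pins the volume (and its consumers) to one node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1               # hypothetical node name
```

A workload then claims this storage with an ordinary PVC that sets storageClassName: local-storage, so the pod YAML itself stays free of node-specific details.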

Slide 52

Slide 52 text

In-Tree Volume Plugins
Kubernetes “in-tree” volume plugins are awesome =)
● Powerful abstraction for file and block storage
● Automate provisioning, attaching, mounting, and more!
● Storage portability via PV/PVC/StorageClass objects

Slide 53

Slide 53 text

In-Tree Volume Plugins
Kubernetes “in-tree” volume plugins are painful =(
● Painful for Kubernetes developers
  ○ Testing and maintaining external code
  ○ Bugs in volume plugins affect critical Kubernetes components
  ○ Volume plugins get the full privileges of Kubernetes components (kubelet and kube-controller-manager)
● Painful for storage vendors
  ○ Dependent on Kubernetes releases
  ○ Source code forced to be open source

Slide 54

Slide 54 text

Out-of-Tree Volume Plugins
Container Storage Interface (CSI): Beta in v1.10; targeting GA in v1.13
● Follows in the footsteps of CRI (Container Runtime Interface) and CNI (Container Network Interface)
● Collaboration with other cluster orchestration systems
● CSI makes the Kubernetes volume layer truly extensible
● Plugins may be containerized
Flex Volumes
● Legacy attempt at out-of-tree plugins
● Exec based
● Deployment is difficult
● Doesn't support clusters with no master access
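From the user's side, a CSI driver plugs into the same StorageClass and PVC flow shown earlier; only the provisioner name changes. Here is a minimal sketch under two assumptions:

● A CSI driver is installed that registers itself as csi.example.com (a hypothetical name).
● That driver accepts a type parameter (also hypothetical, since parameters are driver-defined).

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: csi-fast                 # hypothetical class name
provisioner: csi.example.com     # the CSI driver's registered name (hypothetical)
parameters:
  type: ssd                      # opaque, driver-defined parameter (hypothetical)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc                  # hypothetical claim name
  namespace: testns
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: csi-fast     # users select the class exactly as with in-tree plugins
```

Because the claim looks the same as before, moving a workload from an in-tree plugin to a CSI driver can be a StorageClass change rather than a change to every pod.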

Slide 55

Slide 55 text

Untapped Opportunities

Slide 56

Slide 56 text

Application Portability
Diagram: workloads spanning on-prem and cloud. Labels include:
● Legacy software
● Local execution at the edge / IoT (warehouse, factory, branch)
● Cloud bursting (ecommerce site; catalog, ERP)
● Augmented services (Cloud Storage, Cloud ML, BigQuery)
● Jurisdictional / PII constraints (secure records in Europe and the US, IT policy)

Slide 57

Slide 57 text

Snapshot Portability

Slide 58

Slide 58 text

Unified Observability

Slide 59

Slide 59 text

Uniform Management

Slide 60

Slide 60 text

“The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” - Mark Weiser, The Computer for the 21st Century

Slide 61

Slide 61 text

Questions? Get Involved!
● Container Storage Interface Community
  ○ github.com/container-storage-interface/community
  ○ Meeting every week, Wednesdays at 9 AM (PT)
  ○ [email protected]
● Kubernetes Storage Special Interest Group (SIG)
  ○ github.com/kubernetes/community/tree/master/sig-storage
  ○ Meeting every 2 weeks, Thursdays at 9 AM (PT)
  ○ [email protected]