Kubernetes for Data Engineers

The talk gives a general introduction to Kubernetes and then focuses on topics relevant to data engineers: in particular, how to run stateful workloads and how to run machine learning workloads that use GPUs on Kubernetes.

https://www.brighttalk.com/webcast/15789/321823/kubernetes-for-data-engineers

Rohit Agarwal

April 13, 2018

Transcript

  1. 6.

    Running stateful applications

    YARN: MapReduce, Hive, Spark, etc.
    Rest of workloads: bespoke deployments.
    Siloed clusters and underutilization.
    No standard and management pain.
  2. 9.

    StatefulSet

    Stable, unique network identifiers.
    Stable, persistent storage.
    Ordered, graceful deployment and scaling.
    Ordered, graceful termination.
    Ordered, automated rolling updates.
    Built-in, no need to reinvent.
  3. 14.

    GPUs in Kubernetes

    Support for NVIDIA GPUs.
    Support for scheduling any device (GPUs, FPGAs, InfiniBand, etc.)
  4. 15.

    Recap

    Stateless > Deployment and ReplicaSet
    Simple stateful > StatefulSet
    Distributed databases > Operators
    Spark/Airflow > Native integration
    ML > Kubeflow
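
The StatefulSet properties listed on slide 9 can be sketched as a manifest. This is a minimal illustration, not taken from the talk: the name `zk`, the image `example.com/zookeeper:latest`, the mount path `/data`, and the `10Gi` size are all placeholder assumptions. The script builds an `apps/v1` StatefulSet as a Python dict and prints it as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def statefulset_manifest(name: str, image: str, replicas: int, storage: str) -> dict:
    """Build an apps/v1 StatefulSet manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Name of a headless Service (created separately) that gives
            # each pod a stable DNS identity.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # hypothetical image
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/data"}
                            ],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per pod and is reattached
            # to the same ordinal if the pod is rescheduled.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


def pod_names(manifest: dict) -> list:
    """Pods of a StatefulSet are named <name>-0, <name>-1, ... in order."""
    name = manifest["metadata"]["name"]
    return [f"{name}-{i}" for i in range(manifest["spec"]["replicas"])]


manifest = statefulset_manifest("zk", "example.com/zookeeper:latest", 3, "10Gi")
print(json.dumps(manifest, indent=2))
print(pod_names(manifest))
```

The ordinal pod names are what make ordered deployment, termination, and rolling updates possible: the controller works through the ordinals in sequence rather than treating replicas as interchangeable.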
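
For the GPU slide, the following sketch shows how a pod might request NVIDIA GPUs. It assumes the NVIDIA device plugin is installed on the cluster, which is what exposes the `nvidia.com/gpu` resource name; the pod name `train` and the image `example.com/trainer:latest` are hypothetical. GPUs are requested as whole numbers under `limits`.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a v1 Pod manifest that asks the scheduler for `gpus` NVIDIA GPUs."""
    if not isinstance(gpus, int) or gpus < 1:
        # Extended resources such as GPUs cannot be requested fractionally.
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # run-to-completion training job
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image
                    "resources": {
                        # Resource name exposed by the NVIDIA device plugin
                        # (assumed to be running on the GPU nodes).
                        "limits": {"nvidia.com/gpu": gpus}
                    },
                }
            ],
        },
    }


pod = gpu_pod_manifest("train", "example.com/trainer:latest", 2)
print(json.dumps(pod, indent=2))
```

Other devices (FPGAs, InfiniBand adapters, and so on) follow the same pattern with a different vendor-specific resource name, provided a matching device plugin advertises it.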