Kubernetes for Data Engineers

The talk gives a general introduction to Kubernetes and then focuses on topics relevant to data engineers: in particular, how to run stateful workloads on Kubernetes and how to run machine learning workloads that use GPUs on Kubernetes.


Rohit Agarwal

April 13, 2018


  1. Running stateful applications. YARN: MapReduce, Hive, Spark, etc. Rest of workloads: bespoke deployments. Siloed clusters and underutilization. No standard, and management pain.
  2. StatefulSet. Stable, unique network identifiers. Stable, persistent storage. Ordered, graceful deployment and scaling. Ordered, graceful termination. Ordered, automated rolling updates. Built-in, no need to reinvent.
  3. GPUs in Kubernetes. Support for NVIDIA GPUs. Support for scheduling any device (GPUs, FPGAs, InfiniBand, etc.).
  4. Recap. Stateless > Deployment and ReplicaSet. Simple stateful > StatefulSet. Distributed databases > Operators. Spark/Airflow > Native integration. ML > Kubeflow.
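
The StatefulSet properties listed on slide 2 can be made concrete with a small sketch. The Python below is not from the talk; it builds a headless Service and a StatefulSet as plain dictionaries and prints them as JSON, which `kubectl` accepts alongside YAML. The function names, the image, the port, the mount path, and the storage size are all hypothetical placeholders.

```python
import json


def statefulset_manifests(name, replicas, image, port, storage):
    """Return (headless Service, StatefulSet) manifests as plain dicts.

    All argument values are illustrative; nothing here comes from the talk.
    """
    labels = {"app": name}
    # A headless Service (clusterIP: None) gives each pod a stable DNS entry.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": labels,
            "ports": [{"name": "main", "port": port}],
        },
    }
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # ties pod identities to the headless Service
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "podManagementPolicy": "OrderedReady",  # ordered start and stop
            "updateStrategy": {"type": "RollingUpdate"},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Hypothetical mount path for the per-pod volume.
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/data"}
                        ],
                    }],
                },
            },
            # One PersistentVolumeClaim per pod, kept across rescheduling.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }
    return service, statefulset


def stable_pod_names(name, replicas):
    """Pods in a StatefulSet are named <name>-<ordinal>, starting at 0."""
    return [f"{name}-{i}" for i in range(replicas)]


if __name__ == "__main__":
    svc, sts = statefulset_manifests("db", 3, "example.com/db:1.0", 5432, "10Gi")
    bundle = {"apiVersion": "v1", "kind": "List", "items": [svc, sts]}
    print(json.dumps(bundle, indent=2))
    print(stable_pod_names("db", 3))
```

With these placeholder values the pods would be named `db-0`, `db-1`, and `db-2`, each with its own claim created from the `data` template; that naming and per-pod storage is what "stable, unique network identifiers" and "stable, persistent storage" refer to.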
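
For the GPU support mentioned on slide 3, a minimal sketch of how a workload asks for a GPU may help. It assumes the cluster runs the NVIDIA device plugin, which advertises GPUs under the extended resource name `nvidia.com/gpu`; the function name, pod name, and image below are hypothetical and not taken from the talk.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Return a Pod manifest (plain dict) that requests whole NVIDIA GPUs.

    Assumes a device plugin exposes the `nvidia.com/gpu` resource.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [{
                "name": name,
                "image": image,  # hypothetical training image
                # Extended resources such as GPUs are set under limits;
                # the scheduler only places the pod on a node with enough
                # free devices.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


if __name__ == "__main__":
    pod = gpu_pod_manifest("trainer", "example.com/trainer:latest", gpus=2)
    print(json.dumps(pod, indent=2))
```

Other devices exposed through device plugins are requested the same way, with the resource name chosen by the plugin vendor in place of `nvidia.com/gpu`.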