Dive into Kubeflow

Presented on 5/10/2018 at the Cloud Native East Bay Meetup at Adobe

Lachlan Evenson

May 10, 2018

Transcript

  1. None
  2. OpenAI Scaling Kubernetes to 2,500 Nodes https://blog.openai.com/scaling-kubernetes-to-2500-nodes/

  3. None
  4. Foundations: containers and Kubernetes. Why? (1) The typical ML development workflow and some of its shortcomings; (2) how can a distributed system like Kubernetes help us improve this flow? Labs: aka.ms/kubeflow-labs
  5. How? Set up a Kubernetes cluster (acs-engine/AKS). Demos: (1) running a basic Docker container, (2) running a TensorFlow job with a GPU, (3) JupyterHub, (4) distributed TensorFlow, (5) hyperparameter sweeping, (6) TensorFlow Serving.
  6. None
  7. None
  8. None
  9. https://github.com/Azure/kubeflow-labs/tree/master/1-docker

  10. OpenAI: Building the Infrastructure that Powers the Future of AI
  11. None
  12. Azure/acs-engine github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/azure https://docs.microsoft.com/en-us/azure/aks/gpu-cluster

  13. https://github.com/Azure/kubeflow-labs/tree/master/2-kubernetes
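The Kubernetes lab and the "TensorFlow job with a GPU" demo boil down to submitting a manifest that asks the scheduler for a GPU. The following is a minimal sketch of such a `batch/v1` Job, assuming the NVIDIA device plugin, which exposes GPUs as the extended resource `nvidia.com/gpu` (clusters from the 2018 era may have used a different resource name). The job name and image are hypothetical placeholders.

```python
import json


def gpu_job_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Kubernetes batch/v1 Job that runs one container with GPUs.

    Assumption: GPUs are exposed as "nvidia.com/gpu" by the NVIDIA device plugin.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Extended resources such as GPUs are requested via limits.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                    # A Job's pod template must use "Never" or "OnFailure".
                    "restartPolicy": "OnFailure",
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; the JSON can be piped to `kubectl apply -f -`.
    print(json.dumps(gpu_job_manifest("tf-gpu-demo", "example/tf-mnist:gpu"), indent=2))
```

Building the manifest in code rather than hand-editing YAML makes it easy to vary the GPU count per experiment.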

  14. • The Kubeflow project is dedicated to making deployments of machine learning workflows on Kubernetes simple, portable, and scalable. • https://github.com/kubeflow/kubeflow
  15. https://github.com/Azure/kubeflow-labs/tree/master/4-kubeflow https://github.com/Azure/kubeflow-labs/tree/master/6-tfjob
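The TFJob lab replaces the plain Job with Kubeflow's TFJob custom resource, where each replica type (workers, parameter servers) gets its own pod template. The sketch below assumes the later `kubeflow.org/v1` schema (`tfReplicaSpecs` keyed by `Worker`/`PS`, container named `tensorflow`); the labs from 2018 targeted an earlier alpha schema whose field names differed. Names and image are hypothetical.

```python
import json


def tfjob_manifest(name: str, image: str, workers: int = 1, ps: int = 0) -> dict:
    """Build a TFJob custom resource, assuming the kubeflow.org/v1 schema."""

    def replica(count: int) -> dict:
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # The TFJob operator expects the training container to be named "tensorflow".
                    "containers": [{"name": "tensorflow", "image": image}]
                }
            },
        }

    specs = {"Worker": replica(workers)}
    if ps > 0:
        # Parameter servers are only added for distributed training.
        specs["PS"] = replica(ps)
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": specs},
    }


if __name__ == "__main__":
    print(json.dumps(tfjob_manifest("mnist", "example/tf-mnist:1.0"), indent=2))
```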

  16. • A multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
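To make "spawns, manages, and proxies" concrete, here is a toy in-memory model of a hub. It is not JupyterHub's API: `ToyHub`, its methods, and the port scheme are hypothetical. It only mirrors the idea that each user gets one single-user server and the proxy routes `/user/<name>/` to it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyHub:
    """Toy model of a multi-user hub (hypothetical, not JupyterHub's real API)."""

    next_port: int = 9000
    servers: Dict[str, int] = field(default_factory=dict)  # user -> port
    routes: Dict[str, str] = field(default_factory=dict)   # URL prefix -> target

    def spawn(self, user: str) -> int:
        # Reuse a running server instead of starting a second one for the same user.
        if user not in self.servers:
            self.servers[user] = self.next_port
            self.next_port += 1  # never reuse ports, so stopped servers cannot collide
            self.routes[f"/user/{user}/"] = f"http://127.0.0.1:{self.servers[user]}"
        return self.servers[user]

    def stop(self, user: str) -> None:
        # Manage: tear down the server and its proxy route.
        self.servers.pop(user, None)
        self.routes.pop(f"/user/{user}/", None)

    def proxy(self, path: str) -> Optional[str]:
        # Longest matching prefix wins; None means the request goes to the hub itself.
        matches = [p for p in self.routes if path.startswith(p)]
        return self.routes[max(matches, key=len)] if matches else None
```

For example, after `hub.spawn("alice")`, a request for `/user/alice/tree` is forwarded to Alice's server, while any other path falls through to the hub.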
  17. https://github.com/Azure/kubeflow-labs/tree/master/5-jupyterhub

  18. !

  19. https://github.com/Azure/kubeflow-labs/tree/master/7-distributed-tensorflow
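In distributed TensorFlow (TF 1.x style), every process learns the cluster layout and its own role from the `TF_CONFIG` environment variable, which `tf.estimator.RunConfig` reads; the TFJob operator sets it for each replica. This sketch builds that JSON by hand to show what a given worker or parameter server sees. The host name pattern `<job>-<type>-<index>` and port 2222 are hypothetical placeholders, not the operator's exact naming.

```python
import json


def tf_config(job: str, workers: int, ps: int, task_type: str, index: int, port: int = 2222) -> str:
    """Return the TF_CONFIG JSON string for one replica of a PS/worker cluster.

    Host names follow a hypothetical "<job>-<type>-<index>:<port>" pattern.
    """
    cluster = {
        "worker": [f"{job}-worker-{i}:{port}" for i in range(workers)],
        "ps": [f"{job}-ps-{i}:{port}" for i in range(ps)],
    }
    # Reject roles that do not exist in the cluster spec.
    if task_type not in cluster or not 0 <= index < len(cluster[task_type]):
        raise ValueError(f"no such task: {task_type}[{index}]")
    return json.dumps({"cluster": cluster, "task": {"type": task_type, "index": index}})


if __name__ == "__main__":
    # What the second worker in a 2-worker, 1-PS job would see.
    print(tf_config("mnist", workers=2, ps=1, task_type="worker", index=1))
```

Every replica gets the same `cluster` section and differs only in `task`, which is why the operator can generate it mechanically from the replica counts.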

  20. None
  21. Andrej Karpathy's image painting demo

  22. https://github.com/Azure/kubeflow-labs/tree/master/8-hyperparam-sweep
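A hyperparameter sweep on Kubernetes amounts to expanding a grid of values into many independent training jobs that run in parallel. The sketch below shows only the expansion step with `itertools.product`; the example grid and the `job_name` scheme are hypothetical, and each resulting config would still need to be rendered into its own job manifest.

```python
import itertools
from typing import Dict, List


def sweep(grid: Dict[str, list]) -> List[dict]:
    """Expand a grid of hyperparameter values into one config per combination."""
    keys = sorted(grid)  # fixed key order makes the output deterministic
    return [dict(zip(keys, values)) for values in itertools.product(*(grid[k] for k in keys))]


def job_name(prefix: str, config: dict) -> str:
    # Kubernetes object names must be lowercase and may not contain "." in labels
    # or "_" at all, so encode the config conservatively (hypothetical scheme).
    parts = [f"{k}-{v}" for k, v in sorted(config.items())]
    return "-".join([prefix, *parts]).replace(".", "p").replace("_", "-").lower()


if __name__ == "__main__":
    for cfg in sweep({"learning_rate": [0.01, 0.001], "batch_size": [32, 64]}):
        print(job_name("tf", cfg), cfg)
```

A full grid grows multiplicatively with each added parameter, which is exactly where a cluster that can run many jobs at once pays off.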

  23. • Provides out-of-the-box integration with TensorFlow models • Multiple models (or multiple versions of the same model) can be served simultaneously
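Serving several models and versions side by side shows up directly in TensorFlow Serving's REST API, whose predict URL has the form `/v1/models/<name>[/versions/<version>]:predict` (REST support arrived in TF Serving 1.8, around the time of this talk). The sketch builds such a request with the standard library; the host, model name, and input values are hypothetical, and port 8501 is assumed as the REST port.

```python
import json
import urllib.request
from typing import Optional


def predict_request(host: str, model: str, instances: list,
                    version: Optional[int] = None, port: int = 8501) -> urllib.request.Request:
    """Build a POST request for TensorFlow Serving's REST predict endpoint.

    Without a version, the server uses its default version policy (latest by default).
    """
    path = f"/v1/models/{model}"
    if version is not None:
        # Pin a specific version, e.g. to compare an old and a new model.
        path += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}{path}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = predict_request("tf-serving", "mnist", [[0.0] * 784], version=2)
    print(req.get_method(), req.full_url)
    # Sending it requires a running server: urllib.request.urlopen(req).read()
```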
  24. https://github.com/Azure/kubeflow-labs/tree/master/9-serving

  25. one) Solution is Kubernetes: • Highly scalable • Easy to explore the hyperparameter space • Easy to do distributed training

    But really, data scientists shouldn't have to care about containers, Kubernetes, and all that stuff.
    • Pachyderm can version datasets and trigger new training runs when changes occur
    • Distributed file systems: • NFS • HDFS • …
    Classic DevOps solutions: • Containers • CI/CD • Autoscaling • A/B testing and canary releases of models • Comparing production accuracy vs. expected accuracy when possible • Rolling updates • …
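A/B testing and canary releases of models both need a rule for which model version answers a given request. One common generic technique, sketched here and not a Kubernetes or Kubeflow API, is to hash a stable request or user ID into 100 buckets and send a fixed percentage to the canary; in a cluster this split could instead be done by an ingress or service mesh. The function and labels are hypothetical.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Pick "canary" or "stable" for a request (hypothetical helper).

    Hashing the ID keeps each caller on the same model version across requests.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be in [0, 100]")
    # SHA-256 spreads IDs roughly uniformly over the 100 buckets.
    bucket = int(hashlib.sha256(request_id.encode("utf-8")).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    share = sum(route(f"user-{i}", 10) == "canary" for i in range(10_000)) / 10_000
    print(f"canary share: {share:.3f}")  # close to 0.10
```

Raising `canary_percent` step by step while comparing the canary's production accuracy against the stable model gives a simple rollout loop.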
  26. aka.ms/kubeflow-labs