Machine learning on Kubernetes

dharmeshkakadia
August 17, 2018

Kubernetes SF meetup. Aug, 2018

Other talks at http://dharmeshkakadia.github.io/talks

Transcript

  1. Dharmesh Kakadia, Mobile Data Labs & Microsoft

  2. whoami

    • Applied Scientist/Software Engineer at MobileDataLabs (an acquired team inside Microsoft), building an AI/Analytics platform
    • Spent a couple of years with Microsoft Research
    • Spent a couple of years with Azure HDInsight
    • Among other things, author of “Apache Mesos Essentials”
    • Opinions are mine and biased
    • You can find me as @dharmeshkakadia everywhere
  3. Machine Learning

  4. Why do you need an AI platform?
  5. Why build an AI platform on k8s?

    • Already a widely adopted standard for deploying. We are just extending it to Data/AI
    • Operational benefits for free: monitoring, CI/CD, secrets, alerts, log management, …
    • No separate process and tools for data and other parts of engineering
    • Ability to leverage the latest improvements faster
    • Better resource utilization
    • Helps avoid data silos
    • No need to worry about installing Nvidia drivers…
    • With a caveat: you need a little more cross-functional expertise
  6. Our AI/Analytics Platform Architecture

  7. Lifecycle

    • Dev: Inner-loop development work inside a Jupyter notebook. Write the Dockerfile and YAML file when ready for a PR.
    • Build: Takes the Dockerfile, builds the image, and pushes it to the container registry with build tags.
    • Release: Combines the YAML file and secrets and applies them on the k8s cluster.
    • Monitor and visualize: Produces output data, models and results, which are used for further analysis/decision making. No special ops required.
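The Build and Release stages of the Lifecycle slide can be sketched roughly as follows. This is a minimal illustration and not the team's actual tooling: the registry host, image name, build tag, job name, and secret name are all hypothetical, and a `batch/v1` Job is assumed as the workload type. The manifest is emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def image_ref(registry: str, name: str, build_tag: str) -> str:
    """Compose the full image reference that a build step would push."""
    return f"{registry}/{name}:{build_tag}"


def render_job(job_name: str, image: str, secret_name: str) -> dict:
    """Return a minimal batch/v1 Job manifest.

    The keys of the named Kubernetes Secret are exposed to the container
    as environment variables via envFrom, so no secret value appears in
    the manifest itself.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": job_name,
                            "image": image,
                            "envFrom": [{"secretRef": {"name": secret_name}}],
                        }
                    ],
                }
            }
        },
    }


# All values below are hypothetical placeholders.
image = image_ref("registry.example.com", "team/trainer", "build-42")
manifest = render_job("train-model", image, "storage-credentials")
print(json.dumps(manifest, indent=2))
```

The printed manifest could be saved to a file and applied by the release step; only the image tag changes from build to build.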
  8. Example End-to-End pipeline

    • TensorFlow for model training and serving
    • Spark for feature engineering
    • Kubernetes & related tools for deployment
    • Data lives on blob and DW
  9. Declarative deployments

    • Docker as a single build tool
      • Consistent deploys, even more useful in data experiments
      • Freedom to use any library/versions I want
    • k8s as a single deployment tool
      • Easier to think about for everyone on the team
      • Easy to remember (and optimize!) one pattern and workflow
    • Separation of concerns
      • Build/ops tools don’t need to understand how TF works
      • YAML for separating code and configs
      • Secrets for separating code and secrets
      • Blobfuse for separating code and data paths
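One way to picture the separation of concerns on the Declarative deployments slide: the training code reads everything deployment-specific from its environment, so the YAML file supplies configuration, a Secret supplies credentials, and a mounted volume supplies the data path. The sketch below assumes that convention; the variable names `DATA_DIR`, `LEARNING_RATE`, and `STORAGE_KEY` are hypothetical and do not come from the talk.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    data_dir: str         # where the data volume is mounted
    learning_rate: float  # plain config, set in the YAML file
    storage_key: str      # credential, injected from a Secret


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (hypothetical names)."""
    return Settings(
        data_dir=env.get("DATA_DIR", "/data"),
        learning_rate=float(env.get("LEARNING_RATE", "0.01")),
        # No default on purpose: fail fast with KeyError if the secret is missing.
        storage_key=env["STORAGE_KEY"],
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"LEARNING_RATE": "0.1", "STORAGE_KEY": "not-a-real-key"})
print(settings.data_dir, settings.learning_rate)
```

With this shape the same image can run unchanged in a notebook, in CI, and on the cluster; only the environment differs.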
  10. Volumes

    • Blobfuse
      • k8s volume plugin that makes blob data accessible as a mounted file system
      • Great for inner-loop dev. Avoids additional IO to remote storage.
      • Allows read-only or read-write mounting
      • Not every tool needs to understand and integrate with blob
      • Easier for playing around with data than dealing with blob explorers
      • Configurable cache interval allows trading off fast access against freshness constraints
    • hostPath
      • Local SSD for temporary storage
      • Great for speed and intermediate results
    • Azure Files
      • For permanent storage for fast non-blob data
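Because a Blobfuse volume appears as a mounted file system, code can read blob data with ordinary file APIs and write intermediate results to local scratch space. The sketch below illustrates that idea only; the directory layout and file names are hypothetical, and a temporary directory stands in for the real mounts so the example runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import List


def list_inputs(mount_dir: Path, suffix: str = ".csv") -> List[Path]:
    """List data files under a mounted directory using plain file APIs."""
    return sorted(p for p in mount_dir.rglob("*" + suffix) if p.is_file())


def write_intermediate(scratch_dir: Path, name: str, lines: List[str]) -> Path:
    """Write intermediate results to local scratch space, one line per entry."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    out = scratch_dir / name
    out.write_text("\n".join(lines) + "\n")
    return out


# Demo: a temporary directory stands in for the real volume mounts.
with tempfile.TemporaryDirectory() as root:
    blob_mount = Path(root) / "blob"  # stand-in for a Blobfuse mount point
    scratch = Path(root) / "scratch"  # stand-in for a hostPath mount point
    (blob_mount / "trips").mkdir(parents=True)
    (blob_mount / "trips" / "part-0.csv").write_text("id,miles\n1,12.5\n")

    inputs = list_inputs(blob_mount)
    # Count data rows per file, excluding the header line.
    row_counts = [f"{p.name},{len(p.read_text().splitlines()) - 1}" for p in inputs]
    summary = write_intermediate(scratch, "row_counts.txt", row_counts)
    print(summary.read_text())
```

Nothing in these functions knows about blob storage, which is the point made on the slide: tools only need to understand files.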
  11. Demo time!

  12. Serving

    • Early days.
    • Currently we store and serve directly out of blob
    • Versioned through names
    • Want to validate and understand use cases to help us choose the right tool
    • Need something that plays nicely with other data tools as well, e.g. Spark
    • We are considering ONNX and tensor-serve
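Versioning through names can be pictured with a small helper that picks the newest model from a listing of blob names. The talk does not state the naming scheme, so the `<model>-v<number>` pattern and the example names below are assumptions made purely for illustration.

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme: "<model>-v<integer>", e.g. "churn-v10".
_VERSIONED = re.compile(r"^(?P<model>.+)-v(?P<version>\d+)$")


def latest_version(names: Iterable[str], model: str) -> Optional[str]:
    """Return the name with the highest numeric version for the given model."""
    best_name = None
    best_version = -1
    for name in names:
        match = _VERSIONED.match(name)
        if match is None or match.group("model") != model:
            continue
        version = int(match.group("version"))  # numeric, so v10 beats v2
        if version > best_version:
            best_name, best_version = name, version
    return best_name


# Hypothetical listing of stored model names.
stored = ["churn-v2", "churn-v10", "fraud-v3", "notes.txt"]
print(latest_version(stored, "churn"))
```

Comparing versions as integers rather than strings avoids the common pitfall where "v10" sorts before "v2".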
  13. Kubeflow?

    • We evaluated a very early version (0.1.0) and had a lot of issues with it
    • Opinionated, and bundles a lot of tools that we currently don’t need
    • Ksonnet business is messy
    • End-user simplicity is paramount
    • Start with simple tools and add tools as necessary. Don’t like big-bang complex pieces
    • Too much upfront complexity before we can get the value
    • Having said that,
      • Huge fan of the community.
      • We are keeping an eye on the direction it’s going.
      • We do like some parts, especially the TF job operator. We already use the Spark operator and realize the benefits.
  14. Thanks! @dharmeshkakadia