Cloud Native Machine Learning

East Bay Cloud Native meetup. Apr, 2019

Other talks at http://dharmeshkakadia.github.io/talks

dharmeshkakadia

April 18, 2019

Transcript

  1. Mobile Data Labs & Microsoft
    Dharmesh Kakadia

  2. whoami
    • Sr. Applied Scientist/Software Engineer at MobileDataLabs (an acquired team inside Microsoft), the team building the AI/Analytics platform
    • Spent a couple of years with Microsoft Research
    • Spent a couple of years with Azure HDInsight
    • Among other things, author of “Apache Mesos Essentials”
    • Opinions are mine and biased
    • You can find me as @dharmeshkakadia everywhere
    • Slides: http://dharmeshkakadia.github.io/talks/

  3. MileIQ: Mileage Tracking Made Easy
    • Automatic Detection
    • Easy Classification
    • Personalized Experience
    • Robust Reporting
    MileIQ.com

  4. Machine Learning

  5. Why do you need an AI platform?

  6. What is a platform?

  7. Why build an AI platform on k8s?
    • Already a widely adopted standard for deployment; we are just extending it to Data/AI
    • Operational benefits for free: monitoring, CI/CD, secrets, alerts, log management, …
    • No separate processes and tools for data versus the other parts of engineering
    • Ability to leverage the latest improvements faster
    • Better resource utilization
    • Helps avoid data silos
    • No need to worry about installing Nvidia drivers…
    • Caveat: you need a little more cross-functional expertise

  8. AI/Analytics Platform : Our current use cases
    • Drive quality measurement
    • Dexter: Signals and Metrics Framework
    • Marketing and engagement analytics
    • Data management: GDPR, schema, etc.
    • Data science and experimentation
    • User segmentation
    • …
    • Runs billions of tasks a day
    • Used by engineering, marketing, and data science teams
    • SQL, Python, Spark, Pandas, TensorFlow, Jupyter

  9. AI/Analytics Platform : Tools

  10. AI/Analytics Platform : Process
    • Dev: Inner-loop development happens inside a Jupyter notebook. Write a Dockerfile + YAML file when ready for a PR.
    • Build: Takes the Dockerfile, builds the image, and pushes it to the container registry with build tags.
    • Release: Combines the YAML file and secrets and applies them on the k8s cluster.
    • Monitor and visualize: Produces output data, models, and results, which are used for further analysis/decision making. No special ops required.
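
The Build and Release steps above can be sketched roughly as follows. This is a minimal illustration, not the team's actual pipeline: the registry `example.azurecr.io`, the job name `train-model`, the tag `build-42`, and the secret name `train-model-secrets` are all hypothetical. It assumes the workload runs as a Kubernetes Job and that secrets are injected as environment variables. The manifest is rendered as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def image_ref(registry: str, name: str, build_tag: str) -> str:
    """Build step output: the image reference pushed to the container registry."""
    return f"{registry}/{name}:{build_tag}"


def job_manifest(name: str, image: str, secret_name: str) -> dict:
    """Release step input: a Kubernetes Job that runs the image.

    Every key in the named Secret is exposed to the container as an
    environment variable via `envFrom`.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "envFrom": [{"secretRef": {"name": secret_name}}],
                        }
                    ],
                }
            }
        },
    }


def render(manifest: dict) -> str:
    """Serialize the manifest as JSON so no YAML library is required."""
    return json.dumps(manifest, indent=2, sort_keys=True)


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    image = image_ref("example.azurecr.io", "train-model", "build-42")
    print(render(job_manifest("train-model", image, "train-model-secrets")))
```

Saving the printed output to a file and running `kubectl apply -f` on it would correspond to the Release step; the build tag ties the deployed job back to a specific image.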

  11. AI/Analytics Platform : Guiding Principles and Tradeoffs
    • Optimize for agility & turnaround time
    • Covers all the use cases – ETL, Streaming, ML, Visualization
    • Covers full life cycle – Dev, Deployment, Monitoring, Alerting, and so on
    • Full API access – connect to all the tools you wish
    • Get the cutting edge features (latest versions etc.)
    • Easy to use
    • Permission-less or self-serve
    • Somewhat future proof
    • Open Source and Linux friendly
    • Cloud Native/friendly
    • Best practices enforced by default
    • Gall's Law – start with a simple system that works and evolve it

  12. Declarative ML deployments
    • Docker as a single build tool
      • Consistent deploys – even more useful in data experiments
      • Freedom to use any library/version I want
    • k8s as a single deployment tool
      • Easier to think about for everyone on the team
      • Easy to remember (and optimize!) one pattern and workflow
    • Separation of concerns
      • Build/ops tools don’t need to understand how TF works
      • YAML for separating code and configs
      • Secrets for separating code and secrets
      • Blobfuse for separating code and data paths
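
One way to picture the separation of concerns is from the container's side: the training code only reads its surroundings and never hard-codes them. The sketch below is an assumption about how such an entry point could look, not code from the talk. The variable names `LEARNING_RATE`, `EPOCHS`, `STORAGE_KEY`, and `DATA_DIR` are hypothetical; config values would come from the YAML manifest, the key from a k8s Secret, and the data directory from a mounted volume.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    learning_rate: float  # config, set in the YAML manifest
    epochs: int           # config, set in the YAML manifest
    storage_key: str      # secret, injected from a k8s Secret
    data_dir: Path        # data path, where the volume is mounted


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read config, secrets, and data paths from the environment.

    Config values have defaults; the secret is required, so a missing
    Secret fails fast with a KeyError instead of at first use.
    """
    return Settings(
        learning_rate=float(env.get("LEARNING_RATE", "0.01")),
        epochs=int(env.get("EPOCHS", "10")),
        storage_key=env["STORAGE_KEY"],
        data_dir=Path(env.get("DATA_DIR", "/data")),
    )


if __name__ == "__main__":
    # Hypothetical values standing in for what the pod would provide.
    settings = load_settings({"STORAGE_KEY": "not-a-real-key", "EPOCHS": "3"})
    print(settings.epochs, settings.data_dir)
```

With this shape, changing a hyperparameter or pointing at a different dataset is a manifest change rather than a code change or an image rebuild.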

  13. Volumes
    • Blobfuse
      • k8s volume plugin that makes blob data accessible as a mounted file system
      • Great for inner-loop dev. Avoids additional IO to remote storage.
      • Allows read-only or read-write mounting
      • Not every tool needs to understand and integrate with blob
      • Easier for playing around with data than dealing with blob explorers
      • Configurable cache interval allows trading off fast access against freshness
    • Hostpath
      • Local SSD for temporary storage
      • Great for speed and intermediate results
    • Azure files
      • For permanent, fast storage of non-blob data
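
The three volume types could be declared in a pod spec roughly as sketched below. The `hostPath` and `azureFile` fields are standard Kubernetes volume fields. The blobfuse entry is an assumption: it follows the FlexVolume driver convention (`azure/blobfuse` with `container`, `tmppath`, and `mountoptions`), and the exact driver name and option keys should be checked against the driver version in use. All volume, container, share, and secret names are hypothetical.

```python
def blobfuse_volume(name, container, secret_name, cache_seconds=120, read_only=True):
    """Blob container mounted as a file system (assumed FlexVolume layout).

    `cache_seconds` is the cache interval: larger values favor fast access,
    smaller values favor freshness.
    """
    return {
        "name": name,
        "flexVolume": {
            "driver": "azure/blobfuse",
            "readOnly": read_only,
            "secretRef": {"name": secret_name},
            "options": {
                "container": container,
                "tmppath": "/tmp/blobfuse",
                "mountoptions": f"--file-cache-timeout-in-seconds={cache_seconds}",
            },
        },
    }


def hostpath_volume(name, path):
    """Node-local directory (e.g. on local SSD) for temporary results."""
    return {"name": name, "hostPath": {"path": path, "type": "DirectoryOrCreate"}}


def azure_file_volume(name, share_name, secret_name, read_only=False):
    """Azure Files share for permanent non-blob data."""
    return {
        "name": name,
        "azureFile": {
            "secretName": secret_name,
            "shareName": share_name,
            "readOnly": read_only,
        },
    }


if __name__ == "__main__":
    volumes = [
        blobfuse_volume("training-data", "datasets", "blob-creds", cache_seconds=30),
        hostpath_volume("scratch", "/mnt/scratch"),
        azure_file_volume("shared", "notebooks", "files-creds"),
    ]
    for volume in volumes:
        print(volume["name"], sorted(k for k in volume if k != "name"))
```

The resulting list would go under `spec.volumes` of a pod template, with matching `volumeMounts` on the container.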

  14. Example End-to-End pipeline
    • TensorFlow for model training and serving
    • Spark for feature engineering
    • Kubernetes & related tools for deployment
    • Data lives on blob and DW

  15. Demo time!
    https://github.com/dharmeshkakadia/demos/

  16. FAQ : Serving
    • Early days
    • Currently we store and serve directly out of blob
    • Versioned through names
    • Want to validate and understand use cases to help us choose the right tool
    • Need something that plays nicely with other data tools as well, e.g. Spark
    • We are considering ONNX and TensorFlow Serving
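
"Versioned through names" can be illustrated with a small helper that picks the newest model out of a blob listing. The naming scheme `<model>-v<N>.pb` and the model name `drive-classifier` are hypothetical; the talk does not say which scheme is actually used.

```python
import re
from typing import Iterable, Optional

# Hypothetical naming scheme: "<model>-v<integer version>.pb"
_VERSIONED = re.compile(r"^(?P<model>.+)-v(?P<version>\d+)\.pb$")


def latest_model(blob_names: Iterable[str], model: str) -> Optional[str]:
    """Return the blob name with the highest version for `model`, or None.

    Versions are compared as integers, so v10 ranks above v2.
    """
    best_name = None
    best_version = -1
    for blob_name in blob_names:
        match = _VERSIONED.match(blob_name)
        if match and match.group("model") == model:
            version = int(match.group("version"))
            if version > best_version:
                best_name, best_version = blob_name, version
    return best_name


if __name__ == "__main__":
    names = ["drive-classifier-v2.pb", "drive-classifier-v10.pb", "readme.txt"]
    print(latest_model(names, "drive-classifier"))
```

Comparing versions numerically rather than as strings avoids the common pitfall where `v10` sorts before `v2`.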

  17. FAQ : Kubeflow?
    • We evaluated a very early version (0.1.0) and had a lot of issues with it
    • Opinionated and bundles a lot of tools that we currently don’t need
    • Ksonnet ☹
    • End user simplicity is paramount for us
    • We like to start with simple tools and add more as necessary, rather than adopting big-bang complex pieces
    • Gall’s law
    • Having said that,
    • Huge fan of the community.
    • We are keeping an eye on its direction
    • We do like some parts, especially the TF job operator. We already use the Spark operator and see the benefits.

  18. Thanks!
    @dharmeshkakadia
