Making Distributed Computing Easy (Ion Stoica, Anyscale)

Ion Stoica, co-founder, executive chairman & president, Anyscale, highlights the product developments on the Anyscale platform, spotlights a few customer stories, and provides a sneak preview into the future.

Anyscale

July 14, 2021

Transcript

  1. This Talk: 01 Distributed apps are becoming the norm

    02 Building distributed apps is very hard 03 Ray and Anyscale make developing, deploying and managing distributed apps easy
  2. Distributed apps becoming the norm: 01 Apps increasingly incorporate AI

    02 AI workloads are becoming distributed
  5. Growing gap between demand and supply. Demand: 35x every 18 months (2020: GPT-3)

    Supply: Moore’s Law (2x every 18 months), CPU. Source: https://openai.com/blog/ai-and-compute/
  6. Specialized hardware is not enough. Demand: 35x every 18 months (2020: GPT-3)

    Supply: Moore’s Law (2x every 18 months), CPU, GPU*, TPU*. Source: https://openai.com/blog/ai-and-compute/ No way out but to distribute these apps!
  7. Building distributed apps very hard! 01 Apps becoming more and more complex

    02 Development ⇒ production is challenging
  9. Online learning: improve inference accuracy in fast-changing environments

    Examples: recommendations, financial predictions, resource allocations. Stages: Data Ingestion & Featurization, Training, Serving
  10. Online learning. Solution: stitch together a bunch of distributed systems

    Stages: Data Ingestion & Featurization, Training, Serving, Serve
  11. Reinforcement learning. Components: Training, Serving, Serve, Simulations; Environment, Agent, State/reward, Action

    Examples: industry automation, self-driving, trading & finance, system optimizations, recommendations, etc. Solution: stitch together a bunch of distributed systems (e.g., Facebook’s Horizon)
  12. Reinforcement learning. Components: Training, Serving, Simulations; Environment, Agent, State/reward, Action

    Solution: build it from scratch (e.g., DeepMind’s Acme)
  13. Backend: Business Logic & Inference. Diagram: Backend, Business Logic, Serving / Inference, request, reply, Serving

    Solution: stitch together a bunch of distributed systems
  14. Challenges with building from scratch: high performance, but very expensive (time, people)

    Few companies can afford it, e.g., Google, Facebook, ...
  15. Challenges with stitching together. Hard to develop: different APIs. Hard to deploy & manage: impedance mismatch.

    Slow: high overhead of moving data between different systems. Diagram: Data Processing, Training, Serving, Hyper. Tuning, Business Logic, Simulations, Serving, KFServing
  16. Universal framework for distributed computing

    Ray ecosystem + Native: Data Processing, Training, Serving, Hyper. Tuning, Business Logic, Others
  17. Best ecosystem of distributed libraries: instead of stitching systems, call libraries in same system

    Easy to develop, manage, and deploy. Ray ecosystem + Native: Data Processing, Training, Serving, Hyper. Tuning, Business Logic, Others
  18. Building distributed apps very hard! 01 Apps becoming more and more complex

    02 Development ⇒ production is challenging
  19. What do developers want? Develop on your laptop; test and deploy on the cluster/cloud

    Development (Laptop): Edit, Run, Debug. Production (Cluster/cloud): Test / staging, Deploy
  20. Develop on your laptop? Huge barrier: local development and tests cannot reproduce cluster deployment; need to package/dockerize app

    Development: Edit, Run, Debug. Production: Test / staging, Deploy
  21. Develop on the cluster? Hard to develop on the cluster/cloud: no good tools, slow to launch nodes, expensive.

    Development: Edit, Run, Debug. Production: Test / staging, Deploy
  22. Universal framework for distributed computing

    Ray ecosystem + Native: Data Processing, Training, Serving, Hyper. Tuning, Business Logic & Simulations, Others
  23. Anyscale: Best of both worlds. Laptop development experience and cloud scale

    Development: Edit, Run, Debug. Production: Test / staging, Deploy
  24. Development: Infinite laptop. Like your laptop but with “infinite” resources!

    ... then transparently move to the cloud. Development: Edit, Run, Debug
  25. Infinite laptop: How? ray_prog.py: import ray; ray.client().connect(); ... Run on laptop: >python ray_prog.py

    Run in the cloud: >RAY_ADDRESS="anyscale://" python ray_prog.py 1. Sync up local environment and files to the cloud 2. Run program in the cloud with no code changes NEW NEW
  26. Infinite laptop: How? 1. Sync up local environment and files to the cloud

    2. Run program in the cloud with no code changes 3. Serverless experience NEW NEW Development: Edit, Run, Debug
  27. Infinite laptop: How? 1. Sync up local environment and files to the cloud

    2. Run program in the cloud with no code changes 3. Serverless experience 4. Debug programs like on your laptop NEW NEW NEW Development: Edit, Run, Debug
  28. Development → Production: 1. App packaging NEW

    Development: Edit, Run, Debug. Production: Test / staging, Deploy
  29. Development → Production: 1. App packaging 2. SDK & REST APIs NEW NEW

    Development: Edit, Run, Debug. Production: Test / staging, Deploy
  30. Development → Production: 1. App packaging 2. SDK & REST APIs 3. Monitoring & observability NEW NEW

    Development: Edit, Run, Debug. Production: Test / staging, Deploy
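
The "Infinite laptop: How?" slide (slide 25) says the same ray_prog.py runs on the laptop or in the cloud, with the only difference being whether RAY_ADDRESS is set on the command line. The sketch below illustrates that environment-variable switch using only the Python standard library. It does not import Ray, and `LOCAL`, `pick_target`, and `describe_run` are hypothetical names invented for this illustration, not part of Ray or Anyscale. In a real program, the slide's own `ray.client().connect()` call would presumably do the connecting.

```python
import os
from typing import Mapping

# Hypothetical label meaning "no cluster address given, run on this machine".
LOCAL = "local"


def pick_target(environ: Mapping[str, str] = os.environ) -> str:
    """Return the address in RAY_ADDRESS, or LOCAL when it is unset or blank."""
    address = environ.get("RAY_ADDRESS", "").strip()
    return address if address else LOCAL


def describe_run(script: str, environ: Mapping[str, str] = os.environ) -> str:
    """Describe where the script would run; the script itself is unchanged."""
    target = pick_target(environ)
    where = "on the laptop" if target == LOCAL else f"in the cloud via {target}"
    return f"{script} runs {where}"


if __name__ == "__main__":
    # Plain `python this_file.py` reports the laptop case;
    # RAY_ADDRESS="anyscale://" python this_file.py reports the cloud case.
    print(describe_run("ray_prog.py"))
```

The design point this mirrors is that the execution target lives outside the source file, so moving from laptop to cloud is a change to the launch command rather than to the code.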