
Ray_Essentials__Introduction_to_Ray_for_machine_learning.pdf

Anyscale
November 02, 2023

Transcript

  1. Ray Essentials: Introduction to Ray
    for machine learning
    Jules S. Damji, Ray Team @Anyscale
    X: @2twitme
    LinkedIn: https://www.linkedin.com/in/dmatrix/


  2. A quick poll…


  3. $whoami
    ● Lead Developer Advocate, Anyscale & Ray Team
    ● Sr. Developer Advocate, Databricks, Apache Spark/MLflow Team
    ● Led Developer Advocacy, Hortonworks
    ● Held SWE positions:
    ○ Sun Microsystems
    ○ Netscape
    ○ @Home
    ○ Loudcloud/Opsware
    ○ Verisign


  4. Who do I work for …
    Who we are: Original creators of Ray, a unified general-purpose
    framework for scalable distributed computing
    What we do: Scalable compute for AI as a managed service, with Ray at
    its core, and the best platform to develop & run AI apps
    Why we do it: Scaling is a necessity, scaling is hard; make distributed
    computing easy and simple for everyone


  5. Anyscale Platform for AI Infra
    Anyscale cloud infrastructure, with Ray (Data, Train, Tune, RL, Serve) at its core
    Anyscale Workspaces, Jobs, and Services
    Anyscale Endpoints: serving and fine-tuning of OS LLMs


  6. 🗓 Today’s agenda
    ● Why & What’s Ray & Ray Ecosystem
    ● Ray Architecture & Components
    ● Ray Core Design & Scaling Patterns & APIs
    ● Demo
    ● Wrap up…


  7. Why Ray + What’s Ray


  8. Why Ray?
    ● Machine learning is pervasive
    ● Distributed computing is a necessity
    ● Python is the default language for DS/ML


  9. Blessings of scale ….


  10. Blessings of scale ….
    1. Model sizes are getting larger
    - Model size is increasing exponentially.
    - Models are too large to fit into a single GPU.
    - We need to shard the models across multiple GPUs for training
    - e.g., ZeRO, Model Parallel, Pipeline Parallel
    BERT (2019): 336M params (1.34 GB)
    Llama-2 (2023): 70B params (280 GB), ~200x
    GPT-4: ~1800B params, >5,000x


  11. Supply-demand problem
    Chart (https://openai.com/blog/ai-and-compute/): compute demand from the largest
    models (GPT-[3, 4], Llama 2, Falcon, PaLM, etc.) grew ~35x every 18 months over
    2020-2023, far outpacing the supply of CPUs, GPUs, and TPUs.


  12. Supply-demand problem
    Same chart as the previous slide.
    No way out but to distribute!


  13. Python DS/ML Ecosystem


  14. What’s Ray?
    ● A simple, general-purpose library for distributed computing
    ● An ecosystem of Python Ray AI libraries (for scaling ML & more)
    ● Runs on a laptop, public cloud, K8s, on-premise
    ● Easy to install and get started: pip install ray[default]
    A layered cake of functionality and capabilities for scaling ML workloads


  15. A layered cake and ecosystem
    Ray AI Libraries enable simple scaling of AI workloads.


  16. A Layered Cake and Ecosystem
    Ray AI Libraries enable simple scaling of AI workloads.


  17. AI libraries


  18. Ray architecture & components


  19. Anatomy of a Ray cluster
    Head Node: Driver and worker processes, the Global Control Store (GCS, unique to Ray),
    and a Raylet (Scheduler + Object Store)
    Worker Node #1 … #N: worker processes, each node with its own Raylet
    (Scheduler + Object Store)


  20. Anatomy of a Ray worker process


  21. Creation of a Ray task …
    # Driver code
    A_ref = A.remote()

    # Worker 1 code (running task A)
    B_ref = B.remote()
    return B_ref
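    As a runnable sketch of that sequence (the task names A and B mirror the slide; the
    return values are illustrative), a task submitted by the driver can itself submit
    further tasks:

    import ray

    @ray.remote
    def B():
        return "result of B"

    @ray.remote
    def A():
        # The worker running A submits B, which is scheduled somewhere in the cluster.
        B_ref = B.remote()
        return ray.get(B_ref)

    A_ref = A.remote()        # driver submits A
    print(ray.get(A_ref))     # "result of B"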


  22. Creation of a Ray object …


  23. Creation of a Ray Actor …
    handle_A = ActorA.remote()
    handle_B = ActorB.remote()
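    A minimal sketch of the same idea (ActorA, ActorB, and their methods are illustrative):
    each .remote() call starts a dedicated worker process that holds the actor's state, and
    handles can be passed around for message passing.

    import ray

    @ray.remote
    class ActorA:
        def ping(self):
            return "pong from A"

    @ray.remote
    class ActorB:
        def __init__(self, peer):
            self.peer = peer                         # an actor handle is just a value

        def ask_peer(self):
            return ray.get(self.peer.ping.remote())  # message passing via the handle

    handle_A = ActorA.remote()
    handle_B = ActorB.remote(handle_A)
    print(ray.get(handle_B.ask_peer.remote()))       # "pong from A"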


  24. Actor creation sequence …


  25. Ray core design & scaling
    patterns


  26. Ray basic design pattern
    ● Ray Parallel Tasks
    ○ Functions as stateless units of execution
    ○ Functions distributed across the cluster as tasks
    ● Ray Objects as Futures
    ○ Distributed, immutable objects stored in the cluster
    ○ Fetched only when materialized (e.g., with ray.get)
    ○ Enable massive asynchronous parallelism (see the sketch after this list)
    ● Ray Actors
    ○ Stateful services on a cluster
    ○ Enable message passing between actors
    ● Patterns for Parallel Programming
    ● Ray Distributed Library Integration Patterns
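    The sketch below illustrates the objects-as-futures pattern: submit everything up
    front, then consume results as they complete. The work() task is a stand-in for real
    computation.

    import random
    import time
    import ray

    @ray.remote
    def work(i):
        time.sleep(random.random())   # simulate variable-length work
        return i

    pending = [work.remote(i) for i in range(10)]   # ObjectRefs, i.e. futures

    while pending:
        done, pending = ray.wait(pending, num_returns=1)   # whichever finishes first
        print("finished:", ray.get(done[0]))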


  27. Scaling design patterns
    ● Different data / same function: batch training / batch inference
    ● Same data / different function: AutoML
    ● Different data / same function, different hyperparameters per job: batch tuning

  28. Python → Ray API
    Python function → distributed Ray task:

    def f(x):
        # do something with x:
        y = ...
        return y

    v = f(x)

    @ray.remote
    def f(x):
        # do something with x:
        y = ...
        return y

    v_ref = f.remote(x)   # f() now runs as a task on some node in the cluster

    Python class → distributed Ray actor:

    class Cls():
        def __init__(self, x): ...
        def f(self, a): ...
        def g(self, a): ...

    @ray.remote
    class Cls():
        def __init__(self, x): ...
        def f(self, a): ...
        def g(self, a): ...

    cls = Cls.remote()    # Cls now runs as an actor pinned to a node
    cls.f.remote(a)

    NumPy array → distributed immutable object:

    import numpy as np
    a = np.arange(1, 10e6)
    b = a * 2

    import numpy as np
    a = np.arange(1, 10e6)
    obj_a = ray.put(a)    # a is now an immutable object in the distributed store
    b = ray.get(obj_a) * 2

  29. Ray Task
    A function remotely executed in a cluster:

    def f(a, b):
        return a + b

    f(1, 2)   # Result = 3
    f(2, 3)   # Result = 5
    f(3, 4)   # Result = 7
    f(4, 5)   # Result = 9

    @ray.remote(num_cpus=3)
    def f(a, b):
        return a + b

    f.remote(1, 2)   # ObjectRef; ray.get() returns 3
    f.remote(2, 3)   # returns 5
    f.remote(3, 4)   # returns 7
    f.remote(4, 5)   # returns 9


  30. Ray Actor
    A class remotely executed in a cluster:

    @ray.remote(num_gpus=4)
    class HostActor:
        def __init__(self):
            self.model = load_model("s3://model_checkpoint")
            self.num_devices = os.environ["CUDA_VISIBLE_DEVICES"]

        def inference(self, data):
            return self.model(data)

        def f(self, output):
            return f"{output} {self.num_devices}"

    actor = HostActor.remote()        # Create an actor
    actor.f.remote("hi")              # returns "hi 0,1,2,3"
    actor.inference.remote(input)     # returns predictions…
    The actor holds its local state (model, devices) on one host; clients invoke its methods remotely.

  31. Distributed objects


  32. Distributed object store
    a = read_array(file1)
    b = read_array(file2)
    a_x = ray.put(a)
    b_x = ray.put(b)
    s_x = sum.remote(a_x, b_x)
    val = ray.get(s_x)
    print(val)
    a, b, and the result of sum() all live in the shared object store.
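    The same flow as a self-contained sketch: NumPy arrays stand in for read_array, and
    add() is a hypothetical remote task replacing sum().

    import numpy as np
    import ray

    @ray.remote
    def add(x, y):
        # x and y are pulled from the shared object store as NumPy arrays
        return x + y

    a = np.arange(10)           # stand-in for read_array(file1)
    b = np.arange(10)           # stand-in for read_array(file2)

    a_x = ray.put(a)            # put both arrays into the object store
    b_x = ray.put(b)

    s_x = add.remote(a_x, b_x)  # pass by reference; no extra copies on the driver
    print(ray.get(s_x))         # [ 0  2  4  6  8 10 12 14 16 18]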


  33. When to use Ray


  34. When to use Ray & Ray AI Libraries?
    ● Scale a single type of workload
    ○ Data ingestion for ML
    ○ Batch inference at scale
    ○ Distributed training
    ○ Only serving or online inference
    ● Scale end-to-end ML applications (Data, Train, Tune, Serve)
    ● Run ecosystem libraries using a unified API
    ● Build a custom ML platform
    ○ Spotify, Instacart
    ○ Pinterest & DoorDash
    ○ Samsara & Niantic
    ○ Uber Eats & LinkedIn


  35. When to use Ray & Ray AI Libraries?
    Scale a single type of workload
    ● Data ingestion for ML
    ● Batch Inference at scale
    ● Distributed Training


  36. Ray for Data ingest
    ● Ray Datasets as a common data format
    ● Easily read from disk/cloud, or from other formats (images, CSV, Parquet, HF, etc.)
    ● Fully distributed
    ○ Can handle data too big to fit on one node, or even in the entire cluster
    Dataset → Trainer (Trainer.fit) → training workers


  37. Ray Data overview
    High-performance distributed IO
    ● Leverages Apache Arrow’s high-performance IO
    ● Parallelized using Ray’s high-throughput task execution or actor pool execution
    ● Scales to PiB-scale jobs in production (Amazon)

    # Read from storage
    ds = ray.data.read_parquet("s3://some/bucket")
    ds = ray.data.read_csv("/tmp/some_file.csv")

    # Transform data
    ds = ds.map_batches(batch_func)
    ds = ds.map(func)

    # Consume data
    ds.iter_batches()  # -> Iterator
    ds.write_parquet("s3://some/bucket")


  38. Simple batch inference example
    Using user-defined functions (UDFs)
    Logical data flow across CPU workers


  39. A simple batch inference example
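    A minimal sketch of such a UDF-based batch inference job with Ray Data; the "model"
    here is a hypothetical doubling function standing in for a real predictor, and the
    in-memory dataset replaces real input files.

    import ray

    def predict_batch(batch):                        # UDF applied to each batch
        batch["prediction"] = batch["value"] * 2     # stand-in for model(batch)
        return batch

    ds = ray.data.from_items([{"value": float(i)} for i in range(1000)])
    preds = ds.map_batches(predict_batch, batch_format="numpy", batch_size=128)
    print(preds.take(3))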


  40. Multi-stage (heterogeneous) pipeline
    Read → Preprocess (CPU) → Inference (GPU) → Save


  41. Heterogeneous pipeline (CPU + GPU)
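    A sketch of such a CPU + GPU pipeline with Ray Data. The paths, column names, and
    load_model() are placeholders, and the concurrency= argument assumes a recent Ray
    release (older releases size the actor pool with compute=ray.data.ActorPoolStrategy).

    import ray

    def preprocess(batch):                      # CPU stage, runs as Ray tasks
        batch["pixels"] = batch["pixels"] / 255.0
        return batch

    class Predictor:                            # GPU stage, runs in an actor pool
        def __init__(self):
            self.model = load_model("s3://bucket/model")   # hypothetical loader

        def __call__(self, batch):
            batch["pred"] = self.model(batch["pixels"])
            return batch

    ds = ray.data.read_parquet("s3://bucket/images")             # Read
    ds = ds.map_batches(preprocess)                              # Preprocess on CPUs
    ds = ds.map_batches(Predictor, num_gpus=1, concurrency=2)    # Inference on GPUs
    ds.write_parquet("s3://bucket/predictions")                  # Save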


  42. Ray Train: Distributed ML/DL training
    Ray Train is a library for
    developing, orchestrating, and scaling
    distributed deep learning applications.
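    A minimal sketch of what that looks like with Ray Train's TorchTrainer, assuming
    Ray 2.x and PyTorch; the toy linear model and random data are illustrative only.

    import torch
    import torch.nn as nn
    from ray import train
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer, prepare_model

    def train_loop_per_worker(config):
        model = prepare_model(nn.Linear(10, 1))       # wraps the model for DDP
        optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
        loss_fn = nn.MSELoss()
        for epoch in range(config["epochs"]):
            X, y = torch.randn(64, 10), torch.randn(64, 1)   # toy data
            loss = loss_fn(model(X), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train.report({"epoch": epoch, "loss": loss.item()})

    trainer = TorchTrainer(
        train_loop_per_worker,
        train_loop_config={"lr": 1e-3, "epochs": 2},
        scaling_config=ScalingConfig(num_workers=4, use_gpu=False),  # 4 data-parallel workers
    )
    result = trainer.fit()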


  43. Scaling across the cluster …
    Ray Train: integrates with deep learning frameworks


  44. When to use Ray & Ray AI Libraries?
    Scale end-to-end ML
    applications
    Data Train Tune Serve


  45. Ray for end-to-end ML application
    Storage and Tracking → Preprocessing → Training → Scoring → Serving
    Ray AI libraries can provide that as a single end-to-end application


  46. Who’s using Ray ….
    27,000+ GitHub stars
    5,000+ repositories depend on Ray
    1,000+ organizations using Ray
    870+ community contributors


  47. Who’s using Ray ….
    27,000+ GitHub stars
    5,000+ repositories depend on Ray
    1,000+ organizations using Ray
    870+ community contributors


  48. 🍎 Recap: Today we learned…
    🚂 Why Ray & What’s Ray Ecosystem
    ● Architecture components
    ● When to use Ray
    ● Who uses Ray at scale
    🔬 Ray Design & Scaling Patterns & APIs
    ● Ray Tasks, Actors, Objects


  49. 🔗 Reading list.
    Ray Education GitHub
    Access bonus notebooks and scripts about Ray.
    Ray documentation
    API references and user guides.
    Anyscale Blogs
    Real world use cases and announcements.
    YouTube Tutorials
    Video walkthroughs about learning LLMs with Ray.


  50. 🔗 Resources
    ● How to fine tune and serve LLMs simply, quickly and cost effectively using Ray +
    DeepSpeed + HuggingFace
    ● Get started with DeepSpeed and Ray
    ● Training 175B Parameter Language Models at 1000 GPU scale with Alpa and Ray
    ● Fast, flexible, and scalable data loading for ML training with Ray Data
    ● Ray Serve: Tackling the cost and complexity of serving AI in production
    ● Scaling Model Batch Inference in Ray: Using Actors, ActorPool, and Ray Data
    ● Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique
    Applications (part-1)
    ● Fine-Tuning LLMs: LoRA or Full-Parameter? An in-depth Analysis with Llama 2
    (part-2)
