
Ray AIR: A Scalable Toolkit for End-to-end ML Applications


Existing production machine learning systems often suffer from problems that make them hard to use. For example, data scientists and ML practitioners often spend most of their time fighting YAML files and refactoring code to push models to production.

To address this, the Ray community has built Ray AI Runtime (AIR), an open-source toolkit for building large-scale end-to-end ML applications. By leveraging Ray’s distributed compute layer and library ecosystem, AIR brings scalability and programmability to ML platforms.

Ray AI Runtime focuses on providing the compute layer for Python-based ML workloads and is designed to interoperate with other systems for storage and metadata needs.

In this session, we’ll explore and discuss the following:

* How AIR is different from existing ML platform tools like TFX, SageMaker, and Kubeflow
* How AIR allows you to program and scale your machine learning workloads easily
* Interoperability and easy integration points with other systems for storage and metadata needs
* AIR’s cutting-edge features for accelerating the machine learning lifecycle such as data preprocessing, last-mile data ingestion, tuning and training, and serving at scale

Key takeaways for attendees are:

* Understand how Ray AI Runtime can be used to implement scalable, programmable machine learning workflows.
* Learn how to pass and share data across distributed trainers and Ray native libraries: Tune, Serve, Train, RLlib, etc.
* Learn how to scale Python-based workloads across supported public clouds.

Anyscale

October 25, 2022

Transcript

  1. Ray AIR: A Scalable Toolkit for
    End-to-end ML Applications
    Richard Liaw, Anyscale
    Xiaowei Jiang, Anyscale
    SF Bay ACM Meetup 10/24/2022


  2. Who we are: Original creators of Ray
    What we do: Unified compute platform to develop, deploy,
    and manage scalable AI & Python applications with Ray
    Why we do it: Scaling is a necessity, scaling is hard; make
    distributed computing easy and simple for everyone

  3. What is Ray
    → A simple/general-purpose library for distributed computing
    → A unified Python toolkit, Ray AI Runtime (AIR), for scaling ML and
    more
    → Runs on laptop, public cloud, K8s, on-premise
    A layered cake of functionality and
    capabilities for scaling ML workloads

  4. A Layered Cake and Ecosystem
    [Diagram labels: Library + app ecosystem; Ray core, a general-purpose
    framework for distributed computing; Run anywhere]

  5. An anatomy of a Ray cluster
    [Diagram: cluster layout]
    Head Node: Driver, Worker, Global Control Store (GCS),
    Raylet (Scheduler, Object Store)
    Worker Node #1 … Worker Node #N: Worker, Worker,
    Raylet (Scheduler, Object Store)
    Callout: Unique to Ray

  6. Python → Ray APIs

    Function → Task (distributed across nodes)

    def f(x):
        # do something with x:
        y = …
        return y

    @ray.remote
    def f(x):
        # do something with x:
        y = …
        return y

    [f.remote(i) for i in range(10000)]

    Class → Actor (distributed across nodes)

    class Cls():
        def __init__(self, x): …
        def f(self, a): …
        def g(self, a): …

    @ray.remote(num_cpus=2, num_gpus=4)
    class Cls():
        def __init__(self, x): …
        def f(self, a): …
        def g(self, a): …

    cls = Cls.remote(x)
    cls.f.remote(a)
    del cls

    Object → Distributed immutable object (shared across nodes)

    import numpy as np
    a = np.arange(1, 10e6)
    b = a * 2

    import numpy as np
    import ray
    a = np.arange(1, 10e6)
    obj_a = ray.put(a)
    b = ray.get(obj_a) * 2
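
The task pattern on this slide (submit many calls, receive handles right away, fetch results later) can be tried locally without a cluster. The sketch below is only an analogy built on Python's standard `concurrent.futures`, not Ray's API: `executor.submit` stands in for `f.remote`, and `Future.result` stands in for `ray.get`. The function `square` and the pool size are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Stand-in for "do something with x".
    return x * x


def run_tasks(inputs, max_workers=4):
    # Submit one task per input; each submit returns a future immediately,
    # much like f.remote() returns an object reference in Ray.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(square, x) for x in inputs]
        # Block until every task finishes, similar in spirit to ray.get(refs).
        return [future.result() for future in futures]


results = run_tasks(range(10))
print(results)
```

Unlike this thread pool, Ray schedules tasks across processes and machines; the sketch only mirrors the submit-then-collect shape of the code.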


  7. Ray AI Runtime (AIR) is a scalable runtime for
    end-to-end ML applications

  8. Project Overview
    The Ray team has worked with ML users and infra groups at companies such as
    Uber, Ant, Shopify, Cruise, and OpenAI for several years.
    AIR is an effort to synthesize lessons learned into a simple toolkit for the
    community.
    - Built on Ray's existing scalable libraries
    - Unified APIs for e2e ML
    - Simplify ML Infra

  9. Challenges we hear from users about ML
    Infrastructure


  10. Still not easy to go from dev to prod at scale.
    [Diagram: preprocess.py, train.py, eval.py, run_workflow.py]

  11. What happens when your ML infra gets out of date?
    [Diagram: preprocess.py, train.py, eval.py, run_workflow.py]

  12. Key Problems of existing ML infrastructure
    - Scaling is hard, especially for data scientists.
    - Platform solutions can limit flexibility.
    - But custom distributed apps are too hard.

  13. Analogy from a simpler time
    Filesystem
    "single sklearn script"


  14. What AIR Provides
    [Diagram labels: Storage and Tracking; "single AIR app";
    Preprocessing, Training, Scoring, Serving, ...]

  15. Introducing Ray AI
    Runtime (AIR)

  16. Ray AI Runtime (AIR) is a scalable toolkit for
    end-to-end ML applications

  17. Ray AI Runtime (AIR) is a scalable toolkit for
    end-to-end ML applications
    Built on Ray Core for
    open and flexible ML
    compute end-to-end.

  18. Ray AI Runtime (AIR) is a scalable toolkit for
    end-to-end ML applications
    Built on Ray Core for
    open and flexible ML
    compute end-to-end.
    Since Ray focuses on compute, AIR
    leverages integrations for storage
    and tracking.

  19. Ray AI Runtime (AIR) is a scalable runtime for
    end-to-end ML applications
    High-level libraries that
    make scaling easy for
    both data scientists and
    ML engineers.

  20. Importance of scalable library layer
    With non-scalable libraries:
    ● Non-distributed
    systems / libraries
    ● Opinionated
    distributed systems
    / libraries
    Data science team (development environment) → High friction →
    Eng team (production environment)
    With scalable libraries:
    Easy to scale with Ray AIR
    and libraries of their choice
    More performant, robust
    and scalable.
    Data science team (development environment) → Seamless handoff →
    Eng team (production environment)

  21. Ray AI Runtime (AIR) is a scalable toolkit for
    end-to-end ML applications
    Scalable integrations
    with best-of-breed
    libraries/MLOps tools

  22. AIR Integrations
    ● Built-in integrations
    ● Integrations API to easily
    add integrations
    ● Custom scalable
    components can be built
    on Ray Core

  23. AIR simplifies scalable ML infrastructure
    ● Integrations with best-of-breed libraries/MLOps tools
    ● A unified end-to-end ML runtime
    ● Make scaling easy for both data scientists and ML engineers

  24. When would you use Ray AIR?
    ● Scale a single type of workload
    ● Scale end-to-end ML applications
    ● Run ecosystem libraries using a unified API
    ● Build a custom ML platform

  25. AIR is for the entire ML org
    ● Scale a single type of workload
    ● Scale end-to-end ML applications
    ● Run ecosystem libraries using a unified API
    ● Build a custom ML platform
    A scalable, unified toolkit for both data
    scientists and software engineers.

  26. Ray AIR vs Ray Core
                          Ray AIR                     Ray Core
    Who should use...     Data Scientists & ML        Advanced Infra & ML
                          Engineers                   Groups
    If you want...        Easy to get started and     Customizability and
                          Ecosystem integrations      Control

  27. What comes out of the box with AIR?
    ● Data Preprocessing
    ● Training
    ● Tuning
    ● Batch Prediction
    ● Serving

  28. Scalable Data Prep and Loading
    with Ray Data
    • Dataset library built for ML
    tasks
    • Seamlessly load distributed
    data from MB to TB scale
    • Preprocessors for unified
    training and inference
    [Diagram labels: Trainer.fit; Dataset; Trainer; Workers]
    dataset = ray.data.read_csv("...")
    preprocessor = ray.data.preprocessors.MinMaxScaler(
        ["value"])
    trainer = ray.train.torch.TorchTrainer(
        ..., preprocessor=preprocessor, datasets={"train": dataset})
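
The point of a preprocessor that is shared between training and inference is that statistics are learned once, from training data, and then reapplied unchanged at prediction time. A minimal sketch of that idea in plain Python follows. `MinMaxScalerSketch` is a hypothetical toy class, not Ray's `MinMaxScaler`, and the rows are made-up data.

```python
class MinMaxScalerSketch:
    """Toy min-max scaler; a hypothetical stand-in, not Ray's preprocessor."""

    def __init__(self, column):
        self.column = column
        self.min_ = None
        self.max_ = None

    def fit(self, rows):
        # Learn the column's range from the training rows only.
        values = [row[self.column] for row in rows]
        self.min_, self.max_ = min(values), max(values)
        return self

    def transform(self, rows):
        # Apply the stored range; works for training and inference rows alike.
        span = self.max_ - self.min_
        scaled = []
        for row in rows:
            new_row = dict(row)
            new_row[self.column] = (
                (row[self.column] - self.min_) / span if span else 0.0
            )
            scaled.append(new_row)
        return scaled


train_rows = [{"value": 10.0}, {"value": 20.0}, {"value": 30.0}]
inference_rows = [{"value": 25.0}]

scaler = MinMaxScalerSketch("value").fit(train_rows)
train_scaled = scaler.transform(train_rows)
inference_scaled = scaler.transform(inference_rows)
print(train_scaled, inference_scaled)
```

Because the inference rows reuse the range learned during `fit`, the value 25.0 is scaled relative to the training range rather than to itself.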


  29. Scalable Model Training
    with Ray Train
    • Single API to run the most
    popular ML training
    frameworks
    • Seamless integration with
    other AIR libraries
    [Diagram labels: Trainer; Checkpoint; Datasets; Tuner]
    trainer = ray.train.torch.TorchTrainer(
        train_loop,
        scaling_config=ScalingConfig(
            num_workers=100, use_gpu=True),
        preprocessor=preprocessor,
        datasets={"train": dataset})
    result = trainer.fit()

  30. Scalable Hyperparameter Tuning
    with Ray Tune
    • Run multiple concurrent
    training jobs
    • Cutting-edge optimization
    algorithms
    • Fault tolerance at scale
    trainer = TorchTrainer(...)
    tuner = Tuner(
        trainer,
        param_space={
            "batch_size": tune.grid_search(
                [1, 2, 3])})
    results = tuner.fit()
    [Diagram labels: Tuner.fit; Tuner; Trial; Trainer; Workers]
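
A grid search over `[1, 2, 3]` amounts to one trial per listed value (or per combination, when several parameters are gridded). The sketch below shows that expansion in plain Python. It is an illustration of the idea, not Ray Tune's implementation; `expand_grid` and `toy_objective` are hypothetical names, and the objective is a made-up score.

```python
from itertools import product


def expand_grid(param_space):
    # Build one config per combination of the listed values.
    names = list(param_space)
    return [dict(zip(names, combo)) for combo in product(*param_space.values())]


def toy_objective(config):
    # Made-up score for illustration; a real trial would train a model.
    return abs(config["batch_size"] - 2)


trials = expand_grid({"batch_size": [1, 2, 3]})
scores = [(toy_objective(config), config) for config in trials]
best_score, best_config = min(scores, key=lambda pair: pair[0])
print(trials)
print(best_config)
```

In a real tuning run the trials are independent of each other, which is what lets a tuner run them concurrently.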


  31. Scalable Batch Prediction
    with AIR's BatchPredictor
    • Execute inference on
    distributed data using
    CPUs and GPUs
    • Bring your own model or
    load existing checkpoints
    from Train
    predictor = BatchPredictor.from_checkpoint(
        checkpoint, XGBoostPredictor)
    results = predictor.predict(dataset)
    results.write_parquet("s3://...")
    [Diagram labels: Model; Batch Predictor; Workers and GPU Worker;
    predict; Ray Dataset split into Shards]
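
The diagram's idea, splitting a dataset into shards and letting each worker score its own shard, can be sketched with the standard library. This is an analogy under stated assumptions, not how `BatchPredictor` is implemented: threads stand in for Ray workers, and `toy_model`, the threshold, and the round-robin split are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(rows, num_shards):
    # Round-robin split; every row lands in exactly one shard.
    return [rows[i::num_shards] for i in range(num_shards)]


def toy_model(row):
    # Hypothetical "model": flags values above a threshold.
    return 1 if row > 5 else 0


def predict_shard(shard):
    # Each worker scores its own shard independently.
    return [toy_model(row) for row in shard]


def batch_predict(rows, num_workers=3):
    shards = split_into_shards(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # map preserves shard order, so results line up with the shards.
        return list(executor.map(predict_shard, shards))


rows = list(range(10))
shard_predictions = batch_predict(rows)
print(shard_predictions)
```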


  32. Scalable Online Inference
    with Ray Serve
    • Deploy single models as
    highly available (HA) inference services in
    Ray
    • Build multi-model
    pipelines with custom
    business logic
    deployment = PredictorDeployment.options(
        name="XGBoostService")
    deployment.deploy(XGBoostPredictor,
        checkpoint, ...)
    print(deployment.url)
    [Diagram labels: Prediction requests; Predictor Deployment; Model]

  33. How to use AIR?


  34. When would you use Ray AIR?
    ● Scale a single type of workload
    ● Scale end-to-end ML applications
    ● Run ecosystem libraries using a unified API
    ● Build a custom ML platform

  35. Using Ray AIR for a single workload
    [Diagram labels: Orchestrator; Kubernetes/Cloud; Data loading;
    Preprocess; Training; Batch Prediction on Ray; Prediction Results]

  36. Scalable Batch Prediction on Ray AIR
    Here’s an example of using Ray
    for one part of your ML pipeline.
    [Diagram labels: Batch Prediction; Ray AIR; Prediction results]
    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)

  37. Scalable Batch Prediction on Ray AIR
    Here’s an example of using Ray
    for one part of your ML pipeline.
    [Diagram labels: Batch Prediction; Ray AIR; Prediction results]
    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)
    dataset = ray.data.read_datasource(
        ImageFolderDatasource(),
        paths=[data_url])

  38. Scalable Batch Prediction on Ray AIR
    Here’s an example of using Ray
    for one part of your ML pipeline.
    [Diagram labels: Batch Prediction; Ray AIR; Prediction results]
    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)
    dataset = ray.data.read_datasource(
        ImageFolderDatasource(),
        paths=[data_url])
    ckpt = TorchCheckpoint.from_model(model)
    predictor = BatchPredictor.from_checkpoint(
        ckpt, TorchPredictor)
    outputs = predictor.predict(
        dataset, column=["image"])

  39. Scalable Batch Prediction on Ray AIR
    Here’s an example of using Ray
    for one part of your ML pipeline.
    [Diagram labels: Batch Prediction; Ray AIR; Prediction results]
    data_url = "s3://YOUR_IMAGE_DATA"
    model = models.resnet18(pretrained=True)
    dataset = ray.data.read_datasource(
        ImageFolderDatasource(),
        paths=[data_url])
    ckpt = TorchCheckpoint.from_model(model)
    predictor = BatchPredictor.from_checkpoint(
        ckpt, TorchPredictor)
    outputs = predictor.predict(
        dataset, column=["image"])
    outputs.write_s3(...)

  40. Ray AIR vs SageMaker Batch Inference
    Compared to SageMaker Batch
    Inference…
    → Create Airflow DAG
    → Create Docker image
    → Test locally
    → Push Docker image to ECR
    → Decide how many machines
    → Partition work across all machines
    → Copy files from S3 to local
    → Read all results from machines
    → Collate results
    → Tear all down
    Ray AIR in 3 steps
    → Start a Ray cluster
    → Submit your Python script
    → [Maybe] Shut down your Ray
    cluster

  41. When would you use Ray AIR?
    ● Scale a single type of workload
    ● Scale end-to-end ML applications
    ● Run ecosystem libraries using a unified API
    ● Build a custom ML platform

  42. Using Ray AIR for E2E ML Workflows
    [Diagram labels: Data Processing on Ray; Distributed Training on Ray;
    Hyperparameter Tuning on Ray; Batch Prediction on Ray]

  43. Using Ray AIR to scale E2E ML Workflows
    Scalable Data Processing
    (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(
        dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])

  44. Using Ray AIR to scale E2E ML Workflows
    Scalable Data Processing
    (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(
        dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
    Scalable Model Training
    (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor)
    result = trainer.fit()

  45. Using Ray AIR to scale E2E ML Workflows
    Scalable Data Processing
    (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(
        dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
    Scalable Model Training
    (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor)
    result = trainer.fit()
    Scalable Model Tuning
    (Ray Tune)
    tuner = ray.tune.Tuner(
        trainer,
        param_space={"params": {"max_depth": tune.randint(1, 9)}},
        tune_config=TuneConfig(
            num_samples=5, metric="logloss", mode="min"),
    )
    checkpoint = tuner.fit().get_best_result().checkpoint

  46. Using Ray AIR to scale E2E ML Workflows
    Scalable Data Processing
    (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(
        dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
    Scalable Model Training
    (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor)
    result = trainer.fit()
    Scalable Model Tuning
    (Ray Tune)
    tuner = ray.tune.Tuner(
        trainer,
        param_space={"params": {"max_depth": tune.randint(1, 9)}},
        tune_config=TuneConfig(
            num_samples=5, metric="logloss", mode="min"),
    )
    checkpoint = tuner.fit().get_best_result().checkpoint
    Scalable Batch Prediction
    (Predictors)
    batch_predictor = BatchPredictor.from_checkpoint(
        checkpoint, XGBoostPredictor)
    predicted_probabilities = batch_predictor.predict(test_ds)
    predicted_probabilities.show()

  47. Using Ray AIR to scale E2E ML Workflows
    Scalable Data Processing
    (Ray Data)
    dataset = ray.data.read_csv(...)
    train_ds, valid_ds = train_test_split(
        dataset, test_size=0.3)
    test_ds = valid_ds.drop_columns(["target"])
    preprocessor = StandardScaler(columns=["mean radius"])
    Scalable Model Training
    (Ray Train)
    trainer = ray.train.xgboost.XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=128),
        label_column="target",
        datasets=dict(train=train_ds, valid=valid_ds),
        preprocessor=preprocessor)
    result = trainer.fit()
    Scalable Model Tuning
    (Ray Tune)
    tuner = ray.tune.Tuner(
        trainer,
        param_space={"params": {"max_depth": tune.randint(1, 9)}},
        tune_config=TuneConfig(
            num_samples=5, metric="logloss", mode="min"),
    )
    checkpoint = tuner.fit().get_best_result().checkpoint
    Scalable Batch Prediction
    (Predictors)
    batch_predictor = BatchPredictor.from_checkpoint(
        checkpoint, XGBoostPredictor)
    predicted_probabilities = batch_predictor.predict(test_ds)
    predicted_probabilities.show()
    Scale out to a cluster
    with 1 line change.
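
The tuning step in this pipeline draws `max_depth` at random five times and keeps the trial with the lowest `logloss`. A minimal sketch of that selection logic follows. It is not Ray Tune's implementation: `toy_logloss` is a made-up loss curve, `random_search` is a hypothetical helper, and the sketch assumes the upper bound of the integer range is exclusive, which is how Ray Tune documents `randint`.

```python
import random


def toy_logloss(max_depth):
    # Made-up loss curve with its minimum at max_depth == 4.
    return 0.1 + 0.02 * (max_depth - 4) ** 2


def random_search(num_samples=5, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        # randrange(1, 9) draws from 1..8; the upper bound is excluded.
        max_depth = rng.randrange(1, 9)
        trials.append({"max_depth": max_depth, "logloss": toy_logloss(max_depth)})
    # mode="min": the best trial is the one with the smallest metric.
    best = min(trials, key=lambda trial: trial["logloss"])
    return trials, best


trials, best = random_search()
print(best)
```

With only five samples, the best trial found is not guaranteed to hit the true minimum; raising `num_samples` trades compute for coverage.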


  48. When would you use Ray AIR?
    ● Scale a single type of workload
    ● Scale end-to-end ML applications
    ● Run ecosystem libraries using a unified API
    ● Build a custom ML platform

  49. Integrations with:
    ● Data Ecosystem
    ● ML frameworks
    ● Optimization Libraries
    ● Model Monitoring
    ● Model Serving

  50. Custom Integrations with Ray AIR
    Scalable Data Preprocessing (Ray Data)
    from ray.data.datasource import Datasource
    class DeltaLakeDatasource(Datasource):
        # define custom data source
    ds = ray.data.read_datasource(DeltaLakeDatasource(...))
    ds.write_datasource(DeltaLakeDatasource(...))
    Scalable Model Training (Ray Train)
    from ray.train import DataParallelTrainer
    class JaxTrainer(DataParallelTrainer):
        # define custom training logic
    trainer = JaxTrainer(dataset={..})
    trainer.fit()
    Scalable Model Tuning (Ray Tune)
    from ray.air.callbacks import Callback
    class CustomMLflowTracker(Callback):
        # define custom tracking logic
    tuner = Tuner(trainer,
        run_config=RunConfig(
            callbacks=[CustomMLflowTracker()]))
    tuner.fit()
    Scalable Batch Prediction (Predictors)
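
All three extension points on this slide follow the same pattern: subclass a base class, override a hook, and hand the instance to the framework. The sketch below shows that pattern for a tracking callback in plain Python. Every name in it (`Callback`, `on_trial_result`, `InMemoryTracker`, `run_trials`) is hypothetical and chosen for illustration; none is taken from Ray's API.

```python
class Callback:
    """Hypothetical base class with a single no-op hook."""

    def on_trial_result(self, trial_id, result):
        pass


class InMemoryTracker(Callback):
    """Custom integration: records every reported result."""

    def __init__(self):
        self.records = []

    def on_trial_result(self, trial_id, result):
        self.records.append((trial_id, dict(result)))


def run_trials(configs, callbacks):
    # Toy driver: "runs" each config and notifies every callback.
    for trial_id, config in enumerate(configs):
        result = {"score": config["x"] * 2}
        for callback in callbacks:
            callback.on_trial_result(trial_id, result)


tracker = InMemoryTracker()
run_trials([{"x": 1}, {"x": 3}], callbacks=[tracker])
print(tracker.records)
```

A tracker for an external service would replace the list append with a call to that service's client, leaving the driver untouched.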


  51. When would you use Ray AIR?
    ● Scale a single type of workload
    ● Scale end-to-end ML applications
    ● Run ecosystem libraries using a unified API
    ● Build a custom ML platform

  52. Case study: Merlin at Shopify

  53. Using Ray AIR as the compute core for your
    ML platform
    [Diagram labels: Ray AIR program]

  54. Using Ray AIR as the compute core for your
    ML platform
    [Diagram labels: three Ray AIR programs; KubeRay / Anyscale]

  55. Using Ray AIR as the compute core for your
    ML platform
    [Diagram labels: three Ray AIR programs; KubeRay / Anyscale;
    Model Registry; Monitoring; Experiment Tracking; Feature Store;
    Lakehouse; Notebook Service; Job Scheduler]

  56. Current status with AIR


  57. Ray AIR Roadmap
    In Ray 2.0, Ray AIR is released as beta.
    In future releases, we plan to add:
    ● Improved integrations with data sources, feature stores, and model
    monitoring services
    ● Improved scalability and performance benchmarks for
    data-intensive use cases



  58. Users are excited about Ray AIR!
    "I’d say the productivity at least doubled if not more… I can’t wait
    until AIR is released. I’m using the nightly builds and it’s already a
    massive productivity boost."
    - Data Scientist from large music streaming company
    "Ray AIR has greatly improved my developer experience … through
    intuitive abstractions like Ray Datasets and Train. In two weeks I
    was able to recreate and outperform a data ingest pipeline I built
    by hand over the course of 6 months."
    - ML Engineer at telematics startup

  59. How to get involved?
    - Chat with the developers on the Ray Slack (#air-dogfooding
    channel!)
    - Come talk afterwards -- maybe we can form a recurring meetup in
    Seattle!

  60. Thank you.
    Contact: rliaw @ anyscale.com (@richliaw)
