Slide 1

Slide 1 text

Welcome to the Bay Area Ray Meetup!
March 2, 2022
Ray Train, TorchX, and Distributed Deep Learning with Ray

Slide 2

Slide 2 text

Agenda: Virtual Meetup
Welcome Remarks, Introduction, Announcements: Jules S. Damji, Anyscale
Talk 1: Ray Train: Production-ready Distributed Deep Learning (Will Drevo, Amog Kamsetty, & Matthew Deng, Anyscale Inc.)
Talk 2: Large Scale Distributed Training with TorchX and Ray (Mark Saroufim, Meta AI & PyTorch Engineering)

Slide 3

Slide 3 text

Production RL Summit
MARCH 29 - VIRTUAL - FREE
A reinforcement learning event for practitioners
Speakers: Ben Kasper, Sumitra Ganesh, Sergey Levine, Marc Weber, Volkmar Sterzing, Adam Kelloway
ORGANIZED BY [sponsor logo]
Register: https://tinyurl.com/mr9rd32h

Slide 4

Slide 4 text

Production RL Summit
MARCH 29 - VIRTUAL
A reinforcement learning event for practitioners
HANDS-ON TUTORIAL: Contextual Bandits & RL with RLlib
Instructor: Sven Mika, Lead maintainer, RLlib
Learn how to apply cutting-edge RL in production with RLlib. Tutorial covers:
● Brief overview of RL concepts
● Train and tune contextual bandits and the SlateQ algorithm
● Offline RL using cutting-edge algos
● Deploy RL models into a live service
$50 with code MEETUP50 (regular price $75)
Register: https://tinyurl.com/mr9rd32h
ORGANIZED BY [sponsor logo]

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Ray Train: A high-level library for deep learning training
Amog Kamsetty, Matthew Deng, Will Drevo

Slide 7

Slide 7 text

Overview
I. Problems in DL training
II. Ray Train
   A. Simple scaling
   B. Flexible
   C. High-level API, low-level optimizations
III. Roadmap for H1 2022
IV. CIFAR-10 deep learning demo

Slide 8

Slide 8 text

Problems in DL training
Training time, distributed systems, and finding the right framework

Slide 9

Slide 9 text

Problems seen today in deep learning:
1. Training takes too long
2. Too much data to fit on one node
3. Large models that do not fit on one device
Ray Train: https://tinyurl.com/ray-train; reach out at [email protected]

Slide 10

Slide 10 text

OK, so we go distributed. Now what? We see a bunch of new problems:
● Managing a new infra stack
● Rewriting all your training code
● Dealing with increased cost
● Setting up new optimizations
● Tuning hyperparameters
Also, it is horribly painful from a developer’s perspective!
[Diagram: training spread across four machines]

Slide 11

Slide 11 text

How do we fix the problems? My ideal solution is a tool that…
● is easy to onboard onto.
● abstracts away the infrastructure.
● gives me extremely fast iteration speed.
● allows me to use affordable GPUs from any cloud provider.
● integrates well in an end-to-end machine learning pipeline.

Slide 12

Slide 12 text

Deep learning framework tradeoffs
[Chart: ease of development vs. production-readiness. Heavyweight frameworks: inflexible, hard to customize. Lightweight frameworks: nimble, but unscalable. Ray Train is plotted on the chart as well.]

Slide 13

Slide 13 text

Ray Train: Our approach

Slide 14

Slide 14 text

What is Ray Train?
[Diagram with components: Ray Train, distributed training, deep learning frameworks, compute, data processing, model tuning/serving]

Slide 15

Slide 15 text

Ray Train
Easy scaling + Flexible + High-level API, low-level optimizations

Slide 16

Slide 16 text

DL: division of labor
DL frameworks like PyTorch, Horovod, and TensorFlow do a great job at:
• NN modules, components, & patterns
• Writing training loops
• Gradient communication protocols
• Having great developer communities
Ray’s strengths are:
• Managing compute
• Anticipating and scheduling around data locality and constraints
• A seamless way to distribute Python
• Distributed systems
Ray Train is the union of these different competencies!

Slide 17

Slide 17 text

Simple scaling
Easy integration; runs anywhere, on anything

Slide 18

Slide 18 text

Ray Train: easy as 1, 2, 3
Scale up in your code, not in your infrastructure

Step 1: Put training code in one function
Step 2: Wrap your model and dataset
Step 3: Create your Trainer and run!

from ray import train
from ray.train import Trainer

def train_func():
    …  # existing model and data loader setup
    model = train.torch.prepare_model(model)
    dataloader = train.torch.prepare_data_loader(dataloader)
    for _ in range(num_epochs):
        …  # training loop

trainer = Trainer(backend="torch", num_workers=4)
trainer.start()
results = trainer.run(train_func)
trainer.shutdown()

Slide 19

Slide 19 text

Multi-node and multi-GPU
Move between laptop & cluster, CPU & GPU, (un)distributed

With Ray, moving from local to cluster is as easy as:
$ ray up cluster-config.yaml
$ ray job submit cluster-config.yaml -- python my_script.py

And scaling up your workload is even easier:
trainer = Trainer(backend="torch", num_workers=1)
trainer = Trainer(backend="torch", num_workers=100, use_gpu=True)

Slide 20

Slide 20 text

Train iteratively, or in production

Method 1: Ray Client (docs). Good for interactive runs; can use ipdb or the Ray debugger.
ray.init("ray://:")
# or use a managed service, like Anyscale
# ray.init("anyscale://")

Method 2: Ray Jobs (docs). Good for longer-running jobs (i.e. “close the laptop”), or production jobs.
$ ray up my_cluster.yaml
$ ray job submit -- "python script.py"

[Diagram: script.py submitted to a Ray cluster]
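To make Method 1 concrete, here is a minimal sketch of connecting with Ray Client; the head-node address in the slide above is truncated, so the host and port below are hypothetical (10001 is the default Ray Client port).

import ray

# Hypothetical head-node address; replace with your cluster's host.
ray.init("ray://203.0.113.10:10001")
# ...run your Ray Train code interactively against the cluster, then:
ray.shutdown()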

Slide 21

Slide 21 text

Flexible
Programmatic, function interface, batteries included

Slide 22

Slide 22 text

ML training: upstream and downstream
ETL, feature preprocessing (“SQL land”)
“Last mile” data ingest (“Dataloaders”)
ML/DL training
Hyperparameter tuning
Model serving, A/B testing
Monitoring
[Diagram: the Ray libraries covering these stages: Ray Datasets, Ray Train, Ray Tune, Ray Serve, all running on Ray]

Slide 23

Slide 23 text

The Ray Train Ecosystem!
[Stack diagram: user ML apps, 3rd-party training libraries, and “Your app/library here!” on top of Ray Train and the other Ray distributed libraries (Datasets, Workflows), which run on Ray Core and the compute cluster]

Slide 24

Slide 24 text

Distributed data loading with Ray Datasets (a sketch of these four ideas follows below):
1. Sharded datasets: easily split data across workers
2. Windowed datasets: train on data larger than RAM
3. Pipelined execution: keep your GPUs fully saturated
4. Global shuffling: improve model accuracy
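A rough sketch of the four ideas above using Ray Datasets APIs from the Ray 1.x era; the S3 path, window size, and worker count are hypothetical.

import ray

# Hypothetical dataset location.
ds = ray.data.read_parquet("s3://my-bucket/train-data/")

# 2 + 3. Windowed datasets with pipelined execution: stream data larger than
# RAM, loading the next window while the current one is being consumed.
pipe = ds.window(blocks_per_window=20)

# 4. Global shuffling: reshuffle each window every epoch.
pipe = pipe.random_shuffle_each_window()

# 1. Sharded datasets: split the pipeline across 4 training workers.
shards = pipe.split(n=4, equal=True)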

Slide 25

Slide 25 text

Hyperparameter optimization with Ray Tune
Perform distributed hyperparameter tuning / training in 2 lines of code:
trainable = trainer.to_tune_trainable(train_func)
analysis = tune.run(trainable, config=...)
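For illustration, here is one way the elided config argument might be filled in; the hyperparameter names and search spaces are hypothetical, and this assumes train_func reads a config dict and reports a "loss" metric.

from ray import tune

trainable = trainer.to_tune_trainable(train_func)
analysis = tune.run(
    trainable,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),        # hypothetical hyperparameters
        "batch_size": tune.choice([32, 64, 128]),
    },
    num_samples=20,                                # number of trials
)
print(analysis.get_best_config(metric="loss", mode="min"))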

Slide 26

Slide 26 text

Ray Tune: code example
Choose from state-of-the-art searchers, and search over anything you can parameterize.
Easily load your model from a checkpoint in cloud storage later!
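The code on this slide is not captured in the export; below is a minimal sketch of what such an example might look like with Ray 1.x-era Tune APIs. The searcher choice (HyperOptSearch, which requires hyperopt installed), metric name, and bucket are assumptions, not the talk's actual code.

from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch  # one of several searchers

trainable = trainer.to_tune_trainable(train_func)  # as on the previous slide
analysis = tune.run(
    trainable,
    metric="loss",
    mode="min",
    search_alg=HyperOptSearch(),
    config={"lr": tune.loguniform(1e-4, 1e-1)},    # search over anything you can parameterize
    num_samples=20,
    # Sync results and checkpoints to cloud storage (bucket is hypothetical)
    # so the best model can be loaded again later.
    sync_config=tune.SyncConfig(upload_dir="s3://my-bucket/tune-results"),
)
print(analysis.best_checkpoint)  # location of the best trial's checkpoint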

Slide 27

Slide 27 text

Track your training with TensorBoard
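A minimal sketch of one way to get training metrics into TensorBoard with Ray Train: report metrics from the training function and attach a TensorBoard logger callback. The callback name is taken from the Ray 1.x docs as best recalled; treat it as an assumption.

from ray import train
from ray.train import Trainer
from ray.train.callbacks import TBXLoggerCallback  # assumed Ray 1.x callback name

def train_func():
    # Dummy loop that reports a decreasing "loss" so something shows up in
    # TensorBoard; a real training loop would report its actual metrics.
    for epoch in range(5):
        train.report(epoch=epoch, loss=1.0 / (epoch + 1))

trainer = Trainer(backend="torch", num_workers=2)
trainer.start()
trainer.run(train_func, callbacks=[TBXLoggerCallback()])
trainer.shutdown()
# Then point TensorBoard at the results directory, e.g.: tensorboard --logdir ~/ray_results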

Slide 28

Slide 28 text

Find training bottlenecks with PyTorch profiler!
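As a sketch, here is how one might use the standard torch.profiler API inside a training function to find bottlenecks; this is a generic example, not the specific Ray Train profiler integration mentioned later on the roadmap, and the model/data are stand-ins.

import torch
from torch.profiler import profile, schedule, tensorboard_trace_handler

def train_func():
    # Stand-in model and data; a real train_func would use its own.
    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    data, target = torch.randn(64, 10), torch.randn(64, 1)

    # Profile a few steps and write a trace viewable in TensorBoard's profiler tab.
    with profile(schedule=schedule(wait=1, warmup=1, active=3),
                 on_trace_ready=tensorboard_trace_handler("./profiler_traces")) as prof:
        for step in range(10):
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(data), target)
            loss.backward()
            optimizer.step()
            prof.step()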

Slide 29

Slide 29 text

Building ML platforms on top of Ray...

Slide 30

Slide 30 text

High-level API, low-level optimizations
UX stays great, more optimizations added all the time

Slide 31

Slide 31 text

First: Ray Train under the hood
[Diagram: a Trainer coordinating Worker 1, Worker 2, Worker 3, and Worker 4]

Slide 32

Slide 32 text

High-level API!

Slide 33

Slide 33 text

High-level API!
Multi-GPU, spot instances, DeepSpeed, fp16, zero-copy reads, ...future optimizations here

Slide 34

Slide 34 text

High-level API!
Multi-GPU, spot instances, DeepSpeed, fp16, zero-copy reads, ...future optimizations here
Tensor communication handled by DL frameworks

Slide 35

Slide 35 text

DL: division of labor
DL frameworks like PyTorch, Horovod, and TensorFlow do a great job at:
• NN modules, components, & patterns
• Writing training loops
• Gradient communication protocols
• Having great developer communities
Ray’s strengths are:
• Managing compute
• Anticipating and scheduling around data locality and constraints
• A seamless way to distribute Python
• Distributed systems
Ray Train is the union of these different competencies!

Slide 36

Slide 36 text

Ray Train: Roadmap
What to look forward to this year

Slide 37

Slide 37 text

Ray Train H1 2022 roadmap
Q1 2022:
● Elastic training, better metrics/results handling, model parallelism
● DeepSpeed, fp16 support
● Integrations: W&B, PyTorch profiler
● Unified ML API alpha
Q2 2022:
● Better checkpointing
● Advanced operations: GNNs, parameter servers, benchmarking
● Unified ML API beta

Slide 38

Slide 38 text

Demo: Training a deep learning classification model with PyTorch
Training a PyTorch image model at scale
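The demo itself is not captured in this export; below is a rough sketch of what a CIFAR-10 training script with Ray Train might look like, reusing the prepare_model / prepare_data_loader / Trainer APIs shown earlier. The model choice, hyperparameters, and epoch count are placeholders, not the demo's actual code.

import torch
import torchvision
import torchvision.transforms as transforms
from ray import train
from ray.train import Trainer

def train_func():
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    dataset = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=transform)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)
    dataloader = train.torch.prepare_data_loader(dataloader)  # adds DistributedSampler, moves batches to the right device

    model = torchvision.models.resnet18(num_classes=10)       # placeholder model
    model = train.torch.prepare_model(model)                   # wraps the model for distributed training
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for epoch in range(2):                                     # placeholder epoch count
        for images, labels in dataloader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        train.report(epoch=epoch, loss=loss.item())

trainer = Trainer(backend="torch", num_workers=4, use_gpu=True)
trainer.start()
trainer.run(train_func)
trainer.shutdown()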

Slide 39

Slide 39 text

Start learning Ray and contributing…
Getting started: pip install ray
Documentation (docs.ray.io): quick start examples, reference guides, etc.
Join the Ray Meetup: revived in Jan 2022; next meetup March 2nd. We meet each month and publish the recordings for members. https://www.meetup.com/Bay-Area-Ray-Meetup/
Forums (discuss.ray.io): learn and share with the broader Ray community, including the core team
Ray Slack: connect with the Ray team and community
Social media (@raydistributed, @anyscalecompute): follow us on Twitter and LinkedIn