Simplify and scale your XGBoost model using Ray on Anyscale

Scaling XGBoost with Ray
Phi Nguyen, ML Solution Architect, Anyscale
Antoni Baum, ML Software Engineer, Anyscale

Agenda
1. State of Machine Learning Platform
2. Next Generation ML Platform with Ray & Anyscale
3. Simplify and Scale XGBoost on Ray
4. Demo
5. Questions

The World Today: Distributed apps will become the norm
• “AI is the new electricity” - Andrew Ng
• Distributed computing is necessary to achieve the promise of ML
  ◦ The end of Moore’s Law
  ◦ Data growing faster than compute
• MLOps is very hard, and it is a key initiative for many CDOs/CTOs as a competitive differentiator

Fundamental Challenges: Distributed apps will become the norm
• The ML ecosystem is moving at a breakneck pace: new frameworks, new algorithms
• There is no universal framework for distributed computing; solutions are often custom built or stitched together
• Developers are stuck managing infrastructure instead of building applications

State of ML Platform

Uber Michelangelo in 2019

Existing ML Pipelines
(Diagram: Preprocess → Save data → Train)

Existing ML Pipelines

Challenges
• Performance overheads
  ◦ Serialization/deserialization
  ◦ Data materialized to external storage
• Implementation/operational complexity
  ◦ Impedance mismatch: cross-language, cross-workload
  ◦ CPUs vs. GPUs
• Missing operations
  ◦ Per-epoch shuffling
    ▪ How to do a fast, in-memory, distributed shuffle?
• MLOps
  ◦ Often requires bespoke ML CI/CD tooling

Simplify and Scale your ML Pipeline on Ray

What is Ray?
• A simple and general library for distributed computing
  ◦ Single machine or 1000s of nodes
  ◦ Agnostic to the type of work
• An ecosystem of libraries (for scaling ML and more)
  ◦ e.g. Ray RLlib, Ray Train, Ray Tune, Ray Serve
• Tools for launching clusters on any cloud provider

Ray: Universal framework for distributed computing
(Diagram: Native Libraries, 3rd Party Libraries, and "Your app here!" on top of Ray; Run anywhere; Library + app ecosystem)

Ray Users 13000+ Repositories Depend on Ray 1600+ Open Source
Contributors 449+ Who uses Ray?

Growth of Ray open source: 19K stars as of 1/2/22

Anyscale: The best way to develop, scale, and deploy AI apps on Ray

Supercharge your Ray journey on Anyscale
Accelerate time to market. Enterprise ready.
• Observability: Get full visibility into your Ray workloads
• Multi-Cloud: Diversify and deploy your workloads across public clouds with the click of a button
• Fully-managed service: Focus on innovation, not infra ops
• From the creators of Ray: Access to Ray experts
• Built for the dev -> prod journey: Scale from laptop to cloud seamlessly; easy CI/CD integration

Simplify your MLOps with Anyscale
Effortlessly deploy AI workflows and models into production with your existing CI/CD tools.
• Production jobs & services: Deploy ML workflows & models into production with ease
• Observability: Monitor health with event logs and prebuilt dashboards
• App packaging: Package apps, incl. all code and library dependencies
• APIs & SDKs: Automate and integrate into your workflows (e.g. CI/CD)

Infinite Laptop experience w/ power of the cloud
- Client makes running on the cloud as easy as running on your laptop
- Built-in dashboards; integration with TensorBoard and Grafana

Unified Compute
Unify the end-to-end ML lifecycle on a single, scalable platform with a rich ecosystem of distributed machine learning libraries.
(Diagram: Business Logic on top of Data Processing, Training, Serving, Hyperparameter Tuning, and Others; these run on the Ray ecosystem + native Ray + Anyscale, a universal framework for distributed computing)

Uber Michelangelo in 2021 - All in on Ray!

Ray Datasets
• Universal data loading
• Last-mile preprocessing
• Parallel GPU/CPU compute
(Diagram: a ray.data.Dataset is made of blocks spread across Node 1, Node 2, and Node 3)
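To make the block model concrete, here is a minimal plain-Python sketch, not Ray code: a list of rows is split into blocks and a preprocessing function runs on each block concurrently, with a thread pool standing in for cluster nodes. `split_into_blocks`, `map_blocks`, and `double_block` are hypothetical names; in Ray Datasets the blocks live in Ray's distributed object store and transformations run as Ray tasks or actors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_blocks(rows: List[int], num_blocks: int) -> List[List[int]]:
    """Split rows into num_blocks contiguous blocks whose sizes differ by at most one."""
    size, extra = divmod(len(rows), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(rows[start:end])
        start = end
    return blocks


def map_blocks(blocks: List[List[int]],
               fn: Callable[[List[int]], List[int]],
               max_workers: int = 4) -> List[List[int]]:
    """Apply fn to every block concurrently (threads stand in for cluster nodes).

    pool.map returns results in input order, so block order is preserved.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, blocks))


def double_block(block: List[int]) -> List[int]:
    # Hypothetical last-mile preprocessing step, applied row by row within one block.
    return [x * 2 for x in block]
```

For example, `split_into_blocks(list(range(10)), 3)` yields blocks of sizes 4, 3, and 3, and `map_blocks(..., double_block)` transforms each block independently, which is what lets the work spread across nodes.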

Ray Tune
• Efficient algorithms that enable running trials in parallel
• Effective orchestration of distributed trials
• Easy-to-use APIs
• Cutting-edge optimization algorithms
• Minimal code changes to work in distributed settings
• Compatible with the ML ecosystem
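Running trials in parallel is the core idea. The sketch below is a hypothetical stand-in, not the Ray Tune API: it samples `eta` and `gamma` at random and evaluates the trials concurrently. The `objective` function is an invented synthetic error surface; with Ray Tune, each trial would instead train and evaluate a real model.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Config = Dict[str, float]


def objective(config: Config) -> float:
    # Hypothetical stand-in for "train a model, return validation error".
    # This synthetic surface is minimized at eta=0.2, gamma=0.1.
    return (config["eta"] - 0.2) ** 2 + (config["gamma"] - 0.1) ** 2


def sample_config(rng: random.Random) -> Config:
    # Random search: draw each hyperparameter uniformly from an assumed range.
    return {"eta": rng.uniform(0.01, 0.5), "gamma": rng.uniform(0.0, 0.5)}


def random_search(num_trials: int, max_parallel: int = 4,
                  seed: int = 0) -> Tuple[Config, float]:
    """Evaluate num_trials random configs, at most max_parallel at a time."""
    rng = random.Random(seed)
    configs: List[Config] = [sample_config(rng) for _ in range(num_trials)]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        scores = list(pool.map(objective, configs))
    best = min(range(num_trials), key=lambda i: scores[i])
    return configs[best], scores[best]
```

Random search is used here only because it is the simplest searcher; smarter searchers (Bayesian optimization, TPE) change how configs are proposed, not how trials are run in parallel.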

XGBoost-Ray

Motivation
• There are existing solutions for distributed XGBoost
  ◦ e.g. Spark, Dask, Kubernetes
• But most existing solutions lack one or more of:
  ◦ Dynamic computation graphs
  ◦ Fault tolerance handling
  ◦ GPU support
  ◦ Integration with hyperparameter tuning libraries

XGBoost-Ray
• Ray actors for stateful training workers
• Advanced fault tolerance mechanisms
• Full (multi-)GPU support
• Locality-aware distributed data loading
• Integration with Ray Tune

Recap: XGBoost
Gradient boosting:
• Add a new model at each iteration
• Trees or linear models
• At each step, try to fit the residuals using loss gradients
• (XGBoost: 2nd-order Taylor approximations)
(Diagram: Tree 1 + Tree 2 + Tree 3 + ...)
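The second-order idea can be shown on a toy problem. This sketch boosts one-split trees (stumps) on a single feature with squared loss, where the gradient is `pred - y` and the Hessian is 1; each leaf gets the Newton-step weight -G/(H + λ) used by XGBoost, and each split maximizes the matching structure score. It is a teaching sketch with hypothetical helper names (`fit_stump`, `boost`, `predict`), not XGBoost's implementation (no deeper trees, histogram binning, or column sampling).

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left leaf value, right leaf value)


def fit_stump(x: List[float], g: List[float], h: List[float],
              lam: float = 1.0) -> Stump:
    """Pick the 1-D split that maximizes the XGBoost-style structure score.

    Leaf weight is -G / (H + lam), where G and H sum the first- and
    second-order gradients of the rows in that leaf. The usual 1/2 factor
    and the gamma penalty are omitted: they do not change which split wins.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    best = None
    GL = HL = 0.0
    for k in range(len(order) - 1):
        GL += g[order[k]]
        HL += h[order[k]]
        if x[order[k]] == x[order[k + 1]]:
            continue  # no threshold separates equal feature values
        GR, HR = G - GL, H - HL
        gain = GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam)
        if best is None or gain > best[0]:
            threshold = (x[order[k]] + x[order[k + 1]]) / 2
            best = (gain, threshold, -GL / (HL + lam), -GR / (HR + lam))
    if best is None:  # all feature values equal: a single leaf
        w = -G / (H + lam)
        return (float("inf"), w, w)
    return best[1], best[2], best[3]


def boost(x: List[float], y: List[float], rounds: int = 10,
          eta: float = 0.3, lam: float = 1.0):
    """Newton boosting for squared loss 0.5 * (pred - y)^2: g = pred - y, h = 1."""
    pred = [0.0] * len(y)
    trees: List[Stump] = []
    for _ in range(rounds):
        g = [p - t for p, t in zip(pred, y)]
        h = [1.0] * len(y)
        thr, wl, wr = fit_stump(x, g, h, lam)
        trees.append((thr, eta * wl, eta * wr))  # store shrunken leaf values
        pred = [p + (eta * wl if xi <= thr else eta * wr) for p, xi in zip(pred, x)]
    return trees, pred


def predict(trees: List[Stump], xi: float) -> float:
    # The ensemble prediction is the sum of all trees: Tree 1 + Tree 2 + ...
    return sum(wl if xi <= thr else wr for thr, wl, wr in trees)
```

On x = 1..6 with targets 1, 1, 1, 5, 5, 5, the first stump splits at 3.5, and every later round fits what the earlier trees left over.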

Recap: Distributed XGBoost

Architecture
(Diagram: the Driver starts Worker 1 to Worker 4 as @ray.remote actors; each worker calls load_data() for distributed data loading)
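The driver/actor pattern can be sketched without a cluster. Here plain Python objects play the role of Ray actors: each keeps its shard in memory, so later calls reuse it instead of reloading. `DATASET`, `load_data`, `TrainingWorker`, and the strided sharding are hypothetical; with Ray, the class would be decorated with `@ray.remote`, created with `TrainingWorker.remote(...)`, and its methods called via `.remote()` and collected with `ray.get`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical in-memory dataset standing in for files or dataframe partitions.
DATASET = list(range(100))


def load_data(rank: int, num_workers: int) -> List[int]:
    # Hypothetical loader: worker `rank` reads every num_workers-th row.
    return DATASET[rank::num_workers]


class TrainingWorker:
    """Plays the role of a stateful Ray actor (with Ray: a @ray.remote class)."""

    def __init__(self, rank: int, num_workers: int):
        self.rank = rank
        self.num_workers = num_workers
        self.shard: List[int] = []

    def load(self) -> int:
        self.shard = load_data(self.rank, self.num_workers)
        return len(self.shard)

    def local_sum(self) -> int:
        # Later calls reuse the cached shard instead of reloading it.
        return sum(self.shard)


def driver(num_workers: int = 4) -> Tuple[List[TrainingWorker], List[int]]:
    """Create the workers, then have all of them load their shards concurrently."""
    workers = [TrainingWorker(r, num_workers) for r in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sizes = list(pool.map(lambda w: w.load(), workers))
    return workers, sizes
```

Keeping state in long-lived workers is the design point: data is loaded once, and every subsequent training step runs against the shard already in memory.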

Architecture
(Diagram: after distributed data loading, each of Worker 1 to Worker 4 calls xgb.train(); the workers synchronize through a tree-based allreduce (Rabit))
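A tree-based allreduce arranges workers in a tree: partial sums flow up to the root, then the total flows back down, so every worker ends up with identical statistics (for example gradient histograms) and builds the same tree. The sketch below is a single-process illustration that assumes a binary tree over ranks (parent of rank i is (i - 1) // 2); it is not Rabit's actual code.

```python
from typing import List


def tree_allreduce(values: List[List[float]]) -> List[List[float]]:
    """Sum equal-length vectors held by each rank; every rank receives the total."""
    n = len(values)
    partial = [list(v) for v in values]
    # Reduce phase: visit ranks from highest to lowest, so each rank has
    # already absorbed its children (2r + 1, 2r + 2) before sending upward.
    for rank in range(n - 1, 0, -1):
        parent = (rank - 1) // 2
        partial[parent] = [a + b for a, b in zip(partial[parent], partial[rank])]
    # Broadcast phase: the root's total flows back down the same tree.
    result: List[List[float]] = [[] for _ in range(n)]
    result[0] = partial[0]
    for rank in range(1, n):
        result[rank] = list(result[(rank - 1) // 2])
    return result
```

With four workers holding histograms `[1, 0, 2]`, `[0, 3, 1]`, `[2, 2, 2]`, and `[1, 1, 1]`, every worker receives `[4, 6, 6]`. A tree keeps each worker's communication to its parent and children instead of all-to-all exchange.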

Architecture
(Diagram: Worker 1 to Worker 4 load data and run xgb.train(), synchronizing through a tree-based allreduce (Rabit), and send checkpoints and eval results back to the Driver)

Performance
100K rows, 1K features, 2 classes, 10 boosting rounds; all on GPU aside from hist

Distributed data loading
(Diagram: Partitions A to H of a distributed dataframe (e.g. Ray Datasets, Dask) are spread across Node 1 to Node 4; each XGBoost-Ray worker, Worker 1 to Worker 4, loads the partitions that sit on its own node)
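Locality-aware loading means each worker should read partitions that already live on its node. A simple greedy sketch of that assignment follows; `assign_partitions`, its tie-breaking, and the node names in the usage example are assumptions for illustration, not XGBoost-Ray's actual scheduling logic.

```python
from collections import defaultdict
from typing import Dict, List


def assign_partitions(partition_nodes: Dict[str, str],
                      worker_nodes: List[str]) -> Dict[int, List[str]]:
    """Map each partition to a worker index, preferring workers on the same node.

    partition_nodes: {partition name: node holding it}
    worker_nodes: worker_nodes[i] is the node that worker i runs on.
    """
    assignment: Dict[int, List[str]] = {w: [] for w in range(len(worker_nodes))}
    workers_on_node: Dict[str, List[int]] = defaultdict(list)
    for w, node in enumerate(worker_nodes):
        workers_on_node[node].append(w)

    remote = []
    for part, node in sorted(partition_nodes.items()):
        local = workers_on_node.get(node)
        if local:
            # Least-loaded co-located worker: no data crosses the network.
            w = min(local, key=lambda i: len(assignment[i]))
            assignment[w].append(part)
        else:
            remote.append(part)

    # Partitions with no co-located worker go to the least-loaded worker.
    for part in remote:
        w = min(assignment, key=lambda i: len(assignment[i]))
        assignment[w].append(part)
    return assignment
```

For instance, with partitions A/B on `node1`, C/D on `node2`, E/F on `node3`, G on `node4`, H on a hypothetical `node5`, and one worker on each of `node1` to `node4`, only H has to be fetched remotely. This sketch balances by partition count only; a real scheduler would also weigh partition sizes.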

Fault tolerance
• In distributed training, some worker nodes are bound to fail eventually
• Default: simple (cold) restart from the last checkpoint
• Non-elastic training (warm restart): only the failing worker restarts
• Elastic training: continue training with fewer workers until the failed actor is back
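The default cold restart can be sketched as a loop that checkpoints after every boosting round and, after a failure, starts again from the last checkpoint. `WorkerFailure`, `train_rounds`, `train_with_cold_restart`, the per-round checkpoint frequency, and the string "trees" are hypothetical stand-ins; in a real cold restart, all workers are restarted and reload their data.

```python
from typing import Dict, List, Optional, Tuple


class WorkerFailure(Exception):
    """Raised to simulate a worker crashing mid-training."""


def train_rounds(num_rounds: int, checkpoint: Dict,
                 fail_at: Optional[int] = None) -> List[str]:
    """Resume from checkpoint['round'] and checkpoint after every round."""
    start = checkpoint.get("round", 0)
    model = list(checkpoint.get("model", []))
    for r in range(start, num_rounds):
        if fail_at is not None and r == fail_at:
            raise WorkerFailure(f"worker died in round {r}")
        model.append(f"tree_{r}")  # stand-in for adding one boosted tree
        checkpoint["round"] = r + 1
        checkpoint["model"] = list(model)
    return model


def train_with_cold_restart(num_rounds: int, fail_at: int,
                            max_restarts: int = 3) -> Tuple[List[str], int]:
    checkpoint: Dict = {}
    restarts = 0
    while True:
        try:
            # Inject the simulated failure only on the first attempt.
            inject = fail_at if restarts == 0 else None
            return train_rounds(num_rounds, checkpoint, inject), restarts
        except WorkerFailure:
            restarts += 1
            if restarts > max_restarts:
                raise
```

A failure in round 6 of 10 costs one restart, and training resumes from round 6 rather than from scratch, so the final model still has all 10 trees.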

Fault tolerance: Simple (cold) restart
(Timeline diagram: Worker 1 to Worker 4 over time; states: Training, Paused, Failed, Stopped, Loading data)

Fault tolerance: Non-elastic training (warm restart)
(Timeline diagram: Worker 1 to Worker 4 over time; states: Training, Paused, Failed, Stopped, Loading data)

Fault tolerance: Elastic training
(Timeline diagram: Worker 1 to Worker 4 over time; states: Training, Paused, Failed, Stopped, Loading data; annotation: Finishes earlier)
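A toy model of elastic training: for each round, it lists which workers contribute, dropping the failed worker until it has restarted and reloaded its shard. A deliberately simplistic cost model then shows why not pausing can finish earlier. Both functions and the cost model are hypothetical; real timings also depend on how per-round time changes with fewer workers.

```python
from typing import List


def elastic_schedule(num_workers: int, num_rounds: int, fail_round: int,
                     back_round: int, failed_rank: int = 0) -> List[List[int]]:
    """Ranks that contribute gradients in each round under elastic training.

    The failed worker is missing from fail_round until back_round (while it
    restarts and reloads its shard); the remaining workers keep training.
    """
    schedule = []
    for r in range(num_rounds):
        alive = [w for w in range(num_workers)
                 if not (w == failed_rank and fail_round <= r < back_round)]
        schedule.append(alive)
    return schedule


def wall_clock(num_rounds: int, restart_time: float, elastic: bool,
               round_time: float = 1.0) -> float:
    """Hypothetical cost model where every round takes round_time.

    A non-elastic (warm) restart pauses all workers for restart_time;
    elastic training never pauses, so it only pays for the rounds.
    """
    pause = 0.0 if elastic else restart_time
    return num_rounds * round_time + pause
```

The trade-off shows up in the schedule: rounds 3 to 5 in the test below use only three of four shards, which can shift the evaluation error slightly, in exchange for not stalling the whole job.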

Fault tolerance: Benchmarks
30M rows, 500 features, 2 classes, 100 boosting rounds, 10 workers

Condition          Affected workers   Eval error   Time (s)
Baseline           0                  0.133326     1441.44
Fewer workers      1                  0.134000     1227.45
Fewer workers      2                  0.133977     1249.45
Fewer workers      3                  0.133333     1291.54
Non elastic        1                  0.133552     2205.95
Non elastic        2                  0.133211     2226.96
Non elastic        3                  0.133552     2033.94
Elastic training   1                  0.133763     1231.58
Elastic training   2                  0.133771     1197.55
Elastic training   3                  0.133704     1259.37
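One quick way to read the benchmark is each run's time relative to the baseline. The numbers below are copied from the benchmark table; `relative_to_baseline` is an illustrative helper name.

```python
from typing import Dict, List, Tuple

# Values copied from the benchmark table (30M rows, 10 workers, 100 rounds).
BASELINE_S = 1441.44
RUNS: List[Tuple[str, int, float]] = [
    ("Fewer workers", 1, 1227.45),
    ("Fewer workers", 2, 1249.45),
    ("Fewer workers", 3, 1291.54),
    ("Non elastic", 1, 2205.95),
    ("Non elastic", 2, 2226.96),
    ("Non elastic", 3, 2033.94),
    ("Elastic training", 1, 1231.58),
    ("Elastic training", 2, 1197.55),
    ("Elastic training", 3, 1259.37),
]


def relative_to_baseline(runs: List[Tuple[str, int, float]],
                         baseline: float = BASELINE_S) -> Dict[Tuple[str, int], float]:
    """Percent change in wall-clock time versus the baseline, rounded to 0.1%."""
    return {(cond, k): round(100.0 * (t - baseline) / baseline, 1)
            for cond, k, t in runs}
```

Negative values mean a run finished faster than the baseline. Every elastic run comes out negative, while every non-elastic run comes out positive.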

Hyperparameter tuning
(Diagram: Trials 1 to n, each with its own hyperparameters (e.g. Trial 1: eta 0.1, gamma 0.2; Trial ...: eta 0.3, gamma 0.1; Trial n: eta 0.2, gamma 0.0), and each trial runs on its own Worker 1 to Worker m; trials report checkpoints and results, which drive early stopping and searchers (e.g. BO, TPE))
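Early-stopping schedulers in Ray Tune, such as ASHA, build on successive halving: give every trial a small budget, keep the best fraction, and repeat with a larger budget. A synchronous sketch follows; `successive_halving` and the synthetic `fake_error` are hypothetical, and the trial configs are the three from the diagram plus one invented extra trial.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]


def successive_halving(configs: List[Config],
                       score_fn: Callable[[Config, int], float],
                       min_rounds: int = 1, reduction: int = 2) -> Config:
    """Keep the best 1/reduction of trials per rung; multiply the budget by reduction.

    score_fn(config, rounds) returns the validation error after `rounds`
    boosting rounds (lower is better). Dropped trials are stopped early.
    """
    survivors = list(configs)
    rounds = min_rounds
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: score_fn(c, rounds))
        survivors = ranked[: max(1, len(ranked) // reduction)]
        rounds *= reduction
    return survivors[0]


def fake_error(config: Config, rounds: int) -> float:
    # Hypothetical error curve: improves with more rounds, best at eta=0.2, gamma=0.0.
    return abs(config["eta"] - 0.2) + config["gamma"] + 1.0 / rounds


TRIALS: List[Config] = [
    {"eta": 0.1, "gamma": 0.2},   # Trial 1 (from the diagram)
    {"eta": 0.3, "gamma": 0.1},   # Trial ... (from the diagram)
    {"eta": 0.2, "gamma": 0.0},   # Trial n (from the diagram)
    {"eta": 0.05, "gamma": 0.3},  # hypothetical extra trial
]
```

Here two of the four trials are stopped after a single round, so most of the budget goes to the promising configs. ASHA relaxes the synchronous rungs so that trials can be promoted without waiting for the whole rung to finish.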

API example

XGBoost:

    from sklearn.datasets import load_breast_cancer
    from xgboost import DMatrix, train

    train_x, train_y = load_breast_cancer(return_X_y=True)
    train_set = DMatrix(train_x, train_y)

    bst = train(
        {"objective": "binary:logistic"},
        train_set
    )
    bst.save_model("trained.xgb")

XGBoost-Ray (only the imports, the matrix class, and ray_params change):

    from sklearn.datasets import load_breast_cancer
    from xgboost_ray import RayDMatrix, RayParams, train

    train_x, train_y = load_breast_cancer(return_X_y=True)
    train_set = RayDMatrix(train_x, train_y)

    bst = train(
        {"objective": "binary:logistic"},
        train_set,
        ray_params=RayParams(num_actors=2)  # train on 2 Ray actors
    )
    bst.save_model("trained.xgb")

Questions & Answers
