
End-to-End AutoML with Ludwig on Ray (Travis Addair)

Ludwig is an open source AutoML framework that allows you to train and deploy state-of-the-art deep learning models with no code required. With a single parameter on the command line, the same Ludwig configuration used to train models on your local machine can be scaled to train on massive datasets across hundreds of machines in parallel using Ray. In this talk, we'll show you how Ludwig combines Dask on Ray for distributed out-of-memory data preprocessing, Horovod on Ray for distributed training, and Ray Tune for hyperparameter optimization into a single end-to-end solution you can run on your existing Ray cluster.

Anyscale

July 13, 2021

Transcript

  1. Travis Addair // [email protected]
     Making ML at scale simple & flexible, declaratively
     End-to-end AutoML with Ludwig on Ray
  2. What are the problems today?
     - ML projects take 6-18 months for most organizations
     - Data scientists are a precious & limited resource
     - ML today is limited and slow
     - Solutions tend to be a black box and a dead end
     - AutoML has become a bad word in DS orgs
     - Lacks flexibility & introspection for expert users
     - Dilemma: we need solutions that are simpler, but also more flexible
  3. Ludwig on Ray: Declarative Framework for DL at Scale
     - General Models: stochastic gradient descent; transformers for text, vision, and tabular data
     - Structured Data: data lakes to cloud data warehouses, ETL to ELT, unified schemas / storage
     - General Models + Structured Data = Declarative ML
  4. What is Ludwig?
     - Ludwig is a low-code declarative framework to build deep neural networks
     - Supports multi-task learning and mixed-modality data types
     - Developed and used in production at Uber; now a Linux Foundation project
     - 2400+ downloads/month, 7700+ stars on GitHub, 60+ contributors
  5. Ludwig Architecture
     Every input column (category, numerical, binary, text, image, audio, ...) flows through
     its own Preproc and Encode steps; the per-feature encodings are merged by a single
     Combine step; each output column (same set of types) then flows through its own
     Decode and Postproc steps.
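The per-column flow above can be sketched in plain Python. Everything here is an illustrative toy (the functions are made up, not Ludwig APIs); it only shows the shape of the pipeline: per-feature encoders feeding one combiner, whose output feeds per-feature decoders.

```python
# Toy sketch of Ludwig's encoder-combiner-decoder data flow.
# All functions below are illustrative stand-ins, not Ludwig's APIs.

def preprocess(column):
    # e.g., tokenize text or normalize numbers; here a pass-through
    return list(column)

def encode(column):
    # each input feature gets its own encoder producing a representation;
    # here we fake an "embedding" as a deterministic float per value
    return [sum(ord(c) for c in str(v)) % 100 / 100 for v in column]

def combine(encodings):
    # the combiner merges per-feature encodings (here: element-wise sum)
    return [sum(vals) for vals in zip(*encodings)]

def decode(combined):
    # each output feature gets its own decoder; here a simple threshold
    return ["yes" if v > 1.0 else "no" for v in combined]

inputs = {"age": [25, 40], "text": ["hi", "bye"]}
encoded = [encode(preprocess(col)) for col in inputs.values()]
predictions = decode(combine(encoded))
print(predictions)  # one prediction per input row
```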
  6. Ludwig Task Flexibility
     - Category + Numerical + Binary encoders → Numerical decoder: Regression
     - Text encoder → Category decoder: Text Classification
     - Image encoder → Text decoder: Image Captioning
     - Audio encoder → Binary decoder: Speech Verification
     - Audio + Time series encoders → Numerical decoder: Forecasting
     - Any encoder → Binary decoder: Binary Classification
  7. Why declarative?
     - Abstract away the complexity of scale, optimization, and productionization into a
       "query planner" (i.e., AutoML / hyperopt)
     - Retain full flexibility over model properties: specify what you want, and Ludwig
       will determine how to optimally deliver it.

     Models as config:

     ```yaml
     input_features:
       - name: utterance
         type: text
         encoder: rnn
         cell_type: lstm
         num_layers: 2
     output_features:
       - name: class
         type: category
     training:
       learning_rate: 0.001
       optimizer:
         type: adam
     ```

     Easy to install: `pip install ludwig`

     Programmatic API (one line to create, train, and use a model):

     ```python
     from ludwig.api import LudwigModel

     model = LudwigModel(config)
     model.train(train_data)
     predictions = model.predict(other_data)
     ```

     Model serving: `ludwig serve --model_path <model_path>`
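The YAML configuration on this slide maps directly onto the Python dict you pass to `LudwigModel`. As a minimal sketch (built as a plain dict so it runs without Ludwig installed):

```python
# The declarative config from the slide, expressed as the Python dict
# equivalent of the YAML. Built as a plain dict so this sketch runs
# without Ludwig installed.
config = {
    "input_features": [
        {
            "name": "utterance",
            "type": "text",
            "encoder": "rnn",
            "cell_type": "lstm",
            "num_layers": 2,
        }
    ],
    "output_features": [{"name": "class", "type": "category"}],
    "training": {"learning_rate": 0.001, "optimizer": {"type": "adam"}},
}

# With Ludwig installed, you would then run (not executed here):
#   from ludwig.api import LudwigModel
#   model = LudwigModel(config)
#   model.train(train_data)
print(config["input_features"][0]["encoder"])
```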
  8. Ludwig is SOTA on NLP & Tabular Tasks

     Model benchmarks:

     | Model         | Task                           | Accuracy |
     |---------------|--------------------------------|----------|
     | RoBERTa ('19) | Yelp Review Classification     | 97.3%    |
     | Tabnet ('19)  | Higgs-Boson Particle Detection | 78.5%    |
     | T5 ('20)      | Multi-dimensional Gender Bias  | 89.4%    |

     Tabular comparison:

     | Dataset           | XGBoost Accuracy | Tabnet Paper Accuracy | Ludwig Accuracy |
     |-------------------|------------------|-----------------------|-----------------|
     | Forest Tree Cover | 0.8934           | 0.9699                | 0.9508          |
     | Higgs Boson       | -                | 0.7884                | 0.7846          |
     | Poker Hands       | 0.711            | 0.992                 | 0.9914          |
  9. Ludwig before Ray
     - Only runs locally
     - Data must fit in memory
     - No distributed training
     - No parallel evaluation
  10. Ludwig on Ray
      - Dask: distributed data
      - Horovod: distributed training
      - Ray: remote execution
      - Dask: distributed evaluation
  11. Configuring Ray for Ludwig

      ```yaml
      cluster_name: ludwig-ray-gpu-nightly
      min_workers: 4
      max_workers: 4
      docker:
        image: "ludwigai/ludwig-ray-gpu:nightly"
        container_name: "ray_container"
      head_node:
        InstanceType: c5.2xlarge
        ImageId: latest_dlami
      worker_nodes:
        InstanceType: g4dn.xlarge
        ImageId: latest_dlami
      ```
  12. Running Ludwig

      ```shell
      $ ludwig train --config config.yaml --dataset s3://mybucket/dataset.parquet
      ```
  13. Running Ludwig on Ray

      ```shell
      $ ray up cluster.yaml
      $ ray submit cluster.yaml ludwig train --config config.yaml --dataset s3://mybucket/dataset.parquet
      ```
  14. Ludwig Hyperopt with Ray Tune

      ```yaml
      hyperopt:
        parameters:
          training.learning_rate:
            space: loguniform
            lower: 0.01
            upper: 0.1
          combiner.num_fc_layers:
            space: randint
            lower: 2
            upper: 6
        goal: minimize
        executor:
          type: ray
        sampler:
          type: ray
      ```
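To make the search spaces concrete: `loguniform` samples uniformly in log space (so small learning rates are explored as thoroughly as large ones) and `randint` samples integers in a range. A stand-alone sketch of the sampling, not Ray Tune's implementation (the upper bound is treated as exclusive for `randint`, as in Ray Tune):

```python
import math
import random

random.seed(0)  # deterministic for the sake of the sketch

def sample_loguniform(lower, upper):
    # uniform in log space: exp(U(log lower, log upper))
    return math.exp(random.uniform(math.log(lower), math.log(upper)))

def sample_randint(lower, upper):
    # integer in [lower, upper); upper treated as exclusive here
    return random.randrange(lower, upper)

# Draw a few candidate configs like the hyperopt section describes
trials = [
    {
        "training.learning_rate": sample_loguniform(0.01, 0.1),
        "combiner.num_fc_layers": sample_randint(2, 6),
    }
    for _ in range(5)
]
for trial in trials:
    print(trial)
```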
  15. Ludwig Hyperopt with Ray Tune (advanced)

      ```yaml
      hyperopt:
        parameters:
          training.learning_rate:
            space: loguniform
            lower: 0.01
            upper: 0.1
          combiner.num_fc_layers:
            space: randint
            lower: 2
            upper: 6
        goal: minimize
        executor:
          type: ray
        sampler:
          type: ray
          search_algo:
            type: bohb
          scheduler:
            type: hb_bohb
            time_attr: training_iteration
            reduction_factor: 4
          num_samples: 100
      ```
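The `hb_bohb` scheduler prunes unpromising trials between rungs rather than training all 100 samples to completion. A toy successive-halving sketch with `reduction_factor: 4`, using random scores as stand-ins for validation metrics (illustrative only, not Ray Tune's scheduler):

```python
import random

random.seed(1)

def successive_halving(num_trials, reduction_factor):
    # Start all trials; after each rung keep only the top
    # 1/reduction_factor, giving survivors a larger training budget.
    # Scores are random here; a real scheduler would use validation
    # metrics reported to Ray Tune at each training_iteration.
    trials = [{"id": i, "score": random.random()} for i in range(num_trials)]
    rung = 0
    while len(trials) > 1:
        trials.sort(key=lambda t: t["score"])  # goal: minimize
        keep = max(1, len(trials) // reduction_factor)
        trials = trials[:keep]
        rung += 1
        # surviving trials would now train reduction_factor x longer
    return trials[0], rung

best, rungs = successive_halving(num_trials=100, reduction_factor=4)
print(best["id"], rungs)  # 100 -> 25 -> 6 -> 1 trials over 3 rungs
```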
  16. Ludwig AutoML: no config needed (experimental)

      ```python
      import ludwig

      dataset = "s3://mybucket/dataset.train.parquet"
      model = ludwig.auto_train(target='label', dataset=dataset)
      model.predict("s3://mybucket/dataset.test.parquet")
      ```
  17. A sneak peek into Predibase: Predictive Database
      - Simplicity & flexibility of declarative ML
      - Rise of data warehousing & analytics
      - Scalability of Horovod & Ray
  18. A sneak peek into Predibase
      Our mission: use Ludwig and Horovod to make machine learning as simple and fast
      as writing a query.

      ```sql
      PREDICT churn
      GIVEN * FROM Customers WHERE start > '03-01-20'
      ```

      | ID       | date     | amount  | churn    |
      |----------|----------|---------|----------|
      | Claim #7 | 10-31-20 | $11,000 | Retained |
      | Claim #8 | 11-15-20 | $18,000 | Retained |
      | Claim #9 | 12-20-20 | $36,000 | Churned  |

      We are hiring! Reach out to [email protected] if you're interested in hearing more.
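A predictive query like the one on this slide can be thought of as a (target, columns, table, filter) tuple that a system would translate into a Ludwig config and training job. A toy parser for that idea; the query syntax here is illustrative and may not match Predibase's actual query language:

```python
import re

# Toy parser for the predictive-query idea on the slide.
# The syntax and grammar below are illustrative assumptions.
QUERY = "PREDICT churn GIVEN * FROM Customers WHERE start > '03-01-20'"

pattern = re.compile(
    r"PREDICT\s+(?P<target>\w+)\s+GIVEN\s+(?P<columns>[\w*]+)\s+"
    r"FROM\s+(?P<table>\w+)(?:\s+WHERE\s+(?P<filter>.+))?"
)
match = pattern.match(QUERY)
parsed = match.groupdict()
# parsed now separates what to predict (the output feature) from
# where the training data comes from (table + row filter)
print(parsed)
```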