Dow is building a highly automated and intelligent multi-agent digital supply chain where many agents (RL, ML, MIP, and human) interact seamlessly to make better and faster decisions that positively impact customers, financial performance, and shareholders. Several of the digital agents are deployed using Ray Serve, which significantly simplifies their deployment, scaling, and interaction with each other. One of these agents is Dow’s project AlphaDow, which creates reinforcement learning-based agents for production scheduling, a non-trivial daily problem for all of Dow’s many facilities. AlphaDow agents are trained on in-house simulation models using RLlib and Ray Tune running on Azure compute clusters, where Ray’s implementation of Population-Based Bandits is used to great effect for hyperparameter tuning. Once trained, these agents are deployed on Dow’s AKS cluster running Ray and Ray Serve.
AlphaDow’s success (thanks in part to Ray) has been the catalyst for accelerating progress towards Dow’s AI strategy and vision. In this talk, Adam will highlight several of the challenges of deploying such advanced models into a legacy industrial setting, as well as how Ray has helped overcome some of these challenges and accelerated deployments in general at Dow.
ALPHADOW: LEVERAGING RAY’S ECOSYSTEM TO TRAIN AND DEPLOY AN RL INDUSTRIAL PRODUCTION SCHEDULING
AI TECH. LEAD @ DOW DIGITAL FULFILLMENT CENTER
Ray RL Conference - March 29th 2022
®™Trademark of The Dow Chemical Company ('Dow') or an affiliated company of Dow
THE PRODUCTION PLANNING & SCHEDULING PROBLEM
Production Reactors Warehouses
What is the optimal sequence of products to make at each reactor in order to meet all customer demands on time and minimize total costs (inventory and transition costs)?
AlphaDow Agent is trained to produce schedules that:
✔ Max. customer demand on time
✔ Min. inventory
✔ Min. transition costs
✔ With manufacturing constraints
✔ With planning constraints
ALPHADOW TEAM IS BUILDING AN AI SCHEDULER
$$$ For Dow
• Manufacturing data
• Customer demand
• Current inventory levels
• Planned downtimes
Schedulers adjust and confirm AlphaDow output
Make A for 3 days
Make C for 2 days
Make B for 4 days
… etc. for 90 days
repeat weekly & monitor daily
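The schedule output above ("Make A for 3 days", "Make C for 2 days", ...) can be pictured as an ordered list of campaigns. The following is only a minimal sketch of one possible representation; the names `Campaign`, `expand`, and `count_transitions` are hypothetical and are not taken from AlphaDow.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Campaign:
    """One schedule entry: make `product` for `days` consecutive days (hypothetical type)."""
    product: str
    days: int


def expand(campaigns: List[Campaign]) -> List[str]:
    """Expand campaigns into a day-by-day list of products."""
    return [c.product for c in campaigns for _ in range(c.days)]


def count_transitions(campaigns: List[Campaign]) -> int:
    """Count product changeovers between consecutive campaigns."""
    return sum(1 for a, b in zip(campaigns, campaigns[1:]) if a.product != b.product)


# The first three entries from the slide; the real horizon runs to 90 days.
schedule = [Campaign("A", 3), Campaign("C", 2), Campaign("B", 4)]
daily = expand(schedule)
transitions = count_transitions(schedule)
```

A flat day-by-day view like `daily` is convenient for checking planned downtimes, while the campaign view makes changeovers easy to count.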
For entirely in-house RL everything is a design variable!
✔ Actions: continuous vs discrete, single vs multiple, zero time “do nothing” action
✔ Reward: sparse vs dense, scaling, frequency, content, hyperparametrized
✔ RL vs Mixed Integer objective function? Overall performance vs instantaneous.
✔ State: What is included, what is important, scaling of elements?
▪ Agent Neural Network Design:
✔ Linear agents?, LSTMs, Transformers/Attention
✔ Do you treat different parts of the state differently or just stack everything?
✔ Do you use state stacking to give the agent a historical view?
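State stacking, as raised in the last bullet, can be sketched as a fixed-length buffer of recent observations flattened into a single vector. This is an illustrative sketch only; the class name `StateStacker`, the zero padding at the start of an episode, and the choice of depth are assumptions, not AlphaDow's implementation.

```python
from collections import deque
from typing import Deque, List, Sequence


class StateStacker:
    """Keep the last `depth` observations and expose them as one flat vector."""

    def __init__(self, obs_size: int, depth: int) -> None:
        self.obs_size = obs_size
        # Assumption: pad with zeros until enough real observations arrive.
        self.frames: Deque[List[float]] = deque(
            [[0.0] * obs_size for _ in range(depth)], maxlen=depth
        )

    def push(self, obs: Sequence[float]) -> List[float]:
        """Add the newest observation and return the stacked vector, oldest first."""
        if len(obs) != self.obs_size:
            raise ValueError("observation has the wrong length")
        self.frames.append([float(x) for x in obs])  # the oldest frame drops out
        return [x for frame in self.frames for x in frame]


stacker = StateStacker(obs_size=2, depth=3)
first = stacker.push([1, 2])
second = stacker.push([3, 4])
```

The stacked vector grows linearly with the depth, which is one reason the choice interacts with the network design questions above.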
A priori, nothing is independent, and everything is connected.
Be prepared to spend time and money on compute resources.
SOME OBSERVATIONS ON WHY IN-HOUSE RL IS CHALLENGING
BUT THE ANYSCALE ECOSYSTEM [RLLIB, TUNE] IS HERE TO HELP…
Masked Action Logits
action masking to impose known constraints
population-based training/bandits
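The idea behind masked action logits can be sketched in plain Python: logits of disallowed actions are set to negative infinity before the softmax, so those actions receive zero probability. This is a generic illustration of the technique, not RLlib's or AlphaDow's code; the function names `mask_logits` and `softmax` are chosen here for the example.

```python
import math
from typing import List, Sequence


def mask_logits(logits: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Replace the logits of disallowed actions (mask == 0) with -inf."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    return [l if m else -math.inf for l, m in zip(logits, mask)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Action 1 violates a known constraint, so it is masked out.
probs = softmax(mask_logits([1.0, 2.0, 3.0], [1, 0, 1]))
```

Because the constraint is enforced in the policy's output distribution, the agent never samples an infeasible action and does not have to learn the constraint through penalties.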
BUT TRAINING CAN BE ACHIEVED… AND SOMETIMES SIMPLER IS BETTER
Make 1 batch of product
Stacked last 10 – in single vector
Action masking vector
▪ Agent NN Model:
Linear fully connected 256 * 6 layers
+ customer demand on time
- transition costs
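The two reward terms listed above (plus on-time customer demand, minus transition costs) suggest a simple additive reward. The sketch below is an assumption about the general shape only: the function name `step_reward`, the weights, and the units are hypothetical, and the actual AlphaDow reward may include terms or scaling not shown on the slide.

```python
def step_reward(
    on_time_demand: float,
    transition_cost: float,
    demand_weight: float = 1.0,      # hypothetical scaling factor
    transition_weight: float = 1.0,  # hypothetical scaling factor
) -> float:
    """Reward = weighted on-time demand minus weighted transition cost."""
    return demand_weight * on_time_demand - transition_weight * transition_cost


# Example: 10 units of demand met on time, with a changeover costing 4 units.
reward = step_reward(on_time_demand=10.0, transition_cost=4.0)
```

The relative weights decide how aggressively the agent trades changeovers against service level, so they are natural candidates for tuning.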
Deployments present their own set of challenges
▪ RL simulations don’t use the same data as traditional ML models
▪ Simulation is built on concepts but translating those concepts to data sources is
Don’t forget to consider where you will get your inference data from.
▪ Data alignment across multiple data sources 🡪 data leakage
▪ This is old-fashioned model work
Find the right people with the right knowledge
Hindered by our lack of maturity in DataOps and MLOps
✔ You could find yourself pivoting to influence the direction of MLOps and DataOps
TIME TO DEPLOY YOUR TRAINED AGENT …
BECAUSE YOU NEED AT LEAST ONE ARCHITECTURE DIAGRAM…
Models move from the AML model registry to a Ray Serve deployment on AKS
Training occurs in AML heterogeneous compute clusters running Ray.
4 GPUs & 100s CPUs
are registered in
Ray Serve allows us to scale out inferencing as needed
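Scaling out inference with Ray Serve can be sketched as wrapping a policy class in a deployment with several replicas. The sketch below assumes the Ray 2.x Serve API (`serve.deployment`, `.bind()`, `serve.run`); the talk does not state which Ray version was used, and `SchedulingPolicy`, its greedy `act` method, and the request payload fields are hypothetical stand-ins for a trained agent.

```python
from typing import Any, Dict, Sequence


class SchedulingPolicy:
    """Stand-in for a trained agent: picks the highest-scoring allowed action."""

    def act(self, scores: Sequence[float], mask: Sequence[int]) -> int:
        allowed = [i for i, m in enumerate(mask) if m]
        if not allowed:
            raise ValueError("at least one action must be allowed")
        return max(allowed, key=lambda i: scores[i])

    async def __call__(self, request: Any) -> Dict[str, int]:
        # Ray Serve passes a Starlette request for HTTP traffic.
        payload = await request.json()
        return {"action": self.act(payload["scores"], payload["mask"])}


def deploy(num_replicas: int = 2) -> Any:
    """Start the policy behind Ray Serve (requires `ray[serve]` to be installed)."""
    from ray import serve  # imported lazily so the class above is usable without Ray

    deployment = serve.deployment(num_replicas=num_replicas)(SchedulingPolicy)
    return serve.run(deployment.bind())


# Local check of the policy logic, without starting Ray.
action = SchedulingPolicy().act([0.1, 0.9, 0.5], [1, 0, 1])
```

Raising `num_replicas` is the knob that scales inference horizontally; the policy logic itself stays unchanged.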
▪ Multiple decision-making agents:
✔ Computational and human
✔ RL, MIP, Heuristics, etc.
✔ Interacting with each other over
common or conflicting goals
▪ Addressing challenges:
✔ Faster decision making
✔ Globally considered decision making
FUTURE WORK DIRECTION
▪ Multi-Agent Ray, RLlib, Tune
▪ Multi-Agent Ray Serve for deployment
▪ Generalization of “AlphaDow” to other planning
▪ Interconnectivity and information sharing
▪ Composed models
Ray Ecosystem (Ray Serve, RLlib, Tune)
▪ AlphaDow carries a strong internal brand at Dow
▪ A strong brand helps with the creation of a “lighthouse project”
▪ A lighthouse project helps to break through hesitant leaders
▪ A lighthouse project allows the project to continue when others may have been shelved
▪ A lighthouse project will uncover systemic issues affecting all AI/ML/RL/DL model developments and deployments at your company.
MAJOR LESSON 🡨 HAVE A STRONG INTERNAL BRAND! 🡨