AlphaDow: Leveraging Ray’s ecosystem to train and deploy an RL industrial production scheduling agent

ALPHADOW: LEVERAGING RAY’S ECOSYSTEM TO TRAIN AND DEPLOY AN RL INDUSTRIAL PRODUCTION SCHEDULING AGENT
ADAM KELLOWAY, AI TECH. LEAD @ DOW DIGITAL FULFILLMENT CENTER
Ray RL Conference - March 29th 2022
®™Trademark of The Dow Chemical Company ('Dow') or an affiliated company of Dow

General Business

THE PRODUCTION PLANNING & SCHEDULING PROBLEM
[Diagram: production reactors, warehouses, and customers, shown for Site A and Site B]
What is the optimal sequence of products to make at each reactor in order to meet all customer demands on time and minimize total costs (inventory and transition costs)?

ALPHADOW TEAM IS BUILDING AN AI SCHEDULER
AlphaDow Agent is trained to produce schedules that:
✔ Max. customer demand on time
✔ Min. inventory
✔ Min. transition costs
✔ With manufacturing constraints
✔ With planning constraints
AlphaDow observes, from the Production Plant:
• Manufacturing data
• Customer demand
• Current inventory levels
• Planned downtimes
AlphaDow outputs production schedules, e.g. Make A for 3 days, Make C for 2 days, Make B for 4 days … etc. for 90 days. Repeat weekly & monitor daily.
Production Schedulers adjust and confirm AlphaDow output.
Outcomes: Happy Customers, Happy Schedulers, Happy Businesses, $$$ For Dow
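The schedule format on this slide ("Make A for 3 days, Make C for 2 days, …") can be illustrated with a small sketch. This is not AlphaDow's actual data model; the `(product, days)` tuples and the helper names `expand_schedule` and `count_transitions` are hypothetical, chosen only to show how a campaign list maps to a day-by-day plan over the 90-day horizon and how product changeovers could be counted.

```python
def expand_schedule(campaigns, horizon_days=90):
    """Expand (product, days) campaigns into a day-by-day plan.

    The plan is truncated to the planning horizon (90 days on the slide).
    """
    plan = []
    for product, days in campaigns:
        if days <= 0:
            raise ValueError("each campaign must last at least one day")
        plan.extend([product] * days)
    return plan[:horizon_days]


def count_transitions(plan):
    """Count the days on which the product differs from the previous day."""
    return sum(1 for prev, cur in zip(plan, plan[1:]) if prev != cur)


# The three campaigns named on the slide (the real schedule continues to day 90).
campaigns = [("A", 3), ("C", 2), ("B", 4)]
plan = expand_schedule(campaigns)
transitions = count_transitions(plan)
```

A per-day list like this is only one possible representation; it makes transition counting trivial but ignores batch sizes and multiple reactors.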

SOME OBSERVATIONS ON WHY IN-HOUSE RL IS CHALLENGING
For entirely in-house RL everything is a design variable!
▪ Actions:
✔ Continuous vs discrete, single vs multiple, zero-time “do nothing” action
▪ Rewards:
✔ Sparse vs dense, scaling, frequency, content, hyperparameterized
✔ RL vs Mixed Integer objective function? Overall performance vs instantaneous.
▪ State/Observations:
✔ What is included, what is important, scaling of elements?
▪ Agent Neural Network Design:
✔ Linear agents?, LSTMs, Transformers/Attention
✔ Do you treat different parts of the state differently or just stack everything?
✔ Do you use state stacking? Give the agent a historical view?
A priori, nothing is independent, and everything is connected.
Be prepared to spend time and money on compute resources.

BUT THE ANYSCALE ECOSYSTEM [RLLIB, TUNE] IS HERE TO HELP…
▪ Action masking to impose known constraints
[Diagram: State, Multi-Layer Fully Connected network, Action Logits; Action Logits + Mask = Masked Action Logits]
▪ Population based training/bandits: https://www.anyscale.com/blog/population-based-bandits
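The "Action Logits + Mask = Masked Action Logits" idea on this slide can be sketched in plain Python. This is a minimal illustration of the general technique, not RLlib's implementation or the AlphaDow model; the function name `masked_softmax` and the example values are assumptions. The mask adds 0 to allowed actions and negative infinity to forbidden ones, so forbidden actions receive zero probability after the softmax.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with disallowed actions forced to zero.

    logits: raw scores from the policy network, one per action.
    mask:   1 for allowed actions, 0 for actions that violate a known constraint.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Action Logits + Mask: add 0 for allowed actions, -inf for forbidden ones.
    masked = [l + (0.0 if m else -math.inf) for l, m in zip(logits, mask)]
    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    top = max(masked)
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: four products, product 1 cannot be made right now.
logits = [2.0, 0.5, 1.0, -1.0]
mask = [1, 0, 1, 1]
probs = masked_softmax(logits, mask)
```

Masking at the logit level means the agent never has to learn the hard constraints through penalties; it can only sample actions that are feasible in the current state.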

BUT TRAINING CAN BE ACHIEVED… AND SOMETIMES SIMPLER IS BETTER
▪ Actions: Make 1 batch of product
▪ State:
✔ Inventory
✔ Demand forecast
✔ Stacked last 10, in a single vector
✔ Action masking vector
▪ Agent NN Model: Linear fully connected, 256 * 6 layers
▪ Rewards:
+ customer demand on time
- inventory
- transition costs
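The state and reward design on this slide can be sketched as follows. This is an illustration under stated assumptions, not the AlphaDow code: it assumes "stacked last 10" means the last 10 observations flattened into one vector, and the class name `StackedObservation`, the function `reward`, and all weights are hypothetical.

```python
from collections import deque


class StackedObservation:
    """Keep the last `depth` observations and expose them as one flat vector."""

    def __init__(self, obs_size, depth=10):
        self.obs_size = obs_size
        # Pre-fill with zeros so the vector has a fixed length from the first step.
        self.frames = deque(
            ([0.0] * obs_size for _ in range(depth)), maxlen=depth
        )

    def push(self, obs):
        if len(obs) != self.obs_size:
            raise ValueError("observation has the wrong size")
        # The oldest frame is dropped automatically because of maxlen.
        self.frames.append(list(obs))

    def vector(self):
        # Oldest frame first, newest frame last.
        return [x for frame in self.frames for x in frame]


def reward(on_time_demand, inventory, transition_cost,
           w_demand=1.0, w_inventory=0.25, w_transition=0.5):
    """+ customer demand on time, - inventory, - transition costs.

    The weights are placeholder values, not tuned numbers from the talk.
    """
    return (w_demand * on_time_demand
            - w_inventory * inventory
            - w_transition * transition_cost)


stack = StackedObservation(obs_size=3, depth=10)
stack.push([5.0, 2.0, 1.0])  # e.g. inventory, forecast, and one more feature
state = stack.vector()
r = reward(on_time_demand=10.0, inventory=20.0, transition_cost=4.0)
```

The relative weights decide how the agent trades on-time delivery against holding stock and changing products, which is why the earlier slide lists reward scaling as a design variable.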

TIME TO DEPLOY YOUR TRAINED AGENT …
Deployments present their own set of challenges
▪ RL simulations don’t use the same data as traditional ML models
▪ The simulation is built on concepts, but translating those concepts to data sources is challenging
Don’t forget to consider where your inference data will come from.
▪ Data alignment across multiple data sources → data leakage
▪ This is old-fashioned model work
Find the right people with the right knowledge
Hindered by our lack of maturity in DataOps and MLOps
✔ You could find yourself pivoting to influence the direction of MLOps and DataOps

BECAUSE YOU NEED AT LEAST ONE ARCHITECTURE DIAGRAM…
[Diagram labels:]
▪ My Laptop
▪ Training occurs in AML heterogeneous compute clusters running Ray. 4 GPUs & 100s of CPUs
▪ Trained models are registered in AML’s model registry
▪ Models move from the AML model registry to a Ray Serve deployment on AKS
▪ Ray Serve allows us to scale out inferencing as needed
▪ AlphaDow end user app

FUTURE WORK DIRECTION
▪ Multiple decision-making agents:
✔ Computational and human
✔ RL, MIP, Heuristics, etc.
✔ Interacting with each other over common or conflicting goals
▪ Addressing challenges:
✔ Faster decision making
✔ Globally considered decision making
▪ Multi-Agent Ray, RLlib, Tune
▪ Multi-Agent Ray Serve for deployment
▪ Generalization of “AlphaDow” to other planning tasks
▪ Interconnectivity and information sharing
▪ Composed models
[Diagram: several AlphaDow Agents, MIP Models, and a Heuristic Model interacting on top of the Ray Ecosystem (Ray Serve, RLlib, Tune)]

MAJOR LESSON: HAVE A STRONG INTERNAL BRAND!
▪ AlphaDow carries a strong internal brand at Dow
▪ A strong brand helps with the creation of a “lighthouse project”
▪ A lighthouse project helps to break through to hesitant leaders
▪ A lighthouse project allows the project to continue when others may have been shelved
▪ A lighthouse project will uncover systemic issues affecting all AI/ML/RL/DL model developments and deployments at your company.

THANK YOU
