AlphaDow: Leveraging Ray’s ecosystem to train and deploy an RL industrial production scheduling agent

Dow is building a highly automated and intelligent multi-agent digital supply chain in which many agents (RL, ML, MIP, and human) interact seamlessly to make better and faster decisions that positively impact customers, financial performance, and shareholders. Several of the digital agents are deployed using Ray Serve, which significantly simplifies their deployment, scaling, and interaction with each other. One of these agents comes from Dow’s AlphaDow project, which creates reinforcement learning-based agents for production scheduling, a non-trivial daily problem for all of Dow’s many facilities. AlphaDow agents are trained on in-house simulation models using RLlib and Ray Tune running on Azure compute clusters, where Ray’s implementation of Population-Based Bandits is used to great effect for hyperparameter tuning. Once trained, these agents are deployed on Dow’s AKS cluster running Ray and Ray Serve.

AlphaDow’s success (thanks in part to Ray) has been the catalyst for accelerating progress towards Dow’s AI strategy and vision. In this talk, Adam will highlight several of the challenges of deploying such advanced models into a legacy industrial setting, as well as how Ray has helped overcome some of these challenges and accelerated deployments in general at Dow.

Anyscale

April 05, 2022

Transcript

  1. ALPHADOW: LEVERAGING RAY’S ECOSYSTEM TO TRAIN AND DEPLOY AN RL INDUSTRIAL PRODUCTION SCHEDULING AGENT

    ADAM KELLOWAY, AI TECH. LEAD @ DOW DIGITAL FULFILLMENT CENTER
    Ray RL Conference - March 29th 2022
    ®™ Trademark of The Dow Chemical Company ('Dow') or an affiliated company of Dow
  2. THE PRODUCTION PLANNING & SCHEDULING PROBLEM

    General Business
    Diagram labels: Production Reactors, Warehouses, Customers; Site A, Site B
    What is the optimal sequence of products to make at each reactor in order to meet all customer demands on time and minimize total costs (inventory and transition costs)?
    3/29/2022
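To make the question on this slide concrete, here is a minimal Python sketch that scores one candidate product sequence for a single reactor. It is an illustration only, not Dow's simulation model: the function name `schedule_cost`, the one-batch-per-day production rate, the per-unit holding and lateness penalties, and the transition cost table are all assumptions introduced for this example.

```python
from typing import Dict, List, Optional, Tuple


def schedule_cost(
    sequence: List[str],
    demand: Dict[int, Dict[str, int]],
    transition_cost: Dict[Tuple[str, str], float],
    holding_cost: float = 1.0,   # assumed cost per unit held per day
    late_penalty: float = 10.0,  # assumed cost per unit late per day
) -> float:
    """Total cost of making one batch (one unit) per day in the given order."""
    inventory: Dict[str, int] = {}
    backlog: Dict[str, int] = {}
    total = 0.0
    previous: Optional[str] = None
    for day, product in enumerate(sequence):
        # Pay a changeover cost whenever the reactor switches products.
        if previous is not None and previous != product:
            total += transition_cost.get((previous, product), 0.0)
        previous = product
        inventory[product] = inventory.get(product, 0) + 1
        # Orders due today join any unmet orders from earlier days.
        for p, qty in demand.get(day, {}).items():
            backlog[p] = backlog.get(p, 0) + qty
        # Ship whatever can be served from stock.
        for p in list(backlog):
            shipped = min(backlog[p], inventory.get(p, 0))
            backlog[p] -= shipped
            inventory[p] = inventory.get(p, 0) - shipped
        # End-of-day charges: stock still held and orders still late.
        total += holding_cost * sum(inventory.values())
        total += late_penalty * sum(backlog.values())
    return total


# Hypothetical data: two products, two orders, symmetric changeover cost.
transitions = {("A", "B"): 5.0, ("B", "A"): 5.0}
orders = {1: {"A": 2}, 3: {"B": 2}}
grouped = schedule_cost(["A", "A", "B", "B"], orders, transitions)
alternating = schedule_cost(["A", "B", "A", "B"], orders, transitions)
print(grouped, alternating)
```

With these made-up numbers, grouping batches of the same product is much cheaper than alternating, because alternating pays three changeovers and ships one order late. An RL agent or a MIP model would search over such sequences instead of comparing two by hand.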
  3. ALPHADOW TEAM IS BUILDING AN AI SCHEDULER

    AlphaDow Agent is trained to produce schedules that
    ✔ Max. customer demand on time
    ✔ Min. inventory
    ✔ Min. transition costs
    ✔ With manufacturing constraints
    ✔ With planning constraints
    AlphaDow
    Observe
    Happy Customers, Happy Schedulers, Happy Businesses, $$$ For Dow
    Production Plant
    • Manufacturing data
    • Customer demand
    • Current inventory levels
    • Planned downtimes
    Production Schedulers
    Production schedules
    Schedulers adjust and confirm AlphaDow output
    Make A for 3 days, Make C for 2 days, Make B for 4 days … etc. for 90 days
    repeat weekly & monitor daily
  4. SOME OBSERVATIONS ON WHY IN-HOUSE RL IS CHALLENGING

    For entirely in-house RL everything is a design variable!
    ▪ Actions:
      ✔ continuous vs discrete, single vs multiple, zero time “do nothing” action
    ▪ Rewards:
      ✔ sparse vs dense, scaling, frequency, content, hyperparametrized
      ✔ RL vs Mixed Integer objective function? Overall performance vs instantaneous.
    ▪ State/Observations:
      ✔ What is included, what is important, scaling of elements?
    ▪ Agent Neural Network Design:
      ✔ Linear agents? LSTMs, Transformers/Attention
      ✔ Do you treat different parts of the state differently or just stack everything?
      ✔ Do you do state stacking? Give the agent a historical view?
    A priori, nothing is independent, and everything is connected. Be prepared to spend time and money on compute resources.
  5. BUT THE ANYSCALE ECOSYSTEM [RLLIB, TUNE] ARE HERE TO HELP…

    action masking to impose known constraints
    Diagram labels: State, Multi-Layer Fully Connected, Action Logits + Mask = Masked Action Logits
    population based training/bandits
    https://www.anyscale.com/blog/population-based-bandits
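The "logits plus mask" idea named on this slide can be sketched in plain Python. This is a hedged illustration of the general technique, not the AlphaDow model or RLlib's own implementation: the helper names `mask_logits` and `softmax`, the example logits, and the use of negative infinity as the additive mask value are assumptions made for this example.

```python
import math
from typing import List


def mask_logits(logits: List[float], allowed: List[bool]) -> List[float]:
    """Add 0 to allowed actions and -inf to disallowed ones."""
    return [x + (0.0 if ok else -math.inf) for x, ok in zip(logits, allowed)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; exp(-inf) evaluates to 0.0.

    Assumes at least one logit is finite, i.e. at least one action is allowed.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical network output for four products; the last two are
# currently infeasible (for example, a planned downtime or missing feedstock).
logits = [2.0, 1.0, 0.5, 3.0]
allowed = [True, True, False, False]
probs = softmax(mask_logits(logits, allowed))
print(probs)
```

Disallowed actions end up with probability exactly zero, so the agent cannot sample them even though the raw network preferred the last action. The constraint is imposed by construction instead of being learned through penalties.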
  6. BUT TRAINING CAN BE ACHIEVED… AND SOMETIMES SIMPLER IS BETTER

    ▪ Actions: Make 1 batch of product
    ▪ State:
      inventory
      demand forecast
      Stacked last 10, in single vector
      Action masking vector
    ▪ Agent NN Model: Linear fully connected 256 * 6 layers
    ▪ Rewards:
      + customer demand on time
      - inventory
      - transition costs
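A rough sketch of how the state and reward listed on this slide might be assembled is given below. The slide gives only the ingredients, so the details here are assumptions: the class name `StackedObservation`, the function name `reward`, zero-padding before ten steps have been observed, and unit weights on the three reward terms. The action masking vector from the slide is left out for brevity.

```python
from collections import deque
from typing import Deque, List


class StackedObservation:
    """Keeps the last `depth` observations and flattens them into one vector."""

    def __init__(self, obs_size: int, depth: int = 10) -> None:
        # Start with zero padding so the vector length is fixed from step one.
        self.frames: Deque[List[float]] = deque(
            [[0.0] * obs_size for _ in range(depth)], maxlen=depth
        )

    def push(self, inventory: List[float], forecast: List[float]) -> List[float]:
        """Append the newest observation and return the stacked vector."""
        self.frames.append(list(inventory) + list(forecast))
        # Oldest frame first, newest frame last.
        return [value for frame in self.frames for value in frame]


def reward(
    on_time: float,
    inventory: float,
    transition_cost: float,
    w_demand: float = 1.0,      # assumed weight
    w_inventory: float = 1.0,   # assumed weight
    w_transition: float = 1.0,  # assumed weight
) -> float:
    """+ customer demand on time, - inventory, - transition costs."""
    return w_demand * on_time - w_inventory * inventory - w_transition * transition_cost


# Hypothetical two-product plant: 2 inventory values + 2 forecast values.
stack = StackedObservation(obs_size=4)
vector = stack.push(inventory=[5.0, 0.0], forecast=[2.0, 1.0])
step_reward = reward(on_time=3.0, inventory=5.0, transition_cost=1.0)
print(len(vector), vector[-4:], step_reward)
```

In practice the three weights are exactly the kind of reward-scaling hyperparameters the earlier slide warns about, and they would be natural candidates for tuning.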
  7. TIME TO DEPLOY YOUR TRAINED AGENT …

    Deployments present their own set of challenges
    ▪ RL simulations don’t use the same data as traditional ML models
    ▪ Simulation is built on concepts but translating those concepts to data sources is challenging
      Don’t forget to consider where you will get your inference data from.
    ▪ Data alignment across multiple data sources 🡪 data leakage
    ▪ This is old-fashioned model work
      Find the right people with the right knowledge
      Hindered by our lack of maturity in DataOps and MLOps
      ✔ You could find yourself pivoting to influence the direction of MLOps and DataOps
  8. BECAUSE YOU NEED AT LEAST ONE ARCHITECTURE DIAGRAM…

    My Laptop
    Training occurs in AML heterogeneous compute clusters running Ray. 4 GPUs & 100s of CPUs
    Trained models are registered in AML’s model registry
    Models move from AML Model registry to Ray Serve deployment on AKS
    Ray Serve allows us to scale out inferencing as needed
    AlphaDow end user app
  9. FUTURE WORK DIRECTION

    ▪ Multiple decision-making agents:
      ✔ Computational and human
      ✔ RL, MIP, Heuristics, etc.
      ✔ Interacting with each other over common or conflicting goals
    ▪ Addressing challenges:
      ✔ Faster decision making
      ✔ Globally considered decision making
    ▪ Multi-Agent Ray, RLlib, Tune
    ▪ Multi-Agent Ray Serve for deployment
    ▪ Generalization of “AlphaDow” to other planning tasks
    ▪ Interconnectivity and information sharing
    ▪ Composed models
    Diagram labels: AlphaDow Agent, MIP Model, Heuristic Model; Ray Ecosystem (Ray Serve, RLlib, Tune)
  10. MAJOR LESSON

    🡨 HAVE A STRONG INTERNAL BRAND! 🡨
    ▪ AlphaDow carries a strong internal brand at Dow
    ▪ A strong brand helps with the creation of a “lighthouse project”
    ▪ A lighthouse project helps to break through hesitant leaders
    ▪ A lighthouse project allows the project to continue when others may have been shelved
    ▪ A lighthouse project will uncover systemic issues affecting all AI/ML/RL/DL model developments and deployments at your company.
  11. THANK YOU