AlphaDow: Leveraging Ray’s ecosystem to train and deploy an RL industrial production scheduling agent

Dow is building a highly automated and intelligent multi-agent digital supply chain where many agents (RL, ML, MIP, and human) interact seamlessly to make better and faster decisions that positively impact customers, financial performance, and shareholders. Several of the digital agents are deployed using Ray Serve, which significantly simplifies their deployment, scaling, and interaction with each other. One of these agents is Dow’s project AlphaDow, which creates reinforcement learning-based agents for production scheduling, a non-trivial daily problem for all of Dow’s many facilities. AlphaDow agents are trained on in-house simulation models using RLlib and Ray Tune running on Azure compute clusters, where Ray’s implementation of Population-Based Bandits is used to great effect for hyperparameter tuning. Once trained, these agents are deployed on Dow’s AKS cluster running Ray and Ray Serve.

AlphaDow’s success (thanks in part to Ray) has been the catalyst for accelerating progress towards Dow’s AI strategy and vision. In this talk, Adam will highlight several of the challenges of deploying such advanced models into a legacy industrial setting, as well as how Ray has helped overcome some of these challenges and accelerated deployments in general at Dow.

Anyscale

April 05, 2022

Transcript

  1. ALPHADOW: LEVERAGING RAY’S ECOSYSTEM TO TRAIN
    AND DEPLOY AN RL INDUSTRIAL PRODUCTION SCHEDULING
    AGENT
    ADAM KELLOWAY
    AI TECH. LEAD @ DOW DIGITAL FULFILLMENT CENTER
    Ray RL Conference - March 29th 2022
    ®™Trademark of The Dow Chemical Company ('Dow') or an affiliated company of Dow

  2. General Business
    THE PRODUCTION PLANNING & SCHEDULING PROBLEM
    What is the optimal sequence of products to make at each reactor in order to meet
    all customer demands on time and minimize total costs (inventory and transition
    costs)?
    [Diagram labels: Production Reactors, Warehouses, Customers; Site A, Site B]
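
To make the sequencing question on this slide concrete, here is a toy sketch for a single reactor. The product names, the transition-cost table, and the brute-force search are illustrative assumptions, not Dow's model; the real problem also involves due dates, inventory, and several sites.

```python
from itertools import permutations

# Hypothetical cost of switching the reactor from one product to another.
TRANSITION_COST = {
    ("A", "B"): 1, ("A", "C"): 4,
    ("B", "A"): 2, ("B", "C"): 1,
    ("C", "A"): 3, ("C", "B"): 5,
}


def sequence_cost(sequence):
    """Sum the transition costs between consecutive products."""
    return sum(TRANSITION_COST[(a, b)] for a, b in zip(sequence, sequence[1:]))


def best_sequence(products):
    """Try every ordering and return the cheapest one (tiny cases only)."""
    return min(permutations(products), key=sequence_cost)


best = best_sequence(["A", "B", "C"])
print(best, sequence_cost(best))
```

The number of orderings grows factorially with the number of products, so exhaustive search like this is only workable for very small instances.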

  3. ALPHADOW TEAM IS BUILDING AN AI SCHEDULER
    AlphaDow Agent is trained to produce schedules that
    ✔ Max. customer demand on time
    ✔ Min. inventory
    ✔ Min. transition costs
    ✔ With manufacturing constraints
    ✔ With planning constraints
    Observe (Production Plant):
    • Manufacturing data
    • Customer demand
    • Current inventory levels
    • Planned downtimes
    Production schedules:
    Make A for 3 days
    Make C for 2 days
    Make B for 4 days
    … etc. for 90 days
    Production Schedulers adjust and confirm AlphaDow output
    repeat weekly & monitor daily
    Happy Customers, Happy Schedulers, Happy Businesses, $$$ For Dow
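
The schedule format on this slide ("Make A for 3 days") can be read as a run-length summary of a sequence of decisions. The sketch below assumes, purely for illustration, that the agent emits one product choice per day; the function names and that one-per-day granularity are hypothetical and not taken from the talk.

```python
from itertools import groupby


def to_campaigns(daily_products):
    """Collapse consecutive identical daily choices into (product, days) pairs."""
    return [(product, len(list(run))) for product, run in groupby(daily_products)]


def describe(campaigns):
    """Render campaigns in the style shown on the slide."""
    return [
        f"Make {product} for {days} day{'s' if days != 1 else ''}"
        for product, days in campaigns
    ]


# Hypothetical daily decisions for the first nine days of a horizon.
daily = ["A"] * 3 + ["C"] * 2 + ["B"] * 4
campaigns = to_campaigns(daily)
print(describe(campaigns))
```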

  4. SOME OBSERVATIONS ON WHY IN-HOUSE RL IS CHALLENGING
    For entirely in-house RL everything is a design variable!
    ▪ Actions:
    ✔ continuous vs discrete, single vs multiple, zero time “do nothing” action
    ▪ Rewards:
    ✔ sparse vs dense, scaling, frequency, content, hyperparametrized
    ✔ RL vs Mixed Integer objective function? Overall performance vs instantaneous.
    ▪ State/Observations:
    ✔ What is included, what is important, scaling of elements?
    ▪ Agent Neural Network Design:
    ✔ Linear agents?, LSTMs, Transformers/Attention
    ✔ Do you treat different parts of the state differently or just stack everything?
    ✔ Do you use state stacking? Give the agent a historical view?
    A priori, nothing is independent, and everything is connected.
    Be prepared to spend time and money on compute resources.

  5. BUT THE ANYSCALE ECOSYSTEM [RLLIB, TUNE] IS HERE TO HELP…
    Action masking to impose known constraints:
    [Figure: State → Multi-Layer Fully Connected → Action Logits; Action Logits + Mask = Masked Action Logits]
    Population based training/bandits:
    https://www.anyscale.com/blog/population-based-bandits
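
The action-masking idea on this slide (logits plus a mask give masked logits) can be sketched in plain Python. This is a minimal illustration, not the talk's implementation: the logit values, the validity flags, and the helper names are assumptions, and practical implementations often use a large finite negative number instead of negative infinity.

```python
import math


def mask_logits(logits, valid):
    """Add 0 to the logits of valid actions and -inf to invalid ones."""
    if not any(valid):
        raise ValueError("at least one action must be valid")
    return [x + (0.0 if ok else -math.inf) for x, ok in zip(logits, valid)]


def softmax(logits):
    """Numerically stable softmax; -inf logits end up with probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]     # hypothetical network output for products A, B, C
valid = [False, True, True]  # hypothetical constraint: A cannot be made right now
masked = mask_logits(logits, valid)
probs = softmax(masked)
print(probs)
```

Even though A has the highest raw logit, its probability after masking is zero, so the agent can never sample an action that violates the known constraint.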

  6. BUT TRAINING CAN BE ACHIEVED… AND SOMETIMES SIMPLER IS BETTER
    ▪ Actions:
    Make 1 batch of product
    ▪ State:
    inventory
    demand
    forecast
    Stacked last 10, in a single vector
    Action masking vector
    ▪ Agent NN Model:
    Linear fully connected 256 * 6 layers
    ▪ Rewards:
    + customer demand on time
    - inventory
    - transition costs
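
The state and reward design listed on this slide can be sketched as follows. Everything beyond the slide's bullet points is an assumption made for illustration: the class and function names, the zero padding, the two-product sizes, the reward weights, and the choice to append the action mask once for the current step rather than stacking it.

```python
from collections import deque

STACK = 10  # the slide stacks the last 10 observations


class StackedObservation:
    """Keep the most recent observations and flatten them into one vector."""

    def __init__(self, obs_size, stack=STACK):
        # Pad with zeros so the vector has a fixed length from the first step.
        self.frames = deque([[0.0] * obs_size for _ in range(stack)], maxlen=stack)

    def push(self, inventory, demand, forecast, action_mask):
        """Add the newest observation; return stacked history plus current mask."""
        self.frames.append(list(inventory) + list(demand) + list(forecast))
        stacked = [x for frame in self.frames for x in frame]
        return stacked + list(action_mask)


def reward(on_time_demand, inventory, transition_cost,
           w_demand=1.0, w_inventory=0.1, w_transition=0.5):
    """+ on-time demand, - inventory, - transition costs (weights are made up)."""
    return (w_demand * on_time_demand
            - w_inventory * sum(inventory)
            - w_transition * transition_cost)


obs = StackedObservation(obs_size=6)  # 2 products x (inventory, demand, forecast)
vec = obs.push([5, 0], [2, 1], [3, 1], action_mask=[1, 0])
print(len(vec))
print(reward(on_time_demand=3, inventory=[5, 0], transition_cost=2))
```

The relative weights decide how the agent trades on-time delivery against inventory and changeovers, which is one reason the earlier slide lists reward scaling as a design variable.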

  7. TIME TO DEPLOY YOUR TRAINED AGENT …
    Deployments present their own set of challenges
    ▪ RL simulations don’t use the same data as traditional ML models
    ▪ Simulation is built on concepts, but translating those concepts to data sources is challenging
    Don’t forget to consider where you will get your inference data from.
    ▪ Data alignment across multiple data sources → data leakage
    ▪ This is old-fashioned model work
    Find the right people with the right knowledge
    Hindered by our lack of maturity in DataOps and MLOps
    ✔ You could find yourself pivoting to influence the direction of MLOps and DataOps

  8. BECAUSE YOU NEED AT LEAST ONE ARCHITECTURE DIAGRAM…
    Training occurs in AML heterogeneous compute clusters running Ray (4 GPUs & 100s CPUs).
    Trained models are registered in AML’s model registry.
    Models move from AML model registry to Ray Serve deployment on AKS.
    Ray Serve allows us to scale out inferencing as needed.
    [Diagram labels: My Laptop; AlphaDow end user app]

  9. FUTURE WORK DIRECTION
    ▪ Multiple decision-making agents:
    ✔ Computational and human
    ✔ RL, MIP, Heuristics, etc.
    ✔ Interacting with each other over common or conflicting goals
    ▪ Addressing challenges:
    ✔ Faster decision making
    ✔ Globally considered decision making
    ▪ Multi-Agent Ray, RLlib, Tune
    ▪ Multi-Agent Ray Serve for deployment
    ▪ Generalization of “AlphaDow” to other planning tasks
    ▪ Interconnectivity and information sharing
    ▪ Composed models
    [Diagram labels: AlphaDow Agent (several), MIP Model (two), Heuristic Model; Ray Ecosystem (Ray Serve, RLlib, Tune)]

  10. MAJOR LESSON: HAVE A STRONG INTERNAL BRAND!
    ▪ AlphaDow carries a strong internal brand at Dow
    ▪ A strong brand helps with the creation of a “lighthouse project”
    ▪ A lighthouse project helps to break through hesitant leaders
    ▪ A lighthouse project allows the project to continue when others may have been shelved
    ▪ A lighthouse project will uncover systemic issues affecting all AI/ML/RL/DL model developments and deployments at your company.

  11. THANK YOU
