Applying Ray and RLlib to Real-life Industrial Use Cases (Edward Junprung & Sahar Esmaeilzadeh, Pathmind)

In this session, we’ll explore industrial applications of reinforcement learning and compare the performance of an RL policy to traditional heuristics and optimizers. We have found that in certain use cases, RL can outperform all other approaches by more than 10%. We will cover the following topics:

1. A comparison of reinforcement learning versus heuristics and optimizers.
2. Bridging Ray with a simulation IDE such as AnyLogic to train a reinforcement learning policy.
3. A demo using a heating, ventilation, and air conditioning (HVAC) system.

At the conclusion of this session, you should be able to identify use cases suitable for RL and gain intuition on how reinforcement learning can be applied to industrial use cases.

Anyscale

July 21, 2021

Transcript

  1. Applying RLlib to Real-Life Industrial Use Cases

  2. Agenda
    - Pathmind and Ray
    - Industrial Engineering workflow and optimization tactics
    - Heuristics vs Optimizers vs RL
    - When and why you should try Reinforcement Learning
    - Simulation + RL Demo

  3. Pathmind and Ray
    - Bridge between real-life industrial processes and RL
    - We use RLlib out of the box (PPO, Population-Based Training)
    Digital Twin

  4. Industrial Engineering Workflow
    Simulation/Digital Twin (Observations)
    Decisions (Action)
    Outcomes (Reward)
    Heuristics: Static rules defined by a domain expert (if this, then that).
    Optimizers: Automatically finds the best parameters of a model, with respect to certain constraints.
    RL
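
The observation, action, reward loop on this slide can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `ToyRoom` is a hypothetical one-room thermal model standing in for a real simulation or digital twin, and the thresholds and rates are invented, not values from the talk. The policy plugged into the loop is a heuristic of the "if this, then that" kind.

```python
from dataclasses import dataclass


@dataclass
class ToyRoom:
    """Hypothetical one-room thermal model standing in for a digital twin."""
    temp_c: float = 30.0     # current indoor temperature (assumed start)
    outside_c: float = 35.0  # constant outdoor temperature (assumed)
    target_c: float = 22.0   # comfort target (assumed)

    def observe(self) -> float:
        # Observation: the only thing the policy sees is the room temperature.
        return self.temp_c

    def step(self, cooling_on: bool) -> float:
        # The room drifts 10% of the way toward the outdoor temperature.
        self.temp_c += 0.1 * (self.outside_c - self.temp_c)
        # Action: cooling removes a fixed 2 degrees per step when switched on.
        if cooling_on:
            self.temp_c -= 2.0
        # Reward: negative distance from the target (closer to 0 is better).
        return -abs(self.temp_c - self.target_c)


def heuristic(temp_c: float) -> bool:
    """Static expert rule: if the room is above 23 C, then cool."""
    return temp_c > 23.0


def run_episode(policy, steps: int = 50) -> float:
    """Run the observe -> decide -> outcome loop and sum the rewards."""
    room = ToyRoom()
    total_reward = 0.0
    for _ in range(steps):
        observation = room.observe()
        action = policy(observation)
        total_reward += room.step(action)
    return total_reward


if __name__ == "__main__":
    print("heuristic: ", round(run_episode(heuristic), 2))
    print("always off:", round(run_episode(lambda _: False), 2))
```

Any callable that maps an observation to an action can be passed to `run_episode`, so a tuned rule or a trained RL policy could be evaluated against the same toy model.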

  5. Heuristics vs Optimizers vs RL
    ● Heuristics
      ○ Pros
        ■ Easy to understand and implement.
        ■ Factory manager knows exactly what is going on.
      ○ Cons
        ■ Not scalable. Can grow to hundreds of rules.
        ■ Inflexible and doesn’t react to change.

  6. Heuristics vs Optimizers vs RL
    ● Optimizers
      ○ Pros
        ■ Relatively easy to set up
        ■ Not a black box
      ○ Cons
        ■ Clunky and slow in complex scenarios.
        ■ Static in nature. Have to re-run optimizer whenever things change.
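
To make the "static in nature" point concrete, here is a minimal sketch of an optimizer as a brute-force parameter search. It is an illustration under stated assumptions, not the tooling used in the talk: `simulate` is a hypothetical toy room model, the single tuned parameter is a thermostat threshold, and all constants are invented.

```python
def simulate(threshold_c: float, outside_c: float, steps: int = 200) -> float:
    """Total cost (lower is better) of a thermostat rule in a toy room model."""
    temp_c, target_c, cost = 30.0, 22.0, 0.0  # assumed start and target
    for _ in range(steps):
        cooling_on = temp_c > threshold_c
        # Drift toward the outdoor temperature, minus cooling when switched on.
        temp_c += 0.1 * (outside_c - temp_c) - (2.0 if cooling_on else 0.0)
        cost += abs(temp_c - target_c)
    return cost


def optimize_threshold(outside_c: float) -> float:
    """Exhaustive search over candidate thresholds from 18.0 C to 26.0 C."""
    candidates = [18.0 + 0.25 * i for i in range(33)]
    return min(candidates, key=lambda t: simulate(t, outside_c))


if __name__ == "__main__":
    # The search is tied to one set of conditions, so it runs once per condition.
    for outside_c in (35.0, 25.0):
        best = optimize_threshold(outside_c)
        print(f"outside={outside_c} C -> best threshold={best} C")
```

The threshold returned by `optimize_threshold` is only known to be best for the `outside_c` it was searched under. When conditions change, the search has to be run again, which is the cost the slide describes.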

  7. When and Why Reinforcement Learning
    ● Reinforcement Learning
      ○ Pros
        ■ Good with high variability
        ■ Handles large state spaces
        ■ Can navigate multiple contradictory objectives
      ○ Cons
        ■ Needs a data scientist
        ■ Black box, hard to explain why a policy made a decision
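
One common way to express contradictory objectives to an RL agent is a weighted sum in the reward function. The slides do not say how the reward was defined in the talk, so the sketch below is purely an assumption: the function name, the two objectives (comfort and energy use), and the weights are all hypothetical.

```python
def reward(temp_c: float, energy_kwh: float,
           target_c: float = 22.0,
           comfort_weight: float = 1.0,
           energy_weight: float = 0.5) -> float:
    """Hypothetical scalar reward trading comfort against energy use."""
    # Objective 1: stay close to the target temperature.
    comfort_penalty = abs(temp_c - target_c)
    # Objective 2: use as little energy as possible, which pulls the other way.
    energy_penalty = energy_kwh
    # Higher reward is better, so both penalties enter with a negative sign.
    return -(comfort_weight * comfort_penalty + energy_weight * energy_penalty)


if __name__ == "__main__":
    print(reward(temp_c=24.0, energy_kwh=1.0))
```

Changing the two weights shifts the balance the trained policy is encouraged to strike between the objectives.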

  8. Use Cases

  9. Simulation Demo
