RL in the physical world

Siemens Technology has been working on physical and industrial applications of neural networks and reinforcement learning for more than 20 years. Through multiple deployments across industrial domains (steel, paper, power plants, factory automation, mobility), we have learned about the challenges and constraints of safety-critical environments and real-world applicability. We develop algorithms built on domain expertise to solve reinforcement learning (RL) tasks with little data but plenty of available expert knowledge, and we need to establish data and machine learning pipelines as well as deployment strategies. As part of Siemens offerings, RL needs to be reliable, trustworthy, and cost-efficient.

In this talk, we will discuss RL use cases that might impact you every day. Starting with RL-controlled power plant gas turbines, we will introduce typical requirements from the industry and present derived research and software results.

Anyscale

March 30, 2022

Transcript

  1. Reinforcement Learning in the Physical World

    Volkmar Sterzing, Marc Weber
    Siemens AG | Technology | Data Analytics & AI
    © Siemens 2022 | Technology | Data Analytics & AI | Volkmar Sterzing, Marc Weber
  2. Why Reinforcement Learning Solutions Make a Difference for Industry

    Diagram labels: Real System, Machine, Simulation, Digital Twin, Data Records, Logs, Reinforcement Learning, Genetic Programming, Swarm Optimization, Actions, Model-based
  3. A brief history of RL at Siemens

    Timeline (lanes: Business Impact, Technologies, World):
    2003: RL for laundry machines. Dynamic laundry distribution by RL is faster than best hand-engineered strategy
    1994: Backgammon. RL with large number of states
    2017: Gas Turbine Auto Tuner. Installation on largest Siemens gas turbine (SGT-8000H)
    2019: Integration in Process Control (5). Interpretable RL learns polymer reactor policy from offline data
    2006: Data-Efficient RL (1-3). Architectures and IP to enable offline learning for turbines
    2016: Variable Objectives. Goal-conditioned, multi-objective RL
    2021: MOOSE Policy (2). Model-based learning and safe operation
    2017: Interpretable RL (3). Wind turbines controlled by interpretable RL policy
    198x: Reinforcement Learning. Sutton & Barto invent the “RL wheel”
    2016: AlphaGo
    2013: DQN Atari
    2019: OpenAI Five DotA
    2021: Energy-efficient time tabling. Multi-agent RL reduces delay and energy consumption of metro trains
    2022: Control of Tokamak fusion reactor plasmas. Physics-informed, model-based RL
    2013: RL reduces NOx by 20%. NOx emissions of large gas turbines significantly reduced
    Uncertainty-aware RL (4) using Bayesian NN and Deep Gaussian Processes
    2017: Proximal Policy Optimization
  4. The Challenge: Bridging the Gap between the Research and Industry Domain

    Industry Domain:
    Existing engineered solutions, domain experts
    Safe, interpretable and trustworthy solutions
    Difficult learning setup: offline, missing data, noise, …
    Integration of domain know-how and constraints
    (Time + budget) limitations on compute resources

    Research Domain:
    Unsolved problems and uncharted territory
    Fast moving and competitive research field
    Clean learning setup: online, fast-feedback, infinite rollouts, unlimited sampling
    Virtually no limits on (compute) resources
  5. Use Case Examples
  6. Real-Time Combustion Optimization for Large Gas Turbines: Gas Turbine Auto Tuner

    Offline RL, Modular, Safety-embedded, Model-based
    https://www.siemens-energy.com/global/en/offerings/services/digital-services/gt-autotuner.html
    300 MW
  7. Energy-efficient Time Tabling for Subway Systems

    Diagram labels: Subway Simulator, Reinforcement Learning in the Cloud, Subway topology, Timetables, Selected KPIs, Test scenarios, interact, Policy, Timetable engineer
  8. High-dimensional RL Control For Parcel Logistics

    Diagram labels: Simulation/Digital Twin or real machine (at customer site), Multi-Agent Deep Learning Framework, Image, Action, Reward, Multi-agent reinforcement learning
  9. Research Focus: Facing the Industry Challenge

    Impressive results for problems where exploration is cheap
    Implicit black-box policies are acceptable
    Testing and evaluating generated policies is cheap and safe

    If exploration on real systems is prohibited, offline data and simulations come to the rescue.
    Policy search, e.g. by using evolutionary methods, enables robust and interpretable RL solutions
    Generating uncertainty-aware surrogate models from (offline) data or simulations increases robustness and speed of policy training
  10. References / Recent Publications

    (1) Daniel Schneegaß, Steffen Udluft, Thomas Martinetz. Kernel Rewards Regression: An Information Efficient Batch Policy Iteration Approach. Artificial Intelligence and Applications, 2006.
    (2) Anton M. Schaefer, Steffen Udluft, Hans-Georg Zimmermann. A recurrent control neural network for data efficient reinforcement learning. 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.
    (3) Daniel Schneegaß, Steffen Udluft, Thomas Martinetz. Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification. International Conference on Artificial Neural Networks, 2007.
    (4) Stefan Depeweg, Jose Miguel Hernandez-Lobato, Finale Doshi-Velez, Steffen Udluft. Learning and policy search in stochastic dynamical systems with Bayesian neural networks. International Conference on Learning Representations (ICLR 2017), 2017.
    (5) Markus Kaiser, Clemens Otte, Thomas A. Runkler, Carl Henrik Ek. Bayesian decomposition of multi-modal dynamical systems for reinforcement learning. Neurocomputing 416, 2020.
    (6) Phillip Swazinna, Steffen Udluft, Thomas Runkler. Overcoming model bias for robust offline deep reinforcement learning. Engineering Applications of Artificial Intelligence 104, 2021.
    (7) Daniel Hein, Steffen Udluft, Thomas Runkler. Interpretable policies for reinforcement learning by genetic programming. Engineering Applications of Artificial Intelligence 76, 2018.
    (8) Daniel Hein, Daniel Labisch. Trustworthy AI for process automation on a Chylla-Haase polymerization reactor. Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO 2021), 2021.
  11. Current Challenges and Opportunities

    Convert groundbreaking technology into profitable business cases for industrial applications
    Enable RL solutions to scale fast and generalize over a large number of applications and use cases
    Develop trustworthy and robust RL solutions that are ready to pass the safety regulations required for industrial products and services
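
The research focus on slide 9 (offline data, uncertainty-aware surrogate models, evolutionary policy search, interpretable policies) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the one-dimensional linear plant, the cost function, the actuator limits, and all function names are hypothetical. A bootstrap ensemble of linear models stands in for the Bayesian neural networks and Gaussian processes named on the slides, and a simple (1+1) evolution strategy stands in for the evolutionary methods.

```python
import random


def collect_offline_data(n, rng):
    """Log (x, u, x_next) transitions from a hypothetical plant under random actions."""
    data, x = [], 1.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        x_next = 0.9 * x + 0.5 * u + rng.gauss(0.0, 0.01)  # assumed toy dynamics
        data.append((x, u, x_next))
        x = x_next
    return data


def fit_linear_model(data):
    """Least-squares fit of x_next ~ a * x + b * u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    return (sxy * suu - suy * sxu) / det, (suy * sxx - sxy * sxu) / det


def fit_ensemble(data, n_models, rng):
    """Bootstrap ensemble: the spread between members is a crude uncertainty estimate."""
    return [fit_linear_model([rng.choice(data) for _ in data]) for _ in range(n_models)]


def rollout_cost(gain, model, horizon=30, x0=1.0):
    """Simulate the policy u = clip(-gain * x) on one surrogate model."""
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = max(-1.0, min(1.0, -gain * x))  # assumed actuator limits
        cost += x * x + 0.1 * u * u
        x = a * x + b * u
    return cost


def pessimistic_cost(gain, ensemble):
    """Score a policy by its worst case over all ensemble members."""
    return max(rollout_cost(gain, model) for model in ensemble)


def evolve_gain(ensemble, rng, iterations=300, sigma=0.3):
    """(1+1) evolution strategy: keep a mutated gain only if it lowers the cost."""
    gain = 0.0
    cost = pessimistic_cost(gain, ensemble)
    for _ in range(iterations):
        candidate = gain + rng.gauss(0.0, sigma)
        candidate_cost = pessimistic_cost(candidate, ensemble)
        if candidate_cost < cost:
            gain, cost = candidate, candidate_cost
    return gain, cost


rng = random.Random(0)
data = collect_offline_data(200, rng)      # the only contact with the "real" system
ensemble = fit_ensemble(data, 10, rng)     # surrogate models learned from the log
gain, cost = evolve_gain(ensemble, rng)    # policy search runs purely on surrogates
baseline = pessimistic_cost(0.0, ensemble)
print(f"policy: u = clip(-{gain:.2f} * x), worst-case surrogate cost {cost:.2f} "
      f"(no control: {baseline:.2f})")
```

The resulting policy is a single readable gain, so a domain expert can inspect it before deployment, and no candidate policy is ever tried on the plant itself. Scoring against the worst ensemble member is one simple way to penalize policies that rely on poorly supported parts of the model; other aggregations are equally plausible.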