RL in the physical world

Why Reinforcement Learning Solutions Make a Difference for Industry
© Siemens 2022 | Technology | Data Analytics & AI | Volkmar Sterzing, Marc Weber

[Diagram labels: Real System, Machine, Simulation, Digital Twin, Data, Records, Logs, Reinforcement Learning, Genetic Programming, Swarm Optimization, Model-based, Actions]

A brief history of RL at Siemens

[Timeline with three lanes: Business Impact, World, Technologies]

198x: Reinforcement Learning. Sutton & Barto invent the “RL wheel”.
1994: Backgammon. RL with large number of states.
2003: RL for laundry machines. Dynamic laundry distribution by RL is faster than the best hand-engineered strategy.
2006: Data-Efficient RL 1-3). Architectures and IP to enable offline learning for turbines.
2013: DQN Atari.
2013: RL reduces NOx by 20%. NOx emissions of large gas turbines significantly reduced.
2016: AlphaGo.
2016: Variable Objectives. Goal-conditioned, multi-objective RL.
2017: Gas Turbine Auto Tuner. Installation on the largest Siemens gas turbine (SGT-8000H).
2017: Interpretable RL 3). Wind turbines controlled by an interpretable RL policy.
2017: Proximal Policy Optimization.
2019: Integration in Process Control 5). Interpretable RL learns a polymer reactor policy from offline data.
2019: OpenAI Five DotA.
2021: MOOSE Policy 2). Model-based learning and safe operation.
2021: Energy-efficient time tabling. Multi-agent RL reduces delay and energy consumption of metro trains.
2022: Control of Tokamak fusion reactor plasmas. Physics-informed, model-based RL.
(no year in transcript): Uncertainty-aware RL 4) using Bayesian NN and Deep Gaussian Processes.

The Challenge: Bridging the Gap between the Research and Industry Domain

Industry Domain
Existing engineered solutions, domain experts
Safe, interpretable and trustworthy solutions
Difficult learning setup: offline, missing data, noise, …
Integration of domain know-how and constraints
(Time + budget) limitations on compute resources

Research Domain
Unsolved problems and uncharted territory
Fast moving and competitive research field
Clean learning setup: online, fast-feedback, infinite rollouts, unlimited sampling
Virtually no limits on (compute) resources

Use Case Examples

Real-Time Combustion Optimization for Large Gas Turbines: Gas Turbine Auto Tuner

[Slide labels: Offline RL, Modular, Safety-embedded, Model-based, 300 MW]
https://www.siemens-energy.com/global/en/offerings/services/digital-services/gt-autotuner.html
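The slide does not say how the safety embedding works. As one generic illustration only, a learned policy's proposed setpoint can be passed through a fixed envelope that limits both its absolute range and its rate of change before it reaches the machine. The class name `SafetyEnvelope`, its fields, and all numbers below are hypothetical and are not taken from the Gas Turbine Auto Tuner.

```python
from dataclasses import dataclass


@dataclass
class SafetyEnvelope:
    """Hypothetical safety layer: hard bounds plus a rate limit on one setpoint."""
    low: float        # lowest setpoint the plant may ever receive
    high: float       # highest setpoint the plant may ever receive
    max_step: float   # largest allowed change per control step

    def filter(self, previous: float, proposed: float) -> float:
        # Limit how far the setpoint may move in a single step.
        step = max(-self.max_step, min(self.max_step, proposed - previous))
        # Then clamp the result to the absolute operating range.
        return max(self.low, min(self.high, previous + step))


# Illustrative values only (assumptions, not real turbine limits).
envelope = SafetyEnvelope(low=0.0, high=1.0, max_step=0.1)
setting = 0.5
for proposed in [0.9, 0.9, -3.0]:   # raw actions a policy might propose
    setting = envelope.filter(setting, proposed)
    print(f"proposed={proposed:+.2f} -> applied={setting:.2f}")
```

Keeping the envelope outside the policy means the limits hold regardless of what the policy has learned, which is one common way to make a learned controller easier to certify.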

Energy-efficient Time Tabling for Subway Systems

[Diagram labels: Subway Simulator, Reinforcement Learning in the Cloud, Subway topology, Timetables, Selected KPIs, Test scenarios, interact, Policy, Timetable engineer]

High-dimensional RL Control For Parcel Logistics

[Diagram labels: Simulation/Digital Twin or real machine (at customer site), Multi-Agent Deep Learning Framework, Image, Action, Reward, Multi-agent reinforcement learning]

Research Focus: Facing the Industry Challenge

Impressive results for problems where exploration is cheap
Implicit black-box policies are acceptable
Testing and evaluating generated policies is cheap and safe

If exploration on real systems is prohibited, offline data and simulations come to the rescue.
Policy search, e.g. by using evolutionary methods, enables robust and interpretable RL solutions.
Generating uncertainty-aware surrogate models from (offline) data or simulations increases robustness and speed of policy training.
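The combination named on this slide (offline data, an uncertainty-aware surrogate model, and evolutionary policy search) can be sketched on a toy problem. Everything below is an assumption made for illustration: the 1-D plant in `true_step`, the bootstrapped ensemble of linear models standing in for the Bayesian models cited in the deck, the single-gain policy `a = -k * s`, and a simple evolution strategy standing in for genetic programming. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_step(s, a):
    # Hypothetical 1-D plant, used only to generate the offline log.
    return 0.9 * s + 0.5 * a + 0.05 * rng.normal()


# 1) Offline data logged under a random behaviour policy (no further exploration).
S = rng.uniform(-2, 2, size=500)
A = rng.uniform(-1, 1, size=500)
S_next = np.array([true_step(s, a) for s, a in zip(S, A)])


def fit_ensemble(S, A, S_next, n_models=5):
    # 2) Bootstrapped linear models; their disagreement is a crude uncertainty proxy.
    X = np.column_stack([S, A])
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(S), size=len(S))   # resample with replacement
        w, *_ = np.linalg.lstsq(X[idx], S_next[idx], rcond=None)
        models.append(w)
    return np.array(models)                          # shape (n_models, 2)


def pessimistic_return(k, models, horizon=20, penalty=1.0):
    # 3) Roll the policy a = -k * s through every model in parallel and
    #    penalise states on which the models disagree.
    s = np.full(len(models), 1.5)
    total = 0.0
    for _ in range(horizon):
        a = np.clip(-k * s, -1.0, 1.0)
        s = models[:, 0] * s + models[:, 1] * a
        total += -(s ** 2).mean() - penalty * s.std()
    return total


def evolve_gain(models, generations=30, pop=16, sigma=0.3):
    # 4) Simple evolution strategy: mutate the best gain, keep improvements.
    best_k, best_f = 0.0, pessimistic_return(0.0, models)
    for _ in range(generations):
        for k in best_k + sigma * rng.normal(size=pop):
            f = pessimistic_return(k, models)
            if f > best_f:
                best_k, best_f = k, f
    return best_k, best_f


models = fit_ensemble(S, A, S_next)
best_k, best_f = evolve_gain(models)
print(f"policy: a = -{best_k:.2f} * s, pessimistic return = {best_f:.3f}")
```

The resulting policy is a single readable gain, which is the sense in which this toy is "interpretable"; the real plant is never queried during the search, only the models fitted to the log.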

References / Recent Publications

(1) Daniel Schneegaß, Steffen Udluft, Thomas Martinetz. Kernel Rewards Regression: An Information Efficient Batch Policy Iteration Approach. Artificial Intelligence and Applications, 2006.
(2) Anton M. Schaefer, Steffen Udluft, Hans-Georg Zimmermann. A recurrent control neural network for data efficient reinforcement learning. 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.
(3) Daniel Schneegaß, Steffen Udluft, Thomas Martinetz. Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification. International Conference on Artificial Neural Networks, 2007.
(4) Stefan Depeweg, Jose Miguel Hernandez-Lobato, Finale Doshi-Velez, Steffen Udluft. Learning and policy search in stochastic dynamical systems with Bayesian neural networks. International Conference on Learning Representations (ICLR 2017), 2017.
(5) Markus Kaiser, Clemens Otte, Thomas A. Runkler, Carl Henrik Ek. Bayesian decomposition of multi-modal dynamical systems for reinforcement learning. Neurocomputing 416, 2020.
(6) Phillip Swazinna, Steffen Udluft, Thomas Runkler. Overcoming model bias for robust offline deep reinforcement learning. Engineering Applications of Artificial Intelligence 104, 2021.
(7) Daniel Hein, Steffen Udluft, Thomas Runkler. Interpretable policies for reinforcement learning by genetic programming. Engineering Applications of Artificial Intelligence 76, 2018.
(8) Daniel Hein, Daniel Labisch. Trustworthy AI for process automation on a Chylla-Haase polymerization reactor. Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO 2021), 2021.

Current Challenges and Opportunities

Convert groundbreaking technology into profitable business cases for industrial applications.
Enable RL solutions to scale fast and generalize across a large number of applications and use cases.
Develop trustworthy and robust RL solutions that are ready to pass the safety regulations required for industrial products and services.

Anyscale

Reinforcement Learning in the Physical World Volkmar Sterzing, Marc Weber
