| Model (supervised prediction) | Agent (reinforcement learning) |
|---|---|
| Output: a Prediction | Input: (State) Observations; Output: an Action |
| No feedback into the Model | The Action updates the State (Observation), which is then fed back into the Agent |
| Predicts just once | Applies a sequence of Actions |
| Training: learns from a Dataset of Feature-Label pairs | Training by Experience: explores different Actions on the fly and records the resulting States/Rewards |
| Objective: minimize the Error (Prediction - Label) | Objective: maximize the (cumulative) Reward |
| Applications: Recognition and Prediction, e.g., Image Recognition, Object Detection, Automatic Speech Recognition, Machine Translation, etc. | Applications: Decision Making, e.g., Games, Robot Maneuvering, Self-Driving Car Maneuvering |
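As a toy illustration of the contrast above, the sketch below pairs a one-weight supervised regression loop (minimize the Prediction - Label error over a fixed Dataset) with a minimal interaction loop (each Action updates the State, and the objective is the cumulative Reward). The dataset, the one-dimensional environment, and all variable names are hypothetical, chosen only to make the two loops concrete; a real Agent would also learn a policy from the recorded experience.

```python
import random

# --- Supervised: predict once per example, minimize Error = Prediction - Label ---
dataset = [(x, 2.0 * x) for x in range(10)]   # Feature-Label pairs (toy label = 2x)
w = 0.0                                       # single-weight linear model
for _ in range(100):                          # passes over the fixed Dataset
    for feature, label in dataset:
        prediction = w * feature
        error = prediction - label            # Error (Prediction - Label)
        w -= 0.01 * error * feature           # gradient step that reduces squared error

# --- Reinforcement: a sequence of Actions, maximize cumulative Reward ---
state = 0                                     # toy 1-D world; reward for reaching state 5
total_reward = 0
for step in range(20):
    action = random.choice([-1, +1])          # explore different Actions on the fly
    state += action                           # the Action updates the State (Observation)
    reward = 1 if state == 5 else 0           # Reward returned by the environment
    total_reward += reward                    # objective: maximize the cumulative Reward
    # a learning Agent would record (state, action, reward) here and improve its policy
```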