Slide 1

Slide 1 text

Autotuning & Reinforcement Learning for Compilers, with Chris Cummins, Hugh Leather, and Mostafa Elhoushi. https://chriscummins.cc/cgo22-compilergym-tutorial

Slide 2

Slide 2 text

- Provide a quick overview of what Reinforcement Learning (RL) is.
- Understand how we can use CompilerGym for RL in compiler optimization.
- Run a basic script that uses RL to optimize instruction code size (see the sketch below).
RL Tutorial https://compilergym.ai
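A minimal sketch of what such a basic script could look like, assuming the llvm-v0 environment with the Autophase observation space and the IrInstructionCountOz (code size) reward space named in the CompilerGym documentation; the random agent and the cbench-v1/qsort benchmark are illustrative stand-ins for a trained RL agent and a real workload:

```python
import compiler_gym  # registers the CompilerGym environments with Gym

# An LLVM phase-ordering environment whose reward tracks IR instruction count
# (a proxy for code size) relative to the -Oz baseline.
env = compiler_gym.make(
    "llvm-v0",
    observation_space="Autophase",
    reward_space="IrInstructionCountOz",
)
env.reset(benchmark="cbench-v1/qsort")

# Stand-in agent: apply up to 100 randomly chosen optimization passes and
# accumulate the reward (positive reward means fewer IR instructions).
total_reward = 0.0
for _ in range(100):
    observation, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
    if done:
        break

print(f"Cumulative code size reward: {total_reward:.4f}")
env.close()
```

A real RL agent would replace `env.action_space.sample()` with a learned policy; the environment setup and the step loop stay the same.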

Slide 3

Slide 3 text

What is RL? https://compilergym.ai
[Diagram: Machine Learning branches into Supervised Learning, Unsupervised Learning, and Reinforcement Learning.]

Slide 4

Slide 4 text

SL vs RL https://compilergym.ai
[Diagram: Supervised Learning: Data → Model → Prediction. Reinforcement Learning: an Agent applies Actions to an Environment and receives Observations and Rewards in return.]

Slide 5

Slide 5 text

SL vs RL https://compilergym.ai

Supervised Learning:
- Input: Features. Output: Prediction (via a Model).
- No feedback into the Model; we just predict once.
- Training: learns from a Dataset of Feature-Label pairs.
- Objective: minimize the Error (= Prediction - Label).
- Applications: recognition and prediction, e.g., image recognition, object detection, automatic speech recognition, machine translation.

Reinforcement Learning:
- Input: (State) Observations. Output: Action (via an Agent).
- Each Action updates the State (Observation), which is then fed back into the Agent; the Agent applies a sequence of Actions.
- Training: by experience, on the fly: explore different Actions and record the resulting States/Rewards.
- Objective: maximize the (cumulative) Reward.
- Applications: decision making, e.g., games, robot maneuvering, self-driving car maneuvering.

Slide 6

Slide 6 text

- Agent: the model you are trying to design, which interacts with the Environment.
- Environment: everything that isn't the Agent; everything the Agent can interact with, directly or indirectly.
- Action: the Agent's means of interacting with and changing its Environment.
- Reward: a numerical value the Agent receives from the Environment as a direct response to its Actions.
- State: every scenario the Agent encounters in the Environment is formally called a state. We identify the State by measuring Observations.
RL Terminology https://compilergym.ai
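These terms map directly onto the Gym-style API that CompilerGym exposes. A minimal sketch of one interaction, again assuming the llvm-v0 environment and the illustrative cbench-v1/qsort benchmark:

```python
import compiler_gym

# Environment: an LLVM compiler session. The Agent is whatever code chooses the actions.
env = compiler_gym.make(
    "llvm-v0",
    observation_space="Autophase",        # State is observed as a numeric feature vector
    reward_space="IrInstructionCountOz",  # Reward reflects the change in IR instruction count
)

# Reset to the initial State for a particular program.
observation = env.reset(benchmark="cbench-v1/qsort")

# Action: one optimization pass, here chosen at random.
action = env.action_space.sample()

# The Environment responds with a new Observation (State), a Reward, and a done flag.
observation, reward, done, info = env.step(action)
print(action, reward, done)

env.close()
```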

Slide 7

Slide 7 text

- Episode: all the states that come between an initial state and a terminal state.
- Policy: the rule for deciding which Action to take given a State.
- Value Function (a.k.a. State-Value Function): the total Reward the Agent expects to accumulate over all steps of an episode, starting from a given State.
- Action-Value Function (a.k.a. Q-Value): the same as the Value Function, but starting from a specific step (State and Action) until the end of the episode.
RL Terminology https://compilergym.ai
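A small sketch of these definitions on one rollout, with the same assumed setup as above; the random choice stands in for a Policy, and the suffix sums of the per-step rewards give single-rollout estimates of the Value (and, once the first Action is fixed, the Q-Value) at each step:

```python
import compiler_gym

env = compiler_gym.make(
    "llvm-v0",
    observation_space="Autophase",
    reward_space="IrInstructionCountOz",
)
observation = env.reset(benchmark="cbench-v1/qsort")

def policy(observation):
    # Policy: which Action to take given a State; random here, learned in a real agent.
    return env.action_space.sample()

# Episode: a bounded sequence of steps from the initial state to a terminal state.
rewards = []
for _ in range(20):
    action = policy(observation)
    observation, reward, done, info = env.step(action)
    rewards.append(reward)
    if done:
        break
env.close()

# Return from step t = total reward accumulated from step t to the end of the episode.
# returns[0] is a one-rollout estimate of the Value of the initial State under this policy.
returns = [sum(rewards[t:]) for t in range(len(rewards))]
print("Episode return:", returns[0])
```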

Slide 8

Slide 8 text

Have fun! https://compilergym.ai