
CompilerGym CGO Tutorial: Reinforcement Learning

Mostafa Elhoushi

March 31, 2022

Transcript

  1. Autotuning & Reinforcement Learning for Compilers

    with Chris Cummins, Hugh Leather, Mostafa Elhoushi
    https://chriscummins.cc/cgo22-compilergym-tutorial
  2. RL Tutorial (https://compilergym.ai)

    - Provide a quick overview of what Reinforcement Learning (RL) is.
    - Understand how we can use CompilerGym for RL in compiler optimization.
    - Run a basic script that uses RL to optimize code size (instruction count).
  3. SL vs RL

    - Supervised Learning: Data -> Model -> Prediction
    - Reinforcement Learning: the Agent sends Actions to the Environment; the Environment sends Observations and Rewards back to the Agent
  4. SL vs RL

    Supervised Learning
    - Input: Features
    - Output: Prediction
    - Learner: Model
    - No feedback into the Model
    - We predict just once
    - Training: learns from a Dataset that consists of Feature-Label pairs
    - Objective: minimize Error (= Prediction - Label)
    - Applications: Recognition and Prediction, e.g., Image Recognition, Object Detection, Automatic Speech Recognition, Machine Translation, etc.

    Reinforcement Learning
    - Input: (State) Observations
    - Output: Action
    - Learner: Agent
    - The Action updates the State (Observation), which is then fed back into the Agent
    - The Agent applies a sequence of Actions
    - Training: by Experience; on the fly, explore different Actions and record the States/Rewards
    - Objective: maximize (cumulative) Reward
    - Applications: Decision Making, e.g., Games, Robot Maneuvering, Self-Driving Car Maneuvering
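The two objectives on this slide can be written side by side. This is a sketch in standard textbook notation, not taken from the slides: the symbols \theta (model parameters), f_\theta (the model), \ell (a loss that measures the error between prediction and label), N (dataset size), \pi (the Agent's policy), r_t (the reward at step t) and T (the number of steps in an episode) are all assumptions introduced here.

```latex
% Supervised learning: fit parameters \theta on N feature-label pairs (x_i, y_i)
\[
  \min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(f_\theta(x_i),\, y_i\bigr)
\]

% Reinforcement learning: find a policy \pi that maximizes the reward
% accumulated over the T steps of an episode
\[
  \max_{\pi} \; \mathbb{E}_{\pi}\Bigl[\, \sum_{t=1}^{T} r_t \Bigr]
\]
```

The supervised objective sums over independent examples, while the RL objective sums over the steps of one trajectory, where each Action influences the States and Rewards that follow.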
  5. RL Terminology

    - Agent: the model you design, which interacts with the Environment.
    - Environment: everything that isn't the Agent; everything the Agent can interact with, either directly or indirectly.
    - Action: the means by which the Agent interacts with and changes its Environment.
    - Reward: a numerical value the Agent receives from the Environment as a direct response to its Actions.
    - State: every situation the Agent encounters in the Environment is formally called a State. We identify the State by measuring Observations.
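The terms above map directly onto an interaction loop. The following is a minimal sketch, assuming a made-up environment: `ToyCodeSizeEnv`, its `PASS_SAVINGS` table, `random_agent` and `run_episode` are hypothetical names invented for illustration and are not part of CompilerGym. The State is the instruction count of an imaginary program, each Action applies an imaginary pass, and the Reward is the number of instructions removed.

```python
import random


class ToyCodeSizeEnv:
    """Hypothetical toy Environment; NOT the CompilerGym API.

    The State is the current instruction count of an imaginary program.
    Each Action applies a made-up "pass" that removes a fixed number of
    instructions the first time it is used and nothing afterwards.
    """

    PASS_SAVINGS = [0, 3, 5, 8]  # invented numbers, for illustration only

    def __init__(self, initial_size=100, max_steps=5):
        self.initial_size = initial_size
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        """Start a new episode and return the first Observation."""
        self.size = self.initial_size
        self.steps = 0
        self.applied = set()
        return self.size

    def step(self, action):
        """Apply one Action; return (observation, reward, done)."""
        saving = 0 if action in self.applied else self.PASS_SAVINGS[action]
        self.applied.add(action)
        self.size -= saving
        self.steps += 1
        done = self.steps >= self.max_steps
        return self.size, saving, done


def random_agent(observation, num_actions, rng):
    """Placeholder Agent: ignores the Observation and picks a random Action."""
    return rng.randrange(num_actions)


def run_episode(env, rng):
    """Run one episode and return (final_observation, total_reward)."""
    observation = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = random_agent(observation, len(env.PASS_SAVINGS), rng)
        observation, reward, done = env.step(action)  # feedback to the Agent
        total_reward += reward
    return observation, total_reward


final_size, total_reward = run_episode(ToyCodeSizeEnv(), random.Random(0))
print(final_size, total_reward)
```

A learning Agent would replace `random_agent` with something that uses the Observation and the Rewards seen so far; the loop itself would stay the same.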
  6. RL Terminology

    - Episode: all States that come between an initial State and a terminal State.
    - Policy: the rule that decides which Action to choose given a State.
    - Value Function (a.k.a. State-Value Function): the expected total Reward collected from a given State until the end of the episode.
    - Action-Value Function (a.k.a. Q-Value): same as the Value Function, but for taking a specific Action in the given State first.
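The value-related terms can be made concrete by summing the Rewards of one recorded episode from each step to the end, and by reading a Policy off a table of Q-Values. This is a sketch under stated assumptions: the discount factor `gamma` does not appear in the slides (the default of 1.0 gives a plain sum), and the reward list, the `q_table` contents and both function names are hypothetical.

```python
def returns_from_each_step(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def greedy_policy(q_table, state):
    """Pick the Action with the highest Q-Value in the given State."""
    action_values = q_table[state]
    return max(action_values, key=action_values.get)


# Rewards observed in one hypothetical episode.
episode_rewards = [3, 0, 5, 8]
print(returns_from_each_step(episode_rewards))  # [16.0, 13.0, 13.0, 8.0]

# Hypothetical Q-Values: q_table[state][action] = estimated total Reward.
q_table = {"start": {"pass_a": 4.0, "pass_b": 9.5}}
print(greedy_policy(q_table, "start"))  # pass_b
```

The first entry of the list is the total Reward of the whole episode; later entries count only the Reward still to come. Averaging such returns over many episodes that pass through the same State is one simple way to estimate its value.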