Slide 5
SL vs RL
https://compilergym.ai
| Supervised Learning | Reinforcement Learning |
|---|---|
| Input: Features | Input: (State) Observations |
| Output: Prediction | Output: Action |
| Model | Agent |
| No feedback into the Model | Action updates the State (Observation), which is then fed back into the Agent |
| We predict just once | The Agent applies a sequence of Actions |
| Training: learns from a Dataset consisting of Feature-Label pairs | Training: by Experience; on the fly, explore different Actions and record the resulting States/Rewards |
| Objective: Minimize Error (= Prediction - Label) | Objective: Maximize (Cumulative) Reward |
| Applications: Recognition, Prediction (e.g., Image Recognition, Object Detection, Automatic Speech Recognition, Machine Translation) | Applications: Decision Making (e.g., Games, Robot Maneuvering, Self-Driving Cars) |
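Written out, the two objectives contrast as follows: supervised learning minimizes the average error between the Prediction f_θ(x_i) and the Label y_i over a fixed dataset, while reinforcement learning maximizes the expected cumulative Reward collected over a sequence of Actions (the discount factor γ is a standard detail assumed here, not stated on the slide):

$$
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i),\, y_i\big)
\quad \text{vs.} \quad
\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{T} \gamma^{t} r_t \right]
$$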
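A minimal sketch of the left column: a Model is fit once on a fixed Dataset of Feature-Label pairs, then used for prediction, with no feedback into the Model. The synthetic data and the choice of LogisticRegression are placeholders for illustration, not part of the slide.

```python
# Supervised Learning: fit a Model once on a fixed Dataset of
# Feature-Label pairs; prediction is a single forward pass, and
# nothing feeds back into the Model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 4))             # Features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # Labels

model = LogisticRegression().fit(X, y)    # Training: minimize error on pairs
print(model.predict(X[:5]))               # Output: Predictions, made once
```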
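The right column is an interaction loop rather than a one-shot prediction: the Agent observes the State, applies an Action, receives a Reward, and the updated Observation is fed back in. Below is a hedged sketch of that loop, assuming the classic Gym (pre-0.26) step/reset signatures, the style of interface CompilerGym (linked above) also exposes; the random policy stands in for a learned Agent.

```python
# Reinforcement Learning: the Agent applies a sequence of Actions; each
# Action updates the State (Observation), which is fed back into the Agent.
# Training is by experience: explore Actions, record the States/Rewards,
# and maximize the cumulative Reward.
import gym

env = gym.make("CartPole-v1")
observation = env.reset()               # initial (State) Observation
total_reward = 0.0                      # cumulative Reward to be maximized
done = False

while not done:
    action = env.action_space.sample()  # placeholder Agent: random Action
    observation, reward, done, info = env.step(action)  # Action updates State
    total_reward += reward              # experience: record States/Rewards

env.close()
print(f"Episode return: {total_reward}")
```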