| Model (supervised prediction) | Agent (reinforcement learning) |
|---|---|
| Output: a Prediction | Input: (State) Observations; Output: an Action |
| No feedback into the Model | The Action updates the State (Observation), which is then fed back into the Agent |
| Predicts just once | Applies a sequence of Actions |
| Training: learns from a Dataset of Feature-Label pairs | Training by Experience: explores different Actions on the fly and records the resulting States/Rewards |
| Objective: minimize the Error (Prediction - Label) | Objective: maximize the (cumulative) Reward |
| Applications: Recognition and Prediction, e.g., Image Recognition, Object Detection, Automatic Speech Recognition, Machine Translation, etc. | Applications: Decision Making, e.g., Games, Robot Maneuvering, Self-Driving Car Maneuvering |
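As a toy illustration of the contrast above, the sketch below pairs a one-weight supervised regression loop (minimize the Prediction - Label error over a fixed Dataset) with a minimal interaction loop (each Action updates the State, and the objective is the cumulative Reward). The dataset, the one-dimensional environment, and all variable names are hypothetical, chosen only to make the two loops concrete; a real Agent would also learn a policy from the recorded experience.

```python
import random

# --- Supervised: predict once per example, minimize Error = Prediction - Label ---
dataset = [(x, 2.0 * x) for x in range(10)]   # Feature-Label pairs (toy label = 2x)
w = 0.0                                       # single-weight linear model
for _ in range(100):                          # passes over the fixed Dataset
    for feature, label in dataset:
        prediction = w * feature
        error = prediction - label            # Error (Prediction - Label)
        w -= 0.01 * error * feature           # gradient step that reduces squared error

# --- Reinforcement: a sequence of Actions, maximize cumulative Reward ---
state = 0                                     # toy 1-D world; reward for reaching state 5
total_reward = 0
for step in range(20):
    action = random.choice([-1, +1])          # explore different Actions on the fly
    state += action                           # the Action updates the State (Observation)
    reward = 1 if state == 5 else 0           # Reward returned by the environment
    total_reward += reward                    # objective: maximize the cumulative Reward
    # a learning Agent would record (state, action, reward) here and improve its policy
```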