
Mario Bros meets reinforcement learning

Do you want to learn the fundamentals behind the technology that made it possible to beat the world Go champion? Would you like to understand how a machine can be taught to solve problems without your direct intervention?

If so, this is a talk you can't miss! Mario Bros will no longer need you to finish the levels.

Python Pereira

June 29, 2019

Transcript

  1. About Me
     Cristian Vargas
     Lead developer at Swapps
     Electronic engineer
     PythonCali and Calidev organizer
     @cdvv7788
     https://github.com/cdvv7788
  2. Repeat until it works... or you give up trying
     Taken from: https://www.livechatinc.com/blog/what-is-artificial-intelligence-ai/
  3. Mario Library for Gym
     https://github.com/Kautenja/gym-super-mario-bros
  4. Basic program

     import gym
     from gym import wrappers
     from random_agent import RandomAgent
     from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
     import gym_super_mario_bros
     from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

     agent = RandomAgent()
     env = gym_super_mario_bros.make('SuperMarioBros-1-1-v0')
     # Reduce the full NES button space to a small discrete action set
     env = BinarySpaceToDiscreteSpaceEnv(env, SIMPLE_MOVEMENT)
     # Record a video of every second episode
     env = gym.wrappers.Monitor(env, agent.get_recording_folder(),
                                video_callable=lambda episode_id: episode_id % 2 == 0,
                                force=True)

     done = True
     episode = 1
     while True:
         if done:
             print('Restarting env. Attempt # {}'.format(episode))
             state = env.reset()
             episode += 1
         state, reward, done, info = agent.run(env)
     env.close()
  5. Random Agent (Duuuhhh)

     class RandomAgent(Agent):
         """
         Randomly executes an action from the action space.
         Has no memory nor intelligence at all.
         """
         def get_recording_folder(self):
             return './random'

         def run(self, env):
             # Sample a random action from the action space and apply it
             state, reward, done, info = env.step(env.action_space.sample())
             return state, reward, done, info
  6. Q-Learning
     Is this applicable to the Mario game?
     Possible number of states, assuming we feed a single input at a time, resized to 84x84, converted to grayscale, and stacked 4 frames deep: 84 * 84 * 4 * 256 = 7,225,344, far too many for a plain Q-table.
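     For reference, a minimal tabular Q-learning sketch on a small Gym environment (the FrozenLake environment, hyperparameters, and variable names below are illustrative assumptions, not part of the talk):

     import gym
     import numpy as np

     ALPHA = 0.1    # learning rate (illustrative)
     GAMMA = 0.99   # discount factor (illustrative)
     EPSILON = 0.1  # exploration rate (illustrative)

     env = gym.make('FrozenLake-v0')
     q_table = np.zeros((env.observation_space.n, env.action_space.n))

     for episode in range(5000):
         state = env.reset()
         done = False
         while not done:
             # Epsilon-greedy action selection
             if np.random.rand() < EPSILON:
                 action = env.action_space.sample()
             else:
                 action = np.argmax(q_table[state])
             next_state, reward, done, info = env.step(action)
             # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
             td_target = reward + GAMMA * np.max(q_table[next_state])
             q_table[state, action] += ALPHA * (td_target - q_table[state, action])
             state = next_state

     Mario's image observations make a table like q_table impossibly large, which is what motivates the deep variant on the later slides.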
  7. Q-Learning
     Preprocessing steps
     I followed the approach proposed in the original DeepMind paper and built on earlier work from a Toptal post. The approach is:
     1. Scale to 84x84
     2. Take only every 4th frame (frameskip)
     3. Take 4 of these frames to create an input of 84x84x4
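     A minimal sketch of those steps, assuming OpenCV is available; the function and class names are illustrative, not the speaker's code. Frame skipping (step 2) is usually done by repeating the chosen action for 4 environment steps and keeping only the last frame.

     import cv2
     import numpy as np
     from collections import deque

     def preprocess_frame(frame):
         # Step 1: convert the RGB frame to grayscale and scale it down to 84x84
         gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
         return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)

     class FrameStacker:
         """Step 3: keep the last 4 preprocessed frames as a single 84x84x4 input."""
         def __init__(self, stack_size=4):
             self.frames = deque(maxlen=stack_size)

         def reset(self, frame):
             processed = preprocess_frame(frame)
             for _ in range(self.frames.maxlen):
                 self.frames.append(processed)
             return np.stack(self.frames, axis=2)

         def step(self, frame):
             self.frames.append(preprocess_frame(frame))
             return np.stack(self.frames, axis=2)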
  8. Deep Q-Learning
     In short, we approximate the Q-table using a neural network.
     Challenge: the input data is highly correlated, breaking the assumption behind gradient descent that inputs are independent and identically distributed (i.i.d.) random variables.
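     To make the approximation concrete, a sketch of the usual DQN training loss, assuming PyTorch (the function signature, variable names, and GAMMA value are assumptions for illustration): instead of looking values up in a table, the network is pushed towards the one-step TD target.

     import torch
     import torch.nn.functional as F

     GAMMA = 0.99  # illustrative discount factor

     def dqn_loss(network, states, actions, rewards, next_states, dones):
         # Q(s, a) for the actions that were actually taken
         q_values = network(states).gather(1, actions.unsqueeze(1)).squeeze(1)
         with torch.no_grad():
             # One-step TD target: r + gamma * max_a' Q(s', a'), zero value for terminal states
             next_q = network(next_states).max(1)[0]
             targets = rewards + GAMMA * next_q * (1 - dones)
         return F.mse_loss(q_values, targets)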
  9. Deep Q-Learning
     1. Easy to overfit
     2. Gets stuck in local minima
     3. Quickly forgets previous experiences
     In general terms, using neural networks with reinforcement learning is an unstable process.
  10. Deep Q-Learning
      Network Architecture

      import torch.nn as nn

      class NeuralNetwork(nn.Module):
          def __init__(self, number_of_actions):
              super(NeuralNetwork, self).__init__()
              self.number_of_actions = number_of_actions
              # Convolutions over the stacked 84x84x4 input
              self.conv1 = nn.Conv2d(4, 32, 8, 4)
              self.relu1 = nn.ReLU(inplace=True)
              self.conv2 = nn.Conv2d(32, 64, 4, 2)
              self.relu2 = nn.ReLU(inplace=True)
              self.conv3 = nn.Conv2d(64, 64, 3, 1)
              self.relu3 = nn.ReLU(inplace=True)
              # 64 feature maps of 7x7 = 3136 inputs to the fully connected head
              self.fc1 = nn.Linear(3136, 512)
              self.relu4 = nn.ReLU(inplace=True)
              # One Q-value output per possible action
              self.fc2 = nn.Linear(512, self.number_of_actions)
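      The slide omits the forward pass. A plausible sketch, assuming the layer names above and the 7 actions of SIMPLE_MOVEMENT (both assumptions, not shown in the talk); the method would normally be defined inside the class:

      import torch

      def forward(self, x):
          # x: (batch, 4, 84, 84), four stacked grayscale frames
          out = self.relu1(self.conv1(x))    # -> (batch, 32, 20, 20)
          out = self.relu2(self.conv2(out))  # -> (batch, 64, 9, 9)
          out = self.relu3(self.conv3(out))  # -> (batch, 64, 7, 7)
          out = out.view(out.size(0), -1)    # flatten to (batch, 3136)
          out = self.relu4(self.fc1(out))
          return self.fc2(out)               # one Q-value per action

      NeuralNetwork.forward = forward  # attached here only for illustration

      net = NeuralNetwork(number_of_actions=7)
      print(net(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 7])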
  11. Experience Replay
      1. We save every new step in a buffer of defined length.
      2. We train the neural network using random samples of defined length taken from the memory.
      3. We discard the oldest entries in the memory when it is full.
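      A minimal replay buffer along those lines (the class name, capacity, and batch size are illustrative assumptions):

      import random
      from collections import deque

      class ReplayBuffer:
          def __init__(self, capacity=100000):
              # A deque drops the oldest entries automatically once it is full (step 3)
              self.memory = deque(maxlen=capacity)

          def push(self, state, action, reward, next_state, done):
              # Save every new step (step 1)
              self.memory.append((state, action, reward, next_state, done))

          def sample(self, batch_size=32):
              # Random samples break the correlation between consecutive frames (step 2)
              return random.sample(self.memory, batch_size)

          def __len__(self):
              return len(self.memory)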
  12. Double Deep Q (DDQN)
      DQN suffers from overconfidence in its estimates. This means it can propagate noisy estimated values across the entire Q-table, which affects the stability of the algorithm.
  13. Double Deep Q (DDQN)
      A proposed solution named Double Learning attacks this problem: keep 2 different tables, take the action from table #1, and take the value for the proposed action from table #2.
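      A sketch of that idea in the deep setting, assuming PyTorch and two copies of the network from slide 10, an online network and a target network (the function name and GAMMA value are assumptions):

      import torch

      GAMMA = 0.99  # illustrative discount factor

      def ddqn_target(online_net, target_net, rewards, next_states, dones):
          with torch.no_grad():
              # "Table #1" (online network) proposes the best next action...
              best_actions = online_net(next_states).argmax(1, keepdim=True)
              # ...and "table #2" (target network) supplies the value for that action
              next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
          return rewards + GAMMA * next_q * (1 - dones)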
  14. Next Steps
      Prioritized Experience Replay
      Alternatives to epsilon-greedy exploration
      Asynchronous advantage actor critic (A3C)
      A lot more approaches, like IMPALA