Building Autonomous Agents with gym-retro

@Kartones

observation, reward, done, info = environment.step(action)

action = environment.action_space.sample()
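The two lines above are the core of a random agent: sample an action, step the environment, repeat until `done`. A minimal sketch of that loop follows. It runs against a hypothetical `StubEnvironment` rather than a real gym-retro environment so that it has no dependencies; the stub classes, the 9-button action size, the fixed reward, and `run_random_episode` are all assumptions made for illustration.

```python
import random


class StubActionSpace:
    """Hypothetical stand-in for an action space of 9 on/off buttons."""

    def __init__(self, n_buttons=9, seed=0):
        self.n_buttons = n_buttons
        self._rng = random.Random(seed)

    def sample(self):
        # One random 0/1 value per button.
        return [self._rng.randint(0, 1) for _ in range(self.n_buttons)]


class StubEnvironment:
    """Hypothetical environment exposing the reset/step interface used on the slides."""

    def __init__(self, episode_length=5):
        self.action_space = StubActionSpace()
        self.episode_length = episode_length
        self.current_time = 0

    def reset(self):
        self.current_time = 0
        return None  # a real environment would return the first frame

    def step(self, action):
        self.current_time += 1
        observation = None   # placeholder for the screen pixels
        reward = 1.0         # assumed: one point per step survived
        done = self.current_time >= self.episode_length
        info = {"time": self.current_time}
        return observation, reward, done, info


def run_random_episode(environment):
    """Play one episode with random actions; return (steps, total reward)."""
    environment.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = environment.action_space.sample()
        observation, reward, done, info = environment.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward
```

With a real emulator-backed environment, only the construction of `environment` would change; the loop body is the same two calls shown on the slides.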

paddle_size = 24
paddle_safe_margin = paddle_size/4
if last_info:
    go_left = 1 if (last_info['player_x_start'] + paddle_safe_margin > last_info['ball_x']) else 0
    go_right = 1 if (last_info['player_x_end'] - paddle_safe_margin < last_info['ball_x']) else 0
    # ["B", null, "SELECT", "START", "UP", "DOWN", "LEFT", "RIGHT", "A"]
    action = [0, 0, 0, 0, 0, 0, go_left, go_right, self.env.current_time % 2]
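The heuristic above can be pulled out into a pure function, which makes it easy to check by hand. This is a sketch only: the function name `paddle_action`, passing `current_time` as an argument instead of reading `self.env.current_time`, and returning an all-zero action when there is no `last_info` yet are assumptions, not part of the slides.

```python
PADDLE_SIZE = 24
PADDLE_SAFE_MARGIN = PADDLE_SIZE / 4

# Button order from the slide:
# ["B", null, "SELECT", "START", "UP", "DOWN", "LEFT", "RIGHT", "A"]
N_BUTTONS = 9


def paddle_action(last_info, current_time):
    """Return a 9-button action that moves the paddle towards the ball."""
    if not last_info:
        # Assumed fallback: press nothing until the first info dict arrives.
        return [0] * N_BUTTONS

    ball_x = last_info['ball_x']
    # Move left when the ball is left of the paddle's inner safe zone.
    go_left = 1 if last_info['player_x_start'] + PADDLE_SAFE_MARGIN > ball_x else 0
    # Move right when the ball is right of the paddle's inner safe zone.
    go_right = 1 if last_info['player_x_end'] - PADDLE_SAFE_MARGIN < ball_x else 0
    # Press "A" on every odd time step, as in the original snippet.
    press_a = current_time % 2
    return [0, 0, 0, 0, 0, 0, go_left, go_right, press_a]
```

For example, with the paddle spanning x = 100 to 124, a ball at x = 50 sets only LEFT, a ball at x = 200 sets only RIGHT, and a ball at x = 112 (inside the safe zone) sets neither.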

probability = numpy.random.random()
if probability < EPSILON:
    return self._random_movement()
else:
    if self.env.current_time < self.best_time:
        action = self.best_actions[self.env.current_time]
    else:
        return self._ai_movement(last_info)
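The snippet above is an epsilon-greedy choice layered on a replay of the best run so far: explore with probability epsilon, otherwise replay the recorded action for this time step, and fall back to the heuristic once the recording runs out. A dependency-free sketch of that decision follows. The function name `choose_action`, its parameters, the use of `len(best_actions)` in place of `self.best_time`, and the standard library `random` in place of `numpy.random` are all assumptions for illustration.

```python
import random


def choose_action(current_time, last_info, best_actions, epsilon,
                  random_movement, ai_movement, rng=random):
    """Pick the next action: explore, replay the best run, or use the heuristic.

    random_movement: callable returning a random action (hypothetical).
    ai_movement: callable taking last_info and returning an action (hypothetical).
    """
    # Explore: with probability epsilon, ignore everything and act randomly.
    if rng.random() < epsilon:
        return random_movement()

    # Exploit: replay the best known action for this time step, if recorded.
    if current_time < len(best_actions):
        return best_actions[current_time]

    # Past the end of the recording: fall back to the hand-written heuristic.
    return ai_movement(last_info)
```

Because `rng.random()` returns a value in [0, 1), an epsilon of 0 never explores and an epsilon of 1 always does, which gives two easy sanity checks.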

def _epsilon_value(self):
    if self.USE_DECAYING_EPSILON:
        # with 1 becomes practically random
        return 0.1 / (self.env.current_time + 1)
    else:
        return self.EPSILON

Why decaying epsilon = 0.1?
0.1 / 5 = 0.02
0.1 / 250 = 0.0004
0.1 / 500 = 0.0002
0.1 / 1000 = 0.0001
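The decay schedule divides the base value 0.1 by the number of steps played, so exploration is most likely early in an episode and becomes rare later on. A small sketch to reproduce the figures; the function name `decaying_epsilon` and its `base` parameter are hypothetical, while the formula is the one from `_epsilon_value`.

```python
def decaying_epsilon(current_time, base=0.1):
    """Epsilon for a given time step: base / (current_time + 1)."""
    return base / (current_time + 1)


if __name__ == "__main__":
    # current_time is zero-based, so step 4 corresponds to a divisor of 5.
    for current_time in (0, 4, 249, 499, 999):
        print(current_time + 1, decaying_epsilon(current_time))
```

With a base of 1 instead of 0.1, the very first step would have epsilon = 1, meaning a random action is always taken, and the early steps would stay mostly random.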

Soon at: github.com/kartones

github.com/openai/retro-baselines/agents

• Intro documentation: blog.openai.com/gym-retro/
• Example code and documentation: github.com/openai/retro/, github.com/openai/retro-baselines/, "gotta_learn_fast_report.pdf"
• Game integration guide: github.com/openai/retro/blob/develop/IntegratorsGuide.md
• Contests and other resources: openai.com
• JERK sources: www.noob-programmer.com/openai-retro-contest/jerk-agent-algorithm/
