Goals
• Why reinforcement learning?
• Understand the basic concepts intuitively
• How to get started (if you want to learn more about reinforcement learning after this talk!)
Only Python!

AlphaGo
• Oct. 2015 - Beats a human professional Go player (v. Fan)
• Mar. 2016 - Beats Lee Sedol (9-dan professional) in a five-game match (v. Lee)
• May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
• Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data
"Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.

Reinforcement learning
An area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. [1]
Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL). [2]
[1] https://en.wikipedia.org/wiki/Reinforcement_learning
[2] https://deepmind.com/blog/deep-reinforcement-learning/

Reinforcement learning concepts
[Diagram: the Agent performs Actions; Actions affect the Environment; the Environment generates a Reward; the Reward is observed by the Agent. Additional label: Strategy.]
Goal: Select actions to maximize total future reward

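To make "total future reward" concrete, here is a minimal sketch that computes, for every step of an episode, the sum of the rewards still to come. The function name `returns_to_go` and the `discount` parameter are assumptions added for illustration; the slide itself only speaks of total future reward, which corresponds to `discount=1.0`.

```python
def returns_to_go(rewards, discount=1.0):
    """Return the (optionally discounted) sum of future rewards at every step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


# Three steps, each rewarded with 1.
print(returns_to_go([1, 1, 1]))        # [3.0, 2.0, 1.0]
# A single reward at the end, with later rewards counted at half weight per step.
print(returns_to_go([0, 0, 1], 0.5))   # [0.25, 0.5, 1.0]
```

An agent that maximizes the first value in this list is maximizing the total reward of the whole episode.
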
Reinforcement learning concepts
[Diagram: the Agent performs Actions; Actions affect the Environment; the Environment generates a Reward; the Reward is observed by the Agent. Additional label: Strategy.]
Model (of the environment) - Trial-and-error

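Trial-and-error can be sketched with a very small example: an agent that has no model of the environment and learns which of two actions pays off better purely from the rewards it receives. This is an epsilon-greedy strategy on a two-action problem, chosen here only for illustration; the name `run_bandit`, the win probabilities, and the values of `epsilon` and `steps` are all assumptions, not something from the talk.

```python
import random


def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn the average reward of each action purely by trying them."""
    rng = random.Random(seed)
    estimates = [0.0] * len(win_probs)   # current guess of each action's reward
    counts = [0] * len(win_probs)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(win_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(win_probs)), key=lambda a: estimates[a])
        # The environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < win_probs[action] else 0.0
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
print(estimates, counts)
```

The agent is never told the win probabilities. Its estimates come only from the rewards it observed, and the occasional random action keeps it from settling on the first action that happened to pay.
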
Python libraries for reinforcement learning
• OpenAI
  • Gym: toolkit for developing and comparing reinforcement learning algorithms. MIT License. Last commit: November 2017.
  • baselines: high-quality implementations of reinforcement learning algorithms. MIT License. Last commit: November 2017.
• TensorForce: a TensorFlow library for applied reinforcement learning. Apache 2 License. Last commit: November 2017.
• DeepRL: highly modularized implementation of popular deep RL algorithms in PyTorch. Apache 2 License. Last commit: November 2017.
• RLlab: a framework for developing and evaluating reinforcement learning algorithms. MIT License. Last commit: July 2017.
• AgentNet: Python library for deep reinforcement learning using Theano+Lasagne. MIT License. Last commit: August 2017.
• RLPy: the Reinforcement Learning Library for Education and Research. 3-Clause BSD License. Last commit: April 2016.
• PyBrain: the Python Machine Learning Library. 3-Clause BSD License. Last commit: March 2016.

Gym: environments
[Diagram: the Agent performs Actions on the Environment; the Reward and the State / Observation are observed by the Agent. Labels: Gym (environment side), Baselines (agent side).]
Gym environment attributes:
• action_space: The Space object corresponding to valid actions
• observation_space: The Space object corresponding to valid observations
• reward_range: A tuple corresponding to the min and max possible rewards

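The three attributes above can be illustrated without installing anything. The sketch below is a hypothetical toy environment, not Gym code: `Discrete` and `WalkEnv` are stand-ins written for this example, and they only mirror the shape of the Gym interface as it looked at the time of this talk (`reset()` returns an observation, `step()` returns observation, reward, done, info).

```python
import random


class Discrete:
    """Stand-in for a Gym Space holding the values 0 .. n-1 (not Gym's own class)."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class WalkEnv:
    """Hypothetical toy environment: walk along positions 0..4, goal at 4."""

    reward_range = (0.0, 1.0)                   # min and max possible reward

    def __init__(self, seed=None):
        self.action_space = Discrete(2, seed)   # 0 = step left, 1 = step right
        self.observation_space = Discrete(5)    # positions 0..4
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        assert self.action_space.contains(action)
        move = 1 if action == 1 else -1
        self.position = min(4, max(0, self.position + move))
        done = self.position == 4
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


env = WalkEnv(seed=0)
observation = env.reset()
for _ in range(1000):                           # cap the episode length
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
print(observation, reward)
```

The loop at the bottom is a random agent: it samples from `action_space` and ignores what it observes, which is the baseline any learning agent should beat.
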
CartPole environment example

    import gym

    env = gym.make('CartPole-v0')
    env.reset()
    for _ in range(1000):
        env.render()
        env.step(env.action_space.sample())  # take a random action

Source: https://gym.openai.com/docs/

CartPole agent example
[Diagram: "The Algorithm" (e.g. DeepQ, from Baselines) takes the place of the Agent; Actions affect the Environment (OpenAI Gym), which returns a Reward and a State / Observation.]
• action_space: left, right
• observation_space: x, x_dot, theta, theta_dot
• reward_range: 1 - pole hasn't fallen, 0 - pole has fallen

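Before reaching for a learning algorithm such as DeepQ, it can help to see what an agent is in code: a function from observation to action. The sketch below is a hand-written rule, not a learned strategy and not part of Baselines. The names `lean_policy` and `run_episode` are made up for this example, and `run_episode` assumes the Gym interface from the time of this talk, where `reset()` returns the observation and `step()` returns four values; newer releases differ.

```python
def lean_policy(observation):
    """Push the cart toward the side the pole is leaning to."""
    x, x_dot, theta, theta_dot = observation
    # In CartPole, action 0 pushes the cart left and action 1 pushes it right.
    return 1 if theta > 0 else 0


def run_episode(env, policy, max_steps=200):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With Gym installed, `run_episode(gym.make('CartPole-v0'), lean_policy)` would run this rule on the real environment. Swapping `lean_policy` for a learned strategy leaves the loop unchanged.
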