Slide 1

Slide 1 text

Reinforcement learning in Python Christine Doig, PyTexas 2017

Slide 2

Slide 2 text

Goals • Why reinforcement learning? • Understand basic concepts intuitively • How to get started (if you are interested in learning more about reinforcement learning after this talk!) Only Python!

Slide 3

Slide 3 text

Agenda • What is Reinforcement Learning? • Python libraries for Reinforcement learning • Cartpole example in Python • Resources

Slide 4

Slide 4 text

What is reinforcement learning?

Slide 5

Slide 5 text

AlphaGo
• Oct. 2015 - Beats human professional Go player (v. Fan)
• Mar. 2016 - Beats Lee Sedol (9-dan professional) in five-game match (v. Lee)
• May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
• Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data
"Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.

Slide 6

Slide 6 text

Reinforcement learning
An area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. [1]
Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL). [2]
[1] https://en.wikipedia.org/wiki/Reinforcement_learning
[2] https://deepmind.com/blog/deep-reinforcement-learning/

Slide 7

Slide 7 text

Concepts • machine learning • agents • actions • environment • reward • strategies • trial-and-error

Slide 8

Slide 8 text

Machine learning: algorithms
• Unsupervised learning (no labels)
  • Clustering: K-means, hierarchical clustering
  • Dimensionality reduction: PCA, t-SNE
• Supervised learning (labels)
  • Classification (categorical) and regression (quantitative): logistic regression, SVM, decision trees, k-NN, linear regression, neural networks
• Reinforcement learning (reward)
  • Model-free
    • Value-based: Q-learning
    • Policy-based: policy gradient, REINFORCE
  • Model-based: dynamic programming, MCTS, Dyna

Slide 9

Slide 9 text

Machine learning: applications
• Unsupervised learning (no labels): exploring
  • Clustering, dimensionality reduction
  • Market segmentation, anomaly detection, summarizing information
• Supervised learning (labels): predicting
  • Classification (categorical), regression (quantitative)
  • Spam detection, object/face recognition, recommender systems
• Reinforcement learning (reward): decision making
  • Model-free, model-based
  • Robotics - Make humanoid robot walk
  • Games - Defeat Go champion
  • Finance - Trading strategies

Slide 10

Slide 10 text

Reinforcement learning concepts
• The Agent performs Actions
• Actions affect the Environment
• The Environment generates a Reward and an Observation / State
• The Reward and the Observation / State are observed by the Agent
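The loop in this diagram can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk: `CorridorEnv`, `random_agent`, and `run_episode` are hypothetical names, and the `reset()` / `step(action)` interface is only loosely modeled on the Gym convention.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Every episode starts at the left end; the state is the position.
        self.position = 0
        return self.position

    def step(self, action):
        # Actions affect the environment: 1 moves right, anything else moves left.
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position == self.length - 1
        # The environment generates a reward: 1 at the goal, 0 elsewhere.
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def random_agent(state, rng):
    # Placeholder strategy that ignores the state it observes.
    return rng.choice([0, 1])


def run_episode(env, agent, rng, max_steps=1000):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent(state, rng)              # the agent performs an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward                  # the agent observes the reward
        if done:
            break
    return total_reward
```

Calling `run_episode(CorridorEnv(), random_agent, random.Random(0))` returns the reward collected in one episode of this toy setup.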

Slide 11

Slide 11 text

Example: Trading
• The Agent performs Actions: buys / sells stocks
• Actions affect the Environment: portfolio
• The Environment generates a Reward: win / lose money
• The Reward is observed by the Agent

Slide 12

Slide 12 text

Example: Go
• The Agent performs Actions: make a move
• Actions affect the Environment: Go game
• The Environment generates a Reward: win / lose game
• The Reward is observed by the Agent

Slide 13

Slide 13 text

Reinforcement learning concepts
• The Agent performs Actions; Actions affect the Environment; the Environment generates a Reward; the Reward is observed by the Agent
• Strategy - Goal: Select actions to maximize total future reward
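As a rough illustration of "total future reward", the sketch below sums the rewards that follow each time step. The function name `rewards_to_go` and the discount factor `gamma` are assumptions added for illustration; the slide only states the goal, and the default `gamma=1.0` gives a plain undiscounted sum.

```python
def rewards_to_go(rewards, gamma=1.0):
    """For each step t, return the (optionally discounted) sum of rewards from t on."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(rewards_to_go([1, 1, 1]))  # [3.0, 2.0, 1.0]
```

A strategy that maximizes total future reward prefers actions whose first entry in this list is largest.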

Slide 14

Slide 14 text

Reinforcement learning concepts
• The Agent performs Actions; Actions affect the Environment; the Environment generates a Reward; the Reward is observed by the Agent
• Strategy
• Model (of the environment) - Trial-and-error
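To make "trial-and-error" concrete, here is a small sketch that learns which of two actions pays off better purely from observed rewards, with no model of the environment. It uses epsilon-greedy action selection on a two-armed bandit, a technique swapped in for illustration that the slides do not describe; `learn_by_trial_and_error`, `pull`, `payout_probs`, and `epsilon` are hypothetical names.

```python
import random


def learn_by_trial_and_error(payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Estimate the average reward of each action from experience alone."""
    rng = random.Random(seed)
    n_actions = len(payout_probs)
    estimates = [0.0] * n_actions  # current guess of each action's value
    counts = [0] * n_actions       # how often each action has been tried

    def pull(action):
        # Hidden environment: reward 1 with the action's payout probability.
        return 1.0 if rng.random() < payout_probs[action] else 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try an action at random
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = pull(action)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts
```

For example, `learn_by_trial_and_error([0.2, 0.8])` returns one value estimate and one trial count per action; the agent never sees the payout probabilities directly.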

Slide 15

Slide 15 text

Python libraries for reinforcement learning

Slide 16

Slide 16 text

Python libraries for reinforcement learning
• OpenAI
  • Gym: toolkit for developing and comparing reinforcement learning algorithms, MIT License, Last commit: November 2017
  • baselines: high-quality implementations of reinforcement learning algorithms, MIT License, Last commit: November 2017
• TensorForce, a TensorFlow library for applied reinforcement learning, Apache 2 License, Last commit: November 2017
• DeepRL, highly modularized implementation of popular deep RL algorithms in PyTorch, Apache 2 License, Last commit: November 2017
• RLlab, a framework for developing and evaluating reinforcement learning algorithms, MIT License, Last commit: July 2017
• AgentNet, Python library for deep reinforcement learning using Theano+Lasagne, MIT License, Last commit: August 2017
• RLPy, the Reinforcement Learning Library for Education and Research, 3-Clause BSD License, Last commit: April 2016
• PyBrain, the Python Machine Learning Library, 3-Clause BSD License, Last commit: March 2016

Slide 17

Slide 17 text

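OpenAI libraries: gym and baselines
• Gym: Environment, Actions, Reward, State / Observation
• Baselines: Agent
• The Agent performs Actions; Actions affect the Environment; the Environment generates a Reward and a State / Observation, which are observed by the Agent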

Slide 18

Slide 18 text

Gym: environments
• action_space: The Space object corresponding to valid actions
• observation_space: The Space object corresponding to valid observations
• reward_range: A tuple corresponding to the min and max possible rewards
• Gym: Actions, Environment, Reward, State / Observation; Baselines: Agent

Slide 19

Slide 19 text

Cartpole example in Python

Slide 20

Slide 20 text

The Cartpole environment
Goal: Keep pole vertical

Slide 21

Slide 21 text

CartPole environment example
• Gym: Actions, Environment, Reward, State / Observation
• action_space: left, right
• observation_space: x, x_dot, theta, theta_dot
• reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
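Given this observation layout, even a hand-written rule can serve as a simple strategy. The sketch below is not from the talk: `lean_policy` is a hypothetical helper that pushes the cart toward the side the pole is leaning. It assumes the Gym CartPole convention that action 0 pushes left and action 1 pushes right, and that a positive `theta` means the pole leans to the right.

```python
LEFT, RIGHT = 0, 1  # assumed action encoding for Gym's CartPole


def lean_policy(observation):
    """Push the cart toward the side the pole is leaning."""
    x, x_dot, theta, theta_dot = observation
    # Assumption: positive theta means the pole tilts to the right.
    return RIGHT if theta > 0 else LEFT
```

With Gym installed, a rule like this could stand in for a random choice, for example `env.step(lean_policy(obs))`. It ignores the cart position and both velocities, so it is only a starting point.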

Slide 22

Slide 22 text

CartPole environment example

    import gym

    env = gym.make('CartPole-v0')
    env.reset()
    for _ in range(1000):
        env.render()
        env.step(env.action_space.sample())  # take a random action

https://gym.openai.com/docs/
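The snippet above keeps stepping even after an episode has ended. A slightly more careful variant, sketched below, resets the environment whenever `done` is returned and records the reward of each finished episode. `run_random_steps` is a hypothetical helper; it assumes the four-value `step()` return (`obs, reward, done, info`) that Gym used at the time of the talk, and it leaves out rendering so it can run without a display.

```python
def run_random_steps(env, n_steps=1000):
    """Take random actions, resetting at episode ends; return per-episode rewards."""
    episode_rewards = []
    current = 0.0
    env.reset()
    for _ in range(n_steps):
        action = env.action_space.sample()  # take a random action
        _, reward, done, _ = env.step(action)
        current += reward
        if done:
            # The episode is over (e.g. the pole fell): record it and start again.
            episode_rewards.append(current)
            current = 0.0
            env.reset()
    return episode_rewards
```

With Gym installed, this could be called as `run_random_steps(gym.make('CartPole-v0'))`; an episode still in progress when the step budget runs out is not included in the result.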

Slide 23

Slide 23 text

CartPole agent example
• OpenAI Gym: Actions, Environment, Reward, State / Observation
  • action_space: left, right
  • observation_space: x, x_dot, theta, theta_dot
  • reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
• Baselines: "The Algorithm", e.g. DeepQ

Slide 24

Slide 24 text

CartPole agent example

    import gym
    from baselines import deepq

    env = gym.make("CartPole-v0")
    act = deepq.load("cartpole_model.pkl")  # load a previously trained DeepQ model

    while True:
        obs, done = env.reset(), False
        episode_rew = 0
        while not done:
            env.render()
            # obs[None] adds a batch dimension; [0] takes the single chosen action
            obs, rew, done, _ = env.step(act(obs[None])[0])
            episode_rew += rew
        print("Episode reward", episode_rew)

https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/enjoy_cartpole.py

Slide 25

Slide 25 text

Summary

Slide 26

Slide 26 text

Goals review
• Why reinforcement learning? Python & Decision making applications (Robotics - Make humanoid robot walk, Games - Defeat Go champion, Finance - Trading strategies)
• Understand basic concepts intuitively: machine learning, agents, actions, environment, reward, strategies, trial-and-error
• How to get started: OpenAI (gym, baselines), Cartpole example
• The Agent performs Actions; Actions affect the Environment; the Environment generates a Reward and an Observation / State, which are observed by the Agent
• Strategy - Goal: Select actions to maximize total future reward
• Model (of the environment) - Trial-and-error

Slide 27

Slide 27 text

Resources

Slide 28

Slide 28 text

Resources
• Reinforcement Learning course by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
• https://blog.acolyer.org/2017/11/17/mastering-the-game-of-go-without-human-knowledge/
• https://keon.io/deep-q-learning/
• https://rishav1.github.io/rein_learning/2017/01/05/simple-swarm-intelligence-optimization-for-cartpole-balancing-problem.html
• AlphaGo Zero's win, what it means, Fast Forward Labs: http://blog.fastforwardlabs.com/2017/10/25/alphago-zero.html

Slide 29

Slide 29 text

Thank you! @ch_doig Slides at: https://speakerdeck.com/chdoig