RL PyTexas 2017

Reinforcement learning in Python Christine Doig, PyTexas 2017

Goals
• Why reinforcement learning?
• Understand basic concepts intuitively
• How to get started (if you are interested in learning more about reinforcement learning after this talk!)
Only Python!

Agenda
• What is reinforcement learning?
• Python libraries for reinforcement learning
• Cartpole example in Python
• Resources

What is reinforcement learning?

AlphaGo
• Oct. 2015 - Beats a human professional Go player (v. Fan)
• Mar. 2016 - Beats Lee Sedol (9-dan professional) in a five-game match (v. Lee)
• May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
• Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data
"Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.

Reinforcement learning
An area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. [1]
Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL). [2]
[1] https://en.wikipedia.org/wiki/Reinforcement_learning
[2] https://deepmind.com/blog/deep-reinforcement-learning/

Concepts
• machine learning
• agents
• actions
• environment
• reward
• strategies
• trial-and-error

Machine learning: algorithms
• Unsupervised learning (no labels)
  • Clustering: K-means, Hierarchical clustering
  • Dimensionality reduction: PCA, t-SNE
• Supervised learning (labels)
  • Classification (categorical) and Regression (quantitative): Logistic Regression, SVM, Decision trees, k-NN, Linear Regression, Neural Networks
• Reinforcement learning (reward)
  • Model-free, value-based: Q-learning
  • Model-free, policy-based: Policy gradient, REINFORCE
  • Model-based: Dynamic programming, MCTS, Dyna

Machine learning: applications
• Unsupervised learning (no labels): Exploring
  • Clustering, Dimensionality reduction
  • Market segmentation, Anomaly detection, Summarizing information
• Supervised learning (labels): Predicting
  • Classification (categorical), Regression (quantitative)
  • Spam detection, Object/face recognition, Recommender systems
• Reinforcement learning (reward): Decision making
  • Model-free, Model-based
  • Robotics - Make humanoid robot walk
  • Games - Defeat Go champion
  • Finance - Trading strategies

Reinforcement learning concepts
• The Agent performs Actions
• Actions affect the Environment
• The Environment generates a Reward
• The Reward and the Observation / State are observed by the Agent
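The loop on this slide can be sketched in a few lines of plain Python. This is only an illustration: LineWorld and RandomAgent are hypothetical names invented here, not part of any library mentioned in the talk, and the agent has no strategy yet.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4, reaching 4 ends the episode."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation / state

    def step(self, action):
        # The action (-1 = left, +1 = right) affects the environment.
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else 0.0  # the environment generates the reward
        return self.position, reward, done


class RandomAgent:
    """Hypothetical agent without a strategy: it picks actions at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=1000):
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(observation)               # agent performs an action
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward                        # reward is observed by the agent
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(LineWorld(), RandomAgent(seed=0)))
```

The same shape (reset, then repeated act and step) is what the Gym examples later in the deck follow.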

Example: Trading
• Agent performs Actions: buys / sells
• Actions affect the Environment: stocks portfolio
• Environment generates a Reward (observed by the Agent): win / lose money

Example: Go
• Agent performs Actions: make a move
• Actions affect the Environment: Go game
• Environment generates a Reward (observed by the Agent): win / lose game

Reinforcement learning concepts
• The Agent performs Actions, which affect the Environment
• The Environment generates a Reward, which is observed by the Agent
• Strategy
• Goal: Select actions to maximize total future reward
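"Total future reward" can be made concrete with a small helper. The sketch below sums the rewards of one episode; the discount factor gamma is an assumption added here (the slide does not mention discounting), and with the default gamma=1.0 the function is a plain sum.

```python
def total_future_reward(rewards, gamma=1.0):
    """Sum of rewards from the first step onward.

    gamma is an assumed, optional discount factor: with gamma < 1,
    rewards further in the future count for less.
    """
    total = 0.0
    # Work backwards so each step adds its reward to the discounted remainder.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    print(total_future_reward([1, 1, 1]))             # plain sum
    print(total_future_reward([1, 1, 1], gamma=0.5))  # later rewards count less
```

For three rewards of 1, the undiscounted total is 3.0, and with gamma=0.5 it is 1 + 0.5 + 0.25 = 1.75.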

Reinforcement learning concepts
• The Agent performs Actions, which affect the Environment
• The Environment generates a Reward, which is observed by the Agent
• Strategy: Model (of the environment) - Trial-and-error
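One simple way to picture trial-and-error is an agent that mostly repeats the action that has paid off best so far and occasionally tries another one. The sketch below uses an epsilon-greedy rule on a two-option problem; epsilon-greedy is a technique swapped in here for illustration and is not named in the talk, and the payout probabilities are made-up toy values.

```python
import random


def trial_and_error(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate the average reward of each action purely from experience."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)
    estimates = [0.0] * len(payout_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(payout_probs))
        else:
            # Exploit: repeat the action with the best estimate so far.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # Toy environment: reward 1 with the action's (hidden) probability.
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


if __name__ == "__main__":
    estimates = trial_and_error([0.2, 0.8])
    print("best action:", estimates.index(max(estimates)))
```

The agent never sees payout_probs directly; it only learns from the rewards it receives, which is the sense in which no model of the environment is used.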

Python libraries for reinforcement learning

• OpenAI
  • Gym: Toolkit for developing and comparing reinforcement learning algorithms. MIT License. Last commit: November 2017
  • baselines: High-quality implementations of reinforcement learning algorithms. MIT License. Last commit: November 2017
• TensorForce: A TensorFlow library for applied reinforcement learning. Apache 2 License. Last commit: November 2017
• DeepRL: Highly modularized implementation of popular deep RL algorithms in PyTorch. Apache 2 License. Last commit: November 2017
• RLlab: A framework for developing and evaluating reinforcement learning algorithms. MIT License. Last commit: July 2017
• AgentNet: Python library for deep reinforcement learning using Theano+Lasagne. MIT License. Last commit: August 2017
• RLPy: The Reinforcement Learning Library for Education and Research. 3-Clause BSD License. Last commit: April 2016
• PyBrain: The Python Machine Learning Library. 3-Clause BSD License. Last commit: March 2016

OpenAI libraries: gym and baselines
• Gym: the Environment, which generates the Reward and the State / Observation
• Baselines: the Agent, which performs the Actions that affect the Environment

Gym: environments
• action_space: The Space object corresponding to valid actions
• observation_space: The Space object corresponding to valid observations
• reward_range: A tuple corresponding to the min and max possible rewards

Cartpole example in Python

The Cartpole environment
Goal: Keep pole vertical

CartPole environment example (Gym)
• action_space: left, right
• observation_space: x, x_dot, theta, theta_dot
• reward_range: 1 - pole hasn't fallen, 0 - pole has fallen

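CartPole environment example (https://gym.openai.com/docs/)

    import gym

    env = gym.make('CartPole-v0')  # create the CartPole environment
    env.reset()                    # put the environment in an initial state
    for _ in range(1000):
        env.render()                         # draw the current state
        env.step(env.action_space.sample())  # take a random action
    env.close()                    # release the render window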
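The random-action loop ignores what step() returns. To show how the observation (x, x_dot, theta, theta_dot), the reward and the end of an episode fit together, here is a self-contained sketch that runs without Gym installed. ToyCartPole is a hypothetical stand-in written for this example, not Gym's code: its physics constants and limits are assumptions based on the classic cart-pole formulation, and its reward follows the slide's description (1 while the pole is up, 0 once it has fallen).

```python
import math
import random


class ToyCartPole:
    """Hypothetical stand-in for a CartPole environment (not Gym's implementation)."""

    GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1   # assumed constants
    HALF_LENGTH, FORCE, DT = 0.5, 10.0, 0.02        # assumed constants
    X_LIMIT, THETA_LIMIT = 2.4, math.radians(12)    # assumed failure limits

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        # Start near upright with small random perturbations.
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        return tuple(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = self.FORCE if action == 1 else -self.FORCE  # 0 = left, 1 = right
        total_mass = self.CART_MASS + self.POLE_MASS
        pole_ml = self.POLE_MASS * self.HALF_LENGTH
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.GRAVITY * sin_t - cos_t * temp) / (
            self.HALF_LENGTH * (4.0 / 3.0 - self.POLE_MASS * cos_t ** 2 / total_mass)
        )
        x_acc = temp - pole_ml * theta_acc * cos_t / total_mass
        # Simple Euler integration over one time step.
        x += self.DT * x_dot
        x_dot += self.DT * x_acc
        theta += self.DT * theta_dot
        theta_dot += self.DT * theta_acc
        self.state = [x, x_dot, theta, theta_dot]
        done = abs(x) > self.X_LIMIT or abs(theta) > self.THETA_LIMIT
        reward = 0.0 if done else 1.0  # per the slide: 1 = pole up, 0 = pole fallen
        return tuple(self.state), reward, done, {}


def run_random_episode(env, rng, max_steps=200):
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.choice([0, 1])            # random action, as in the loop above
        _, reward, done, _ = env.step(action)  # observation, reward, done, info
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_random_episode(ToyCartPole(seed=1), random.Random(1)))
```

The total reward is roughly the number of steps the pole stayed up, which is the quantity an agent would try to maximize.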

CartPole agent example
• Environment (OpenAI Gym)
  • action_space: left, right
  • observation_space: x, x_dot, theta, theta_dot
  • reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
• Agent (Baselines): "The Algorithm", e.g. DeepQ
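For intuition about what "The Algorithm" does, the sketch below shows the tabular Q-learning update, the value-based method that deep Q-learning builds on. It is not the Baselines implementation: DeepQ replaces the table with a neural network, and the function name, the default alpha and gamma, and the dictionary layout are all assumptions made for this example.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      actions=(0, 1), alpha=0.1, gamma=0.99):
    """Apply one Q-learning update to q_table[(state, action)] and return it.

    alpha (learning rate) and gamma (discount factor) are assumed defaults.
    """
    # Value of the best action available in the next state (0 if the episode ended).
    best_next = 0.0 if done else max(
        q_table.get((next_state, a), 0.0) for a in actions
    )
    target = reward + gamma * best_next
    old_value = q_table.get((state, action), 0.0)
    # Move the old estimate a fraction alpha of the way toward the target.
    q_table[(state, action)] = old_value + alpha * (target - old_value)
    return q_table[(state, action)]


if __name__ == "__main__":
    q = {("s1", 0): 1.0, ("s1", 1): 2.0}
    print(q_learning_update(q, "s0", 1, 1.0, "s1", False, alpha=0.5, gamma=0.9))
```

In the usage above the target is 1.0 + 0.9 * 2.0 = 2.8, so the new estimate is 0.5 * 2.8 = 1.4. CartPole observations are continuous, so a table like this would need them to be discretized first; that is one reason to use a network instead.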

CartPole agent example (https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/enjoy_cartpole.py)

    import gym
    from baselines import deepq

    env = gym.make("CartPole-v0")
    act = deepq.load("cartpole_model.pkl")  # load a previously trained agent
    while True:
        obs, done = env.reset(), False      # start a new episode
        episode_rew = 0
        while not done:
            env.render()
            # Ask the agent for an action and apply it to the environment.
            obs, rew, done, _ = env.step(act(obs[None])[0])
            episode_rew += rew              # accumulate the episode's reward

Summary

Goals review
• Why reinforcement learning? Python & Decision making applications
  • Robotics - Make humanoid robot walk
  • Games - Defeat Go champion
  • Finance - Trading strategies
• Understand basic concepts intuitively
  • machine learning, agents, actions, environment, reward, strategies, trial-and-error
  • The Agent performs Actions, which affect the Environment; the Environment generates a Reward, which is observed by the Agent together with the Observation / State
  • Strategy: Model (of the environment) - Trial-and-error
  • Goal: Select actions to maximize total future reward
• How to get started
  • OpenAI: gym, baselines
  • Cartpole example

Resources

• Reinforcement Learning course by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
• https://blog.acolyer.org/2017/11/17/mastering-the-game-of-go-without-human-knowledge/
• https://keon.io/deep-q-learning/
• https://rishav1.github.io/rein_learning/2017/01/05/simple-swarm-intelligence-optimization-for-cartpole-balancing-problem.html
• AlphaGo Zero's win, what it means, Fast Forward Labs: http://blog.fastforwardlabs.com/2017/10/25/alphago-zero.html

Thank you! @ch_doig Slides at: https://speakerdeck.com/chdoig
