RL PyTexas 2017

Reinforcement learning in Python, PyTexas 2017.

Christine Doig

November 19, 2017

Transcript

  1. Goals • Why reinforcement learning? • Understand basic concepts intuitively

    • How to get started (if you are interested in learning more about reinforcement learning after this talk!) Only Python!
  2. Agenda • What is Reinforcement Learning? • Python libraries for

    Reinforcement learning • Cartpole example in Python • Resources
  3. AlphaGo

    • Oct. 2015 - Beats human professional Go player (v. Fan)
    • Mar. 2016 - Beats Lee Sedol (9-dan professional) in five-game match (v. Lee)
    • May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
    • Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data
    "Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.
  4. Reinforcement learning

    • An area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. [1]
    • Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL). [2]
    [1] https://en.wikipedia.org/wiki/Reinforcement_learning
    [2] https://deepmind.com/blog/deep-reinforcement-learning/
  5. Concepts • machine learning • agents • actions • environment

    • reward • strategies • trial-and-error
  6. Machine learning: ALGORITHMS

    • Unsupervised learning (no labels) • Clustering: K-means, Hierarchical clustering • Dimensionality reduction: PCA, t-SNE
    • Supervised learning (labels) • Classification (categorical) • Regression (quantitative) • Algorithms: Logistic Regression, SVM, Decision trees, k-NN, Linear Regression, Neural Networks
    • Reinforcement learning (reward) • Model-free: Value-based (Q-learning), Policy-based (Policy gradient, REINFORCE) • Model-based: Dynamic programming, MCTS, Dyna
  7. Machine learning: APPLICATIONS

    • Unsupervised learning (no labels), Exploring: Clustering, Dimensionality reduction • Market segmentation • Anomaly detection • Summarizing information
    • Supervised learning (labels), Predicting: Classification (categorical), Regression (quantitative) • Spam detection • Object/face recognition • Recommender systems
    • Reinforcement learning (reward), Decision making: Model-free, Model-based • Robotics - Make Humanoid robot walk • Games - Defeat Go champion • Finance - Trading strategies
  8. Example: Trading

    • Agent performs Actions: buys / sells stocks
    • Actions affect the Environment: portfolio
    • Environment generates the Reward: win / lose money
    • Reward is observed by the Agent
  9. Example: Go

    • Agent performs Actions: Make a move
    • Actions affect the Environment: Go game
    • Environment generates the Reward: win / lose game
    • Reward is observed by the Agent
  10. Reinforcement learning concepts

    • Agent (Strategy) performs Actions
    • Actions affect the Environment
    • Environment generates the Reward
    • Reward is observed by the Agent
    • Goal: Select actions to maximize total future reward
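"Total future reward" can be made concrete with a few lines of Python. The sketch below sums the rewards of one episode; the discount factor `gamma` is an assumption that does not appear on the slide (with `gamma=1.0` it is the plain total), and the function name and reward values are illustrative only.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum rewards, weighting the reward k steps ahead by gamma**k.

    gamma is an assumed discount factor; gamma=1.0 gives the plain total.
    """
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Illustrative rewards from a three-step episode.
episode_rewards = [1.0, 1.0, 1.0]
undiscounted = discounted_return(episode_rewards, gamma=1.0)
discounted = discounted_return(episode_rewards, gamma=0.5)
print(undiscounted, discounted)
```

A smaller `gamma` makes the agent care more about immediate rewards than about distant ones.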
  11. Reinforcement learning concepts

    • Agent (Strategy) performs Actions
    • Actions affect the Environment
    • Environment generates the Reward
    • Reward is observed by the Agent
    • Model (of the environment) - Trial-and-error
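Trial-and-error learning can be illustrated without any model of the environment. The sketch below is a two-action "bandit" with an epsilon-greedy strategy: mostly pick the action that has paid off best so far, sometimes try a random one. The win probabilities, step count, and `epsilon` value are made-up assumptions for illustration and are not from the talk.

```python
import random


def run_bandit(win_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off most often."""
    rng = random.Random(seed)
    n_actions = len(win_probabilities)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            # exploit: action with the best estimate so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment answers with a reward of 1 (win) or 0 (lose).
        reward = 1.0 if rng.random() < win_probabilities[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated win rates:", estimates, "best action:", best_action)
```

The agent is never told the win probabilities; it only observes rewards, which is the trial-and-error setting the slide describes.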
  12. Python libraries for Reinforcement learning

    • OpenAI Gym: Toolkit for developing and comparing reinforcement learning algorithms, MIT License, Last commit: November 2017
    • OpenAI baselines: high-quality implementations of reinforcement learning algorithms, MIT License, Last commit: November 2017
    • TensorForce: A TensorFlow library for applied reinforcement learning, Apache 2 License, Last commit: November 2017
    • DeepRL: Highly modularized implementation of popular deep RL algorithms in PyTorch, Apache 2 License, Last commit: November 2017
    • RLlab: a framework for developing and evaluating reinforcement learning algorithms, MIT License, Last commit: July 2017
    • AgentNet: Python library for deep reinforcement learning using Theano+Lasagne, MIT License, Last commit: August 2017
    • RLPy: the Reinforcement Learning Library for Education and Research, 3-Clause BSD License, Last commit: April 2016
    • PyBrain: the Python Machine Learning Library, 3-Clause BSD License, Last commit: March 2016
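  13. OpenAI libraries: gym and baselines

    • Agent performs Actions • Actions affect the Environment • Environment generates the Reward • Reward and State / Observation are observed by the Agent
    • Gym: Environment, Reward, State / Observation
    • Baselines: Agent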
  14. Gym: environments

    • Gym: Environment, Actions, Reward, State / Observation
    • action_space: The Space object corresponding to valid actions
    • observation_space: The Space object corresponding to valid observations
    • reward_range: A tuple corresponding to the min and max possible rewards
    • Baselines: Agent
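To show how these three attributes fit together, here is a minimal sketch of a Gym-like environment written in plain Python, so it runs without installing Gym. `DiscreteSpace` and `CoinGuessEnv` are hypothetical stand-ins invented for this example, not Gym classes; only the attribute names and the `reset` / `step` shape (with `step` returning observation, reward, done, info) mirror the interface used in this talk.

```python
import random


class DiscreteSpace:
    """Hypothetical stand-in for a Gym space with n discrete values."""

    def __init__(self, n, seed=0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        """Return a random valid value, like env.action_space.sample()."""
        return self._rng.randrange(self.n)


class CoinGuessEnv:
    """Toy environment: guess each coin flip; an episode lasts 10 steps."""

    reward_range = (0.0, 1.0)  # (min, max) possible reward per step

    def __init__(self, seed=0):
        self.action_space = DiscreteSpace(2, seed)       # guess 0 or 1
        self.observation_space = DiscreteSpace(2, seed)  # last coin shown
        self._rng = random.Random(seed + 1)
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self._rng.randrange(2)  # initial observation

    def step(self, action):
        coin = self._rng.randrange(2)
        reward = 1.0 if action == coin else 0.0
        self._steps += 1
        done = self._steps >= 10
        return coin, reward, done, {}  # observation, reward, done, info


env = CoinGuessEnv()
obs, done, episode_reward = env.reset(), False, 0.0
while not done:
    action = env.action_space.sample()  # random agent
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print("episode reward:", episode_reward)
```

The loop at the bottom is the agent side of the diagram: it picks an action, the environment returns an observation and a reward, and the episode ends when `done` is true.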
  15. CartPole environment example

    • Gym: Environment, Actions, Reward, State / Observation
    • action_space: left, right
    • observation_space: x, x_dot, theta, theta_dot
    • reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
  16. CartPole environment example (https://gym.openai.com/docs/)

        import gym
        env = gym.make('CartPole-v0')
        env.reset()
        for _ in range(1000):
            env.render()
            env.step(env.action_space.sample())  # take a random action
  17. CartPole agent example

    • OpenAI Gym (Environment, Actions, Reward, State / Observation) • action_space: left, right • observation_space: x, x_dot, theta, theta_dot • reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
    • Baselines (Agent): "The Algorithm", e.g. DeepQ
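As a rough idea of what can sit in "The Algorithm" box, here is a sketch of tabular Q-learning, the simpler table-based relative of the deep Q-learning that DeepQ is named after. This is not how baselines implements DeepQ: the five-state corridor environment, the hyperparameters, and all function names below are assumptions made up for illustration.

```python
import random

N_STATES = 5      # corridor states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Hypothetical corridor environment: reward 1 only at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when the values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy strategy for the non-terminal states: the action with the higher value.
greedy_policy = [0 if row[0] > row[1] else 1 for row in q_table[:-1]]
print("greedy policy:", greedy_policy)
```

After training, the greedy policy should move right in every state, since that is the only way to reach the rewarded goal. Deep Q-learning replaces the table with a neural network so that continuous observations such as CartPole's can be handled.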
  18. CartPole agent example (https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/enjoy_cartpole.py)

        import gym
        from baselines import deepq

        env = gym.make("CartPole-v0")
        act = deepq.load("cartpole_model.pkl")  # load a previously trained agent

        while True:
            obs, done = env.reset(), False  # start a new episode
            episode_rew = 0
            while not done:
                env.render()
                # the agent picks an action from the current observation
                obs, rew, done, _ = env.step(act(obs[None])[0])
                episode_rew += rew
  19. Goals review

    • Why reinforcement learning? Python & Decision making applications (Robotics - Make Humanoid robot walk, Games - Defeat Go champion, Finance - Trading strategies)
    • Understand basic concepts intuitively • machine learning • agents • actions • environment • reward • strategies • trial-and-error
    • How to get started: • OpenAI: gym, baselines • Cartpole example
    • Agent (Strategy) performs Actions • Actions affect the Environment • Environment generates the Reward • Reward and Observation / State are observed by the Agent
    • Goal: Select actions to maximize total future reward
    • Model (of the environment) - Trial-and-error
  20. Resources

    • Reinforcement Learning course by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
    • https://blog.acolyer.org/2017/11/17/mastering-the-game-of-go-without-human-knowledge/
    • https://keon.io/deep-q-learning/
    • https://rishav1.github.io/rein_learning/2017/01/05/simple-swarm-intelligence-optimization-for-cartpole-balancing-problem.html
    • AlphaGo Zero's win, what it means, Fast Forward Labs: http://blog.fastforwardlabs.com/2017/10/25/alphago-zero.html