
RL PyTexas 2017


Reinforcement learning in Python, PyTexas 2017.

Christine Doig

November 19, 2017


Transcript

  1. Reinforcement learning
    in Python
    Christine Doig, PyTexas 2017


  2. Goals
    • Why reinforcement learning?
    • Understand basic concepts intuitively
    • How to get started (if you are interested in learning more about
    reinforcement learning after this talk!)
    Only Python!


  3. Agenda
    • What is Reinforcement Learning?
    • Python libraries for Reinforcement learning
    • Cartpole example in Python
    • Resources


  4. What is reinforcement learning?


  5. AlphaGo
    • Oct. 2015 - Beats a human professional Go player (v. Fan)
    • Mar. 2016 - Beats Lee Sedol (9-dan professional) in a five-game match (v. Lee)
    • May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
    • Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data.
    "Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.


  6. Reinforcement learning
    An area of machine learning inspired by behaviourist psychology,
    concerned with how software agents ought to take actions in an
    environment so as to maximize some notion of cumulative reward. [1]
    Like a human, our agents learn for themselves to achieve successful
    strategies that lead to the greatest long-term rewards. This paradigm
    of learning by trial-and-error, solely from rewards or punishments, is
    known as reinforcement learning (RL). [2]
    [1] https://en.wikipedia.org/wiki/Reinforcement_learning
    [2] https://deepmind.com/blog/deep-reinforcement-learning/


  7. Concepts
    • machine learning
    • agents
    • actions
    • environment
    • reward
    • strategies
    • trial-and-error


  8. Machine learning
    • Unsupervised learning (no labels): Clustering, Dimensionality reduction
      • Algorithms: K-means, Hierarchical clustering, PCA, t-SNE
    • Supervised learning (labels): Classification (categorical), Regression (quantitative)
      • Algorithms: Logistic Regression, SVM, Decision trees, k-NN, Linear Regression, Neural Networks
    • Reinforcement learning (reward): Model-free (Value-based - Policy-based), Model-based
      • Algorithms: Q-learning, Policy gradient, REINFORCE, Dynamic programming, MCTS, Dyna


  9. Machine learning
    • Unsupervised learning (no labels): Clustering, Dimensionality reduction
      • Purpose: Exploring
      • Applications: Market segmentation, Anomaly detection, Summarizing information
    • Supervised learning (labels): Classification (categorical), Regression (quantitative)
      • Purpose: Predicting
      • Applications: Spam detection, Object/face recognition, Recommender systems
    • Reinforcement learning (reward): Model-free, Model-based
      • Purpose: Decision making
      • Applications: Robotics - Make Humanoid robot walk, Games - Defeat Go champion, Finance - Trading strategies


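  10. Reinforcement learning concepts
    • The Agent performs Actions.
    • Actions affect the Environment.
    • The Environment generates a Reward.
    • The Reward is observed by the Agent.
    • The Agent also receives an Observation / State of the Environment.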
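
The loop on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: `CoinFlipEnvironment`, `RandomAgent`, and `run_episode` are hypothetical names, and the coin-guessing task is an assumption chosen only because it needs no extra libraries.

```python
import random


class CoinFlipEnvironment:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.7, episode_length=10, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps = 0
        return self.steps  # observation: number of steps taken so far

    def step(self, action):
        """The action affects the environment, which generates a reward."""
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1 if action == outcome else 0
        self.steps += 1
        done = self.steps >= self.episode_length
        return self.steps, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the observation and guesses at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([0, 1])


def run_episode(env, agent):
    """Agent performs actions; reward and observation flow back to the agent."""
    observation = env.reset()
    total_reward, done = 0, False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


print(run_episode(CoinFlipEnvironment(), RandomAgent()))
```

A learning agent would replace `RandomAgent.act` with something that uses the rewards it has seen so far; the loop itself stays the same.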


  11. Example: Trading
    • The Agent performs Actions: buys / sells stocks.
    • Actions affect the Environment: portfolio.
    • The Environment generates a Reward: win / lose money.
    • The Reward is observed by the Agent.


  12. Example: Go
    • The Agent performs Actions: make a move.
    • Actions affect the Environment: Go game.
    • The Environment generates a Reward: win / lose game.
    • The Reward is observed by the Agent.


  13. Reinforcement learning concepts
    • The Agent performs Actions.
    • Actions affect the Environment.
    • The Environment generates a Reward.
    • The Reward is observed by the Agent.
    • Strategy - Goal: Select actions to maximize total future reward
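
"Total future reward" can be made concrete with a small helper. The sketch below is an illustration, not something shown in the talk: the function name `discounted_return` and the discount factor `gamma` are assumptions. With `gamma=1.0` it is the plain sum of rewards; a smaller `gamma` weights near-term rewards more heavily, which is a common convention in the RL literature.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of future rewards, each discounted by gamma per time step.

    `gamma` is an assumption added for illustration; gamma=1.0 gives
    the undiscounted total reward.
    """
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


episode_rewards = [1, 1, 1]
print(discounted_return(episode_rewards))             # plain total
print(discounted_return(episode_rewards, gamma=0.5))  # discounted total
```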


  14. Reinforcement learning concepts
    • The Agent performs Actions.
    • Actions affect the Environment.
    • The Environment generates a Reward.
    • The Reward is observed by the Agent.
    • Strategy
    • Model (of the environment) - Trial-and-error
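
One simple way to picture trial-and-error learning is an epsilon-greedy agent on a multi-armed bandit. This is a swapped-in textbook technique, not the method from the talk; the function name, the reward probabilities, and the parameter values are all assumptions made for illustration.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from observed rewards.

    reward_probs[a] is the (hidden) chance that action `a` pays 1.
    The agent never reads it directly; it only sees sampled rewards.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions        # how often each action was tried
    estimates = [0.0] * n_actions   # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # trial: explore at random
        else:
            # exploit the action that currently looks best
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # incremental update of the running average
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.8])
print(estimates, counts)
```

The agent starts with no model of the environment and ends up preferring the better action only because its tries were rewarded more often.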


  15. Python libraries for reinforcement learning


  16. Python libraries for Reinforcement learning
    • OpenAI
      • Gym: Toolkit for developing and comparing reinforcement learning algorithms, MIT License, Last commit: November 2017
      • baselines: high-quality implementations of reinforcement learning algorithms, MIT License, Last commit: November 2017
    • TensorForce, a TensorFlow library for applied reinforcement learning, Apache 2 License, Last commit: November 2017
    • DeepRL, highly modularized implementation of popular deep RL algorithms in PyTorch, Apache 2 License, Last commit: November 2017
    • RLlab, a framework for developing and evaluating reinforcement learning algorithms, MIT License, Last commit: July 2017
    • AgentNet, Python library for deep reinforcement learning using Theano+Lasagne, MIT License, Last commit: August 2017
    • RLPy, the Reinforcement Learning Library for Education and Research, 3-Clause BSD License, Last commit: April 2016
    • PyBrain, the Python Machine Learning Library, 3-Clause BSD License, Last commit: March 2016


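  17. OpenAI libraries: gym and baselines
    • The Agent performs Actions.
    • Actions affect the Environment.
    • The Environment generates a Reward and a State / Observation.
    • The Reward is observed by the Agent.
    • Gym: the Environment side
    • Baselines: the Agent side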


  18. Gym: environments
    • The Agent performs Actions; the Reward and State / Observation are observed by the Agent.
    • Gym provides the Environment, described by:
      • action_space: The Space object corresponding to valid actions
      • observation_space: The Space object corresponding to valid observations
      • reward_range: A tuple corresponding to the min and max possible rewards
    • Baselines provides the Agent.
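
To show how these three attributes fit together, here is a sketch of an environment that exposes them. It does not import Gym: `Discrete` is a simplified stand-in written for this example (loosely modeled on the idea of a discrete space), and `ToyCorridorEnv` is a hypothetical environment. Only the attribute names `action_space`, `observation_space`, and `reward_range` come from the slide.

```python
import random


class Discrete:
    """Simplified stand-in for a discrete space: the integers 0..n-1."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        """Return a random valid element of the space."""
        return self._rng.randrange(self.n)

    def contains(self, x):
        """Check whether x is a valid element of the space."""
        return isinstance(x, int) and 0 <= x < self.n


class ToyCorridorEnv:
    """Hypothetical environment: walk a 5-cell corridor to the right end."""

    def __init__(self):
        self.action_space = Discrete(2, seed=0)  # 0 = left, 1 = right
        self.observation_space = Discrete(5)     # position 0..4
        self.reward_range = (0.0, 1.0)           # (min, max) possible reward
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        assert self.action_space.contains(action), "invalid action"
        move = 1 if action == 1 else -1
        self.position = min(4, max(0, self.position + move))
        done = self.position == 4
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


env = ToyCorridorEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
print("reached position", obs, "with final reward", reward)
```

Because the agent only touches `reset`, `step`, and the space objects, the same agent code can in principle be pointed at any environment that follows this shape.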


  19. Cartpole example in Python


  20. The Cartpole environment
    Goal: Keep pole vertical


  21. CartPole environment example
    Gym provides the Environment, with its Actions, Reward, and State / Observation:
    • action_space: left, right
    • observation_space: x, x_dot, theta, theta_dot
    • reward_range:
      • 1 - pole hasn’t fallen
      • 0 - pole has fallen
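
The four observation values are enough to write a first hand-coded strategy. The sketch below is an illustration under assumptions, not part of the talk: the names `lean_policy` and `has_fallen` are hypothetical, the action encoding (0 = left, 1 = right) and the failure limits (cart position beyond 2.4, pole angle beyond 12 degrees) reflect my understanding of CartPole-v0 and should be checked against the Gym source.

```python
import math

LEFT, RIGHT = 0, 1                 # assumed action encoding for CartPole-v0
X_LIMIT = 2.4                      # assumed cart position limit
THETA_LIMIT = 12 * math.pi / 180   # assumed pole angle limit, in radians


def lean_policy(observation):
    """Push the cart toward the side the pole is leaning to."""
    x, x_dot, theta, theta_dot = observation
    return RIGHT if theta > 0 else LEFT


def has_fallen(observation):
    """True once the cart leaves the track or the pole tilts too far."""
    x, _, theta, _ = observation
    return abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT


# A pole leaning slightly to the right: push right, episode still running.
observation = (0.0, 0.0, 0.05, 0.0)
print(lean_policy(observation), has_fallen(observation))
```

A rule like this keeps the pole up only briefly, which is the gap a learned policy is meant to close.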


  22. CartPole environment example
    import gym
    env = gym.make('CartPole-v0')  # create the CartPole environment
    env.reset()  # start a new episode
    for _ in range(1000):
        env.render()  # draw the current state
        env.step(env.action_space.sample())  # take a random action
    https://gym.openai.com/docs/


  23. CartPole agent example
    • OpenAI Gym provides the Environment, with its Actions, Reward, and State / Observation:
      • action_space: left, right
      • observation_space: x, x_dot, theta, theta_dot
      • reward_range:
        • 1 - pole hasn’t fallen
        • 0 - pole has fallen
    • Baselines provides the Agent: “The Algorithm”, e.g. DeepQ. Its Actions affect the Environment.


  24. CartPole agent example
    import gym
    from baselines import deepq
    env = gym.make("CartPole-v0")
    act = deepq.load("cartpole_model.pkl")  # load a previously trained DeepQ policy
    while True:
        obs, done = env.reset(), False  # start a new episode
        episode_rew = 0
        while not done:
            env.render()
            # act() expects a batch, so add a batch axis and take the first action
            obs, rew, done, _ = env.step(act(obs[None])[0])
            episode_rew += rew
        print("Episode reward", episode_rew)
    https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/enjoy_cartpole.py


  25. Summary


  26. Goals review
    • Why reinforcement learning? Python & Decision making applications (Robotics - Make Humanoid robot walk, Games - Defeat Go champion, Finance - Trading strategies)
    • Understand basic concepts intuitively
      • machine learning
      • agents
      • actions
      • environment
      • reward
      • strategies
      • trial-and-error
    • How to get started:
      • OpenAI: gym, baselines
      • Cartpole example
    Reinforcement learning concepts:
    • The Agent performs Actions.
    • Actions affect the Environment.
    • The Environment generates a Reward.
    • The Reward is observed by the Agent.
    • The Agent also receives an Observation / State of the Environment.
    • Strategy - Goal: Select actions to maximize total future reward
    • Model (of the environment) - Trial-and-error


  27. Resources


  28. Resources
    • Reinforcement Learning course by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
    • https://blog.acolyer.org/2017/11/17/mastering-the-game-of-go-without-human-knowledge/
    • https://keon.io/deep-q-learning/
    • https://rishav1.github.io/rein_learning/2017/01/05/simple-swarm-intelligence-optimization-for-cartpole-balancing-problem.html
    • AlphaGo Zero's win, what it means, Fast Forward Labs: http://blog.fastforwardlabs.com/2017/10/25/alphago-zero.html


  29. Thank you! @ch_doig
    Slides at:
    https://speakerdeck.com/chdoig
