Reinforcement Learning

forLoop
August 22, 2016

Tomiwa Ijaware showed the forLoop machine learning attendees what reinforcement learning is.

Transcript

  1. Reinforcement Learning: First steps with Q-Learning

  2. Who am I?
    • Tomiwa Ijaware
    • Software Engineer (Konga)
    • Udacity ML Nanodegree student
    • e911miri@gmail.com
  3. What is reinforcement learning?
    • It lets machines and software agents automatically determine the ideal behaviour within a specific context in order to maximize their performance.
    • It is about learning how to map situations to actions so as to maximize a numerical reward signal.
    • It studies how software agents should act to maximize cumulative reward.
  4. Two notable mentions

  5. AlphaGo
    • Achieved a 99.8% win rate against other Go programs
    • Combined deep neural networks with reinforcement learning
    • Defeated a human professional (Lee Sedol) 4-1
  6. Google Self-Driving Car
    • Over 1.5 million miles driven by a trained model
    • It uses real sensors to tell where it is and what is around it
    • It reads what to do next from its navigation planner
  7. Where does it shine?
    • Robotics
    • Control systems
    • Operations research
    • Games
    • Economics
  8. Q-Learning

  9. What is Q-learning?
    Q-learning is a model-free reinforcement learning technique. It can be used to find an optimal action-selection policy for any given finite Markov decision process (MDP).
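For reference, the standard one-step Q-learning update from the textbook literature (sketched here, not reproduced from the slides) is shown below. After taking action $a$ in state $s$ and observing reward $r$ and next state $s'$, the agent moves its estimate toward the observed reward plus the discounted best value of $s'$. It never needs the MDP's transition probabilities, which is what "model-free" means. $\alpha$ is the learning rate and $\gamma$ the discount factor.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```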
  10. A few terms: some theory to get us started

  11. States
    • A state represents information about the agent's position in its environment.
    • A good state contains the location of the agent and its observable environment.
    • In a game of tic-tac-toe, a state would be all the Xs and Os on the board.
    • In a car, a state could contain traffic information, the direction on Google Maps, traffic lights and more.
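A minimal Python sketch of the tic-tac-toe state, assuming the board is stored as a 9-cell tuple read row by row. The encoding and the helper names `board_to_state` and `legal_actions` are illustrative assumptions, not taken from the talk's repository. A tuple is hashable, so it can serve directly as a Q-table key.

```python
from typing import List, Tuple

# Assumed encoding: 9 cells read row by row, each "X", "O" or " " (empty).
Board = Tuple[str, ...]


def board_to_state(rows: List[str]) -> Board:
    """Flatten three 3-character row strings into an immutable 9-cell tuple."""
    if len(rows) != 3 or any(len(r) != 3 for r in rows):
        raise ValueError("expected three rows of three cells")
    cells = tuple("".join(rows))
    if any(c not in "XO " for c in cells):
        raise ValueError("cells must be 'X', 'O' or ' '")
    return cells


def legal_actions(state: Board) -> List[int]:
    """Hypothetical action set: the indices (0-8) of the empty cells."""
    return [i for i, c in enumerate(state) if c == " "]


state = board_to_state(["X O", " X ", "  O"])
print(legal_actions(state))  # [1, 3, 5, 6, 7]
```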
  12. Action
    • This is a step you take.
    • It leads you to another state.
    • It gives you a reward.
  13. Reward
    • A numerical value representing what you get for taking an action.
    • It can be positive or negative.
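  14. Long-term reward
    • The cumulative reward for taking action a in state s, averaged over the number of times a was taken from s
    • $Q(s, a) = \frac{1}{t} \sum_{i=1}^{t} R_i(s, a)$, where $R_i(s, a)$ is the cumulative reward observed the i-th time a was taken from s
    • t is the number of times a was taken from s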
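This averaging definition can be maintained incrementally with a running total and a visit count per (state, action) pair. The sketch below is illustrative: the class name `AverageQ` and the choice to report 0.0 for pairs that were never tried are assumptions.

```python
from collections import defaultdict


class AverageQ:
    """Q(s, a) as the mean of the returns observed after taking a in s."""

    def __init__(self) -> None:
        self.total = defaultdict(float)  # sum of observed returns per (s, a)
        self.count = defaultdict(int)    # t: times a was taken from s

    def record(self, state, action, ret: float) -> None:
        """Add one observed cumulative reward for (state, action)."""
        self.total[(state, action)] += ret
        self.count[(state, action)] += 1

    def q(self, state, action) -> float:
        """Average return; assumed 0.0 for pairs never tried."""
        t = self.count[(state, action)]
        return self.total[(state, action)] / t if t else 0.0


est = AverageQ()
for ret in (1.0, -1.0, 1.0, 1.0):
    est.record("s0", "hit", ret)
print(est.q("s0", "hit"))  # 0.5
```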
  15. Optimal action
    • The action that gives the most long-term reward
    • For example, obeying traffic rules means that LASTMA (the Lagos State Traffic Management Authority) does not catch you
  16. Policies
    • A policy tells the agent which action to take in each state; an optimal policy picks the optimal action in every state.
    • If we were training a car to drive in Lagos, the policy would be its personal rule book for surviving Lagos traffic.
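Given a table of long-term reward estimates, choosing the optimal action and building a policy are both simple lookups. This sketch assumes a dictionary keyed by `(state, action)`; the traffic states and their values are made up for illustration, and `optimal_action` and `greedy_policy` are hypothetical helper names.

```python
from typing import Dict, Hashable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def optimal_action(q: QTable, state, actions):
    """Action with the highest long-term reward estimate in `state`.

    Unseen (state, action) pairs count as 0.0; ties go to the first action listed.
    """
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def greedy_policy(q: QTable, states, actions) -> dict:
    """A policy as a lookup table: the optimal action for every state."""
    return {s: optimal_action(q, s, actions) for s in states}


q = {
    ("red_light", "stop"): 1.0,
    ("red_light", "go"): -5.0,  # hypothetical penalty for getting caught
    ("green_light", "stop"): -0.5,
    ("green_light", "go"): 1.0,
}
print(greedy_policy(q, ["red_light", "green_light"], ["stop", "go"]))
# {'red_light': 'stop', 'green_light': 'go'}
```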
  17. Some code: https://github.com/e911miri/forloop-qlearning

  18. Consider a game of blackjack
    • Player's hand: 10 + 2 = 12
    • Dealer's hand: 3 + 10 = 13
    • Should I hit or stand?
  19. Let's model that problem as a Markov decision process
    • States = my hand and the dealer's hand
    • Actions = hit or stand
    • Rewards = 1 for a win, 0 for a draw, -1 for a loss
    • The ideal policy should tell me what to do in any given state.
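A simplified Python version of this MDP, using the slide's states, actions and rewards. It assumes an infinite deck, counts aces as 1, and has the dealer draw until reaching 17 once the player stands; real blackjack details (soft aces, a hidden dealer card) are deliberately left out. `step`, `outcome` and `draw_card` are hypothetical names.

```python
import random
from typing import Tuple

State = Tuple[int, int]  # (player's total, dealer's total)
ACTIONS = ("hit", "stand")


def outcome(player: int, dealer: int) -> int:
    """Reward from the slide: 1 for a win, 0 for a draw, -1 for a loss."""
    if player > 21:
        return -1
    if dealer > 21 or player > dealer:
        return 1
    return 0 if player == dealer else -1


def draw_card(rng: random.Random) -> int:
    """Assumed infinite deck: 2-10 at face value, J/Q/K count 10, ace counts 1."""
    return min(rng.randint(1, 13), 10)


def step(state: State, action: str, rng: random.Random) -> Tuple[State, int, bool]:
    """Apply one action and return (next_state, reward, done)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    player, dealer = state
    if action == "hit":
        player += draw_card(rng)
        if player > 21:
            return (player, dealer), -1, True  # bust ends the hand
        return (player, dealer), 0, False      # no reward until the hand ends
    # Stand: assumed dealer rule, draw until the total is at least 17.
    while dealer < 17:
        dealer += draw_card(rng)
    return (player, dealer), outcome(player, dealer), True


rng = random.Random(0)
print(step((12, 13), "hit", rng))
print(step((12, 13), "stand", rng))
```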
  20. Learning process

  21. To train it,

  22. Learning rate
    • This determines how much new information overrides old information.
    • It ranges from 0 to 1.
    • 0 means the agent learns nothing.
    • 1 means the agent considers only the most recent information.
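The learning rate α acts as the weight in a blend between the old estimate and the new information, the same blend used inside the standard Q-learning update. A tiny sketch; the function name `blend` is illustrative.

```python
def blend(old_estimate: float, new_information: float, alpha: float) -> float:
    """Mix old and new information with learning rate alpha in [0, 1].

    alpha = 0 keeps the old estimate (learn nothing);
    alpha = 1 replaces it entirely with the newest information.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1 - alpha) * old_estimate + alpha * new_information


for alpha in (0.0, 0.5, 1.0):
    print(alpha, blend(2.0, 10.0, alpha))
# 0.0 2.0
# 0.5 6.0
# 1.0 10.0
```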
  23. Discount factor
    • This regulates the importance of future rewards.
    • It is a value from 0 to 1.
    • 0 means the agent considers only current rewards.
    • Values nearing 1 make the agent strive for long-term rewards.
    • At 1 or beyond, the values may fail to converge, especially in tasks that never end.
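The sketch below shows the discount factor in two places: a `discounted_return` helper that weights a reward k steps ahead by γ^k, and a tabular Q-learning loop in which γ scales the value of the next state. The five-cell corridor, the ε-greedy exploration strategy and every parameter value (α = 0.5, γ = 0.9, ε = 0.1, 500 episodes) are assumptions chosen for illustration, not the setup used in the talk's repository.

```python
import random
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Sum of rewards, where a reward k steps in the future is weighted by gamma**k."""
    return sum(r * gamma**k for k, r in enumerate(rewards))


# Hypothetical toy task: a corridor of cells 0..4. The agent starts in cell 0;
# reaching cell 4 ends the episode with reward 1, every other step gives 0.
N_CELLS = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Environment dynamics; the agent only learns from what this returns."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Best-valued action in `state`, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: usually exploit, sometimes explore
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s, rng)
            s2, r, done = step(s, a)
            # a terminal state has no future reward to discount
            future = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s2
    return q


print(discounted_return([0, 0, 1], gamma=0.0))  # 0.0: only the current reward counts
print(discounted_return([0, 0, 1], gamma=0.9))  # about 0.81: the delayed reward still counts

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS - 1)}
print(policy)
```

Ties are broken at random in `greedy` because every Q-value starts at 0; always taking the first action would keep the agent pressing against the left wall before it ever sees a reward.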
  24. Q-learning in the real world: considering real-world challenges and how to deal with them
  25. References
    • Brandon's blog post: http://outlace.com/Reinforcement-Learning-Part-1/
    • Udacity reinforcement learning course: https://www.udacity.com/course/machine-learning-reinforcement-learning--ud820
    • DeepMind AlphaGo: https://deepmind.com/alpha-go
    • Google self-driving car: https://www.google.com/selfdrivingcar/
    • Code: https://github.com/e911miri/forloop-qlearning