
Introduction to reinforcement learning

This talk is a general introduction to reinforcement learning, with some working examples to help people understand the basic concepts. It is most suitable for people who have a solid math and machine learning background but lack basic knowledge of reinforcement learning.

MunichDataGeeks

July 31, 2016

Transcript

  1. A not rigid introduction to reinforcement learning

     Xudong Sun, [email protected], July 28, 2016
  2. Outline

     1 Introduction: Introductory example; Basic concepts
     2 Temporal difference learning: Sarsa; Q-learning
     3 Value function approximation: The mountain car problem
     4 Gaussian Process Temporal Difference learning: Introductory example and GP; Octopus control
     5 Reference
  3. Outline: Introduction (Introductory example; Basic concepts)
  4. Introduction: Introductory example

     Figure legend: grey = mouse, yellow = cheese, orange = cat
  5. Introduction: Basic concepts

     Preliminaries: state, action, transition (system), reward
     Between supervised and unsupervised: learning a reaction policy (look-up table) when labeled training examples are not available
     Agent-environment interaction as a closed-loop system
     Reward function r: adaptive to the environment
     Return: accumulated gain of reward, $R_t = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$
     Markov decision process (MDP) in RL
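To make the closed-loop agent-environment interaction and the discounted return concrete, here is a minimal Python sketch. The two-state environment, its transition and reward functions, and the random policy are hypothetical placeholders for illustration; the talk's mouse/cheese/cat grid world is not reproduced.

```python
import random

# Hypothetical two-state environment, used only to illustrate the loop.
def step(state, action):
    next_state = (state + action) % 2          # toy transition function T
    reward = 1.0 if next_state == 1 else 0.0   # toy reward function r
    return next_state, reward

def run_episode(policy, gamma=0.9, horizon=50):
    """Closed-loop interaction collecting the discounted return
    R_t = sum_i gamma^i * r_{t+i}, truncated at a finite horizon."""
    state, discounted_return = 0, 0.0
    for i in range(horizon):
        action = policy(state)                  # agent reacts to the state
        state, reward = step(state, action)     # environment responds
        discounted_return += (gamma ** i) * reward
    return discounted_return

random_policy = lambda s: random.choice([0, 1])  # no learned look-up table yet
print(run_episode(random_policy))
```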
  6. Introduction: Basic concepts

     Value function: a long-term consideration.
     State value function (Bellman equation):
     $V_\pi(s) = E_\pi[R_t \mid S_t = s] = E_\pi\big[\textstyle\sum_{i=0}^{\infty} \gamma^i r_{t+i} \mid S_t = s\big] = E_\pi[r_t + \gamma V_\pi(s_{t+1}) \mid S_t = s] = \sum_{s' \in S} T(s \rightarrow s', \pi)\,[r_t + \gamma V_\pi(s')]$
     Optimal policy: $\pi^* = \arg\max_\pi E_\pi[r_t + \gamma V_\pi(s_{t+1}) \mid S_t = s], \ \forall s \in S$
     Optimal state value: $V^*(s) = \max_\pi E_\pi[r_t + \gamma V_\pi(s_{t+1}) \mid S_t = s]$
     Action value function: $Q(s_t, a_t) = r(s_t, a_t) + \gamma V^*(s_{t+1}) = r(s_t, a_t) + \gamma \max_a Q(s_{t+1}, a)$
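As a concrete illustration of the Bellman backup, the sketch below runs value iteration on a small MDP. The transition tensor and rewards are made up purely for illustration; they are not from the slides.

```python
import numpy as np

# Hypothetical MDP: 3 states, 2 actions; T[a, s, s'] = P(s' | s, a).
T = np.array([[[0.9, 0.1, 0.0],
               [0.0, 0.9, 0.1],
               [0.0, 0.0, 1.0]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0]]])
r = np.array([0.0, 0.0, 1.0])    # reward for landing in each state s'
gamma = 0.9

# Value iteration: repeatedly apply the Bellman optimality backup
# V(s) <- max_a sum_s' T(s -> s') [ r(s') + gamma * V(s') ]
V = np.zeros(3)
for _ in range(200):
    V = np.max(T @ (r + gamma * V), axis=0)
print(V)                          # approximate optimal state values V*(s)

# Greedy action values and policy derived from the converged V*:
Q = T @ (r + gamma * V)           # Q[a, s] = backed-up value of action a in s
print(np.argmax(Q, axis=0))       # greedy policy pi*(s)
```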
  7. Introduction: Basic concepts

     Computational approaches: dynamic programming, Monte Carlo, temporal difference
  8. Outline: Temporal difference learning (Sarsa; Q-learning)
  9. Temporal difference learning: Q-learning

     Q(state, action) = R(state, action) + γ · max[Q(next state, all actions)]
     Example: Q(1, 5) = R(1, 5) + 0.8 · max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 · 0 = 100
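The Q(1, 5) computation above comes from a room-navigation example; the cited mnemstudio path-finding tutorial uses the same setup. Below is a hedged sketch of tabular Q-learning on that example. The reward matrix is the commonly published version of the tutorial's matrix and may differ in detail from the slide; -1 marks a missing door and room 5 is the goal.

```python
import numpy as np

# R[s, a] = immediate reward for moving from room s to room a
# (-1 = no door, 100 = reaching goal room 5); values assumed from the tutorial.
R = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
gamma = 0.8
Q = np.zeros_like(R, dtype=float)
rng = np.random.default_rng(0)

for _ in range(1000):                        # training episodes
    s = rng.integers(0, 6)                   # random start room
    while s != 5:                            # room 5 is the goal
        valid = np.where(R[s] >= 0)[0]       # doors available from room s
        a = rng.choice(valid)                # explore uniformly at random
        # Q-learning update: Q(s, a) = R(s, a) + gamma * max_a' Q(s', a')
        Q[s, a] = R[s, a] + gamma * Q[a].max()
        s = a                                # chosen door leads to room a

print((Q / Q.max() * 100).round())           # normalized Q-table
```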
  10. Outline: Value function approximation (The mountain car problem)
  11. Value function approximation: The mountain car problem

     $x_{t+1} = [\,x_t + \dot{x}_{t+1}\,]$
     $\dot{x}_{t+1} = [\,\dot{x}_t + 0.001\,a_t - 0.0025\cos(3x_t)\,]$
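A minimal sketch of these dynamics in Python. The position and velocity bounds, the action set a ∈ {-1, 0, +1}, and the reading of the [.] brackets as clipping follow the standard mountain car formulation (Sutton and Barto) and are assumptions here, since the slide does not state them.

```python
import math

# Standard mountain car bounds (assumed; not stated on the slide).
X_MIN, X_MAX = -1.2, 0.5
V_MIN, V_MAX = -0.07, 0.07

def mountain_car_step(x, x_dot, a):
    """One step of the slide's dynamics, a in {-1, 0, +1} (reverse, coast, forward).
    The [.] brackets are interpreted as clipping to the allowed ranges."""
    x_dot_next = x_dot + 0.001 * a - 0.0025 * math.cos(3 * x)
    x_dot_next = min(max(x_dot_next, V_MIN), V_MAX)
    x_next = min(max(x + x_dot_next, X_MIN), X_MAX)
    if x_next == X_MIN:              # hitting the left wall zeroes the velocity
        x_dot_next = 0.0
    return x_next, x_dot_next

print(mountain_car_step(-0.5, 0.0, +1))
```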
  12. Value function approximation: The mountain car problem

     TD update with eligibility traces (linear function approximation):
     $\theta_{t+1} = \theta_t + \alpha\,\delta_t\,e_t$
     $e_t = \gamma\lambda\,e_{t-1} + \nabla_{\theta_t} Q_t(s_t, a_t)$
     $\delta_t = r_{t+1} + \gamma Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t)$
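These updates can be read as gradient-descent Sarsa(λ) with a linear approximation Q(s, a) = θᵀφ(s, a), so that ∇θ Q = φ(s, a). Below is a hedged sketch of a single update; the feature vectors and the step-size/trace parameters are hypothetical placeholders, not values from the talk.

```python
import numpy as np

def sarsa_lambda_update(theta, e, phi_sa, phi_next_sa, r,
                        alpha=0.1, gamma=1.0, lam=0.9):
    """One update of the slide's equations for Q(s, a) = theta . phi(s, a),
    so grad_theta Q = phi(s, a). phi_sa / phi_next_sa are feature vectors
    for (s_t, a_t) and (s_{t+1}, a_{t+1})."""
    delta = r + gamma * (theta @ phi_next_sa) - (theta @ phi_sa)  # TD error delta_t
    e = gamma * lam * e + phi_sa                                  # eligibility trace e_t
    theta = theta + alpha * delta * e                             # parameter update theta_{t+1}
    return theta, e

# Tiny usage example with made-up 4-dimensional features:
theta, e = np.zeros(4), np.zeros(4)
theta, e = sarsa_lambda_update(theta, e,
                               phi_sa=np.array([1., 0., 0., 0.]),
                               phi_next_sa=np.array([0., 1., 0., 0.]),
                               r=-1.0)
print(theta, e)
```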
  13. Outline: Gaussian Process Temporal Difference learning (Introductory example and GP; Octopus control)
  14. Gaussian Process Temporal Difference learning: Introductory example and GP

     GPs: Gaussian processes (GPs) provide a consistent and principled probabilistic framework for ranking functions according to their plausibility by defining a corresponding probability distribution over functions (Rasmussen and Williams, 2006).
     Observation model: $Y(x) = F(x) + N(x)$
     $Y_t = (Y(x_1), Y(x_2), \ldots, Y(x_t))$, $F_t = (F(x_1), \ldots, F(x_t))$, $N_t = (N(x_1), \ldots, N(x_t))$
     Joint prior:
     $\begin{pmatrix} F(x) \\ Y_t \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} f_0(x) \\ \mathbf{f}_0 \end{pmatrix}, \begin{pmatrix} k(x,x) & k_t(x)^\top \\ k_t(x) & K_t + \sigma^2 I \end{pmatrix} \right)$
     Posterior: $(F(\cdot) \mid Y_t) \sim \mathcal{N}\{\hat{F}(\cdot), P_t(\cdot,\cdot)\}$
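The posterior mean and covariance from this joint Gaussian have a closed form. Below is a small NumPy sketch, assuming a squared-exponential kernel and a zero prior mean f₀; both choices are assumptions, not stated on the slide.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0):
    """Squared-exponential kernel k(x, x'); the kernel choice is an assumption."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Posterior (F(.) | Y_t) ~ N(F_hat, P_t) for a zero prior mean f0."""
    K = sq_exp_kernel(x_train, x_train)                   # K_t
    k_star = sq_exp_kernel(x_test, x_train)               # k_t(x) at the test inputs
    k_ss = sq_exp_kernel(x_test, x_test)
    Ky = K + noise ** 2 * np.eye(len(x_train))            # K_t + sigma^2 I
    mean = k_star @ np.linalg.solve(Ky, y_train)          # posterior mean F_hat(.)
    cov = k_ss - k_star @ np.linalg.solve(Ky, k_star.T)   # posterior covariance P_t(., .)
    return mean, cov

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(x)                                             # toy regression targets
mean, cov = gp_posterior(x, y, np.linspace(-3.0, 3.0, 7))
print(mean.round(2))
```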
  15. Gaussian Process Temporal Difference learning: Introductory example and GP

     GPs for regression
  16. Gaussian Process Temporal Difference learning: Introductory example and GP

     GPs for reinforcement learning:
     Prior over values: $v \sim \mathcal{N}(0, k(\cdot,\cdot))$, $E[v(x)\,v(x')] = k(x, x')$
     $v_t = E_\pi[r_t + \gamma v_\pi(s_{t+1}) \mid S_t = s]$
     GP model for RL: $r_t = v_t - \gamma v_{t+1} + N_t$
     Define sample vectors: $R_t = [r(x_1), r(x_2), \ldots, r(x_t)]$, $V_t = [v(x_1), \ldots, v(x_t)]$, $N_t = [n(x_1), \ldots, n(x_t)]$
     $R_{t-1} = H_t V_t + N_{t-1}$, with dimensions $(t-1) \times 1 = [(t-1) \times t] \times [t \times 1]$
     Joint prior: $\begin{pmatrix} R_{t-1} \\ V_t \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} * & * \\ * & * \end{pmatrix} \right)$
     Posterior: $(V(\cdot) \mid R_{t-1} = r) \sim \mathcal{N}\{\hat{v}(\cdot), P_t(\cdot,\cdot)\}$, where the mean and covariance depend on the reward vector, the state vector, and the kernel function.
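To make the GPTD model concrete, the sketch below builds the (t-1)×t matrix H_t with rows (…, 1, -γ, …) and computes the posterior value mean at the visited states. It uses an i.i.d. observation-noise covariance σ²I as a simplifying assumption (the original GPTD papers also treat a correlated-noise version), and the kernel matrix in the usage example is made up.

```python
import numpy as np

def gptd_value_posterior(K, rewards, gamma=0.9, sigma=0.1):
    """Posterior mean of the values V_t at the visited states x_1..x_t.

    K       : t x t kernel matrix K_t over the visited states
    rewards : length t-1 vector R_{t-1} of observed rewards
    Model (from the slide): R_{t-1} = H_t V_t + N_{t-1}, with rows of H_t
    equal to (0, ..., 1, -gamma, ..., 0). Noise taken as N ~ N(0, sigma^2 I),
    a simplification of the original GPTD noise model."""
    t = K.shape[0]
    H = np.zeros((t - 1, t))
    for i in range(t - 1):
        H[i, i] = 1.0
        H[i, i + 1] = -gamma
    S = H @ K @ H.T + sigma ** 2 * np.eye(t - 1)     # covariance of R_{t-1}
    return K @ H.T @ np.linalg.solve(S, rewards)     # posterior value mean

# Toy usage with a made-up kernel matrix over 4 visited states:
K = np.exp(-0.5 * (np.arange(4)[:, None] - np.arange(4)[None, :]) ** 2)
print(gptd_value_posterior(K, rewards=np.array([0.0, 0.0, 1.0])))
```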
  17. Gaussian Process Temporal Difference learning: Octopus control

     Octopus arm control: 22 point masses, each with state (x, y, ẋ, ẏ)
  18. Reference

     Some pictures are adapted from the following sources:
     1. Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, The MIT Press, Cambridge, Massachusetts; London, England.
     2. John McCullock. mnemstudio tutorial on path finding.
     3. Travis DeWolf. Tutorial on reinforcement learning.
     4. Engel, Y., Szabo, P., Volkinshtein, D. (2005). Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods. Proc. NIPS.
     5. Chandra Prakash, IIITM Gwalior. Reinforcement Learning.

     Thanks
     Xudong Sun, [email protected]
     LinkedIn: Xudong Sun, LMU
     GitHub: https://github.com/smilesun