Slide 1

Derivative-Free Trajectory Optimization with Unscented Dynamic Programming
Zac Manchester and Scott Kuindersma
Harvard Agile Robotics Lab

Slide 2

Trajectory Optimization
[Figure: trajectory from $x_0$ to $x_{\text{goal}}$]

Slide 3

Trajectory Optimization
$$\min_{x,u} \; J = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k)$$
subject to:
$$x_{k+1} = f(x_k, u_k)$$
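A minimal Python sketch of the problem statement above, assuming NumPy and a hypothetical double-integrator model: `rollout` simulates $x_{k+1} = f(x_k, u_k)$ and `total_cost` evaluates $J$ for a given control sequence. The function names, the model, and the cost weights are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def rollout(f, x0, us):
    """Simulate x_{k+1} = f(x_k, u_k) from x0; returns [x_1, ..., x_N]."""
    xs = [np.asarray(x0, dtype=float)]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs

def total_cost(xs, us, stage_cost, terminal_cost):
    """J = L_N(x_N) + sum_{k=1}^{N-1} L_k(x_k, u_k), here with a time-invariant stage cost."""
    return terminal_cost(xs[-1]) + sum(stage_cost(x, u) for x, u in zip(xs[:-1], us))

# Hypothetical example: double integrator with time step dt, N = 10 knot points.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.0, dt])

def f(x, u):
    return A @ x + B * u

us = [0.0] * 9                      # N - 1 = 9 controls, all zero
xs = rollout(f, [1.0, 0.0], us)
J = total_cost(xs, us,
               stage_cost=lambda x, u: 0.5 * (x @ x + u * u),
               terminal_cost=lambda x: 5.0 * (x @ x))
print(len(xs), J)  # 10 9.5
```

With zero controls the state stays at [1, 0], so each of the 9 stage costs is 0.5 and the terminal cost is 5, giving J = 9.5.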

Slide 4

Dynamic Programming Solution
$$\min_{x,u} \; J = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k)$$
$$V_N(x) = L_N(x)$$
$$V_{N-1}(x) = \min_u \; L(x,u) + V_N(f(x,u))$$
$$V_k(x) = \min_u \; L(x,u) + V_{k+1}(f(x,u))$$
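For linear dynamics $f(x,u) = Ax + Bu$ and quadratic costs, the recursion above can be carried out exactly, because each $V_k$ stays quadratic. The sketch below (Python/NumPy; `lqr_backward` and the cost matrices `Q`, `R`, `Q_N` are hypothetical names) starts from $V_N = L_N$ and applies $V_k(x) = \min_u L(x,u) + V_{k+1}(f(x,u))$ backward, which in this special case reduces to the discrete-time Riccati recursion.

```python
import numpy as np

def lqr_backward(A, B, Q, R, Q_N, N):
    """Exact DP backward pass for f(x, u) = A x + B u, L(x, u) = x'Qx + u'Ru,
    L_N(x) = x'Q_N x. Returns P_1..P_N with V_k(x) = x' P_k x and gains
    K_1..K_{N-1} with minimizing control u = -K_k x."""
    P = Q_N                                   # V_N(x) = L_N(x)
    Ps, Ks = [P], []
    for _ in range(N - 1):
        # Setting d/du [u'Ru + (Ax + Bu)' P (Ax + Bu)] = 0 gives u = -K x.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Substituting u = -K x back in keeps V_k quadratic in x.
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        Ps.append(P)
        Ks.append(K)
    return Ps[::-1], Ks[::-1]                 # reorder so index 0 is k = 1

I = np.eye(1)
Ps, Ks = lqr_backward(I, I, I, I, I, N=2)
print(Ps[0], Ks[0])  # [[1.5]] [[0.5]]
```

For the scalar case $A = B = Q = R = Q_N = 1$, $N = 2$: $\min_u x^2 + u^2 + (x+u)^2$ is attained at $u = -x/2$, giving $V_1(x) = 1.5x^2$, which matches the printed values.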

Slide 5

DDP/SLQ/iLQR Algorithm
[Figure: trajectory from $x_0$ to $x_{\text{goal}}$]

Slide 6

DDP/SLQ/iLQR Algorithm
$$V_k(x) \approx x^T H_k x + g_k^T x$$
$$L_k(x,u) \approx x^T W_k x + w_k^T x + u^T R_k u + r_k^T u$$
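Under these quadratic models and a linearized model $f(x,u) \approx A_k x + B_k u$ (dropping second-order dynamics terms, as SLQ/iLQR do), one backward step has a closed form. The Python sketch below keeps the slide's unscaled convention (no factors of 1/2) and drops the constant term of $V_k$; the name `quadratic_backward_step`, the policy form $u = -Kx - d$, and the variable names are assumptions. In practice these models are written in deviations $\delta x, \delta u$ from a nominal trajectory.

```python
import numpy as np

def quadratic_backward_step(A, B, W, w, R, r, H, g):
    """One backward step using unscaled quadratic models:
      V_{k+1}(x) ~ x'Hx + g'x,  L_k(x, u) ~ x'Wx + w'x + u'Ru + r'u,
      and linearized dynamics f(x, u) ~ A x + B u (no second-order terms).
    Returns (H_k, g_k, K, d) with minimizing control u = -K x - d."""
    S = R + B.T @ H @ B                           # curvature in u (up to a factor 2)
    K = np.linalg.solve(S, B.T @ H @ A)           # feedback gain
    d = np.linalg.solve(S, 0.5 * (r + B.T @ g))   # feedforward term
    H_k = W + A.T @ H @ A - A.T @ H @ B @ K       # new cost-to-go Hessian
    g_k = w + A.T @ g - K.T @ (r + B.T @ g)       # new cost-to-go gradient
    return H_k, g_k, K, d

one = np.eye(1)
vec = np.ones(1)
H_k, g_k, K, d = quadratic_backward_step(one, one, one, vec, one, vec, one, vec)
print(H_k, g_k, K, d)  # [[1.5]] [1.] [[0.5]] [0.5]
```

Scalar check with every matrix and vector equal to 1: $x^2 + x + u^2 + u + (x+u)^2 + (x+u)$ is minimized at $u = -0.5x - 0.5$, leaving $1.5x^2 + x$ plus a constant.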

Slide 7

DDP/SLQ/iLQR Algorithm
$$f(x,u) \approx A_k x + B_k u + \frac{\partial^2 f}{\partial x^2}(x \otimes x) + \frac{\partial^2 f}{\partial u^2}(u \otimes u) + \frac{\partial^2 f}{\partial x \, \partial u}(x \otimes u)$$
$$V_k(x) = \min_u \; L(x,u) + V_{k+1}(f(x,u))$$
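The first-order terms $A_k$ and $B_k$ are dynamics Jacobians, and DDP additionally needs the second-order tensors; these dynamics derivatives are what UDP aims to eliminate. As a hedged illustration of what computing them involves without analytic derivatives, the sketch below estimates $A_k$ and $B_k$ by central differences for a hypothetical forward-Euler pendulum (`fd_jacobians`, `pendulum_step`, `DT`, and `G_OVER_L` are illustrative names and values).

```python
import numpy as np

def fd_jacobians(f, x, u, eps=1e-6):
    """Central-difference estimates of A_k = df/dx and B_k = df/du at (x, u)."""
    n, m = x.size, u.size
    fx = np.asarray(f(x, u))
    A = np.zeros((fx.size, n))
    B = np.zeros((fx.size, m))
    for i in range(n):                      # perturb one state coordinate at a time
        e = np.zeros(n)
        e[i] = eps
        A[:, i] = (f(x + e, u) - f(x - e, u)) / (2 * eps)
    for j in range(m):                      # perturb one control coordinate at a time
        e = np.zeros(m)
        e[j] = eps
        B[:, j] = (f(x, u + e) - f(x, u - e)) / (2 * eps)
    return A, B

# Hypothetical forward-Euler pendulum: x = [angle, angular rate], u = [torque].
DT, G_OVER_L = 0.05, 9.81

def pendulum_step(x, u):
    return x + DT * np.array([x[1], -G_OVER_L * np.sin(x[0]) + u[0]])

x0, u0 = np.array([0.3, 0.0]), np.array([0.0])
A, B = fd_jacobians(pendulum_step, x0, u0)
A_exact = np.array([[1.0, DT], [-DT * G_OVER_L * np.cos(x0[0]), 1.0]])
print(np.allclose(A, A_exact, atol=1e-6), np.allclose(B, [[0.0], [DT]], atol=1e-6))  # True True
```

Each Jacobian column costs two extra evaluations of $f$ here; estimating the second-order terms would need further evaluations on top of this.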

Slide 9

Extended Kalman Filter and Duality
EKF | SLQ/iLQR
Quadratic log-likelihood $\mathcal{L}$ | Quadratic cost-to-go
Covariance $\Sigma_k$ | Inverse Hessian $P_k^{-1}$
Forward Riccati recursion | Backward Riccati recursion

Slide 10

Extended Kalman Filter and Duality
$$f(\Sigma_k) \approx A \Sigma_k A^T$$

Slide 11

Extended Kalman Filter and Duality
$$f(\Sigma_k) \approx A \Sigma_k A^T$$
$$V(f(x)) \approx A^T P_{k+1} A$$

Slide 12

Extended Kalman Filter and Duality
$$f(\Sigma_k) \approx A \Sigma_k A^T$$
$$V(f(x)) \approx A^T P_{k+1} A = \left(A^{-1} P_{k+1}^{-1} A^{-T}\right)^{-1}$$
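A quick numerical check of the identity above, assuming NumPy and arbitrary illustrative matrices (an invertible $A$ and a symmetric positive definite $P_{k+1}$). It follows from $(XYZ)^{-1} = Z^{-1}Y^{-1}X^{-1}$: propagating the inverse Hessian $P_{k+1}^{-1}$ backward through $A^{-1}$, the way an EKF propagates a covariance forward through $A$, and then inverting gives the same matrix as the backward Riccati form $A^T P_{k+1} A$.

```python
import numpy as np

# Hypothetical linearized dynamics A (invertible) and cost-to-go Hessian P_{k+1} (SPD).
A = np.array([[1.0, 0.1, 0.0],
              [0.0, 1.0, 0.1],
              [0.0, 0.0, 1.0]])
P_next = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.2],
                   [0.0, 0.2, 3.0]])

# Backward Riccati form: Hessian of x -> x' P_{k+1} x evaluated at A x.
hessian = A.T @ P_next @ A

# Covariance form: treat P_{k+1}^{-1} as a covariance, map it through A^{-1}
# (mirroring the EKF prediction Sigma -> A Sigma A^T), then invert back.
A_inv = np.linalg.inv(A)
Sigma_k = A_inv @ np.linalg.inv(P_next) @ A_inv.T
print(np.allclose(hessian, np.linalg.inv(Sigma_k)))  # True
```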

Slide 13

Unscented Transform
$$x_{k+1}^{s} = f(x_k^{s})$$
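A minimal sketch of the unscented transform in Python/NumPy, assuming the simplest symmetric sigma-point set: $2n$ points $\mu \pm$ the columns of a Cholesky factor of $n\Sigma$, with equal weights $1/(2n)$ and no central point. Other weightings are common, and the slides do not say which one UDP uses. Each sigma point is pushed through $f$ as in $x_{k+1}^{s} = f(x_k^{s})$, and the mean and covariance are recomputed from the results without any derivatives of $f$; the function name is a hypothetical choice.

```python
import numpy as np

def unscented_transform(f, mean, cov):
    """Propagate (mean, cov) through f using 2n symmetric sigma points
    (no central point, equal weights 1/(2n))."""
    n = mean.size
    L = np.linalg.cholesky(n * cov)                             # L L' = n * cov
    sigma = np.hstack([mean[:, None] + L, mean[:, None] - L])   # shape (n, 2n)
    Y = np.column_stack([f(sigma[:, i]) for i in range(2 * n)])
    y_mean = Y.mean(axis=1)
    dY = Y - y_mean[:, None]
    y_cov = dY @ dY.T / (2 * n)
    return y_mean, y_cov

mean = np.array([0.0, 1.0])
cov = np.diag([1.0, 2.0])
A = np.array([[1.0, 0.1], [0.0, 1.0]])
y_mean, y_cov = unscented_transform(lambda x: A @ x, mean, cov)
print(np.allclose(y_mean, A @ mean), np.allclose(y_cov, A @ cov @ A.T))  # True True
```

For a linear map $f(x) = Ax$ this sigma-point set reproduces $A\mu$ and $A\Sigma A^T$ exactly; for nonlinear $f$ the result is an approximation.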

Slide 14

Unscented DP Algorithm
$$V_k(x) = \min_u \; L(x,u) + V_{k+1}(f(x,u))$$
$$x_k^{s} = f(x_{k+1}^{s}, u_k^{s})$$
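One way to read this slide, sketched under explicit assumptions: treat $P_{k+1}^{-1}$ as a covariance, place sigma points around $x_{k+1}$, map them backward to time $k$, and invert the resulting sample covariance to approximate the Hessian of $V_{k+1}(f(x))$ with respect to $x_k$. The inverse-dynamics map `g` (with controls held fixed), the function name, and the sigma-point weighting are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def unscented_backward_hessian(g, x_next, P_next):
    """Hypothetical sketch: approximate the Hessian of x_k -> V_{k+1}(f(x_k)) by
    treating P_{k+1}^{-1} as a covariance and pushing sigma points backward through
    an assumed inverse-dynamics map g(x_{k+1}) = x_k (controls held fixed)."""
    n = x_next.size
    S = np.linalg.cholesky(n * np.linalg.inv(P_next))
    sigma_next = np.hstack([x_next[:, None] + S, x_next[:, None] - S])   # 2n sigma points
    sigma_k = np.column_stack([g(sigma_next[:, i]) for i in range(2 * n)])
    dev = sigma_k - sigma_k.mean(axis=1, keepdims=True)
    Sigma_k = dev @ dev.T / (2 * n)        # sample "covariance" at time k
    return np.linalg.inv(Sigma_k)          # Hessian estimate, no dynamics derivatives used

A = np.array([[1.0, 0.1], [0.0, 1.0]])
P_next = np.array([[2.0, 0.3], [0.3, 1.0]])
A_inv = np.linalg.inv(A)
P_k = unscented_backward_hessian(lambda x: A_inv @ x, np.zeros(2), P_next)
print(np.allclose(P_k, A.T @ P_next @ A))  # True
```

For linear dynamics with $g(x) = A^{-1}x$ the estimate equals $A^T P_{k+1} A$, the backward Riccati term in the EKF duality.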

Slide 15

Algorithm Summary
1. Initialize with … = …
2. Perform unscented backward recursion to compute … and … = … − …
3. Perform forward pass with line search to compute new x and u trajectories
4. Repeat until convergence
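Step 3 can be sketched as a backtracking line search on the feedforward step, assuming the backward pass returns feedback gains $K_k$ and feedforward terms $d_k$ with the control update $u_k = \bar u_k - K_k(x_k - \bar x_k) - \alpha d_k$. The step sizes, the simple cost-decrease acceptance test, and all names (`forward_pass`, the scalar test system) are hypothetical; the slides do not specify the line-search rule.

```python
import numpy as np

def forward_pass(f, cost, xs, us, Ks, ds, alphas=(1.0, 0.5, 0.25, 0.125)):
    """Roll out u_k = ubar_k - K_k (x_k - xbar_k) - alpha d_k, backtracking on alpha.
    Returns (xs, us, J, alpha); alpha = 0.0 means no tried step reduced the cost."""
    J_old = cost(xs, us)
    for alpha in alphas:
        x = xs[0]
        new_xs, new_us = [x], []
        for k in range(len(us)):
            u = us[k] - Ks[k] @ (x - xs[k]) - alpha * ds[k]
            x = f(x, u)
            new_xs.append(x)
            new_us.append(u)
        J_new = cost(new_xs, new_us)
        if J_new < J_old:                 # assumed acceptance rule: any decrease
            return new_xs, new_us, J_new, alpha
    return xs, us, J_old, 0.0

def f(x, u):                              # hypothetical scalar system x_{k+1} = x_k + u_k
    return x + u

def cost(xs, us):                         # sum of x'x + u'u plus terminal x_N'x_N
    return sum(float(x @ x + u @ u) for x, u in zip(xs[:-1], us)) + float(xs[-1] @ xs[-1])

xs = [np.array([1.0])] * 3                # nominal trajectory from u = 0
us = [np.array([0.0])] * 2
Ks = [np.zeros((1, 1))] * 2
ds = [np.array([0.5]), np.array([0.0])]
new_xs, new_us, J, alpha = forward_pass(f, cost, xs, us, Ks, ds)
print(J, alpha)  # 1.75 1.0
```

With $\alpha = 1$ the first control becomes $-0.5$, lowering the cost from 3.0 to 1.75, so the full step is accepted.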

Slide 16

Pendulum Swing Up
[Figure: total cost and running time (s) vs. iteration for UDP, DDP, and iLQR]

Slide 18

Cart Pole Swing Up
Algorithm | Cost | Iterations | Time (s)
UDP | 131.78 | 183 | 78.4
DDP | 131.76 | 67 | 173.1
iLQR | 135.40 | 54 | 26.6

Slide 19

Airplane Barrel Roll

Slide 20

Airplane Barrel Roll
Algorithm | Cost | Iterations | Time (s)
UDP | 37.80 | 30 | 11.6
DDP | 37.80 | 31 | 100.2
iLQR | 37.81 | 36 | 12.1

Slide 21

Conclusions
• Dynamics derivatives can be eliminated from the classical DDP algorithm
• Computational cost is comparable to SLQ/iLQR
• Convergence rate is comparable to or better than SLQ/iLQR
agile.seas.harvard.edu