Unscented Dynamic Programming

Presented at the 2016 IEEE Conference on Decision and Control (CDC). Paper and code are available here: http://agile.seas.harvard.edu/publications/derivative-free-trajectory-optimization-unscented-dynamic-programming

Zac Manchester

December 13, 2016

Transcript

  1. Derivative-Free Trajectory Optimization with Unscented Dynamic Programming
     Zac Manchester and Scott Kuindersma, Harvard Agile Robotics Lab
  2. Trajectory Optimization
     [Figure with labels $x_0$ and $x_\text{goal}$]
  3. Trajectory Optimization
     $\min_{x,u} J = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k)$
     subject to: $x_{k+1} = f(x_k, u_k)$
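
The objective and dynamics constraint on slide 3 can be evaluated for any candidate control sequence by simulating forward and summing costs. The sketch below is illustrative only: the single-integrator dynamics, the cost functions, and the names `rollout` and `total_cost` are assumptions, not taken from the talk.

```python
def rollout(f, x1, controls):
    """Simulate x_{k+1} = f(x_k, u_k) from x_1 and return [x_1, ..., x_N]."""
    states = [x1]
    for u in controls:
        states.append(f(states[-1], u))
    return states


def total_cost(states, controls, stage_cost, terminal_cost):
    """J = L_N(x_N) + sum_{k=1}^{N-1} L_k(x_k, u_k)."""
    # zip stops after the N-1 controls, so the last state only gets L_N.
    running = sum(stage_cost(x, u) for x, u in zip(states, controls))
    return terminal_cost(states[-1]) + running


# Hypothetical scalar example: single integrator with time step 0.5.
f = lambda x, u: x + 0.5 * u
stage_cost = lambda x, u: 0.5 * u ** 2
terminal_cost = lambda x: 10.0 * (x - 1.5) ** 2

controls = [1.0, 1.0, 1.0]           # u_1 .. u_{N-1}, so N = 4
states = rollout(f, 0.0, controls)   # x_1 .. x_N
J = total_cost(states, controls, stage_cost, terminal_cost)
```

Trajectory optimization then searches over `controls` to make `J` as small as possible.
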
  4. Dynamic Programming Solution
     $\min_{x,u} J = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k)$
     $V_N(x) = L_N(x)$
     $V_{N-1}(x) = \min_u \left[ L(x, u) + V_N(f(x, u)) \right]$
     $V_k(x) = \min_u \left[ L(x, u) + V_{k+1}(f(x, u)) \right]$
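
The backward recursion on slide 4 can be run exactly when states and controls are finite. The following is a minimal tabular sketch under assumed toy data (five integer states, three controls, a hypothetical goal state); it uses 0-based time indices, so `V[N]` plays the role of $V_N$. It is not the method from the talk, which works with local quadratic models instead of tables.

```python
def backward_dp(states, controls, f, L, L_N, N):
    """Tabular recursion V_k(x) = min_u [L(x, u) + V_{k+1}(f(x, u))]."""
    V = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for x in states:
        V[N][x] = L_N(x)                      # terminal condition V_N = L_N
    for k in range(N - 1, -1, -1):            # sweep backward in time
        for x in states:
            best_u = min(controls,
                         key=lambda u: L(x, u) + V[k + 1][f(x, u)])
            policy[k][x] = best_u
            V[k][x] = L(x, best_u) + V[k + 1][f(x, best_u)]
    return V, policy


# Hypothetical problem: walk along the integers 0..4 toward goal = 4.
states = range(5)
controls = (-1, 0, 1)
goal = 4
f = lambda x, u: min(max(x + u, 0), 4)        # clamp to the grid
L = lambda x, u: 0.1 * u * u                  # control effort
L_N = lambda x: 10.0 * (x - goal) ** 2        # terminal penalty

V, policy = backward_dp(states, controls, f, L, L_N, N=4)
```

Here `policy[0][0]` is the first control to apply from state 0, and `V[0][0]` is the optimal cost from there.
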
  5. DDP/SLQ/iLQR Algorithm
     [Figure with labels $x_0$ and $x_\text{goal}$]
  6. DDP/SLQ/iLQR Algorithm
     $V_k(x) \approx x^T H_k x + g_k^T x$
     $L_k(x, u) \approx x^T W_k x + w_k^T x + u^T R_k u + r_k^T u$
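  7. DDP/SLQ/iLQR Algorithm
     $f(x, u) \approx A_k x + B_k u + \frac{\partial^2 f}{\partial x^2}(x \otimes x) + \frac{\partial^2 f}{\partial u^2}(u \otimes u) + \frac{\partial^2 f}{\partial x \partial u}(x \otimes u)$
     $V_k(x) = \min_u \left[ L(x, u) + V_{k+1}(f(x, u)) \right]$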
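  8. DDP/SLQ/iLQR Algorithm
     $f(x, u) \approx A_k x + B_k u + \frac{\partial^2 f}{\partial x^2}(x \otimes x) + \frac{\partial^2 f}{\partial u^2}(u \otimes u) + \frac{\partial^2 f}{\partial x \partial u}(x \otimes u)$
     $V_k(x) = \min_u \left[ L(x, u) + V_{k+1}(f(x, u)) \right]$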
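
When only the linear terms $A_k x + B_k u$ are kept, as in SLQ/iLQR, substituting quadratic models of the cost and cost-to-go into the recursion gives a Riccati-style backward pass. The sketch below works that out for an assumed scalar, time-invariant case with pure quadratic terms (no linear cost terms); the function name and all numeric values are hypothetical.

```python
def scalar_riccati(a, b, q, r, p_N, N):
    """Backward pass for x_{k+1} = a x + b u with stage cost q x^2 + r u^2.

    Returns p[k] such that V_k(x) = p[k] x^2, and gains K[k] such that the
    minimizing control is u = -K[k] x.
    """
    p = [0.0] * (N + 1)
    K = [0.0] * N
    p[N] = p_N
    for k in range(N - 1, -1, -1):
        # Setting d/du [r u^2 + p_{k+1} (a x + b u)^2] = 0 gives the gain.
        K[k] = a * b * p[k + 1] / (r + b * b * p[k + 1])
        # Plug u = -K x back into q x^2 + r u^2 + p_{k+1} (a x + b u)^2.
        p[k] = q + r * K[k] ** 2 + p[k + 1] * (a - b * K[k]) ** 2
    return p, K


p, K = scalar_riccati(a=1.0, b=1.0, q=1.0, r=1.0, p_N=1.0, N=50)
```

With these unit coefficients the recursion settles to the fixed point of $p^2 = p + 1$, so `p[0]` approaches the golden ratio over a long horizon.
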
  9. Extended Kalman Filter and Duality
     EKF                            SLQ/iLQR
     Quadratic log-likelihood ℒ     Quadratic cost-to-go
     Covariance $\Sigma_k$          Inverse Hessian $P^{-1}$
     Forward Riccati recursion      Backward Riccati recursion
  10. Extended Kalman Filter and Duality
      EKF                            SLQ/iLQR
      Quadratic log-likelihood ℒ     Quadratic cost-to-go
      Covariance $\Sigma_k$          Inverse Hessian $P^{-1}$
      Forward Riccati recursion      Backward Riccati recursion
      $f(\Sigma_k) \approx A \Sigma_k A^T$
  11. Extended Kalman Filter and Duality
      EKF                            SLQ/iLQR
      Quadratic log-likelihood ℒ     Quadratic cost-to-go
      Covariance $\Sigma_k$          Inverse Hessian $P^{-1}$
      Forward Riccati recursion      Backward Riccati recursion
      $f(\Sigma_k) \approx A \Sigma_k A^T$
      $V(f(x)) \approx A^T P_{k+1} A$
  12. Extended Kalman Filter and Duality
      EKF                            SLQ/iLQR
      Quadratic log-likelihood ℒ     Quadratic cost-to-go
      Covariance $\Sigma_k$          Inverse Hessian $P^{-1}$
      Forward Riccati recursion      Backward Riccati recursion
      $f(\Sigma_k) \approx A \Sigma_k A^T$
      $V(f(x)) \approx A^T P_{k+1} A = \left( A^{-1} P_{k+1}^{-1} A^{-T} \right)^{-1}$
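
The identity on slide 12 says that propagating a Hessian backward through $A$ is the same as propagating its inverse like a covariance through $A^{-1}$ and inverting the result. A quick numerical check on a 2x2 case, with assumed values for `A` and `P` and small hand-written matrix helpers, might look like this:

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def inv2(X):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 0.1], [-0.2, 0.9]]   # hypothetical invertible dynamics Jacobian
P = [[2.0, 0.3], [0.3, 1.0]]    # hypothetical cost-to-go Hessian P_{k+1}

# Left side: A^T P A.
lhs = matmul(matmul(transpose(A), P), A)

# Right side: (A^{-1} P^{-1} A^{-T})^{-1}.
A_inv = inv2(A)
rhs = inv2(matmul(matmul(A_inv, inv2(P)), transpose(A_inv)))
```

The two results agree up to floating-point rounding, and the check requires `A` to be invertible.
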
  13. Unscented Transform
      ${}^{s}x_{k+1} = f({}^{s}x_k)$
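
The unscented transform on slide 13 pushes a small set of sample points through $f$ and rebuilds statistics from the results, with no derivatives of $f$. The sketch below uses the simplest symmetric variant (2n equally weighted points at $\pm\sqrt{n}$ times the Cholesky columns); that choice of points and weights, and all names and values, are assumptions and not necessarily what the talk uses.

```python
import math


def cholesky(S):
    """Lower-triangular C with C C^T = S, for symmetric positive-definite S."""
    n = len(S)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = math.sqrt(S[i][i] - s)
            else:
                C[i][j] = (S[i][j] - s) / C[j][j]
    return C


def sigma_points(mean, cov):
    """2n points: mean +/- sqrt(n) times each column of the Cholesky factor."""
    n = len(mean)
    C = cholesky(cov)
    scale = math.sqrt(n)
    pts = []
    for j in range(n):
        col = [scale * C[i][j] for i in range(n)]
        pts.append([m + c for m, c in zip(mean, col)])
        pts.append([m - c for m, c in zip(mean, col)])
    return pts


def unscented_transform(f, mean, cov):
    """Propagate sigma points through f; return sample mean and covariance."""
    pts = [f(p) for p in sigma_points(mean, cov)]
    count, d = len(pts), len(pts[0])
    new_mean = [sum(p[i] for p in pts) / count for i in range(d)]
    new_cov = [[sum((p[i] - new_mean[i]) * (p[j] - new_mean[j])
                    for p in pts) / count
                for j in range(d)] for i in range(d)]
    return new_mean, new_cov


# Hypothetical linear map x -> A x, for which the result should be A S A^T.
A = [[1.0, 0.1], [0.0, 1.0]]
f = lambda x: [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]
mean, cov = unscented_transform(f, [1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]])
```

For a linear map the recovered covariance matches $A \Sigma A^T$ exactly; for a nonlinear `f` the same code gives an approximation without any Jacobian.
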

  14. Unscented DP Algorithm
      $V_k(x) = \min_u \left[ L(x, u) + V_{k+1}(f(x, u)) \right]$
      ${}^{s}x_k = f({}^{s}x_{k+1}, {}^{s}u_k)$
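
Slide 14 combines the two earlier ideas: sample points are propagated backward through the dynamics, and the inverse Hessian plays the role of a covariance. The scalar sketch below illustrates only that one step, under strong simplifying assumptions: a single state, no control, no stage cost, and a hypothetical backward map `f_back`. It is not the full algorithm from the talk.

```python
import math


def unscented_hessian_backward(f_back, x_next, p_next):
    """Map the cost-to-go curvature p_{k+1} back one step (scalar case).

    Treats 1 / p_next as a variance, places two samples around x_next,
    pushes them through the backward dynamics f_back, and inverts the
    resulting sample variance to get a curvature at step k.
    """
    spread = math.sqrt(1.0 / p_next)
    samples = [f_back(x_next + spread), f_back(x_next - spread)]
    mean = sum(samples) / 2.0
    var = sum((s - mean) ** 2 for s in samples) / 2.0
    return 1.0 / var


# Hypothetical linear dynamics x_{k+1} = a x_k, so the backward map is x / a.
a = 0.8
p_k = unscented_hessian_backward(lambda x: x / a, x_next=0.3, p_next=2.0)
```

For this linear case the result equals $a^2 p_{k+1}$, the scalar version of $A^T P_{k+1} A$, even though no derivative of the dynamics is evaluated.
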
  15. Algorithm Summary
      1. Initialize [symbols lost in extraction]
      2. Perform unscented backward recursion to compute [symbols lost in extraction]
      3. Perform forward pass with line search to compute new trajectories
      4. Repeat until convergence
  16. Pendulum Swing Up
      [Plots: total cost vs. iteration and running time (s) vs. iteration for UDP, DDP, and iLQR]
  17. Pendulum Swing Up
      [Plots: total cost vs. iteration and running time (s) vs. iteration for UDP, DDP, and iLQR]
  18. Cart Pole Swing Up
      Algorithm   Cost     Iterations   Time (s)
      UDP         131.78   183          78.4
      DDP         131.76   67           173.1
      iLQR        135.40   54           26.6
  19. Airplane Barrel Roll
  20. Airplane Barrel Roll
      Algorithm   Cost     Iterations   Time (s)
      UDP         37.80    30           11.6
      DDP         37.80    31           100.2
      iLQR        37.81    36           12.1
  21. Conclusions
      • Dynamics derivatives can be eliminated from the classical DDP algorithm
      • Computational cost is comparable to SLQ/iLQR
      • Convergence rate is comparable to or better than SLQ/iLQR
      agile.seas.harvard.edu, zmanchester@seas.harvard.edu