Unscented Dynamic Programming

Presented at the 2016 IEEE Conference on Decision and Control (CDC). Paper and code are available here: http://agile.seas.harvard.edu/publications/derivative-free-trajectory-optimization-unscented-dynamic-programming

Zac Manchester

December 13, 2016
Transcript

  Slide 3: Trajectory Optimization

    $$\min_{x,u} \; J = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k)$$
    subject to: $x_{k+1} = f(x_k, u_k)$

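A minimal sketch of the problem statement. It assumes a hypothetical scalar system $x_{k+1} = x_k + u_k$ with the same stage cost $L_k = x^2 + u^2$ at every step and a terminal cost $L_N = 10x^2$. The sketch simulates the dynamics from $x_1$ and then sums the running and terminal costs. The names `rollout` and `total_cost` are illustrative and do not come from the paper's code.

```python
def rollout(f, x1, us):
    """Simulate x_{k+1} = f(x_k, u_k) from x_1 for the given control sequence."""
    xs = [x1]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs


def total_cost(xs, us, L, LN):
    """J = L_N(x_N) + sum_{k=1}^{N-1} L(x_k, u_k); here L is the same at every k."""
    return LN(xs[-1]) + sum(L(x, u) for x, u in zip(xs[:-1], us))


# Hypothetical scalar example with quadratic costs.
f = lambda x, u: x + u
L = lambda x, u: x**2 + u**2
LN = lambda x: 10 * x**2

us = [-1.0, 0.0]                  # N - 1 = 2 controls, so N = 3 states
xs = rollout(f, 1.0, us)          # [1.0, 0.0, 0.0]
print(total_cost(xs, us, L, LN))  # 2.0
```

Doing nothing (`us = [0.0, 0.0]`) leaves the state at 1 and costs 12.0, so the optimizer's job is to find the cheaper control sequence.
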
  Slide 4: Dynamic Programming Solution

    $$\min_{x,u} \; J = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k)$$
    $$V_N(x) = L_N(x)$$
    $$V_{N-1}(x) = \min_u \left[ L_{N-1}(x, u) + V_N(f(x, u)) \right]$$
    $$V_k(x) = \min_u \left[ L_k(x, u) + V_{k+1}(f(x, u)) \right]$$

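When the problem is small enough to tabulate, the recursion can be run exactly. The sketch below assumes a hypothetical toy problem:

- an integer state clamped to [-3, 3];
- controls in {-1, 0, 1};
- stage cost $L = x^2 + u^2$ and terminal cost $L_N = 10x^2$.

It stores each $V_k$ as a dictionary and records the minimizing control at each step. The table size grows exponentially with the state dimension, which is one reason DDP-style methods approximate $V_k$ locally instead.

```python
def backward_dp(states, controls, f, L, LN, N):
    """Exact DP: V_N = L_N, then V_k(x) = min_u [L(x, u) + V_{k+1}(f(x, u))].

    Returns V_1 as a dict and the policies [pi_1, ..., pi_{N-1}].
    """
    V = {x: LN(x) for x in states}            # V_N
    policies = []
    for _ in range(N - 1):                    # k = N-1 down to 1
        V_next, V, policy = V, {}, {}
        for x in states:
            # Q-values: stage cost plus cost-to-go of the successor state
            q = {u: L(x, u) + V_next[f(x, u)] for u in controls}
            policy[x] = min(q, key=q.get)
            V[x] = q[policy[x]]
        policies.insert(0, policy)
    return V, policies


# Hypothetical toy problem: integer state clamped to [-3, 3], N = 3.
states = range(-3, 4)
controls = [-1, 0, 1]
f = lambda x, u: max(-3, min(3, x + u))
L = lambda x, u: x**2 + u**2
LN = lambda x: 10 * x**2

V1, policies = backward_dp(states, controls, f, L, LN, N=3)
print(V1[1], policies[0][1])  # 2 -1
```
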
  Slide 6: DDP/SLQ/iLQR Algorithm

    $$V_k(x) \approx x^T H_k x + g_k^T x$$
    $$L_k(x, u) \approx x^T W_k x + w_k^T x + u^T R_k u + r_k^T u$$

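With these quadratic models, the minimization over $u$ has a closed-form solution. This assumes linearized dynamics $x_{k+1} \approx A_k x + B_k u$, that is, the second-order terms on the next slide are dropped as in SLQ/iLQR. The sketch below is one backward Riccati step, derived under the slide's convention (no factor of 1/2). With $S = R + B^T H B$, the minimizer is $u = -Kx - d$ where:

- $K = S^{-1} B^T H A$ and $d = \tfrac{1}{2} S^{-1}(r + B^T g)$;
- $H_k = W + A^T H A - A^T H B K$ and $g_k = w + A^T g - 2 A^T H B d$.

The name `riccati_step` and the $(K, d)$ parametrization are illustrative choices, not necessarily the paper's.

```python
import numpy as np


def riccati_step(H, g, A, B, W, w, R, r):
    """One backward step without second-order dynamics terms (SLQ/iLQR style).

    Assumes V_{k+1}(x) ~ x'Hx + g'x, L_k(x, u) ~ x'Wx + w'x + u'Ru + r'u,
    and x_{k+1} ~ A x + B u, with H, W, R symmetric. Returns H_k, g_k for
    V_k and the gains of the minimizing policy u = -K x - d.
    """
    S = R + B.T @ H @ B                          # half the Hessian of Q in u
    K = np.linalg.solve(S, B.T @ H @ A)          # feedback gain
    d = 0.5 * np.linalg.solve(S, r + B.T @ g)    # feedforward term
    H_k = W + A.T @ H @ A - A.T @ H @ B @ K      # quadratic term of V_k
    g_k = w + A.T @ g - 2 * A.T @ H @ B @ d      # linear term of V_k
    return H_k, g_k, K, d


# Scalar check: A = B = W = R = H = 1, no linear terms.
one = np.array([[1.0]])
zero = np.array([0.0])
H_k, g_k, K, d = riccati_step(one, zero, one, one, one, zero, one, zero)
print(H_k, K)  # [[1.5]] [[0.5]]
```
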
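  Slide 7: DDP/SLQ/iLQR Algorithm

    $$f(x, u) \approx A_k x + B_k u + \frac{\partial^2 f}{\partial x^2}(x \otimes x) + \frac{\partial^2 f}{\partial u^2}(u \otimes u) + \frac{\partial^2 f}{\partial x \partial u}(x \otimes u)$$
    $$V_k(x) = \min_u \left[ L_k(x, u) + V_{k+1}(f(x, u)) \right]$$

    DDP keeps the second-order dynamics terms; SLQ/iLQR drop them and use only $A_k$ and $B_k$.
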
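SLQ/iLQR need only the first-order terms $A_k = \partial f/\partial x$ and $B_k = \partial f/\partial u$; DDP also needs the second-order tensors. When analytic derivatives are unavailable, a common fallback is central finite differencing. The sketch below applies it to a hypothetical pendulum: state $(\theta, \omega)$, a torque input, and an explicit Euler step with assumed values of `dt` and `g_over_l`. It illustrates the kind of derivative computation UDP is designed to remove; it is not part of UDP itself.

```python
import numpy as np


def jacobians(f, x, u, eps=1e-6):
    """Central-difference estimates of A_k = df/dx and B_k = df/du at (x, u)."""
    n, m = len(x), len(u)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        A[:, i] = (f(x + e, u) - f(x - e, u)) / (2 * eps)
    for j in range(m):
        e = np.zeros(m)
        e[j] = eps
        B[:, j] = (f(x, u + e) - f(x, u - e)) / (2 * eps)
    return A, B


# Hypothetical pendulum: state (theta, omega), torque input, explicit Euler step.
dt, g_over_l = 0.05, 9.81


def pendulum(x, u):
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (u[0] - g_over_l * np.sin(theta))])


A, B = jacobians(pendulum, np.array([0.0, 0.0]), np.array([0.0]))
# Analytically at the bottom: A = [[1, dt], [-dt * g/l, 1]], B = [[0], [dt]]
```

Each Jacobian costs $2(n + m)$ dynamics evaluations per time step. Computing the second-order tensors this way would cost far more.
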
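  Slide 12: Extended Kalman Filter and Duality

    EKF                              SLQ/iLQR
    Quadratic log-likelihood ℒ       Quadratic cost-to-go V
    Covariance Σ                     Inverse Hessian $P^{-1}$
    Forward Riccati recursion        Backward Riccati recursion

    $$f(\Sigma_k) \approx A \Sigma_k A^T$$
    $$V(f(x)) \approx A^T P_{k+1} A = \left(A^{-1} P_{k+1}^{-1} A^{-T}\right)^{-1}$$

    Pulling the cost-to-go Hessian back through the dynamics is therefore equivalent to three steps: treat the inverse Hessian as a covariance, push it through the inverse map $A^{-1}$, and invert the result.
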
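Here is a quick numerical check of the identity. The data are assumed and random: `A` is a near-identity, and hence invertible, Jacobian, and `P` is a positive-definite Hessian. Both sides agree, which is the algebraic core of the duality.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible Jacobian (assumed)
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)                         # positive-definite Hessian of V_{k+1}

# Cost-to-go form: Hessian pulled back through the dynamics.
hessian_form = A.T @ P @ A

# Covariance form: treat P^{-1} as a covariance, push it through A^{-1}, invert.
A_inv = np.linalg.inv(A)
covariance_form = np.linalg.inv(A_inv @ np.linalg.inv(P) @ A_inv.T)

print(np.allclose(hessian_form, covariance_form))  # True
```
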
  Slide 14: Unscented DP Algorithm

    $$V_k(x) = \min_u \left[ L_k(x, u) + V_{k+1}(f(x, u)) \right]$$
    $$\sigma^x_k = f(\sigma^x_{k+1}, \sigma^u_k)$$

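The unscented transform replaces linearization with a deterministic set of sample points (sigma points) pushed through the nonlinear map. The sketch below is a generic version. It assumes a symmetric, equally weighted set of 2n points: the mean plus and minus √n times each column of a Cholesky factor. The paper's sigma-point set and weights may differ.

For a linear map the sketch reproduces $A \Sigma A^T$ exactly. In the setting of the duality slide, one possible use is to:

- take $\Sigma = P_{k+1}^{-1}$;
- propagate the points with the dynamics as in $\sigma^x_k = f(\sigma^x_{k+1}, \sigma^u_k)$;
- invert the sample covariance to estimate a Hessian without dynamics derivatives.

```python
import numpy as np


def sigma_points(mean, cov):
    """2n symmetric sigma points: mean +/- sqrt(n) * (columns of chol(cov))."""
    n = len(mean)
    offsets = np.sqrt(n) * np.linalg.cholesky(cov).T   # row i = sqrt(n) * column i
    return np.vstack([mean + offsets, mean - offsets])


def unscented_transform(fn, mean, cov):
    """Push (mean, cov) through fn using equally weighted sigma points."""
    pts = np.array([fn(p) for p in sigma_points(mean, cov)])
    new_mean = pts.mean(axis=0)
    diffs = pts - new_mean
    return new_mean, diffs.T @ diffs / len(pts)


# For a linear map the estimate equals A Sigma A^T exactly.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
mean_out, cov_out = unscented_transform(lambda p: A @ p, np.zeros(2), Sigma)
print(np.allclose(cov_out, A @ Sigma @ A.T))  # True
```

For a nonlinear map, the same code returns a sample-based estimate of the covariance. No Jacobian of `fn` is ever formed.
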
  Slide 15: Algorithm Summary

    1. Initialize with … = …
    2. Perform unscented backward recursion to compute … and … = … − …
    3. Perform forward pass with line search to compute new x and u trajectories
    4. Repeat until convergence

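The four steps can be sketched as a generic loop. This is a stand-in, not the UDP backward pass: step 2 here is an iLQR-style Riccati recursion built on finite-difference Jacobians. It produces gains for the update $u_k = \bar u_k - \alpha d_k - K_k (x_k - \bar x_k)$. The following are all hypothetical:

- the quadratic regulator cost with a zero target state;
- the initial control guess;
- the function names.

```python
import numpy as np


def jacobians(f, x, u, eps=1e-6):
    """Central-difference A = df/dx, B = df/du (stand-in for the unscented pass)."""
    A = np.column_stack([(f(x + e, u) - f(x - e, u)) / (2 * eps)
                         for e in eps * np.eye(len(x))])
    B = np.column_stack([(f(x, u + e) - f(x, u - e)) / (2 * eps)
                         for e in eps * np.eye(len(u))])
    return A, B


def rollout(f, x1, us):
    xs = [x1]
    for u in us:
        xs.append(f(xs[-1], u))
    return np.array(xs)


def cost(xs, us, W, R, WN):
    """J = x_N' WN x_N + sum_k (x_k' W x_k + u_k' R u_k): drive the state to 0."""
    running = sum(x @ W @ x + u @ R @ u for x, u in zip(xs[:-1], us))
    return running + xs[-1] @ WN @ xs[-1]


def backward_pass(f, xs, us, W, R, WN):
    """Step 2 (iLQR-style stand-in): gains with du_k = -K_k dx_k - d_k."""
    H, g = WN, 2 * WN @ xs[-1]                 # V_N ~ dx'H dx + g'dx
    Ks, ds = [], []
    for k in reversed(range(len(us))):
        A, B = jacobians(f, xs[k], us[k])
        w, r = 2 * W @ xs[k], 2 * R @ us[k]    # cost gradients at the nominal point
        S = R + B.T @ H @ B
        K = np.linalg.solve(S, B.T @ H @ A)
        d = 0.5 * np.linalg.solve(S, r + B.T @ g)
        g = w + A.T @ g - 2 * A.T @ H @ B @ d  # uses H_{k+1}, so update g first
        H = W + A.T @ H @ A - A.T @ H @ B @ K
        Ks.append(K)
        ds.append(d)
    return Ks[::-1], ds[::-1]


def forward_pass(f, xs, us, Ks, ds, J_old, W, R, WN, max_halvings=10):
    """Step 3: roll out the updated policy, halving alpha until J decreases."""
    alpha = 1.0
    for _ in range(max_halvings):
        x, new_xs, new_us = xs[0], [xs[0]], []
        for k in range(len(us)):
            u = us[k] - alpha * ds[k] - Ks[k] @ (x - xs[k])
            x = f(x, u)
            new_us.append(u)
            new_xs.append(x)
        new_xs, new_us = np.array(new_xs), np.array(new_us)
        J_new = cost(new_xs, new_us, W, R, WN)
        if J_new < J_old:
            return new_xs, new_us, J_new
        alpha *= 0.5
    return xs, us, J_old                       # no decrease found: keep nominal


def optimize(f, x1, us, W, R, WN, max_iters=50, tol=1e-9):
    xs = rollout(f, x1, us)                    # step 1: nominal from initial guess
    history = [cost(xs, us, W, R, WN)]
    for _ in range(max_iters):                 # step 4: repeat until convergence
        Ks, ds = backward_pass(f, xs, us, W, R, WN)
        xs, us, J = forward_pass(f, xs, us, Ks, ds, history[-1], W, R, WN)
        history.append(J)
        if history[-2] - J < tol:
            break
    return xs, us, history


# Hypothetical usage: double integrator pushed from position 1 back to the origin.
dt = 0.1
f = lambda x, u: np.array([x[0] + dt * x[1], x[1] + dt * u[0]])
xs, us, history = optimize(f, np.array([1.0, 0.0]), np.zeros((40, 1)),
                           W=0.01 * np.eye(2), R=0.01 * np.eye(1), WN=10 * np.eye(2))
```

On a linear system with quadratic cost the quadratic model is exact. In this example the first iteration (with α = 1) should therefore already reach the optimum, and the second iteration only confirms convergence. Halving α until the cost decreases is a simple backtracking rule. More careful line searches compare the actual decrease with the decrease predicted by the quadratic model.
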
  Slide 16: Pendulum Swing Up

    [Figure: Total Cost vs. Iteration and Running Time (s) vs. Iteration for UDP, DDP, and iLQR]

  Slide 18: Cart Pole Swing Up

    Algorithm   Cost     Iterations   Time (s)
    UDP         131.78   183          78.4
    DDP         131.76   67           173.1
    iLQR        135.40   54           26.6

  Slide 20: Airplane Barrel Roll

    Algorithm   Cost    Iterations   Time (s)
    UDP         37.80   30           11.6
    DDP         37.80   31           100.2
    iLQR        37.81   36           12.1

  Slide 21: Conclusions

    • Dynamics derivatives can be eliminated from the classical DDP algorithm
    • Computational cost is comparable to SLQ/iLQR
    • Convergence rate is comparable to or better than SLQ/iLQR

    agile.seas.harvard.edu · zmanchester@seas.harvard.edu