Derivative-Free Trajectory Optimization with Unscented Dynamic Programming
Zac Manchester and Scott Kuindersma
Harvard Agile Robotics Lab

Trajectory Optimization
[Figure: a trajectory from the initial state $x_0$ to the goal state $x_{\text{goal}}$]
$$\min_{x,u} \; J = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k)$$
subject to: $x_{k+1} = f(x_k, u_k)$

Dynamic Programming Solution
$$V_N(x) = L_N(x)$$
$$V_{N-1}(x) = \min_u \left[ L(x, u) + V_N(f(x, u)) \right]$$
$$V_k(x) = \min_u \left[ L(x, u) + V_{k+1}(f(x, u)) \right]$$
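When the dynamics are linear and the costs are quadratic, every $V_k$ stays quadratic, so the recursion above can be carried out exactly. This sketch makes three assumptions:

- hypothetical scalar dynamics $x_{k+1} = a x_k + b u_k$;
- stage cost $L(x,u) = q x^2 + r u^2$;
- terminal cost $L_N(x) = q_N x^2$.

Each $V_k(x) = p_k x^2$ is then represented by the single number $p_k$.

```python
def lqr_backward(a, b, q, r, q_N, N):
    """Scalar backward DP recursion for V_k(x) = p_k * x**2.

    Returns the cost-to-go coefficients p (p[1..N]) and feedback gains
    K (K[1..N-1]); the minimizing control is u_k = -K[k] * x_k.
    """
    p = [0.0] * (N + 1)   # index 0 unused so indices match the slides
    K = [0.0] * N
    p[N] = q_N            # V_N(x) = L_N(x)
    for k in range(N - 1, 0, -1):
        # Minimize q x^2 + r u^2 + p_{k+1} (a x + b u)^2 over u (set d/du = 0).
        K[k] = a * b * p[k + 1] / (r + b * b * p[k + 1])
        closed_loop = a - b * K[k]
        # Substitute the minimizer back in to get V_k(x) = p_k x^2.
        p[k] = q + r * K[k] ** 2 + p[k + 1] * closed_loop ** 2
    return p, K

p, K = lqr_backward(a=1.0, b=1.0, q=1.0, r=1.0, q_N=1.0, N=2)
print(p[1], K[1])  # prints: 1.5 0.5
```

As a check, $V_1(1) = \min_u [1 + u^2 + (1 + u)^2]$ is attained at $u = -0.5$ and equals $1.5$. This agrees with $p_1 = 1.5$ and $K_1 = 0.5$.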

DDP/SLQ/iLQR Algorithm
[Figure: a trajectory from $x_0$ to $x_{\text{goal}}$]
$$V_k(x) \approx x^T H_k x + g_k^T x$$
$$L_k(x, u) \approx x^T W_k x + w_k^T x + u^T R_k u + r_k^T u$$
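Substitute linearized dynamics $x_{k+1} \approx A x + B u$ (the SLQ/iLQR approximation) and the quadratic models above into the Bellman equation. The result is a quadratic in $u$ that can be minimized in closed form. The sketch below does this for a hypothetical scalar state and control:

- `H_next`, `g_next`, `W`, `w`, `R`, and `r` mirror the slide's symbols;
- `A` and `B` are the assumed linearization;
- the feedforward `d` and gain `K`, with minimizer $u = -d - Kx$, are names introduced here.

```python
def quadratic_backward_step(H_next, g_next, W, w, R, r, A, B):
    """One scalar SLQ/iLQR-style backward step (hypothetical scalar model).

    Substitutes x' = A x + B u into L(x, u) + V_{k+1}(x') with
        V_{k+1}(x) = H_next x^2 + g_next x,
        L(x, u)    = W x^2 + w x + R u^2 + r u,
    and minimizes over u. Returns (H_k, g_k, d, K), minimizer u = -d - K x.
    """
    # Coefficients of Q(x, u) = Qxx x^2 + Qx x + Quu u^2 + Qu u + 2 Qux x u
    Qxx = W + H_next * A * A
    Qx = w + g_next * A
    Quu = R + H_next * B * B
    Qu = r + g_next * B
    Qux = H_next * A * B
    if Quu <= 0:
        raise ValueError("Q must be strictly convex in u")
    d = Qu / (2.0 * Quu)          # feedforward term
    K = Qux / Quu                 # feedback gain
    H_k = Qxx - Qux * Qux / Quu   # quadratic coefficient of V_k
    g_k = Qx - Qu * Qux / Quu     # linear coefficient (constant term dropped)
    return H_k, g_k, d, K

print(quadratic_backward_step(2.0, 1.0, 1.0, 0.5, 0.5, 0.2, 1.1, 0.3))
```

The constant term $-Q_u^2 / (4 Q_{uu})$ is dropped because the slide's model of $V_k$ has no constant. This does not change the minimizing control.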
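$$f(x, u) \approx A_k x + B_k u + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x \otimes x) + \frac{1}{2}\frac{\partial^2 f}{\partial u^2}(u \otimes u) + \frac{\partial^2 f}{\partial x \, \partial u}(x \otimes u)$$
$$V_k(x) = \min_u \left[ L(x, u) + V_{k+1}(f(x, u)) \right]$$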
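The second-order dynamics terms are what separate DDP from SLQ/iLQR. In the Hessian of $Q(x,u) = L(x,u) + V_{k+1}(f(x,u))$, DDP keeps the extra term $V_x f_{xx}$, while SLQ/iLQR drops it. For an $n$-dimensional state, $f_{xx}$ is an $n \times n \times n$ tensor. The scalar sketch below makes this concrete with hypothetical dynamics. The derivatives are estimated by central finite differences purely for illustration.

```python
import math

DT = 0.1

def f(x, u):
    """Hypothetical scalar dynamics: one Euler step of xdot = sin(x) + u."""
    return x + DT * (math.sin(x) + u)

def derivatives(x, u, h=1e-4):
    """Central finite-difference estimates of f_x, f_u and f_xx."""
    fx = (f(x + h, u) - f(x - h, u)) / (2 * h)
    fu = (f(x, u + h) - f(x, u - h)) / (2 * h)
    fxx = (f(x + h, u) - 2 * f(x, u) + f(x - h, u)) / (h * h)
    return fx, fu, fxx

def q_xx(x, u, Lxx, Vx_next, Vxx_next, second_order):
    """Hessian of L(x, u) + V_{k+1}(f(x, u)) with respect to x.

    Vx_next, Vxx_next: gradient and Hessian of V_{k+1} at f(x, u).
    SLQ/iLQR keeps only the first two terms; DDP adds Vx_next * f_xx.
    """
    fx, _, fxx = derivatives(x, u)
    result = Lxx + fx * Vxx_next * fx
    if second_order:
        result += Vx_next * fxx   # requires second derivatives of f
    return result

x = math.pi / 2
print(q_xx(x, 0.0, 1.0, 2.0, 3.0, second_order=False),
      q_xx(x, 0.0, 1.0, 2.0, 3.0, second_order=True))
```

At $x = \pi/2$, $f_x \approx 1$ and $f_{xx} = -0.1 \sin x = -0.1$. The iLQR value is therefore about $4$ and the DDP value about $3.8$.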

Extended Kalman Filter and Duality
EKF ↔ SLQ/iLQR
Quadratic log-likelihood $\mathcal{L}$ ↔ Quadratic cost-to-go
Covariance $\Sigma_k$ ↔ Inverse Hessian $P_k^{-1}$
Forward Riccati recursion ↔ Backward Riccati recursion
$$\Sigma_{k+1} = f(\Sigma_k) \approx A \Sigma_k A^T$$
$$\nabla_x^2 \, V_{k+1}(f(x)) \approx A^T P_{k+1} A$$
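The two recursions are mirror images, and a small numerical check shows it. The matrices `A`, `Sigma_k`, and `P_next` are hypothetical. Pushing the covariance forward and pulling the Hessian backward give the same expected quadratic cost $\mathbb{E}[x^T P x] = \operatorname{tr}(P\Sigma)$ for a zero-mean deviation. This follows from the cyclic property of the trace: $\operatorname{tr}(P_{k+1} A \Sigma_k A^T) = \operatorname{tr}(A^T P_{k+1} A \Sigma_k)$.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][m] * Y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    """Transpose of a 2x2 matrix."""
    return [[X[j][i] for j in range(2)] for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

def ekf_covariance_forward(A, Sigma_k):
    """EKF prediction without process noise: Sigma_{k+1} = A Sigma_k A^T."""
    return matmul(matmul(A, Sigma_k), transpose(A))

def hessian_backward(A, P_next):
    """Hessian of V_{k+1}(A x) with respect to x: A^T P_{k+1} A."""
    return matmul(matmul(transpose(A), P_next), A)

A = [[1.0, 0.1], [-0.2, 0.95]]      # hypothetical linearized dynamics
Sigma_k = [[0.5, 0.1], [0.1, 0.3]]  # hypothetical state covariance
P_next = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical cost-to-go Hessian

# Expected value of the quadratic x^T P x for a zero-mean deviation,
# computed at step k+1 (covariance forward) and at step k (Hessian backward).
forward = trace(matmul(P_next, ekf_covariance_forward(A, Sigma_k)))
backward = trace(matmul(hessian_backward(A, P_next), Sigma_k))
print(forward, backward)
```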
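$$\left(A^T P_{k+1} A\right)^{-1} = A^{-1} P_{k+1}^{-1} A^{-T}$$
so the inverse Hessian propagates backward the way a covariance propagates through the inverse dynamics $A^{-1}$.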
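The identity holds for any invertible $A$ and $P_{k+1}$. It can be checked numerically with hypothetical 2×2 matrices. The left side inverts the pulled-back Hessian. The right side pushes the inverse Hessian through $A^{-1}$ as if it were a covariance.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][m] * Y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    """Transpose of a 2x2 matrix."""
    return [[X[j][i] for j in range(2)] for i in range(2)]

def inv2(X):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

A = [[1.0, 0.1], [-0.2, 0.95]]     # hypothetical invertible dynamics Jacobian
P_next = [[4.0, 1.0], [1.0, 3.0]]  # hypothetical positive-definite Hessian

# Left side: invert the Hessian after pulling it back through A.
lhs = inv2(matmul(matmul(transpose(A), P_next), A))
# Right side: propagate the inverse Hessian through A^{-1} like a covariance.
A_inv = inv2(A)
rhs = matmul(matmul(A_inv, inv2(P_next)), transpose(A_inv))
print(lhs)
print(rhs)
```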

Unscented Transform
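$$\tilde{x}_{k+1} = f(\tilde{x}_k)$$
where $\tilde{x}$ denotes the sigma points, each propagated through the dynamics.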
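A one-dimensional sketch of the unscented transform follows. It propagates three sigma points through `f`, then recomputes a weighted mean and variance. The slides do not specify the sigma-point scheme. This sketch assumes the basic symmetric set with scaling parameter `kappa`; the value $\kappa = 2$ gives $n + \kappa = 3$ for $n = 1$.

```python
import math

def unscented_transform_1d(f, mean, var, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through f with 2n+1 = 3
    sigma points (n = 1) and the basic kappa weighting."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    sigma = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    propagated = [f(s) for s in sigma]           # sigma points at k+1
    new_mean = sum(w * y for w, y in zip(weights, propagated))
    new_var = sum(w * (y - new_mean) ** 2 for w, y in zip(weights, propagated))
    return new_mean, new_var

print(unscented_transform_1d(lambda x: 2.0 * x + 1.0, 1.0, 0.5))
print(unscented_transform_1d(lambda x: x * x, 1.0, 0.5))
```

No derivatives of `f` are evaluated. For an affine map the result is exact. For $f(x) = x^2$ with a Gaussian input, this choice of $\kappa$ also reproduces the exact mean $\mu^2 + \sigma^2 = 1.5$ and variance $2\sigma^4 + 4\mu^2\sigma^2 = 2.5$.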

Unscented DP Algorithm
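$$V_k(x) = \min_u \left[ L(x, u) + V_{k+1}(f(x, u)) \right]$$
$$\tilde{x}_k = f(\tilde{x}_{k+1}, \tilde{u}_k)$$
(sigma points propagated backward in time through the dynamics)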
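The sketch below is a scalar, control-free illustration of the backward sigma-point step. It rests on the relation $(A^T P_{k+1} A)^{-1} = A^{-1} P_{k+1}^{-1} A^{-T}$ and works in four steps:

1. Treat the inverse Hessian $1/P_{k+1}$ of $V_{k+1}$ as a variance around the nominal $x_{k+1}$.
2. Map three sigma points backward with a user-supplied `f_back(x_{k+1}, u_k)`, which is an assumption of this sketch.
3. Refit a variance at step $k$.
4. Invert that variance and add the stage-cost Hessian `Lxx`.

This covers only the Hessian propagation. The slide's method also uses control sigma points $\tilde{u}_k$, which are omitted here. For linear dynamics $x_{k+1} = a x_k$ the sketch returns $a^2 P_{k+1}$, the scalar form of $A^T P_{k+1} A$, without differentiating the dynamics.

```python
import math

def unscented_hessian_backward(P_next, x_next, u, f_back, Lxx, kappa=2.0):
    """Derivative-free estimate of the scalar Hessian of V_k (control-free).

    f_back(x_{k+1}, u_k) -> x_k is a hypothetical backward dynamics map.
    """
    n = 1
    spread = math.sqrt((n + kappa) / P_next)    # inverse Hessian as variance
    sigma_next = [x_next, x_next + spread, x_next - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    sigma_k = [f_back(s, u) for s in sigma_next]  # propagate backward in time
    mean_k = sum(w * s for w, s in zip(weights, sigma_k))
    var_k = sum(w * (s - mean_k) ** 2 for w, s in zip(weights, sigma_k))
    return Lxx + 1.0 / var_k                    # invert back to a Hessian

# Hypothetical nonlinear case: backward dynamics approximated by an Euler
# step with a negative time step (an assumption, not an exact inverse).
dt = 0.1
f_back = lambda x, u: x - dt * (math.sin(x) + u)
print(unscented_hessian_backward(P_next=3.0, x_next=1.0, u=0.0,
                                 f_back=f_back, Lxx=1.0))
```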

Algorithm Summary
1. Initialize with * = *
2. Perform unscented backward recursion to compute ' and ' = ' − '
3. Perform forward pass with line search to compute new $x$ and $u$ trajectories
4. Repeat until convergence
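Step 3 can be sketched for a scalar system as follows. Three parts are assumptions of this sketch:

- the update rule $u^{new}_k = u_k - \alpha d_k - K_k (x^{new}_k - x_k)$;
- the step sizes in `alphas`;
- the simple "accept any decrease" test.

Many DDP implementations instead compare the actual cost reduction with the reduction predicted by the quadratic model.

```python
def forward_pass(f, cost, xs, us, d, K, alphas=(1.0, 0.5, 0.25, 0.125, 0.0625)):
    """Forward pass with a backtracking line search (generic scalar sketch).

    xs, us: nominal trajectories with len(xs) == len(us) + 1
    d, K:   feedforward terms and feedback gains from the backward pass
    cost:   function returning J for a (states, controls) pair
    Returns (states, controls, cost, alpha); alpha = 0.0 means no step
    lowered the cost and the nominal trajectories are returned.
    """
    J_old = cost(xs, us)
    for alpha in alphas:
        x_new, u_new = [xs[0]], []
        for k in range(len(us)):
            # Scaled feedforward step plus feedback on the state deviation.
            u = us[k] - alpha * d[k] - K[k] * (x_new[k] - xs[k])
            u_new.append(u)
            x_new.append(f(x_new[k], u))
        J_new = cost(x_new, u_new)
        if J_new < J_old:
            return x_new, u_new, J_new, alpha
    return xs, us, J_old, 0.0

# Toy problem: x_{k+1} = x_k + u_k with J = sum(x^2) + sum(u^2).
f_lin = lambda x, u: x + u
J = lambda xs, us: sum(x * x for x in xs) + sum(u * u for u in us)
print(forward_pass(f_lin, J, [1.0, 1.0, 1.0], [0.0, 0.0], [4.0, 0.0], [0.0, 0.0]))
```

In the toy call, the full step ($\alpha = 1$) overshoots to $J = 35$ and $\alpha = 0.5$ still gives $J = 7$. $\alpha = 0.25$ drives the state to zero with $J = 2$, which is below the nominal $J = 3$. Steps 2 to 4 would wrap this in a loop that stops once the cost change falls below a tolerance.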

Pendulum Swing-Up
[Figure: total cost and running time (s) vs. iteration for UDP, DDP, and iLQR]

Cart-Pole Swing-Up
Algorithm   Cost     Iterations   Time (s)
UDP         131.78   183          78.4
DDP         131.76   67           173.1
iLQR        135.40   54           26.6

Airplane Barrel Roll
Algorithm   Cost    Iterations   Time (s)
UDP         37.80   30           11.6
DDP         37.80   31           100.2
iLQR        37.81   36           12.1
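The cost claims are easier to compare per iteration. The arithmetic below uses only the numbers in the Cart-Pole Swing-Up and Airplane Barrel Roll tables. Absolute times depend on the implementation and hardware, which the slides do not state.

```python
# (cost, iterations, time in s), transcribed from the two result tables
results = {
    "Cart-Pole Swing-Up": {"UDP": (131.78, 183, 78.4),
                           "DDP": (131.76, 67, 173.1),
                           "iLQR": (135.40, 54, 26.6)},
    "Airplane Barrel Roll": {"UDP": (37.80, 30, 11.6),
                             "DDP": (37.80, 31, 100.2),
                             "iLQR": (37.81, 36, 12.1)},
}

def seconds_per_iteration(table):
    """Average wall-clock time per iteration for each algorithm."""
    return {name: time / iters for name, (_, iters, time) in table.items()}

for task, table in results.items():
    per_iter = seconds_per_iteration(table)
    print(task, {name: round(t, 3) for name, t in per_iter.items()})
# Cart-Pole Swing-Up {'UDP': 0.428, 'DDP': 2.584, 'iLQR': 0.493}
# Airplane Barrel Roll {'UDP': 0.387, 'DDP': 3.232, 'iLQR': 0.336}
```

On both tasks, a UDP iteration costs about as much as an iLQR iteration and several times less than a DDP iteration. The final UDP cost matches DDP to within 0.02.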
agile.seas.harvard.edu

Conclusions
• Dynamics derivatives can be eliminated from the classical DDP algorithm.
• Computational cost is comparable to SLQ/iLQR.
• Convergence rate is comparable to or better than SLQ/iLQR.