(HKSTS Outstanding Student Paper Award) Physically Consistent Differential-Game Surrogates for Interaction-Aware AV Trajectory Planning

Physically consistent Differential-Game surrogates for interaction-aware AV trajectory planning
The 29th HKSTS, Hong Kong, Dec. 8th, 2025
Fumihito Furuhashi & Eiji Hato, The University of Tokyo, Japan

Trajectory Planning
F. Furuhashi and E. Hato, Physically consistent Differential-Game surrogates for interaction-aware AV trajectory planning, The 29th HKSTS (2025), Hong Kong.
Trajectory planning is a key technology for autonomous vehicles. (Video source: YouTube / Waymo)

Decision making in AV
Planning layers, from macro to micro time scale: Route Planning, Motion Planning, Trajectory Planning (target problem)
Time scales shown: 0~1 [sec], 1~10 [sec], 1 min~1 hour
Challenge: balancing efficiency, safety, and real-time performance under V2V interaction.

Autonomous Traffic Flow
• AV-AV interaction (AV0 and AV1, each with its own AI, linked by V2V communication): strong interaction, endogenous
• AV-HV interaction (autonomous vehicle and human-driven vehicle): obstacle avoidance, exogenous

Position of this study
[Figure: methods (CVAE, MPC, MCTS, MARL, RL, Differential Game) positioned by quality (safety & efficiency; low to high) against effort (speed & scalability; small to large), grouped as endogenous or exogenous. Trade-off!!]
Works cited in the figure: Paden+ (2016), Ye+ (2021), Hoogendoorn+ (2009), Mo+ (2021), Lee+ (2017), Essalmi+ (2025), Shalev-Shwartz+ (2016)

Part 1: Differential Game (DG) Concept

Concept: Differential Game
Differential Game (Isaacs, 1955): a type of dynamic game with continuous time and state and a finite number of players.
$\Gamma_{z_0}^{T} := \langle N, u_i, J_i^{T}, f \rangle$
• $N$: players
• $u_i$: control (actions)
• $J_i^{T}$: cost function
• $f$: dynamics
[Figure: time axis with $t_0$, $T_1$, $T_2$]

Concept: Differential Game
Initial state: $z_0$
The ego vehicle ("YOU") plans from $t_0$ to $T_1$.

Given the state $z$ and the control $u$, we evaluate the stage cost $L$.
Cost Function: $J(t, z, u) = \int_{t}^{T_1} L(s, z, u)\, ds$
Value Function: $V(t, z) = \min_{u_i} J(t, z, u_i, u_{-i})$


Solution
Hamilton-Jacobi-Bellman Eq.: $\frac{\partial}{\partial t} V(t, z) + \min_{u} \left[ L(t, z, u) + \frac{\partial}{\partial z} V(t, z)\, f(z, u) \right] = 0$
Optimal Control Inputs: $u^{*} = \arg\min_{u_i} J(t, z, u_i, u_{-i})$
Nash Eq.: $J(t, z^{*}, u^{*}) \le J(t, z_i, z_{-i}^{*}, u_i, u_{-i}^{*})$

Time Evolution: Bicycle model
$\dot{z} = (\dot{x}, \dot{y}, \dot{\theta}, \dot{v})^{\top} = f(z, u^{*}) = \left( v\cos\theta,\; v\sin\theta,\; \frac{v}{l}\tan\delta,\; a \right)^{\top}$
Here $l$ is the wheelbase, $\delta$ the steering angle, and $a$ the acceleration. The dynamics are nonlinear.
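The bicycle-model time evolution can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the forward-Euler discretisation, the step size `dt`, and the default `wheelbase` value are assumptions introduced here.

```python
import math


def bicycle_step(state, control, wheelbase=2.7, dt=0.1):
    """One forward-Euler step of the kinematic bicycle model.

    state   = (x, y, theta, v)  position, heading, speed
    control = (a, delta)        acceleration, steering angle
    wheelbase and dt are illustrative values, not taken from the slides.
    """
    x, y, theta, v = state
    a, delta = control
    return (
        x + dt * v * math.cos(theta),                     # x_dot = v cos(theta)
        y + dt * v * math.sin(theta),                     # y_dot = v sin(theta)
        theta + dt * (v / wheelbase) * math.tan(delta),   # theta_dot = (v / l) tan(delta)
        v + dt * a,                                       # v_dot = a
    )


def rollout(state, controls, wheelbase=2.7, dt=0.1):
    """Apply a sequence of controls and return the visited states."""
    states = [state]
    for control in controls:
        state = bicycle_step(state, control, wheelbase, dt)
        states.append(state)
    return states
```

The `tan(delta)` term is what makes the dynamics nonlinear in the steering input.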

[Figure: time evolution of the state from the initial state $z_0$ under the dynamics $f$, over $t_0$, $T_1$, $T_2$]

Concept: Differential Game
Benefit
• Endogenous trajectory planning
Challenge
• Real-time solution of DG
• Designing a safety-guaranteed cost function
• Local minima in multi-agent DG
How to solve DG?

Part 2: Proposed Framework

How to solve DG?
Conventional Approach (Hoogendoorn+, 2009): two-point boundary value problem
State eq.: $\frac{d}{dt} z^{(i)} = f\left(z^{(i)}, u(z^{(i)}, \Lambda^{(i-1)})\right)$
Co-state eq. (≈ HJB eq.): $-\frac{d}{dt} \lambda^{(i)} = \frac{\partial}{\partial z} H\left(z^{(i)}, u(z^{(i)}, \Lambda^{(i-1)})\right)$
Both equations are solved by iterative updates → high calculation cost (139 [sec/case]).
Because of the bicycle model, there is no closed-form solution for the optimal steering:
$\left.\frac{\partial H}{\partial \delta}\right|_{u^{*}} = \lambda_{\phi} \frac{v}{l\cos^{2}\delta} + \delta = 0$
Notation
• $H$: Hamiltonian
• $\delta$: steering angle
• $u^{*}$: optimal control
• $\lambda_{\phi}$: co-state variable
• $v$: speed, $l$: wheelbase

Deep Learning DG Surrogate
Proposed Framework
We build a deep learning surrogate model to approximate the HJB solution process.
Loss function: $\mathcal{L} = \mathrm{MSE}(\text{HJB eq.})$
Calculation cost: 0.02 [sec/case]
But... is the solution trajectory always safe?
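To make the HJB-residual loss concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: a single-agent 1D system $\dot{z} = u$ with stage cost $L = z^2 + u^2$ stands in for the multi-agent bicycle-model game, and central finite differences stand in for the automatic differentiation a neural surrogate would normally use.

```python
def hjb_residual_mse(V, samples, eps=1e-4):
    """Mean squared HJB residual of a candidate value function V(t, z).

    Toy setting (assumed here, not from the slides):
        dynamics   z_dot = u
        stage cost L = z^2 + u^2
    so the inner minimisation over u has the closed form u* = -V_z / 2.
    """
    total = 0.0
    for t, z in samples:
        # Central finite differences for the partial derivatives of V.
        V_t = (V(t + eps, z) - V(t - eps, z)) / (2 * eps)
        V_z = (V(t, z + eps) - V(t, z - eps)) / (2 * eps)
        u_star = -0.5 * V_z  # minimiser of u^2 + V_z * u
        residual = V_t + z ** 2 + u_star ** 2 + V_z * u_star
        total += residual ** 2
    return total / len(samples)


# V(t, z) = z^2 solves this toy HJB equation, so its residual is ~0.
exact = hjb_residual_mse(lambda t, z: z * z, [(0.0, 1.0), (0.5, -2.0)])
# V(t, z) = 2 z^2 does not, so its residual is clearly non-zero.
wrong = hjb_residual_mse(lambda t, z: 2.0 * z * z, [(0.0, 1.0)])
```

A surrogate trained on this loss is pushed toward value functions whose residual vanishes at the sampled points, without needing labelled optimal trajectories.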

Key: Safety Layer
Projects unsafe controls into the safety set.
$\hat{u}_i^{safe}(t) = \arg\min_{u_i} \frac{1}{2} \left\| u_i(t) - \hat{u}_i(t) \right\|^{2}$
s.t. $\dot{z} = f(z, u)$, $h\left(z_i(t+1), z_{-i}(t+1)\right) = \left( y_{max} - y_i,\; y_i - y_{min},\; d_{ij} - (l_i + l_j) \right)^{\top} \ge 0$, $u_{min} \le u \le u_{max}$
This problem is nonlinear. A Taylor expansion of the constraints turns it into a quadratic problem:
$\hat{u}_i^{safe}(t) = \arg\min_{u_i} \frac{1}{2} \left\| u_i(t) - \hat{u}_i(t) \right\|^{2}$
s.t. $A u \ge b$, $u_{min} \le u \le u_{max}$
[Figure: in the control space $(a, \delta) \in \mathbb{R}^{2}$, the surrogate output $\hat{u}_i$ lies on a collision path (infeasible) and is projected to $\hat{u}_i^{safe}$ inside the safety space (feasible)]
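The projection step of the safety layer can be sketched as below. This is an illustration under stated assumptions, not the authors' solver: it handles a single linearised constraint row `a . u >= b` plus the box bounds, and it uses Dykstra's alternating projections (a technique swapped in here for simplicity) instead of a general QP solver. All function names are hypothetical.

```python
def project_halfspace(u, a, b):
    """Euclidean projection of u onto the half-space {w : a . w >= b}."""
    dot = sum(ai * ui for ai, ui in zip(a, u))
    if dot >= b:
        return list(u)
    step = (b - dot) / sum(ai * ai for ai in a)
    return [ui + step * ai for ui, ai in zip(u, a)]


def project_box(u, lo, hi):
    """Euclidean projection of u onto the box lo <= w <= hi."""
    return [min(max(ui, l), h) for ui, l, h in zip(u, lo, hi)]


def safety_project(u_hat, a, b, lo, hi, iters=200):
    """Closest point to u_hat satisfying a . u >= b and lo <= u <= hi.

    Dykstra's algorithm: alternate the two projections while carrying
    correction terms p and q, so the iterates approach the projection
    onto the intersection (assumed non-empty).
    """
    n = len(u_hat)
    x, p, q = list(u_hat), [0.0] * n, [0.0] * n
    for _ in range(iters):
        y = project_halfspace([xi + pi for xi, pi in zip(x, p)], a, b)
        p = [xi + pi - yi for xi, pi, yi in zip(x, p, y)]
        x_new = project_box([yi + qi for yi, qi in zip(y, q)], lo, hi)
        q = [yi + qi - xn for yi, qi, xn in zip(y, q, x_new)]
        x = x_new
    return x


# Control u = (a, delta); the linearised constraint here is "a <= 1".
u_safe = safety_project([2.0, 0.3], a=[-1.0, 0.0], b=-1.0,
                        lo=[-3.0, -0.5], hi=[3.0, 0.5])
```

A control that already satisfies every constraint is returned unchanged, so the layer only intervenes when the surrogate output is unsafe.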

How to train?
Local minima in multi-agent DG → gradient-based methods (Adam, L-BFGS, etc.) often get stuck in local minima.
Evolutionary Model Merge (EMM), inspired by LLM training (Akiba+, 2025): genetic optimization for large deep learning models.
[Figure: parent models $\theta_1, \theta_2, \dots, \theta_k$ are aggregated by cross-over and mutation into children models $\theta_1', \theta_2', \dots, \theta_k'$]
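The parents-to-children loop can be sketched generically as follows. This is a plain evolutionary search over parameter vectors, written only to illustrate cross-over and mutation; it is not the merging recipe of Akiba+ (2025) nor the authors' training code, and the blend cross-over, Gaussian mutation, `sigma`, and the elitist selection are assumptions.

```python
import random


def evolve(loss, parents, generations=30, sigma=0.1, seed=0):
    """Evolve a population of parameter vectors toward lower loss.

    Each generation creates one child per parent slot by blending two
    random parents (cross-over) and adding Gaussian noise (mutation),
    then keeps the best k of parents + children (elitist selection).
    Requires at least two parents.
    """
    rng = random.Random(seed)
    population = [list(p) for p in parents]
    k = len(population)
    for _ in range(generations):
        children = []
        for _ in range(k):
            pa, pb = rng.sample(population, 2)
            w = rng.random()  # blend weight for the cross-over
            children.append([w * x + (1.0 - w) * y + rng.gauss(0.0, sigma)
                             for x, y in zip(pa, pb)])
        population = sorted(population + children, key=loss)[:k]
    return population[0]
```

Because selection keeps the best candidates from both generations, the best loss never increases, and the random blending lets the search move between basins where a gradient step would stay put.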

Part 3: Evaluation and Results

Evaluation Metrics and Benchmarks
Benchmarks
Models
• Model Predictive Control (MPC)
• Differential Game (Iterative)
• Differential Game (Surrogate)
• Differential Game (Surrogate) w/ Safety Layer (proposed)
Optimization algorithms
• Adam (gradient based)
• TIES (genetic based)
※ Same training epochs
[Table: MPC, DG (Iterative), DG (Surrogate), and DG (Surrogate) w/ Safety Layer compared on endogeneity, calculation cost, and safety]
Metrics
HJB residual (MSE): $Loss(\hat{z}, \hat{u}) = \mathrm{MSE}\left( \frac{\partial}{\partial t} V(t, z) + \min_{u} \left[ L(t, z, u) + \frac{\partial}{\partial z} V(t, z)\, f(z, u) \right] \right)$

Cost func. design and Data
Stage cost function
$L(z_i, u_i) = a_i^{2} + \lambda_1 \delta_i^{2} + \lambda_2 (v_i - v_d)^{2} + \lambda_3 \theta_i^{2} + \lambda_4 L_{lane}(y_i) + \lambda_5 L_{safe}(z_i)$
The terms are grouped as comfort, efficiency, stability, and safety.
Assumption: ※ $\lambda_1, \dots, \lambda_5 = 1$ in this experiment.
Dataset
• Road geometry: Straight 3-lane highway
• Traffic type: AV-only
• Scenario size: 1–10 vehicles per scenario
• Initial conditions:
  • Vehicle positions sampled across all lanes
  • Random initial speeds and desired speeds in [0, 30] m/s
• Prediction horizon: 5 seconds
• Hyperparameters: Pre-tuned before experiments
• Train: 1~1,000 randomly generated scenarios
• Test: 100 common scenarios shared across all methods
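The stage cost can be written out as a small function. This is a sketch under assumptions: the slides do not give the form of the lane and safety penalties, so they are passed in as already-computed values (`lane_penalty`, `safe_penalty`, both hypothetical names), and the weights default to 1 as stated for this experiment.

```python
def stage_cost(a, delta, v, v_desired, theta,
               lane_penalty=0.0, safe_penalty=0.0,
               lambdas=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Stage cost L(z_i, u_i) for one vehicle.

    a, delta      acceleration and steering angle (controls)
    v, v_desired  current and desired speed
    theta         heading angle
    lane_penalty  value of L_lane(y_i); its form is not given in the slides
    safe_penalty  value of L_safe(z_i); its form is not given in the slides
    lambdas       weights (lambda_1 .. lambda_5), all 1 in the experiment
    """
    l1, l2, l3, l4, l5 = lambdas
    return (a ** 2
            + l1 * delta ** 2
            + l2 * (v - v_desired) ** 2
            + l3 * theta ** 2
            + l4 * lane_penalty
            + l5 * safe_penalty)


# Accelerating at 1 m/s^2 while 2 m/s below the desired speed: 1 + 4 = 5.
cost = stage_cost(a=1.0, delta=0.0, v=10.0, v_desired=12.0, theta=0.0)
```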

Key Findings: Impacts of EMM
• EMM: up to 97% accuracy gain (↓ HJB residual)
[Figure: comparison of EMM (genetic based), Adam (gradient based), and Iterative]

Key Findings: Impacts of EMM and Impacts of Safety Layer
• EMM: up to 97% accuracy gain (↓ HJB residual)
• Safety Layer: performance gets worse due to added constraints
[Figure: comparison w/ Safety Layer and w/o Safety Layer]

Safety Layer: Trade-off in Performance, Gain in Realism
Key Findings:
• Performance gets worse with the Safety Layer (HJB residual ↑).
• Vanilla surrogate may output unrealistic / unsafe paths.
Safety Layer fixes these and enforces realistic, safe trajectories.
[Figure: trajectories; dotted: w/ Safety Layer, straight: w/o Safety Layer]

Concluding Remarks
Concept
• DG formulation for endogenous AV-AV interactions
Surrogate
• fast DG approximation (138 → 0.02 [sec/case])
EMM
• accuracy ↑ (up to 97%), multiple equilibria handled
Safety Layer
• unrealistic trajectories fixed → safe & feasible
Limitations
• Safety Layer slows calculation (0.02 → 0.08 [sec/case])
• Safety guarantee is approximate, not complete
What’s next?
• Apply to complex geometries (intersections, ramps, lane-free)
• Evaluate safety approximation accuracy & develop complete safety guarantees

References
• Akiba, T., Shing, M., Tang, Y., Sun, Q., & Ha, D. (2025). Evolutionary optimization of model merging recipes. Nature Machine Intelligence, 1–10.
• Essalmi, K., Garrido, F., & Nashashibi, F. (2025). An extended horizon tactical decision-making for automated driving based on Monte Carlo tree search. arXiv preprint arXiv:2504.15869.
• Hoogendoorn, S. P., & Bovy, P. (2009). Generic driving behavior modeling by differential game theory. In Traffic and Granular Flow’07 (pp. 321–331). Springer.
• Isaacs, R. (1955). The problem of aiming and evasion. RAND Corporation.
• Jond, H. B., & Platoš, J. (2022). Differential game-based optimal control of autonomous vehicle convoy. IEEE Transactions on Intelligent Transportation Systems, 24(3), 2903–2919.
• Katrakazas, C., Quddus, M., Chen, W. H., & Deka, L. (2015). Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions. Transportation Research Part C, 60, 416–442.
• Lee, N., Choi, W., Vernaza, P., Choy, C. B., Torr, P. H., & Chandraker, M. (2017). DESIRE: Distant future prediction in dynamic scenes with interacting agents. In CVPR (pp. 336–345).
• Mo, S., Pei, X., & Wu, C. (2021). Safe reinforcement learning for autonomous vehicle using Monte Carlo tree search. IEEE Transactions on Intelligent Transportation Systems, 23(7), 6766–6773.
• Paden, B., Čáp, M., Yong, S. Z., Yershov, D., & Frazzoli, E. (2016). A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent Vehicles, 1(1), 33–55.
• Shalev-Shwartz, S., Shammah, S., & Shashua, A. (2016). Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295.
• The Waymo Driver. (2025, January 27). The Waymo Driver navigating freeways [Video]. YouTube. https://www.youtube.com/watch?v=tgX7yzyfQ6E
• Yadav, P., Tam, D., Choshen, L., Raffel, C. A., & Bansal, M. (2023). Ties-merging: Resolving interference when merging models. NeurIPS, 36, 7093–7115.
• Ye, F., Zhang, S., Wang, P., & Chan, C. Y. (2021). A survey of deep reinforcement learning algorithms for motion planning and control of autonomous vehicles. In IV Symposium (pp. 1073–1080). IEEE.

Thank you!

Appendix


Algorithm: TIES (Yadav+, 2023)
1. for all $i$ in $1, \dots, n$ do: # Select the $k$ parameters with the largest absolute values
   $\hat{\theta}_i \leftarrow \mathrm{topk}(\theta_i)$
2. for all $j$ in $1, \dots, k$ do: # Determine the update sign
   1. $P_j = \sum_i \max(\hat{\theta}_{i,j}, 0)$
   2. $N_j = \sum_i \max(-\hat{\theta}_{i,j}, 0)$
   3. $s_j \leftarrow 1$ if $P_j \ge N_j$ else $-1$
3. for all $j$ in $1, \dots, k$ do: # Average only the parameters with the consistent sign
   $S_j \leftarrow \{ \hat{\theta}_{i,j} \mid \mathrm{sgn}(\hat{\theta}_{i,j}) = s_j \}$
   1. $\theta_j' = \mathrm{mean}(S_j)$
4. update: $\theta \leftarrow \theta + \alpha \theta'$
Naive averaging can cancel parameters of opposite sign, for example $\theta_{3,i} = \frac{\theta_{1,i} + \theta_{2,i}}{2} \approx 0$.
[Figure: example of the trim and update steps applied to $\theta_{1,j}$ and $\theta_{2,j}$]
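The four steps above can be sketched in plain Python. This is an illustrative reading of the slide, not the reference implementation of Yadav+ (2023): it assumes each model is given as a flat list of parameter differences from a shared base (the names `base` and `deltas` are hypothetical), and it trims by keeping the `k` largest-magnitude entries per model.

```python
def ties_merge(base, deltas, k, alpha=1.0):
    """Merge several parameter-difference vectors into one update of base.

    base    shared starting parameters (flat list)
    deltas  one flat list per model, same length as base
    k       number of largest-magnitude entries kept per model (trim)
    alpha   scale of the merged update
    """
    n_params = len(base)

    # Step 1: trim each model to its k largest-magnitude entries.
    trimmed = []
    for d in deltas:
        order = sorted(range(n_params), key=lambda j: abs(d[j]), reverse=True)
        keep = set(order[:k])
        trimmed.append([d[j] if j in keep else 0.0 for j in range(n_params)])

    merged = []
    for j in range(n_params):
        column = [t[j] for t in trimmed]
        # Step 2: elect the sign with the larger total magnitude.
        positive = sum(max(c, 0.0) for c in column)
        negative = sum(max(-c, 0.0) for c in column)
        sign = 1.0 if positive >= negative else -1.0
        # Step 3: average only the entries that agree with the elected sign.
        agree = [c for c in column if c * sign > 0.0]
        merged.append(sum(agree) / len(agree) if agree else 0.0)

    # Step 4: apply the merged update to the base parameters.
    return [b + alpha * m for b, m in zip(base, merged)]


merged = ties_merge(base=[0.0, 0.0, 0.0],
                    deltas=[[1.0, -0.5, 0.1], [-0.8, -0.7, 0.0]],
                    k=2)
```

In this example naive averaging would shrink the first entry to 0.1, whereas the sign election keeps the dominant value 1.0.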


Key: Safety Layer
Projects unsafe controls into the safety set.
$\hat{u}_i^{safe}(t) = \arg\min_{u_i} \frac{1}{2} \left\| u_i(t) - \hat{u}_i(t) \right\|^{2}$ s.t. $A u \ge b$, $u_{min} \le u \le u_{max}$
Trick: Approximate Gradients
The projection breaks the computational graph, so the Straight-Through Estimator (STE) bypasses it:
$\frac{\partial}{\partial \hat{u}_i^{safe}} Loss \approx \frac{\partial}{\partial \hat{u}_i} Loss$
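The straight-through idea can be illustrated with a hand-written chain rule. This is a toy sketch, not the authors' training code: a single weight `w` produces the raw control `u_hat = w * x`, a clip to `[lo, hi]` stands in for the safety projection, and the loss is a squared error against a hypothetical `target`.

```python
def ste_gradient(w, x, target, lo=-1.0, hi=1.0):
    """Gradient of the loss w.r.t. w using a straight-through estimator.

    Forward pass : u_hat  = w * x
                   u_safe = clip(u_hat, lo, hi)   (stand-in for the projection)
                   loss   = (u_safe - target)^2
    Backward pass: the gradient w.r.t. u_safe is passed to u_hat unchanged,
                   as if the projection were the identity map.
    """
    u_hat = w * x
    u_safe = min(max(u_hat, lo), hi)
    dloss_du_safe = 2.0 * (u_safe - target)
    dloss_du_hat = dloss_du_safe      # straight-through: bypass the projection
    return dloss_du_hat * x           # chain rule through u_hat = w * x


# u_hat = 2 is clipped to u_safe = 1, where the exact gradient of the clip
# is zero; the straight-through estimate still gives a usable signal.
grad = ste_gradient(w=2.0, x=1.0, target=0.5)
```

Without the bypass, every clipped sample would contribute a zero gradient and the surrogate could not learn from the cases where the safety layer intervenes.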
