Slide 15
Training & Implementation Details
Model Size: 2.5B (MST: 1.3B, VisDiT: 1.2B, TrajDiT: 50M)
Training Data: nuPlan, nuScenes (700 scenes); image resolution 512 × 1024
Training: 48 NVIDIA A100 GPUs, 2 weeks, 600K iterations, batch size 96
Chain-of-Forward Training: applied every 10 steps, with 3 forward passes each time
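The training numbers above imply some simple arithmetic, and the chain-of-forward schedule can be sketched as a loop. The slide does not describe the mechanism, so this sketch assumes that chain-of-forward means running several chained passes in which each pass is conditioned on the model's own previous prediction instead of ground truth. It also assumes that the global batch of 96 is split evenly across the 48 GPUs, with no gradient accumulation. `TrainConfig`, `forward_passes_at`, and `chained_rollout` are hypothetical names, and the toy model operates on floats where the real model would operate on latent frames.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the slide.
    num_gpus: int = 48
    global_batch_size: int = 96
    total_iterations: int = 600_000
    # Chain-of-forward schedule: every `cof_interval` steps, run `cof_passes` passes.
    cof_interval: int = 10
    cof_passes: int = 3

    @property
    def per_gpu_batch(self) -> int:
        # Assumption: even data-parallel split, no gradient accumulation.
        assert self.global_batch_size % self.num_gpus == 0
        return self.global_batch_size // self.num_gpus


def forward_passes_at(step: int, cfg: TrainConfig) -> int:
    """Hypothetical schedule: chained passes on every cof_interval-th step, else one pass."""
    return cfg.cof_passes if step % cfg.cof_interval == 0 else 1


def chained_rollout(
    model: Callable[[List[float]], float], context: List[float], passes: int
) -> List[float]:
    """Run `passes` forward passes, feeding each prediction back into the context.

    Assumption: later passes see the model's own outputs, exposing it to its
    own errors during training (the slide does not state this explicitly).
    """
    ctx = list(context)
    preds: List[float] = []
    for _ in range(passes):
        pred = model(ctx)
        preds.append(pred)
        # Slide the context window: drop the oldest entry, append the prediction.
        ctx = ctx[1:] + [pred]
    return preds
```

Under these assumptions each GPU holds 2 samples per step, and 60,000 of the 600K iterations use the 3-pass chained rollout. For example, `chained_rollout(lambda c: c[-1] + 1.0, [0.0], 3)` returns `[1.0, 2.0, 3.0]` because each pass builds on the previous output.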
Evaluation on Video Generation
Datasets: nuPlan test (1,628 scenes), nuScenes val (1,646 scenes)
Metrics: Fréchet Video Distance (FVD), Fréchet Inception Distance (FID)
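FID and FVD both fit a Gaussian to features of real and generated samples and report the Fréchet distance between the two Gaussians: d² = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). FID uses Inception-v3 image features and FVD uses I3D video features. The sketch below assumes that the features have already been extracted into arrays with one row per sample, and omits the feature networks. It is NumPy-only: it obtains the trace of the matrix square root from the eigenvalues of Σ_r Σ_g, which are real and non-negative for covariance matrices, so it does not need a full `sqrtm`. `frechet_distance` is a hypothetical helper name.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Tr((Σ_r Σ_g)^{1/2}) = sum of square roots of the eigenvalues of Σ_r Σ_g.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    # Clip tiny negative / imaginary parts caused by floating-point error.
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance of about 0. Shifting every feature of a 4-dimensional set by 1.0 leaves the covariances unchanged and gives about 4.0, which is entirely the mean term.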
Evaluation on Trajectory Planning
Benchmarks: nuScenes (L2 distance, collision rate), NAVSIM
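As a rough illustration of the two nuScenes planning metrics, the sketch below computes the L2 displacement between planned and ground-truth waypoints and a collision rate over a set of plans. Several details are assumptions and not the benchmark's exact protocol: waypoints are 2 Hz (x, y) positions in metres, L2 at horizon h is averaged over all waypoints up to h, and obstacles are modelled as time-aligned circles `(x, y, radius)` instead of annotated bounding boxes. Published evaluations differ on these details. `l2_at_horizons` and `collision_rate` are hypothetical names, and NAVSIM's own scoring is not covered here.

```python
from typing import Dict, List, Sequence

import numpy as np


def l2_at_horizons(
    pred: np.ndarray, gt: np.ndarray, hz: int = 2, horizons: Sequence[int] = (1, 2, 3)
) -> Dict[int, float]:
    """Mean L2 error (metres) over waypoints up to each horizon (seconds).

    pred, gt: (T, 2) arrays of future (x, y) waypoints sampled at `hz` Hz (assumed).
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # per-waypoint displacement, shape (T,)
    return {h: float(err[: h * hz].mean()) for h in horizons}


def collision_rate(
    plans: np.ndarray, obstacles: List[np.ndarray], ego_radius: float = 1.5
) -> float:
    """Fraction of plans that come within ego_radius + obstacle radius of any obstacle.

    plans: (N, T, 2) planned waypoints; obstacles[i]: (M_i, T, 3) per-timestep
    (x, y, radius) for sample i. The circle model is a simplifying assumption.
    """
    hits = 0
    for plan, obs in zip(plans, obstacles):
        dist = np.linalg.norm(obs[..., :2] - plan[None], axis=-1)  # (M_i, T)
        if np.any(dist < obs[..., 2] + ego_radius):
            hits += 1
    return hits / len(plans)
```

For a straight plan along the x-axis with ground truth offset 1 m sideways, every horizon reports an L2 error of 1.0. If one of two plans drives through a stationary obstacle at (3, 0), the collision rate is 0.5.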