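Pieter Abbeel. "Denoising diffusion probabilistic models." NeurIPS 2020.
At generation time, start from pure noise and remove the noise a little at a time.
[Figure: denoising steps t = T, t = T-1, t = T-2, ..., t = 2, t = 1]
Decrease the timestep from T one step at a time, repeating noise prediction and removal of the predicted noise.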
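The reverse (generation) loop on this slide can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: the linear noise schedule, the choice of step variance equal to beta_t, and the names `ddpm_sample` and `dummy_predict_noise` are assumptions made for the example, and the dummy predictor stands in for a trained noise-prediction network.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, T=50, beta_start=1e-4, beta_end=0.02, seed=0):
    """Start from pure Gaussian noise and denoise from t = T down to t = 1."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, T)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)                 # x_T: pure noise
    for t in range(T, 0, -1):                      # t = T, T-1, ..., 1
        i = t - 1                                  # 0-based index into the schedule
        eps_hat = predict_noise(x, t)              # predict the noise contained in x_t
        # Remove the predicted noise to obtain the mean of x_{t-1}.
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_hat) / np.sqrt(alphas[i])
        # Add fresh noise at every step except the last one.
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = mean + np.sqrt(betas[i]) * z
    return x

def dummy_predict_noise(x, t):
    # Hypothetical placeholder for a trained network eps_theta(x_t, t).
    return np.zeros_like(x)

sample = ddpm_sample(dummy_predict_noise, shape=(2, 4))
```

With a trained predictor in place of the dummy, the same loop would turn noise into samples; with the dummy it only demonstrates the control flow.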
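language models for decision making in autonomous driving." arXiv preprint arXiv:2312.06351 (2023).
An attempt in which the objects found by an object detector and the user's driving instruction are given to an LLM as a prompt. The LLM selects a driving command and describes the reason for its choice, and the vehicle is then actually driven according to that command.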
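One way to picture this pipeline is a prompt builder plus a parser for the LLM's answer. The sketch below is purely illustrative: the `DetectedObject` fields, the command list, the answer format, and the fallback to "stop" are all hypothetical choices for this example, not the format used in the cited paper, and no real LLM is called.

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str          # e.g. "pedestrian" (hypothetical field)
    distance_m: float   # distance from the ego vehicle (hypothetical field)
    direction: str      # e.g. "front left" (hypothetical field)

# Hypothetical command vocabulary the LLM must choose from.
COMMANDS = ["accelerate", "keep speed", "decelerate", "stop", "turn left", "turn right"]

def build_prompt(objects, instruction):
    """Combine detector output and the user's instruction into one text prompt."""
    lines = ["You are the driving assistant of an autonomous vehicle.", "Detected objects:"]
    for obj in objects:
        lines.append(f"- {obj.label}, {obj.distance_m:.1f} m, {obj.direction}")
    lines.append(f"User instruction: {instruction}")
    lines.append("Choose exactly one command from: " + ", ".join(COMMANDS))
    lines.append("Answer in the form 'COMMAND: <command> | REASON: <reason>'.")
    return "\n".join(lines)

def parse_response(text):
    """Split the LLM's answer into (command, reason)."""
    command_part, _, reason_part = text.partition("|")
    command = command_part.replace("COMMAND:", "").strip().lower()
    reason = reason_part.replace("REASON:", "").strip()
    if command not in COMMANDS:
        command = "stop"   # assumed safe fallback for unparseable output
    return command, reason

prompt = build_prompt(
    [DetectedObject("pedestrian", 12.0, "front left")],
    "Turn right at the next intersection.",
)
command, reason = parse_response("COMMAND: decelerate | REASON: Pedestrian ahead.")
```

Restricting the answer to a fixed command list keeps the step from text to vehicle control simple to validate.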
for Autonomous Driving." arXiv preprint arXiv:2408.10845 (2024)
Created a dataset that integrates image, language, and action.
A poster is also being presented ([S5-P04]).
[Figure: dataset construction overview. Labels: Vision; 30s x 10,000 videos; Language; Action; Frame-level captions (example: "The ego vehicle is moving slowly and turning right. There is a traffic light displaying a green signal …"); Future trajectories; Object of concern; Scene recognition; Reasoning captions; Rule-based algorithm; Behavior captions; Sensor fusion; Reconstructed trajectory; Sensor signals; Control information (throttle/brake position, steering angle, turn signal); Radar; Leading vehicle (position, speed); Object detection model; Traffic light (position, signal); VLM]
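The figure labels suggest that behavior captions are produced by a rule-based algorithm over sensor signals. The actual rules are not given on the slide, so the sketch below only illustrates the general idea: the function name, the thresholds, and the sign convention for the steering angle are all hypothetical.

```python
def behavior_caption(speed_kmh, steering_deg):
    """Map two sensor readings to a short behavior caption (illustrative rules only)."""
    # Hypothetical speed thresholds in km/h.
    if speed_kmh < 1.0:
        return "The ego vehicle is stopped."
    if speed_kmh < 20.0:
        speed_phrase = "moving slowly"
    elif speed_kmh < 50.0:
        speed_phrase = "moving at a moderate speed"
    else:
        speed_phrase = "moving fast"

    # Hypothetical convention: positive steering angle means steering to the right.
    if steering_deg > 15.0:
        turn_phrase = "turning right"
    elif steering_deg < -15.0:
        turn_phrase = "turning left"
    else:
        turn_phrase = "going straight"

    return f"The ego vehicle is {speed_phrase} and {turn_phrase}."

caption = behavior_caption(speed_kmh=10.0, steering_deg=30.0)
```

Because the rules are deterministic, captions of this kind can be generated for every frame without manual annotation.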
Dataset for Autonomous Driving." arXiv preprint arXiv:2408.10845 (2024)
Use a VLA model to make driving decisions in language, then carry those decisions through to the actual driving operation.
Ground truth caption: The ego vehicle is moving straight at a moderate speed following leading car with acceleration. There is a traffic light near the ego vehicle displaying a green signal. …
Predicted caption: The ego vehicle is moving at a moderate speed and turning right. There is a traffic light near the ego vehicle displaying a green signal. …
[Figure: trajectory predicted by the VLAM and the actual trajectory]
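Improving generative masked language models with diffusion models." ACL 2023.
Discrete Diffusion Model: generates text by gradually reducing the number of masked tokens.
Continuous Diffusion Model: generates text by denoising token embeddings. Li, Xiang, et al. "Diffusion-lm improves controllable text generation." NeurIPS 2022.
Weakness: dependencies between tokens are hard to capture, so accuracy is lower.
Strength: generation speed for long sequences.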
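The discrete variant can be sketched as iterative unmasking: start from a fully masked sequence and fill in some positions at each step. The code below is a toy illustration under stated assumptions: `MASK`, `masked_diffusion_generate`, and `dummy_predict_tokens` are hypothetical names, positions are revealed at random (a real system might instead reveal the most confident predictions), and the dummy predictor stands in for a trained model.

```python
import numpy as np

MASK = -1  # hypothetical id for the mask token

def masked_diffusion_generate(predict_tokens, seq_len, num_steps=4, seed=0):
    """Start fully masked and reveal a share of the masked positions at each step."""
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, MASK, dtype=np.int64)
    for step in range(num_steps, 0, -1):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        proposal = predict_tokens(tokens)            # one call predicts every position
        # Reveal enough positions that no mask remains after the last step.
        n_reveal = int(np.ceil(masked.size / step))
        reveal = rng.choice(masked, size=n_reveal, replace=False)
        tokens[reveal] = proposal[reveal]
    return tokens

def dummy_predict_tokens(tokens):
    # Hypothetical placeholder for a trained denoiser: token id = position mod 5.
    return np.arange(tokens.size) % 5

generated = masked_diffusion_generate(dummy_predict_tokens, seq_len=10)
```

In this sketch the model is called `num_steps` times regardless of `seq_len`, which is one way to read the speed advantage for long sequences. Predicting many positions in the same call is also why dependencies between tokens are harder to capture.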