論文解説 DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

Slide 1

Slide 1 text

論⽂解説 DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models Takehiro Matsuda

Slide 2

Slide 2 text

2 論⽂情報タイトル： DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models • 論⽂： https://arxiv.org/html/2309.16292v3 • コード： https://github.com/PJLab-ADG/DiLu • 投稿学会： ICLR2024 • 著者： Licheng Wen, Daocheng Fu1, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qia • 所属： Shanghai Artificial Intelligence Laboratory, East China Normal University, The Chinese University of Hong Kong 選んだ理由： • knowledge-driven approachと名付けられているがどのように実現しているか知りたいため

Slide 3

Slide 3 text

3 knowledge-driven Drawing inspiration from the profound question posed by LeCun (2022): “Why can an adolescent learn to drive a car in about 20 hours of practice and know how to act in many situations he/she has never encountered before?” 画像認識など⾃動運転に関する技術について、⼤量データとDNNによるdata-drivenで⾼い性能が⽰されてきた。ただし、学習していないシーン(エッジケース/レアケース)では性能がでないこともあり、⼈間の学習と違う弱点もある。

Slide 4

Slide 4 text

4 LLM is as embodiment of human knowledge https://palm-e.github.io/ PaLM-E https://github.com/OpenGVLab/Instruct2Ac Instruct2Act Put the polka dot block into the green container. https://voyager.minedojo.org/ Voyager 現在LLM(Large Language Model)が⼈間の知識を最も汎化して所持しているとして、その応⽤をする研究がある。

Slide 5

Slide 5 text

5 Empower LLM to Autonomous driving (1) an environment with which an agent can interact; (2) a driver agent with recall, reasoning, and reflection abilities; (3) a memory component to persist experiences. ただし、LLMにdriving taskをそのまま解かせようとしてもそれほど良い性能にならない。 LLMによりknowledge-drivenなdecision-makingを実現するために以下のコンポーネントを設計した。

Slide 6

Slide 6 text

6 The framework of DiLu 交通状況を⾔語化して、LLMに与えられるようにする。 Memory Moduleから過去の似た状況をとりだし、付帯する。⾃⾞の動きの指⽰を得る。指⽰に従った動作により問題が⽣じた場合は何が問題でどう修正するのがよいか考察させ、修正した内容でMemoryに格納する。

Slide 7

Slide 7 text

7 Demo screen Highway-env

Slide 8

Slide 8 text

8 Memory module Initialization Memory recall Memory storage We select a few scenarios and manually outline the correct reasoning and decision-making processes for these situations to form the initial memory. 公道に出る前に教習所で学ぶようにいくつのシナリオについて、マニュアルで正しい reasoningとdecision-makingを作ってinitial memoryとして保存する。 Before making a decision, the current driving scenario is embedded into a vector, which serves as the memory key. This key is then clustered and searched to find the closest scenarios in the memory module and their corresponding reasoning processes, or memories. 過去のシナリオをvectorとして埋め込み、似たシナリオを検索可能にする。正しいreasoningとdecisionされたシナリオを保存する。運転経験の蓄積過去の運転経験の活⽤ベースとなる運転⽅法を教える

Slide 9

Slide 9 text

9 Reasoning module (1) encode the scenario by a descriptor; (2) recall several experience from the Memory module; (3) generate the prompt; (4) feed the prompt into the LLM; (5) decode the action from the LLMʼs response Memory moduleからの経験とLLMのcommon-sense knowledgeを利⽤して、traffic scenarioの decision-makingを⾏う。

Slide 10

Slide 10 text

10 Reflection module However, our goal is to make the autonomous driving system learn from mistakes on its own. We discover that LLM can effectively act as a mistake rectifier. 衝突などを起こしてdecision-makingに間違いがあった場合は、LLMによりその状況を説明させ、修正内容を⽰させる。修正した内容をMemoryに格納することで似た状況で間違いが起こりづらくなる。

Slide 11

Slide 11 text

11 Experiments Closed-loopのsimulation environmentとしてHighway-envを使う。 • Memory moduleから引き出すshot数の違いを⽐較 0-shot, 1-shot, 3-shots, 5-shots • Memory initialization 5 human-crafted experiences • Memory stored experiencesの違いを⽐較 5, 20, 40 experiences 10 times with different seedsで実験する. https://github.com/Farama-Foundation/HighwayEnv

Slide 12

Slide 12 text

12 Experiments GPT-3.5 GPT-4 Chroma ベクトルDB Chromaの使い⽅について https://note.com/mahlab/ n/nb6677d0fc7c2 OpenAIのtext-embedding-ada- 002 modelを使ってvectorに変換され格納される。 Highway-env 各⾞のposition, speed, accelerationが与えられる。

Slide 13

Slide 13 text

13 Reasoning module prompt example LLM(GPT)に対してのtaskの説明 (固定の内容) ⼊⼒や望ましい出⼒形式など

Slide 14

Slide 14 text

14 Reasoning module prompt example Highway-envの現在フレームの状況を記述したtext ベクトル化してqueryとして Memoryに与え、保管されているシナリオから類似するものを取り出す。運転の指針：衝突を避け安全運転など (変更することもできる) 選択できる⾏動； IDLE, Turn-right, Acceleration, Deceleration, ・・・ COT(Chain of Thouhght)として、System promptsに続いてLLMに与えられる。

Slide 15

Slide 15 text

15 Example of extracted similar experiences from Memory 過去の似たシナリオとして抽出された 3shotsで２つがIdle, 1つがDeceleration を選択している。

Slide 16

Slide 16 text

16 Reflection module prompt example LLM(GPT)に対してのtaskの説明 (固定の内容)

Slide 17

Slide 17 text

17 Case study1 Reasoning 前⽅⾞との距離はあり、⾃⾞より少しだけ速い右レーンは前⽅⾞と距離は少しあり、⾃⾞より結構速い右レーンに移動するという決定

Slide 18

Slide 18 text

18 Case study2 Reasoning Driving intensionをHighwayから出るために、⼀番右のレーンに移動する必要があると変更右レーンの前⽅⾞との距離はある、⾃⾞よりは遅い。右レーンに移動すると決定

Slide 19

Slide 19 text

19 Case study3 Reflection このシナリオについて、もとの Decision-makingは右レーンに移動して衝突してしまっている。

Slide 20

Slide 20 text

20 Case study3 Reflection 衝突の解析と教訓右レーンにいる⾞との相対距離と速度が考慮されていない。 (計算はしているが、 Appropriateという判断がされている) 右レーンにいる⾞との相対距離と速度、Time to collisionの計算がされ、右レーンへの移動は危険と判断し、減速と決定する。

Slide 21

Slide 21 text

21 Results 30 stepsでひとつのdriving-taskはcompleteになる。 20 experiences以上からの5-shotsは driving taskを完了できている。 40 experiencesではどのshot数でも中央値が25を超えている。 0-shotでは中央値が5以下

Slide 22

Slide 22 text

22 Compare with Reinforcement learning method Highway-envでSOTAのReinforcement Learning(RL) methodのGRAD(Graph Representation for Autonomous Driving)と⽐較する。 it generates a global scene representation that includes estimated future trajectories of other vehicles. • lane-4-density-2で両⼿法をtrainingする。 • lane-4-density-2, lane-5-density-2.5, lane-5-density-3の3つの環境でテストする。 • DiLu: 40 experience in Memory, GRAD 600.000 training episodes GRADは異なる環境での性能劣化が⼤きい。失敗の多くは時間内にブレーキをかけられずに前⽅⾞に衝突してしまう。

Slide 23

Slide 23 text

23 Experiments on Generalization lane-4-density-2の環境での20 experiencesからlane-5-density-3の環境で適応できるか。中央値:13→5 中央値:30→23 それなりに低下は⾒られるが、利⽤できる shots数が多ければ低下度合いは⼩さい。

Slide 24

Slide 24 text

24 Experiments on Transformation Memory moduleに格納されるシナリオは⾃然⾔語で記載されており、環境が変わってもOKなはず。 Highway-envとCitySimの２つの環境でそれぞれ20experiencesを取得し、 lane-4-density-2とlane-5-density-3のシナリオでテストする。シナリオによる成功Step数のばらつきは⼤きめに⾒えるが、CitySimの実世界の⾞の軌跡が lane-5-density-3のような複雑なシナリオにも効果があるようにも⾒える。 https://github.com/UCF-SST- Lab/UCF-SST-CitySim1-Dataset CitySim: ドローンから実際の道路状況を撮影したデータをもとにしている

Slide 25

Slide 25 text

25 Effectiveness of Reflection module ベースラインとしての20個のexperiences +12個のsuccessと6個のcorrection experienc +12個のsuccess experiences + 6個のcorrection experiences memoryにexperiencesを追加する効果が⾒られる。少数でも訂正したexperiencesを加える効果がある。

Slide 26

Slide 26 text

26 所感 GPTをAPI経由で使っているので、latencyは遅い。(5-10秒かかる) memory数をもっと⼤きめの設定の実験は難しい？(さらにlatencyが遅くなる？) 実験のHighway-envのsteps数や利⽤するmemory数は少なめ本当の⾃動運転のdecision-makingまでは課題もある。 data-drivenに対するknowledge-drivenだが、⼤量のデータを学習しているGTPを使ってはいる。 (task-specificなデータ・学習は少ないため、Generalized knowledge) GPTを⼈のGeneralized knowledgeとしてフル活⽤