The Future of LLMs and Planning

Carnot Inc. Data is beautiful.
The Future of LLMs and Planning
Next in LLM: Thinking about the future from research trends in large language models
2023/8/17 Future of Procedural Planning in LLM

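Self-introduction
Shoya MATSUMORI, Ph.D.
Founder & CEO, Carnot Inc.

Born in 1994. Works on AI (deep learning) research and development and on digital consulting. As an undergraduate, led product design from scratch and the implementation of deep learning models at an EEG-device startup. His university research was selected for an accelerator program, and he founded a social robot startup that ran PoCs for multiple companies, universities, and local governments. Received a Ph.D. in engineering from the Graduate School of Science and Technology, Keio University (completed early), working on Vision and Language. Papers accepted as first author at highly selective international AI conferences such as ICCV. A DAO concept on data-driven environmental conservation won a WIRED CGC special award. Member of the Japanese Society for Artificial Intelligence and the Japanese Cognitive Science Society.

Career
• (2018.02-2022.03) PGV Inc., Lead Machine Learning Researcher. As lead engineer, led product design from scratch and the implementation of deep learning models. Researched and developed an AI-based dementia diagnosis algorithm and a sleep-stage classification algorithm; published in an academic journal as first author.
• (2018.08-2022.09) Keio Leading-edge Laboratory of Science and Technology, project researcher; project researcher for the Cabinet Office SIP program. Led research and development of deep-learning-based automatic English question generation (patent pending). Ran a PoC at a high school in Tokyo.
• (2019.02-2020.07) BLUEM Inc., Representative Director. Selected for the AI accelerator program of dip Corporation. Provided AI solutions to multiple companies, universities, and local governments. Ran social robot PoCs in Toyota City and elsewhere.
• (2020.12-2022.08) STANDARD Inc., Lead Researcher. As a digital consultant, provided AI solutions to multiple First Section-listed companies.
• (2021.04-2022.09) Japan Society for the Promotion of Science Research Fellow (DC). Worked on Vision and Language research aiming at an integrated understanding of vision and language. Papers accepted at highly selective international conferences such as ICCV.

Awards
• WIRED CGC INTERSPACE UTOKYO-IIS AWARD
• HCI Research Group Encouragement Award

Selected publications
• Matsumori, Shoya, et al. "Unified questioner transformer for descriptive question generation in goal-oriented visual dialogue." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
• Matsumori, Shoya, et al. "LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation." IEEE Access 9 (2021): 160521-160532.
• Matsumori, Shoya, et al. "Predictive Diagnostic Approach to Dementia and Dementia Subtypes Using Wireless and Mobile Electroencephalography: A Pilot Study." Bioelectricity 4.1 (2022): 3-11.

Carnot Inc. 2023. All rights reserved. Do not distribute.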

Is ChatGPT really being used?
It quickly captured the early adopters, but most people still do not use it.
https://www.demandsage.com/chatgpt-statistics/
• Signal 1. ChatGPT's traffic growth is down 10%
• Signal 2. Usage across Japan as a whole is also only 12% (Source: NRI "Insight Signal Survey", April 15-16, 2023)

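Why isn't it used?
Prompt engineering is "actually hard"
• What prompt engineering requires
  • (1) LLM hacks
    • CoT, injection
  • (2) Domain knowledge
    • Background explanation that is necessary and sufficient for solving the task
    • Few-shot examples
    • Tips for carrying out the task
• In practice, (2) is by far the more tedious and difficult part
  • If you already have prior knowledge of the problem you are trying to solve, it becomes a contest between the time spent working out the instructions and the time spent doing the task yourself.
  • Doing it yourself feels faster

Example prompt:
I am planning a technical interview. The current candidate has some web development experience, but our work requires back-end development. I want to know whether the candidate is qualified without asking detailed questions about web development. Please give me several example questions and their answers.
Notes:
• Make the questions scenario-based and have them ask about high-level implementation strategy.
• As far as possible, avoid questions that require specific knowledge of libraries or services (bootstrap, CDN, Kafka, etc.).
• Explain each scenario in as much detail as possible.
• Give hints that are useful for arriving at the answer.
• Start with a question like "How would you implement undo and redo functionality?"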
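The domain-knowledge parts listed above (background, notes, few-shot examples) can be assembled mechanically once someone has written them down, which is exactly the expensive part. A minimal sketch, assuming a hypothetical helper `build_prompt` and section labels that are illustrative only and not taken from the talk:

```python
def build_prompt(task, background, notes, examples=()):
    """Assemble a prompt from background, task, notes, and few-shot examples.

    The function name and the section labels are hypothetical illustrations.
    """
    lines = [background.strip(), task.strip()]
    if notes:
        lines.append("Notes:")
        lines.extend(f"- {note}" for note in notes)
    for question, answer in examples:  # optional few-shot examples
        lines.append(f"Example question: {question}")
        lines.append(f"Example answer: {answer}")
    return "\n".join(lines)


prompt = build_prompt(
    task="Please give me several example questions and their answers.",
    background="I am planning a technical interview for a back-end role.",
    notes=[
        "Make the questions scenario-based.",
        "Avoid questions that need knowledge of specific libraries.",
    ],
)
```

The code is trivial; the effort sits in writing `background`, `notes`, and `examples` well, which is the point the slide makes.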

What kind of instruction is this?
It resembles the instructions you give when delegating work according to an employee's level.
Figure labels: Goal, Plan / Procedure, Execution; Associate (new graduate), Junior, Senior, outside the hiring range; ChatGPT.
• Annotation (first build): this part is prompt engineering.
• Annotation (second build): this is what we actually want. What is needed is planning.

What is planning?
The ability, when presented with a goal, to break it down into which actions to take and in what order.
e.g., Goal: "watch a movie"
Figure: Goal → Plan (step 1, step 2, step 3)

Planning and LLMs
Research on planning with LLMs is on the rise.
• LLM + Embodied Agent
  • Use an LLM to build an agent that interacts with an environment (including execution)
  • e.g., Language Models as Zero-Shot Planners [Huang+22]
  • e.g., Do As I Can, Not As I Say [Ahn+22]
• LLM Only
  • Improve planning accuracy with the LLM alone
  • e.g., Language Models of Code are Few-Shot Commonsense Learners [Madaan+22]
  • e.g., Tree of Thoughts [Yao+23]
  • e.g., PLASMA [Brahman+23]
  • e.g., ToolLLaMA [Qin+23]
Figure: Research that plans with an LLM and runs the agent in a simulated environment [Huang+22]

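Fine-tuning
An approach that prepares a dataset and trains on it.
Problem
• Planning performance is low! GPT-4 is not cost-effective!
Solution
• Tune on task-specific datasets
  • ToolLLaMA: supports multi-step API calls [Qin+23]
  • PLASMA: a knowledge distillation approach [Brahman+23]
    • Distills a 6.7B model (text-curie) into 3B and 770M models
    • Achieves performance exceeding the teacher model
Figure: ToolLLaMA performance. Tuning LLaMA reaches accuracy on par with ChatGPT.
Figure: Overview of ToolLLM. Left: diagram of data generation, API retriever training, and ToolLLaMA training. Right: the inference pipeline.
Figure: Knowledge distillation.
Figure: PLASMA performance. With knowledge distillation, even smaller models deliver strong performance.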

Parameter/dataset is all you need?
Don't worry, there is (still research space) left.

History of planning
Was planning the beginning of artificial intelligence?
• Good old artificial intelligence (1957-1969)
  • So-called Good Old-Fashioned AI
  • An era that focused on planning, reasoning, and search
  • Newell and Simon approached it with the General Problem Solver

e.g., the blocks world
https://www.rose-hulman.edu/class/cs/archive/other-old/archive/fall06/materials/search_project.htm
Initial state:
(Start (On Table Red-1) (On Table Yellow-1) (On Yellow-1 Blue-1) (On Table Green-1) (On Green-1 Red-2) (On Red-2 Blue-2))
Goal state:
(Goal (On Table Blue-x1) (On Table Green-x2) (On Table Red-x3) (On Red-x3 Red-x4) (On Red-x4 Blue-x5) (On Table Yellow-x6))
Photos: Herbert A. Simon, Allen Newell

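What classical AI can do, the latest AI cannot
GPT-4 cannot solve the Tower of Hanoi.

What is the Tower of Hanoi?
A game in which disks are moved according to rules. There are three pegs (rods), and large, medium, and small disks are stacked on the leftmost one. How can the disks be moved to the right rod in the fewest moves? The following conditions apply:
o (Condition 1) Only one disk can be moved at a time
o (Condition 2) A larger disk cannot be placed on top of a smaller disk
Figure: start and goal states.

(Reference) The classical AI approach
Represent each state as a node in a graph so that a computer can handle it. Using an algorithm that finds the shortest path on the graph, derive the solution with the minimum number of moves. Various other solution methods also exist.
Figure: part of the graph structure for three disks.
Newell, A., Shaw, J. C., & Simon, H. A. (1959, June). Report on a general problem solving program. In IFIP congress (Vol. 256, p. 64).

GPT-4's inference result
It fails when started from an unusual initial state (note that the difficulty is essentially unchanged).
Figure: rods A, B, C in the start state.
GPT-4 output (excerpt): To solve this Tower of Hanoi problem with three rods (A, B, and C) and three disks of radii 1, 2, and 3, follow these steps: Move the smallest disk (radius 1) from Rod A to Rod C. Move the medium disk (radius 2) from Rod B to Rod A. Move the smallest disk (radius 1) from Rod C to Rod A. …
The next step on the slide, moving the largest disk (radius 3) from Rod A to Rod C, violates Condition 1 and cannot be executed!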
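The classical approach described above, states as graph nodes plus a shortest-path search, can be sketched with breadth-first search. This is a minimal illustration, not the General Problem Solver itself. The start state below (Rod A holds disks 3 and 1, Rod B holds disk 2) is an assumption inferred from the GPT-4 output on the slide, and the function names are hypothetical.

```python
from collections import deque


def legal_moves(state):
    """Yield (src, dst, next_state) for every legal single-disk move."""
    for src, src_rod in enumerate(state):
        if not src_rod:
            continue
        disk = src_rod[-1]  # Condition 1: only the top disk of a rod can move
        for dst, dst_rod in enumerate(state):
            if src == dst:
                continue
            if dst_rod and dst_rod[-1] < disk:
                continue  # Condition 2: no larger disk on a smaller one
            rods = list(state)
            rods[src] = src_rod[:-1]
            rods[dst] = dst_rod + (disk,)
            yield src, dst, tuple(rods)


def shortest_plan(start, goal):
    """Breadth-first search over the state graph; returns a shortest move list."""
    queue = deque([start])
    parent = {start: None}  # state -> (previous state, move that led here)
    while queue:
        state = queue.popleft()
        if state == goal:
            plan = []
            while parent[state] is not None:
                state, move = parent[state]
                plan.append(move)
            return plan[::-1]
        for src, dst, nxt in legal_moves(state):
            if nxt not in parent:
                parent[nxt] = (state, (src, dst))
                queue.append(nxt)
    return None


RODS = "ABC"
start = ((3, 1), (2,), ())   # assumed non-standard start (bottom disk first)
goal = ((), (), (3, 2, 1))
plan = shortest_plan(start, goal)
for src, dst in plan:
    print(f"Move the top disk from Rod {RODS[src]} to Rod {RODS[dst]}")
```

Because every move is generated by `legal_moves`, the returned plan cannot violate either condition, and the search treats a non-standard start state exactly like the standard one.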

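Inference time algorithms
Add techniques at inference time to draw out the LLM's performance: look for the best solution through search and evaluation.
Problem
• Problems that arise in planning
  • Hallucination, repeating the same action, inappropriate order…
• Perhaps keeping only a single output is the issue?
Solution
• (Search) Keep multiple LLM outputs and search over them
  • Tree of Thoughts: keep sequences of plans in a tree or similar structure
  • c.f., Beam search
• (Evaluation) Evaluate the explored paths with an external mechanism
  • Evaluate with an external mechanism [Yao+23]
  • Evaluate with a model [Brahman+23]
  • Explore and evaluate paths with algorithms such as BFS and DFS [Yao+23]
Figure: The "timelines" of the second-best and lower candidates are also retained.
Figure: ToT performs well across a variety of tasks.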
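The search-plus-evaluation idea can be sketched as a breadth-first search that keeps only the best few partial plans at each depth, in the spirit of beam search. This is a simplified illustration, not the Tree of Thoughts implementation: `propose` stands in for an LLM proposing next steps and `score` for the external evaluator, and both are hypothetical toy functions here.

```python
def tree_search(initial, propose, score, is_goal, beam_width=3, max_depth=6):
    """Keep the `beam_width` best partial plans per depth instead of one output."""
    frontier = [(initial, [])]  # (state, steps taken so far)
    for _ in range(max_depth):
        candidates = []
        for state, steps in frontier:
            for step, next_state in propose(state):
                if is_goal(next_state):
                    return next_state, steps + [step]
                candidates.append((next_state, steps + [step]))
        if not candidates:
            break
        # Evaluation: rank candidates with the external scorer, keep the best few.
        candidates.sort(key=lambda c: score(c[0]), reverse=True)
        frontier = candidates[:beam_width]
    return None, []


# Toy stand-ins (hypothetical): reach TARGET from 1 using "+1" and "*2".
TARGET = 10


def propose(n):
    return [("+1", n + 1), ("*2", n * 2)]


def score(n):
    return -abs(TARGET - n)


final, steps = tree_search(1, propose, score, lambda n: n == TARGET)
```

With an LLM in the loop, `propose` would sample several continuations of the current plan, and `score` could be a second model call or a rule-based checker; lower-ranked branches stay available as long as they remain within the beam.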

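External planner
Delegate planning to an external mechanism.
Problem
• An LLM is merely a model that generates plausible text given a context, so it is not suited to planning in the first place
Solution
• LLM+P: planning carried out by an external mechanism [Liu+23]
• Describe the problem in the Planning Domain Definition Language (PDDL)
  • A planning language created in 1998
  • Influenced by STRIPS and others
Figure: A problem written in PDDL can be solved by a program.
Figure: PDDL written by GPT-4.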
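In LLM+P [Liu+23], the LLM writes the problem in PDDL and a classical planner does the solving. As a rough illustration of what such a problem file looks like, the sketch below renders one from Python tuples. The helper `to_pddl_problem`, the domain name `blocksworld`, and the predicate names (`on`, `on-table`, `clear`) are assumptions for illustration; a real run also needs a matching domain file and a planner, neither of which is shown.

```python
def to_pddl_problem(name, domain, objects, init, goal):
    """Render a PDDL problem definition from facts such as ("on", "b", "a")."""
    def fmt(fact):
        return "(" + " ".join(fact) + ")"

    return "\n".join([
        f"(define (problem {name})",
        f"  (:domain {domain})",
        f"  (:objects {' '.join(objects)})",
        f"  (:init {' '.join(fmt(f) for f in init)})",
        f"  (:goal (and {' '.join(fmt(f) for f in goal)})))",
    ])


# Hypothetical two-block problem: b sits on a, and the goal is a on b.
pddl = to_pddl_problem(
    name="swap-blocks",
    domain="blocksworld",
    objects=["a", "b"],
    init=[("on-table", "a"), ("on", "b", "a"), ("clear", "b")],
    goal=[("on", "a", "b")],
)
print(pddl)
```

The division of labor is the design point: the language model only has to produce a well-formed description like this one, while correctness of the plan is left to a solver built for that purpose.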

The future of (Planner) LLMs
Toward AI that merges with symbolic AI and handles both symbol processing and pattern processing.
Timeline (1950 to 2023-):
• 1946 ENIAC
• 1956 Dartmouth Conference
• Symbolic AI (1956-1974)
• Rule-based AI (1980-1987)
• Deep neural network (2006- )
• LLM (2020-)
• Neuro-symbolic AI (2023-)
Figure labels:
• Good at knowledge systems of symbols and language: Expert system, Ontology
• Good at pattern processing where judgments are ambiguous: CNN, Transformer, RBM, GAN
• Fusion of symbolic AI and neural networks

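Summary
• The future of LLMs and planning
  • Planning is needed to turn AssociateGPT into JuniorGPT
  • Going forward, new algorithms will emerge that revisit and build on Gen 1/2 AI research
  • Those who endured the winter will make a comeback!
• Announcements
  • We are recruiting beta testers for our workflow automation service
    • It was also featured in the Nikkei newspaper!
    • A chat-based automation tool that uses a Planner agent
    • https://usepromptflow.com/
  • We are hiring machine learning engineers and product engineers
    • https://carnot.ai
    • Feel free to contact us (by DM or any other way)!
    • We are also recruiting interns!
We also post the latest updates on Twitter! @pineforesta Feel free to DM!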

References
1) Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
2) Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
3) Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.
4) Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., ... & Lowe, R. (2022). Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
5) Weizenbaum, J. (1966). ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36-45.
6) Nye, M., Tessler, M., Tenenbaum, J., & Lake, B. M. (2021). Improving coherence and consistency in neural sequence models with dual-system, neuro-symbolic reasoning. Advances in Neural Information Processing Systems, 34, 25192-25204.
7) Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25-42.
8) Kahneman, D. (2011). Thinking, fast and slow. Macmillan.
9) Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., ... & Mann, G. (2023). BloombergGPT: A Large Language Model for Finance. arXiv preprint arXiv:2303.17564.
10) Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
11) Shen, Y., Song, K., Tan, X., Li, D., Lu, W., & Zhuang, Y. (2023). HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace. arXiv preprint arXiv:2303.17580.
12) Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., ... & Sun, M. (2023). ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. arXiv preprint arXiv:2307.16789.
13) Liu, B., Jiang, Y., Zhang, X., Liu, Q., Zhang, S., Biswas, J., & Stone, P. (2023). LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477.
