SSII2023 [OS1] GPT-4とVision-and-Languageの未来

Slide 1

Slide 1 text

GPT-4と Vision-and-Languageの未来 2023.6.14 ⻄⽥京介（NTT⼈間情報研究所）

Slide 41

Slide 41 text

1. OpenAI, Introducing ChatGPT. Available at https://openai.com/blog/chatgpt, November 30, 2022 2. OpenAI: GPT-4 Technical Report. CoRR abs/2303.08774 (2023) 3. Tom B. Brown et al.: Language Models are Few-Shot Learners. NeurIPS 2020 / CoRR abs/2005.14165 (2020) 4. Long Ouyang et al.: Training language models to follow instructions with human feedback. CoRR abs/2203.02155 (2022) 5. Sébastien Bubeck et al.: Sparks of Artificial General Intelligence: Early experiments with GPT-4. CoRR abs/2303.12712 (2023) 6. Daniel Martin Katz+, GPT-4 Passes the Bar Exam. http://dx.doi.org/10.2139/ssrn.4389233, March 15, 2023 7. Alexander Pan et al.: Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. CoRR abs/2304.03279 (2023) 8. Baolin Peng et al.: Instruction Tuning with GPT-4. CoRR abs/2304.03277 (2023) 9. Yizhong Wang et al.: Self-Instruct: Aligning Language Model with Self Generated Instructions. CoRR abs/2212.10560 (2022) 10. Ofir Press et al: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. ICLR 2022 11. MosaicML, Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs, https://www.mosaicml.com/blog/mpt-7b 12. Geewook Kim et al.: OCR-Free Document Understanding Transformer. ECCV (28) 2022: 498-517 13. Shaohan Huang et al.: Language Is Not All You Need: Aligning Perception with Language Models. CoRR abs/2302.14045 (2023) 14. Wayne Xin Zhao et al.: A Survey of Large Language Models. CoRR abs/2303.18223 (2023) 15. Hugo Touvron et al.: LLaMA: Open and Efficient Foundation Language Models. CoRR abs/2302.13971 (2023) 16. OpenAI: ChatGPT plugins. https://openai.com/blog/chatgpt-plugins, March 23, 2023. 17. Ryota Tanaka et al.: SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images. AAAI 2023. 18. ⽥中涼太 et al., SlideVQA: 複数の⽂書画像に対する質問応答, NLP 2023. 19. Microsoft: Introducing Microsoft 365 Copilot – your copilot for work. https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365- copilot-your-copilot-for-work/, March 16, 2023. 20. Microsoft: Bringing the power of AI to Windows 11 – unlocking a new era of productivity for customers and developers with Windows Copilot and Dev Home, https://blogs.windows.com/windowsdeveloper/2023/05/23/bringing-the-power-of-ai-to-windows-11-unlocking-a-new-era-of- productivity-for-customers-and-developers-with-windows-copilot-and-dev-home/, May 23, 2023. 21. Auto-GPT. https://github.com/Significant-Gravitas/Auto-GPT 22. Haotian Liu et al.: Visual Instruction Tuning. CoRR abs/2304.08485 (2023) 23. Alec Radford et al.: Learning Transferable Visual Models From Natural Language Supervision. ICML 2021: 8748-8763 24. Deyao Zhu et al.: MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. CoRR abs/2304.10592 (2023) 25. Junnan Li et al.: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. CoRR abs/2301.12597 (2023) 26. UC Berkeley, CMU, Stanford, MBZUAI, and UC San Diego: Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality, https://vicuna.lmsys.org/, 2023/03/19 27. Guanzhi Wang et al.: Voyager: An Open-Ended Embodied Agent with Large Language Models. CoRR abs/2305.16291 28. Peter Shaw et al.: From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces. CoRR abs/2306.00245 (2023) 29. Rohit Girdhar et al.: ImageBind: One Embedding Space To Bind Them All. CoRR abs/2305.05665 (2023) 30. Hang Zhang et al.: Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding. CoRR abs/2306.02858 (2023) 41 参考⽂献

Slide 1

Slide 1 text

Slide 2

Slide 2 text

Slide 3

Slide 3 text

Slide 4

Slide 4 text

Slide 5

Slide 5 text

Slide 6

Slide 6 text

Slide 7

Slide 7 text

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Slide 24

Slide 24 text

Slide 25

Slide 25 text

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Slide 29

Slide 29 text

Slide 30

Slide 30 text

Slide 31

Slide 31 text

Slide 32

Slide 32 text

Slide 33

Slide 33 text

Slide 34

Slide 34 text

Slide 35

Slide 35 text

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Slide 38

Slide 38 text

Slide 39

Slide 39 text

Slide 40

Slide 40 text