2. OpenAI: GPT-4 Technical Report. CoRR abs/2303.08774 (2023)
3. Tom B. Brown et al.: Language Models are Few-Shot Learners. NeurIPS 2020 / CoRR abs/2005.14165 (2020)
4. Long Ouyang et al.: Training language models to follow instructions with human feedback. CoRR abs/2203.02155 (2022)
5. Sébastien Bubeck et al.: Sparks of Artificial General Intelligence: Early experiments with GPT-4. CoRR abs/2303.12712 (2023)
6. Daniel Martin Katz et al.: GPT-4 Passes the Bar Exam. http://dx.doi.org/10.2139/ssrn.4389233, March 15, 2023
7. Alexander Pan et al.: Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. CoRR abs/2304.03279 (2023)
8. Baolin Peng et al.: Instruction Tuning with GPT-4. CoRR abs/2304.03277 (2023)
9. Yizhong Wang et al.: Self-Instruct: Aligning Language Model with Self Generated Instructions. CoRR abs/2212.10560 (2022)
10. Ofir Press et al.: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. ICLR 2022
11. MosaicML: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs. https://www.mosaicml.com/blog/mpt-7b
12. Geewook Kim et al.: OCR-Free Document Understanding Transformer. ECCV (28) 2022: 498-517
13. Shaohan Huang et al.: Language Is Not All You Need: Aligning Perception with Language Models. CoRR abs/2302.14045 (2023)
14. Wayne Xin Zhao et al.: A Survey of Large Language Models. CoRR abs/2303.18223 (2023)
15. Hugo Touvron et al.: LLaMA: Open and Efficient Foundation Language Models. CoRR abs/2302.13971 (2023)
16. OpenAI: ChatGPT plugins. https://openai.com/blog/chatgpt-plugins, March 23, 2023
17. Ryota Tanaka et al.: SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images. AAAI 2023
18. Ryota Tanaka et al.: SlideVQA: Question Answering over Multiple Document Images (in Japanese). NLP 2023
19. Microsoft: Introducing Microsoft 365 Copilot – your copilot for work. https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/, March 16, 2023
20. Microsoft: Bringing the power of AI to Windows 11 – unlocking a new era of productivity for customers and developers with Windows Copilot and Dev Home. https://blogs.windows.com/windowsdeveloper/2023/05/23/bringing-the-power-of-ai-to-windows-11-unlocking-a-new-era-of-productivity-for-customers-and-developers-with-windows-copilot-and-dev-home/, May 23, 2023
21. Auto-GPT. https://github.com/Significant-Gravitas/Auto-GPT
22. Haotian Liu et al.: Visual Instruction Tuning. CoRR abs/2304.08485 (2023)
23. Alec Radford et al.: Learning Transferable Visual Models From Natural Language Supervision. ICML 2021: 8748-8763
24. Deyao Zhu et al.: MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. CoRR abs/2304.10592 (2023)
25. Junnan Li et al.: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. CoRR abs/2301.12597 (2023)
26. UC Berkeley, CMU, Stanford, MBZUAI, and UC San Diego: Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https://vicuna.lmsys.org/, March 19, 2023
27. Guanzhi Wang et al.: Voyager: An Open-Ended Embodied Agent with Large Language Models. CoRR abs/2305.16291 (2023)
28. Peter Shaw et al.: From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces. CoRR abs/2306.00245 (2023)
29. Rohit Girdhar et al.: ImageBind: One Embedding Space To Bind Them All. CoRR abs/2305.05665 (2023)
30. Hang Zhang et al.: Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding. CoRR abs/2306.02858 (2023)