UNDERSTAND WHICH PYRAMID YOU MEAN. (request for clarification about "PYRAMID")
Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX. (additional explanation)
Computer: BY “IT”, I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING. (confirming what "IT" refers to)
Computer: OK. (no reply from the user, so the confirmed interpretation is accepted)
Person: WHAT DOES THE BOX CONTAIN? (question)
Computer: THE BLUE PYRAMID AND THE BLUE BLOCK. (answer to the question)
SHRDLU [Winograd, 1972]
Pioneering research aiming at machines that understand natural language and carry out intellectual work in cooperation with humans
• The environment is a limited blocks world of patterns (shapes, colors, positions)
• The natural language understanding component is hand-crafted (for example, to handle "the ◦◦" and "it", it includes a module that temporarily stores the names of objects mentioned in earlier utterances)
10/31
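The hand-crafted referent-memory module described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not SHRDLU's actual code (which was written in Lisp/Micro-Planner); the class and method names are invented for illustration. It keeps a history of mentioned objects so that "it" resolves to the most recent mention and "the ◦◦" to the most recent mention matching that noun.

```python
class ReferentMemory:
    """Hypothetical sketch of a module that temporarily stores object
    names mentioned in earlier utterances, for resolving "it" / "the X"."""

    def __init__(self):
        self._history = []  # mentioned objects, most recent last

    def mention(self, obj):
        # Record an object each time an utterance names it.
        self._history.append(obj)

    def resolve_it(self):
        # "it" -> the most recently mentioned object, if any.
        return self._history[-1] if self._history else None

    def resolve_the(self, noun):
        # "the <noun>" -> the most recent mention matching that noun.
        for obj in reversed(self._history):
            if obj["noun"] == noun:
                return obj
        return None


memory = ReferentMemory()
memory.mention({"noun": "block", "desc": "the block taller than the held one"})
memory.mention({"noun": "pyramid", "desc": "the blue pyramid"})

print(memory.resolve_it()["noun"])        # "it" -> pyramid
print(memory.resolve_the("block")["desc"])  # "the block" -> earlier block
```

A real system would also need to handle plural mentions, ambiguity (asking back, as in the dialogue above), and forgetting stale referents; this sketch shows only the core recency-based lookup.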
WHICH PYRAMID YOU MEAN. (request for clarification about "PYRAMID")
Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX. (additional explanation)
Computer: BY “IT”, I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING. (confirming what "IT" refers to)
Computer: OK. (no reply from the user, so the confirmed interpretation is accepted)
Person: WHAT DOES THE BOX CONTAIN? (question)
Computer: THE BLUE PYRAMID AND THE BLUE BLOCK. (answer to the question)
Ability to detect missing information and ask a clarifying question
Ability to confirm a hypothesis when there are multiple candidates
Ability to understand the user's instruction and carry it out
Ability to answer questions
This dialogue demonstrates the abilities needed to perform tasks in a given environment while exchanging information in natural language
11/31
body and facing left and body is under water”
Results generated by a text-to-image model
“this bird has yellow beak and is facing left and long brown neck and black body most of which is under the water”
“test”
What output a given input will produce is hard for humans to predict
Only after hours of use does one start to get the knack (is that desirable?)
→ Between humans, we could align intentions through dialogue...
17/31
Huang, Lucy Vanderwende, Jacob Devlin, Michel Galley, Margaret Mitchell. A Survey of Current Datasets for Vision and Language Research. EMNLP 2015.
[Kafle+, 2019] Kushal Kafle, Robik Shrestha, Christopher Kanan. Challenges and Prospects in Vision and Language Research. arXiv.
[Mogadala+, 2020] Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow. Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods. arXiv.
P.9
[Okada, 1980] Naoyuki Okada. Conceptual taxonomy of Japanese verbs for understanding natural language and picture patterns. COLING 1980.
[Hiyoshi+, 1994] Mayumi Hiyoshi and Hideo Shimazu. Drawing pictures with natural language and direct manipulation. COLING 1994.
P.10
[Winograd, 1972] Terry Winograd. Understanding natural language. Cognitive Psychology, 3(1):1-191, 1972.
29/31
Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning. EMNLP 2019.
P.22
[Mogadala+, 2020] Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow. Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods. arXiv.
P.23
[Goyal+, 2017] Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. CVPR 2017.
[Massiceti+, 2018] Daniela Massiceti, Puneet K. Dokania, N. Siddharth, Philip H.S. Torr. Visual Dialogue without Vision or Dialogue. NeurIPS 2018 workshop.
[Das+, 2019] Abhishek Das, Devi Parikh, Dhruv Batra. Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018). arXiv.
[Agarwal+, 2020] Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser. History for Visual Dialog: Do we really need it? ACL 2020.
31/31