onizuka laboratory
December 18, 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
Presentation slides from the EMNLP 2018 paper reading group held in our laboratory.
Transcript
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions. Q. Li, J. Fu, D. Yu et al., EMNLP 2018. 2018-12-18
[Slide text not recoverable. Surviving terms: VQA, CNN, RNN, end-to-end] 1
Visual Q&A Q: where is the man swinging the racket? A: tennis court 2
Visual Q&A Q: what kind of drink is in the glass? A: water 3
Visual Q&A Q: what is walking next to the bus? A: cow 4
Visual Q&A Q: does the man need a haircut? A: yes 5
[Figure: VQA model with a CNN and an RNN. Question: where is the man swinging the racket? Candidate answers: yes, no, water, tennis court, ⋮] 7
[Slide text not recoverable. Surviving terms: end-to-end] 8
[Slide text not recoverable. Surviving terms: ResNet152, cos] 12
[Slide text not recoverable. Surviving terms: ResNet152, LSTM, (e.g. BLEU), cos] 13
[Slide text not recoverable. Surviving terms: LSTM, softmax] 14
[Slide text not recoverable. Surviving terms: VQA-real, min(# …, 1), 10, 3] 16
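The min(# …, 1) fragment that survives on slide 16 resembles the standard VQA accuracy metric, in which a predicted answer scores min(n / 3, 1), where n is the number of the 10 human annotators who gave that same answer. Assuming that is what the slide showed, a minimal Python sketch of the metric could look like this. The function name vqa_accuracy and the sample annotations are hypothetical and are not taken from the slides.

```python
from collections import Counter
from typing import Sequence


def vqa_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """Score one prediction as min(#humans who gave that answer / 3, 1).

    Assumption: this is the standard VQA metric; the slide text itself
    is not recoverable, so this is an illustration only.
    """
    # Light normalization (case and surrounding whitespace) before counting.
    counts = Counter(a.strip().lower() for a in human_answers)
    n = counts[predicted.strip().lower()]  # 0 if nobody gave this answer
    return min(n / 3.0, 1.0)


# Hypothetical annotations from 10 annotators for a single question.
humans = ["tennis court"] * 7 + ["court"] * 2 + ["park"]

score_full = vqa_accuracy("tennis court", humans)   # 7 matches -> capped at 1
score_partial = vqa_accuracy("court", humans)       # 2 matches -> 2/3
score_zero = vqa_accuracy("water", humans)          # no matches -> 0
```

Under this metric an answer counts as fully correct once at least three annotators agree with it, which tolerates some disagreement among the human answers.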
[Slide text not recoverable. Surviving terms: VQA, vs.] 17
[Slide text not recoverable. Surviving terms: vs.] 18
[Slide text not recoverable. Surviving terms: VQA, NULL] 19
[Slide text not recoverable. Surviving terms: VQA] 20
Attributes: tennis, ball, man, racket, hit, court, play, player, swing, hold. Caption: a man holding a tennis racket on a tennis court. Predicted answer: tennis court. Q: where is the man swinging the racket? A: tennis court 21
Attributes: bicycle, man, sit, eat, bike, look, outside, food, person, table. Caption: a man sitting at a table with a plate of food. Predicted answer: beer. Q: what kind of drink is in the glass? A: water 22
Attributes: street, bus, cow, city, walk, car, drive, stand, road, white. Caption: a cow that is walking in the street. Predicted answer: car. Q: what is walking next to the bus? A: cow 23
Attributes: woman, bear, teddy, hold, sit, glass, animal, large, lady. Caption: a woman holding a sandwich in her hands. Predicted answer: yes. Q: does the man need a haircut? A: yes 24
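[Slide labels not recoverable. Surviving values: 30%, 65%, yes/no, 80%] 25
[Slide text not recoverable. Surviving terms: VQA] 26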
[Slide text not recoverable. Surviving terms: VQA] 28