Multimodal Grounding for Language Processing
onizuka laboratory
October 17, 2018
Presentation slides from the COLING 2018 paper reading group held in our laboratory.
Transcript
Slide 1: Multimodal Grounding for Language Processing (2018-10-17, Nomoto Eriko)
Slide 2: Contents [four bulleted items; only "NLP" in the last item is recoverable]
Slide 3: [no recoverable text]
Slide 4: Conceptual grounding [remaining text not recoverable]
Slide 5: [no recoverable text]
Slide 6: Cross-modal transfer; Cross-modal interpretation; Joint multimodal processing
Slide 7: Cross-modal transfer [three bulleted items, each containing an arrow (→); text not recoverable]
Slide 8: Cross-modal interpretation [two bulleted items; text not recoverable]
Slide 9: Joint multimodal processing [two bulleted items; text not recoverable]
Slide 10: [no recoverable text]
Slide 11: Concept representations; Projection; Compositional representations
Slide 12: Concept representations [remaining text not recoverable]
Slide 13: Concept representations [an example involving "cat", "panther", and "dog"; remaining text not recoverable]
Slide 14: Projection: Mapping; Joint representation space [figure labels: "← Mapping", "↓ Joint learning"]
Slide 15: Projection, Mapping: a map from one representation to another (→), with the objective: maximize sim(x, y), minimize sim(x, y′). [The slide's own symbols are not recoverable; x, y, and y′ are stand-ins.]
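The maximize/minimize objective on the Mapping slide can be illustrated with a max-margin ranking loss. The slide does not say which loss it has in mind, so the following is only a minimal sketch under assumptions: cosine similarity as `sim`, a linear map `W`, a fixed `margin`, and hand-picked toy vectors. All names (`project`, `margin_loss`, `y_pos`, `y_neg`) are hypothetical and do not come from the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project(W, x):
    """Apply a linear map W (a list of rows) to the vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def margin_loss(W, x, y_pos, y_neg, margin=0.5):
    """Hinge loss on similarities (assumed form, not taken from the slides).

    The loss is zero once sim(Wx, y_pos) exceeds sim(Wx, y_neg) by `margin`,
    so minimizing it pushes the matched pair together and the other pair apart.
    """
    projected = project(W, x)
    return max(0.0, margin - cosine(projected, y_pos) + cosine(projected, y_neg))


# Toy data (hypothetical): an identity map and two orthogonal target vectors.
W = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]
y_pos = [1.0, 0.0]   # target that x should map close to
y_neg = [0.0, 1.0]   # unrelated target

loss_good = margin_loss(W, x, y_pos, y_neg)  # matched pair already closer
loss_bad = margin_loss(W, x, y_neg, y_pos)   # roles swapped, so the loss is positive
```

In practice `W` would be learned by gradient descent over many pairs, and the negative item would be sampled rather than fixed; both choices are left out here to keep the sketch short.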
Slide 16: Projection: Joint representation space [one bulleted item with two sub-points; text not recoverable]
Slide 17: [two bulleted items; text not recoverable]
Slide 18: [title mentions NLP; remaining text not recoverable]
Slide 19: [NLP-related slide; one bulleted item cites "(SimLex-999 etc.)"; remaining text not recoverable]
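The slide cites SimLex-999, a word-similarity benchmark. Benchmarks of this kind are usually scored by Spearman's rank correlation between human ratings and model similarities. Because the slide text is not recoverable, the sketch below only illustrates that general scoring procedure. The ratings and similarities are made-up toy numbers, not SimLex-999 data, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(human, model):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(human), average_ranks(model))


# Toy word-pair scores (made up for illustration).
human_ratings = [9.0, 7.5, 3.0, 1.0]
model_sims = [0.8, 0.9, 0.3, 0.1]
rho = spearman(human_ratings, model_sims)
```

A higher `rho` means the model orders word pairs more like the human annotators do; comparing `rho` for text-only and multimodal representations is one way such a benchmark can be used.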
Slide 20: [NLP-related slide mentioning imSitu; figure credited to imSitu (http://imsitu.org/); remaining text not recoverable]
Slide 21: [NLP-related slide; text not recoverable]
Slide 22: [no recoverable text]
Slide 23: [NLP-related slide; text not recoverable]