
Athiwaratkun, Wilson, Anandkumar - ACL 2018 - Probabilistic FastText for Multi-Sense Word Embeddings

tosho
November 26, 2018


ACL Anthology:
http://aclweb.org/anthology/P18-1001


Transcript

  1. Probabilistic FastText for Multi-Sense Word Embeddings
     Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar (ACL 2018)
     Presenter: Tosho Hirasawa, Tokyo Metropolitan University, Komachi Lab
  2. 1. Overview
     • Extends FastText to a probabilistic (Gaussian mixture) word embedding, so one word can express multiple senses
     • Subword n-grams let the model embed rare and unseen words
     • Achieves strong results on word-similarity benchmarks in several languages
  3. 2. Word Embedding
     • Maps each word to a dense d-dimensional real-valued vector
     • Words with similar meanings should lie close to each other in the vector space
     • Used as the input representation for many downstream NLP tasks
  4. 3. Background
  5. 3. Background: NNLM [Bengio+, 2003]
     • Neural Network Language Model: a feed-forward NN that predicts the next word from the previous words
     • Word embeddings are obtained as the input layer learned jointly with the language model
     • Experiments include the Brown corpus
     (a minimal sketch of the forward pass follows below)
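
To make the architecture concrete, here is a minimal numpy sketch of a Bengio-style feed-forward NNLM forward pass. The sizes and variable names (V, d, C, H, U, context_ids) are illustrative assumptions, and the direct input-to-output connections of the original model are omitted.

```python
# Illustrative feed-forward NNLM forward pass; sizes and names are assumptions.
import numpy as np

V, d, n_ctx, h = 10000, 100, 3, 50           # vocab size, embed dim, context words, hidden units
rng = np.random.default_rng(0)
C = rng.normal(size=(V, d))                  # word embedding table (learned)
H = rng.normal(size=(h, n_ctx * d))          # input -> hidden weights
U = rng.normal(size=(V, h))                  # hidden -> output scores

def nnlm_probs(context_ids):
    """Probability of the next word given the previous n_ctx word ids."""
    x = C[context_ids].reshape(-1)           # concatenate the context embeddings
    a = np.tanh(H @ x)                       # hidden layer
    scores = U @ a                           # one score per vocabulary word
    scores -= scores.max()                   # numerical stability
    p = np.exp(scores)
    return p / p.sum()                       # softmax over the vocabulary

p_next = nnlm_probs([12, 7, 345])            # e.g. P(w_t | w_{t-3}, w_{t-2}, w_{t-1})
```
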
  6. 3. Background: CBOW [Mikolov+, 2013a]
     • CBOW: Continuous Bag-of-Words
     • Simplifies the NNLM architecture (no non-linear hidden layer), so training is much faster
     • Predicts the center word from the surrounding context words; word order is ignored
     • Together with skip-gram, CBOW makes up the Word2Vec toolkit
     • Skip-gram is the mirror image: it predicts the context words from the center word
     (a small scoring sketch follows below)
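
A small sketch of the CBOW scoring step under the usual two-embedding-table formulation; W_in, W_out, and the sizes are assumptions for illustration.

```python
# Illustrative CBOW scoring: average the context vectors, then score every
# candidate center word against that average.
import numpy as np

V, d = 10000, 100
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, d))    # input (context) embeddings
W_out = rng.normal(size=(V, d))   # output (center-word) embeddings

def cbow_probs(context_ids):
    """Predict the center word from the bag (order-free average) of context words."""
    h = W_in[context_ids].mean(axis=0)   # average context vectors: word order is ignored
    scores = W_out @ h
    scores -= scores.max()
    p = np.exp(scores)
    return p / p.sum()
```
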
  7. 3. Background: skip-gram [Mikolov+, 2013b]
     • Predicts the surrounding context words from the center word
     • Tricks to avoid the full softmax over the vocabulary:
       • Hierarchical Softmax
       • Negative Sampling
     • Subsampling of Frequent Words down-weights very common words during training
     (a sketch of the negative-sampling loss follows below)
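
A minimal numpy sketch of the skip-gram negative-sampling (SGNS) loss for one training pair; the function and variable names are mine, not from word2vec.

```python
# SGNS loss for a single (center, context) pair with k sampled negatives.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(in_vec, out_vec, neg_vecs):
    """in_vec   : embedding of the center word, shape (d,)
       out_vec  : embedding of the observed context word, shape (d,)
       neg_vecs : embeddings of k sampled negative words, shape (k, d)"""
    pos = -np.log(sigmoid(in_vec @ out_vec))            # pull the true pair together
    neg = -np.sum(np.log(sigmoid(-neg_vecs @ in_vec)))  # push the negatives apart
    return pos + neg
```
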
  8. 3. Background: GloVe [Pennington+, 2014]
     • GloVe: Global Vectors for Word Representation
     • Learns word vectors from global word-word co-occurrence counts
     • A weighted least-squares objective fits dot products to log co-occurrence counts
     (a sketch of the per-pair objective follows below)
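
A sketch of GloVe's per-pair weighted least-squares term; x_max = 100 and alpha = 0.75 are the commonly used defaults rather than values taken from this slide.

```python
# One term of the GloVe objective for a single word/context pair.
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(X_ij) that caps the influence of very frequent pairs."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_term(w_i, w_ctx_j, b_i, b_j, x_ij):
    """f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2 for one co-occurrence count."""
    return glove_weight(x_ij) * (w_i @ w_ctx_j + b_i + b_j - np.log(x_ij)) ** 2
```
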
  9. 3. Background: ELMo [Peters+, 2018]
     • ELMo: Embeddings from Language Models
     • Pre-trains a deep bi-LSTM language model on a large corpus
     • Produces contextualized embeddings: the same word receives different vectors in different sentences
     • Downstream tasks use a task-specific weighted combination of the bi-LSTM layers
     (a sketch of the layer combination follows below)
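
A small sketch of ELMo's task-specific combination of layer representations (softmax-normalized weights times a scale gamma); the layer tensors here are placeholders, not the output of a real pre-trained model.

```python
# Weighted combination of bi-LSTM layer representations, as used by ELMo.
import numpy as np

def elmo_combine(layer_reps, s, gamma=1.0):
    """layer_reps: list of L arrays, each (seq_len, dim); s: L unnormalized layer weights."""
    s = np.asarray(s, dtype=float)
    w = np.exp(s - s.max())
    w /= w.sum()                                        # softmax over layers
    return gamma * sum(wi * h for wi, h in zip(w, layer_reps))
```
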
  10. 3. Background: FastText [Bojanowski+, 2017]
      • Skip-gram with negative sampling (SGNS) extended to the subword level
      • A word vector is the sum of the vectors of the word and its character n-grams
      • Character n-grams with n = 3, …, 6 are used, so rare and unseen words can still be embedded from their subwords
      (an n-gram extraction sketch follows below)
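
A minimal sketch of FastText-style character n-gram extraction; the function name and the use of "<" and ">" as word-boundary marks follow common descriptions, not the authors' code.

```python
# Extract all character n-grams of a word, with boundary markers.
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word` for n_min <= n <= n_max."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

# Example: a rare word still shares n-grams such as "<ci", "circ", "navi"
# with more frequent words, so it can be embedded from its subwords.
print(char_ngrams("where", 3, 4))  # ['<wh', 'whe', 'her', 'ere', 're>', '<whe', ...]
```
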
  11. 3. Background: Probabilistic Word Embeddings
      • Word2Gaussian [Vilnis+, 2014], Word2GaussianMixture [Athiwaratkun and Wilson, 2017]
      • W2G represents each word as a Gaussian density rather than a single point vector
      • W2GM extends W2G to a Gaussian mixture, so one word can cover several senses
      • This work combines the subword-level and probabilistic lines of work
        • Subword-level: robust to rare and unseen words
        • Probabilistic: captures multiple senses and uncertainty

                        | Dictionary-level             | Subword-level
          Deterministic | CBOW, Skip-gram, GloVe, ELMo | FastText
          Probabilistic | Word2Gaussian, W2GM          | Probabilistic FastText
  12. 4. Proposed Model: Probabilistic FastText
      • Each word is a Gaussian mixture density with K components
      • Probabilistic Subword Representation
        • The mean of the first component (i = 1) comes from the subword structure: the average of the word vector and its character n-gram vectors
        • The means of the remaining components (i = 2, …, K) are ordinary dictionary-level vectors, free to capture other senses
      (a sketch of the component means follows below)
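
A hedged sketch of how the component means could be assembled, assuming spherical Gaussians with K components; the function and variable names are mine, not the authors' code.

```python
# Build the K mixture-component means of one word in Probabilistic FastText.
import numpy as np

def pft_means(word_vec, ngram_vecs, other_means):
    """word_vec    : dictionary-level vector of the word, shape (d,)
       ngram_vecs  : vectors of the word's character n-grams, shape (m, d)
       other_means : free mean vectors for components 2..K, shape (K-1, d)
    Component 1 uses the subword structure; the rest are dictionary-level."""
    subword_mean = (word_vec + ngram_vecs.sum(axis=0)) / (1 + len(ngram_vecs))
    return np.vstack([subword_mean, other_means])   # shape (K, d)
```
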
  13. 4. Proposed Model: Probabilistic FastText
      • Similarity between two words is measured by the expected likelihood kernel: an inner product between densities in a Hilbert space
      • ξ_{i,j} is the partial energy between component i of word f and component j of word g
      • The total energy is dominated by the closest pair of senses; e.g. for f = rock, g = pop, the music senses give the largest partial energy
      (an energy-computation sketch follows below)
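
A hedged sketch of the expected-likelihood-kernel energy between two Gaussian-mixture words with spherical covariances, following the description above; mixture weights are assumed uniform in this sketch.

```python
# Partial energy between two spherical Gaussian components, and the total
# energy between two Gaussian-mixture word representations.
import numpy as np

def partial_energy(mu_f, mu_g, var_f, var_g):
    """xi_{i,j} = log of the inner product of two spherical Gaussians."""
    d = len(mu_f)
    s = var_f + var_g
    diff = mu_f - mu_g
    return -0.5 * d * np.log(2 * np.pi * s) - 0.5 * (diff @ diff) / s

def energy(means_f, means_g, var_f=1.0, var_g=1.0):
    """log sum_{i,j} p_i q_j exp(xi_{i,j}), with uniform mixture weights assumed."""
    K_f, K_g = len(means_f), len(means_g)
    terms = [np.log(1.0 / (K_f * K_g)) + partial_energy(mu_i, mu_j, var_f, var_g)
             for mu_i in means_f for mu_j in means_g]
    return np.logaddexp.reduce(terms)
```
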
  14. 4. Proposed Model: Probabilistic FastText
      • Training uses negative sampling, as in Mikolov+, 2013b
      • Negative context words are sampled from a unigram distribution
      • A max-margin loss pushes the energy of the observed (word, context) pair above the energy of the negative pair
      • K = 1 gives the single-Gaussian variant of the model
      (a sketch of the loss is given below)
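
An illustrative max-margin objective with negative sampling: the energy of the observed (word, context) pair should exceed the energy of a sampled negative pair by at least a margin. The margin value and function name are assumptions.

```python
# Hinge loss over pre-computed energies of the positive and negative pair.
def margin_loss(e_pos, e_neg, margin=1.0):
    """e_pos = energy(word, true context); e_neg = energy(word, negative sample)."""
    return max(0.0, margin - e_pos + e_neg)
```
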
  15. 5. Experiment
      • Evaluation on word-similarity datasets in several languages
      • Training corpora
        • UKWAC, WACKYPEDIA (EN), FRWAC (FR), DEWAC (DE), ITWAC (IT)
      • Hyperparameters
        • number of mixture components: K = 2
        • window length: l = 10
        • subsampling threshold: t = 10^-5
        • n-gram sizes: n = 3, 4, 5, 6
      (the settings are collected into a config sketch below)
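
For reference, the hyperparameters listed above gathered into a single illustrative config dict; the key names are mine, and reading l = 10 as the context window length is an assumption.

```python
# Hyperparameters from the slide, in an illustrative configuration dict.
pft_config = {
    "num_components": 2,          # K = 2
    "window_length": 10,          # l = 10 (interpreted here as the context window)
    "subsample_threshold": 1e-5,  # t = 10^-5
    "ngram_sizes": [3, 4, 5, 6],  # character n-gram lengths
}
```
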
  16. 6. Results: Multi-Prototype Models
      • Evaluated on the SCWS dataset (word similarity in context)
      • With Dim = 300 the model is state of the art among multi-prototype embeddings
      • Baselines include the multi-sense skip-gram of Neelakantan et al.
  17. 6. Results: Subword Decomposition
      • Analyzes which character n-grams contribute most to each word's representation
      • For each word, the top-5 / bottom-5 contributing subwords are shown
        • e.g. for abnormality, the subword abnorm (shared with abnormal) is among the top contributors
      • Rare words such as autobiographer, circumnavigations, and hypersensitivity still obtain meaningful representations through their subwords
  18. 6. Results: Number of Mixture Components
      • K = 2 is used in the experiments
      • K > 2 brings little additional gain [Athiwaratkun and Wilson, 2017]
      • Words with more than two senses can still be handled, e.g. ("cell", "jail"), ("cell", "biology"), ("cell", "phone")
      • Each component can act as a superposition of related senses, in line with [Arora+, 2016]
  19. 7. Conclusion
      • Proposed a probabilistic word embedding that combines Gaussian mixtures with subword information
      • Handles multiple word senses as well as rare and unseen words
      • Simple to train and achieves strong results on word-similarity benchmarks
      • Compares favorably with existing multi-prototype embeddings
      • Further Works
        • multi-prototype multi-lingual embedding
  20. 8. Presenter's comments (closing remarks; ELMo is mentioned)