Athiwaratkun, Wilson, Anandkumar - ACL 2018 - Probabilistic FastText for Multi-Sense Word Embeddings

tosho
November 26, 2018

Probabilistic FastText for Multi-Sense Word Embeddings

ACL Anthology:
http://aclweb.org/anthology/P18-1001

Transcript

  1. Probabilistic FastText for Multi-Sense Word Embeddings • Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar • ACL 2018 • Presenter: Tosho Hirasawa, Tokyo Metropolitan University, Komachi Lab
  2. 1. Overview • FastText
  3. 2. Word Embedding
  4. 3. Background
  5. 3. Background: NNLM [Bengio+, 2003] • Neural Network Language Model • NN • Brown
  6. 3. Background: CBOW [Mikolov+, 2013a] • CBOW: Continuous bag-of-words • NNLM • Together with skip-gram, known as Word2Vec • skip-gram
  7. 3. Background: skip-gram [Mikolov+, 2013b] • Hierarchical Softmax • Negative Sampling • Subsampling of Frequent Words
  8. 3. Background: GloVe [Pennington+, 2014] • GloVe: Global Vectors for Word Representation
  9. 3. Background: ELMo [Peters+, 2018] • ELMo: Embeddings from Language Models • bi-LSTM
  10. 3. Background: FastText [Bojanowski+, 2018] • skip-gram with negative sampling (SGNS), subword-level • n-gram • G_w = {2, …, G}, G = 2, … 6 • g: n-gram
  11. 3. Background • Word2Gaussian (W2G) [Vilnis+, 2014], Word2GaussianMixture (W2GM) [Athiwaratkun+, 2018] • Subword-level, Probabilistic Word Embedding • Deterministic: dictionary-level (CBOW, Skip-gram, GloVe), subword-level (ELMo, FastText) • Probabilistic: dictionary-level (Word2Gaussian, Word2GaussianMixture), subword-level (Probabilistic FastText)
  12. 4. Proposed Model: Probabilistic FastText • Probabilistic Subword Representation • n-gram • i = 1 • i = 2, …, K
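A minimal sketch of the subword representation, assuming the formulation in the original paper: the mean of the first component (i = 1) is the average of the word's dictionary vector and its character n-gram vectors, while components i = 2, …, K use plain dictionary-level vectors. The `<`/`>` boundary markers and n = 3 to 6 follow FastText; the CRC32 hashing, the bucket count, and all function names are illustrative assumptions, not the authors' code.

```python
import zlib
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of `word` wrapped in boundary markers."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]

def subword_mean(word, word_vecs, ngram_table, n_buckets):
    """Mean of the subword component: average of the dictionary vector
    and the (hashed) n-gram vectors of the word."""
    grams = char_ngrams(word)
    # Hypothetical hashing scheme: CRC32 stands in for FastText's own hash.
    rows = [zlib.crc32(g.encode("utf-8")) % n_buckets for g in grams]
    total = word_vecs[word] + ngram_table[rows].sum(axis=0)
    return total / (len(grams) + 1)
```

For example, `char_ngrams("cat")` yields the six n-grams of `<cat>` with lengths 3 to 5; no 6-gram exists because the padded token has only five characters.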
  13. 4. Proposed Model: Probabilistic FastText • Hilbert space • ξ: partial energy • f = rock, g = pop • partial energy
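The similarity on this slide can be sketched as follows, assuming the paper's expected likelihood kernel with spherical covariances: the energy is E(f, g) = log Σ_i Σ_j p_i q_j exp(ξ_{i,j}), with partial energy ξ_{i,j} = -(α/2)‖μ_{f,i} - μ_{g,j}‖². The value of `alpha`, the mixture weights, and the function names are illustrative assumptions.

```python
import numpy as np

def partial_energy(mu_f, mu_g, alpha=1.0):
    """xi[i, j] = -(alpha / 2) * ||mu_f[i] - mu_g[j]||^2 for every component pair."""
    diff = mu_f[:, None, :] - mu_g[None, :, :]
    return -0.5 * alpha * np.sum(diff ** 2, axis=-1)

def energy(mu_f, p, mu_g, q, alpha=1.0):
    """log sum_ij p_i q_j exp(xi_ij), computed with a log-sum-exp shift."""
    xi = partial_energy(mu_f, mu_g, alpha)
    scores = np.log(p)[:, None] + np.log(q)[None, :] + xi
    shift = scores.max()
    return shift + np.log(np.exp(scores - shift).sum())
```

A large partial energy between one component of "rock" and one component of "pop" means that this particular pair of senses is close, even when the other sense pairs are far apart.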
  14. 4. Proposed Model: Probabilistic FastText • Mikolov+, 2013b • negative sampling • U: unigram • K = 1
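A hedged sketch of the training signal, assuming the max-margin objective from the paper, L(f, g) = max(0, m - E(f, g) + E(f, n)), where the negative context n is drawn from the unigram distribution U raised to the 3/4 power as in Mikolov+, 2013b. The margin value, the seed, and the function names are illustrative assumptions.

```python
import numpy as np

def negative_sampler(counts, power=0.75, seed=0):
    """Return a function that draws word indices with probability
    proportional to U(w) ** power."""
    probs = np.asarray(counts, dtype=float) ** power
    probs /= probs.sum()
    rng = np.random.default_rng(seed)
    return lambda size: rng.choice(len(probs), size=size, p=probs)

def max_margin_loss(e_pos, e_neg, margin=1.0):
    """Hinge loss: zero once the positive energy exceeds the negative by `margin`."""
    return np.maximum(0.0, margin - e_pos + e_neg)
```

The loss is zero for a pair that already beats its negative sample by the margin, so only violating pairs contribute gradients.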
  15. 5. Experiment • Word Similarity Dataset • UKWAC, WACKYPEDIA (EN), FRWAC (FR), DEWAC (DE), ITWAC (IT) • EN • number of mixture components: K = 2 • context window length: l = 10 • subsampling threshold: t = 10^-5 • n-gram: n = 3, 4, 5, 6
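To illustrate the subsampling threshold listed above, here is a small sketch assuming the word2vec rule from Mikolov+, 2013b, in which each occurrence of a word with relative frequency f(w) is discarded with probability 1 - sqrt(t / f(w)). The function name and the example frequency are made up for illustration.

```python
import math

def discard_prob(freq, t=1e-5):
    """Probability of dropping one occurrence of a word with relative frequency `freq`."""
    return max(0.0, 1.0 - math.sqrt(t / freq))
```

With t = 10^-5, a word that makes up 0.1% of the corpus is dropped about 90% of the time, while words at or below the threshold frequency are always kept.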
  16. 6. Results: Nearest Neighbors • PFT-GM (K=2) • PFT-G (K=1)
  17. 6. Results: Word Similarity Dataset

  18. 6. Results: Word Similarity Dataset • FastText/W2G/M
  19. 6. Results: Multi-Prototype Models • SCWS Dataset • Dim=300 SOTA • NEELAKANTAN skip-gram
  20. 6. Result: FR, DE, IT
  21. 6. Result: Subword Decomposition • subword • top-5 / bottom-5 • abnormality / abnormal • abnorm • autobiographer, circumnavigations, hypersensitivity
  22. 6. Result • K = 2 • K > 2 [Athiwaratkun and Wilson, 2017] • K = 1 • (“cell”, “jail”), (“cell”, “biology”), (“cell”, “phone”) • [Arora+, 2016]
  23. 7. Conclusion • multi-prototype embedding • Future Work • multi-prototype multi-lingual embedding
  24. 8. • ELMo