[Paper introduction][ACL2014] Don't count, predict! #gunosydm

Slide 1

[Paper introduction] Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
Marco Baroni, Georgiana Dinu, German Kruszewski (Center for Mind/Brain Sciences, University of Trento, Italy)
ACL 2014
Yoshifumi Seki, 2014.09.02 @ Gunosy study group

Slide 2

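Overview
• Context-predicting models such as word2vec have become popular, but they have not yet been systematically compared with the well-established count-based (frequency-based) vector models
• The authors evaluated both families on a range of lexical semantics tasks under many different parameter settings
• Result: context-predicting models outperformed count-based models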

Slide 3

Distributional semantic models (DSMs)
• Use vectors that keep track of the contexts in which words occur
• For decades, DSMs have been built from context counts, but raw co-occurrence counts on their own don't work that well
• Performance improves when various transformations are applied to the raw vectors
  – e.g., reweighting the counts for context informativeness and smoothing them with dimensionality reduction techniques
  – this vector optimization process is generally unsupervised
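
A minimal sketch of the reweight-then-smooth pipeline described above, assuming positive PMI (negative values clipped to zero) as the reweighting step and truncated SVD as the smoothing step. It uses NumPy; the counts are made up, and the function names `ppmi` and `svd_smooth` are hypothetical, not part of any toolkit mentioned in the talk.

```python
import numpy as np

def ppmi(counts: np.ndarray) -> np.ndarray:
    """Reweight a word-by-context count matrix with positive PMI."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # target-word totals
    col = counts.sum(axis=0, keepdims=True)  # context-word totals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give log(0); treat them as no association
    return np.maximum(pmi, 0.0)   # keep only positive associations

def svd_smooth(matrix: np.ndarray, k: int) -> np.ndarray:
    """Keep the top-k singular dimensions (rows = reduced word vectors)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

# Made-up counts: 4 target words x 4 context words.
counts = np.array([
    [10, 0, 3, 1],
    [8, 1, 2, 0],
    [0, 9, 0, 4],
    [1, 7, 1, 5],
], dtype=float)

weighted = ppmi(counts)
vectors = svd_smooth(weighted, k=2)
print(vectors.shape)  # (4, 2)
```

Each row of `vectors` is a compressed word vector; no step uses labelled data, which is what makes the pipeline unsupervised.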

Slide 4

A new generation of DSMs
• The last few years have seen the development of a new generation of DSMs that frame vector estimation directly as a learning problem
• The weights in a word vector are set to maximize the probability of the contexts in which the word is observed in the corpus
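
To make "maximize the probability of the contexts" concrete, here is a toy, skip-gram-style sketch with a full softmax over all contexts, trained by plain gradient ascent in NumPy. The corpus pairs, sizes, and learning rate are invented for illustration; word2vec itself replaces the full softmax with cheaper approximations (such as negative sampling), so this is not the toolkit's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 3
# Made-up (target, context) index pairs observed in a tiny corpus.
pairs = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]
W = rng.normal(scale=0.1, size=(vocab_size, dim))  # word (target) vectors
C = rng.normal(scale=0.1, size=(vocab_size, dim))  # context vectors

def softmax(scores):
    exp = np.exp(scores - scores.max())  # shift for numerical stability
    return exp / exp.sum()

def log_likelihood(W, C):
    """Sum over observed pairs of log P(context | target), softmax over all contexts."""
    return sum(np.log(softmax(C @ W[t])[c]) for t, c in pairs)

before = log_likelihood(W, C)
lr = 0.1
for _ in range(200):
    for t, c in pairs:
        probs = softmax(C @ W[t])
        err = -probs
        err[c] += 1.0                 # observed context minus expected context
        grad_w = C.T @ err            # gradient w.r.t. the target vector W[t]
        grad_c = np.outer(err, W[t])  # gradient w.r.t. every context vector
        W[t] += lr * grad_w           # gradient ascent: increase the likelihood
        C += lr * grad_c
after = log_likelihood(W, C)
print(f"log-likelihood: {before:.3f} -> {after:.3f}")
```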

Slide 5

The new generation of DSMs is attractive!
• It replaces the essentially heuristic stacking of vector transforms
• No manual annotation cost
• Some of the relevant methods can efficiently scale up to process very large amounts of input data

Slide 6

Dataset
• A corpus of about 2.8 billion tokens, made up of
  – ukWaC
  – English Wikipedia
  – British National Corpus
• The top 300K most frequent words in the corpus are used as both target and context elements
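
The vocabulary filter and the raw counting step can be sketched in a few lines of standard-library Python. This assumes an already tokenized corpus and reads "window of n words" as n words on each side of the target; the helper names and the toy sentence are hypothetical.

```python
from collections import Counter

def top_vocabulary(tokens, size):
    """Keep the `size` most frequent word types (300K in the talk)."""
    return {w for w, _ in Counter(tokens).most_common(size)}

def cooccurrence_counts(tokens, vocab, window):
    """Count (target, context) pairs within `window` words on each side."""
    counts = Counter()
    for i, target in enumerate(tokens):
        if target not in vocab:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in vocab:
                counts[(target, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab = top_vocabulary(tokens, size=5)
counts = cooccurrence_counts(tokens, vocab, window=2)
print(counts[("sat", "on")])  # 2
```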

Slide 7

Count models
• Built with the DISSECT toolkit
  – http://clic.cimec.unitn.it/composes/toolkit/
• Count vectors from symmetric context windows of two and five words
• Two weighting schemes
  – pointwise mutual information (PMI)
  – local mutual information (LMI)
• Full and compressed vectors
  – Singular Value Decomposition (SVD)
  – Non-negative Matrix Factorization (NMF)
  – compressed dimensionality ranging from 200 to 500 in steps of 100
• In total, 36 count models were evaluated
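
The options above form a full grid, and enumerating it is a quick check that they account for the 36 count models: 2 windows × 2 weightings × (full + 4 SVD sizes + 4 NMF sizes). The `pmi` and `lmi` helpers sketch the usual definitions (LMI scales PMI by the co-occurrence count, which favours frequent pairs over rare ones); they are illustrative and not DISSECT's API.

```python
import math
from itertools import product

windows = [2, 5]
weightings = ["PMI", "LMI"]
reduced_dims = [200, 300, 400, 500]
# "full" = uncompressed vectors; otherwise (method, dimensionality).
compressions = ["full"] + [(m, d) for m in ("SVD", "NMF") for d in reduced_dims]

grid = list(product(windows, weightings, compressions))
print(len(grid))  # 36

def pmi(n_wc, n_w, n_c, total):
    """Pointwise mutual information from raw counts."""
    return math.log((n_wc * total) / (n_w * n_c))

def lmi(n_wc, n_w, n_c, total):
    """Local mutual information: PMI weighted by the co-occurrence count."""
    return n_wc * pmi(n_wc, n_w, n_c, total)
```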

Slide 8

Predict models
• Using the word2vec toolkit
• Context windows of 2 and 5 words
• Vector dimensionality from 200 to 500 in steps of 100
• Hierarchical softmax or negative sampling
  – k: number of negative samples, 5 and 10
• t: words that occur with higher frequency than t are aggressively subsampled
  – without subsampling
  – t = 1e-5
• In total, 48 predict models were evaluated
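
Likewise for the predict models: a grid of 2 windows × 4 dimensionalities × 3 output-layer options (hierarchical softmax, or negative sampling with k = 5 or 10, as in the paper) × 2 subsampling settings gives the 48 models. The discard probability below is the formula 1 − sqrt(t / f(w)) from the word2vec paper (Mikolov et al., 2013), where f(w) is the word's relative frequency; the word2vec C code uses a slightly different variant, so treat this as an illustration of the idea rather than the toolkit's exact behaviour.

```python
import math
from itertools import product

windows = [2, 5]
dims = [200, 300, 400, 500]
# Output layer: hierarchical softmax, or negative sampling with k samples.
training = ["hs", ("neg", 5), ("neg", 10)]
subsampling = [None, 1e-5]  # None = no subsampling

grid = list(product(windows, dims, training, subsampling))
print(len(grid))  # 48

def discard_probability(freq, t=1e-5):
    """Chance of dropping one occurrence of a word with relative frequency `freq`."""
    return max(0.0, 1.0 - math.sqrt(t / freq))

print(round(discard_probability(0.05), 3))  # very frequent word: 0.986
print(discard_probability(1e-6))            # rarer than t: never discarded, 0.0
```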

Slide 9

Out-of-the-box models
• Distributional Memory (dm)
  – built from "linguistically rich" data
  – Baroni and Lenci (2010)
  – http://clic.cimec.unitn.it/dm/
• Collobert and Weston vectors (cw)
  – 100-dimensional vectors trained for two months on Wikipedia
  – the vectors were trained to choose the right word over a random alternative in the middle of an 11-word context window
  – Collobert et al. (2011)
  – http://ronan.collobert.com/senna/
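
The "right word vs. random alternative" objective is a pairwise ranking criterion. A minimal sketch, assuming the hinge form max(0, 1 − s(real) + s(corrupted)) described by Collobert et al. (2011); the real model scores a window with a neural network, which is replaced here by a hypothetical `toy_score` function.

```python
import random

def ranking_loss(score, window, vocab, rng, margin=1.0):
    """Hinge loss: the real window should outscore a copy whose middle word is random."""
    mid = len(window) // 2
    corrupted = list(window)
    corrupted[mid] = rng.choice([w for w in vocab if w != window[mid]])
    return max(0.0, margin - score(window) + score(corrupted))

# Hypothetical scorer standing in for the neural network.
def toy_score(words):
    return 2.0 if words[5] == "mat" else 0.0

window = "the cat sat on the mat and then it slept soundly".split()  # 11 words
vocab = sorted(set(window)) + ["banana", "quickly"]
rng = random.Random(0)
print(ranking_loss(toy_score, window, vocab, rng))  # 0.0: margin already satisfied
```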

Slide 10

Evaluation materials
• Semantic relatedness
• Synonym detection
• Concept categorization
• Selectional preferences
• Analogy

Slide 11

Semantic relatedness
• Rate the degree of semantic similarity or relatedness between two words on a numerical scale
• Rubenstein and Goodenough (1965) (rg)
  – 65 noun pairs
  – state of the art: Hassan and Mihalcea (2011), which exploits Wikipedia's linking structure and word sense disambiguation techniques
• WordSim353 (ws)
  – Finkelstein et al. (2002)
  – 353 pairs
  – state of the art: Halawi et al. (2014), a method in the spirit of predict models that also uses WordNet
  – Agirre et al. (2009) split the ws set into similarity (wss) and relatedness (wsr) subsets
• MEN (men)
  – 1000 word pairs
  – Bruni et al. (2014)
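
Relatedness benchmarks are usually scored by the Spearman rank correlation between the model's cosine similarities and the human ratings; we assume that setup here. A standard-library sketch, with invented vectors and ratings:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Made-up vectors and human ratings for three word pairs.
vectors = {"car": [1.0, 0.2], "auto": [0.9, 0.3], "tree": [0.1, 1.0], "fruit": [0.3, 0.8]}
pairs = [("car", "auto", 9.0), ("tree", "fruit", 6.5), ("car", "tree", 1.5)]
model = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human = [rating for _, _, rating in pairs]
print(spearman(model, human))
```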

Slide 12

Synonym detection
• TOEFL set
• 80 multiple-choice questions, each pairing a target term with 4 synonym candidates
• Bullinaria and Levy (2012) achieve 100% accuracy
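
A sketch of the usual scoring for these TOEFL-style questions, assuming the model picks the candidate whose vector has the highest cosine with the target; the vectors and the single question are made up.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def answer(target, candidates, vectors):
    """Return the candidate most similar (by cosine) to the target word."""
    return max(candidates, key=lambda c: cosine(vectors[target], vectors[c]))

def accuracy(questions, vectors):
    """Share of questions whose chosen candidate matches the gold synonym."""
    hits = sum(answer(t, cands, vectors) == gold for t, cands, gold in questions)
    return hits / len(questions)

vectors = {
    "enormous": [1.0, 0.1, 0.0], "huge": [0.9, 0.2, 0.0],
    "tiny": [-1.0, 0.1, 0.0], "red": [0.0, 0.0, 1.0], "quick": [0.0, 1.0, 0.0],
}
questions = [("enormous", ["tiny", "huge", "red", "quick"], "huge")]
print(accuracy(questions, vectors))  # 1.0
```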

Slide 13

Concept categorization
• The task is to group nominal concepts into natural categories
• Clustering with the CLUTO toolkit
  – http://glaros.dtc.umn.edu/gkhome/views/cluto
• Almuhareb-Poesio benchmark (ap)
  – Almuhareb (2006)
  – 402 concepts in 21 categories
• ESSLLI 2008 Distributional Semantic Workshop shared-task set (esslli)
  – 44 concepts in 6 categories
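
The slide names the clustering toolkit but not the score; this sketch assumes cluster purity (the share of items that fall in the majority gold category of their cluster) and leaves the clustering itself to CLUTO. The gold categories and cluster assignment below are invented.

```python
from collections import Counter

def purity(clusters, gold):
    """Fraction of items belonging to the majority gold category of their cluster."""
    by_cluster = {}
    for item, cluster_id in clusters.items():
        by_cluster.setdefault(cluster_id, []).append(gold[item])
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in by_cluster.values())
    return majority / len(clusters)

gold = {"dog": "animal", "cat": "animal", "cow": "animal",
        "car": "vehicle", "bus": "vehicle", "apple": "fruit"}
clusters = {"dog": 0, "cat": 0, "cow": 1, "car": 1, "bus": 1, "apple": 2}
print(purity(clusters, gold))  # 5 of 6 items in a majority category
```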

Slide 14

Selectional preferences
• Verb-noun pairs that were rated by subjects for the typicality of the noun as a subject or object of the verb
• Ulrike Pado (2007) (up)
  – 211 pairs
• McRae
  – 100 noun-verb pairs
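
One way to score typicality with distributional vectors, assumed here rather than stated on the slide, is to average the vectors of nouns strongly associated with a verb slot into a prototype and rate each noun by its cosine to that prototype; these scores would then be correlated with the human ratings. The associated-noun list and the vectors are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def typicality(noun, typical_nouns, vectors):
    """Cosine between a noun and the prototype built from typical arguments."""
    prototype = centroid([vectors[w] for w in typical_nouns])
    return cosine(vectors[noun], prototype)

vectors = {
    "bread": [0.9, 0.1], "cake": [0.8, 0.2], "soup": [0.85, 0.05],
    "stone": [0.1, 0.9], "apple": [0.7, 0.3],
}
# Hypothetical: nouns most strongly associated with "eat" as its object.
eat_objects = ["bread", "cake", "soup"]
print(typicality("apple", eat_objects, vectors), typicality("stone", eat_objects, vectors))
```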

Slide 15

Analogy
• Designed specifically to test predict models
• 9K semantic and 10.5K syntactic analogy questions
  – examples
    • brother-sister, grandson-? (answer: granddaughter)
    • work-works, speak-? (answer: speaks)
• Entire dataset (an)
  – syntactic subset (ansyn)
  – semantic subset (ansem)
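
Analogy questions are typically answered with the vector-offset method: return the word whose vector is closest (by cosine) to b − a + c, excluding the three question words. A sketch with invented 3-dimensional vectors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

vectors = {
    "brother": [1.0, 0.0, 0.5], "sister": [1.0, 1.0, 0.5],
    "grandson": [0.0, 0.0, 1.0], "granddaughter": [0.0, 1.0, 1.0],
    "uncle": [1.0, 0.0, 0.0],
}
print(solve_analogy("brother", "sister", "grandson", vectors))  # granddaughter
```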

Slide 16

Result

Slide 17

Count models depend strongly on how their parameters are chosen

Slide 18

Conclusion
• Overall, predict models are better than count models
  – higher accuracy across the board
  – more robust to differences in parameter settings and datasets
• Strong on both semantic relatedness and synonym tasks