
[Paper Introduction] [ACL2014] Don't count, predict! #gunosydm

September 02, 2014


Transcript

1. [Paper Introduction] Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
Marco Baroni, Georgiana Dinu, German Kruszewski (Center for Mind/Brain Sciences, University of Trento, Italy), ACL 2014
Yoshifumi Seki, 2014.09.02 @ Gunosy research meetup
2. Overview
• Context-predicting models such as word2vec have become popular, but they have not been systematically compared with the well-known frequency-based vector models
• The paper evaluates a variety of lexical semantics tasks across a wide range of parameter settings
• As a result, context-predicting models outperform count-based models
3. Distributional semantic models (DSMs)
• Use vectors that keep track of context
• For decades: raw co-occurrence counts don't work that well
• Achieve higher performance when various transformations are applied to the raw vectors
  – e.g., by reweighting the counts for context informativeness and smoothing them with dimensionality reduction techniques
  – this vector optimization process is generally unsupervised
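The reweighting step described above can be illustrated with positive pointwise mutual information (PPMI). This is a minimal numpy sketch on a made-up 2x2 count matrix, not the DISSECT implementation; `ppmi` is a hypothetical helper name.

```python
import numpy as np

def ppmi(counts):
    """Reweight a raw co-occurrence count matrix with positive PMI:
    PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), negatives clipped to 0."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # target-word marginals
    col = counts.sum(axis=0, keepdims=True)   # context-word marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts * total) / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0              # zero counts give log(0)
    return np.maximum(pmi, 0.0)               # keep only positive PMI

# Toy counts: rows = target words, columns = context words
counts = np.array([[10.0, 0.0],
                   [2.0, 8.0]])
weights = ppmi(counts)
```

Rare-but-informative pairs get boosted, frequent-but-uninformative pairs get suppressed to zero.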
4. A new generation of DSMs
• The last few years have seen the development of a new generation of DSMs
  – the direct vector estimation problem
• The weights in a word vector are set to maximize the probability of the contexts in the corpus
5. The new generation of DSMs is attractive!
• Replaces the essentially heuristic stacking of vector transforms
• No manual annotation cost
• Some of the relevant methods can efficiently scale up to process very large amounts of input data
6. Dataset
• A corpus of about 2.8 billion tokens
  – ukWaC
  – English Wikipedia
  – British National Corpus
• The top 300k most frequent words in the corpus as target and context elements
7. Count models
• Using the DISSECT toolkit
  – http://clic.cimec.unitn.it/composes/toolkit/
• Count vectors from symmetric context windows of two and five words
• Two weighting schemes
  – pointwise mutual information (PMI)
  – local mutual information
• Full and compressed vectors
  – Singular Value Decomposition
  – Non-negative Matrix Factorization
  – dimensionality ranging from 200 to 500 in steps of 100
• In total, 36 count models were evaluated
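The compression step can be sketched with a truncated SVD. This is a generic numpy sketch, not the DISSECT toolkit; the matrix size is a toy stand-in for the 300k-word vocabulary, and `d` stands for one point of the paper's 200-500 grid.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy weighted co-occurrence matrix (the real one is ~300k x 300k)
X = np.abs(rng.normal(size=(40, 30)))

# Truncated SVD: keep the top-d latent dimensions (paper grid: 200..500)
d = 10
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vectors = U[:, :d] * s[:d]   # one d-dimensional row per target word
```

Keeping only the strongest singular directions smooths the sparse counts, which is the "dimensionality reduction" half of the count-model pipeline.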
8. Predict models
• Using the word2vec toolkit
• Context windows of 2 and 5 words
• Vector dimensionality in the 200 to 500 range, in steps of 100
• k: number of negative samples
  – 5 and 10
• t: words that occur with higher frequency than t are aggressively subsampled
  – no subsampling
  – t = exp(-5)
• In total, 48 predict models were evaluated
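The objective these predict models optimize can be illustrated with a minimal skip-gram-with-negative-sampling loop in plain numpy. This is a toy sketch, not the word2vec toolkit: the corpus, dimensionality, and learning rate are made up, frequency subsampling is omitted, and negatives are drawn uniformly rather than from the unigram^0.75 distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus; the actual models are trained on ~2.8 billion tokens.
corpus = ("the cat sat on the mat the dog sat on the rug " * 20).split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

dim, window, k, lr = 50, 2, 5, 0.05            # dims, window, negatives, step
W = rng.normal(0.0, 0.1, (len(vocab), dim))    # target-word vectors
C = rng.normal(0.0, 0.1, (len(vocab), dim))    # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(5):                             # a few passes over the toy corpus
    for pos, word in enumerate(corpus):
        t = idx[word]
        for cpos in range(max(0, pos - window), min(len(corpus), pos + window + 1)):
            if cpos == pos:
                continue
            c = idx[corpus[cpos]]
            # observed pair: push sigmoid(w . c) toward 1
            g = sigmoid(W[t] @ C[c]) - 1.0
            grad_c = g * W[t]
            W[t] -= lr * g * C[c]
            C[c] -= lr * grad_c
            # k negative samples: push sigmoid(w . n) toward 0
            for n in rng.integers(0, len(vocab), size=k):
                g = sigmoid(W[t] @ C[n])
                grad_n = g * W[t]
                W[t] -= lr * g * C[n]
                C[n] -= lr * grad_n

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# "cat" and "dog" occur in identical contexts, so they should drift together
sim_cat_dog = cosine(W[idx["cat"]], W[idx["dog"]])
```

The point is that the vector weights are fit directly to predict contexts, rather than being counted and then transformed.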
9. Out-of-the-box models
• Distributional Memory (dm)
  – Uses "linguistically rich" data
  – Baroni and Lenci (2010)
  – http://clic.cimec.unitn.it/dm/
• Collobert and Weston vectors (cw)
  – 100-dimensional vectors trained for two months on Wikipedia
  – The vectors were trained to optimize the task of choosing the right word over a random alternative in the middle of an 11-word context window
  – Collobert et al. (2011)
  – http://ronan.collobert.com/senna/
10. Evaluation materials
• Semantic relatedness
• Synonym detection
• Concept categorization
• Selectional preferences
• Analogy
11. Semantic relatedness
• Rate the degree of semantic similarity or relatedness between two words on a numerical scale
• Rubenstein and Goodenough (1965) (rg)
  – Consists of 65 noun pairs
  – State of the art: Hassan and Mihalcea (2011)
    • Exploits the Wikipedia link structure and word sense disambiguation techniques
• WordSim353 (ws)
  – Finkelstein et al. (2002)
  – Consists of 353 pairs
  – State of the art: Halawi et al. (2012)
    • Predicts relatedness using WordNet
  – Agirre et al. (2009) split the ws set into similarity (wss) and relatedness (wsr) subsets
• MEN (men)
  – 1000 word pairs
  – Bruni et al. (2014)
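Models on these datasets are typically scored by the Spearman correlation between model cosines and the human ratings. Below is a self-contained sketch with made-up vectors and ratings (not the actual rg/ws/men data), assuming no tied scores.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def spearman(x, y):
    """Spearman rank correlation (assumes no ties): Pearson on the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical mini-benchmark: word pairs with human relatedness ratings
vectors = {
    "car":   np.array([0.9, 0.1, 0.0]),
    "auto":  np.array([0.85, 0.15, 0.05]),
    "fruit": np.array([0.1, 0.9, 0.2]),
    "apple": np.array([0.2, 0.8, 0.3]),
}
pairs = [("car", "auto", 3.9), ("fruit", "apple", 3.5), ("car", "fruit", 0.4)]

model_scores = np.array([cosine(vectors[a], vectors[b]) for a, b, _ in pairs])
human_scores = np.array([r for _, _, r in pairs])
rho = spearman(model_scores, human_scores)
```

Only the ranking matters, so the absolute scale of the cosines is irrelevant.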
12. Synonym detection
• TOEFL set
• 80 multiple-choice questions that pair a target term with 4 synonym candidates
• Bullinaria and Levy (2012) achieve 100% accuracy
13. Concept categorization
• The task is to group nominal concepts into natural categories
• Using the CLUTO toolkit
  – http://glaros.dtc.umn.edu/gkhome/views/cluto
• Almuhareb-Poesio benchmark (ap)
  – Almuhareb (2006)
  – 402 concepts into 21 categories
• The ESSLLI 2008 Distributional Semantics Workshop shared-task set (esslli)
  – 44 concepts into 6 categories
14. Selectional preferences
• Verb-noun pairs that were rated by subjects for the typicality of the noun as a subject or object of the verb
• Ulrike Pado (2007) (up)
  – 211 pairs
• McRae (mcrae)
  – 100 noun-verb pairs
15. Analogy
• Designed specifically to test predict models
• 9K semantic and 10.5K syntactic analogy questions
  – Examples
    • brother : sister :: grandson : ?  (Answer: granddaughter)
    • work : works :: speak : ?  (Answer: speaks)
• Entire dataset (an)
  – Syntactic subset (ansyn)
  – Semantic subset (ansem)
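Such questions are typically answered with the vector-offset method: for a : b :: c : ?, return the word closest to b - a + c. A toy sketch with hand-made normalized vectors follows (the real evaluation searches the full 300k vocabulary).

```python
import numpy as np

# Hand-made toy vectors; real models learn these from the corpus.
vecs = {
    "brother":       np.array([1.0, 1.0, 0.0]),
    "sister":        np.array([1.0, 0.0, 1.0]),
    "grandson":      np.array([0.0, 1.0, 0.2]),
    "granddaughter": np.array([0.0, 0.1, 1.1]),
    "uncle":         np.array([0.9, 0.9, 0.1]),   # distractor
}

def unit(v):
    return v / np.linalg.norm(v)

def analogy(a, b, c, vecs):
    """Answer a : b :: c : ? via the vector-offset (3CosAdd) method."""
    target = unit(vecs[b]) - unit(vecs[a]) + unit(vecs[c])
    best, best_sim = None, -2.0
    for w, v in vecs.items():
        if w in (a, b, c):               # question words are excluded
            continue
        sim = unit(v) @ unit(target)     # cosine with the offset vector
        if sim > best_sim:
            best, best_sim = w, sim
    return best

answer = analogy("brother", "sister", "grandson", vecs)
```

The "female" direction (sister - brother) carries grandson to granddaughter, which is why this benchmark rewards vector spaces with consistent offsets.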
16. Results

17. Count models depend strongly on how their parameters are chosen

18. Conclusion
• Taken as a whole, predict models are better than count models
  – Accuracy is higher across the board
  – They are more robust to differences in parameter settings and datasets
• Strong on both the semantic and synonym tasks