of context-counting vs. context-predicting semantic vectors
Marco Baroni, Georgiana Dinu, Germán Kruszewski (Center for Mind/Brain Sciences, University of Trento, Italy)
ACL 2014
Yoshifumi Seki, 2014.09.02 @ Gunosy Study Group
of context
• it has been known for decades that raw co-occurrence counts don't work that well
• higher performance is achieved when various transformations are applied to the raw vectors
  – e.g. reweighting the counts for context informativeness and smoothing them with dimensionality reduction techniques
  – this vector optimization process is generally unsupervised
seen the development of a new generation of DSMs
– the vector estimation problem is tackled directly
  • the weights in a word vector are set to maximize the probability of the contexts in the corpus
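As a rough sketch of what "maximize the probability of the contexts" can look like, one generic way to write the objective is shown below. The notation is an assumption made for illustration: D is the set of observed (word, context) pairs, u_w and v_c are the word and context vectors collected in θ, and the softmax form is just one common parameterization, not necessarily the one used by the models compared in these slides.

```latex
\hat{\theta} = \arg\max_{\theta} \sum_{(w, c) \in D} \log p(c \mid w; \theta),
\qquad
p(c \mid w; \theta) =
  \frac{\exp(\mathbf{v}_c^{\top} \mathbf{u}_w)}
       {\sum_{c' \in C} \exp(\mathbf{v}_{c'}^{\top} \mathbf{u}_w)}
```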
stacking of vector transforms
• no manual annotation cost
• some of the relevant methods can efficiently scale up to process very large amounts of input data
• count vectors from symmetric context windows of two and five words
• two weighting schemes
  – Pointwise Mutual Information (PMI)
  – Local Mutual Information (LMI)
• full and compressed vectors
  – Singular Value Decomposition (SVD)
  – Non-negative Matrix Factorization (NMF)
  – reduced dimensionality ranging from 200 to 500 in steps of 100
• In total, 36 count models were evaluated.
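To make the two weighting schemes and the size of the parameter grid concrete, here is a minimal Python sketch. It assumes PMI is the log of observed over expected co-occurrence and LMI is the observed count times that value; cells with a zero count get weight 0, which is a choice made for this sketch. The function name `pmi_and_lmi`, the toy counts, and the variable names are all hypothetical. The dimensionality reduction step (SVD or NMF) is not implemented here.

```python
import math
from itertools import product


def pmi_and_lmi(counts):
    """Weight a word-by-context count matrix (list of rows) with PMI and LMI."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]        # word marginals
    col_sums = [sum(col) for col in zip(*counts)]  # context marginals
    pmi, lmi = [], []
    for i, row in enumerate(counts):
        pmi_row, lmi_row = [], []
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_sums[i] * col_sums[j] / total
                score = math.log(observed / expected)
            else:
                score = 0.0  # assumption: unseen pairs get weight 0
            pmi_row.append(score)
            lmi_row.append(observed * score)  # LMI = count * PMI
        pmi.append(pmi_row)
        lmi.append(lmi_row)
    return pmi, lmi


# Toy counts (made-up numbers, for illustration only).
toy_counts = [[8, 2], [2, 8]]
pmi, lmi = pmi_and_lmi(toy_counts)

# Enumerate the settings listed on the slide: 2 windows x 2 weightings x
# (1 full model + 2 reduction methods x 4 dimensionalities).
windows = [2, 5]
weightings = ["PMI", "LMI"]
dims = list(range(200, 501, 100))
compressions = [("full", None)] + [(m, d) for m in ("SVD", "NMF") for d in dims]
count_configs = list(product(windows, weightings, compressions))
print(len(count_configs))  # 36
```

The enumeration reproduces the total of 36: 2 × 2 × (1 + 2 × 4).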
of 2 and 5 words
• vector dimensionality from 200 to 500 in steps of 100
• k: number of negative samples
  – 5 and 10
• t: words that occur with higher frequency than t are aggressively subsampled
  – no subsampling
  – t = 1e-5
• we evaluate 48 predict models
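The effect of the threshold t can be sketched as follows. This assumes the discard probability 1 - sqrt(t / f(w)) from the original word2vec description (Mikolov et al., 2013), where f(w) is the relative frequency of word w; actual implementations may use a slightly different formula. The function names and the example values are hypothetical.

```python
import math
import random


def discard_probability(relative_freq, t):
    """Probability of dropping one occurrence of a word with this relative frequency."""
    if relative_freq <= t:
        return 0.0  # words at or below the threshold are never dropped
    return 1.0 - math.sqrt(t / relative_freq)


def subsample(tokens, t, seed=0):
    """Randomly drop occurrences of frequent words from a token list."""
    rng = random.Random(seed)
    total = len(tokens)
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    kept = []
    for tok in tokens:
        p = discard_probability(counts[tok] / total, t)
        if rng.random() >= p:
            kept.append(tok)
    return kept


# Illustrative values: a word covering 0.1% of the corpus, threshold 1e-5.
print(discard_probability(1e-3, 1e-5))
```

With these values roughly nine out of ten occurrences of the frequent word are dropped, while any word whose relative frequency is at or below t is always kept.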
data
  – Baroni and Lenci (2010)
  – http://clic.cimec.unitn.it/dm/
• Collobert and Weston vectors (cw)
  – 100-dimensional vectors trained for two months on Wikipedia
  – The vectors were trained to optimize the task of choosing the right word over a random alternative in the middle of an 11-word context window
  – Collobert et al. (2011)
  – http://ronan.collobert.com/senna/
relatedness between two words on a numerical scale
• Rubenstein and Goodenough (1965) (rg)
  – Consists of 65 noun pairs
  – State of the art: Hassan and Mihalcea (2011)
    • Exploits Wikipedia linking structure and word sense disambiguation techniques
• WordSim353 (ws)
  – Finkelstein et al. (2002)
  – Consists of 353 pairs
  – State of the art: Halawi et al. (2012)
    • Predict models using WordNet
  – Agirre et al. (2009) split the ws set into similarity (wss) and relatedness (wsr) subsets
• MEN (men)
  – 1000 word pairs
  – Bruni et al. (2014)
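One plausible way to score a model on such word-pair datasets is to compare the cosine similarity of each pair's vectors against the human ratings with a rank correlation. The sketch below assumes Spearman correlation as the metric, which the slide itself does not state; the helper names and the toy vectors and ratings are made up for illustration and do not come from any of the benchmarks above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: hypothetical 2-d vectors and made-up human ratings.
vectors = {"car": [1.0, 0.1], "auto": [0.9, 0.2], "bird": [0.1, 1.0], "noon": [0.5, 0.5]}
pairs = [("car", "auto", 9.0), ("car", "noon", 4.0), ("car", "bird", 1.0)]
model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
gold_scores = [rating for _, _, rating in pairs]
rho = spearman(model_scores, gold_scores)
```

Using ranks rather than raw scores means the model is only asked to order the pairs as humans do, not to match the rating scale.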