2016/10/28 @ Kasuga Area
himkt <[email protected]>
All figures are extracted from https://www.aclweb.org/anthology/P/P16/P16-1226.pdf
Vered Shwartz, Yoav Goldberg, Ido Dagan (Bar-Ilan University)
(LSTM) to encode dependency-path features
• Use word embeddings (GloVe) as word features
• Able to take into account both distributional and dependency-path information
• Results comparable to the state of the art (path-based LSTM only)
• Significantly improves upon the state of the art (integrated method)
• Given words x and y, decide whether x is a hyponym of y (isa(x, y))
• Distributional vectors are commonly used
• Path-based methods (Snow+ [4], Nakashole+ [5])
• Hyponymy relations are expressed by dependency paths
$v_e = [v_l, v_{pos}, v_{dep}, v_{dir}]$
pos: part of speech, dep: dependency label, dir: dependency direction
lemma: GloVe
others, including out-of-vocabulary lemmas: randomly initialized
• Path representation
• $p$ is composed of $v_{e_1}, v_{e_2}, \ldots, v_{e_k}$ (where $e_1, e_2, \ldots, e_k$ are edges)
• $o_p$: a vector from the LSTM encoder
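The edge representation above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the embedding sizes, the `make_table` helper, and the example path are hypothetical, and every component (including the lemma) is randomly initialized here, whereas the slides initialize lemma vectors from GloVe. The LSTM encoder that would consume these vectors is left out.

```python
import random

# Hypothetical embedding sizes; the slides do not state them.
DIMS = {"lemma": 4, "pos": 2, "dep": 2, "dir": 1}

def make_table(dim, seed):
    """Return a lookup that lazily assigns a random vector to each new key."""
    rng = random.Random(seed)
    table = {}

    def lookup(key):
        if key not in table:
            table[key] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        return table[key]

    return lookup

# One lookup table per edge component (all random in this sketch).
tables = {name: make_table(dim, seed)
          for seed, (name, dim) in enumerate(DIMS.items())}

def edge_vector(lemma, pos, dep, direction):
    """Concatenate the component embeddings: v_e = [v_l, v_pos, v_dep, v_dir]."""
    return (tables["lemma"](lemma) + tables["pos"](pos)
            + tables["dep"](dep) + tables["dir"](direction))

# Made-up path for a sentence like "X is a Y"; labels are illustrative only.
path = [("X", "NOUN", "nsubj", ">"),
        ("be", "VERB", "ROOT", "-"),
        ("Y", "NOUN", "attr", "<")]

# The sequence v_e1, ..., v_ek that an LSTM encoder would read to produce o_p.
edge_vectors = [edge_vector(*edge) for edge in path]
```

Each edge vector has length 4 + 2 + 2 + 1 = 9 with these assumed sizes, and the same edge always maps to the same vector because the tables cache their entries.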
• $f_{p,(x,y)}$: frequency of $p$ in paths(x, y)
• $o_p$: LSTM output
$$v_{xy} = v_{paths(x,y)} = \frac{\sum_{p \in paths(x,y)} f_{p,(x,y)} \cdot o_p}{\sum_{p \in paths(x,y)} f_{p,(x,y)}}$$
• Integrated network
$$v_{xy} = [v_{w_x}, v_{paths(x,y)}, v_{w_y}]$$
• $v_{w_x}, v_{w_y}$: word embeddings of x, y (GloVe)
• Common classifier:
$$c = \mathrm{softmax}(W \cdot v_{xy})$$
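As a worked example of the formulas on this slide, the following sketch computes the frequency-weighted average of path vectors, concatenates it with the two word embeddings, and applies a softmax classifier. The function names, the toy vectors, and the weight matrix `W` are all assumptions made for illustration; in the actual model the path vectors come from the LSTM and `W` is learned.

```python
import math

def weighted_path_average(path_vectors, frequencies):
    """v_paths(x,y) = sum_p f_p * o_p / sum_p f_p."""
    total = sum(frequencies)
    dim = len(path_vectors[0])
    return [sum(f * o[i] for f, o in zip(frequencies, path_vectors)) / total
            for i in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify(v_wx, v_paths, v_wy, W):
    """c = softmax(W . v_xy) with v_xy = [v_wx, v_paths, v_wy]."""
    v_xy = v_wx + v_paths + v_wy  # list concatenation
    scores = [sum(w * v for w, v in zip(row, v_xy)) for row in W]
    return softmax(scores)

# Toy inputs (hypothetical): two paths with LSTM outputs o_p and frequencies f.
o_p = [[1.0, 0.0], [0.0, 1.0]]
f = [3, 1]
v_paths = weighted_path_average(o_p, f)

# Toy one-dimensional word embeddings and a 2-class weight matrix.
v_wx, v_wy = [0.5], [-0.5]
W = [[0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
c = classify(v_wx, v_paths, v_wy, W)
```

Dropping `v_wx` and `v_wy` from the concatenation gives the path-based-only variant, where $v_{xy} = v_{paths(x,y)}$.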
use of an already existing database
• WordNet, DBpedia, Wikidata, Yago
• Corpus to train the LSTM path encoder
• Wikipedia dump (May 2015)
• Dependency parser: spaCy
state-of-the-art (path-based)
• Improves on the state of the art (combined)
• Some lexical memorization was also observed
• Scores on the lexical split are worse than those on the random split
• This indicates a kind of overfitting (called lexical memorization)
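To make the difference between the two splits concrete, here is a minimal sketch of a lexical split, assuming it means that train and test share no words. The function name, the toy pairs, and the choice to drop pairs that mix train and test vocabulary are assumptions for illustration; the slides do not describe the exact procedure used in the paper.

```python
import random

def lexical_split(pairs, test_fraction=0.3, seed=0):
    """Split (x, y, label) pairs so that train and test share no words.

    Pairs with one word in each vocabulary are dropped (an assumption of
    this sketch), so the split is smaller than a random split.
    """
    vocab = sorted({w for x, y, _ in pairs for w in (x, y)})
    rng = random.Random(seed)
    rng.shuffle(vocab)
    test_vocab = set(vocab[:int(len(vocab) * test_fraction)])

    train, test = [], []
    for x, y, label in pairs:
        in_test = (x in test_vocab, y in test_vocab)
        if all(in_test):
            test.append((x, y, label))
        elif not any(in_test):
            train.append((x, y, label))
    return train, test

# Hypothetical toy dataset of (x, y, is_hypernym) pairs.
pairs = [("cat", "animal", True), ("dog", "animal", True),
         ("cat", "dog", False), ("apple", "fruit", True),
         ("pear", "fruit", True), ("apple", "pear", False)]
train, test = lexical_split(pairs)

train_words = {w for x, y, _ in train for w in (x, y)}
test_words = {w for x, y, _ in test for w in (x, y)}
```

With a random split, a word such as "animal" can appear in both train and test, so a model can score well by memorizing that "animal" is usually a hypernym. The lexical split removes that shortcut, which is consistent with the lower scores reported on it.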
Shan. 2012. Entailment above the word level in distributional semantics. In EACL, pages 23–32.
[2] Stephen Roller, Katrin Erk, and Gemma Boleda. 2014. Inclusive yet selective: Supervised distributional hypernymy detection. In COLING, pages 1025–1036.
[3] Julie Weeds, Daoud Clarke, Jeremy Reffin, David Weir, and Bill Keller. 2014. Learning to distinguish hypernyms and co-hyponyms. In COLING, pages 2249–2259.
[4] Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning syntactic patterns for automatic hypernym discovery. In NIPS.
[5] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. 2012. PATTY: A taxonomy of relational patterns with semantic types. In EMNLP and CoNLL, pages 1135–1145.