Improving Hypernymy Detection with an Integrated Path-based and Distributional Method

2016/10/28 @ Kasuga Area
himkt <[email protected]>
All figures are extracted from https://www.aclweb.org/anthology/P/P16/P16-1226.pdf
Vered Shwartz, Yoav Goldberg, Ido Dagan (Bar-Ilan University)

Summary
• Neural network to detect hyponymy relations
• RNN (LSTM) to encode dependency-path features
• Word embeddings (GloVe) used as word features
• Takes into account both distributional and dependency-path information
• Path-based LSTM only: results comparable to the state of the art
• Integrated method: significantly improves on the state of the art

Previous Hyponymy Relation Extraction
• Distributional methods (Baroni+[1], Roller+[2], Weeds+[3])
  • given words x and y, decide whether x is a hyponym of y (isa(x, y))
  • distributional vectors are commonly used
• Path-based methods (Snow+[4], Nakashole+[5])
  • hyponymy relations are expressed by dependency paths

Dependency Path and Edge representation
• They represent each dependency path as a sequence of edges that leads from x to y in the dependency tree

Edge and Path Representation
• Edge representation: $\vec{v}_e = [\vec{v}_l, \vec{v}_{pos}, \vec{v}_{dep}, \vec{v}_{dir}]$
  • l: lemma, pos: part of speech, dep: dependency label, dir: dependency direction
  • lemma: GloVe
  • others, including out-of-vocabulary lemmas: randomly initialized
• Path representation
  • $\vec{o}_p$ is composed of $\vec{v}_{e_1}, \vec{v}_{e_2}, \ldots, \vec{v}_{e_k}$ (where $e_1, e_2, \ldots, e_k$ are edges)
  • $\vec{o}_p$: a vector from the LSTM encoder
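The edge representation above can be illustrated with a minimal sketch, assuming toy dimensions and a tiny stand-in dictionary in place of real GloVe vectors. The names `make_table`, `pretrained_lemma`, and `edge_vector` are hypothetical, and the vector sizes are not taken from the slides. It only shows the concatenation of the four component vectors, with random initialization for everything not covered by the pretrained table.

```python
import random

# Hypothetical toy dimensions; the slides do not state the sizes used.
LEMMA_DIM, POS_DIM, DEP_DIM, DIR_DIM = 4, 2, 2, 1


def make_table(dim, seed):
    """Return a lookup that assigns a fixed random vector to each new key."""
    rng = random.Random(seed)
    table = {}

    def lookup(key):
        if key not in table:
            table[key] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        return table[key]

    return lookup


# Stand-in for pretrained GloVe lemma vectors (assumption: toy values).
pretrained_lemma = {"be": [0.1, 0.2, 0.3, 0.4]}

# Randomly initialized tables for OOV lemmas, POS tags, labels, directions.
oov_lemma_vec = make_table(LEMMA_DIM, seed=0)
pos_vec = make_table(POS_DIM, seed=1)
dep_vec = make_table(DEP_DIM, seed=2)
dir_vec = make_table(DIR_DIM, seed=3)


def edge_vector(lemma, pos, dep, direction):
    """Concatenate [v_l, v_pos, v_dep, v_dir] into one edge vector v_e."""
    if lemma in pretrained_lemma:
        v_l = pretrained_lemma[lemma]
    else:
        v_l = oov_lemma_vec(lemma)
    return v_l + pos_vec(pos) + dep_vec(dep) + dir_vec(direction)


v_e = edge_vector("be", "VERB", "ROOT", "-")
print(len(v_e))  # LEMMA_DIM + POS_DIM + DEP_DIM + DIR_DIM
```

A path would then be the sequence of such edge vectors fed to the LSTM encoder, whose final output plays the role of $\vec{o}_p$.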

LSTM-based Hypernymy Detection
• Path-based method: $\vec{v}_{xy} = \vec{v}_{paths(x,y)} = \frac{\sum_{p \in paths(x,y)} f_{p,(x,y)} \cdot \vec{o}_p}{\sum_{p \in paths(x,y)} f_{p,(x,y)}}$
  • $f_{p,(x,y)}$: the frequency of p in paths(x,y)
  • $\vec{o}_p$: LSTM output
• Integrated network: $\vec{v}_{xy} = [\vec{v}_{w_x}, \vec{v}_{paths(x,y)}, \vec{v}_{w_y}]$
  • $\vec{v}_{w_x}, \vec{v}_{w_y}$: word embeddings of x, y (GloVe)
• Common classifier: $c = \mathrm{softmax}(W \cdot \vec{v}_{xy})$
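The formulas on this slide can be traced with a small worked sketch in plain Python. The function names, the toy path vectors, and the weight matrix `W` are all assumptions for illustration; in the actual model the path vectors come from the LSTM and `W` is learned.

```python
import math


def average_paths(path_vectors, frequencies):
    """Frequency-weighted average of path vectors o_p (v_paths(x,y))."""
    total = sum(frequencies)
    dim = len(path_vectors[0])
    return [
        sum(f * o[i] for o, f in zip(path_vectors, frequencies)) / total
        for i in range(dim)
    ]


def integrated_vector(v_wx, v_paths, v_wy):
    """Concatenate [v_wx, v_paths(x,y), v_wy] for the integrated network."""
    return v_wx + v_paths + v_wy


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def classify(W, v_xy):
    """c = softmax(W . v_xy), with W given as a list of rows."""
    scores = [sum(w * v for w, v in zip(row, v_xy)) for row in W]
    return softmax(scores)


# Two toy paths: the first occurs 3 times, the second once.
v_paths = average_paths([[1.0, 0.0], [0.0, 1.0]], [3, 1])
print(v_paths)  # [0.75, 0.25]

# Toy word embeddings and a toy 2-class weight matrix (assumptions).
v_xy = integrated_vector([0.5], v_paths, [-0.5])
W = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(classify(W, v_xy))
```

For the path-based variant, `v_paths` alone would be passed to the classifier instead of the concatenated vector.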

Proposed method - Architecture (Integrated model)
• Integrate distributional approach and path-based approach
• Path LSTM + Term-Pair Classifier

Training dataset
• Hyponymy relations
  • Distant supervision
  • make use of already existing databases
  • WordNet, DBPedia, Wikidata, Yago
• Corpus to train LSTM path encoder
  • Wikipedia dump (May 2015)
  • dependency parser: spaCy

Experiment and Discussion 1 - Lexical Memorization
• Comparable to state-of-the-art (path-based)
• Improves on state-of-the-art (combined)
• Some lexical memorization was observed
  • scores on the lexical split are worse than those on the random split
  • this suggests a kind of overfitting (called lexical memorization)
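The difference between the two splits can be made concrete with a rough sketch. The slide does not describe how the lexical split was built, so the procedure below is an assumption: the vocabulary is partitioned first, and a pair is kept only if both of its words fall in the same partition, so that train and test share no words. The function name `lexical_split` and the toy pairs are hypothetical.

```python
import random


def lexical_split(pairs, train_ratio=0.7, seed=0):
    """Split (x, y, label) pairs so that train and test share no words.

    Assumption: pairs whose two words land in different partitions
    are dropped, which shrinks the dataset.
    """
    vocab = sorted({w for x, y, _ in pairs for w in (x, y)})
    rng = random.Random(seed)
    rng.shuffle(vocab)
    cut = int(len(vocab) * train_ratio)
    train_vocab = set(vocab[:cut])
    test_vocab = set(vocab[cut:])
    train = [p for p in pairs if p[0] in train_vocab and p[1] in train_vocab]
    test = [p for p in pairs if p[0] in test_vocab and p[1] in test_vocab]
    return train, test


# Toy labeled pairs (x, y, is_hyponym); values are illustrative only.
pairs = [
    ("cat", "animal", True),
    ("dog", "animal", True),
    ("car", "vehicle", True),
    ("bus", "vehicle", True),
    ("cat", "dog", False),
    ("bus", "car", False),
]
train, test = lexical_split(pairs)
print(len(train), len(test))
```

With a random split, a word such as "animal" can appear as y in both train and test, so a classifier can score well by memorizing that "animal" is usually a hypernym. A split of this kind removes that shortcut, which is consistent with the lower scores reported on the lexical split.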

Discussion 2 - Noun Phrase concepts
• They do not detect noun phrases [maybe]
  • it seems x and y are words, not phrases

References
[1] Marco Baroni, Raffaella Bernardi, Ngoc-Quynh Do, and Chung-chieh Shan. 2012. Entailment above the word level in distributional semantics. In EACL, pages 23–32.
[2] Stephen Roller, Katrin Erk, and Gemma Boleda. 2014. Inclusive yet selective: Supervised distributional hypernymy detection. In COLING, pages 1025–1036.
[3] Julie Weeds, Daoud Clarke, Jeremy Reffin, David Weir, and Bill Keller. 2014. Learning to distinguish hypernyms and co-hyponyms. In COLING, pages 2249–2259.
[4] Rion Snow, Daniel Jurafsky, and Andrew Y Ng. 2004. Learning syntactic patterns for automatic hypernym discovery. In NIPS.
[5] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. 2012. Patty: a taxonomy of relational patterns with semantic types. In EMNLP and CoNLL, pages 1135–1145.
