
October 28, 2016

Improving Hypernymy Detection with an Integrated Path-based and Distributional Method

Journal Club at Kasuga Area (University of Tsukuba)




  1. Improving Hypernymy Detection with
 an Integrated Path-based and Distributional Method

    2016/10/28 @ Kasuga Area, himkt <himkt@klis.tsukuba.ac.jp>
    Paper by Vered Shwartz, Yoav Goldberg, Ido Dagan (Bar-Ilan University)
    All figures are extracted from https://www.aclweb.org/anthology/P/P16/P16-1226.pdf
  2. Summary
    • A neural network that detects hypernymy (is-a) relations
    • An RNN (LSTM) encodes dependency-path features
    • Word embeddings (GloVe) are used as word features
    • Takes into account both distributional and dependency-path information
    • Path-based LSTM alone: results comparable to the state of the art
    • Integrated method: significantly improves upon the state of the art!
  3. Previous Approaches to Hypernymy Detection
    • Distributional methods (Baroni+ [1], Roller+ [2], Weeds+ [3])
      • given words x and y, decide whether x is a hyponym of y (isa(x, y))
      • distributional vectors of x and y are commonly used as features
    • Path-based methods (Snow+ [4], Nakashole+ [5])
      • hypernymy relations are expressed by the dependency paths connecting x and y in a corpus
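As a rough illustration of how a distributional method turns two word vectors into one feature vector for a supervised classifier, the sketch below builds two common pair representations: concatenation and difference. The toy vectors and the names `concat_features` / `diff_features` are hypothetical, and which combination each cited work uses is not stated on this slide.

```python
from typing import Dict, List

Vector = List[float]


def concat_features(vx: Vector, vy: Vector) -> Vector:
    """Pair representation [v_x ; v_y]: the two vectors concatenated."""
    return vx + vy


def diff_features(vx: Vector, vy: Vector) -> Vector:
    """Pair representation v_y - v_x: element-wise difference."""
    return [b - a for a, b in zip(vx, vy)]


# Hypothetical 3-dimensional toy embeddings (real systems use corpus vectors).
embeddings: Dict[str, Vector] = {
    "parrot": [1.0, 0.0, 0.5],
    "bird": [0.5, 0.25, 1.0],
}

vx, vy = embeddings["parrot"], embeddings["bird"]
print(concat_features(vx, vy))  # [1.0, 0.0, 0.5, 0.5, 0.25, 1.0]
print(diff_features(vx, vy))    # [-0.5, 0.25, 0.5]
```

Either feature vector would then be passed to an ordinary classifier (e.g. logistic regression or an SVM) trained on labelled pairs.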
  4. Dependency Path and Edge Representation
    • Each dependency path is represented as the sequence of edges that leads from x to y in the dependency tree
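A minimal sketch of extracting such a path from a parsed sentence, assuming each token stores its lemma, POS tag, dependency label and head index, with the root pointing to itself (as spaCy does). The edge string format `lemma/POS/dep/dir`, the direction symbols `>`, `-`, `<`, and the `X`/`Y` placeholders for the target terms are illustrative assumptions; the hand-written parse of "parrot is a bird" stands in for real parser output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    lemma: str
    pos: str
    dep: str
    head: int  # index of the head token; the root points to itself


def ancestors(tokens: List[Token], i: int) -> List[int]:
    """Token indices from i up to the root, inclusive."""
    chain = [i]
    while tokens[i].head != i:
        i = tokens[i].head
        chain.append(i)
    return chain


def dependency_path(tokens: List[Token], x: int, y: int) -> List[str]:
    """Edges from x up to the lowest common ancestor (LCA), then down to y."""
    up, down = ancestors(tokens, x), ancestors(tokens, y)
    lca = next(i for i in up if i in down)
    up_part = up[: up.index(lca)]              # x side, below the LCA
    down_part = down[: down.index(lca)][::-1]  # y side, top to bottom

    def edge(i: int, direction: str) -> str:
        t = tokens[i]
        lemma = "X" if i == x else "Y" if i == y else t.lemma
        return f"{lemma}/{t.pos}/{t.dep}/{direction}"

    return ([edge(i, ">") for i in up_part]
            + [edge(lca, "-")]
            + [edge(i, "<") for i in down_part])


# Hand-written parse of "parrot is a bird" (hypothetical parser output).
sentence = [
    Token("parrot", "NOUN", "nsubj", 1),
    Token("be", "VERB", "ROOT", 1),
    Token("a", "DET", "det", 3),
    Token("bird", "NOUN", "attr", 1),
]
print(dependency_path(sentence, 0, 3))
# ['X/NOUN/nsubj/>', 'be/VERB/ROOT/-', 'Y/NOUN/attr/<']
```

Walking up from each term to the lowest common ancestor is what makes the result a path "from x to y"; the determiner "a" is not on that path and is therefore skipped.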
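  5. Edge and Path Representation
    • Edge representation: $\vec{v}_e = [\vec{v}_l, \vec{v}_{pos}, \vec{v}_{dep}, \vec{v}_{dir}]$
      • l: lemma, pos: part of speech, dep: dependency label, dir: dependency direction
      • lemma vectors are initialized with GloVe
      • all other vectors, including those of out-of-vocabulary lemmas, are randomly initialized
    • Path representation
      • a path p is composed of edges $e_1, e_2, \ldots, e_k$, whose vectors $\vec{v}_{e_1}, \vec{v}_{e_2}, \ldots, \vec{v}_{e_k}$ are fed in order to an LSTM encoder
      • $\vec{o}_p$: the vector output by the LSTM encoder for path p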
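The sketch below builds $\vec{v}_e$ by concatenating four lookup vectors and runs the edge sequence through a minimal, untrained single-layer LSTM in plain Python. The embedding sizes, random initialization of every table (GloVe loading is omitted), and taking the final hidden state as $\vec{o}_p$ are assumptions for illustration; `edge_vector` and `LSTM` are hypothetical names, not the authors' code.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_vec(n: int) -> Vector:
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


# Hypothetical embedding sizes. In the paper lemma vectors start from GloVe;
# here every table is randomly initialized for simplicity.
DIM = {"lemma": 4, "pos": 2, "dep": 2, "dir": 1}
tables: Dict[str, Dict[str, Vector]] = {kind: {} for kind in DIM}


def lookup(kind: str, key: str) -> Vector:
    """Return the vector for key, creating a random one on first use."""
    if key not in tables[kind]:
        tables[kind][key] = rand_vec(DIM[kind])
    return tables[kind][key]


def edge_vector(edge: str) -> Vector:
    """v_e = [v_l, v_pos, v_dep, v_dir] for an edge string 'lemma/POS/dep/dir'."""
    lemma, pos, dep, direction = edge.split("/")
    return (lookup("lemma", lemma) + lookup("pos", pos)
            + lookup("dep", dep) + lookup("dir", direction))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTM:
    """Minimal single-layer LSTM, forward pass only (no training)."""

    def __init__(self, n_in: int, n_hidden: int) -> None:
        self.n_hidden = n_hidden
        # One weight matrix over [x ; h] and one bias per gate:
        # i = input, f = forget, o = output, c = candidate cell.
        self.W = {g: [rand_vec(n_in + n_hidden) for _ in range(n_hidden)]
                  for g in "ifoc"}
        self.b = {g: [0.0] * n_hidden for g in "ifoc"}

    def encode(self, inputs: List[Vector]) -> Vector:
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in inputs:
            xh = x + h
            pre = {g: [sum(w * v for w, v in zip(row, xh)) + bias
                       for row, bias in zip(self.W[g], self.b[g])]
                   for g in "ifoc"}
            i_gate = [sigmoid(z) for z in pre["i"]]
            f_gate = [sigmoid(z) for z in pre["f"]]
            o_gate = [sigmoid(z) for z in pre["o"]]
            cand = [math.tanh(z) for z in pre["c"]]
            c = [f * cj + i * g for f, cj, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
        return h  # final hidden state, used here as o_p


path = ["X/NOUN/nsubj/>", "be/VERB/ROOT/-", "Y/NOUN/attr/<"]
encoder = LSTM(n_in=sum(DIM.values()), n_hidden=3)
o_p = encoder.encode([edge_vector(e) for e in path])
print(len(o_p))  # 3
```

Because the same lookup tables are shared by every path, an edge such as `be/VERB/ROOT/-` always maps to the same $\vec{v}_e$, and the LSTM sees paths of any length as ordered sequences of such vectors.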
  6. LSTM-based Hypernymy Detection
    • Path-based method: $\vec{v}_{xy} = \vec{v}_{paths(x,y)} = \frac{\sum_{p \in paths(x,y)} f_{p,(x,y)} \cdot \vec{o}_p}{\sum_{p \in paths(x,y)} f_{p,(x,y)}}$
      • $f_{p,(x,y)}$: the frequency of p in paths(x,y)
      • $\vec{o}_p$: LSTM output for path p
    • Integrated network: $\vec{v}_{xy} = [\vec{v}_{w_x}, \vec{v}_{paths(x,y)}, \vec{v}_{w_y}]$
      • $\vec{v}_{w_x}, \vec{v}_{w_y}$: word embeddings of x and y (GloVe)
    • Common classifier: $c = \mathrm{softmax}(W \cdot \vec{v}_{xy})$
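The three formulas can be traced in plain Python: a frequency-weighted average of path vectors, concatenation with the two word vectors, and a softmax over $W \cdot \vec{v}_{xy}$. The toy vectors, the two-class output, and the helper names are hypothetical, and this sketch assumes the pair has at least one path.

```python
import math
from typing import Dict, List

Vector = List[float]


def v_paths(freq: Dict[str, int], o: Dict[str, Vector]) -> Vector:
    """Frequency-weighted average of the path vectors o_p (needs >= 1 path)."""
    total = sum(freq.values())
    dim = len(o[next(iter(freq))])
    acc = [0.0] * dim
    for p, f in freq.items():
        acc = [a + f * v for a, v in zip(acc, o[p])]
    return [a / total for a in acc]


def integrated(v_wx: Vector, v_p: Vector, v_wy: Vector) -> Vector:
    """v_xy = [v_wx, v_paths(x,y), v_wy] (concatenation)."""
    return v_wx + v_p + v_wy


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def classify(W: List[Vector], v_xy: Vector) -> Vector:
    """c = softmax(W . v_xy); one row of W per class."""
    return softmax([sum(w * v for w, v in zip(row, v_xy)) for row in W])


# Hypothetical toy values: two distinct paths seen 3 and 1 times.
o = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
freq = {"p1": 3, "p2": 1}
vp = v_paths(freq, o)
print(vp)  # [0.75, 0.25]

v_xy = integrated([0.5], vp, [1.0])  # toy 1-d word embeddings
# Assumed class order: [not hypernym, hypernym]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
c = classify(W, v_xy)
print(c)  # two probabilities that sum to 1
```

The path-only model corresponds to calling `classify` directly on `vp` with a `W` whose rows match its dimension; the integrated model differs only in the concatenated input.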
  7. Proposed Method - Architecture (Integrated Model)
    • Integrates the distributional approach and the path-based approach
    • Path LSTM + term-pair classifier
  8. Training Dataset
    • Hypernymy relations: distant supervision
      • make use of already existing databases: WordNet, DBpedia, Wikidata, YAGO
    • Corpus for training the LSTM path encoder
      • Wikipedia dump (May 2015)
      • dependency parser: spaCy
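A toy sketch of distant supervision: term pairs are labelled by looking them up in a knowledge resource rather than annotated by hand. `KB_ISA` and `label_pairs` are hypothetical, and treating every pair without a direct is-a link as negative is a simplifying assumption, not necessarily how the paper selects negative examples.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical knowledge-base fragment of (hyponym, hypernym) links,
# standing in for resources such as WordNet or Wikidata.
KB_ISA: Set[Pair] = {("parrot", "bird"), ("bird", "animal"), ("paris", "city")}


def label_pairs(pairs: Iterable[Pair], kb: Set[Pair]) -> List[Tuple[Pair, bool]]:
    """Label (x, y) positive iff the KB lists x is-a y; otherwise negative.

    Only direct links are checked (no transitive closure), and direction
    matters: (bird, parrot) is negative even though (parrot, bird) is positive.
    """
    return [((x, y), (x, y) in kb) for x, y in pairs]


candidates = [("parrot", "bird"), ("bird", "parrot"), ("parrot", "animal")]
for pair, is_hypernym in label_pairs(candidates, KB_ISA):
    print(pair, is_hypernym)
# ('parrot', 'bird') True
# ('bird', 'parrot') False
# ('parrot', 'animal') False
```

The last line shows a weakness of direct lookup: parrot is-a animal holds via bird, but without following links transitively it is labelled negative.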
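  9. Experiment and Discussion 1 - Lexical Memorization
    • Path-based model: comparable to the state of the art
    • Integrated (combined) model: improves upon the state of the art
    • Some lexical memorization was observed
      • scores on the lexical split (train and test share no vocabulary) are worse than on the random split
      • this indicates a kind of overfitting, called lexical memorization: the model learns properties of individual terms (e.g. that a word is a typical hypernym) rather than the relation between x and y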
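One simple way to build a lexical split: partition the vocabulary so that train and test share no terms, then keep only pairs whose two terms fall on the same side. `lexical_split`, the test ratio and the fixed seed are illustrative assumptions, not the paper's exact procedure.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def lexical_split(pairs: List[Pair], test_ratio: float = 0.3,
                  seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs so that train and test share no term.

    The vocabulary is shuffled and partitioned; a pair goes to the side that
    holds both of its terms, and pairs straddling the two sides are dropped.
    """
    vocab = sorted({term for pair in pairs for term in pair})
    random.Random(seed).shuffle(vocab)
    test_vocab: Set[str] = set(vocab[: int(len(vocab) * test_ratio)])
    train: List[Pair] = []
    test: List[Pair] = []
    for x, y in pairs:
        if x in test_vocab and y in test_vocab:
            test.append((x, y))
        elif x not in test_vocab and y not in test_vocab:
            train.append((x, y))
    return train, test


pairs = [("parrot", "bird"), ("bird", "animal"), ("cat", "animal"),
         ("paris", "city"), ("tokyo", "city"), ("rose", "flower")]
train, test = lexical_split(pairs)
train_terms = {t for p in train for t in p}
test_terms = {t for p in test for t in p}
print(train_terms & test_terms)  # set(): no term appears on both sides
```

Under a random split, a frequent hypernym such as "animal" can appear in both train and test, so a model that has merely memorized "animal is a hypernym" is rewarded; the lexical split removes that shortcut, which is why scores drop.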
  10. Discussion 2 - Noun Phrase Concepts
    • Noun phrases do not seem to be detected [maybe]
      • x and y appear to be single words, not phrases
  11. References
    [1] Marco Baroni, Raffaella Bernardi, Ngoc-Quynh Do, and Chung-chieh Shan. 2012. Entailment above the word level in distributional semantics. In EACL, pages 23–32.
    [2] Stephen Roller, Katrin Erk, and Gemma Boleda. 2014. Inclusive yet selective: Supervised distributional hypernymy detection. In COLING, pages 1025–1036.
    [3] Julie Weeds, Daoud Clarke, Jeremy Reffin, David Weir, and Bill Keller. 2014. Learning to distinguish hypernyms and co-hyponyms. In COLING, pages 2249–2259.
    [4] Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning syntactic patterns for automatic hypernym discovery. In NIPS.
    [5] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. 2012. PATTY: A taxonomy of relational patterns with semantic types. In EMNLP and CoNLL, pages 1135–1145.