
An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge


Scatter Lab Inc.

July 24, 2019


Transcript

  1. ScatterLab ML Technical Seminar Session #2 (KB-QA), Suin Seo, Machine Learning Engineer

    An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge
  2. Variations of KB-QA (KB-QA System)
    • SP-based (Semantic Parsing): natural language -> structured expression
    • IR-based (Information Retrieval): search the KB for answers
    • NN-based (Neural Networks): compute the similarity between a question and candidate answers
    • Goal: an end-to-end NN-based approach + a constructed KB
  3. Motivation: Problems (KB-QA System)
    • Projection of the answer set
      • For a (q, a) pair with question q and answer a:
      • train: (q ~ a)
      • KB: (a ~ a'), which previous methods do not guarantee
      • goal: (q ~ a')
    • OOV at test time
      • Words not contained in the training dataset
      • Incorporating the KB (KB + train embeddings) alleviates the OOV problem
  4. Model Architecture (Model Overview)
    • Given a question q, return an answer set A = {a1, a2, a3, ..., an}
    • Freebase fact: subject - predicate - object
    • Answer aspects: entity, relation, type, context
  5. Extract Answer Set A (Candidate Generation)
    • Freebase resolves 86% of questions when using the top-1 topic entity
    • Collect all entities connected to the topic entity within 2 hops
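The 2-hop collection above can be sketched as a small breadth-limited traversal. The `kb` adjacency dict, the `candidate_answers` name, and the (predicate, entity) tuple layout are illustrative assumptions, not the paper's code.

```python
def candidate_answers(kb, topic_entity, max_hops=2):
    """Collect all entities reachable from the topic entity within `max_hops` hops.

    `kb` is a hypothetical adjacency dict: entity -> list of (predicate, entity).
    """
    seen = {topic_entity}
    frontier = [topic_entity]
    candidates = set()
    for _ in range(max_hops):
        next_frontier = []
        for entity in frontier:
            for _predicate, obj in kb.get(entity, []):
                if obj not in seen:
                    seen.add(obj)
                    candidates.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return candidates
```

Every entity in the returned set becomes a candidate answer to be scored against the question.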
  6. Answer Aspect Representation (Cross-Attention Model)
    • Each answer aspect is mapped to a vector through the KB embedding matrix E_k:
      • answer entity a_e -> entity embedding e_e
      • answer relation a_r -> relation embedding e_r
      • answer type a_t -> type embedding e_t
      • answer context (1-hop entities) a_c = (c_1, c_2, ..., c_m) -> (e_{c_1}, e_{c_2}, ..., e_{c_m})
    • Context embedding: e_c = (1/m) Σ_{i=1}^{m} e_{c_i}
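The context-averaging formula e_c = (1/m) Σ e_{c_i} is a one-liner in NumPy; the function name and the list-of-vectors input format are assumptions for illustration.

```python
import numpy as np

def context_embedding(context_entity_embs):
    """e_c = (1/m) * sum_i e_{c_i}: average the embeddings of the 1-hop neighbours."""
    return np.mean(np.stack(context_entity_embs), axis=0)
```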
  7. Process of Cross-Attention (Cross-Attention Model)
    • Re-reading mechanism
      1. First look at the type of the answer
      2. Re-read the question
      3. Find which part is focused on (attention)
      4. Move on to the next aspect
      5. Re-read the question
      6. Repeat until all aspects are utilized
    • Scoring
      • Compute a question-answer score for every candidate answer entity
      • Final score = weighted sum of the per-aspect scores
  8. Answer-towards-Question (A-Q) Attention (Cross-Attention Model)
    • Assumption: each answer aspect focuses on different words of the same question
    • h_j: representation of the j-th question word; e_i ∈ {e_e, e_r, e_t, e_c}: answer aspect embedding
    • Attention score: ω_ij = f(W^T [h_j ; e_i] + b)
    • Attention weight: α_ij = exp(ω_ij) / Σ_{k=1}^{n} exp(ω_ik)
    • Aspect-specific question vector: q_i = Σ_{j=1}^{n} α_ij h_j
    • Aspect score: S(q, e_i) = h(q_i, e_i)
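A minimal NumPy sketch of the A-Q attention step, assuming f = tanh, a vector-valued W of shape (2d,), and a scalar bias b (the paper does not pin these down in the slides):

```python
import numpy as np

def aq_attention(H, e_i, W, b):
    """Answer-towards-Question attention for one answer aspect.

    H   : (n, d) question word representations h_1..h_n (e.g. from a Bi-LSTM)
    e_i : (d,)   one aspect embedding, e_i in {e_e, e_r, e_t, e_c}
    W   : (2d,)  scoring weights; b: scalar bias (shapes are assumptions)
    Returns the aspect-specific question vector q_i = sum_j alpha_ij * h_j.
    """
    n = H.shape[0]
    # omega_ij = f(W^T [h_j ; e_i] + b), with f assumed to be tanh
    omega = np.tanh(np.concatenate([H, np.tile(e_i, (n, 1))], axis=1) @ W + b)
    alpha = np.exp(omega) / np.exp(omega).sum()  # softmax over question words
    return alpha @ H                             # weighted sum of the h_j
```

With untrained (zero) weights the attention is uniform, so q_i reduces to the mean of the word vectors; training the weights makes each aspect attend to different words.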
  9. Question-towards-Answer (Q-A) Attention (Cross-Attention Model)
    • Intuition: different questions weight the four answer aspects differently
    • Average pooling: q̄ = (1/n) Σ_{j=1}^{n} h_j
    • Attention score: ω_{e_i} = f(W^T [q̄ ; e_i] + b)
    • Attention weight: β_{e_i} = exp(ω_{e_i}) / Σ_{e_k} exp(ω_{e_k})
    • Final similarity: S(q, a) = Σ_{e_i ∈ {e_e, e_r, e_t, e_c}} β_{e_i} S(q, e_i)
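The Q-A stage can be sketched the same way: pool the question, score each aspect against the pooled vector, and use the softmax weights β to combine the per-aspect scores from the A-Q stage. Again f = tanh, the W/b shapes, and the dict-based interface are assumptions.

```python
import numpy as np

def qa_attention_score(H, aspect_embs, aspect_scores, W, b):
    """Question-towards-Answer attention: weight the four per-aspect scores.

    H             : (n, d) question word representations
    aspect_embs   : dict aspect -> (d,) embedding (e_e, e_r, e_t, e_c)
    aspect_scores : dict aspect -> S(q, e_i), produced by the A-Q stage
    """
    q_bar = H.mean(axis=0)                        # average pooling over words
    omega = {k: np.tanh(np.concatenate([q_bar, e]) @ W + b)
             for k, e in aspect_embs.items()}     # f assumed to be tanh
    z = sum(np.exp(w) for w in omega.values())
    beta = {k: np.exp(w) / z for k, w in omega.items()}   # softmax over aspects
    return sum(beta[k] * aspect_scores[k] for k in aspect_embs)  # S(q, a)
```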
  10. Negative Sampling for Training (Other Techniques)
    • Construct training data from q-a pairs: (q, C_q) = (q, [P_q, N_q]), with positives a ∈ P_q and randomly sampled negatives a' ∈ N_q
    • Hinge loss with margin γ (a positive real number): L_{q,a,a'} = [γ + S(q, a') − S(q, a)]_+, where [z]_+ = max(z, 0)
    • Objective function: min Σ_q (1/|P_q|) Σ_{a ∈ P_q} Σ_{a' ∈ N_q} L_{q,a,a'}
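The margin-based objective above translates directly into code; the `score` callable and the (question, positives, negatives) tuple layout are illustrative assumptions.

```python
def hinge_loss(s_pos, s_neg, gamma=0.5):
    """L_{q,a,a'} = [gamma + S(q,a') - S(q,a)]_+ for one positive/negative pair."""
    return max(0.0, gamma + s_neg - s_pos)

def objective(score, q_pairs, gamma=0.5):
    """sum_q (1/|P_q|) sum_{a in P_q} sum_{a' in N_q} L_{q,a,a'}.

    q_pairs: iterable of (q, positives P_q, negatives N_q);
    `score(q, a)` is an assumed scoring function (e.g. the cross-attention S).
    """
    total = 0.0
    for q, positives, negatives in q_pairs:
        total += sum(hinge_loss(score(q, a), score(q, a_neg), gamma)
                     for a in positives for a_neg in negatives) / len(positives)
    return total
```

The loss is zero whenever every positive outscores every sampled negative by at least γ, so training pushes correct answers above the margin.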
  11. Answer Set for Inference (Other Techniques)
    • Problem: a question may have multiple answers rather than a unique one
    • Inference stage:
      • S_max = max_{a ∈ C_q} S(q, a)
      • A = {â | S_max − S(q, â) < γ}
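The inference rule is a simple threshold around the best score: keep every candidate within the margin γ of the maximum. The dict-based `scores` input is an assumption for illustration.

```python
def answer_set(scores, gamma=0.5):
    """Return all candidates within gamma of the top score.

    scores: dict candidate -> S(q, a) over the candidate set C_q.
    This handles questions with multiple correct answers.
    """
    s_max = max(scores.values())
    return {a for a, s in scores.items() if s_max - s < gamma}
```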
  12. Combining Global Knowledge (Other Techniques)
    • Adaptation model: TransE (Bordes et al., 2013)
      • Treats a relation as a translation in the embedding space
    • Training loss
      • S: set of KB facts (s, p, o); S': set of randomly sampled corrupted facts (s', p, o')
      • Distance: d(s + p, o) = ||s + p − o||²₂
      • L_k = Σ_{(s,p,o) ∈ S} Σ_{(s',p,o') ∈ S'} [γ_k + d(s + p, o) − d(s' + p, o')]_+
    • Train KB-QA and TransE in turns
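The TransE distance and per-pair margin loss can be sketched in a few lines; the function names and the tuple-of-vectors interface are assumptions, and embeddings are plain NumPy arrays rather than trainable parameters.

```python
import numpy as np

def transe_distance(s, p, o):
    """d(s + p, o) = ||s + p - o||_2^2: relation p translates subject s towards object o."""
    return float(np.sum((s + p - o) ** 2))

def transe_loss(fact, corrupted, gamma_k=1.0):
    """[gamma_k + d(s+p, o) - d(s'+p, o')]_+ for one (fact, corrupted-fact) pair."""
    s, p, o = fact
    s_neg, _, o_neg = corrupted  # corrupted facts share the predicate p
    return max(0.0, gamma_k + transe_distance(s, p, o)
                    - transe_distance(s_neg, p, o_neg))
```

Minimizing this loss over all (fact, corrupted) pairs pulls true triples closer than corrupted ones by the margin γ_k, which is what lets the KB-QA model inherit global KB structure when the two are trained in turns.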
  13. Main Task (Experiment Settings)
    • WebQuestions (Berant et al., 2013)
      • https://nlp.stanford.edu/software/sempre/
      • https://github.com/brmson/dataset-factoid-webquestions
    • Training: 3,778 q-a pairs; Testing: 2,032 q-a pairs
    • Questions collected via the Google Suggest API
    • Answers manually labeled by Amazon Mechanical Turk workers
    • All answers come from Freebase
  14. Samples of WebQuestions (Experiment Settings)
    • who invented dell computer? : Michael S. Dell
    • who was darth vader in episode 3? : Hayden Christensen
    • where is the time zone in florida? : North American Eastern Time Zone
    • what does donald trump own? : Trump Tower
    • what year did michael jordan get drafted? : 1984 NBA Draft
    • who plays saruman in lord of the rings? : Christopher Lee
    • which team does ronaldinho play for 2013? : Brazil national football team
    • what undergraduate school did martin luther king jr. attend? : Morehouse College
    • where did will smith go to high school? : Overbrook High School
    • what is south korea's capital city? : Seoul
    • who is ruling north korea now? : Kim Jong-un
    • where do samsung lions play? : Daegu Baseball Stadium
    • Single-answer WH-questions (who, when, where, what, which)
  15. Knowledge Base (Experiment Settings)
    • Freebase (launched March 2007)
      • Knowledge base constructed from metadata by Metaweb Technologies (acquired by Google in 2010)
      • Updates discontinued on 2015-12-16; only dump data can be downloaded (https://developers.google.com/freebase/)
    • Format
      • Fact: <subject> <predicate> <object>
      • <Donald_Trump> <Is_president_of> <United_States_of_America>
      • <Seoul_Olympic> <Was_held_on> <1988>
  16. Comparison with Other Approaches (Results and Analysis)
    • Metric: average F1 score (for top-1)
    • Compared methods
      • Bordes et al., 2014b: BOW to obtain a single vector for question and answer
      • Bordes et al., 2014a: subgraph embedding + BOW
      • Yang et al., 2014: SP-based, mapping entities to relations from the KB
      • Dong et al., 2015: three CNNs for three answer aspects
      • Bordes et al., 2015 (improved by Sukhbaatar et al., 2015): KB-QA in the Memory Networks framework
  17. Model Analysis (Results and Analysis)
    • Control variables
      • Attention: no attention / A-Q attention / C-attention (cross-attention)
      • Global Knowledge Information: no GKI / GKI applied
    • Improvement (F1) from each component
      • uni-LSTM -> Bi-LSTM: 0.9
      • no ATT -> A-Q-ATT: 1.5 ~ 2.2
      • A-Q-ATT -> C-ATT: 0.2 ~ 0.3
      • no GKI -> GKI: 1 ~ 1.3
  18. Attention Visualization (Results and Analysis)
    • Target
      • Q: Where is the Carpathian mountain range located?
      • A: Slovakia
      • Entity: /m/06npd (Slovakia)
      • Type: /location/country
      • Relation: partially_containedby
      • Context: /m/04dq9kf, /m/01mp, … etc.
  19. Error Analysis (Results and Analysis)
    • Wrong attention (18%)
      • Q: What are the songs that Justin Bieber wrote?
      • The answer type /music/composition attends strongly to "What" rather than "songs", probably due to bias in the training data
    • Complex questions (35%)
      • Q: When was the last time the Knicks won the championship?
      • Predicted: all championships; the model cannot learn what "last" means
    • Labeling errors (3%)
      • Q: What college did John Nash teach at?
      • Labeled answer: Princeton University; real answer: Massachusetts Institute of Technology
  20. Summary (Conclusion)
    • Contributions
      • Attention from answer aspects to each question word, plus attention weights over the answer aspects
      • The resulting dynamic question representation is more precise and flexible
      • Leveraging global knowledge takes full advantage of the complete KB and alleviates the OOV problem
    • Results
      • State-of-the-art among end-to-end methods