
An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge


Scatter Lab Inc.

July 24, 2019


Transcript

  1. ScatterLab ML Technical Seminar Session #2 (KB-QA), Suin Seo, Machine Learning Engineer

    An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge
  2. Variations of KB-QA (KB-QA System)
    • SP-based (Semantic Parsing): natural language -> structured expression
    • IR-based (Information Retrieval): search the KB for answers
    • NN-based (Neural Networks): compute the similarity between a question and candidate answers
    • Goal: an end-to-end NN-based approach + a constructed KB
  3. Motivation: Problems (KB-QA System)
    • Projection of the answer set
      • For a (q, a) pair with question q and answer a:
      • train: (q ~ a)
      • KB: (a ~ a'), which previous methods do not guarantee
      • goal: (q ~ a')
    • OOV at test time
      • Words not contained in the training dataset
      • Incorporating the KB (KB + train embeddings) alleviates the OOV problem
  4. Model Architecture (Model Overview)
    • Given a question q, return an answer set A = {a1, a2, a3, ..., an}
    • Freebase fact: subject - predicate - object
    • Answer aspects: entity, relation, type, context
  5. Extract Answer Set A (Candidate Generation)
    • Freebase resolves 86% of questions when using the top-1 topic entity
    • Collect all entities connected to the topic entity within 2 hops
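The 2-hop collection above can be sketched as a small breadth-limited traversal. The `kb` adjacency dict, the `candidate_answers` name, and the (predicate, entity) tuple layout are illustrative assumptions, not the paper's code.

```python
def candidate_answers(kb, topic_entity, max_hops=2):
    """Collect all entities reachable from the topic entity within `max_hops` hops.

    `kb` is a hypothetical adjacency dict: entity -> list of (predicate, entity).
    """
    seen = {topic_entity}
    frontier = [topic_entity]
    candidates = set()
    for _ in range(max_hops):
        next_frontier = []
        for entity in frontier:
            for _predicate, obj in kb.get(entity, []):
                if obj not in seen:
                    seen.add(obj)
                    candidates.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return candidates
```

Every entity in the returned set becomes a candidate answer to be scored against the question.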
  6. Answer Aspect Representation (Cross-Attention Model)
    • Each answer aspect is mapped to a vector through the KB embedding matrix E_k:
      • answer entity a_e -> entity embedding e_e
      • answer relation a_r -> relation embedding e_r
      • answer type a_t -> type embedding e_t
      • answer context (1-hop entities) a_c = (c_1, c_2, ..., c_m) -> (e_{c_1}, e_{c_2}, ..., e_{c_m})
    • Context embedding: e_c = (1/m) Σ_{i=1}^{m} e_{c_i}
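The context-averaging formula e_c = (1/m) Σ e_{c_i} is a one-liner in NumPy; the function name and the list-of-vectors input format are assumptions for illustration.

```python
import numpy as np

def context_embedding(context_entity_embs):
    """e_c = (1/m) * sum_i e_{c_i}: average the embeddings of the 1-hop neighbours."""
    return np.mean(np.stack(context_entity_embs), axis=0)
```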
  7. Process of Cross-Attention (Cross-Attention Model)
    • Re-reading mechanism
      1. First look at the type of the answer
      2. Re-read the question
      3. Find which part is focused on (attention)
      4. Move on to the next aspect
      5. Re-read the question
      6. Repeat until all aspects are utilized
    • Scoring
      • Compute a question-answer score for every candidate answer entity
      • Final score = weighted sum of the per-aspect scores
  8. Answer-towards-Question (A-Q) Attention (Cross-Attention Model)
    • Assumption: each answer aspect focuses on different words of the same question
    • h_j: representation of the j-th question word; e_i ∈ {e_e, e_r, e_t, e_c}: answer aspect embedding
    • Attention score: ω_ij = f(W^T [h_j ; e_i] + b)
    • Attention weight: α_ij = exp(ω_ij) / Σ_{k=1}^{n} exp(ω_ik)
    • Aspect-specific question vector: q_i = Σ_{j=1}^{n} α_ij h_j
    • Aspect score: S(q, e_i) = h(q_i, e_i)
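A minimal NumPy sketch of the A-Q attention step, assuming f = tanh, a vector-valued W of shape (2d,), and a scalar bias b (the paper does not pin these down in the slides):

```python
import numpy as np

def aq_attention(H, e_i, W, b):
    """Answer-towards-Question attention for one answer aspect.

    H   : (n, d) question word representations h_1..h_n (e.g. from a Bi-LSTM)
    e_i : (d,)   one aspect embedding, e_i in {e_e, e_r, e_t, e_c}
    W   : (2d,)  scoring weights; b: scalar bias (shapes are assumptions)
    Returns the aspect-specific question vector q_i = sum_j alpha_ij * h_j.
    """
    n = H.shape[0]
    # omega_ij = f(W^T [h_j ; e_i] + b), with f assumed to be tanh
    omega = np.tanh(np.concatenate([H, np.tile(e_i, (n, 1))], axis=1) @ W + b)
    alpha = np.exp(omega) / np.exp(omega).sum()  # softmax over question words
    return alpha @ H                             # weighted sum of the h_j
```

With untrained (zero) weights the attention is uniform, so q_i reduces to the mean of the word vectors; training the weights makes each aspect attend to different words.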
  9. Question-towards-Answer (Q-A) Attention (Cross-Attention Model)
    • Intuition: different questions weight the four answer aspects differently
    • Average pooling: q̄ = (1/n) Σ_{j=1}^{n} h_j
    • Attention score: ω_{e_i} = f(W^T [q̄ ; e_i] + b)
    • Attention weight: β_{e_i} = exp(ω_{e_i}) / Σ_{e_k} exp(ω_{e_k})
    • Final similarity: S(q, a) = Σ_{e_i ∈ {e_e, e_r, e_t, e_c}} β_{e_i} S(q, e_i)
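The Q-A stage can be sketched the same way: pool the question, score each aspect against the pooled vector, and use the softmax weights β to combine the per-aspect scores from the A-Q stage. Again f = tanh, the W/b shapes, and the dict-based interface are assumptions.

```python
import numpy as np

def qa_attention_score(H, aspect_embs, aspect_scores, W, b):
    """Question-towards-Answer attention: weight the four per-aspect scores.

    H             : (n, d) question word representations
    aspect_embs   : dict aspect -> (d,) embedding (e_e, e_r, e_t, e_c)
    aspect_scores : dict aspect -> S(q, e_i), produced by the A-Q stage
    """
    q_bar = H.mean(axis=0)                        # average pooling over words
    omega = {k: np.tanh(np.concatenate([q_bar, e]) @ W + b)
             for k, e in aspect_embs.items()}     # f assumed to be tanh
    z = sum(np.exp(w) for w in omega.values())
    beta = {k: np.exp(w) / z for k, w in omega.items()}   # softmax over aspects
    return sum(beta[k] * aspect_scores[k] for k in aspect_embs)  # S(q, a)
```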
  10. Negative Sampling for Training (Other Techniques)
    • Construct training data from q-a pairs: (q, C_q) = (q, [P_q, N_q]), with positives a ∈ P_q and randomly sampled negatives a' ∈ N_q
    • Hinge loss with margin γ (a positive real number): L_{q,a,a'} = [γ + S(q, a') − S(q, a)]_+, where [z]_+ = max(z, 0)
    • Objective function: min Σ_q (1/|P_q|) Σ_{a ∈ P_q} Σ_{a' ∈ N_q} L_{q,a,a'}
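The margin-based objective above translates directly into code; the `score` callable and the (question, positives, negatives) tuple layout are illustrative assumptions.

```python
def hinge_loss(s_pos, s_neg, gamma=0.5):
    """L_{q,a,a'} = [gamma + S(q,a') - S(q,a)]_+ for one positive/negative pair."""
    return max(0.0, gamma + s_neg - s_pos)

def objective(score, q_pairs, gamma=0.5):
    """sum_q (1/|P_q|) sum_{a in P_q} sum_{a' in N_q} L_{q,a,a'}.

    q_pairs: iterable of (q, positives P_q, negatives N_q);
    `score(q, a)` is an assumed scoring function (e.g. the cross-attention S).
    """
    total = 0.0
    for q, positives, negatives in q_pairs:
        total += sum(hinge_loss(score(q, a), score(q, a_neg), gamma)
                     for a in positives for a_neg in negatives) / len(positives)
    return total
```

The loss is zero whenever every positive outscores every sampled negative by at least γ, so training pushes correct answers above the margin.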
  11. Answer Set for Inference (Other Techniques)
    • Problem: a question may have multiple answers rather than a unique one
    • Inference stage:
      • S_max = max_{a ∈ C_q} S(q, a)
      • A = {â | S_max − S(q, â) < γ}
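The inference rule is a simple threshold around the best score: keep every candidate within the margin γ of the maximum. The dict-based `scores` input is an assumption for illustration.

```python
def answer_set(scores, gamma=0.5):
    """Return all candidates within gamma of the top score.

    scores: dict candidate -> S(q, a) over the candidate set C_q.
    This handles questions with multiple correct answers.
    """
    s_max = max(scores.values())
    return {a for a, s in scores.items() if s_max - s < gamma}
```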
  12. Combining Global Knowledge (Other Techniques)
    • Adaptation model: TransE (Bordes et al., 2013)
      • Treats a relation as a translation in the embedding space
    • Training loss
      • S: set of KB facts (s, p, o); S': set of randomly sampled corrupted facts (s', p, o')
      • Distance: d(s + p, o) = ||s + p − o||²₂
      • L_k = Σ_{(s,p,o) ∈ S} Σ_{(s',p,o') ∈ S'} [γ_k + d(s + p, o) − d(s' + p, o')]_+
    • Train KB-QA and TransE in turns
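The TransE distance and per-pair margin loss can be sketched in a few lines; the function names and the tuple-of-vectors interface are assumptions, and embeddings are plain NumPy arrays rather than trainable parameters.

```python
import numpy as np

def transe_distance(s, p, o):
    """d(s + p, o) = ||s + p - o||_2^2: relation p translates subject s towards object o."""
    return float(np.sum((s + p - o) ** 2))

def transe_loss(fact, corrupted, gamma_k=1.0):
    """[gamma_k + d(s+p, o) - d(s'+p, o')]_+ for one (fact, corrupted-fact) pair."""
    s, p, o = fact
    s_neg, _, o_neg = corrupted  # corrupted facts share the predicate p
    return max(0.0, gamma_k + transe_distance(s, p, o)
                    - transe_distance(s_neg, p, o_neg))
```

Minimizing this loss over all (fact, corrupted) pairs pulls true triples closer than corrupted ones by the margin γ_k, which is what lets the KB-QA model inherit global KB structure when the two are trained in turns.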
  13. Main Task (Experiment Settings)
    • WebQuestions (Berant et al., 2013)
      • https://nlp.stanford.edu/software/sempre/
      • https://github.com/brmson/dataset-factoid-webquestions
    • Training: 3,778 q-a pairs; Testing: 2,032 q-a pairs
    • Questions collected via the Google Suggest API
    • Answers manually labeled by Amazon Mechanical Turk workers
    • All answers come from Freebase
  14. Samples of WebQuestions (Experiment Settings)
    • who invented dell computer? : Michael S. Dell
    • who was darth vader in episode 3? : Hayden Christensen
    • where is the time zone in florida? : North American Eastern Time Zone
    • what does donald trump own? : Trump Tower
    • what year did michael jordan get drafted? : 1984 NBA Draft
    • who plays saruman in lord of the rings? : Christopher Lee
    • which team does ronaldinho play for 2013? : Brazil national football team
    • what undergraduate school did martin luther king jr. attend? : Morehouse College
    • where did will smith go to high school? : Overbrook High School
    • what is south korea's capital city? : Seoul
    • who is ruling north korea now? : Kim Jong-un
    • where do samsung lions play? : Daegu Baseball Stadium
    • Single-answer WH-questions (who, when, where, what, which)
  15. Knowledge Base (Experiment Settings)
    • Freebase (launched March 2007)
      • Knowledge base constructed from metadata by Metaweb Technologies (acquired by Google in 2010)
      • Updates discontinued on 2015-12-16; only dump data can be downloaded (https://developers.google.com/freebase/)
    • Format
      • Fact: <subject> <predicate> <object>
      • <Donald_Trump> <Is_president_of> <United_States_of_America>
      • <Seoul_Olympic> <Was_held_on> <1988>
  16. Comparison with Other Approaches (Results and Analysis)
    • Metric: average F1 score (for top-1)
    • Compared methods
      • Bordes et al., 2014b: BOW to obtain a single vector for question and answer
      • Bordes et al., 2014a: subgraph embedding + BOW
      • Yang et al., 2014: SP-based, mapping entities to relations from the KB
      • Dong et al., 2015: three CNNs for three answer aspects
      • Bordes et al., 2015 (improved by Sukhbaatar et al., 2015): KB-QA in the Memory Networks framework
  17. Model Analysis (Results and Analysis)
    • Control variables
      • Attention: no attention / A-Q attention / C-attention (cross-attention)
      • Global Knowledge Information: no GKI / GKI applied
    • Improvement (F1) from each component
      • uni-LSTM -> Bi-LSTM: 0.9
      • no ATT -> A-Q-ATT: 1.5 ~ 2.2
      • A-Q-ATT -> C-ATT: 0.2 ~ 0.3
      • no GKI -> GKI: 1 ~ 1.3
  18. Attention Visualization (Results and Analysis)
    • Target
      • Q: Where is the Carpathian mountain range located?
      • A: Slovakia
      • Entity: /m/06npd (Slovakia)
      • Type: /location/country
      • Relation: partially_containedby
      • Context: /m/04dq9kf, /m/01mp, … etc.
  19. Error Analysis (Results and Analysis)
    • Wrong attention (18%)
      • Q: What are the songs that Justin Bieber wrote?
      • The answer type /music/composition attends strongly to "What" rather than "songs", probably due to bias in the training data
    • Complex questions (35%)
      • Q: When was the last time the Knicks won the championship?
      • Predicted: all championships; the model cannot learn what "last" means
    • Labeling errors (3%)
      • Q: What college did John Nash teach at?
      • Labeled answer: Princeton University; real answer: Massachusetts Institute of Technology
  20. Summary (Conclusion)
    • Contributions
      • Attention from answer aspects to each question word, plus attention weights over the answer aspects
      • The resulting dynamic question representation is more precise and flexible
      • Leveraging global knowledge takes full advantage of the complete KB and alleviates the OOV problem
    • Results
      • State-of-the-art among end-to-end methods