
Semantic Parsing Methods

Haruki Kirigaya
September 30, 2016

An overview of existing approaches to semantic parsing.


Transcript

4. Background
When it comes to understanding natural language sentences, NLP researchers work at various granularities; the tasks differ in how much of the sentence's information they use.
- Information Extraction (less informative): is_a(Obama, PRESIDENT)
- Summarization (modestly informative): Obama wins.
- Semantic Parsing (exact matching): ∃e.beat(e) ∧ Sub(e, Obama) ∧ Obj(e, Romney)
Caveat: "semantic" here is about composition, not about telling word senses apart.

6. Semantic Parsing Task
The key task of semantic parsing is to find an f such that
    f : Sentence → LogicForm
Generally, there are 3 aspects a semantic parser needs to take into consideration:
- Modelling: how to represent a logic form
- Parsing: design a grammar and a parsing algorithm
- Learning: use supervision to fit the parameters

7. Agenda
1 Semantics: Davidsonian Representation, MRS, AMR
2 Parsing
3 Summary

10. Logic Form from Example
Brutus stabs Caesar.
    stab(Brutus, Caesar) (predicate)
Brutus stabs Caesar with a knife.
    stab(Brutus, Caesar, knife) (n-ary predicate)
Brutus stabs Caesar in the agora.
    stab(Brutus, Caesar, agora) (ambiguous predicate)
Brutus stabs Caesar in the agora with a knife.
    stab(Brutus, Caesar) & with(knife) & in(agora) (move the adjuncts apart)

13. Logic Form from Example
Brutus stabs Caesar in the agora with a knife.
    stab(Brutus, Caesar) & with(knife) & in(agora)
Brutus stabs Caesar with a knife in the agora and twists it hard.
    stab(Brutus, Caesar) & with(knife) & in(agora) & twist(Brutus, knife) & hard
The standard predicate calculus has problems:
- it is unable to refer to predicates (the bare hard above has nothing to modify)
- natural language is flexible in the number of arguments: Pass the axe. / Pass me the axe.

14. Davidsonian Representation
Semantics is characterized in terms of events. We don't know an event beforehand, so we existentially quantify over it.
Brutus stabs Caesar with a knife in the agora and twists it hard.
    ∃e.stab(e, Brutus, Caesar) ∧ with(e, knife) ∧ in(e, agora) ∧ (∃e′.twist(e′, Brutus, knife) ∧ hard(e′))
Caesar is stabbed.
    ∃x∃e.stab(e, x, Caesar)
Missing arguments are left as existentially quantified placeholders.

15. Problem in the Davidsonian Way
Consider the following sentence:
    In a dream last night, I was stabbed, although in fact nobody had stabbed me and I wasn't stabbed with anything.
There is NOBODY here to initiate the stab event. Shouldn't the representation correspond to the utterance rather than to reality?

17. neo-Davidsonian Representation (Parsons, 1995)
Replace arguments (and placeholders) with independent conjuncts. Two roles are basic: Agent and Theme/Patient.
Brutus stabbed Caesar in the back with a knife.
    ∃e.stab(e) ∧ Agent(e, Brutus) ∧ Patient(e, Caesar) ∧ with(e, knife) ∧ in(e, back)

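A minimal sketch (mine, not from the deck) of how a neo-Davidsonian form can be held as plain data: an event is just a set of conjuncts, so modifiers attach without changing the verb's arity.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Conjunct:
        pred: str            # e.g. "Agent", "with"
        args: tuple          # e.g. ("e", "Brutus")

    # Brutus stabbed Caesar in the back with a knife.
    stab_event = frozenset({
        Conjunct("stab", ("e",)),
        Conjunct("Agent", ("e", "Brutus")),
        Conjunct("Patient", ("e", "Caesar")),
        Conjunct("in", ("e", "back")),
        Conjunct("with", ("e", "knife")),
    })
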
19. Advantages of the neo-Davidsonian (Palmer, 2014)
(1) Entailment. Given the following sentences:
A. Brutus stabbed Caesar in the back with a knife.
B. Brutus stabbed Caesar in the back.
C. Brutus stabbed Caesar with a knife.
We know A → B ∧ C, but NOT B ∧ C → A.
The neo-Davidsonian representation preserves this pattern: B and C are just A with conjuncts dropped, so A entails each by conjunction elimination. Let Agt = Agent, B = Brutus, C = Caesar, Pat = Patient:
A. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back) ∧ with(e, knife)
B. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back)
C. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ with(e, knife)

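Continuing the toy Conjunct encoding from above: under a flat representation, this entailment becomes a plain subset test over conjunct sets.

    def entails(premise: frozenset, hypothesis: frozenset) -> bool:
        # Dropping conjuncts weakens a flat formula, so the premise
        # entails any hypothesis whose conjuncts it contains.
        return hypothesis <= premise

    A = stab_event
    B = frozenset(c for c in A if c.pred != "with")
    print(entails(A, B))   # True:  A entails B
    print(entails(B, A))   # False: B does not entail A
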
20. Advantages of the neo-Davidsonian
(2) Scope. The traditional way uses scope to connect an adjunct and a verb:
    x stabbed y violently with z
There are two logically equivalent representations with different scope settings:
    (with z (violently (stab(y)))) (x)
    (violently (with z (stab(y)))) (x)
A flat representation like the neo-Davidsonian keeps the meaning consistent and introduces no spurious syntactic scope. The slides return to flatness and scope later.

21. Advantages of the neo-Davidsonian
(3) Temporal and Causal Sentences.
Mary saw Brutus stab Caesar.
- Traditional way: Mary saw Brutus & Brutus stabbed Caesar.
- neo-Davidsonian way: ∃e.see(e) ∧ Agt(e, Mary) ∧ ∃e′.(stab(e′) ∧ Agt(e′, Brutus) ∧ Pat(e′, Caesar) ∧ Pat(e, e′))
After the singing of the national anthem, they saluted the flag. = After the national anthem was sung, they saluted the flag.
    ∃e.salute(e) ∧ Agt(e, they) ∧ Pat(e, flag) ∧ ∃e′.(sing(e′) ∧ Agt(e′, they) ∧ Pat(e′, NationalAnthem) ∧ after(e, e′))

27. Possible Problems of the neo-Davidsonian
I sold him a car for $50,000.
    Which is the patient, the car or the $50,000?
I sold a car for Mary for $50,000.
    The same preposition with different meanings.
Mary fed her baby.
    Can the baby, who is doing the eating, also count as an agent?
Brutus stabbed Caesar with a knife.
    Removing Brutus from the event may differ in kind from removing the knife.
Brutus stabbed Caesar once.
    It is hard to state that the event happens only once in the neo-Davidsonian.
A saw B leave. When B left, he had the documents in his briefcase. = A saw B leave with the documents in his briefcase.
    If both leave events are the same, and the inference is to go through, how could A see one without seeing the other?

28. Summary of the neo-Davidsonian
The neo-Davidsonian has several characteristics in representing semantics; some are advantages, others are merely one choice among several workable ones:
- uses variables and is flat
- event-style: an event is unique in its time of occurrence
- event arguments are moved into roles, as independent conjuncts
- modifiers (adjectives, adverbs, adjuncts) are conjoined predicates
- transparent scope facilitates logical inference

29. Agenda
1 Semantics: Davidsonian Representation, MRS, AMR
2 Parsing
3 Summary

30. Minimal Recursion Semantics (Copestake, 2005)
MRS is another flat semantic framework, serving as the basis of English Resource Semantics (ERS) and the English Resource Grammar (ERG). Its design goals:
- Expressive Adequacy: the ability to express meaning correctly
- Grammatical Compatibility: the ability to link representations to grammatical information
- Computational Tractability: the ability to compare two representations (equality, relations, etc.)
- Underspecifiability: leave semantic distinctions unresolved

31. Why a Flat Form
In MT and other tasks, a structural representation is hard to use and unnecessary.
Examples
Sentence: white English horse
Transfer rule: white(horse)(x) ↔ Schimmel(x)
Structured form: white(English(horse))(x), which the rule no longer matches.
Examples
Sentence: The beginning of spring arrived.
Transfer rule: beginning of spring ↔ Frühlingsanfang
Form 1: def_q(x, spring(x), the(y, beginning(y, x), arrive(y)))
Form 2: the(y, def_q(x, spring(x), beginning(y, x), arrive(y)))
Two structured scopings for one meaning.

32. Why a Flat Form
A flat form is a group of elementary predications.
Examples
Sentence: white English horse
Transfer rule: white(horse)(x) ↔ Schimmel(x)
Flat form: white(x) & English(x) & horse(x)
Examples
Sentence: The beginning of spring arrived.
Transfer rule: beginning of spring ↔ Frühlingsanfang
Flat form: the(y) & beginning(y, x) & def(x) & spring(x) & arrive(e, y)

34. Underspecifiability in MRS
One sentence may have several scopal readings:
    Every dog chases some white cat.
Leave some handles unspecified, then specify them later:
- plugging equalities such as h0 = h1, h3 = h5, h7 = h4
- constraints on h3 and h7 keep the result a tree
- a qeq constraint such as h0 =q h5 is a trivial example

35. MRS, Formally and as a Whole
An MRS is a quadruple ⟨GT, LT, R, C⟩:
- GT: global top, h0
- LT: local tops, h1, h4, h5 (the semantics of local phrases)
- R: relations, h1:every(x, h2, h3), h5:dog(y, h6, h7), h4:chase(x), etc.
- C: constraints, h0 qeq h4, etc.

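A toy encoding of that quadruple (my sketch, not ERG machinery), with relations as handle-labelled elementary predications and qeq constraints kept separate:

    from typing import NamedTuple

    class EP(NamedTuple):          # elementary predication
        handle: str                # label, e.g. "h1"
        pred: str                  # e.g. "every"
        args: tuple                # e.g. ("x", "h2", "h3")

    class MRS(NamedTuple):
        gt: str                    # global top
        lt: tuple                  # local tops
        rels: tuple                # elementary predications
        qeq: tuple                 # (hole, label) constraints

    m = MRS(
        gt="h0",
        lt=("h1", "h4", "h5"),
        rels=(EP("h1", "every", ("x", "h2", "h3")),
              EP("h5", "dog", ("y", "h6", "h7")),
              EP("h4", "chase", ("x",))),
        qeq=(("h0", "h4"),),
    )
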
36. Highlights of MRS
- Scopal relationships are reified as handles, so that syntactically the language looks first-order.
- Underspecifiability is preserved.

37. Agenda
1 Semantics: Davidsonian Representation, MRS, AMR
2 Parsing
3 Summary

38. Abstract Meaning Representation (Banarescu et al., 2013)
AMR is a semantic representation that
- is a rooted, directed, labeled graph
- is identical for different utterances with the same meaning
- uses variables for co-reference
- uses PropBank frames (analogous to roles in the neo-Davidsonian)
- defines non-core relations outside PropBank (analogous to adjuncts in the neo-Davidsonian)
Specification: https://github.com/amrisi/amr-guidelines/blob/master/amr.md

39. An AMR Example
Brutus stabbed Caesar with a knife in the back in the agora and twisted it hard.
    (s / stab
       :ARG0 (p / person
                :name (n / name :op1 "Brutus")
                :ARG0-of (t / twist
                            :ARG1 k
                            :manner (h / hard)))
       :ARG1 (p2 / person :name (n2 / name :op1 "Caesar"))
       :ARG2 (k / knife)
       :ARG3 (b / back)
       :location (a / agora))

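The same graph can be flattened to (source, role, target) triples, which is the view the smatch metric scores over. A hand-built sketch of that view for the example above (libraries such as penman offer this programmatically, but the list below avoids assuming any particular API; the Caesar subgraph is abridged):

    # Each triple is (source variable, role, target);
    # ":instance" links a variable to its concept.
    amr_triples = [
        ("s", ":instance", "stab"),   ("p", ":instance", "person"),
        ("s", ":ARG0", "p"),          ("n", ":instance", "name"),
        ("p", ":name", "n"),          ("n", ":op1", '"Brutus"'),
        ("t", ":instance", "twist"),  ("t", ":ARG0", "p"),  # inverse of :ARG0-of
        ("t", ":ARG1", "k"),          ("h", ":instance", "hard"),
        ("t", ":manner", "h"),        ("p2", ":instance", "person"),
        ("s", ":ARG1", "p2"),         ("k", ":instance", "knife"),
        ("s", ":ARG2", "k"),          ("b", ":instance", "back"),
        ("s", ":ARG3", "b"),          ("a", ":instance", "agora"),
        ("s", ":location", "a"),
    ]
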
41. Event Frames Arise from Various POS
Verb / Noun
Examples: the destruction of the city by the God
    (d / destroy-01
       :ARG0 (g / God)
       :ARG1 (c / city))
Examples: the bond investor
    (p / person
       :ARG0-of (i / invest-01
                   :ARG1 (b / bond)))
but professor does not yield an event frame.
Adjective
Examples: the attractive spy
    (s / spy
       :ARG0-of (a / attract-01))

42. Reification - a Frame from a Non-Core Relation
A non-core relation in AMR modifies the relation itself, not any single object participating in it; to modify (e.g. negate) the relation, reify it into a frame.
Examples
the marble in the jar
    (m / marble :location (j / jar))
the marble is not in the jar
    (b / be-located-at-91
       :ARG1 (m / marble)
       :ARG2 (j / jar)
       :polarity -)
Semantic error:
    (m / marble :location (j / jar :polarity -))
which reads as "the marble is in the non-jar".

43. Other Language Phenomena Defined in AMR
AMR defines approximately 100 relations for language phenomena:
- negation and modals
- interrogatives and wh-questions
- named entities
- location, source, destination, path
- cause, concession, condition
- quantities, date, time
- link to the Wikipedia article: :wiki "Barack Obama"
- ...

44. AMR Data Overview
1. Annotated corpora (train:dev:test):
   - The Little Prince, 1274:145:143
   - The Little Prince, Chinese version, 1274:145:143
   - Bio AMR Corpus from PubMed (cancer) articles, 5452:500:500
   - LDC Corpus General Release 1.0 (June 2014), 13051 in all; a new general release is due in summer 2016
2. Evaluation: the smatch metric, comparing two AMRs
3. SemEval-2017 Task 9: Parsing and Generation
   - English biomedical data to AMR (cf. SemEval-2016 Task 8)
   - AMR-to-English generation
4. A Python parser: https://github.com/nschneid/amr-hackathon

45. AMR Editor
A simple web editor to build an AMR. (screenshot on the slide)

46. Parsing Methods
There are many semantic parsing paradigms; some are new methods, while others adapt ideas from other domains or tasks to semantic parsing:
- Shift-Reduce (LR) (1993)
- Combinatory Categorial Grammar (2005)
- Word Alignment (Synchronous CFG) (2006)
- Generative Models (2008)
- Syntactic Parse to Semantic Parse (2009)
- Weak Supervision and Unsupervised Methods (2010)
- Large-scale SP for Freebase and QA (2013)
- Paraphrase-driven SP (2014)
- Neural Semantic Parsing (2015)

47. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

48. Inductive Logic Programming (Zelle et al., 1993)
Shift-Reduce is a simple bottom-up parsing strategy. Each action corresponds to a Prolog clause.

49. Inductive Logic Programming (Zelle et al., 1993)
CHILL (Constructive Heuristic Induction for Language Learning):
- Find Generalization: merge clauses, as long as the result covers no negative sample
- Reduce Definition: prefer the new clause for proving positive examples

50. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

51. Combinatory Categorial Grammar (Steedman, 1996, 2000)
CCG comes with a lexicon whose elements pair a word with a category:
    borders := (S\NP)/NP : λx.λy.borders(y, x)
- word: borders
- syntactic type: (S\NP)/NP
- semantic type: λx.λy.borders(y, x)

52. Combinatory Categorial Grammar (Steedman, 1996, 2000)
Categories can be combined:
forward and backward application
    A/B : f  +  B : x    ⇒  A : f(x)
    B : x    +  A\B : f  ⇒  A : f(x)
forward and backward composition
    A/B : f  +  B/C : g  ⇒  A/C : f ∘ g
    B\C : g  +  A\B : f  ⇒  A\C : f ∘ g
type raising
    X ⇒ T/(T\X)

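Treating the semantic types as Python functions makes the combinators one-liners. A toy sketch (the borders entry comes from the previous slide; the tuple encoding of logical forms is mine):

    # Lexical semantics: borders := (S\NP)/NP : λx.λy.borders(y, x)
    borders = lambda x: lambda y: ("borders", y, x)

    def forward_apply(f, x):      # A/B : f + B : x => A : f(x)
        return f(x)

    def compose(f, g):            # A/B : f + B/C : g => A/C : f . g
        return lambda x: f(g(x))

    # "Texas borders Oklahoma": apply to the object, then the subject.
    vp = forward_apply(borders, "Oklahoma")   # category S\NP
    s = vp("Texas")                           # ("borders", "Texas", "Oklahoma")
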
53. Semantic Parsing using CCG on GeoQuery (Zettlemoyer and Collins, 2005)
Given the lexicon and the model parameters, CCG parsing is formulated as a log-linear probabilistic model to deal with ambiguity, e.g. duplicate lexical entries for a word, and spurious ambiguity:
    P(L, T | S; θ) = exp(f(L, T, S) · θ) / Σ_(L′,T′) exp(f(L′, T′, S) · θ)
And we can do inference on the model:
    L* = argmax_L P(L | S; θ) = argmax_L Σ_T P(L, T | S; θ)
Features are designed to be local, so dynamic programming (actually beam search) can prune the search space, CKY-style.

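The log-linear distribution above in miniature; the feature vectors and parse names are invented for illustration, not real GeoQuery features:

    import math

    def log_linear(scores):
        # softmax over candidate scores f(L, T, S) · θ
        z = sum(math.exp(s) for s in scores)
        return [math.exp(s) / z for s in scores]

    theta = [0.5, -0.2, 1.0]
    candidates = {                     # (L, T) -> feature vector
        "parse_a": [1, 0, 1],
        "parse_b": [0, 1, 1],
    }
    dots = [sum(f * t for f, t in zip(v, theta)) for v in candidates.values()]
    print(dict(zip(candidates, log_linear(dots))))
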
54. Learning the Model (Zettlemoyer et al., 2005)
The parameters are learned with stochastic gradient descent.

55. Learning the Lexicon (Zettlemoyer et al., 2005)
    GENLEX(S, L) = {x := y | x ∈ W(S), y ∈ C(L)}
- W(S) is the set of all subsequences of S
- C(L) produces categories from L via trigger rules

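GENLEX is literally a cross product of word subsequences with triggered categories. A stripped-down sketch; the single trigger rule here (constants yield NP entries) is invented for illustration:

    def subsequences(words):
        # contiguous word spans of the sentence
        return [" ".join(words[i:j]) for i in range(len(words))
                for j in range(i + 1, len(words) + 1)]

    def categories(lf):
        # one hypothetical trigger: each constant in the LF yields an NP
        return [("NP", const) for const in lf["constants"]]

    def genlex(sentence, lf):
        return {(x, y) for x in subsequences(sentence.split())
                       for y in categories(lf)}

    print(genlex("Utah borders Idaho", {"constants": ["utah", "idaho"]}))
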
56. Problems in ZC05
GENLEX is driven by hand-written trigger rules, and it is insufficient when the rules don't cover the (S, L) pairs.
Examples
Through which states does the Mississippi run.
GENLEX triggers no category suitable for the through-adjunct placed up front; phrase order would need to be relaxed.

57. Relaxed Combinatory Rules (Zettlemoyer et al., 2007)
- relaxed function application
- relaxed function composition
- role-hypothesising type shifting (for missing predicates)
- null-head type shifting (for missing arguments)
- crossed functional composition
Triggers are added for these new rules, too.

58. Online Learning (Zettlemoyer et al., 2007)
A perceptron-style online learner is used instead. New features are also added.

60. CCG Induction using Unification (Kwiatkowski et al., 2010)
Unification, per Wikipedia, is an algorithmic process of solving equations between symbolic expressions, e.g.
    {cons(x, cons(x, nil)) = cons(2, y)} ⇒ {x ↦ 2, y ↦ cons(2, nil)}
Here unification aims to find f and g given h, such that h = λx.f(g(x)) or h = f(g).
For example, the given initial lexical entry
    New York borders Vermont ⊢ S : next_to(ny, vt)
will be split into
    New York borders ⊢ S/NP : λx.next_to(ny, x)
    Vermont ⊢ NP : vt

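One way to enumerate the h = f(g) splits is to abstract over each subterm of h: pick a subterm t, set g = t and f = λx.h[t ↦ x]. A tiny sketch over tuple-encoded terms (my encoding, not the paper's):

    def subterms(term):
        yield term
        if isinstance(term, tuple):           # e.g. ("next_to", "ny", "vt")
            for a in term[1:]:
                yield from subterms(a)

    def replace(term, old, new):
        if term == old:
            return new
        if isinstance(term, tuple):
            return tuple(replace(a, old, new) for a in term)
        return term

    def splits(h):
        # h = f(g): g is a subterm; f abstracts it away with a fresh variable
        for g in subterms(h):
            f = lambda x, g=g: replace(h, g, x)
            yield f, g

    h = ("next_to", "ny", "vt")
    for f, g in splits(h):
        print(g, "->", f("x"))    # e.g. vt -> ('next_to', 'ny', 'x')
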
61. CCG Induction using Unification (Kwiatkowski et al., 2010)
Parsing with a probabilistic CCG (PCCG):
    P(y, z | x; θ, Λ) = exp(θ · φ(x, y, z)) / Σ_(y′,z′) exp(θ · φ(x, y′, z′))
    f(x) = argmax_z p(z | x; θ, Λ)
    p(z | x; θ, Λ) = Σ_y p(y, z | x; θ, Λ)
Again, to compute the parse efficiently:
- CKY-style parsing with dynamic programming
- summing over y with the inside-outside algorithm

62. CCG Induction using Unification (Kwiatkowski et al., 2010)
Learning algorithm: NEW-LEX decides whether to split the lexical entries, producing the new lexicon from argmax_(y*) p(y* | x_i, z_i; θ′, Λ′).

64. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 1, the function:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Unification constraints (otherwise there are infinitely many results):
- no vacuous variables: disallow, e.g., g = λx.tex, where x is unused
- limited coordination extraction: g contains fewer than N adjuncts
- limited application: f introduces no new variables for a non-variable subexpression of h, as in h = λx.in(x, tex) with f ↦ λq.q(tex), g ↦ λy.λx.in(x, y)
We can get many (f, g) pairs, among which there is:
    f ↦ λx.next_to(ny, x), g ↦ vt

65. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 2, the syntactic type:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Following the CCG combinatory rules (only the 4 above), define
    SC(A) = FA(A) ∪ BA(A) ∪ FC(A) ∪ BC(A)
    FA(X : h) = {(X/Y : f, Y : g) | h = f(g) ∧ Y = C(T(g))}
    BA(X : h) = {(Y : g, X\Y : f) | h = f(g) ∧ Y = C(T(g))}
    FC(X/Y : h) = {(X/W : f, W/Y : g) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}
    BC(X\Y : h) = {(W\Y : g, X\W : f) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}
where T maps an expression to its type (built from e, t, and function types) and C maps types to categories:
    C(T) = NP if T = e;  S if T = t;  C(T2)|C(T1) if T = ⟨T1, T2⟩

66. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 2, the syntactic type:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Some possible pairs from the splitting set:
- semantics: (λx.next_to(ny, x), vt); syntax: (S/NP, NP)
- semantics: (ny, λx.next_to(x, vt)); syntax: (NP, S\NP)
- semantics: (λx.next_to(x, vt), ny); syntax: (S/NP, NP)

68. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 3, the word sequence:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Splitting is defined as
    SL(w_0:n := A) = {(w_0:i := B, w_i+1:n := C) | 0 ≤ i < n ∧ (B, C) ∈ SC(A)}
For a specific i, the previous splits may raise problems:
- (S/NP : λx.next_to(ny, x), NP : vt), sequence (New York borders, Vermont)
- (NP : ny, S\NP : λx.next_to(x, vt)), sequence (New York, borders Vermont)
- (S/NP : λx.next_to(x, vt), NP : ny), sequence (borders Vermont, New York): incorrect

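SL is just every split point crossed with every category split. Sketched below, with SC stubbed to a single hand-written pair; a real implementation would enumerate the Step-2 pairs:

    def SL(words, category_splits):
        # words: the entry's word sequence; category_splits: pairs from SC(A)
        for i in range(1, len(words)):          # split between i-1 and i
            for B, C in category_splits:
                yield (words[:i], B), (words[i:], C)

    sc_pairs = [("S/NP : \\x.next_to(ny, x)", "NP : vt")]
    for left, right in SL("New York borders Vermont".split(), sc_pairs):
        print(left, right)
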
69. Problems in Kwiatkowski et al., 2010
The learned CCG lexicon is too big.

71. Factored Lexicon in CCG (Kwiatkowski et al., 2011)
Original lexical entry:
    Boston ⊢ N/N : λf.λx.from(x, bos) ∧ f(x)
Factored parts:
- lexeme, a pair of a word span and a constant list: (Boston, [from, bos])
- template: λ(w, v).(w ⊢ N/N : λf.λx.v1(x, v2) ∧ f(x))
Two types of factorization:
1. maximal factoring: all constants go into the lexeme
   (Boston, [from, bos]), λ(w, v).(w ⊢ N/N : λf.λx.v1(x, v2) ∧ f(x))
2. partial factoring: some constants remain in the template
   (Boston, [bos]), λ(w, v).(w ⊢ N/N : λf.λx.from(x, v1) ∧ f(x))
Partial factoring is used for missing words: flights Boston to New York

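Recombining a lexeme with a template is a constant substitution. A small sketch, with format strings standing in for real λ-terms:

    # template with numbered constant slots v1, v2, ...
    template = "N/N : \\f.\\x.{v1}(x, {v2}) & f(x)"
    lexeme = ("Boston", ["from", "bos"])

    def apply_template(template, lexeme):
        word, consts = lexeme
        slots = {f"v{i + 1}": c for i, c in enumerate(consts)}
        return word, template.format(**slots)

    print(apply_template(template, lexeme))
    # ('Boston', 'N/N : \\f.\\x.from(x, bos) & f(x)')
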
72. Factored Lexicon in CCG (Kwiatkowski et al., 2011)
Learning is similar, but must consider factorization.

76. The Ontological Mismatch Problem
The GeoQuery / ATIS datasets are too small; learning a parser for them is easy:
- few predicates
- few utterances (though more than the predicates)
If a database has more predicates, and can thus in principle answer more questions, the number of possible utterances grows even further. What's worse, new utterances will, linguistically, involve ever more predicates, while the database schema is fixed and supports only a limited set. So:
- parse to richer predicates: unusable on the database
- parse to fit the schema: difficult to learn

77. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Convert Q1 to MR1 and Q2 to MR2, then match MR2 onto MR1. Supervision comes from Q-A pairs.

78. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Parsing framework:
Domain-independent parsing
- use a domain-independent CCG parser (Clark & Curran, 2007) to convert the utterance to an underspecified LF, with a hand-written lexicon
- 59 lexical categories paired with POS tags, assigned to words based on POS tags from Wiktionary
- 49 domain-independent lexical items (what, when, and, is, etc.)
Ontological matching
- apply a series of matching operations M = o1, o2, ...
- structural match: collapse operator, expansion operator
- constant matching (replacement within the same type)

79. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Parsing and learning:
    Parse(x, O) = argmax_(d ∈ GEN(x, O)) Score(d)
    Score(d) = φ(d) · θ = φ(Π) · θ + Σ_(o ∈ M) φ(o) · θ

80. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Features:
- CCG parse features (Π): # of each category, # of (word, category), # of (POS, category)
- structural features (M): identities of complex-typed constants and domain-independent constants
- lexical features (M): for pairs (c_u, c_O): φ_np, φ_stem, φ_syn, φ_fp:stem, φ_def-overlap
- knowledge-base features (executing y on K): φ_direct, φ_join, φ_empty, φ_0, φ_1

81. Other CCG Works
Artzi et al., TACL 2013, Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions: model robot instructions and take feedback from the robot's actions.
Reddy et al., TACL 2014, Large-scale Semantic Parsing without Question-Answer Pairs: use ClueWeb09 and FACC1, plus a general CCG parser, to build an LF, which is then converted to an ungrounded graph sharing commonalities with Freebase.
Artzi et al., EMNLP 2015, Broad-coverage CCG Semantic Parsing with AMR: to deal with co-reference in AMR, use Skolem terms (Steedman, 2011) to build an underspecified LF, which is then mapped to the specified LF.

82. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

83. Learning SP with SMT (Wong and Mooney, 2006)
Synchronous CFG rule:
    X → ⟨α, β⟩ (pattern & template)

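Each synchronous rule rewrites a non-terminal on both sides at once, so one derivation yields a (sentence, MR) pair. A toy sketch with invented RoboCup-style rules; the pattern and template share bracketed non-terminal slots:

    # X -> <pattern, template>; "[N]" marks the shared non-terminal slots
    rules = {
        "CONDITION": ("[TEAM] player [UNUM] has the ball",
                      "(bowner [TEAM] {[UNUM]})"),
        "TEAM": ("our", "our"),
        "UNUM": ("4", "4"),
    }

    def derive(symbol):
        # expand both sides in lockstep
        nl, mr = rules[symbol]
        for nt in rules:
            slot = f"[{nt}]"
            if slot in nl or slot in mr:
                sub_nl, sub_mr = derive(nt)
                nl, mr = nl.replace(slot, sub_nl), mr.replace(slot, sub_mr)
        return nl, mr

    print(derive("CONDITION"))
    # ('our player 4 has the ball', '(bowner our {4})')
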
84. Learning SP with SMT (Wong and Mooney, 2006)
Parsing: enumerate derivations that yield e, f:
    f* = m(argmax_(d ∈ D(G|e)) Pr(d | e; λ))
    Pr_λ(d | e) = (1/Z_λ(e)) exp(Σ_i λ_i f_i(d))
Features (3000 or so):
- the number of times each rule r is used in the derivation
- the number of times each word w is generated from word gaps
Parameter estimation: the Viterbi algorithm is used for efficient decoding; since the gold derivation is latent, EM finds the optimal parameters.

85. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 1: assuming the grammar is known, an LF can be converted to the production (rule) sequence of its left-most top-down derivation (by convention). We prefer the derivation sequence over the LF itself because an MR may not be well-formed without the grammar, and MR tokens can be polysemous or carry no specific meaning.
Examples (sentence and LF from RoboCup)
    ((bowner our {4}) (do our {6} (pos (left (half our)))))
    If our player 4 has the ball, then our player 6 should stay in the left side of our half.

86. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 2: use GIZA++ to find an alignment between words and derivation rules.

87. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 3: extract rules bottom-up, starting with rules whose RHS contains only terminals, then those whose RHS contains non-terminals.
    UNUM → ⟨4, 4⟩
    TEAM → ⟨our, our⟩
    CONDITION → ⟨TEAM player UNUM has {1} ball, (bowner TEAM {UNUM})⟩

88. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 3, special case: a rule that derives no terminal would break links outside its sub-parse tree. Merge such rules, e.g.:
    REGION → (left (penalty-area TEAM))
For excessively merged rules (overfitting), try a greedy link-removal policy (alignment repair).

89. SCFG with Lambda Calculus (Wong and Mooney, 2007)
Use lambda expressions for the semantic side instead. To improve NL-MR isomorphism, find an MST on a graph where edges between rules are established for any shared variable and weighted by the minimal word distance.

90. Data Recombination (Jia and Liang, 2016)
Fit a new SCFG model to the training data using 3 kinds of recombination policies, then draw additional training examples from it.

91. Generative Models: Learning with Less Supervision (Liang et al., 2009)
Given a world state paired with several sentences:
    p(r, f, c, w | s) = p(r | s) p(f | r) p(c, w | r, f, s)

92. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

93. Using a Syntactic Parse (Ge and Mooney, 2009)
Parser components:
- an existing syntactic parser
- a learned lexicon from words to predicates
- a learned set of composition rules
Assumption: an unambiguous CFG of the LF is known.

94. Using a Syntactic Parse (Ge and Mooney, 2009)
Not every semantic sub-tree strictly follows the syntactic derivation. Introduce macro-predicates when the children's MRs can't combine:
- a child MR becomes an argument if it is complete
- otherwise it becomes part of the predicate

96. Using a Syntactic Parse (Ge and Mooney, 2009)
Learning: the lexicon is learned with GIZA++ (as in Wong and Mooney, 2006):
- if a predicate is aligned to no word, the predicate is taken as inferable and bound to its values in the MR
- if a predicate is aligned to several words, split it into several alignments
Composition rules are learned in the form
    Λ1.P1 + Λ2.P2 ⇒ {Λp.Pp, R}
    λa1.λa2.P_PLAYER + P_UNUM ⇒ {λa1.P_PLAYER, a2 = c2}
Disambiguation model: max-ent, trained with L-BFGS:
    Pr(D | S, T; θ) = exp(Σ_i θ_i f_i(D)) / Z(S, T)

97. Transforming a Dependency Parse (Reddy et al., 2016)
Parsing:
- binarize the dependency parse into an S-expression
- substitute symbols with λ-expressions
- compose hierarchically (β-reduction)

98. Transforming a Dependency Parse (Reddy et al., 2016)
Parsing: follow Reddy et al., 2014 to transform the result into a grounded graph. New operators deal with the mismatch between the dependency parse and the semantic parse:
- CONTRACT: merge some nodes and edges into a single node
- EXPAND: add edges for disjoint nodes (dependency-parse errors)

99. Other Works from Syntactic Parse to Semantic Parse
Incremental parser for AMR (Damonte et al., 2016): a greedy transition-based parser for AMR, inspired by the transition-based syntactic parser ArcEager (Nivre, 2004, 2008), using an existing dependency parser.
Imitation learning for AMR (Goodman et al., 2016): a transition-based parser trained with imitation learning, extended with techniques such as noise reduction and targeted exploration, using an existing dependency parser.

100. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

  101. SP from World’s Response(Clarke et al., 2010) Parsing: ˆ z

    = Fw (x) = arg maxy∈Y ,z∈Z wT Φ(x, y, z) Learning: Haruki Kirigaya 2016.09.30 73 / 86
  102. SP from World’s Response(Clarke et al., 2010) Parsing: In order

    to adapt to unseen inputs, consider the entire meaning space instead of rule extraction from training data. Fw (x) = arg maxy,z wT Φ(x, y, z) = arg maxα,β c∈X s∈D αcs · wT Φ1 + c,d∈X s,t∈D βcs,dt · wT Φ2 Haruki Kirigaya 2016.09.30 73 / 86
  103. SP from World’s Response(Clarke et al., 2010) Parsing: In order

    to adapt to unseen inputs, consider the entire meaning space instead of rule extraction from training data. Fw (x) = arg maxy,z wT Φ(x, y, z) = arg maxα,β c∈X s∈D αcs · wT Φ1 + c,d∈X s,t∈D βcs,dt · wT Φ2 α: word span c aligned with symbol s β: word span d aligned with t, when α is activated such that A consituent is associated with 1 symbol beta(cs,dt) activated iff. alpha(cs) and alpha(dt) activated beta(cs,dt) activated then s is a function and (s, t) is type-consistent functional composition is directional and acyclic Haruki Kirigaya 2016.09.30 73 / 86
  104. SP from World’s Response(Clarke et al., 2010) Parsing: In order

    to adapt to unseen inputs, consider the entire meaning space instead of rule extraction from training data. Fw (x) = arg maxy,z wT Φ(x, y, z) = arg maxα,β c∈X s∈D αcs · wT Φ1 + c,d∈X s,t∈D βcs,dt · wT Φ2 Features Used: 1st-order: stemmed word match 1st-order: similarity based on WordNet (Do et al. 2010) 2nd-order: normalized distance of the head words in c and d for beta(cs, dt) on the dependency tree of sentence 2nd-order: symbol concurrence frequency (regardless of alignments) Haruki Kirigaya 2016.09.30 73 / 86
105. Confidence-driven Unsupervised SP (Goldwasser, 2011)
What is a "confidence-driven unsupervised method"?
- Idea: if a pattern is produced multiple times by a non-random model, it is likely an indication of an underlying phenomenon in the data.
- Confidence: output structures close to the center of statistical mass receive a high confidence score.
- Confidence-driven: the model improves significantly compared with using only the prediction score wᵀΦ(x, y, z).
Parsing is the same as in Clarke et al., 2010, formulated as an ILP problem.

106. Confidence-driven Unsupervised SP (Goldwasser, 2011)
Confidence measures:
(1) translation model
    unigram: p(z | x) = Π_(i=1..|z|) p(s_i | y(s_i))
    bigram: p(z | x) = Π_(i=1..|z|) p(s_(i−1)(s_i) | y(s_(i−1)), y(s_i))
(2) structural proportion
    Prop(x, z): the ratio of #predicates in z to #words in x
    AvProp(S): the average over the set S
    PropScore(S, (x, z)) = AvProp(S) − Prop(x, z)
Combined: use (2) to filter candidates and (1) to rank them.

107. Grounded Unsupervised SP (Poon, 2013)
Parsing idea: annotate states onto the nodes and edges of a dependency parse.
- no training is needed for specific token types: datetime, numerics, logical operators
- "states" come from the DB schema: entities / attributes
- complex states are included for mismatches between the dependency parse and the semantics
Inference:
    z* = argmax_z P_θ(d, z)
    P_θ(d, z) = (1/Z) exp(Σ_i w_i f_i(d, z))
Learning:
    θ* = argmax_θ Σ_(d∈D) log Σ_z P_θ(d, z)

108. Grounded Unsupervised SP (Poon, 2013)
Example: get flight from toronto to san diego stopping in dtw. (annotated dependency parse shown on the slide)

109. SP on Freebase and QA
Cai and Yates, 2013a, 2013b: methods to use an existing parser in a new domain.
Berant et al., 2013: collects the WebQuestions dataset.
Yih et al., 2014: a CNN-based semantic model (CNNSM).
Yih et al., 2015: staged query graph generation; find a core inferential chain executed on Freebase.
Pasupat and Liang, 2015: SP on semi-structured tables; proposes a dataset of tables and a method that first converts the tables into a knowledge graph.

110. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

111. Paraphrase Comparison
Berant and Liang, 2014: paraphrasing independent of the KB, using two kinds of templates.
Chen et al., 2016: direct paraphrasing, using Wiktionary.

112. SP via Paraphrasing (Berant and Liang, 2014)
    p_θ(c, z | x) = (1/Z) exp(φ(x, c, z)ᵀ θ)
    φ(x, c, z)ᵀ θ = φ_pr(x, c)ᵀ θ_pr + φ_lf(x, z)ᵀ θ_lf

113. Building an SP Overnight (Wang et al., 2015)
Build an SP for a new domain (8 published datasets):
1. a human writes a seed lexicon
2. the domain-general grammar G induces canonical LFs and utterances
3. crowdsourcing rewrites the awkward canonical utterances into fluent ones
4. train a parser under the grammar G, via paraphrasing

114. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

115. Sequence-based SP (Xiao et al., 2016)
Vinyals et al., 2015 showed that sequence models succeed at (syntactic) grammar parsing. Xiao et al. compared various sequence formats on SPO (Wang et al., 2015):
- LF as a raw token sequence
- DSP (Derivation Sequence Prediction)
- DSP-C (Constrained): use the grammar to constrain the next rule at test time
- DSP-CL (Constrained Loss): p(y_t) is normalized only over grammatically possible values
- CFP (Canonical Form Prediction): predict the canonical form instead, which is then parsed into the LF

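The constrained-decoding idea in miniature: zero out rules the grammar disallows at each step and renormalize over the rest (DSP-C applies this at test time; DSP-CL applies the same renormalization inside the training loss). All names and the toy grammar below are invented for illustration:

    import math

    def constrained_step(logits, allowed):
        # logits: model scores per rule id; allowed: legal next rules
        masked = {r: s for r, s in logits.items() if r in allowed}
        z = sum(math.exp(s) for s in masked.values())
        return {r: math.exp(s) / z for r, s in masked.items()}

    logits = {"NP->city": 2.1, "NP->state": 1.3, "S->NP_VP": 0.2}
    allowed = {"NP->city", "NP->state"}    # grammar says an NP must come next
    print(constrained_step(logits, allowed))
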
116. Parsing with Neural Attention (Dong and Lapata, 2016)
The Seq2Tree model, which also learns a latent grammar.

117. Result Comparison
AMR parsing (F1: 70, Goodman et al., 2016) and WebQuestions (F1: 52.5, Yih et al., 2015).

118. Summary
Problems and remedies:
- LF not well-formed (particularly in neural SP methods): use an existing or a learned grammar.
- Ontology mismatch: paraphrasing or other two-phase parsing.
- Utterance explosion: prefer expansion from the meaning space over rule extraction from the training data.
- Isomorphism between NL (or the syntactic parse) and the semantic parse: add relaxing extensions, because NL isn't strict and the syntactic parse may introduce errors.