sentences, NLP researchers address it at various granularities. These tasks differ in how much of the sentence's information they preserve.
Information Extraction (less informative): is_a(Obama, PRESIDENT)
Summarization (modestly informative): Obama wins.
Semantic Parsing (exact matching): ∃e.beat(e) ∧ Sub(e, Obama) ∧ Obj(e, Romney)
Caveat: "semantic" here refers to composition rather than to distinguishing word senses.
to find an f such that f : Sentence → LogicForm. Generally, there are three aspects a semantic parser needs to take into consideration:
Modelling: how to represent a logic form
Parsing: design a grammar and a parsing algorithm
Learning: use supervision to fix the parameters
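To make the three aspects concrete, here is a minimal Python sketch of the f : Sentence → LogicForm pipeline. All class and field names are hypothetical, not from any system in these slides.

```python
# A minimal sketch (all names hypothetical) of f : Sentence -> LogicForm,
# split into the three aspects: modelling, parsing, learning.
from typing import List

class SemanticParser:
    def __init__(self, lexicon: dict, weights: dict):
        self.lexicon = lexicon   # Modelling: how words map to logic-form fragments
        self.weights = weights   # Learning: parameters fixed by supervision

    def candidate_parses(self, sentence: str) -> List[str]:
        # Parsing: a real system applies a grammar (CCG, SCFG, ...) here;
        # this stub just substitutes lexicon entries word by word.
        words = sentence.split()
        return [" ".join(self.lexicon.get(w, w) for w in words)]

    def score(self, lf: str) -> float:
        # Score candidates with the learned feature weights.
        return sum(self.weights.get(tok, 0.0) for tok in lf.split())

    def parse(self, sentence: str) -> str:
        return max(self.candidate_parses(sentence), key=self.score)

p = SemanticParser({"borders": "λx.λy.borders(y,x)"}, {})
print(p.parse("Texas borders Mexico"))
```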
Brutus stabs Caesar with a knife.
stab(Brutus, Caesar, knife) (n-ary predicate)
Brutus stabs Caesar in the agora.
stab(Brutus, Caesar, agora) (ambiguous predicate)
Brutus stabs Caesar in the agora with a knife.
stab(Brutus, Caesar) & with(knife) & in(agora) (move adjuncts apart)
Brutus stabs Caesar with a knife in the agora and twisted it hard.
stab(Brutus, Caesar) & with(knife) & in(agora) & twist(Brutus, knife) & hard
The standard predicate calculus has problems:
it is unable to refer to predicates
natural language is flexible in the number of arguments: Pass the axe. / Pass me the axe.
an event beforehand, thus we existentially quantify it.
Brutus stabs Caesar with a knife in the agora and twisted it hard.
∃e.stab(e, Brutus, Caesar) ∧ with(e, knife) ∧ in(e, agora) ∧ (∃e′.twist(e′, Brutus, knife) ∧ hard(e′))
Caesar is stabbed.
∃x∃e.stab(e, x, Caesar)
Missing arguments are left as existentially quantified placeholders.
a dream last night, I was stabbed, although in fact nobody had stabbed me and I wasn't stabbed with anything.
There is NOBODY here to initiate the stabbing event. The representation should therefore correspond to the utterance rather than to reality.
conjuncts. Basically, two roles are important: Agent and Theme/Patient.
Brutus stabbed Caesar in the back with a knife.
∃e.stab(e) ∧ Agent(e, Brutus) ∧ Patient(e, Caesar) ∧ with(e, knife) ∧ in(e, back)
Consider the following sentences:
A. Brutus stabbed Caesar in the back with a knife.
B. Brutus stabbed Caesar in the back.
C. Brutus stabbed Caesar with a knife.
We know A → B ∧ C but NOT B ∧ C → A.
The neo-Davidsonian representation preserves this pattern. Let Agt = Agent, Pat = Patient, B = Brutus, C = Caesar. Then:
A. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back) ∧ with(e, knife)
B. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back)
C. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ with(e, knife)
A entails B and C simply by dropping conjuncts, while B ∧ C does not entail A because the two existentially quantified events need not coincide.
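This entailment pattern can be checked mechanically if each formula is simplified to a set of conjuncts over one shared event variable; a toy sketch under that assumption:

```python
# Toy check: treat each neo-Davidsonian formula as a set of conjuncts over a
# shared event variable e. Dropping conjuncts weakens a formula, so p |= q
# iff q ⊆ p; this licenses A |= B and A |= C but not (B and C) |= A.
A = {"stab(e)", "Agt(e,B)", "Pat(e,C)", "in(e,back)", "with(e,knife)"}
B = {"stab(e)", "Agt(e,B)", "Pat(e,C)", "in(e,back)"}
C = {"stab(e)", "Agt(e,B)", "Pat(e,C)", "with(e,knife)"}

def entails(p: set, q: set) -> bool:
    return q <= p          # valid only under the shared-event assumption

print(entails(A, B), entails(A, C))   # True True
# B ∪ C happens to equal A's conjunct set, but as separate formulas B and C
# quantify two possibly distinct events, so the subset test must not be
# applied across formulas with different existential quantifiers.
```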
to connect an adjunct and a verb.
x stabbed y violently with z
There are two logically equivalent representations with different scope settings:
(with z (violently (stab(y)))) (x)
(violently (with z (stab(y)))) (x)
But a flat representation like the neo-Davidsonian one keeps the meaning consistent and introduces no explicit syntactic scope. The slides return to flatness and scope later.
saw Brutus stab Caesar.
Traditional way: Mary saw Brutus & Brutus stabbed Caesar.
neo-Davidsonian way:
∃e.see(e) ∧ Agt(e, Mary) ∧ (∃e′.stab(e′) ∧ Agt(e′, Brutus) ∧ Pat(e′, Caesar) ∧ Pat(e, e′))
After the singing of the national anthem, they saluted the flag.
= After the national anthem was sung, they saluted the flag.
∃e.salute(e) ∧ Agt(e, they) ∧ Pat(e, flag) ∧ (∃e′.sing(e′) ∧ Agt(e′, they) ∧ Pat(e′, NationalAnthem) ∧ after(e, e′))
I sold a car for $50,000. Which is the patient, the car or the $50,000?
I sold a car for Mary for $50,000. The same preposition carries different meanings.
Mary fed her baby. Can the baby, who is doing the feeding (eating), be the agent?
Brutus stabbed Caesar with a knife. The removal of Brutus may be different from the removal of the knife.
Brutus stabbed Caesar once. It is hard to specify in neo-Davidsonian terms that the event happened only once.
A saw B leave. When B left, he had the documents in his briefcase. = A saw B leave with the documents in his briefcase. If both leave events are the same, then for the inference to work, how could A see one without seeing the other?
representing semantics. Some of them are advantages, while others are simply choices among various approaches.
uses variables and is flat
event-style: an event is unique in its time of occurrence
event arguments are moved into roles, as independent conjuncts
modifiers (adjectives, adverbs, adjuncts) are conjoined predicates
transparent scope facilitates logical inference
framework, serving as the basis of English Resource Semantics (ERS) and the English Resource Grammar (ERG).
Expressive Adequacy: ability to express meaning correctly
Grammatical Compatibility: ability to link representations to grammatical information
Computational Tractability: ability to compare two representations (equality, relations, etc.)
Underspecifiability: ability to leave semantic distinctions unresolved
structural representation is hard to use and unnecessary.
Examples
Sentence: white English horse
Rule: white(horse)(x) ↔ Schimmel(x)
Form: white(English(horse))(x)
Examples
Sentence: The beginning of spring arrived.
Rule: beginning of spring ↔ Frühlingsanfang
Form 1: def_q(x, spring(x), the(y, beginning(y, x), arrive(y)))
Form 2: the(y, def_q(x, spring(x), beginning(y, x), arrive(y)))
of a sentence. Every dog chases some white cat.
Leave some handles unspecified, then specify them later: h0 = h1, h3 = h5, h7 = h4.
Constraints such as h3 = h7 keep the result a tree; the qeq constraint h0 =q h5 is a trivial example.
that is a rooted, directed and labeled graph
is identical across different utterances of the same meaning
uses variables for co-reference
uses PropBank frames (analogous to roles in neo-Davidsonian)
designs non-core relations outside PropBank (analogous to adjuncts in neo-Davidsonian)
Specification: https://github.com/amrisi/amr-guidelines/blob/master/amr.md
the back in the agora and twisted it hard.
(s / stab
   :ARG0 (p / person
            :name (n / name :op1 "Brutus")
            :ARG0-of (t / twist
                        :ARG1 k
                        :manner (h / hard)))
   :ARG1 (p2 / person
             :name (n2 / name :op1 "Caesar"))
   :ARG2 (k / knife)
   :ARG3 (b / back)
   :location (a / agora))
the destruction of the city by the God
(d / destroy-01
   :ARG0 (g / God)
   :ARG1 (c / city))
Examples
the bond investor
(p / person
   :ARG0-of (i / invest-01
               :ARG1 (b / bond)))
but "professor" does not yield an event frame
Adjective Examples
the attractive spy
(s / spy
   :ARG0-of (a / attract-01))
relation in AMR must serve as a role for the relation, rather than for any object participating in that relation.
Examples
the marble in the jar
(m / marble
   :location (j / jar))
the marble is not in the jar
(b / be-located-at-91
   :ARG1 (m / marble)
   :ARG2 (j / jar)
   :polarity -)
Semantic Error
(m / marble
   :location (j / jar
                :polarity -))
which reads: the marble is in the non-jar
relations for linguistic phenomena:
negation and modals
interrogatives and wh-questions
named entities
location
source, destination, path
cause, concession, condition
quantities, date, time
link with the Wikipedia article: :wiki "Barack Obama"
. . .
The Little Prince Chinese Version, 1274:145:143
Bio AMR Corpus from PubMed (cancer) articles, 5452:500:500
LDC Corpus General Release 1.0 (June 2014), 13051 in all; a new general release is due in summer 2016
2. Evaluation: the smatch metric, a comparison of two AMRs
3. SemEval-2017 Task 9 (following SemEval-2016 Task 8): parsing English biomedical data to AMR, and AMR-to-English generation
4. A Python parser: https://github.com/nschneid/amr-hackathon
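A brute-force sketch of the smatch idea: score two AMRs by their best triple overlap over mappings between their variables. The real smatch tool uses hill-climbing search; the exhaustive enumeration below is only feasible for tiny graphs, and the helper names are hypothetical.

```python
# Brute-force smatch sketch: maximize triple overlap over variable mappings,
# then report the F1 of matched triples. Not the official implementation.
from itertools import permutations

def triples_after(mapping: dict, triples):
    return {(rel, mapping.get(a, a), mapping.get(b, b)) for rel, a, b in triples}

def smatch_f1(t1, vars1, t2, vars2):
    best = 0
    for perm in permutations(vars2, len(vars1)):   # assumes len(vars1) <= len(vars2)
        mapping = dict(zip(vars1, perm))
        best = max(best, len(triples_after(mapping, t1) & set(t2)))
    p, r = best / len(t1), best / len(t2)
    return 2 * p * r / (p + r) if p + r else 0.0

# "the marble in the jar" vs. the same graph with renamed variables
g1 = [("instance", "m", "marble"), ("instance", "j", "jar"), ("location", "m", "j")]
g2 = [("instance", "x", "marble"), ("instance", "y", "jar"), ("location", "x", "y")]
print(smatch_f1(g1, ["m", "j"], g2, ["x", "y"]))   # 1.0
```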
are new methods while others borrow ideas from other domains or tasks to do semantic parsing.
Shift-Reduce (LR) parsing (1993)
Combinatory Categorial Grammar (2005)
Word Alignment / Synchronous CFG (2006)
Generative Model (2008)
Syntactic Parse to Semantic Parse (2009)
Weak Supervision and Unsupervised Methods (2010)
Large-scale SP for Freebase and QA (2013)
Paraphrase-driven SP (2014)
Neural Semantic Parsing (2015)
for Language Learning)
Find Generalization: merge clauses that do not cover any negative sample.
Reduce Definition: prefer new clauses that prove the positive examples.
lexicon whose elements are pairs of a word and a category:
borders := (S\NP)/NP : λx.λy.borders(y, x)
word: borders
syntactic type: (S\NP)/NP
semantic type: λx.λy.borders(y, x)
forward and backward application
A/B : f + B : x ⇒ A : f(x)
B : x + A\B : f ⇒ A : f(x)
forward and backward composition
A/B : f + B/C : g ⇒ A/C : f ∘ g
B\C : g + A\B : f ⇒ A\C : f ∘ g
type raising
X ⇒ T/(T\X)
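The combinators can be illustrated with Python lambdas standing in for the semantic terms; categories appear only in comments, and this shows just the semantics side.

```python
# Sketch: CCG combinators with Python lambdas standing in for semantic terms.
borders = lambda x: lambda y: f"borders({y},{x})"   # (S\NP)/NP : λx.λy.borders(y, x)
texas, mexico = "texas", "mexico"                   # NP

vp = borders(mexico)        # forward application:  (S\NP)/NP + NP  =>  S\NP
print(vp(texas))            # backward application: NP + S\NP => S, borders(texas,mexico)

compose = lambda f, g: lambda x: f(g(x))            # composition rules build f ∘ g
neg = lambda s: f"not({s})"
print(compose(neg, vp)(texas))                      # not(borders(texas,mexico))

raise_ = lambda x: (lambda f: f(x))                 # type raising: X => T/(T\X)
print(raise_(texas)(vp))                            # borders(texas,mexico)
```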
Given the lexicon and model parameters, CCG is formulated as a log-linear probabilistic model to deal with ambiguity, e.g. duplicated lexical entries for a word, and spurious ambiguity:

P(L, T | S; θ̄) = exp(f̄(L, T, S) · θ̄) / Σ_(L′,T′) exp(f̄(L′, T′, S) · θ̄)

And we can do inference on the model:

L* = arg max_L P(L | S; θ̄) = arg max_L Σ_T P(L, T | S; θ̄)

Features are designed to be local, so we can use dynamic programming (beam search, actually) and prune the search space (CKY-style).
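A generic sketch of this scoring and marginalization (not the actual system's implementation; feature names and candidates are made up):

```python
# Generic sketch of the log-linear parse model: score (L, T) candidates,
# normalize, and marginalize over trees T to rank lexical choices L.
import math
from collections import defaultdict

def best_L(candidates, theta):
    # candidates: list of (L, T, features) with features as {name: value}
    scores = [math.exp(sum(theta.get(k, 0.0) * v for k, v in f.items()))
              for (_, _, f) in candidates]
    Z = sum(scores)                          # partition function over (L, T)
    marg = defaultdict(float)                # P(L | S) = Σ_T P(L, T | S)
    for (L, _, _), s in zip(candidates, scores):
        marg[L] += s / Z
    return max(marg, key=marg.get)

cands = [("L1", "T1", {"lex:borders": 1.0}),
         ("L1", "T2", {"lex:borders": 1.0, "spurious": 1.0}),
         ("L2", "T1", {"lex:border": 1.0})]
print(best_L(cands, {"lex:borders": 0.5}))   # 'L1'
```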
be insufficient if the rules don't cover all the (S, L) pairs.
Examples
Through which states does the Mississippi run?
GENLEX doesn't trigger a category suitable for the through-adjunct placed ahead. Namely, phrase order may need to be relaxed.
relaxed function composition
role-hypothesising type shifting (for missing predicates)
null-head type shifting (for missing arguments)
crossed functional composition
Triggers are added for these new rules, too.
Wikipedia: unification is an algorithmic process of solving equations between symbolic expressions, e.g.
{cons(x, cons(x, nil)) = cons(2, y)} ⇒ {x ↦ 2, y ↦ cons(2, nil)}
Here unification aims to find f and g given h, such that h = λx.f(g(x)) or h = f(g).
For example, the given initial lexical entry
New York borders Vermont := S : next_to(ny, vt)
will be split as
New York borders := S/NP : λx.next_to(ny, x)
Vermont := NP : vt
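A toy validation of such a split, with strings standing in for logical constants; real learners compare lambda terms symbolically up to alpha-equivalence, so this extensional test is only illustrative.

```python
# Toy validation of a lexicon split: a candidate (f, g) is accepted when
# h = f(g). Real systems compare lambda terms symbolically, not extensionally.
h = "next_to(ny,vt)"                  # S : next_to(ny, vt)
f = lambda x: f"next_to(ny,{x})"      # S/NP : λx.next_to(ny, x)
g = "vt"                              # NP : vt
assert f(g) == h                      # the split "New York borders" + "Vermont" is valid
```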
PCCG:

P(y, z | x; θ, Λ) = exp(θ · φ(x, y, z)) / Σ_(y′,z′) exp(θ · φ(x, y′, z′))
f(x) = arg max_z p(z | x; θ, Λ)
p(z | x; θ, Λ) = Σ_y p(y, z | x; θ, Λ)

Again, to compute the parse efficiently:
CKY-style parsing with dynamic programming
summing over y with the inside-outside algorithm
NEW-LEX considers whether to split the lexical entries and produces a new lexicon from arg max_y* p(y* | x_i, z_i; θ′, Λ′).
entry: Step 1, function
New York borders Vermont := S : next_to(ny, vt)
Unification constraints (otherwise there are infinitely many results):
no vacuous variables: g = λx.tex
limited coordination extraction: g contains fewer than N adjuncts
limited application: f contains no new variables for a non-variable subexpression in h, as in h = λx.in(x, tex) with f → λq.q(tex), g → λy.λx.in(x, y)
We can get many (f, g) pairs, among which is:
f → λx.next_to(ny, x)
g → vt
entry: Step 2, syntactic type
New York borders Vermont := S : next_to(ny, vt)
According to the CCG combinatory rules (only 4 here), define SC(A) = FA(A) ∪ BA(A) ∪ FC(A) ∪ BC(A):

FA(X : h) = {(X/Y : f, Y : g) | h = f(g) ∧ Y = C(T(g))}
BA(X : h) = {(Y : g, X\Y : f) | h = f(g) ∧ Y = C(T(g))}
FC(X/Y : h) = {(X/W : f, W/Y : g) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}
BC(X\Y : h) = {(W\Y : g, X\W : f) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}

where T : F → {e, t, F} is the type function and C is defined as:
C(T) = NP            if T = e
C(T) = S             if T = t
C(T) = C(T2)|C(T1)   if T = (T1, T2)
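A direct transcription of the category function C as Python, using '|' for an undirected slash as on the slide:

```python
# Transcription of the category function C: e -> NP, t -> S, and a
# functional type (T1, T2) -> C(T2)|C(T1), with '|' as an undirected slash.
def C(T):
    if T == "e":
        return "NP"
    if T == "t":
        return "S"
    T1, T2 = T                        # functional type (T1, T2)
    res = C(T2)
    if "|" in res:
        res = f"({res})"              # parenthesize nested categories
    return f"{res}|{C(T1)}"

print(C("e"))                  # NP
print(C(("e", "t")))           # S|NP, a one-place predicate
print(C(("e", ("e", "t"))))    # (S|NP)|NP, a two-place predicate
```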
entry: Step 2, syntactic type
New York borders Vermont := S : next_to(ny, vt)
These are some possible pairs from the splitting set:
Semantic: (λx.next_to(ny, x), vt)      Syntactic: (S/NP, NP)
Semantic: (ny, λx.next_to(x, vt))      Syntactic: (NP, S\NP)
Semantic: (λx.next_to(x, vt), ny)      Syntactic: (S/NP, NP)
entry: Step 3, word sequence
New York borders Vermont := S : next_to(ny, vt)
Splitting is defined as
SL(w0:n := A) = {(w0:i := B, wi+1:n := C) | 0 ≤ i < n ∧ (B, C) ∈ SC(A)}
For some specific i, the previous splits may raise problems:
(S/NP : λx.next_to(ny, x), NP : vt)      Sequence: (New York borders, Vermont)
(NP : ny, S\NP : λx.next_to(x, vt))      Sequence: (New York, borders Vermont)
(S/NP : λx.next_to(x, vt), NP : ny)      Sequence: (borders Vermont, New York)  [incorrect]
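A sketch of SL that crosses every word split point with every category split from SC(A); the candidate lists below are the ones from the previous step.

```python
# Sketch of SL: cross every word split point with every category split.
def SL(words, cat_splits):
    return [((words[:i + 1], B), (words[i + 1:], C))
            for i in range(len(words) - 1)
            for (B, C) in cat_splits]

words = ["New", "York", "borders", "Vermont"]
cat_splits = [("S/NP : λx.next_to(ny,x)", "NP : vt"),
              ("NP : ny", "S\\NP : λx.next_to(x,vt)")]
for (left, B), (right, C) in SL(words, cat_splits):
    print(left, B, "||", right, C)
# Most pairings are wrong (e.g. NP : vt fits "Vermont" only when i = 2);
# the learner must score and prune such candidates.
```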
entry: Boston := N/N : λf.λx.from(x, bos) ∧ f(x)
Factored parts:
lexeme, a pair of a word span and a constant list: (Boston, [from, bos])
template: λ(w, v).(w := N/N : λf.λx.v1(x, v2) ∧ f(x))
Two types of factorization:
1. maximal factoring: all constants are in the lexeme
   (Boston, [from, bos]), λ(w, v).(w := N/N : λf.λx.v1(x, v2) ∧ f(x))
2. partial factoring: some constants remain in the template
   (Boston, [bos]), λ(w, v).(w := N/N : λf.λx.from(x, v1) ∧ f(x))
Partial factoring is used for missing words: flights Boston to New York
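A sketch of how a template rebuilds a full entry from a lexeme; the template function here is a hypothetical stand-in for the λ(w, v) notation above.

```python
# Sketch: rebuilding a full lexical entry from (lexeme, template).
# 'template' stands in for λ(w, v).(w := N/N : λf.λx.v1(x, v2) ∧ f(x)).
def template(word, v):
    return (word, "N/N", f"λf.λx.{v[0]}(x,{v[1]}) ∧ f(x)")

lexeme = ("Boston", ["from", "bos"])     # maximal factoring: all constants in the lexeme
print(template(*lexeme))
# ('Boston', 'N/N', 'λf.λx.from(x,bos) ∧ f(x)')
```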
Learning a parser is easy for it:
a few predicates
a few utterances (more than predicates)
If a database has more predicates, and is thus in theory capable of answering more questions, the number of possible utterances grows even further.
What's worse, new utterances can linguistically involve ever more predicates, but a database schema is fixed and supports only limited predicates.
parse to more predicates: unusable on databases
parse to fit the schema: difficult to learn
parsing: use a domain-independent CCG parser (Clark & Curran, 2007) to convert the utterance to an underspecified LF, with a hand-written lexicon:
59 lexical categories with POS tags, assigned to words based on POS tags from Wiktionary
49 domain-independent lexical items (what, when, and, is, etc.)
Ontological Matching: use a series of matching operations M = o1, o2, ···
Structural Matching: Collapse operator, Expansion operator
Constant Matching (replace within the same type)
of Semantic Parsers for Mapping Instructions to Actions
Models robot instructions and gets feedback from robot actions.
Reddy et al., TACL 2014, Large-scale Semantic Parsing without Question-Answer Pairs
Uses ClueWeb09 and FACC1, and a general CCG parser to build an LF, which is then converted to an ungrounded graph sharing commonalities with Freebase.
Artzi et al., EMNLP 2015, Broad-coverage CCG Semantic Parsing with AMR
To deal with co-reference in AMR, uses Skolem terms (Steedman, 2011) to build an underspecified LF, which is then mapped to a specified LF.
that gives e, f:

f* = m(arg max_{d∈D(G|e)} Pr(d | e; λ))
Pr_λ(d | e) = (1/Z_λ(e)) exp(Σ_i λ_i f_i(d))

Features (3000 or so):
the number of times each rule r is used in the derivation
the number of each word w generated from gaps
Parameter estimation: the Viterbi algorithm is used for efficient decoding. Since the gold derivation is latent, EM is used to find the optimal parameters.
Step 1: supposing we know the grammar, an LF can be converted to the production (rule) sequence of its left-most top-down derivation (by convention). We prefer the derivation sequence over the LF because an MR may not be well-formed without the grammar, and MR tokens can be polysemous or carry no specific meaning.
Examples (sentence and LF from RoboCup):
((bowner our {4}) (do our {6} (pos (left (half our)))))
If our player 4 has the ball, then our player 6 should stay in the left side of our half.
Step 3: extract bottom-up, starting with rules whose RHS contains only terminals, then those whose RHS contains non-terminals.
UNUM → 4, 4
TEAM → our, our
CONDITION → TEAM player UNUM has {1} ball , (bowner TEAM {UNUM})
Step 3, special case: rules that do not derive any terminal, and that break links outside this sub-parse tree. Merge the rules as:
REGION → (left (penalty-area TEAM))
For excessively merged rules (overfitting), try a greedy link-removal policy (alignment fixing).
for semantic functions instead. To improve NL-MR isomorphism, find an MST on a graph where edges between rules are established for any shared variable and weighted by the minimal word distance.
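A sketch of the MST step using Kruskal's algorithm; the rule graph and weights below are invented for illustration.

```python
# Sketch: minimum spanning tree over rules, with edges between rules that
# share a variable, weighted by the minimal word distance between the words
# they cover (Kruskal's algorithm with union-find).
def mst(n_nodes, edges):                  # edges: (weight, u, v)
    parent = list(range(n_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path compression
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):             # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# rules 0..3; weights = hypothetical minimal word distances
print(mst(4, [(1, 0, 1), (4, 0, 2), (2, 1, 2), (3, 2, 3)]))
# [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
```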
an existing syntactic parser
a learned lexicon from words to predicates
a learned set of composition rules
Assumption: an unambiguous CFG of the LF is known
sub-tree strictly follows the syntactic derivation. Introduce macro-predicates when children MRs can't combine:
the child MR becomes an argument if it is complete
otherwise it becomes part of the predicate
learned with GIZA++ (like Wong and Mooney, 2006):
if a predicate is not aligned to any word, the predicate is inferred and just bound to its values in the MR
if a predicate is aligned to several words, split it into several alignments
Composition rules are learned in the form Λ1.P1 + Λ2.P2 ⇒ {Λp.Pp, R}:
λa1.λa2.P_PLAYER + P_UNUM ⇒ {λa1.P_PLAYER, a2 = c2}
Disambiguation model: max-ent trained with L-BFGS:

Pr(D | S, T; θ̄) = exp(Σ_i θ_i f_i(D)) / Z(S, T)
et al. 2014 to transform to a grounded graph. New operators deal with mismatches between the dependency parse and the semantic parse:
CONTRACT: merge some nodes and edges into a single node
EXPAND: add edges for some disjoint nodes (errors in the dependency parse)
(Damonte et al., 2016)
A greedy transition-based parser for AMR, inspired by earlier work on the transition-based syntactic parser ArcEager (Nivre 2004, 2008), using an existing dependency parser.
Imitation learning for AMR (Goodman et al., 2016)
A transition-based parser with imitation learning, extended by techniques like noise reduction and targeted exploration, using an existing dependency parser.
to adapt to unseen inputs, consider the entire meaning space instead of rule extraction from training data.

F_w(x) = arg max_{y,z} wᵀΦ(x, y, z)
       = arg max_{α,β} Σ_{c∈X, s∈D} α_cs · wᵀΦ1 + Σ_{c,d∈X; s,t∈D} β_cs,dt · wᵀΦ2

α_cs: word span c is aligned with symbol s
β_cs,dt: word span d is aligned with t while α is activated, subject to:
a constituent is associated with one symbol
β_cs,dt is activated iff α_cs and α_dt are activated
if β_cs,dt is activated, then s is a function and (s, t) is type-consistent
functional composition is directional and acyclic
Features used:
1st-order: stemmed word match
1st-order: similarity based on WordNet (Do et al., 2010)
2nd-order: normalized distance, on the sentence's dependency tree, between the head words of c and d, for β_cs,dt
2nd-order: symbol co-occurrence frequency (regardless of alignments)
Idea: if a pattern is produced multiple times by a non-random model, it is likely to indicate an underlying phenomenon in the data.
Confidence: output structures close to the center of statistical mass receive a high confidence score.
Confidence-driven: the model is significantly improved compared with using only the prediction score wᵀΦ(x, y, z).
Parsing is the same as in Clarke et al. 2010, formulated as an ILP problem.
and edges of a dependency parse.
No need to train specific tokens: datetime, numerics, logical operators.
States: "states" are from the DB schema (entity/attribute); complex states are included for mismatches between the dependency parse and the semantics.
Inference:
z* = arg max_z P_θ(d, z)
P_θ(d, z) = (1/Z) exp(Σ_i f_i(d, z) · w_i(d, z))
Learning:
θ* = arg max_θ Σ_{d∈D} log Σ_z P_θ(d, z)
They discuss methods to use an existing parser in a new domain.
Berant et al., 2013: collect the WebQuestions dataset.
Yih et al., 2014: a CNN-based Semantic Model (CNNSM).
Yih et al., 2015: Staged Query Graph Generation; find a core inferential chain executed on Freebase.
Pasupat and Liang, 2015: SP on semi-structured tables; propose a dataset of tables and a method that first converts tables to a knowledge graph.
for a new domain (8 published datasets):
humans write a lexicon
the domain-general grammar G induces canonical LFs and utterances
crowdsourcing rewrites the awkward utterances into fluent ones
train a parser using the grammar G, by paraphrasing
proved successful to use sequence models on grammar parsing. Xiao et al. compared various sequence forms based on SPO (Wang et al., 2015):
LF as a raw token sequence
DSP (Derivation Sequence Prediction)
DSP-C (Constrained): use the grammar to constrain the next rule at test time
DSP-CL (Constrained Loss): p(y_t) is normalized only over grammatically possible values
CFP (Canonical Form Prediction): predict the CF instead, which is then parsed to the LF
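A sketch of DSP-C-style constrained decoding: mask the next-rule distribution so that only grammar-licensed rules receive probability. The rule names and scores are hypothetical.

```python
# Sketch of grammar-constrained decoding: at each step, renormalize the
# model's rule scores over only the rules the grammar allows next.
import math

def constrained_step(logits, licensed):
    # logits: {rule: score}; licensed: set of rules the grammar allows next
    masked = {r: s for r, s in logits.items() if r in licensed}
    Z = sum(math.exp(s) for s in masked.values())
    return {r: math.exp(s) / Z for r, s in masked.items()}

logits = {"NP->city": 2.0, "NP->state": 1.5, "S->answer(NP)": 0.1}
print(constrained_step(logits, licensed={"NP->city", "NP->state"}))
# probability mass goes only to the two grammar-licensed NP rules
```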
use an existing grammar or a learned grammar.
Ontology Mismatch: paraphrasing or other two-phase parsing.
Utterance Explosion: prefer expansion from the meaning space over rule extraction from training data.
Isomorphism between NL (or the syntactic parse) and the semantic parse: add relaxing extensions, because NL isn't strict and the syntactic parse may introduce errors.