Semantic Parsing Methods: An Overview
Haruki Kirigaya, 2016.09.30

Agenda
1 Semantics
2 Parsing
3 Summary

Background
When it comes to understanding natural language sentences, NLP researchers work at various granularities. These tasks differ in the amount of information they extract.
  Information Extraction (less informative): is_a(Obama, PRESIDENT)
  Summarization (modestly informative): Obama wins.
  Semantic Parsing (exact matching): ∃e.beat(e) ∧ Sub(e, Obama) ∧ Obj(e, Romney)
Caveat: "semantic" here is about composition rather than telling word senses apart.

Semantic Parsing Task
The key task of semantic parsing is to find an f such that
  f : Sentence → LogicForm
Generally, there are three aspects a semantic parser needs to take into consideration:
  Modelling: how to represent a logic form
  Parsing: design a grammar and a parsing algorithm
  Learning: use supervision to fit parameters

Agenda
1 Semantics
  Davidsonian Representation
  MRS
  AMR
2 Parsing
3 Summary

Logic Form from Example
Brutus stabs Caesar.
  stab(Brutus, Caesar)  (predicate)
Brutus stabs Caesar with a knife.
  stab(Brutus, Caesar, knife)  (n-ary predicate)
Brutus stabs Caesar in the agora.
  stab(Brutus, Caesar, agora)  (ambiguous predicate)
Brutus stabs Caesar in the agora with a knife.
  stab(Brutus, Caesar) & with(knife) & in(agora)  (move adjuncts apart)

Logic Form from Example
Brutus stabs Caesar in the agora with a knife.
  stab(Brutus, Caesar) & with(knife) & in(agora)
Brutus stabs Caesar with a knife in the agora and twisted it hard.
  stab(Brutus, Caesar) & with(knife) & in(agora) & twist(Brutus, knife) & hard
The standard predicate calculus has problems:
  it is unable to refer to predicates;
  natural language is flexible in the number of arguments:
    Pass the axe. / Pass me the axe.

Davidsonian Representation
Semantics is characterized in terms of events. We don't know an event beforehand, so we existentially quantify over it.
Brutus stabs Caesar with a knife in the agora and twisted it hard.
  ∃e.stab(e, Brutus, Caesar) ∧ with(e, knife) ∧ in(e, agora) ∧ (∃e′.twist(e′, Brutus, knife) ∧ hard(e′))
Caesar is stabbed.
  ∃x∃e.stab(e, x, Caesar)
Missing arguments are left as placeholders.

Problems with the Davidsonian Approach
Consider the following sentence:
  In a dream last night, I was stabbed, although in fact nobody had stabbed me and I wasn't stabbed with anything.
There is NOBODY here to initiate the stabbing event. Should the representation correspond to the utterance rather than to reality?

neo-Davidsonian Representation (Parsons, 1995)
Replace arguments (and placeholders) with independent conjuncts. Basically, two roles are important: Agent and Theme/Patient.
Brutus stabbed Caesar in the back with a knife.
  ∃e.stab(e) ∧ Agent(e, Brutus) ∧ Patient(e, Caesar) ∧ with(e, knife) ∧ in(e, back)

Advantages of the neo-Davidsonian (Palmer, 2014)
(1) Entailment. Given the following sentences
  A. Brutus stabbed Caesar in the back with a knife.
  B. Brutus stabbed Caesar in the back.
  C. Brutus stabbed Caesar with a knife.
we know A → B ∧ C, but NOT B ∧ C → A.
The neo-Davidsonian representation preserves this phenomenon. Let Agt = Agent, B = Brutus, C = Caesar, Pat = Patient:
  A. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back) ∧ with(e, knife)
  B. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back)
  C. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ with(e, knife)
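To make the conjunct-dropping intuition concrete, here is a minimal Python sketch (my illustration, not something from the slides). Flat neo-Davidsonian LFs are modeled as sets of atoms and entailment as conjunct subset; the failure of B ∧ C → A falls out once the two existentially quantified event variables are renamed apart:

A = {("stab", "e"), ("Agt", "e", "Brutus"), ("Pat", "e", "Caesar"),
     ("in", "e", "back"), ("with", "e", "knife")}
B = {("stab", "e"), ("Agt", "e", "Brutus"), ("Pat", "e", "Caesar"),
     ("in", "e", "back")}
C = {("stab", "e"), ("Agt", "e", "Brutus"), ("Pat", "e", "Caesar"),
     ("with", "e", "knife")}

def entails(premise, conclusion):
    # Entailment by conjunct dropping: every conclusion atom is in the premise.
    return conclusion <= premise

assert entails(A, B) and entails(A, C)   # A -> B and A -> C

def rename(lf, old, new):
    # Existential variables are bound per formula, so conjoining B and C
    # first renames their event variables apart.
    return {tuple(new if t == old else t for t in atom) for atom in lf}

B_and_C = rename(B, "e", "e1") | rename(C, "e", "e2")
assert not entails(B_and_C, A)           # B and C need not describe one event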

Advantages of the neo-Davidsonian
(2) Scope. The traditional approach uses scope to connect an adjunct and a verb:
  x stabbed y violently with z
There are two logically equivalent representations with different scope settings:
  (with z (violently (stab(y)))) (x)
  (violently (with z (stab(y)))) (x)
A flat representation like the neo-Davidsonian keeps the meaning consistent and introduces no explicit syntactic scope. Flatness and scope come up again later in these slides.

Advantages of the neo-Davidsonian
(3) Temporal and Causal Sentences
Mary saw Brutus stab Caesar.
  Traditional way: Mary saw Brutus & Brutus stabbed Caesar.
  neo-Davidsonian way: ∃e.see(e) ∧ Agt(e, Mary) ∧ (∃e′.stab(e′) ∧ Agt(e′, Brutus) ∧ Pat(e, e′))
After the singing of the national anthem, they saluted the flag.
After the national anthem was sung, they saluted the flag.
  ∃e.salute(e) ∧ Agt(e, they) ∧ Pat(e, flag) ∧ (∃e′.sing(e′) ∧ Agt(e′, they) ∧ Pat(e′, NationalAnthem) ∧ after(e, e′))

Possible Problems of the neo-Davidsonian
I sold him a car for $50,000.
  Which is the patient, the car or the $50,000?
I sold a car for Mary for $50,000.
  The same preposition carries different meanings.
Mary fed her baby.
  Can the baby, who is feeding, be the agent?
Brutus stabbed Caesar with a knife.
  Removing Brutus may be different from removing the knife.
Brutus stabbed Caesar once.
  It is hard to specify that the event happened only once in the neo-Davidsonian style.
A saw B leave. When B left, he had the documents in his briefcase.
  Does it follow that A saw B leave with the documents in his briefcase? If both leave events are the same, so that the inference goes through, how could A see one without seeing the other?

Summary of the neo-Davidsonian
The neo-Davidsonian has several characteristics in representing semantics. Some are advantages, while others are simply one choice among various approaches:
  uses variables and is flat
  event-style: an event is unique in its time of occurrence
  event arguments are moved into roles as independent conjuncts
  modifiers (adjectives, adverbs, adjuncts) are conjoined predicates
  transparent scope facilitates logical inference


Minimal Recursion Semantics (Copestake, 2005)
MRS is another flat semantic framework, serving as the basis of English Resource Semantics (ERS) and the English Resource Grammar (ERG). Its design goals:
  Expressive adequacy: the ability to express meaning correctly
  Grammatical compatibility: the ability to link representations to grammatical information
  Computational tractability: the ability to compare two representations (equality, relation, etc.)
  Underspecifiability: leave semantic distinctions unresolved

An MRS Example
Every big white horse sleeps.
  h0: every(x)
  h1: big(x), h1: white(x), h1: horse(x)
  h2: sleep(x)

Why a Flat Form
In MT and other tasks, a structural representation is hard to use and unnecessary.
Example
  Sentence: white English horse
  Rule: white(horse)(x) ↔ Schimmel(x)
  Form: white(English(horse))(x)
  In the nested form the transfer rule cannot apply, because English(·) intervenes between white and horse.
Example
  Sentence: The beginning of spring arrived.
  Rule: beginning of spring ↔ Frühlingsanfang
  Form 1: def_q(x, spring(x), the(y, beginning(y, x), arrive(y)))
  Form 2: the(y, def_q(x, spring(x), beginning(y, x), arrive(y)))

Why a Flat Form
A flat form is a group of elementary predications.
Example
  Sentence: white English horse
  Rule: white(horse)(x) ↔ Schimmel(x)
  Form: white(x) & English(x) & horse(x)
Example
  Sentence: The beginning of spring arrived.
  Rule: beginning of spring ↔ Frühlingsanfang
  Form: the(y) & beginning(y, x) & def(x) & spring(x) & arrive(e, y)

Underspecifiability in MRS
A sentence may have several scoped readings while sharing one underspecified MRS:
  Every dog chases some white cat.
Leave some handles unspecified, then specify them later:
  plain equality constraints, e.g. h0 = h1, h3 = h5, h7 = h4, with h3 ≠ h7 so the result stays a tree
  qeq constraints: h0 =q h5 is a trivial example

MRS, Formally
An MRS is a quadruple {GT, LT, R, C}:
  GT: the global top, h0
  LT: local tops, h1, h4, h5 (the semantics of local phrases)
  R: relations, e.g. h1:every(x, h2, h3), h5:dog(y, h6, h7), h4:chase(x), etc.
  C: constraints, e.g. h0 qeq h4, etc.
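The handle machinery can be made concrete with a toy search over scopings. The sketch below is my own illustration rather than the algebra of Copestake (2005): quantifier bodies and the global top are holes, labels get plugged into holes, and a plugging counts as a reading only when every floating label ends up dominated by the top. For "Every dog chases some white cat" exactly the two expected readings survive:

from itertools import permutations

# Floating labels that still need a place in the scope tree:
floating = ["h1", "h4", "h9"]        # every(...), some(...), chase(x, y)
holes = ["top", "h3", "h7"]          # global top, body of every, body of some
BODY = {"h1": "h3", "h4": "h7"}      # which hole each quantifier's body opens
# (Restrictions are already plugged: dog under every, white cat under some.)

def dominates(plug, hole, target):
    # Is `target` reachable from `hole` by following plugged quantifier bodies?
    label = plug.get(hole)
    if label is None:
        return False
    if label == target:
        return True
    return dominates(plug, BODY[label], target) if label in BODY else False

readings = [dict(zip(holes, labels))
            for labels in permutations(floating)
            if all(dominates(dict(zip(holes, labels)), "top", l) for l in floating)]

for r in readings:
    print(r)
# {'top': 'h1', 'h3': 'h4', 'h7': 'h9'}  -> every > some
# {'top': 'h4', 'h3': 'h9', 'h7': 'h1'}  -> some > every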

Highlights of MRS
Scopal relationships are reified as handles, so that syntactically the language looks first-order.
Underspecifiability is preserved.


Abstract Meaning Representation (Banarescu et al., 2013)
AMR is a semantic representation that
  is a rooted, directed, labeled graph
  is identical across different utterances of the same meaning
  uses variables for co-reference
  uses PropBank frames (analogous to roles in the neo-Davidsonian)
  defines non-core relations outside PropBank (analogous to adjuncts in the neo-Davidsonian)
Specification: https://github.com/amrisi/amr-guidelines/blob/master/amr.md

An AMR Example
Brutus stabbed Caesar with a knife in the back in the agora and twisted it hard.
(s / stab
   :ARG0 (p / person
            :name (n / name :op1 "Brutus")
            :ARG0-of (t / twist
                        :ARG1 k
                        :manner (h / hard)))
   :ARG1 (p2 / person :name (n2 / name :op1 "Caesar"))
   :ARG2 (k / knife)
   :ARG3 (b / back)
   :location (a / agora))
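The same graph fits in an ordinary nested data structure. The encoding below is mine (not an official AMR API); a traversal emits (variable, relation, value) triples, and the bare variable k under the :ARG1 of twist shows how re-entrant variables encode co-reference:

AMR = ("s", "stab", [
    (":ARG0", ("p", "person", [
        (":name", ("n", "name", [(":op1", '"Brutus"')])),
        (":ARG0-of", ("t", "twist", [(":ARG1", "k"),
                                     (":manner", ("h", "hard", []))])),
    ])),
    (":ARG1", ("p2", "person", [(":name", ("n2", "name", [(":op1", '"Caesar"')]))])),
    (":ARG2", ("k", "knife", [])),
    (":ARG3", ("b", "back", [])),
    (":location", ("a", "agora", [])),
])

def triples(node):
    var, concept, edges = node
    yield (var, "instance", concept)
    for rel, child in edges:
        if isinstance(child, tuple):        # a nested node
            yield (var, rel, child[0])
            yield from triples(child)
        else:                               # re-entrant variable or constant
            yield (var, rel, child)

for t in triples(AMR):
    print(t)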

Event Frames Arise from Various POS
Verb
Noun
  the destruction of the city by the God
  (d / destroy-01
     :ARG0 (g / God)
     :ARG1 (c / city))
  the bond investor
  (p / person
     :ARG0-of (i / invest-01
                 :ARG1 (b / bond)))
  but professor does not yield an event frame
Adjective
  the attractive spy
  (s / spy
     :ARG0-of (a / attract-01))

Reification: Frames from Non-Core Relations
An adjunct expressed as a non-core relation in AMR must serve as a role of the relation itself, rather than of any object participating in that relation.
Examples
  the marble in the jar
  (m / marble :location (j / jar))
  the marble is not in the jar
  (b / be-located-at-91
     :ARG1 (m / marble)
     :ARG2 (j / jar)
     :polarity -)
Semantic error:
  (m / marble :location (j / jar :polarity -))
  which reads as "the marble is in the non-jar".

Other Language Phenomena Defined in AMR
AMR defines approximately 100 relations for language phenomena:
  negation and modals
  interrogatives and wh-questions
  named entities
  location, source, destination, path
  cause, concession, condition
  quantities, dates, times
  links to Wikipedia articles: :wiki "Barack Obama"
  ...

AMR Data Overview
1. Annotated corpora
  The Little Prince, 1274:145:143
  The Little Prince, Chinese version, 1274:145:143
  Bio AMR corpus from PubMed (cancer) articles, 5452:500:500
  LDC General Release 1.0 (June 2014), 13051 AMRs in all; a new general release is due in summer 2016
2. Evaluation: the smatch metric, which compares two AMRs
3. SemEval-2017 Task 9: Parsing and Generation
  English biomedical data to AMR (cf. SemEval-2016 Task 8)
  AMR-to-English generation
4. A Python parser: https://github.com/nschneid/amr-hackathon

Chinese AMR Corpus Example (shown as a figure)

AMR Editor
A simple web editor for building AMRs.


Parsing Methods
There are many semantic parsing paradigms. Some are purpose-built methods, while others adapt ideas from other domains or tasks to semantic parsing:
  Shift-Reduce (LR) (1993)
  Combinatory Categorial Grammar (2005)
  Word Alignment / Synchronous CFG (2006)
  Generative Models (2008)
  Syntactic Parse to Semantic Parse (2009)
  Weak Supervision and Unsupervised Methods (2010)
  Large-Scale SP for Freebase and QA (2013)
  Paraphrase-Driven SP (2014)
  Neural Semantic Parsing (2015)

Agenda
1 Semantics
2 Parsing
  Shift-Reduce
  CCG
  Word Alignments
  Semantic Parsing from Syntactic Parses
  Weak and Unsupervised Parsers
  Paraphrase-Driven Parsing
  Neural Semantic Parsing
3 Summary

Inductive Logic Programming (Zelle et al., 1993)
Shift-reduce is a simple bottom-up parsing scheme. Each action corresponds to a Prolog clause.
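For a feel of the mechanics, here is a toy deterministic shift-reduce parser in Python. It is an invented illustration with a two-word lexicon; CHILL itself operates over Prolog clauses and, as described below, learns control rules for when each operator may fire:

LEXICON = {"capital": "capital(_)", "texas": "texas"}

def reduce_once(stack):
    # REDUCE: a function category followed by a completed argument,
    # e.g. capital(_) + texas => capital(texas)
    if len(stack) >= 2 and stack[-2].endswith("(_)"):
        fn, arg = stack[-2], stack[-1]
        stack[-2:] = [fn.replace("_", arg)]
        return True
    return False

def parse(tokens):
    stack = []
    for tok in tokens:                  # SHIFT content words, skip the rest
        if tok in LEXICON:
            stack.append(LEXICON[tok])
        while reduce_once(stack):       # REDUCE greedily whenever possible
            pass
    assert len(stack) == 1, "parse did not converge"
    return f"answer({stack[0]})"

print(parse("what is the capital of texas".split()))
# answer(capital(texas))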

Inductive Logic Programming (Zelle et al., 1993)
CHILL (Constructive Heuristic Induction for Language Learning):
  Find generalization: merge clauses so long as the result covers no negative sample.
  Reduce definition: prefer the new clause for proving positive examples.

CHILL on GeoQuery (Zelle et al., 1996) (worked examples shown as figures)


Combinatory Categorial Grammar (Steedman, 1996, 2000)
CCG comes with a lexicon whose entries pair a word with a category:
  borders := (S\NP)/NP : λx.λy.borders(y, x)
    word: borders
    syntactic type: (S\NP)/NP
    semantic type: λx.λy.borders(y, x)

Combinatory Categorial Grammar (Steedman, 1996, 2000)
Categories can be combined.
forward and backward application
  A/B : f  +  B : x  ⇒  A : f(x)
  B : x  +  A\B : f  ⇒  A : f(x)
forward and backward composition
  A/B : f  +  B/C : g  ⇒  A/C : f ∘ g
  B\C : g  +  A\B : f  ⇒  A\C : f ∘ g
type raising
  X ⇒ T/(T\X)
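These combinators translate directly into higher-order functions. A small sketch (my rendering: categories live in comments, semantics as Python lambdas standing in for λ-terms):

def fwd_apply(f, x):
    # A/B : f  +  B : x  =>  A : f(x)
    return f(x)

def fwd_compose(f, g):
    # A/B : f  +  B/C : g  =>  A/C : lambda a: f(g(a))
    return lambda a: f(g(a))

# borders := (S\NP)/NP : λx.λy.borders(y, x)
borders = lambda x: lambda y: ("borders", y, x)

vp = fwd_apply(borders, "vermont")   # S\NP : λy.borders(y, vermont)
s = fwd_apply(vp, "new_york")        # backward application is the same f(x)
print(s)                             # ('borders', 'new_york', 'vermont')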

Combinatory Categorial Grammar (Steedman, 1996, 2000)
A CCG parse example (shown as a figure).

Semantic Parsing using CCG on GeoQuery (Zettlemoyer and Collins, 2005)
Given the lexicon and model parameters, CCG parsing is formulated as a log-linear probabilistic model to deal with ambiguity (e.g., duplicate lexical entries for a word, and spurious ambiguity):
  P(L, T | S; θ) = exp(f(L, T, S) · θ) / Σ_(L′,T′) exp(f(L′, T′, S) · θ)
We can then do inference on the model:
  L* = arg max_L P(L | S; θ) = arg max_L Σ_T P(L, T | S; θ)
Features are local, so we can use dynamic programming (beam search, actually) and prune the search space, CKY-style.
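Concretely, the log-linear distribution is a softmax over the feature scores of candidate parses; a tiny sketch with invented feature vectors:

import math

def parse_prob(feats, theta, all_feats):
    # P(parse) = exp(f . theta) / sum over all candidate parses
    def score(fs):
        return math.exp(sum(t * f for t, f in zip(theta, fs)))
    return score(feats) / sum(score(fs) for fs in all_feats)

candidates = [[1.0, 0.0], [0.0, 1.0]]   # feature vectors of two (L, T) parses
theta = [0.5, -0.2]
probs = [parse_prob(f, theta, candidates) for f in candidates]
print(probs, sum(probs))                # the two probabilities sum to 1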

Learning the Model (Zettlemoyer and Collins, 2005)
The parameters are learned with stochastic gradient descent.

Learning the Lexicon (Zettlemoyer and Collins, 2005)
GENLEX(S, L) = {x := y | x ∈ W(S), y ∈ C(L)}
  W(S): all subsequences of S
  C(L): the categories produced by the rules that L triggers
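A schematic GENLEX (heavily simplified; the two trigger rules below are toy stand-ins for the paper's rule templates): pair every word subsequence of S with every category fired by the triggers that L activates, and let training prune the cross product:

def subsequences(sentence):
    # W(S): contiguous word subsequences of the sentence
    toks = sentence.split()
    return [" ".join(toks[i:j]) for i in range(len(toks))
            for j in range(i + 1, len(toks) + 1)]

def categories(lf):
    # C(L): categories fired by trigger rules
    cats = []
    if "texas" in lf:
        cats.append("NP : texas")                 # constant trigger
    if "capital" in lf:
        cats.append("N : λx.capital(x)")          # predicate trigger
    return cats

def genlex(sentence, lf):
    return {(x, y) for x in subsequences(sentence) for y in categories(lf)}

pairs = genlex("the capital of texas", "capital(texas)")
print(len(pairs), "candidate lexical entries")    # pruned later in training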

Problems with ZC05
GENLEX is controlled by rules, and it is insufficient when the rules do not cover all (S, L) pairs.
Example
  Through which states does the Mississippi run?
  GENLEX does not trigger a category suitable for the through-adjunct placed ahead; namely, constraints on phrase order need to be relaxed.

Relaxed Combinatory Rules (Zettlemoyer and Collins, 2007)
  relaxed function application
  relaxed function composition
  role-hypothesising type shifting (for missing predicates)
  null-head type shifting (for missing arguments)
  crossed functional composition
Triggers are added for these new rules, too.

Online Learning (Zettlemoyer and Collins, 2007)
A perceptron-style learner is used instead. New features are also added.

Problems with ZC07
GENLEX still needs hand-written rules.

CCG Induction using Unification (Kwiatkowski et al., 2010)
Unification, per Wikipedia, is an algorithmic process of solving equations between symbolic expressions, e.g.
  {cons(x, cons(x, nil)) = cons(2, y)}  ⇒  {x → 2, y → cons(2, nil)}
Here unification aims to find f and g given h, such that h = λx.f(g(x)) or h = f(g).
For example, the given initial lexical entry
  New York borders Vermont := S : next_to(ny, vt)
will be split into
  New York borders := S/NP : λx.next_to(ny, x)
  Vermont := NP : vt

CCG Induction using Unification (Kwiatkowski et al., 2010)
Parsing with a PCCG:
  P(y, z | x; θ, Λ) = exp(θ · φ(x, y, z)) / Σ_(y′,z′) exp(θ · φ(x, y′, z′))
  f(x) = arg max_z p(z | x; θ, Λ)
  p(z | x; θ, Λ) = Σ_y p(y, z | x; θ, Λ)
Again, to compute parses efficiently:
  CKY-style parsing with dynamic programming
  summing over y with the inside-outside algorithm

CCG Induction using Unification (Kwiatkowski et al., 2010)
Learning algorithm: NEW-LEX decides whether to split lexical entries, producing a new lexicon from arg max_y* p(y* | x_i, z_i; θ′, Λ′).

Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 1: the function.
  New York borders Vermont := S : next_to(ny, vt)
Unification constraints (otherwise there are infinitely many results):
  no vacuous variables (disallow g = λx.tex, whose x is unused)
  limited coordination extraction: g contains fewer than N adjuncts
  limited application: f contains no new variables for a non-variable subexpression of h, e.g. for h = λx.in(x, tex): f → λq.q(tex), g → λy.λx.in(x, y)
We can get many (f, g) pairs, among which:
  f → λx.next_to(ny, x)
  g → vt

Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 2: the syntactic type.
  New York borders Vermont := S : next_to(ny, vt)
According to the CCG combinatory rules (only the four above), define SC(A) = FA(A) ∪ BA(A) ∪ FC(A) ∪ BC(A), where
  FA(X : h) = {(X/Y : f, Y : g) | h = f(g) ∧ Y = C(T(g))}
  BA(X : h) = {(Y : g, X\Y : f) | h = f(g) ∧ Y = C(T(g))}
  FC(X/Y : h) = {(X/W : f, W/Y : g) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}
  BC(X\Y : h) = {(W\Y : g, X\W : f) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}
with T : F → {e, t, ⟨·, ·⟩} the type function and C defined as
  C(T) = NP if T = e;  S if T = t;  C(T2)|C(T1) if T = ⟨T1, T2⟩

Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 2: the syntactic type.
  New York borders Vermont := S : next_to(ny, vt)
Some possible pairs from the splitting set:
  semantics (λx.next_to(ny, x), vt), syntax (S/NP, NP)
  semantics (ny, λx.next_to(x, vt)), syntax (NP, S\NP)
  semantics (λx.next_to(x, vt), ny), syntax (S/NP, NP)

Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 3: the word sequence.
  New York borders Vermont := S : next_to(ny, vt)
Splitting is defined as
  SL(w_0:n  A) = {(w_0:i  B, w_i+1:n  C) | 0 ≤ i < n ∧ (B, C) ∈ SC(A)}
For a specific i, the previous splits pair up as:
  (S/NP : λx.next_to(ny, x), NP : vt) with sequence (New York borders, Vermont)
  (NP : ny, S\NP : λx.next_to(x, vt)) with sequence (New York, borders Vermont)
  (S/NP : λx.next_to(x, vt), NP : ny) with sequence (borders Vermont, New York): incorrect

Problems with Kwiatkowski et al., 2010
The learned CCG lexicon is too big.

Factored Lexicons in CCG (Kwiatkowski et al., 2011)
Original lexical entry:
  Boston := N/N : λf.λx.from(x, bos) ∧ f(x)
Factored parts:
  lexeme, a pair of a word span and a constant list: (Boston, [from, bos])
  template: λ(w, v).(w := N/N : λf.λx.v1(x, v2) ∧ f(x))
Two types of factorization:
  1. maximal factoring: all constants go into the lexeme
     (Boston, [from, bos]),  λ(w, v).(w := N/N : λf.λx.v1(x, v2) ∧ f(x))
  2. partial factoring: some constants remain in the template
     (Boston, [bos]),  λ(w, v).(w := N/N : λf.λx.from(x, v1) ∧ f(x))
Partial factoring handles missing words: flights Boston to New York
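The factoring amounts to currying a lexical entry into data plus a function. A rough sketch with entries as strings (my formatting, not the paper's code):

def template(word, v):
    # λ(w, v). w := N/N : λf.λx.v1(x, v2) ∧ f(x)
    return f"{word} := N/N : λf.λx.{v[0]}(x, {v[1]}) ∧ f(x)"

lexeme = ("Boston", ["from", "bos"])
print(template(*lexeme))
# Boston := N/N : λf.λx.from(x, bos) ∧ f(x)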

Factored Lexicons in CCG (Kwiatkowski et al., 2011)
Learning is similar, but must take factorization into account (algorithm shown as a figure).

The Ontological Mismatch Problem
GeoQuery and ATIS are small datasets, and learning a parser for them is easy:
  a few predicates
  a few utterances (more than predicates)
If a database has more predicates, and is thus in theory able to answer more questions, the number of possible utterances grows even further. What's worse, new utterances may in principle involve ever more predicates linguistically, while the database schema is fixed and supports only a limited set. So there is a dilemma:
  parse onto more predicates: unusable on the database
  parse to fit the schema: difficult to learn

SP with On-the-fly Ontology Matching (Kwiatkowski et al., 2013)
Choose to convert Q1 to MR1 and Q2 to MR2, then MR2 to MR1, using Q-A pairs.

SP with On-the-fly Ontology Matching (Kwiatkowski et al., 2013)
The parsing framework:
Domain-independent parsing
  Use a domain-independent CCG parser (Clark and Curran, 2007) to convert the utterance into an underspecified LF, with a hand-written lexicon:
    59 lexical categories with POS tags, assigned to words based on POS tags from Wiktionary
    49 domain-independent lexical items (what, when, and, is, etc.)
Ontological matching
  Use a series of matching operations M = o1, o2, ...
    structural match: a collapse operator and an expansion operator
    constant matching (replacement within the same type)


SP with On-the-fly Ontology Matching (Kwiatkowski et al., 2013)
Parsing and learning:
  Parse(x, O) = arg max_(d ∈ GEN(x, O)) Score(d)
  Score(d) = φ(d) · θ = φ(Π) · θ + Σ_(o ∈ M) φ(o) · θ

SP with On-the-fly Ontology Matching (Kwiatkowski et al., 2013)
Features:
  CCG parse features (Π): counts of each category, each (word, category) pair, and each (POS, category) pair
  structural features (M): identity of complex-typed constants and of domain-independent constants
  lexical features (M): for (c_u, c_O): φ_np, φ_stem, φ_syn, φ_fp:stem, φ_def-overlap
  knowledge-base features (execute y on K): φ_direct, φ_join, φ_empty, φ_0, φ_1

Other CCG Work
Artzi et al., TACL 2013, Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions
  Models robot instructions and gets feedback from the robot's actions.
Reddy et al., TACL 2014, Large-scale Semantic Parsing without Question-Answer Pairs
  Uses ClueWeb09, FACC1, and a general CCG parser to build an LF, which is then converted to an ungrounded graph sharing commonalities with Freebase.
Artzi et al., EMNLP 2015, Broad-coverage CCG Semantic Parsing with AMR
  To deal with co-reference in AMR, uses Skolem terms (Steedman, 2011) to build an underspecified LF, which is then mapped to a specified LF.


Learning SP with SMT Methods (Wong and Mooney, 2006)
Synchronous CFG rule: X → ⟨α, β⟩ (pattern and template)

Learning SP with SMT Methods (Wong and Mooney, 2006)
Parsing: enumerate derivations that yield e and f:
  f* = m(arg max_(d ∈ D(G|e)) Pr(d | e; λ))
  Pr_λ(d | e) = (1 / Z_λ(e)) exp Σ_i λ_i f_i(d)
Features (around 3000):
  how many times each rule r is used in the derivation
  how many times each word w is generated from gaps
Parameter estimation: the Viterbi algorithm gives efficient decoding; since the gold derivation is latent, EM finds the optimal parameters.

Learning SP with SMT Methods (Wong and Mooney, 2006)
Grammar rule acquisition, step 1: given the grammar, an LF can be converted into the production (rule) sequence of its left-most top-down derivation (by convention). We prefer the derivation sequence over the LF itself because an MR may not be well-formed without the grammar, and MR tokens can be polysemous or carry no specific meaning.
Example (sentence and LF from RoboCup)
  ((bowner our {4}) (do our {6} (pos (left (half our)))))
  If our player 4 has the ball, then our player 6 should stay in the left side of our half.

Learning SP with SMT Methods (Wong and Mooney, 2006)
Grammar rule acquisition, step 2: use GIZA++ to find an alignment between words and derivation rules.

Learning SP with SMT Methods (Wong and Mooney, 2006)
Grammar rule acquisition, step 3: extract rules bottom-up, starting with rules whose RHS contains only terminals, then those whose RHS contains non-terminals:
  UNUM → ⟨4, 4⟩
  TEAM → ⟨our, our⟩
  CONDITION → ⟨TEAM player UNUM has {1} ball, (bowner TEAM {UNUM})⟩

Learning SP with SMT Methods (Wong and Mooney, 2006)
Grammar rule acquisition, step 3, special case: a rule that derives no terminal would break links outside its sub-parse tree. Merge such rules, e.g.
  REGION → (left (penalty-area TEAM))
For excessively merged rules (overfitting), try a greedy link-removal policy (alignment fixing).

SCFG with Lambda Calculus (Wong and Mooney, 2007)
Use lambda expressions on the semantic side instead. To improve NL-MR isomorphism, find an MST on a graph where edges between rules are established for any shared variable and weighted by the minimal word distance.

Data Recombination (Jia and Liang, 2016)
Fit a new model, an SCFG, using three kinds of recombination policies; then draw additional training examples from it.
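As a loose sketch of one such policy (entity abstraction, with toy data and my own naming; the paper induces a full SCFG rather than doing string replacement):

import random

examples = [("what states border texas", "answer(state(borders(texas)))"),
            ("what is the capital of ohio", "answer(capital(ohio))")]
entities = ["texas", "ohio", "iowa"]

def abstract_entities(x, y):
    # Swap an entity appearing on both sides for any same-type entity.
    for e in entities:
        if e in x and e in y:
            for e2 in entities:
                yield x.replace(e, e2), y.replace(e, e2)

synthetic = [pair for ex in examples for pair in abstract_entities(*ex)]
random.shuffle(synthetic)    # then train the parser on original + synthetic
print(len(synthetic), synthetic[0])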

Generative Models
Hybrid Tree (Lu et al., 2008)

Generative Models
Learning with Less Supervision (Liang et al., 2009): given a world state paired with several sentences,
  p(r, f, c, w | s) = p(r | s) p(f | r) p(c, w | r, f, s)


Using a Syntactic Parse (Ge and Mooney, 2009)
Parser components:
  an existing syntactic parser
  a learned lexicon from words to predicates
  a learned set of composition rules
Assumption: an unambiguous CFG of the LF is known.

Using a Syntactic Parse (Ge and Mooney, 2009)
Parsing proceeds bottom-up (example shown as a figure).

Using a Syntactic Parse (Ge and Mooney, 2009)
Not every semantic sub-tree strictly follows the syntactic derivation. Introduce macro-predicates when the children's MRs cannot combine:
  a child becomes an argument if its MR is complete
  otherwise it becomes part of the predicate

Using a Syntactic Parse (Ge and Mooney, 2009)
Learning:
The lexicon is learned with GIZA++ (as in Wong and Mooney, 2006):
  if a predicate is aligned to no word, the predicate is inferred and simply bound to its values in the MR
  if a predicate is aligned to several words, split it into several alignments
Composition rules are learned in the form Λ1.P1 + Λ2.P2 ⇒ {Λp.Pp, R}:
  λa1.λa2.P_PLAYER + P_UNUM ⇒ {λa1.P_PLAYER, a2 = c2}
The disambiguation model is max-ent, trained with L-BFGS:
  Pr(D | S, T; θ) = exp(Σ_i θ_i f_i(D)) / Z(S, T)

Transforming Dependency Parses (Reddy et al., 2016)
Parsing:
  binarize the dependency parse into an S-expression
  substitute symbols with λ-expressions
  compose hierarchically (beta-reduction)
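The first step is mechanical; a rough sketch (toy tree and fold order of my choosing, not Reddy et al.'s exact algorithm):

def to_sexp(node):
    # Fold each dependent into the head, one relation at a time.
    head, children = node
    expr = head
    for rel, child in children:
        expr = f"({rel} {expr} {to_sexp(child)})"
    return expr

# "Brutus stabbed Caesar": stabbed -nsubj-> Brutus, -dobj-> Caesar
tree = ("stabbed", [("nsubj", ("Brutus", [])), ("dobj", ("Caesar", []))])
print(to_sexp(tree))   # (dobj (nsubj stabbed Brutus) Caesar)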

Transforming Dependency Parses (Reddy et al., 2016)
Parsing: follow Reddy et al., 2014 to transform the result into a grounded graph. New operators deal with mismatches between the dependency parse and the semantic parse:
  CONTRACT: merge some nodes and edges into a single node
  EXPAND: add edges for disjoint nodes (errors in the dependency parse)

Other Work on Syntactic-to-Semantic Parsing
Incremental parser for AMR (Damonte et al., 2016)
  A greedy transition-based parser for AMR, inspired by earlier transition-based syntactic parsers such as ArcEager (Nivre, 2004, 2008), built on an existing dependency parser.
Imitation learning for AMR (Goodman et al., 2016)
  A transition-based parser trained with imitation learning, extended with techniques like noise reduction and targeted exploration, built on an existing dependency parser.


SP from the World's Response (Clarke et al., 2010)
Parsing: ẑ = F_w(x) = arg max_(y∈Y, z∈Z) wᵀΦ(x, y, z)
Learning: (algorithm shown as a figure)

SP from the World's Response (Clarke et al., 2010)
Parsing: in order to adapt to unseen inputs, consider the entire meaning space instead of extracting rules from the training data:
  F_w(x) = arg max_(y,z) wᵀΦ(x, y, z)
         = arg max_(α,β) Σ_(c∈X) Σ_(s∈D) α_cs · wᵀΦ1 + Σ_(c,d∈X) Σ_(s,t∈D) β_cs,dt · wᵀΦ2
where
  α_cs: word span c is aligned with symbol s
  β_cs,dt: word span d is aligned with t while α_cs is active, such that
    a constituent is associated with one symbol
    β_cs,dt is active iff α_cs and α_dt are active
    if β_cs,dt is active then s is a function and (s, t) is type-consistent
    functional composition is directional and acyclic

SP from the World's Response (Clarke et al., 2010)
Features used:
  first-order: stemmed word match
  first-order: similarity based on WordNet (Do et al., 2010)
  second-order: for β_cs,dt, the normalized distance between the head words of c and d in the sentence's dependency tree
  second-order: symbol co-occurrence frequency (regardless of alignments)

Confidence-Driven Unsupervised SP (Goldwasser et al., 2011)
What is a confidence-driven unsupervised method?
  Idea: if a pattern is produced multiple times by a non-random model, it is likely an indication of an underlying phenomenon in the data.
  Confidence: output structures close to the center of statistical mass receive a high confidence score.
  Confidence-driven: the model is significantly improved compared with using only the prediction score wᵀΦ(x, y, z).
Parsing is the same as in Clarke et al., 2010, formulated as an ILP problem.

Confidence-Driven Unsupervised SP (Goldwasser et al., 2011)
Confidence measures:
(1) Translation model
  unigram: p(z | x) = Π_(i=1..|z|) p(s_i | y(s_i))
  bigram: p(z | x) = Π_(i=1..|z|) p(s_(i-1), s_i | y(s_(i-1)), y(s_i))
(2) Structural proportion
  Prop(x, z): the ratio of the number of predicates in z to the number of words in x
  AvProp(S): the average over the set
  PropScore(S, (x, z)) = AvProp(S) − Prop(x, z)
Combined: use (2) to filter candidates and (1) to rank them.

Grounded Unsupervised SP (Poon, 2013)
Parsing idea: annotate states onto the nodes and edges of a dependency parse. No training is needed for specific token types: datetimes, numerics, logical operators.
States: the states come from the DB schema (entities / attributes); complex states handle mismatches between the dependency parse and the semantics.
Inference:
  z* = arg max_z P_θ(d, z),  with P_θ(d, z) = (1/Z) exp Σ_i f_i(d, z) · w_i
Learning:
  θ* = arg max_θ Σ_(d∈D) log Σ_z P_θ(d, z)

Grounded Unsupervised SP (Poon, 2013)
Example: get flight from toronto to san diego stopping in dtw

SP on Freebase and QA
Cai and Yates, 2013a, 2013b: methods for using an existing parser in a new domain.
Berant et al., 2013: collects the WebQuestions dataset.
Yih et al., 2014: a CNN-based semantic model (CNNSM).
Yih et al., 2015: staged query graph generation; finds a core inferential chain to execute on Freebase.
Pasupat and Liang, 2015: SP on semi-structured tables; proposes a dataset of tables and a method that first converts tables into a knowledge graph.


Paraphrase Comparison
Berant and Liang, 2014: paraphrasing independent of the KB, using two kinds of templates
Chen et al., 2016: direct paraphrasing, using Wiktionary

SP via Paraphrasing (Berant and Liang, 2014)
  p_θ(c, z | x) = (1/Z) exp(φ(x, c, z)ᵀ θ)
  φ(x, c, z)ᵀ θ = φ_pr(x, c)ᵀ θ_pr + φ_lf(x, z)ᵀ θ_lf

Building an SP Overnight (Wang et al., 2015)
To build an SP for a new domain (8 datasets published):
  a human writes a seed lexicon
  the domain-general grammar G induces canonical LFs and utterances
  crowd workers rewrite the awkward canonical utterances into fluent ones
  train a parser using the grammar G, via paraphrasing


Sequence-Based SP (Xiao et al., 2016)
Vinyals et al., 2015 showed that sequence models can succeed at (syntactic) grammar parsing. Xiao et al. compared various target sequence forms on SPO (Wang et al., 2015):
  LF as a raw token sequence
  DSP (Derivation Sequence Prediction)
  DSP-C (Constrained): use the grammar to constrain the next rule at test time
  DSP-CL (Constrained Loss): p(y_t) is normalized only over grammatically possible values
  CFP (Canonical Form Prediction): predict the canonical form instead, which is then parsed into the LF
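A simplified sketch of grammar-constrained decoding in the spirit of DSP-C: at each step the model's scores are masked down to the rules the grammar allows and renormalized. The grammar and the stub scoring function are invented for illustration:

import math, random

GRAMMAR = {                       # toy map: rule -> rules allowed next
    "<s>": {"S->NP VP"},
    "S->NP VP": {"NP->texas", "NP->ohio"},
    "NP->texas": {"VP->borders NP"},
    "NP->ohio": {"VP->borders NP"},
    "VP->borders NP": set(),
}

def model_scores(prefix):
    # Stand-in for a trained decoder's logits over all rules.
    random.seed(len(prefix))
    return {r: random.random() for rules in GRAMMAR.values() for r in rules}

def decode():
    seq, prev = [], "<s>"
    while GRAMMAR[prev]:
        scores = model_scores(seq)
        allowed = GRAMMAR[prev]   # constrain to grammatical continuations
        z = sum(math.exp(scores[r]) for r in allowed)
        probs = {r: math.exp(scores[r]) / z for r in allowed}
        prev = max(probs, key=probs.get)
        seq.append(prev)
    return seq

print(decode())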


Parsing with Neural Attention (Dong and Lapata, 2016)
The Seq2Tree model, which also learns a latent grammar.

Result Comparison
AMR parsing (F1: 70, Goodman et al., 2016) and WebQuestions (F1: 52.5, Yih et al., 2015).


Summary
Recurring problems and their remedies:
LF not well-formed
  Particularly in neural SP methods; use an existing or learned grammar.
Ontology mismatch
  Paraphrasing, or other two-phase parsing.
Utterance explosion
  Prefer expansion from the meaning space over rule extraction from the training data.
Isomorphism between NL (or the syntactic parse) and the semantic parse
  Add relaxing extensions, because NL is not strict and the syntactic parse may introduce errors.