
Semantic Parsing Methods

Haruki Kirigaya
September 30, 2016

An overview of existing approaches to semantic parsing.


Transcript

4. Background
When it comes to understanding natural language sentences, NLP researchers work at various granularities; the tasks differ in how much of the sentence's information they use.
- Information Extraction (less informative): is_a(Obama, PRESIDENT)
- Summarization (modestly informative): Obama wins.
- Semantic Parsing (exact matching): ∃e.beat(e) ∧ Sub(e, Obama) ∧ Obj(e, Romney)
Caveat: "semantic" here is about composition, not about telling word senses apart.

6. Semantic Parsing Task
The key task of semantic parsing is to find an f such that
    f : Sentence → LogicForm
Generally, there are 3 aspects a semantic parser needs to take into consideration:
- Modelling: how to represent a logic form
- Parsing: design a grammar and a parsing algorithm
- Learning: use supervision to fit the parameters

7. Agenda
1 Semantics: Davidsonian Representation, MRS, AMR
2 Parsing
3 Summary

10. Logic Form from Example
Brutus stabs Caesar.
    stab(Brutus, Caesar) (predicate)
Brutus stabs Caesar with a knife.
    stab(Brutus, Caesar, knife) (n-ary predicate)
Brutus stabs Caesar in the agora.
    stab(Brutus, Caesar, agora) (ambiguous predicate)
Brutus stabs Caesar in the agora with a knife.
    stab(Brutus, Caesar) & with(knife) & in(agora) (move the adjuncts apart)

13. Logic Form from Example
Brutus stabs Caesar in the agora with a knife.
    stab(Brutus, Caesar) & with(knife) & in(agora)
Brutus stabs Caesar with a knife in the agora and twists it hard.
    stab(Brutus, Caesar) & with(knife) & in(agora) & twist(Brutus, knife) & hard
The standard predicate calculus has problems:
- it is unable to refer to predicates (the bare hard above has nothing to modify)
- natural language is flexible in the number of arguments: Pass the axe. / Pass me the axe.

14. Davidsonian Representation
Semantics is characterized in terms of events. We don't know an event beforehand, so we existentially quantify over it.
Brutus stabs Caesar with a knife in the agora and twists it hard.
    ∃e.stab(e, Brutus, Caesar) ∧ with(e, knife) ∧ in(e, agora) ∧ (∃e′.twist(e′, Brutus, knife) ∧ hard(e′))
Caesar is stabbed.
    ∃x∃e.stab(e, x, Caesar)
Missing arguments are left as existentially quantified placeholders.

15. Problem in the Davidsonian Way
Consider the following sentence:
    In a dream last night, I was stabbed, although in fact nobody had stabbed me and I wasn't stabbed with anything.
There is NOBODY here to initiate the stab event. Shouldn't the representation correspond to the utterance rather than to reality?

17. neo-Davidsonian Representation (Parsons, 1995)
Replace arguments (and placeholders) with independent conjuncts. Two roles are basic: Agent and Theme/Patient.
Brutus stabbed Caesar in the back with a knife.
    ∃e.stab(e) ∧ Agent(e, Brutus) ∧ Patient(e, Caesar) ∧ with(e, knife) ∧ in(e, back)

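A minimal sketch (mine, not from the deck) of how a neo-Davidsonian form can be held as plain data: an event is just a set of conjuncts, so modifiers attach without changing the verb's arity.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Conjunct:
        pred: str            # e.g. "Agent", "with"
        args: tuple          # e.g. ("e", "Brutus")

    # Brutus stabbed Caesar in the back with a knife.
    stab_event = frozenset({
        Conjunct("stab", ("e",)),
        Conjunct("Agent", ("e", "Brutus")),
        Conjunct("Patient", ("e", "Caesar")),
        Conjunct("in", ("e", "back")),
        Conjunct("with", ("e", "knife")),
    })
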
19. Advantages of the neo-Davidsonian (Palmer, 2014)
(1) Entailment. Given the following sentences:
A. Brutus stabbed Caesar in the back with a knife.
B. Brutus stabbed Caesar in the back.
C. Brutus stabbed Caesar with a knife.
We know A → B ∧ C, but NOT B ∧ C → A.
The neo-Davidsonian representation preserves this pattern: B and C are just A with conjuncts dropped, so A entails each by conjunction elimination. Let Agt = Agent, B = Brutus, C = Caesar, Pat = Patient:
A. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back) ∧ with(e, knife)
B. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ in(e, back)
C. ∃e.stab(e) ∧ Agt(e, B) ∧ Pat(e, C) ∧ with(e, knife)

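Continuing the toy Conjunct encoding from above: under a flat representation, this entailment becomes a plain subset test over conjunct sets.

    def entails(premise: frozenset, hypothesis: frozenset) -> bool:
        # Dropping conjuncts weakens a flat formula, so the premise
        # entails any hypothesis whose conjuncts it contains.
        return hypothesis <= premise

    A = stab_event
    B = frozenset(c for c in A if c.pred != "with")
    print(entails(A, B))   # True:  A entails B
    print(entails(B, A))   # False: B does not entail A
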
20. Advantages of the neo-Davidsonian
(2) Scope. The traditional way uses scope to connect an adjunct and a verb:
    x stabbed y violently with z
There are two logically equivalent representations with different scope settings:
    (with z (violently (stab(y)))) (x)
    (violently (with z (stab(y)))) (x)
A flat representation like the neo-Davidsonian keeps the meaning consistent and introduces no spurious syntactic scope. The slides return to flatness and scope later.

21. Advantages of the neo-Davidsonian
(3) Temporal and Causal Sentences.
Mary saw Brutus stab Caesar.
- Traditional way: Mary saw Brutus & Brutus stabbed Caesar.
- neo-Davidsonian way: ∃e.see(e) ∧ Agt(e, Mary) ∧ ∃e′.(stab(e′) ∧ Agt(e′, Brutus) ∧ Pat(e′, Caesar) ∧ Pat(e, e′))
After the singing of the national anthem, they saluted the flag. = After the national anthem was sung, they saluted the flag.
    ∃e.salute(e) ∧ Agt(e, they) ∧ Pat(e, flag) ∧ ∃e′.(sing(e′) ∧ Agt(e′, they) ∧ Pat(e′, NationalAnthem) ∧ after(e, e′))

27. Possible Problems of the neo-Davidsonian
I sold him a car for $50,000.
    Which is the patient, the car or the $50,000?
I sold a car for Mary for $50,000.
    The same preposition with different meanings.
Mary fed her baby.
    Can the baby, who is doing the eating, also count as an agent?
Brutus stabbed Caesar with a knife.
    Removing Brutus from the event may differ in kind from removing the knife.
Brutus stabbed Caesar once.
    It is hard to state that the event happens only once in the neo-Davidsonian.
A saw B leave. When B left, he had the documents in his briefcase. = A saw B leave with the documents in his briefcase.
    If both leave events are the same, and the inference is to go through, how could A see one without seeing the other?

28. Summary of the neo-Davidsonian
The neo-Davidsonian has several characteristics in representing semantics; some are advantages, others are merely one choice among several workable ones:
- uses variables and is flat
- event-style: an event is unique in its time of occurrence
- event arguments are moved into roles, as independent conjuncts
- modifiers (adjectives, adverbs, adjuncts) are conjoined predicates
- transparent scope facilitates logical inference

29. Agenda
1 Semantics: Davidsonian Representation, MRS, AMR
2 Parsing
3 Summary

30. Minimal Recursion Semantics (Copestake, 2005)
MRS is another flat semantic framework, serving as the basis of English Resource Semantics (ERS) and the English Resource Grammar (ERG). Its design goals:
- Expressive Adequacy: the ability to express meaning correctly
- Grammatical Compatibility: the ability to link representations to grammatical information
- Computational Tractability: the ability to compare two representations (equality, relations, etc.)
- Underspecifiability: leave semantic distinctions unresolved

31. Why a Flat Form
In MT and other tasks, a structural representation is hard to use and unnecessary.
Examples
Sentence: white English horse
Transfer rule: white(horse)(x) ↔ Schimmel(x)
Structured form: white(English(horse))(x), which the rule no longer matches.
Examples
Sentence: The beginning of spring arrived.
Transfer rule: beginning of spring ↔ Frühlingsanfang
Form 1: def_q(x, spring(x), the(y, beginning(y, x), arrive(y)))
Form 2: the(y, def_q(x, spring(x), beginning(y, x), arrive(y)))
Two structured scopings for one meaning.

32. Why a Flat Form
A flat form is a group of elementary predications.
Examples
Sentence: white English horse
Transfer rule: white(horse)(x) ↔ Schimmel(x)
Flat form: white(x) & English(x) & horse(x)
Examples
Sentence: The beginning of spring arrived.
Transfer rule: beginning of spring ↔ Frühlingsanfang
Flat form: the(y) & beginning(y, x) & def(x) & spring(x) & arrive(e, y)

34. Underspecifiability in MRS
One sentence may have several scopal readings:
    Every dog chases some white cat.
Leave some handles unspecified, then specify them later:
- plugging equalities such as h0 = h1, h3 = h5, h7 = h4
- constraints on h3 and h7 keep the result a tree
- a qeq constraint such as h0 =q h5 is a trivial example

35. MRS, Formally and as a Whole
An MRS is a quadruple ⟨GT, LT, R, C⟩:
- GT: global top, h0
- LT: local tops, h1, h4, h5 (the semantics of local phrases)
- R: relations, h1:every(x, h2, h3), h5:dog(y, h6, h7), h4:chase(x), etc.
- C: constraints, h0 qeq h4, etc.

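A toy encoding of that quadruple (my sketch, not ERG machinery), with relations as handle-labelled elementary predications and qeq constraints kept separate:

    from typing import NamedTuple

    class EP(NamedTuple):          # elementary predication
        handle: str                # label, e.g. "h1"
        pred: str                  # e.g. "every"
        args: tuple                # e.g. ("x", "h2", "h3")

    class MRS(NamedTuple):
        gt: str                    # global top
        lt: tuple                  # local tops
        rels: tuple                # elementary predications
        qeq: tuple                 # (hole, label) constraints

    m = MRS(
        gt="h0",
        lt=("h1", "h4", "h5"),
        rels=(EP("h1", "every", ("x", "h2", "h3")),
              EP("h5", "dog", ("y", "h6", "h7")),
              EP("h4", "chase", ("x",))),
        qeq=(("h0", "h4"),),
    )
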
36. Highlights of MRS
- Scopal relationships are reified as handles, so that syntactically the language looks first-order.
- Underspecifiability is preserved.

37. Agenda
1 Semantics: Davidsonian Representation, MRS, AMR
2 Parsing
3 Summary

38. Abstract Meaning Representation (Banarescu et al., 2013)
AMR is a semantic representation that
- is a rooted, directed, labeled graph
- is identical for different utterances with the same meaning
- uses variables for co-reference
- uses PropBank frames (analogous to roles in the neo-Davidsonian)
- defines non-core relations outside PropBank (analogous to adjuncts in the neo-Davidsonian)
Specification: https://github.com/amrisi/amr-guidelines/blob/master/amr.md

39. An AMR Example
Brutus stabbed Caesar with a knife in the back in the agora and twisted it hard.
    (s / stab
       :ARG0 (p / person
                :name (n / name :op1 "Brutus")
                :ARG0-of (t / twist
                            :ARG1 k
                            :manner (h / hard)))
       :ARG1 (p2 / person :name (n2 / name :op1 "Caesar"))
       :ARG2 (k / knife)
       :ARG3 (b / back)
       :location (a / agora))

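The same graph can be flattened to (source, role, target) triples, which is the view the smatch metric scores over. A hand-built sketch of that view for the example above (libraries such as penman offer this programmatically, but the list below avoids assuming any particular API; the Caesar subgraph is abridged):

    # Each triple is (source variable, role, target);
    # ":instance" links a variable to its concept.
    amr_triples = [
        ("s", ":instance", "stab"),   ("p", ":instance", "person"),
        ("s", ":ARG0", "p"),          ("n", ":instance", "name"),
        ("p", ":name", "n"),          ("n", ":op1", '"Brutus"'),
        ("t", ":instance", "twist"),  ("t", ":ARG0", "p"),  # inverse of :ARG0-of
        ("t", ":ARG1", "k"),          ("h", ":instance", "hard"),
        ("t", ":manner", "h"),        ("p2", ":instance", "person"),
        ("s", ":ARG1", "p2"),         ("k", ":instance", "knife"),
        ("s", ":ARG2", "k"),          ("b", ":instance", "back"),
        ("s", ":ARG3", "b"),          ("a", ":instance", "agora"),
        ("s", ":location", "a"),
    ]
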
41. Event Frames Arise from Various POS
Verb / Noun
Examples: the destruction of the city by the God
    (d / destroy-01
       :ARG0 (g / God)
       :ARG1 (c / city))
Examples: the bond investor
    (p / person
       :ARG0-of (i / invest-01
                   :ARG1 (b / bond)))
but professor does not yield an event frame.
Adjective
Examples: the attractive spy
    (s / spy
       :ARG0-of (a / attract-01))

42. Reification - a Frame from a Non-Core Relation
A non-core relation in AMR modifies the relation itself, not any single object participating in it; to modify (e.g. negate) the relation, reify it into a frame.
Examples
the marble in the jar
    (m / marble :location (j / jar))
the marble is not in the jar
    (b / be-located-at-91
       :ARG1 (m / marble)
       :ARG2 (j / jar)
       :polarity -)
Semantic error:
    (m / marble :location (j / jar :polarity -))
which reads as "the marble is in the non-jar".

43. Other Language Phenomena Defined in AMR
AMR defines approximately 100 relations for language phenomena:
- negation and modals
- interrogatives and wh-questions
- named entities
- location, source, destination, path
- cause, concession, condition
- quantities, date, time
- link to the Wikipedia article: :wiki "Barack Obama"
- ...

44. AMR Data Overview
1. Annotated corpora (train:dev:test):
   - The Little Prince, 1274:145:143
   - The Little Prince, Chinese version, 1274:145:143
   - Bio AMR Corpus from PubMed (cancer) articles, 5452:500:500
   - LDC Corpus General Release 1.0 (June 2014), 13051 in all; a new general release is due in summer 2016
2. Evaluation: the smatch metric, comparing two AMRs
3. SemEval-2017 Task 9: Parsing and Generation
   - English biomedical data to AMR (cf. SemEval-2016 Task 8)
   - AMR-to-English generation
4. A Python parser: https://github.com/nschneid/amr-hackathon

45. AMR Editor
A simple web editor to build an AMR. (screenshot on the slide)

46. Parsing Methods
There are many semantic parsing paradigms; some are new methods, while others adapt ideas from other domains or tasks to semantic parsing:
- Shift-Reduce (LR) (1993)
- Combinatory Categorial Grammar (2005)
- Word Alignment (Synchronous CFG) (2006)
- Generative Models (2008)
- Syntactic Parse to Semantic Parse (2009)
- Weak Supervision and Unsupervised Methods (2010)
- Large-scale SP for Freebase and QA (2013)
- Paraphrase-driven SP (2014)
- Neural Semantic Parsing (2015)

47. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

48. Inductive Logic Programming (Zelle et al., 1993)
Shift-Reduce is a simple bottom-up parsing strategy. Each action corresponds to a Prolog clause.

49. Inductive Logic Programming (Zelle et al., 1993)
CHILL (Constructive Heuristic Induction for Language Learning):
- Find Generalization: merge clauses, as long as the result covers no negative sample
- Reduce Definition: prefer the new clause for proving positive examples

50. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

51. Combinatory Categorial Grammar (Steedman, 1996, 2000)
CCG comes with a lexicon whose elements pair a word with a category:
    borders := (S\NP)/NP : λx.λy.borders(y, x)
- word: borders
- syntactic type: (S\NP)/NP
- semantic type: λx.λy.borders(y, x)

52. Combinatory Categorial Grammar (Steedman, 1996, 2000)
Categories can be combined:
forward and backward application
    A/B : f  +  B : x    ⇒  A : f(x)
    B : x    +  A\B : f  ⇒  A : f(x)
forward and backward composition
    A/B : f  +  B/C : g  ⇒  A/C : f ∘ g
    B\C : g  +  A\B : f  ⇒  A\C : f ∘ g
type raising
    X ⇒ T/(T\X)

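Treating the semantic types as Python functions makes the combinators one-liners. A toy sketch (the borders entry comes from the previous slide; the tuple encoding of logical forms is mine):

    # Lexical semantics: borders := (S\NP)/NP : λx.λy.borders(y, x)
    borders = lambda x: lambda y: ("borders", y, x)

    def forward_apply(f, x):      # A/B : f + B : x => A : f(x)
        return f(x)

    def compose(f, g):            # A/B : f + B/C : g => A/C : f . g
        return lambda x: f(g(x))

    # "Texas borders Oklahoma": apply to the object, then the subject.
    vp = forward_apply(borders, "Oklahoma")   # category S\NP
    s = vp("Texas")                           # ("borders", "Texas", "Oklahoma")
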
53. Semantic Parsing using CCG on GeoQuery (Zettlemoyer and Collins, 2005)
Given the lexicon and the model parameters, CCG parsing is formulated as a log-linear probabilistic model to deal with ambiguity, e.g. duplicate lexical entries for a word, and spurious ambiguity:
    P(L, T | S; θ) = exp(f(L, T, S) · θ) / Σ_(L′,T′) exp(f(L′, T′, S) · θ)
And we can do inference on the model:
    L* = argmax_L P(L | S; θ) = argmax_L Σ_T P(L, T | S; θ)
Features are designed to be local, so dynamic programming (actually beam search) can prune the search space, CKY-style.

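The log-linear distribution above in miniature; the feature vectors and parse names are invented for illustration, not real GeoQuery features:

    import math

    def log_linear(scores):
        # softmax over candidate scores f(L, T, S) · θ
        z = sum(math.exp(s) for s in scores)
        return [math.exp(s) / z for s in scores]

    theta = [0.5, -0.2, 1.0]
    candidates = {                     # (L, T) -> feature vector
        "parse_a": [1, 0, 1],
        "parse_b": [0, 1, 1],
    }
    dots = [sum(f * t for f, t in zip(v, theta)) for v in candidates.values()]
    print(dict(zip(candidates, log_linear(dots))))
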
54. Learning the Model (Zettlemoyer et al., 2005)
The parameters are learned with stochastic gradient descent.

55. Learning the Lexicon (Zettlemoyer et al., 2005)
    GENLEX(S, L) = {x := y | x ∈ W(S), y ∈ C(L)}
- W(S) is the set of all subsequences of S
- C(L) produces categories from L via trigger rules

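GENLEX is literally a cross product of word subsequences with triggered categories. A stripped-down sketch; the single trigger rule here (constants yield NP entries) is invented for illustration:

    def subsequences(words):
        # contiguous word spans of the sentence
        return [" ".join(words[i:j]) for i in range(len(words))
                for j in range(i + 1, len(words) + 1)]

    def categories(lf):
        # one hypothetical trigger: each constant in the LF yields an NP
        return [("NP", const) for const in lf["constants"]]

    def genlex(sentence, lf):
        return {(x, y) for x in subsequences(sentence.split())
                       for y in categories(lf)}

    print(genlex("Utah borders Idaho", {"constants": ["utah", "idaho"]}))
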
56. Problems in ZC05
GENLEX is driven by hand-written trigger rules, and it is insufficient when the rules don't cover the (S, L) pairs.
Examples
Through which states does the Mississippi run.
GENLEX triggers no category suitable for the through-adjunct placed up front; phrase order would need to be relaxed.

57. Relaxed Combinatory Rules (Zettlemoyer et al., 2007)
- relaxed function application
- relaxed function composition
- role-hypothesising type shifting (for missing predicates)
- null-head type shifting (for missing arguments)
- crossed functional composition
Triggers are added for these new rules, too.

58. Online Learning (Zettlemoyer et al., 2007)
A perceptron-style online learner is used instead. New features are also added.

60. CCG Induction using Unification (Kwiatkowski et al., 2010)
Unification, per Wikipedia, is an algorithmic process of solving equations between symbolic expressions, e.g.
    {cons(x, cons(x, nil)) = cons(2, y)} ⇒ {x ↦ 2, y ↦ cons(2, nil)}
Here unification aims to find f and g given h, such that h = λx.f(g(x)) or h = f(g).
For example, the given initial lexical entry
    New York borders Vermont ⊢ S : next_to(ny, vt)
will be split into
    New York borders ⊢ S/NP : λx.next_to(ny, x)
    Vermont ⊢ NP : vt

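One way to enumerate the h = f(g) splits is to abstract over each subterm of h: pick a subterm t, set g = t and f = λx.h[t ↦ x]. A tiny sketch over tuple-encoded terms (my encoding, not the paper's):

    def subterms(term):
        yield term
        if isinstance(term, tuple):           # e.g. ("next_to", "ny", "vt")
            for a in term[1:]:
                yield from subterms(a)

    def replace(term, old, new):
        if term == old:
            return new
        if isinstance(term, tuple):
            return tuple(replace(a, old, new) for a in term)
        return term

    def splits(h):
        # h = f(g): g is a subterm; f abstracts it away with a fresh variable
        for g in subterms(h):
            f = lambda x, g=g: replace(h, g, x)
            yield f, g

    h = ("next_to", "ny", "vt")
    for f, g in splits(h):
        print(g, "->", f("x"))    # e.g. vt -> ('next_to', 'ny', 'x')
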
61. CCG Induction using Unification (Kwiatkowski et al., 2010)
Parsing with a probabilistic CCG (PCCG):
    P(y, z | x; θ, Λ) = exp(θ · φ(x, y, z)) / Σ_(y′,z′) exp(θ · φ(x, y′, z′))
    f(x) = argmax_z p(z | x; θ, Λ)
    p(z | x; θ, Λ) = Σ_y p(y, z | x; θ, Λ)
Again, to compute the parse efficiently:
- CKY-style parsing with dynamic programming
- summing over y with the inside-outside algorithm

62. CCG Induction using Unification (Kwiatkowski et al., 2010)
Learning algorithm: NEW-LEX decides whether to split the lexical entries, producing the new lexicon from argmax_(y*) p(y* | x_i, z_i; θ′, Λ′).

64. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 1, the function:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Unification constraints (otherwise there are infinitely many results):
- no vacuous variables: disallow, e.g., g = λx.tex, where x is unused
- limited coordination extraction: g contains fewer than N adjuncts
- limited application: f introduces no new variables for a non-variable subexpression of h, as in h = λx.in(x, tex) with f ↦ λq.q(tex), g ↦ λy.λx.in(x, y)
We can get many (f, g) pairs, among which there is:
    f ↦ λx.next_to(ny, x), g ↦ vt

65. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 2, the syntactic type:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Following the CCG combinatory rules (only the 4 above), define
    SC(A) = FA(A) ∪ BA(A) ∪ FC(A) ∪ BC(A)
    FA(X : h) = {(X/Y : f, Y : g) | h = f(g) ∧ Y = C(T(g))}
    BA(X : h) = {(Y : g, X\Y : f) | h = f(g) ∧ Y = C(T(g))}
    FC(X/Y : h) = {(X/W : f, W/Y : g) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}
    BC(X\Y : h) = {(W\Y : g, X\W : f) | h = λx.f(g(x)) ∧ W = C(T(g(x)))}
where T maps an expression to its type (built from e, t, and function types) and C maps types to categories:
    C(T) = NP if T = e;  S if T = t;  C(T2)|C(T1) if T = ⟨T1, T2⟩

66. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 2, the syntactic type:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Some possible pairs from the splitting set:
- semantics: (λx.next_to(ny, x), vt); syntax: (S/NP, NP)
- semantics: (ny, λx.next_to(x, vt)); syntax: (NP, S\NP)
- semantics: (λx.next_to(x, vt), ny); syntax: (S/NP, NP)

68. Splitting a Lexical Entry (Kwiatkowski et al., 2010)
Step 3, the word sequence:
    New York borders Vermont ⊢ S : next_to(ny, vt)
Splitting is defined as
    SL(w_0:n := A) = {(w_0:i := B, w_i+1:n := C) | 0 ≤ i < n ∧ (B, C) ∈ SC(A)}
For a specific i, the previous splits may raise problems:
- (S/NP : λx.next_to(ny, x), NP : vt), sequence (New York borders, Vermont)
- (NP : ny, S\NP : λx.next_to(x, vt)), sequence (New York, borders Vermont)
- (S/NP : λx.next_to(x, vt), NP : ny), sequence (borders Vermont, New York): incorrect

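SL is just every split point crossed with every category split. Sketched below, with SC stubbed to a single hand-written pair; a real implementation would enumerate the Step-2 pairs:

    def SL(words, category_splits):
        # words: the entry's word sequence; category_splits: pairs from SC(A)
        for i in range(1, len(words)):          # split between i-1 and i
            for B, C in category_splits:
                yield (words[:i], B), (words[i:], C)

    sc_pairs = [("S/NP : \\x.next_to(ny, x)", "NP : vt")]
    for left, right in SL("New York borders Vermont".split(), sc_pairs):
        print(left, right)
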
69. Problems in Kwiatkowski et al., 2010
The learned CCG lexicon is too big.

71. Factored Lexicon in CCG (Kwiatkowski et al., 2011)
Original lexical entry:
    Boston ⊢ N/N : λf.λx.from(x, bos) ∧ f(x)
Factored parts:
- lexeme, a pair of a word span and a constant list: (Boston, [from, bos])
- template: λ(w, v).(w ⊢ N/N : λf.λx.v1(x, v2) ∧ f(x))
Two types of factorization:
1. maximal factoring: all constants go into the lexeme
   (Boston, [from, bos]), λ(w, v).(w ⊢ N/N : λf.λx.v1(x, v2) ∧ f(x))
2. partial factoring: some constants remain in the template
   (Boston, [bos]), λ(w, v).(w ⊢ N/N : λf.λx.from(x, v1) ∧ f(x))
Partial factoring is used for missing words: flights Boston to New York

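Recombining a lexeme with a template is a constant substitution. A small sketch, with format strings standing in for real λ-terms:

    # template with numbered constant slots v1, v2, ...
    template = "N/N : \\f.\\x.{v1}(x, {v2}) & f(x)"
    lexeme = ("Boston", ["from", "bos"])

    def apply_template(template, lexeme):
        word, consts = lexeme
        slots = {f"v{i + 1}": c for i, c in enumerate(consts)}
        return word, template.format(**slots)

    print(apply_template(template, lexeme))
    # ('Boston', 'N/N : \\f.\\x.from(x, bos) & f(x)')
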
72. Factored Lexicon in CCG (Kwiatkowski et al., 2011)
Learning is similar, but must consider factorization.

76. The Ontological Mismatch Problem
The GeoQuery / ATIS datasets are too small; learning a parser for them is easy:
- few predicates
- few utterances (though more than the predicates)
If a database has more predicates, and can thus in principle answer more questions, the number of possible utterances grows even further. What's worse, new utterances will, linguistically, involve ever more predicates, while the database schema is fixed and supports only a limited set. So:
- parse to richer predicates: unusable on the database
- parse to fit the schema: difficult to learn

77. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Convert Q1 to MR1 and Q2 to MR2, then match MR2 onto MR1. Supervision comes from Q-A pairs.

78. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Parsing framework:
Domain-independent parsing
- use a domain-independent CCG parser (Clark & Curran, 2007) to convert the utterance to an underspecified LF, with a hand-written lexicon
- 59 lexical categories paired with POS tags, assigned to words based on POS tags from Wiktionary
- 49 domain-independent lexical items (what, when, and, is, etc.)
Ontological matching
- apply a series of matching operations M = o1, o2, ...
- structural match: collapse operator, expansion operator
- constant matching (replacement within the same type)

79. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Parsing and learning:
    Parse(x, O) = argmax_(d ∈ GEN(x, O)) Score(d)
    Score(d) = φ(d) · θ = φ(Π) · θ + Σ_(o ∈ M) φ(o) · θ

80. SP with On-the-fly Matching (Kwiatkowski et al., 2013)
Features:
- CCG parse features (Π): # of each category, # of (word, category), # of (POS, category)
- structural features (M): identities of complex-typed constants and domain-independent constants
- lexical features (M): for pairs (c_u, c_O): φ_np, φ_stem, φ_syn, φ_fp:stem, φ_def-overlap
- knowledge-base features (executing y on K): φ_direct, φ_join, φ_empty, φ_0, φ_1

81. Other CCG Works
Artzi et al., TACL 2013, Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions: model robot instructions and take feedback from the robot's actions.
Reddy et al., TACL 2014, Large-scale Semantic Parsing without Question-Answer Pairs: use ClueWeb09 and FACC1, plus a general CCG parser, to build an LF, which is then converted to an ungrounded graph sharing commonalities with Freebase.
Artzi et al., EMNLP 2015, Broad-coverage CCG Semantic Parsing with AMR: to deal with co-reference in AMR, use Skolem terms (Steedman, 2011) to build an underspecified LF, which is then mapped to the specified LF.

82. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

83. Learning SP with SMT (Wong and Mooney, 2006)
Synchronous CFG rule:
    X → ⟨α, β⟩ (pattern & template)

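Each synchronous rule rewrites a non-terminal on both sides at once, so one derivation yields a (sentence, MR) pair. A toy sketch with invented RoboCup-style rules; the pattern and template share bracketed non-terminal slots:

    # X -> <pattern, template>; "[N]" marks the shared non-terminal slots
    rules = {
        "CONDITION": ("[TEAM] player [UNUM] has the ball",
                      "(bowner [TEAM] {[UNUM]})"),
        "TEAM": ("our", "our"),
        "UNUM": ("4", "4"),
    }

    def derive(symbol):
        # expand both sides in lockstep
        nl, mr = rules[symbol]
        for nt in rules:
            slot = f"[{nt}]"
            if slot in nl or slot in mr:
                sub_nl, sub_mr = derive(nt)
                nl, mr = nl.replace(slot, sub_nl), mr.replace(slot, sub_mr)
        return nl, mr

    print(derive("CONDITION"))
    # ('our player 4 has the ball', '(bowner our {4})')
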
84. Learning SP with SMT (Wong and Mooney, 2006)
Parsing: enumerate derivations that yield e, f:
    f* = m(argmax_(d ∈ D(G|e)) Pr(d | e; λ))
    Pr_λ(d | e) = (1/Z_λ(e)) exp(Σ_i λ_i f_i(d))
Features (3000 or so):
- the number of times each rule r is used in the derivation
- the number of times each word w is generated from word gaps
Parameter estimation: the Viterbi algorithm is used for efficient decoding; since the gold derivation is latent, EM finds the optimal parameters.

85. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 1: assuming the grammar is known, an LF can be converted to the production (rule) sequence of its left-most top-down derivation (by convention). We prefer the derivation sequence over the LF itself because an MR may not be well-formed without the grammar, and MR tokens can be polysemous or carry no specific meaning.
Examples (sentence and LF from RoboCup)
    ((bowner our {4}) (do our {6} (pos (left (half our)))))
    If our player 4 has the ball, then our player 6 should stay in the left side of our half.

86. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 2: use GIZA++ to find an alignment between words and derivation rules.

87. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 3: extract rules bottom-up, starting with rules whose RHS contains only terminals, then those whose RHS contains non-terminals.
    UNUM → ⟨4, 4⟩
    TEAM → ⟨our, our⟩
    CONDITION → ⟨TEAM player UNUM has {1} ball, (bowner TEAM {UNUM})⟩

88. Learning SP with SMT (Wong and Mooney, 2006)
Grammar rule acquisition, step 3, special case: a rule that derives no terminal would break links outside its sub-parse tree. Merge such rules, e.g.:
    REGION → (left (penalty-area TEAM))
For excessively merged rules (overfitting), try a greedy link-removal policy (alignment repair).

89. SCFG with Lambda Calculus (Wong and Mooney, 2007)
Use lambda expressions for the semantic side instead. To improve NL-MR isomorphism, find an MST on a graph where edges between rules are established for any shared variable and weighted by the minimal word distance.

90. Data Recombination (Jia and Liang, 2016)
Fit a new SCFG model to the training data using 3 kinds of recombination policies, then draw additional training examples from it.

91. Generative Models: Learning with Less Supervision (Liang et al., 2009)
Given a world state paired with several sentences:
    p(r, f, c, w | s) = p(r | s) p(f | r) p(c, w | r, f, s)

92. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

93. Using a Syntactic Parse (Ge and Mooney, 2009)
Parser components:
- an existing syntactic parser
- a learned lexicon from words to predicates
- a learned set of composition rules
Assumption: an unambiguous CFG of the LF is known.

94. Using a Syntactic Parse (Ge and Mooney, 2009)
Not every semantic sub-tree strictly follows the syntactic derivation. Introduce macro-predicates when the children's MRs can't combine:
- a child MR becomes an argument if it is complete
- otherwise it becomes part of the predicate

96. Using a Syntactic Parse (Ge and Mooney, 2009)
Learning: the lexicon is learned with GIZA++ (as in Wong and Mooney, 2006):
- if a predicate is aligned to no word, the predicate is taken as inferable and bound to its values in the MR
- if a predicate is aligned to several words, split it into several alignments
Composition rules are learned in the form
    Λ1.P1 + Λ2.P2 ⇒ {Λp.Pp, R}
    λa1.λa2.P_PLAYER + P_UNUM ⇒ {λa1.P_PLAYER, a2 = c2}
Disambiguation model: max-ent, trained with L-BFGS:
    Pr(D | S, T; θ) = exp(Σ_i θ_i f_i(D)) / Z(S, T)

97. Transforming a Dependency Parse (Reddy et al., 2016)
Parsing:
- binarize the dependency parse into an S-expression
- substitute symbols with λ-expressions
- compose hierarchically (β-reduction)

98. Transforming a Dependency Parse (Reddy et al., 2016)
Parsing: follow Reddy et al., 2014 to transform the result into a grounded graph. New operators deal with the mismatch between the dependency parse and the semantic parse:
- CONTRACT: merge some nodes and edges into a single node
- EXPAND: add edges for disjoint nodes (dependency-parse errors)

99. Other Works from Syntactic Parse to Semantic Parse
Incremental parser for AMR (Damonte et al., 2016): a greedy transition-based parser for AMR, inspired by the transition-based syntactic parser ArcEager (Nivre, 2004, 2008), using an existing dependency parser.
Imitation learning for AMR (Goodman et al., 2016): a transition-based parser trained with imitation learning, extended with techniques such as noise reduction and targeted exploration, using an existing dependency parser.

100. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

  101. SP from World’s Response(Clarke et al., 2010) Parsing: ˆ z

    = Fw (x) = arg maxy∈Y ,z∈Z wT Φ(x, y, z) Learning: Haruki Kirigaya 2016.09.30 73 / 86
  102. SP from World’s Response(Clarke et al., 2010) Parsing: In order

    to adapt to unseen inputs, consider the entire meaning space instead of rule extraction from training data. Fw (x) = arg maxy,z wT Φ(x, y, z) = arg maxα,β c∈X s∈D αcs · wT Φ1 + c,d∈X s,t∈D βcs,dt · wT Φ2 Haruki Kirigaya 2016.09.30 73 / 86
  103. SP from World’s Response(Clarke et al., 2010) Parsing: In order

    to adapt to unseen inputs, consider the entire meaning space instead of rule extraction from training data. Fw (x) = arg maxy,z wT Φ(x, y, z) = arg maxα,β c∈X s∈D αcs · wT Φ1 + c,d∈X s,t∈D βcs,dt · wT Φ2 α: word span c aligned with symbol s β: word span d aligned with t, when α is activated such that A consituent is associated with 1 symbol beta(cs,dt) activated iff. alpha(cs) and alpha(dt) activated beta(cs,dt) activated then s is a function and (s, t) is type-consistent functional composition is directional and acyclic Haruki Kirigaya 2016.09.30 73 / 86
  104. SP from World’s Response(Clarke et al., 2010) Parsing: In order

    to adapt to unseen inputs, consider the entire meaning space instead of rule extraction from training data. Fw (x) = arg maxy,z wT Φ(x, y, z) = arg maxα,β c∈X s∈D αcs · wT Φ1 + c,d∈X s,t∈D βcs,dt · wT Φ2 Features Used: 1st-order: stemmed word match 1st-order: similarity based on WordNet (Do et al. 2010) 2nd-order: normalized distance of the head words in c and d for beta(cs, dt) on the dependency tree of sentence 2nd-order: symbol concurrence frequency (regardless of alignments) Haruki Kirigaya 2016.09.30 73 / 86
105. Confidence-driven Unsupervised SP (Goldwasser, 2011)
What is a "confidence-driven unsupervised method"?
- Idea: if a pattern is produced multiple times by a non-random model, it is likely an indication of an underlying phenomenon in the data.
- Confidence: output structures close to the center of statistical mass receive a high confidence score.
- Confidence-driven: the model improves significantly compared with using only the prediction score wᵀΦ(x, y, z).
Parsing is the same as in Clarke et al., 2010, formulated as an ILP problem.

106. Confidence-driven Unsupervised SP (Goldwasser, 2011)
Confidence measures:
(1) translation model
    unigram: p(z | x) = Π_(i=1..|z|) p(s_i | y(s_i))
    bigram: p(z | x) = Π_(i=1..|z|) p(s_(i−1)(s_i) | y(s_(i−1)), y(s_i))
(2) structural proportion
    Prop(x, z): the ratio of #predicates in z to #words in x
    AvProp(S): the average over the set S
    PropScore(S, (x, z)) = AvProp(S) − Prop(x, z)
Combined: use (2) to filter candidates and (1) to rank them.

107. Grounded Unsupervised SP (Poon, 2013)
Parsing idea: annotate states onto the nodes and edges of a dependency parse.
- no training is needed for specific token types: datetime, numerics, logical operators
- "states" come from the DB schema: entities / attributes
- complex states are included for mismatches between the dependency parse and the semantics
Inference:
    z* = argmax_z P_θ(d, z)
    P_θ(d, z) = (1/Z) exp(Σ_i w_i f_i(d, z))
Learning:
    θ* = argmax_θ Σ_(d∈D) log Σ_z P_θ(d, z)

108. Grounded Unsupervised SP (Poon, 2013)
Example: get flight from toronto to san diego stopping in dtw. (annotated dependency parse shown on the slide)

109. SP on Freebase and QA
Cai and Yates, 2013a, 2013b: methods to use an existing parser in a new domain.
Berant et al., 2013: collects the WebQuestions dataset.
Yih et al., 2014: a CNN-based semantic model (CNNSM).
Yih et al., 2015: staged query graph generation; find a core inferential chain executed on Freebase.
Pasupat and Liang, 2015: SP on semi-structured tables; proposes a dataset of tables and a method that first converts the tables into a knowledge graph.

110. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

111. Paraphrase Comparison
Berant and Liang, 2014: paraphrasing independent of the KB, using two kinds of templates.
Chen et al., 2016: direct paraphrasing, using Wiktionary.

112. SP via Paraphrasing (Berant and Liang, 2014)
    p_θ(c, z | x) = (1/Z) exp(φ(x, c, z)ᵀ θ)
    φ(x, c, z)ᵀ θ = φ_pr(x, c)ᵀ θ_pr + φ_lf(x, z)ᵀ θ_lf

113. Building an SP Overnight (Wang et al., 2015)
Build an SP for a new domain (8 published datasets):
1. a human writes a seed lexicon
2. the domain-general grammar G induces canonical LFs and utterances
3. crowdsourcing rewrites the awkward canonical utterances into fluent ones
4. train a parser under the grammar G, via paraphrasing

114. Agenda
1 Semantics
2 Parsing: Shift-Reduce, CCG, Word Alignments, Semantic Parsing from Syntactic Parses, Weak and Unsupervised Parsers, Paraphrase-driven Parsing, Neural Semantic Parsing
3 Summary

115. Sequence-based SP (Xiao et al., 2016)
Vinyals et al., 2015 showed that sequence models succeed at (syntactic) grammar parsing. Xiao et al. compared various sequence formats on SPO (Wang et al., 2015):
- LF as a raw token sequence
- DSP (Derivation Sequence Prediction)
- DSP-C (Constrained): use the grammar to constrain the next rule at test time
- DSP-CL (Constrained Loss): p(y_t) is normalized only over grammatically possible values
- CFP (Canonical Form Prediction): predict the canonical form instead, which is then parsed into the LF

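The constrained-decoding idea in miniature: zero out rules the grammar disallows at each step and renormalize over the rest (DSP-C applies this at test time; DSP-CL applies the same renormalization inside the training loss). All names and the toy grammar below are invented for illustration:

    import math

    def constrained_step(logits, allowed):
        # logits: model scores per rule id; allowed: legal next rules
        masked = {r: s for r, s in logits.items() if r in allowed}
        z = sum(math.exp(s) for s in masked.values())
        return {r: math.exp(s) / z for r, s in masked.items()}

    logits = {"NP->city": 2.1, "NP->state": 1.3, "S->NP_VP": 0.2}
    allowed = {"NP->city", "NP->state"}    # grammar says an NP must come next
    print(constrained_step(logits, allowed))
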
116. Parsing with Neural Attention (Dong and Lapata, 2016)
The Seq2Tree model, which also learns a latent grammar.

117. Result Comparison
AMR parsing (F1: 70, Goodman et al., 2016) and WebQuestions (F1: 52.5, Yih et al., 2015).

118. Summary
Problems and remedies:
- LF not well-formed (particularly in neural SP methods): use an existing or a learned grammar.
- Ontology mismatch: paraphrasing or other two-phase parsing.
- Utterance explosion: prefer expansion from the meaning space over rule extraction from the training data.
- Isomorphism between NL (or the syntactic parse) and the semantic parse: add relaxing extensions, because NL isn't strict and the syntactic parse may introduce errors.