Earley Recognition (1/3)

An Earley item is a dotted rule plus the position where the item starts:

  Sum -> Sum • [+-] Product (0)

o the fat dot • records how much of the item has been parsed
o the number (0) records where in the input the item starts

┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 1 │ + │ ( │ 2 │ * │ 3 │ - │ 4 │ ) │
└───┴───┴───┴───┴───┴───┴───┴───┴───┘
S0   S1   S2   …                    S9

An item is completed once its dot reaches the end of the rule: Sum -> Sum [+-] Product • (0)

Earley Recognition (2/3)

o S0 is initialized with the start rule, start position (0), and a leading fat dot, and its predictions are added.
o Prediction: add the rules of the non-terminal right after the dot to the current set.
o Scan: if the symbol after the dot matches the next token, move the fat dot forward and add the resulting item into the next set (so S1 holds the items scanned over the first token).

┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 1 │ + │ ( │ 2 │ * │ 3 │ - │ 4 │ ) │
└───┴───┴───┴───┴───┴───┴───┴───┴───┘
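A minimal recognizer sketch for this example. The Product and Factor rules and the NUM terminal are assumptions added to complete the slide's Sum rule into the usual arithmetic grammar:

```python
# Minimal Earley recognizer sketch. An item (lhs, rhs, dot, origin) is a
# dotted rule: dot = how much has been parsed, origin = where it started.
GRAMMAR = {
    "Sum":     [("Sum", "[+-]", "Product"), ("Product",)],
    "Product": [("Product", "[*/]", "Factor"), ("Factor",)],
    "Factor":  [("(", "Sum", ")"), ("NUM",)],
}

def matches(symbol, token):
    """Terminals: NUM matches digits, a class like [+-] matches its characters."""
    if symbol == "NUM":
        return token.isdigit()
    if symbol.startswith("["):
        return token in symbol[1:-1]
    return token == symbol

def recognize(tokens, start="Sum"):
    n = len(tokens)
    sets = [set() for _ in range(n + 1)]          # state sets S0 .. Sn
    sets[0] = {(start, rhs, 0, 0) for rhs in GRAMMAR[start]}
    for i in range(n + 1):
        agenda = list(sets[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot == len(rhs):                   # Completion
                for l2, r2, d2, o2 in list(sets[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in sets[i]:
                            sets[i].add(new); agenda.append(new)
            elif rhs[dot] in GRAMMAR:             # Prediction
                for r in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], r, 0, i)
                    if new not in sets[i]:
                        sets[i].add(new); agenda.append(new)
            elif i < n and matches(rhs[dot], tokens[i]):   # Scan
                sets[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in sets[n])

print(recognize(list("1+(2*3-4)")))   # True: S9 completes the start rule
```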
CYK Recognition Illust.

o Rules are in Chomsky Normal Form:

  S  → PRP VP    VP → VBD PRP
  NP → PRP PP    VP → VBD NP
  NP → DT NNS    VP → VP PP
                 PP → IN NP

[Figure: CYK chart with rows for span lengths l = 0 … 5 over the tagged sentence; the pre-terminal row assigns I → PRP, saw → VBD/NN, him → PRP, with → IN, the → DT, binoculars → NNS]

  "I saw him with the binoculars"
Recognition pseudocode (chart[l, p, A] is true iff non-terminal A derives the span of length l starting at position p):

chart[1..n, 1..n, 1..V] := False
for p = 1 to n:                       # width-1 spans: lexical rules
    for rule A -> w_p in rules:
        chart[1, p, A] := True
for l = 2 to n:                       # span length
    for p = 1 to n - l + 1:           # span start
        for s = 1 to l - 1:           # split point
            for rule A -> B C in rules:
                chart[l, p, A] := chart[l, p, A] or
                                  (chart[s, p, B] and chart[l - s, p + s, C])
return chart[n, 1, S]

Split points s for the span "The cat is jumping":
  The ||| cat is jumping
  The cat ||| is jumping
  The cat is ||| jumping
Semiring-based Parsing

o ⊗: associative
o CYK recognition uses the Boolean semiring ({True, False}, OR, AND, False, True)
o Inside algorithm uses (ℝ≥0 ∪ {+∞}, +, ×, 0, 1)
  o computes all marginal tree weights
o State transition equation:

  $A_i^k = \bigoplus_{B,C} \bigoplus_{j} \pi_{A \to BC} \otimes B_i^j \otimes C_{j+1}^k$
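A sketch of the same chart loop parameterized by a semiring. The grammar encoding (dictionaries keyed by rule) and the toy lexicon are assumptions for illustration; swapping BOOLEAN for INSIDE turns recognition into inside-weight computation with no other changes:

```python
# CYK generalized over a semiring (plus, times, zero, one). The chart loop
# is the slide's state-transition equation with (⊕, ⊗) supplied as arguments.

def cyk(tokens, lexicon, binary_rules, semiring, start="S"):
    plus, times, zero, one = semiring
    n = len(tokens)
    chart = {}   # chart[(i, k, A)]: weight of A spanning tokens[i..k]
    for p, word in enumerate(tokens):                  # width-1 spans
        for A, weight in lexicon.get(word, {}).items():
            chart[(p, p, A)] = weight
    for l in range(2, n + 1):                          # span length
        for i in range(n - l + 1):                     # span start
            k = i + l - 1
            for (A, B, C), pi in binary_rules.items():
                total = chart.get((i, k, A), zero)
                for j in range(i, k):                  # split point
                    b = chart.get((i, j, B), zero)
                    c = chart.get((j + 1, k, C), zero)
                    total = plus(total, times(pi, times(b, c)))
                if total != zero:
                    chart[(i, k, A)] = total
    return chart.get((0, n - 1, start), zero)

BOOLEAN = (lambda a, b: a or b, lambda a, b: a and b, False, True)  # recognition
INSIDE  = (lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)        # inside weights

lexicon = {"the": {"DT": True}, "cat": {"NNS": True}, "sleeps": {"VBD": True}}
rules = {("NP", "DT", "NNS"): True, ("S", "NP", "VBD"): True}
print(cyk("the cat sleeps".split(), lexicon, rules, BOOLEAN))       # True
```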
Grammar Induction

o Learning rules with less observation
o A mixed idea of language without pragmatics in ESL education
o Explaining the human language ability, as suggested by generative grammar
o Learning grammar rules from data
  o usually, the grammar formalism is assumed beforehand
o Selected works
  o DIORA (NAACL 2019)
  o C-PCFG (ACL 2019) and TD-PCFG (NAACL 2021)
  o Perturb-and-Parse (ICLR 2019)
  o R2D2 (ACL 2021)
DIORA: Motivations

o Latent tree parsers
  o produce representations for all internal nodes
  o each generated with a soft weighting over all possible sub-trees
  o require sentence-level annotations for training (usually labels for downstream tasks, such as NLI)
o Previous works
  o predict trees that are not aligned with known treebanks
  o have no mechanism to model phrases, requiring a complex procedure to extract syntactic structures (such as ON-LSTM)
C-PCFG: Motivations

o Ill-behaved optimization landscape
o Overly strict independence assumptions of PCFGs
o Successful approaches resort to
  o carefully crafted auxiliary objectives
  o priors or non-parametric models
  o manually engineered features
o They propose that
  o parameterizing a PCFG with neural networks makes it possible to induce linguistically meaningful grammars by simply optimizing log-likelihood
  o incorporating side information is straightforward
C-PCFG: Models (2)

o The PCFG formalism is adopted not because of its adequacy but because of its tractability.
o C-PCFG is a restricted version of some higher-order PCFG.
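A minimal sketch of the neural-parameterization idea. The single bilinear scorer, the dimensions, and the way z enters are illustrative assumptions; the actual model uses deeper scoring networks and separate parameterizations for root, binary, and lexical rules:

```python
import numpy as np

# Rule probabilities pi[A -> B C] computed from symbol embeddings and a
# per-sentence latent vector z (the "compound" part), then normalized by
# a softmax over all right-hand sides for each parent A.
rng = np.random.default_rng(0)
NT, D = 30, 16                             # non-terminals, embedding size
w = rng.normal(size=(NT, D))               # non-terminal embeddings (learned)
z = rng.normal(size=D)                     # per-sentence latent vector
U = rng.normal(size=(2 * D, 2 * D)) * 0.1  # scoring weights (learned)

head = np.concatenate([w, np.tile(z, (NT, 1))], axis=1) @ U      # (NT, 2D)
children = np.concatenate(
    [np.repeat(w, NT, axis=0), np.tile(w, (NT, 1))], axis=1)     # (NT^2, 2D)
scores = head @ children.T                                       # (NT, NT^2)
pi = np.exp(scores - scores.max(axis=1, keepdims=True))
pi = (pi / pi.sum(axis=1, keepdims=True)).reshape(NT, NT, NT)
assert np.allclose(pi.sum(axis=(1, 2)), 1.0)  # each A's rules normalize
```

Because every rule probability conditions on z, marginalizing over z yields a compound distribution whose rule choices are no longer independent, which is what makes C-PCFG more expressive than a vanilla PCFG.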
TD-PCFG: Motivations

o Existing neural PCFG induction uses only 30 non-terminals and 60 pre-terminals
o More symbols are important:
  o dividing PTB categories into subtypes improves parsing
  o increasing the number of hidden states is helpful for learning latent variables
TD-PCFG: Methods

o Decompose the binary-rule probability tensor into a sum of d rank-1 tensors:

  $T = \sum_{l=1}^{d} T^{(l)}, \qquad T^{(l)}_{ijk} = u^{(l)}_i \cdot v^{(l)}_j \cdot w^{(l)}_k$

o The inside recursion $A_i^k = \bigoplus_{B,C} \bigoplus_{j} \pi_{A \to BC} \otimes B_i^j \otimes C_{j+1}^k$ then factorizes as

  $s_{ik} = U \sum_{j} (V^\top s_{ij}) \odot (W^\top s_{jk}), \qquad U \in \mathbb{R}^{n \times d},\ V, W \in \mathbb{R}^{m \times d}$

o where U is row-normalized, and V, W are column-normalized
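A sketch of the factored inside update for one span, assuming the inside vectors of all sub-spans are already computed. The dimensions follow the slide; the random parameters stand in for the learned, properly normalized ones:

```python
import numpy as np

n, m, d = 30, 90, 64                 # non-terminals, symbols, tensor rank
rng = np.random.default_rng(0)
U = rng.random((n, d))               # row-normalized in the real model
V = rng.random((m, d))               # column-normalized in the real model
W = rng.random((m, d))               # column-normalized in the real model

def inside_update(left_vectors, right_vectors):
    """s_ik = U sum_j (V^T s_ij) ⊙ (W^T s_jk). Cost per span is
    O(#splits * m * d); the O(n * m^2) rule tensor T is never built."""
    acc = np.zeros(d)
    for s_ij, s_jk in zip(left_vectors, right_vectors):
        acc += (V.T @ s_ij) * (W.T @ s_jk)  # elementwise product in rank space
    return U @ acc                          # back to non-terminal space, R^n

splits = 4
left = [rng.random(m) for _ in range(splits)]    # s_ij for each split j
right = [rng.random(m) for _ in range(splits)]   # s_jk for each split j
print(inside_update(left, right).shape)          # (30,)
```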
Perturb-and-Parse: Motivations

o Annotated treebanks are limited in size and domain
o Linguistic structures trained for downstream tasks
  o provide an inductive bias specifying structures
  o without making any assumptions regarding what the structures represent
o Sample global structures in a differentiable way
Perturb-and-Parse: Methods

o Perturbation: Gumbel noise is added to the arc scores, so that MAP decoding on the perturbed scores acts as sampling
o Softmax instead of argmax (in Eisner's algorithm), though the output T is then no longer a valid dependency tree, but a soft selection of arcs
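A sketch of the two ingredients in isolation (not the full Eisner DP); the toy score vector and temperature are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_perturb(scores):
    """Add i.i.d. Gumbel(0, 1) noise so that (soft-)argmax over the
    perturbed scores behaves like a sample rather than the MAP choice."""
    return scores - np.log(-np.log(rng.uniform(size=scores.shape)))

def soft_argmax(values, temperature=1.0):
    """Softmax relaxation of argmax: returns weights instead of an index,
    keeping the whole computation differentiable."""
    z = values / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

# Inside an Eisner-style DP, a hard decision  best = argmax_j score[j]
# becomes a soft mixture over all candidates j:
scores = gumbel_perturb(rng.normal(size=5))
weights = soft_argmax(scores, temperature=0.5)
print(weights)   # downstream items are weighted sums: sum_j weights[j] * item[j]
```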
R2D2: Motivations

o Pretrained LMs
  o have fixed depth and require positional embeddings
  o do not explicitly reflect hierarchical structures
o Fully differentiable CKY
  o is O(n³) and hard to scale up
o Contributions:
  o Recursive Transformers learn both representations and structures
  o an efficient O(n) optimization algorithm to scale up
  o an effective training objective
o Formal languages are
  o artificial
  o targeted and specialized
  o not exactly equivalent to NL
o Analysis of NL by FL is barking up the wrong tree
o Semantic parsing knows little about semantics
  o semantics does not have to be composed
  o at least, semantics without pragmatics is not meaning
o Assuming
  o semantic parsing is enough
  o within application domains, with ad-hoc process requirements

[Figure: Natural Language, Formal Language, and Meaning/Mind as nodes, linked by Semantic Parsing (NL → FL), Model Theory (FL → Meaning), and Understanding (NL → Meaning)]
Mapping-centric Perspective

o Mapping as grammars (strings to semantics, syntactic trees to semantics)
  o CCG / SCFG / HRG / AM Algebra
o Mapping as the probabilistic model: (2010-)
  o log-linear models / hybrid trees / generative models
  o agenda-based parsing / float parser / transition-based parsers
  o neural nets
o Mapping with pattern templates (via intermediate representations): (2014-)
  o paraphrasing
  o factored CCG / sketches / intermediate grammar
o Alignment is found useful again (2020-)
Compositional Generalization

[Figure: NP/PP parse trees for (a) "the capital of the smallest state" and (b) "the population of the smallest state" in training, and (c) a recombined phrase with "the largest city" in testing; each syntactic tree is isomorphic to a semantic form built from primitives such as capital:c, population:i, and argmin($1, state, size)]

The corresponding SQL:

  SELECT POPULATION FROM CITY
  WHERE CITY_NAME = (
      SELECT CAPITAL FROM STATE
      WHERE AREA = (
          SELECT MIN(STATE1.AREA) FROM STATE AS STATE1
      )
  );
AlgeRecom: Motivations

o Compositional mapping as a homomorphism between algebras
o Syntactic algebra
  o $L = \langle L, (f_\gamma)_{\gamma \in \Gamma} \rangle$, with operations $f_\gamma : L^k \to L$
  o latent and learnt from data
o Semantic algebra
  o $M = \langle M, G \rangle$
  o built by enumerating all available semantic primitives and operations
o Homomorphism mapping
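A toy illustration of "semantics as a homomorphism between algebras", with assumed mini-algebras: the syntactic algebra builds terms, the semantic algebra builds numbers, and h maps one to the other:

```python
# Syntactic algebra: an operation f_gamma building syntax trees (terms).
def f_plus(x, y):               # f_plus : L^2 -> L
    return ("plus", x, y)

# Semantic algebra: the corresponding operation g_gamma over meanings.
def g_plus(a, b):               # g_plus : M^2 -> M
    return a + b

LEXICON = {"one": 1, "two": 2, "three": 3}
OPS = {"plus": g_plus}          # the homomorphism pairs each f with its g

def h(term):
    """Homomorphism h : L -> M, satisfying h(f(x, y)) == g(h(x), h(y))."""
    if isinstance(term, str):
        return LEXICON[term]
    op, x, y = term
    return OPS[op](h(x), h(y))

s = f_plus(f_plus("one", "two"), "three")
assert h(s) == g_plus(g_plus(1, 2), 3) == 6
```

In AlgeRecom's setting the syntactic operations are latent and learnt from data, while the semantic side is fixed by enumerating the available primitives and operations.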
LAGr: Motivations

o Predicting output labels independently
  o can be better at learning context-insensitive rules
o Semantic parses as graphs haven't been tested on systematic generalization
o Existing methods (e.g., SpanBasedSP) add complexity and rigidity compared to seq2seq models
o Latent alignment
  o inferred with a MAP algorithm that solves minimum-cost bipartite matching problems with the Hungarian algorithm (see the sketch after the weakly-supervised formalization)
LAGr: Formalization

o The aligned graph $\Gamma_a = (z, \xi)$
  o with M = L × N nodes arranged in L layers
  o $z \in V_n^M$ and $\xi \in V_e^{M \times M}$ indicate node labels and edge labels
LAGr: Weakly-supervised Formalization

o The naturally-ordered graph $\Gamma_{na}$ is obtained by permuting the columns of a latent aligned graph $\Gamma_a$.
o The permutation is denoted a
  o $a_j$ is the index of the column in $\Gamma_a$ that becomes the j-th column of $\Gamma_{na}$
o The (intractable) model then marginalizes over all alignments a.
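Since marginalizing over alignments is intractable, MAP inference reduces to a minimum-cost bipartite matching between aligned positions and natural-order columns. A minimal sketch using SciPy's Hungarian-algorithm implementation; the cost matrix here is a random placeholder for the model's negative log-likelihoods:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# cost[i, j]: negative log-probability that the target's j-th column is
# produced at aligned position i (placeholder values for illustration).
M = 6
cost = -np.log(rng.uniform(0.01, 1.0, size=(M, M)))

# Minimum-cost perfect matching between the M aligned positions and the
# M natural-order columns, i.e. the MAP alignment a.
rows, cols = linear_sum_assignment(cost)
a = np.empty(M, dtype=int)
a[cols] = rows      # a[j] = aligned position that becomes column j
print("MAP alignment a:", a, "cost:", cost[rows, cols].sum())
```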