David Nolen on Parsing With Derivatives

August 24, 2016

We present a functional approach to parsing unrestricted context-free grammars based on Brzozowski's derivative of regular expressions. If we consider context-free grammars as recursive regular expressions, Brzozowski's equational theory extends without modification to context-free grammars (and it generalizes to parser combinators). The supporting actors in this story are three concepts familiar to functional programmers - laziness, memoization and fixed points; these allow Brzozowski's original equations to be transliterated into purely functional code in about 30 lines spread over three functions.

Yet, this almost impossibly brief implementation has a drawback: its performance is sour - in both theory and practice. The culprit? Each derivative can double the size of a grammar, and with it, the cost of the next derivative.

Fortunately, much of the new structure inflicted by the derivative is either dead on arrival, or it dies after the very next derivative. To eliminate it, we once again exploit laziness and memoization to transliterate an equational theory that prunes such debris into working code. Thanks to this compaction, parsing times become reasonable in practice.

We equip the functional programmer with two equational theories that, when combined, make for an abbreviated understanding and implementation of a system for parsing context-free languages.

Papers_We_Love

Transcript

Parsing with Derivatives, David Nolen, Papers We Love NYC, August 2016
Parsing with Derivatives David Nolen Papers I Should Read NYC,
August 2016
Overview
• Preliminaries
• Brzozowski's derivative
• Derivatives of context-free languages
• Parsers & parser combinators
• Derivatives of parser combinators
• Performance and complexity
• Compaction
Preliminaries
A language L is a set of strings
{foo, bar} {cat, dog} {papers, we, love}
A string w is a sequence of characters from an alphabet A
2 typical atomic languages
• The empty language, ∅, contains no strings
• ∅ = {}
• The null language, ε, contains only the length-zero "null" string
• ε = {w} where length(w) = 0
Given an alphabet A there is a singleton language for every character c in the alphabet: c ≡ {c}
union → alt
concatenation → cat
Kleene star → rep
Brzozowski’s derivative
The derivative of a language L with respect to character c is a new language that has been "filtered" and "chopped": D_c(L)
To determine membership, derive a language with respect to each character, and check if the final language contains the null string: if yes, the original string was in; if not, it wasn't.
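A minimal Python sketch of this membership test, restricted to regular languages. The class names (Empty, Eps, Char, Alt, Cat, Rep) and the helpers nullable, derive, matches, and word are illustrative names chosen here, not taken from the talk; the rules are Brzozowski's standard ones.

```python
from dataclasses import dataclass


class Lang:
    """Base class for language descriptions."""


@dataclass(frozen=True)
class Empty(Lang):  # the empty language: contains no strings
    pass


@dataclass(frozen=True)
class Eps(Lang):  # the null language: contains only the empty string
    pass


@dataclass(frozen=True)
class Char(Lang):  # the singleton language {c}
    c: str


@dataclass(frozen=True)
class Alt(Lang):  # union
    left: Lang
    right: Lang


@dataclass(frozen=True)
class Cat(Lang):  # concatenation
    first: Lang
    second: Lang


@dataclass(frozen=True)
class Rep(Lang):  # Kleene star
    inner: Lang


def nullable(l):
    """True if the language contains the empty string."""
    if isinstance(l, (Eps, Rep)):
        return True
    if isinstance(l, Alt):
        return nullable(l.left) or nullable(l.right)
    if isinstance(l, Cat):
        return nullable(l.first) and nullable(l.second)
    return False  # Empty and Char


def derive(l, c):
    """Keep the strings that start with c ("filter"), then drop that c ("chop")."""
    if isinstance(l, Char):
        return Eps() if l.c == c else Empty()
    if isinstance(l, Alt):
        return Alt(derive(l.left, c), derive(l.right, c))
    if isinstance(l, Cat):
        d = Cat(derive(l.first, c), l.second)
        # If the first part can be empty, c may also begin the second part.
        return Alt(d, derive(l.second, c)) if nullable(l.first) else d
    if isinstance(l, Rep):
        return Cat(derive(l.inner, c), l)
    return Empty()  # Empty and Eps have no strings starting with c


def matches(l, s):
    """Derive with respect to each character, then check for the null string."""
    for c in s:
        l = derive(l, c)
    return nullable(l)


def word(s):
    """Hypothetical helper: the language containing exactly the string s."""
    return Cat(Char(s[0]), word(s[1:])) if s else Eps()
```

For example, `matches(Rep(Alt(word("foo"), word("bar"))), "foobar")` derives six times and then checks nullability of the result. Note that this version does no simplification, so the derived expression grows with every character.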
A recursive definition of the derivative
Derivatives of context-free languages
Laziness
Memoization
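A rough Python sketch of how laziness, memoization, and a fixed point can combine to make the derivative work on a recursive grammar. It is a recognizer only, with no parse trees and no compaction, and every name in it (Lazy, Lang.kid, nullable, derive, recognizes, balanced_parens) is an assumption made for illustration rather than code from the talk or the paper.

```python
class Lazy:
    """A delayed child: the thunk runs only when the child is first needed."""
    def __init__(self, thunk):
        self.thunk = thunk


class Lang:
    def __init__(self, *kids, char=None):
        self.kids = list(kids)  # children: Lang nodes or Lazy thunks
        self.char = char        # used by Char only
        self.memo = {}          # memoization: character -> derivative node
        self.null = None        # cached nullability, filled in by nullable()

    def kid(self, i):
        k = self.kids[i]
        if isinstance(k, Lazy):  # force once, then keep the result
            k = self.kids[i] = k.thunk()
        return k


class Empty(Lang): pass
class Eps(Lang): pass
class Char(Lang): pass
class Alt(Lang): pass
class Cat(Lang): pass


def nullable(root):
    """Least fixed point of the nullability equations over the reachable graph."""
    if root.null is None:
        nodes, stack = {}, [root]
        while stack:  # collect every reachable node, forcing lazy children
            n = stack.pop()
            if id(n) not in nodes:
                nodes[id(n)] = n
                stack.extend(n.kid(i) for i in range(len(n.kids)))
        val = {k: isinstance(n, Eps) for k, n in nodes.items()}
        changed = True
        while changed:  # start from False everywhere and iterate until stable
            changed = False
            for k, n in nodes.items():
                if not val[k] and isinstance(n, (Alt, Cat)):
                    a, b = val[id(n.kid(0))], val[id(n.kid(1))]
                    if (a or b) if isinstance(n, Alt) else (a and b):
                        val[k] = changed = True
        for k, n in nodes.items():
            n.null = val[k]
    return root.null


def derive(l, c):
    if c not in l.memo:
        if isinstance(l, Char):
            d = Eps() if l.char == c else Empty()
        elif isinstance(l, Alt):
            # Children are delayed, so a grammar that refers to itself terminates.
            d = Alt(Lazy(lambda: derive(l.kid(0), c)),
                    Lazy(lambda: derive(l.kid(1), c)))
        elif isinstance(l, Cat):
            first, second = l.kid(0), l.kid(1)
            d = Cat(Lazy(lambda: derive(first, c)), second)
            if nullable(first):
                d = Alt(d, Lazy(lambda: derive(second, c)))
        else:
            d = Empty()  # Empty and Eps
        l.memo[c] = d    # stored before any delayed child is forced
    return l.memo[c]


def recognizes(lang, s):
    for c in s:
        lang = derive(lang, c)
    return nullable(lang)


def balanced_parens():
    # S -> ε | ( S ) S ; the Lazy thunk lets S refer to itself.
    s = Alt(Eps(), Lazy(lambda: Cat(Char(char="("),
                                    Cat(s, Cat(Char(char=")"), s)))))
    return s
```

With these definitions, `recognizes(balanced_parens(), "(())()")` walks the input one character at a time. The memo table is what ties a recursive reference back to the derivative node already under construction, and the fixed-point loop is what lets nullability be answered on a cyclic graph.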
Parser & Parser Combinators
Derivatives of parser combinators
Performance & complexity
$O(n 2^{n} G + (2^{n} G)^{2}) = O(2^{2n} G^{2})$
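The slide does not define its symbols. One plausible reading, assuming n is the number of input characters and G is the size of the initial grammar, is that each of the n derivatives works on a grammar that may have doubled up to size 2^n G, followed by one pass that is quadratic in the grown grammar. Under that assumption the simplification can be spelled out as:

```latex
\begin{align*}
n \cdot 2^{n} G + \left(2^{n} G\right)^{2}
  &= n\,2^{n} G + 2^{2n} G^{2} \\
  &\le 2^{n} \cdot 2^{n} G + 2^{2n} G^{2}
     && \text{since } n \le 2^{n} \\
  &\le 2 \cdot 2^{2n} G^{2}
     && \text{since } G \le G^{2} \text{ for } G \ge 1 \\
  &= O\!\left(2^{2n} G^{2}\right)
\end{align*}
```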
Compaction
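A simplified Python illustration of the idea behind compaction, limited to regular languages and to a few algebraic identities (∅ ∪ L = L, ∅ ∘ L = ∅, ε ∘ L = L). The tuple encoding and the names raw_alt, raw_cat, alt, cat, derive_all, and size are assumptions made for this sketch; the compaction discussed in the talk operates on grammars and is not reproduced here.

```python
EMPTY, EPS = ("empty",), ("eps",)


def char(c):
    return ("char", c)


# Plain constructors: keep every node the derivative produces.
def raw_alt(a, b):
    return ("alt", a, b)


def raw_cat(a, b):
    return ("cat", a, b)


# Compacting constructors: prune structure that is dead on arrival.
def alt(a, b):
    if a == EMPTY:  # ∅ ∪ L = L
        return b
    if b == EMPTY:  # L ∪ ∅ = L
        return a
    return ("alt", a, b)


def cat(a, b):
    if a == EMPTY or b == EMPTY:  # ∅ ∘ L = L ∘ ∅ = ∅
        return EMPTY
    if a == EPS:                  # ε ∘ L = L
        return b
    if b == EPS:                  # L ∘ ε = L
        return a
    return ("cat", a, b)


def nullable(l):
    tag = l[0]
    if tag == "eps":
        return True
    if tag == "alt":
        return nullable(l[1]) or nullable(l[2])
    if tag == "cat":
        return nullable(l[1]) and nullable(l[2])
    return False


def derive(l, c, alt, cat):
    """Derivative with the constructors passed in, so both styles can be compared."""
    tag = l[0]
    if tag == "char":
        return EPS if l[1] == c else EMPTY
    if tag == "alt":
        return alt(derive(l[1], c, alt, cat), derive(l[2], c, alt, cat))
    if tag == "cat":
        d = cat(derive(l[1], c, alt, cat), l[2])
        return alt(d, derive(l[2], c, alt, cat)) if nullable(l[1]) else d
    return EMPTY


def derive_all(l, s, alt, cat):
    for c in s:
        l = derive(l, c, alt, cat)
    return l


def size(l):
    """Number of nodes in the expression tree."""
    return 1 + sum(size(k) for k in l[1:] if isinstance(k, tuple))


# The language {cat, dog}, built without any simplification.
LANG = raw_alt(raw_cat(char("c"), raw_cat(char("a"), char("t"))),
               raw_cat(char("d"), raw_cat(char("o"), char("g"))))
```

Comparing `size(derive_all(LANG, "cat", raw_alt, raw_cat))` with `size(derive_all(LANG, "cat", alt, cat))` shows the effect: both results are nullable, but the compacting constructors collapse the result to the single node ε while the plain ones keep the dead branches.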
Demo
Thanks!
Questions?