Parsing for Humans
Yorick Peterse
May 20, 2014
Presentation for the Amsterdam Ruby user group.
Transcript
Parsing For Humans
@yorickpeterse (GitHub, Twitter, Gmail, etc)
Olery http://olery.com/
Parsing “Analysing a string of symbols according to the rules of a grammar.”
10 + 10
number = [0-9]+
operator = "+"
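A minimal sketch of how the two rules above could be checked in Ruby with regular expressions. The constant names and the `valid_expression?` helper are hypothetical, and allowing whitespace between the parts is an assumption that the slide does not state.

```ruby
# Grammar rules from the slide, written as Ruby regular expressions.
NUMBER_RULE   = /[0-9]+/
OPERATOR_RULE = /\+/

# Assumption: an expression is number, operator, number, with optional
# whitespace between the parts.
EXPRESSION_RULE = /\A#{NUMBER_RULE}\s*#{OPERATOR_RULE}\s*#{NUMBER_RULE}\z/

# Hypothetical helper: true when the whole input follows the grammar.
def valid_expression?(input)
  EXPRESSION_RULE.match?(input)
end

puts valid_expression?("10 + 10")
puts valid_expression?("10 +")
```

A check like this only answers whether the input is valid. It says nothing about the structure of the input, which is what the tokens and tree on the following slides are for.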
Vocabulary Token, AST, Lexer & Parser
Token A label for a specific part of the input.
10 [:INTEGER, "10"]
Abstract Syntax Tree A tree structure representing the input.
Markdown Lists
* Item
  * Nested item
(unordered-list
  (list "Item"
    (unordered-list
      (list "Nested item"))))
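One way such a tree might be held in Ruby is as nested arrays, sketched below. The node layout (type first, children after) and the `to_sexp` helper are assumptions for illustration, not something the slides prescribe.

```ruby
# Assumption: each node is an Array whose first element is the node type
# (a Symbol) and whose remaining elements are children (Strings or nodes).
MARKDOWN_AST = [:"unordered-list",
  [:list, "Item",
    [:"unordered-list",
      [:list, "Nested item"]]]]

# Hypothetical helper: renders a node as a single-line S-expression.
def to_sexp(node)
  case node
  when Array  then "(" + node.map { |child| to_sexp(child) }.join(" ") + ")"
  when String then node.inspect
  else node.to_s
  end
end

puts to_sexp(MARKDOWN_AST)
# prints: (unordered-list (list "Item" (unordered-list (list "Nested item"))))
```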
Lexer Takes raw input and returns a sequence of tokens.
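A small hand-written lexer for inputs such as `10 + 10`, sketched with Ruby's standard `StringScanner`. This is not the lexer Ragel would generate. The `lex` method and the `:OPERATOR` token type are hypothetical names. The `[:INTEGER, "10"]` pair follows the token format shown on the earlier slide.

```ruby
require 'strscan'

# Hypothetical lexer: turns raw input into a sequence of [type, value] tokens.
def lex(input)
  scanner = StringScanner.new(input)
  tokens  = []

  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace separates tokens but produces none
    elsif (text = scanner.scan(/[0-9]+/))
      tokens << [:INTEGER, text]
    elsif (text = scanner.scan(/\+/))
      tokens << [:OPERATOR, text]
    else
      raise ArgumentError, "unexpected input at position #{scanner.pos}"
    end
  end

  tokens
end

p lex("10 + 10")
```

The lexer only labels pieces of the input. It does not check that they appear in a sensible order, so `lex("+ +")` succeeds and returns two operator tokens.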
Parser Takes tokens as input and returns an AST.
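A hand-written sketch of the parsing step for a single rule, `INTEGER OPERATOR INTEGER`. It is not what a parser generator such as Racc produces. The `parse` method, the `:int` node type, and the AST layout are all assumptions made for illustration.

```ruby
# Hypothetical parser: takes [type, value] tokens and returns an AST.
def parse(tokens)
  types = tokens.map(&:first)

  # Only one rule is supported: INTEGER OPERATOR INTEGER.
  unless types == [:INTEGER, :OPERATOR, :INTEGER]
    raise ArgumentError, "unexpected token sequence: #{types.inspect}"
  end

  left, operator, right = tokens.map(&:last)

  # Assumed AST shape: [operator, left operand, right operand].
  [operator.to_sym, [:int, Integer(left, 10)], [:int, Integer(right, 10)]]
end

tokens = [[:INTEGER, "10"], [:OPERATOR, "+"], [:INTEGER, "10"]]
p parse(tokens)
# prints: [:+, [:int, 10], [:int, 10]]
```

Unlike the lexer, the parser is where token order matters: a sequence that does not match the rule is rejected instead of being turned into a tree.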
Available Tools
ANTLR, Bison, Coco/R, Flex, Happy, Lemon, jQuery, Parslet, Racc, Ragel,
Rexical, Treetop, Yacc
Ragel & Racc
Ragel “A finite state machine compiler.”
"<!--" any* "-->" => { … };
$ ragel -R lexer.rl -o lexer.rb
Racc “A LALR(1) parser generator.”
expression : NUMBER OPERATOR NUMBER { … } ;
$ racc -o parser.rb parser.y
Oga Parsing XML/HTML in Ruby https://github.com/yorickpeterse/oga
Questions?