Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Goで実装する軽量マークアップ言語パーサー / Gohn: parser written in Go
Search
aereal
August 04, 2017
3
3.7k
Goで実装する軽量マークアップ言語パーサー / Gohn: parser written in Go
talked at builderscon tokyo 2017
aereal
August 04, 2017
Tweet
Share
More Decks by aereal
See All by aereal
How to send distibuted traces to Datadog using build own OpenTelemetry-Lambda distribution
aereal
3
200
好きな技術《コト》で、 生きていく技術 / life with what you like
aereal
5
2.3k
qron: Cloud Native Cron Alternativeの今
aereal
2
2.1k
自動作曲入門 / introduction to programatic music composition
aereal
1
530k
はてなブログ タグとCDK / The epic of AWS CDK and Hatena Blog Tag
aereal
3
200k
はてなブログ タグの技術選択 / The technical details of Hatena Blog Tag
aereal
3
200k
ブログサービスのHTTPS化を支えたAWSで作るピタゴラスイッチ / The construction of large scale TLS certificates management system with AWS
aereal
3
400k
AWSではてなブログの常時HTTPS配信をバーンとやる話 / The Epic of migration from HTTP to HTTPS on Hatena Blog with AWS
aereal
14
17k
ScalaとPerlでMicroservices in production / Building microservices with Perl and Scala in production
aereal
0
5.3k
Featured
See All Featured
How to Ace a Technical Interview
jacobian
276
23k
KATA
mclloyd
29
14k
Producing Creativity
orderedlist
PRO
341
39k
Imperfection Machines: The Place of Print at Facebook
scottboms
265
13k
Why You Should Never Use an ORM
jnunemaker
PRO
54
9.1k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
16
2.1k
VelocityConf: Rendering Performance Case Studies
addyosmani
325
24k
GitHub's CSS Performance
jonrohan
1030
460k
The Language of Interfaces
destraynor
154
24k
Teambox: Starting and Learning
jrom
133
8.8k
Testing 201, or: Great Expectations
jmmastey
38
7.1k
A better future with KSS
kneath
238
17k
Transcript
GoͰ࣮͢Δ ܰྔϚʔΫΞοϓݴޠ ύʔαʔ id:aereal @ builderscon tokyo 2017
͢͜ͱ • ܰྔϚʔΫΞοϓݴޠͱͯͳه๏ʹ͍ͭͯ • ςΩετॲཧͱύʔαʔδΣωϨʔλʔͷඞཁੑ • Go/goyaccʹΑΔͯͳه๏ύʔαʔͷհ • goyaccͷԠ༻ࣝ
ࣗݾհ • id:aereal • GitHub: aereal • גࣜձࣾͯͳ ΞϓϦέʔγϣϯΤϯδχΞ
⚠͓͜ͱΘΓ⚠ • αʔϏεΛ৮͍ͬͯͯײͨ͡ ݸਓతͳ՝ҙࣝʹجͮ͘ϓϥΠϕʔτϫʔΫͰ͢ • αʔϏεʹ࠾༻͞ΕΔ͔ෆ໌
ࢀߟใ • http://b.hatena.ne.jp/aereal/2017gokyoto/ • ͯͳϒοΫϚʔΫͰλάΛ͚ͯϒΫϚ͍ͯ͠·͢
ܰྔϚʔΫΞοϓݴޠͱ ͯͳه๏
ܰྔϚʔΫΞοϓݴޠͱ • LML = Lightweight Markup Language • HTMLXMLͱϓϨʔϯςΩετͷதؒʹ͋Δ •
Markdown, Textile, ͯͳه๏, etc.
ܰྔϚʔΫΞοϓݴޠͱ • LML = Lightweight Markup Language • HTMLXMLͱϓϨʔϯςΩετͷதؒʹ͋Δ •
Markdown, Textile, ͯͳه๏, etc.
ͯͳه๏ͱ • ͯͳ͕ఏڙ͢Δ͍͔ͭ͘ͷαʔϏεͰ͑ΔLML • ͯͳϒϩάɺͯͳμΠΞϦʔɺetc. • HTMLʹม͞ΕΔศརͳه๏ • org-modeͱͪΐͬͱࣅ͍ͯΔจ๏
* ݟग़͠1 ** ݟग़͠2 [http://127.0.0.1/:title=΅͘ͷIPͰ͢] - Ruby - Perl -
Go + ى + ঝ + స + ݁
<h1>ݟग़͠1</h1> <h2>ݟग़͠2</h2> <p> <a href="http://127.0.0.1/">΅͘ͷIPͰ͢</a> </p> <ul> <li>Ruby</li> <li>Perl</li> <li>Go</li>
</ul> <ol> <li>ى</li> <li>ঝ</li> <li>స</li> <li>݁</li> </ol>
࣮͍Ζ͍Ζ • ͯͳϒϩάɺͯͳμΠΞϦʔɺͯͳάϧʔϓ • Text-Hatena (CPAN) • Text-Xatena (CPAN) •
chris4403/WikiTextConverter • motemen/pandoc
࣮͍Ζ͍Ζ • ༷ ≈ ࣮ • ࣮͕͍Ζ͍Ζ͋Δ • ͭ·Γ •
࣮ͷ͚༷͕ͩଘࡏ͢Δ • ༷ΛΔʹPerlͱਖ਼نදݱΛಡΈղ͘ඞཁ͕͋Δ
खࠒͳ࣮͕ແͯ͘ࠔΔ • PerlҎ֎Ͱॻ͔ΕͨΞϓϦέʔγϣϯͰ ͯͳه๏Λ͑ΔΑ͏ʹ͍ͨ͠ɺ͚Ͳ…… • Perlͷ֦ுਖ਼نදݱΛۦ͍ͯ͠ΔͷͰҠ২େม • HTMLม·ͰΔύʔαʔ͕ଟ͍
ϙʔλϏϦςΟ • ϒϥβͰϥΠϒϓϨϏϡʔͱ͔͍ͨ͠͡ΌΜ • ೖྗʹର͢Δग़ྗ (AST) ͚ͩΛܾΊ͍ͨ • PerlGoScala, JavaScriptͦͷଞͰॻ͖͍ͨ
HTMLม·ͰΓͨ͘ͳ͍ • ଟ͘ͷύʔαʔ࣮͕HTMLม·Ͱߦ͏ • ҰํɺೖྗʹͲΕ͘Β͍HTMLΛڐՄ͢Δ͔ αʔϏεຖ (!= ύʔαʔ࣮ຖ) ʹҟͳΔ •
→ ύʔαʔͱHTMLมΛ͍ͨ͠
͜Μͳͯͳه๏ύʔαʔ͕ ΄͍͠ • ϦϑΝϨϯεͨΓ͏Δૉͳ࣮ • = ਖ਼نදݱͰͳΜͱ͔͠Α͏ͱ͍͗ͯ͢͠ͳ͍ • ύʔε݁Ռ͕HTMLͰͳ͘தؒදݱ͕ಘΒΕΔ
ࡾߦͰ·ͱΊΔͱ • AST͘Ε!!!
࣍ճ༧ࠂ • ಛఆͷݴޠʹґଘ͠ͳ͍ ྑ͍͔Μ͡ͷςΩετॲཧάοζͳ͍ͷ͔ • ͨͩ͠ (֦ு) ਖ਼نදݱҎ֎ • Αͦ͞͏ͳςΩετॲཧٕज़Λ୳͠ʹ͍͖·͢
ςΩετॲཧͱ ύʔαʔδΣωϨʔλʔ
ςΩετॲཧͻͱΊ͙Γ • ςΩετॲཧͷςΫχοΫΛ͍Ζ͍Ζհ • έʔεʹΑͬͯύʔαʔΛॻ͘·Ͱͳ͔ͬͨΓ͢Δ
τʔΫϯͷग़ݱҐஔ "id:aereal".substring(3) // => "aereal"
τʔΫϯͷग़ݱҐஔ • τʔΫϯͷग़ݱҐஔ͕ݻఆͳΒ͜Ε͘Β͍Ͱ • Մมͩͱഁ͢Δ • ͓ͦͯ͠Αͦͷจ๏ՄมͷτʔΫϯ͔Γ
ਖ਼نදݱ /id:(.+)/.match("id:aereal")[1] // => "aereal"
ਖ਼نදݱ • ׅހͷඇରԠݕग़Ͱ͖ͳ͍ • (POSIXͷਖ਼نදݱͰෆՄɺ Perlͷ֦ுਖ਼نදݱͰͰ͖ͨͣ) • ҰຊΓ͕Ͱ͖ͳ͔ͬͨΒɺ ޙड़ͷঢ়ଶཧΛߦ͏ඞཁ͕͋Δ
ঢ়ଶભҠΛཧ var isInIdNotation = false; while (1) { if (isInIdNotation)
{ var name = readText(); // => "aereal" } else { switch (readChar()) { case ':': isInIdNotation = true; default: // ... } } }
ঢ়ଶભҠΛཧ var isInIdNotation = false; var isInHeading = false; var
isInUnorderedList = false; var isInOrderedList = false; while (1) { if (isInIdNotation) if (isInHeading) if (isInUnorderedList) if (isInOrderedList) }
None
• ͲΕจ๏Λ၆ᛌͮ͠Β͍ • ϞδϡʔϧԽ͕͍͠ • → খ͍͞෦ΛੵΈ্͍͛ͯ͘ελΠϧͰ࡞ΕͨΒ……
ͦ͜Ͱyacc • ύʔαʔδΣωϨʔλʔͷ1ͭ • BNFʹࣅͨߏจنଇ͔ΒύʔαʔΛੜ͢Δ • ෳͷنଇΛΈ߹Θͤͯ1ͭͷنଇΛ࡞Γ্͛Δ • ίʔϧόοΫελΠϧͰ نଇΛϓϩάϥϜʹม͢Δ
(ؐݩɺreduce)
https://tools.ietf.org/html/rfc7230 HTTP-Message = start-line *( header-field CRLF) CRLF [ message-body]
start-line = request-line / status-line
yacc • BNFͱ͍͏நతͳํ๏Ͱطड़Ͱ͖Δͷ͕Α͍ • ݴޠDSLʹରͯ͠ϙʔλϏϦςΟͰ༏Δ • ϨΩαʔ (ࣈ۟ղੳث) ผ్࣮͢Δඞཁ͕͋Δ •
ߏจنଇͷίʔϧόοΫ෦͕ ΤσΟλͰϋΠϥΠτ͞Εͳ͍ (ͳʹ͔͍͍ํ๏͋Γͦ͏)
࣍ճ༧ࠂ • yaccΑͦ͞͏ͱ͍͏͜ͱ͕Θ͔ͬͨ • GoͱyaccΛΈ߹ΘͤΒΕΔͷ͔ • ͨͯͯ͠ͳه๏ύʔαʔΛ࡞Δ͜ͱ͕Ͱ͖Δͷ͔
https://git.io/v7gcD github.com/aereal/gohn
gohn • Written in Go w/goyacc • pronounce as `gone`
• ओཁͳه๏࣮ࡁΈ
gohnͷσβΠϯ • ඪ४ೖྗ͔Βͯͳه๏Λड͚औΓɺ • ඪ४ग़ྗʹASTΛJSONʹγϦΞϥΠζͯ͠ग़ྗ͢Δ • → HTMLͷมผ్࣮͢Δ • ͱͯUNIXత
AST • JSONʹγϦΞϥΠζ • JSON schemaΛެ։͍ͯ͠Δ • εΩʔϚ͔ΒHTMLมثΛࣗಈੜ͢Δ͜ͱͰ͖ͦ͏ • https://github.com/aereal/gohn/blob/master/schema.json
Goͱyacc • goyaccͱ͍͏πʔϧ͕͋Δ • go get golang.org/x/tools/cmd/goyacc • ΞΫγϣϯΛGoͰॻ͚Δ
Goͱࣈ۟ղੳ • ࣈ۟ղੳ = ಡΜͩจࣈ͕ͲΜͳҙຯΛ࣋ͭͷ͔ฦ͢ • text/scannerͱ͍͏ඪ४ύοέʔδ͕ศར • ڍಈΛΧελϚΠζͰ͖Δ •
τʔΫϯΛফඅͨ͠࠷ޙͷҐஔΛهͯ͘͠ΕΔͷͰ Τϥʔϝοηʔδͷߏஙָ͕
σϞ
Ԡ༻ฤ
HTTPه๏ [http://example.com/] # <a href="http://example.com/"> # http://example.com/ # </a> [http://127.0.0.1/:title=΅͘ͷIP]
# <a href="http://example.com/"> # ΅͘ͷIP # </a>
HTTPه๏ • ΞϯΧʔϦϯΫʹม͞ΕΔه๏ • ඌʹల։࣌ͷΦϓγϣϯΛ `:` ʹଓ͚ͯطड़Ͱ͖Δ • `:` URLͷҰ෦ʹݱΕΔ͜ͱ͕͋Δ
• → ࣍ͷ1จࣈΛಡΉ͚ͩͰ:titleͷ։͔࢝அͰ͖ͳ͍
࠷ॳʹݱΕΔ `:` εΩʔϜ෦ͱݟͳͯ͠ແࢹ͢Δ͜ͱʹ if !l.seenColon { l.seenColon = true return
false // maybe part of URL } else { return true } https://github.com/aereal/gohn/blob/master/parser/ lex.go#L100
࠶ؼతͳϧʔϧ • N > 1ͷࢠنଇ͔ΒͳΔنଇͷॻ͖ํ • appendͷॱ൪͚ͩؒҧ͑ͳ͍Α͏ʹ
http_options: http_option { $$ = []string{$1} } | http_option http_options
{ options := $2 $$ = append([]string{$1}, options...) }
ςετ • Table-driven tests͕Φεεϝ • https://github.com/golang/go/wiki/TableDrivenTests • lexerΛؚΉparserͷػೳςετ͚ͩͰेͩͱࢥ͏ • https://github.com/aereal/gohn/blob/master/parser/
parser_test.go#L17
σόοά • tokenͷࣝผࢠ (int) ͔Β໊લ (string) Λ ٯҾ͖͢ΔϝιουΛఆ͓ٛͯ͘͠ͱศར • print͢ΔʹͤΑσόοΨΛ͏ʹͤΑ
• https://github.com/aereal/gohn/blob/master/parser/ lex.go#L29
·ͱΊ
Go/goyaccศར • GoෳࡶͳCLIΛϙʔλϒϧʹ࡞Δͷʹ͍͍ͯΔ • goyacc (yacc) ෳࡶͳจ๏ͷύʔαʔʹ͍͍ͯΔ
ܰྔϚʔΫΞοϓݴޠ ͍͠ • ਓؒʹͱͬͯͷಡΈॻ͖͢͠͞ͱ ػցʹͱͬͯͷಡΈॻ͖͢͠͞ҟͳΔ • ݫ֨ͳจ๏نଇʹैΘͤΔύʔαʔΑΓ ޡΓగਖ਼ͯ͘͠ΕΔ΄͏͕࣮༻తͳͷͰ?
ύʔαʔ࡞Γָ͍͠ • Ͱ͖Δ͜ͱɺΓ͍ͨ͜ͱɺؔ৺ͷ͋Δ͜ͱ͕ ͏·͘όϥϯε͞Εͨඪ • WebͱςΩετॲཧ • খ͞ͳඪΛগͣͭ͠ੵΈॏͶ͍͚ͯΔ • ʮࠓϦετه๏ͷ࣮͕Ͱ͖ͨͧʯ
ڵຯΛ࣋ͬͯ͘Εͨਓ • ·ͣJSONͷύʔαʔΛॻ͍ͯΈΔͱΑͦ͞͏ • RFC, relaxed JSON, etc. ʹൃలͤͯ͞ΈΔ •
࣍ࣈ۟ղੳثΛखॻ͖ͯ͠ΈΔ • ࣍ߏจղੳثखॻ͖ͯ͠ΈΔ