Implementing a lightweight markup language parser in Go / Gohn: parser written in Go
aereal
August 04, 2017
Presented at builderscon tokyo 2017
Transcript
Implementing a lightweight markup language parser in Go
id:aereal @ builderscon tokyo 2017
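What I will talk about
• Lightweight markup languages and Hatena notation
• Text processing and why parser generators are needed
• Introducing a Hatena notation parser built with Go/goyacc
• Applied goyacc know-how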
About me
• id:aereal
• GitHub: aereal
• Application engineer at Hatena Co., Ltd.
⚠ Disclaimer ⚠
• This is private work, based on personal concerns I felt while working on the service
• Whether it will be adopted in the service is unknown
References
• http://b.hatena.ne.jp/aereal/2017gokyoto/
• Bookmarked with a tag on Hatena Bookmark
Lightweight markup languages and Hatena notation
What is a lightweight markup language?
• LML = Lightweight Markup Language
• Sits between HTML/XML and plain text
• Markdown, Textile, Hatena notation, etc.
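What is Hatena notation?
• An LML usable in several services provided by Hatena
• Hatena Blog, Hatena Diary, etc.
• A convenient notation that is converted to HTML
• Its grammar is a little like org-mode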
    * Heading 1
    ** Heading 2
    [http://127.0.0.1/:title=This is my IP]
    - Ruby
    - Perl
    - Go
    + Introduction
    + Development
    + Twist
    + Conclusion
    <h1>Heading 1</h1>
    <h2>Heading 2</h2>
    <p>
      <a href="http://127.0.0.1/">This is my IP</a>
    </p>
    <ul>
      <li>Ruby</li>
      <li>Perl</li>
      <li>Go</li>
    </ul>
    <ol>
      <li>Introduction</li>
      <li>Development</li>
      <li>Twist</li>
      <li>Conclusion</li>
    </ol>
Various implementations
• Hatena Blog, Hatena Diary, Hatena Group
• Text-Hatena (CPAN)
• Text-Xatena (CPAN)
• chris4403/WikiTextConverter
• motemen/pandoc
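Various implementations
• Specification ≈ implementation
• There are many implementations
• In other words
• There are as many specifications as there are implementations
• To learn the specification you have to decipher Perl and regular expressions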
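No handy implementation, which is a problem
• I want to use Hatena notation in applications written in something other than Perl, but...
• Existing code makes heavy use of Perl's extended regular expressions, so porting is hard
• Many parsers go all the way to HTML conversion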
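Portability
• You want things like a live preview in the browser, right?
• I want to define only the output (AST) for a given input
• I want to write it in Perl, Go, Scala, JavaScript, and others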
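I do not want to go as far as HTML conversion
• Many parser implementations perform the HTML conversion too
• However, how much HTML is allowed in the input differs per service (!= per parser implementation)
• → I want to separate the parser from the HTML conversion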
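The Hatena notation parser I want
• A simple implementation that can serve as a reference
• = one that does not try too hard to solve everything with regular expressions
• Parsing yields an intermediate representation rather than HTML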
To sum up in three lines
• Give me an AST!!!
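Coming up next
• Is there no good text processing tool that does not depend on a particular language?
• Other than (extended) regular expressions, that is
• Let's go looking for a promising text processing technique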
Text processing and parser generators
A tour of text processing
• An introduction to various text processing techniques
• In some cases you do not even need to write a parser
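Token positions
    "id:aereal".substring(3) // => "aereal"
• If the token always appears at a fixed position, this is enough
• It breaks down once positions vary
• And most grammars consist almost entirely of variable-position tokens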
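Regular expressions
    /id:(.+)/.exec("id:aereal")[1] // => "aereal"
• Cannot detect unbalanced brackets
• (impossible with POSIX regular expressions; should be possible with Perl's extended regular expressions)
• If a single pattern cannot catch everything in one shot, you need the state management described next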
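To make the limitation above concrete, here is a minimal Go sketch. Go's `regexp` package uses RE2 syntax, which has no recursion, so the pattern can pull out the id but a separate counter has to check bracket nesting. The names `extractID` and `balanced` are illustrative assumptions, not taken from the talk or from gohn.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern captures whatever follows "id:", mirroring the one-shot
// regular expression on the slide.
var idPattern = regexp.MustCompile(`id:(.+)`)

// extractID returns the captured name, or "" when there is no id notation.
func extractID(s string) string {
	m := idPattern.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

// balanced reports whether every '[' has a matching ']'.
// This is the piece of state a plain regular expression cannot track.
func balanced(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '[':
			depth++
		case ']':
			depth--
			if depth < 0 {
				return false // closing bracket without an opener
			}
		}
	}
	return depth == 0
}

func main() {
	fmt.Println(extractID("id:aereal"))
	fmt.Println(balanced("[[a]]"), balanced("[a]]"))
}
```

Even this small case needs a counter next to the pattern, which is the kind of hand-managed state the following slides deal with.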
Managing state transitions
    var isInIdNotation = false;
    while (1) {
      if (isInIdNotation) {
        var name = readText(); // => "aereal"
      } else {
        switch (readChar()) {
          case ':':
            isInIdNotation = true;
            break; // do not fall through into default
          default:
            // ...
        }
      }
    }
Managing state transitions
    var isInIdNotation = false;
    var isInHeading = false;
    var isInUnorderedList = false;
    var isInOrderedList = false;
    while (1) {
      if (isInIdNotation) { /* ... */ }
      if (isInHeading) { /* ... */ }
      if (isInUnorderedList) { /* ... */ }
      if (isInOrderedList) { /* ... */ }
    }
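• With all of these it is hard to get an overview of the grammar
• Modularization is difficult
• → If only it could be built by stacking up small parts...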
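Enter yacc
• One of the parser generators
• Generates a parser from syntax rules that resemble BNF
• Builds up one rule by combining several rules
• Turns rules into a program in a callback style (reduction, "reduce")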
https://tools.ietf.org/html/rfc7230
    HTTP-message = start-line
                   *( header-field CRLF )
                   CRLF
                   [ message-body ]
    start-line   = request-line / status-line
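yacc
• Being able to describe the grammar in an abstract form, BNF, is nice
• More portable than language-specific code or DSLs
• The lexer (lexical analyzer) has to be implemented separately
• The callback part of a syntax rule is not highlighted by editors (there is probably a good way around this)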
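As a sketch of what "implemented separately" means with goyacc: the generated parser calls two methods on a lexer, `Lex` and `Error`, and treats a return value of 0 from `Lex` as end of input. In the block below `symType` stands in for the `yySymType` struct that goyacc generates from the grammar's `%union`, and the token constants and the `lexer` type are hypothetical.

```go
package main

import "fmt"

// symType stands in for the yySymType struct generated by goyacc
// (the field is a hypothetical example).
type symType struct {
	text string
}

// Hypothetical token numbers; goyacc generates the real ones from %token.
const (
	tokText = iota + 1
	tokColon
)

// lexer provides the two methods the generated parser expects from a lexer.
type lexer struct {
	input []rune
	pos   int
	errs  []string
}

// Lex stores the token value in lval and returns the token number.
// Returning 0 signals the end of the input.
func (l *lexer) Lex(lval *symType) int {
	if l.pos >= len(l.input) {
		return 0
	}
	if l.input[l.pos] == ':' {
		l.pos++
		lval.text = ":"
		return tokColon
	}
	start := l.pos
	for l.pos < len(l.input) && l.input[l.pos] != ':' {
		l.pos++
	}
	lval.text = string(l.input[start:l.pos])
	return tokText
}

// Error collects the messages reported by the parser.
func (l *lexer) Error(msg string) {
	l.errs = append(l.errs, msg)
}

func main() {
	l := &lexer{input: []rune("id:aereal")}
	var v symType
	for tok := l.Lex(&v); tok != 0; tok = l.Lex(&v) {
		fmt.Println(tok, v.text)
	}
}
```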
Coming up next
• We now know that yacc looks promising
• Can Go and yacc be combined?
• Will we really be able to build a Hatena notation parser?
https://git.io/v7gcD github.com/aereal/gohn
gohn
• Written in Go w/goyacc
• Pronounced like `gone`
• The major notations are already implemented
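The design of gohn
• Reads Hatena notation from standard input,
• and writes the AST, serialized as JSON, to standard output
• → Conversion to HTML is implemented separately
• Very UNIX-like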
AST
• Serialized as JSON
• A JSON schema is published
• It may even be possible to generate an HTML converter automatically from the schema
• https://github.com/aereal/gohn/blob/master/schema.json
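A separate HTML converter that consumes such a JSON AST could be sketched in Go as follows. The `Node` shape and the node type names ("document", "heading", "paragraph") are assumptions made for illustration and do not follow gohn's published schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
	"strings"
)

// Node is a hypothetical AST node; the real shape is given by schema.json.
type Node struct {
	Type     string  `json:"type"`
	Level    int     `json:"level,omitempty"`
	Text     string  `json:"text,omitempty"`
	Children []*Node `json:"children,omitempty"`
}

// render converts a node to HTML. Escaping happens here, so each service
// can decide how much raw HTML it wants to allow.
func render(n *Node) string {
	var b strings.Builder
	switch n.Type {
	case "heading":
		fmt.Fprintf(&b, "<h%d>%s</h%d>", n.Level, html.EscapeString(n.Text), n.Level)
	case "paragraph":
		fmt.Fprintf(&b, "<p>%s</p>", html.EscapeString(n.Text))
	default:
		// Container nodes such as the document root: render children only.
		for _, c := range n.Children {
			b.WriteString(render(c))
		}
	}
	return b.String()
}

func main() {
	src := `{"type":"document","children":[` +
		`{"type":"heading","level":1,"text":"Heading 1"},` +
		`{"type":"paragraph","text":"a < b"}]}`
	var doc Node
	if err := json.Unmarshal([]byte(src), &doc); err != nil {
		panic(err)
	}
	fmt.Println(render(&doc))
}
```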
Go and yacc
• There is a tool called goyacc
• go get golang.org/x/tools/cmd/goyacc
• Actions can be written in Go
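Go and lexical analysis
• Lexical analysis = returning what the characters just read mean
• The standard package text/scanner is handy
• Its behavior can be customized
• It records the position of the last consumed token, which makes building error messages easy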
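A minimal sketch of those two points with `text/scanner`: the `Mode` and `Whitespace` fields customize what counts as a token, and `Position` holds the start of the most recently scanned token. The function name `tokenize` and the sample input are assumptions for illustration, not gohn's actual lexer settings.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokenize returns each token prefixed with its line:column position.
func tokenize(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = "input"
	// Customization 1: recognize identifiers only; any other character
	// is returned as a single-character token.
	s.Mode = scanner.ScanIdents
	// Customization 2: skip spaces and tabs but keep newlines as tokens,
	// since line breaks are significant in a line-oriented markup.
	s.Whitespace = 1<<' ' | 1<<'\t'

	var out []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		// s.Position is the start of the token that Scan just returned.
		out = append(out, fmt.Sprintf("%d:%d %q", s.Position.Line, s.Position.Column, s.TokenText()))
	}
	return out
}

func main() {
	for _, t := range tokenize("* abc\n- x") {
		fmt.Println(t)
	}
}
```

The same `Position` value can be put straight into an error message when the parser rejects a token.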
Demo
Advanced topics
HTTP notation
    [http://example.com/]
    # <a href="http://example.com/">
    #   http://example.com/
    # </a>
    [http://127.0.0.1/:title=My IP]
    # <a href="http://127.0.0.1/">
    #   My IP
    # </a>
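HTTP notation
• A notation that is converted to an anchor link
• Expansion options can be written at the end, following a `:`
• `:` can also appear as part of the URL
• → Reading just the next character is not enough to decide whether :title is starting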
Decided to treat the first `:` as part of the scheme and ignore it
    if !l.seenColon {
        l.seenColon = true
        return false // maybe part of URL
    } else {
        return true
    }
https://github.com/aereal/gohn/blob/master/parser/lex.go#L100
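The flag above can be tried out in isolation with the following Go sketch. `httpLexer`, `startsOption`, and `splitHTTP` are hypothetical names; the code splits the text between `[` and `]` into the URL and its options, and is a simplification rather than gohn's lexer.

```go
package main

import (
	"fmt"
	"strings"
)

// httpLexer is a simplified, hypothetical stand-in for the lexer state.
type httpLexer struct {
	seenColon bool
}

// startsOption reports whether a ':' begins an option. The first ':' is
// assumed to belong to the URL scheme ("http:") and is ignored.
func (l *httpLexer) startsOption() bool {
	if !l.seenColon {
		l.seenColon = true
		return false // maybe part of URL
	}
	return true
}

// splitHTTP splits the bracket contents into the URL and its options.
func splitHTTP(body string) (url string, options []string) {
	l := &httpLexer{}
	var cur strings.Builder
	var parts []string
	for _, r := range body {
		if r == ':' && l.startsOption() {
			parts = append(parts, cur.String())
			cur.Reset()
			continue
		}
		cur.WriteRune(r)
	}
	parts = append(parts, cur.String())
	return parts[0], parts[1:]
}

func main() {
	url, opts := splitHTTP("http://127.0.0.1/:title=My IP")
	fmt.Println(url, opts)
}
```

Note that this simplified sketch would also split at a port separator such as the one in `http://example.com:8080/`; how the real lexer deals with that case is not covered here.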
Recursive rules
• How to write a rule made up of N > 1 child rules
• Just be careful not to get the append order wrong
    http_options:
        http_option
        {
            $$ = []string{$1}
        }
        | http_option http_options
        {
            options := $2
            $$ = append([]string{$1}, options...)
        }
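Why the append order matters can be seen by simulating the reduction in plain Go. With a right-recursive rule the tail (`$2`) is reduced before the head (`$1`) is attached, so the head has to go in front. The helper names `reduce`, `prepend`, and `appendHead` are illustrative assumptions, and `reduce` expects at least one token.

```go
package main

import "fmt"

// reduce mimics how the right-recursive rule
//
//	http_options: http_option | http_option http_options
//
// is reduced: the tail is built first, then the head is combined with it.
func reduce(tokens []string, combine func(head string, tail []string) []string) []string {
	if len(tokens) == 1 {
		return []string{tokens[0]} // base case: $$ = []string{$1}
	}
	tail := reduce(tokens[1:], combine)
	return combine(tokens[0], tail)
}

// prepend is the action from the slide: the head goes in front of the tail.
func prepend(head string, tail []string) []string {
	return append([]string{head}, tail...)
}

// appendHead is the mistaken order: it puts the head after the tail.
func appendHead(head string, tail []string) []string {
	return append(tail, head)
}

func main() {
	opts := []string{"title=a", "image", "small"}
	fmt.Println(reduce(opts, prepend))    // source order is kept
	fmt.Println(reduce(opts, appendHead)) // options come out reversed
}
```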
Testing
• Table-driven tests are recommended
• https://github.com/golang/go/wiki/TableDrivenTests
• I think functional tests of the parser, lexer included, are enough
• https://github.com/aereal/gohn/blob/master/parser/parser_test.go#L17
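The table-driven pattern boils down to a slice of input/expectation rows and one loop over them. The sketch below keeps it runnable as a plain program: `headingLevel` is a hypothetical function under test, and `failures` collects messages where a real `_test.go` file would call `t.Errorf` (or `t.Run` per row) from the `testing` package.

```go
package main

import "fmt"

// headingLevel is a hypothetical function under test: it counts the
// leading '*' characters of a line.
func headingLevel(line string) int {
	n := 0
	for n < len(line) && line[n] == '*' {
		n++
	}
	return n
}

// cases is the table: one row per input with its expected result.
var cases = []struct {
	name  string
	input string
	want  int
}{
	{"level 1", "* Heading 1", 1},
	{"level 2", "** Heading 2", 2},
	{"not a heading", "plain text", 0},
}

// failures runs every row and returns a message for each mismatch.
func failures() []string {
	var out []string
	for _, c := range cases {
		if got := headingLevel(c.input); got != c.want {
			out = append(out, fmt.Sprintf("%s: headingLevel(%q) = %d, want %d",
				c.name, c.input, got, c.want))
		}
	}
	return out
}

func main() {
	fmt.Println(len(failures()), "failures")
}
```

Adding a new case is a one-line change to the table, which suits a grammar that grows one notation at a time.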
Debugging
• It is handy to define a method that looks up the name (string) from a token's identifier (int)
• Whether you print or use a debugger
• https://github.com/aereal/gohn/blob/master/parser/lex.go#L29
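Such a reverse lookup can be as small as a map plus a helper. The token constants and names below are hypothetical; with goyacc the real constants are generated from the grammar's `%token` declarations.

```go
package main

import "fmt"

// Hypothetical token identifiers for illustration.
const (
	tokText = iota + 1
	tokColon
	tokLBracket
)

// tokenNames maps token identifiers back to readable names.
var tokenNames = map[int]string{
	tokText:     "TEXT",
	tokColon:    "COLON",
	tokLBracket: "LBRACKET",
}

// tokenName returns a readable name for a token identifier, with a
// fallback so unknown values are still visible in debug output.
func tokenName(tok int) string {
	if name, ok := tokenNames[tok]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN(%d)", tok)
}

func main() {
	for _, tok := range []int{tokLBracket, tokText, tokColon, 99} {
		fmt.Println(tok, tokenName(tok))
	}
}
```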
Summary
Go/goyacc is handy
• Go is well suited to building complex CLIs portably
• goyacc (yacc) is well suited to parsers for complex grammars
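Lightweight markup languages are hard
• What is easy for humans to read and write differs from what is easy for machines to read and write
• Rather than a parser that enforces strict grammar rules, might one that corrects mistakes be more practical?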
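Building parsers is fun
• A goal that nicely balances what I can do, what I want to do, and what I am interested in
• The Web and text processing
• Small goals can be stacked up little by little
• "Today I got the list notation implemented!"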
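For those who got interested
• Writing a JSON parser first seems like a good start
• Then extend it toward the RFC, relaxed JSON, etc.
• Next, try writing the lexer by hand
• After that, try writing the parser by hand too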