Implementing a Lightweight Markup Language Parser in Go / Gohn: parser written in Go
aereal
August 04, 2017
Presented at builderscon tokyo 2017
Transcript
Implementing a Lightweight Markup Language Parser in Go
id:aereal @ builderscon tokyo 2017

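What I will talk about
• Lightweight markup languages and Hatena notation
• Text processing and why parser generators are needed
• Introducing a Hatena notation parser built with Go/goyacc
• Applied goyacc know-how
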
About me
• id:aereal
• GitHub: aereal
• Application engineer at Hatena Co., Ltd.

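⚠ Disclaimer ⚠
• This is a private project, motivated by problems I personally noticed while working on the service
• Whether it will be adopted by the service is unknown
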
References
• http://b.hatena.ne.jp/aereal/2017gokyoto/
• I bookmark related material under a tag on Hatena Bookmark

Lightweight Markup Languages and Hatena Notation

What is a lightweight markup language?
• LML = Lightweight Markup Language
• Sits between HTML/XML and plain text
• Markdown, Textile, Hatena notation, etc.

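What is Hatena notation?
• An LML usable in several services provided by Hatena
• Hatena Blog, Hatena Diary, etc.
• A convenient notation that is converted to HTML
• Its grammar is somewhat similar to org-mode
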
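Input (Hatena notation):
    * Heading 1
    ** Heading 2
    [http://127.0.0.1/:title=This is my IP]
    - Ruby
    - Perl
    - Go
    + Introduction
    + Development
    + Twist
    + Conclusion

Output (HTML):
    <h1>Heading 1</h1>
    <h2>Heading 2</h2>
    <p>
      <a href="http://127.0.0.1/">This is my IP</a>
    </p>
    <ul>
      <li>Ruby</li>
      <li>Perl</li>
      <li>Go</li>
    </ul>
    <ol>
      <li>Introduction</li>
      <li>Development</li>
      <li>Twist</li>
      <li>Conclusion</li>
    </ol>
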
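Many implementations
• Hatena Blog, Hatena Diary, Hatena Group
• Text-Hatena (CPAN)
• Text-Xatena (CPAN)
• chris4403/WikiTextConverter
• motemen/pandoc
• Specification ≈ implementation
• There are many implementations
• In other words, there are as many specifications as there are implementations
• To learn the specification you have to decipher Perl and regular expressions
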
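There is no handy implementation
• I want to use Hatena notation in applications written in languages other than Perl, but...
• The existing implementations lean heavily on Perl's extended regular expressions, so porting is hard
• Many parsers go all the way to HTML conversion
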
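Portability
• We would like things such as live preview in the browser
• I want to define only the output (AST) for a given input
• I want to be able to write it in Perl, Go, Scala, JavaScript, and others
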
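I do not want to go as far as HTML conversion
• Many parser implementations also perform the HTML conversion
• However, how much HTML is allowed in the input differs per service (!= per parser implementation)
• → I want to separate the parser from the HTML conversion
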
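The Hatena notation parser I want
• A simple implementation that can serve as a reference
• = one that does not try too hard to solve everything with regular expressions
• The parse result is an intermediate representation, not HTML
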
To sum it up in three lines
• Give me an AST!!!

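Coming up next
• Is there a nice text-processing tool that does not depend on a particular language?
• Anything other than (extended) regular expressions
• We go looking for a promising text-processing technique
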
Text Processing and Parser Generators

A tour of text processing
• An overview of several text-processing techniques
• Depending on the case, you may not need to write a parser at all

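Token position
    "id:aereal".substring(3) // => "aereal"
• If the token always appears at a fixed position, this is enough
• It breaks down as soon as the position varies
• And most grammars consist almost entirely of variable-length tokens
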
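The fixed-position approach can be sketched in Go as follows. The function name `extractID` is hypothetical and only illustrates slicing at a known offset; it is not taken from gohn.

```go
package main

import (
	"fmt"
	"strings"
)

// extractID returns the text after the fixed "id:" prefix.
// It reports false when the input does not start with that prefix.
// (Hypothetical helper for illustration only.)
func extractID(s string) (string, bool) {
	const prefix = "id:"
	if !strings.HasPrefix(s, prefix) {
		return "", false
	}
	return s[len(prefix):], true
}

func main() {
	name, ok := extractID("id:aereal")
	fmt.Println(name, ok) // aereal true
}
```

This works only because the prefix length is known in advance, which is exactly the limitation the slide points out.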
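Regular expressions
    /id:(.+)/.exec("id:aereal")[1] // => "aereal"
• They cannot detect unbalanced brackets
• (Not possible with POSIX regular expressions; it should be possible with Perl's extended regular expressions)
• If you cannot catch the input in a single match, you need the state management described next
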
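For comparison, a minimal Go sketch of the same capture using the standard `regexp` package. The names `idPattern` and `matchID` are assumptions made for this example. Go's `regexp` follows RE2 syntax, which has no backreferences or recursion, so the bracket-matching limitation mentioned above applies here as well.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern captures everything after "id:", like the slide's /id:(.+)/.
var idPattern = regexp.MustCompile(`id:(.+)`)

// matchID returns the captured name, or false when the pattern does not match.
// (Hypothetical helper for illustration only.)
func matchID(s string) (string, bool) {
	m := idPattern.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	name, ok := matchID("id:aereal")
	fmt.Println(name, ok) // aereal true
}
```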
Managing state transitions
    var isInIdNotation = false;
    while (1) {
      if (isInIdNotation) {
        var name = readText(); // => "aereal"
      } else {
        switch (readChar()) {
          case ':':
            isInIdNotation = true;
            break;
          default:
            // ...
        }
      }
    }

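A runnable Go sketch of the same single-flag idea, under stated assumptions: `scanIDs` and `isNameByte` are hypothetical names, and the set of characters allowed in a name is a guess made for the example, not the real rule for Hatena IDs.

```go
package main

import (
	"fmt"
	"strings"
)

// isNameByte reports whether c may appear in a name.
// The allowed set is an assumption for this sketch.
func isNameByte(c byte) bool {
	return c == '-' || c == '_' ||
		(c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') ||
		(c >= 'A' && c <= 'Z')
}

// scanIDs collects the NAME part of every "id:NAME" in input,
// tracking a single boolean state by hand.
func scanIDs(input string) []string {
	var names []string
	var current strings.Builder
	inIDNotation := false
	for i := 0; i < len(input); i++ {
		c := input[i]
		if inIDNotation {
			if isNameByte(c) {
				current.WriteByte(c)
				continue
			}
			// The name ended: emit it and leave the state.
			names = append(names, current.String())
			current.Reset()
			inIDNotation = false
		}
		// Enter the state when the marker "id:" starts here.
		if strings.HasPrefix(input[i:], "id:") {
			inIDNotation = true
			i += len("id:") - 1 // skip the marker; the loop adds 1
		}
	}
	if inIDNotation {
		names = append(names, current.String())
	}
	return names
}

func main() {
	fmt.Println(scanIDs("hello id:aereal and id:hatena")) // [aereal hatena]
}
```

Even with one notation the bookkeeping is noticeable; every additional notation adds another flag and more branches.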
Managing state transitions
    var isInIdNotation = false;
    var isInHeading = false;
    var isInUnorderedList = false;
    var isInOrderedList = false;
    while (1) {
      if (isInIdNotation) { /* ... */ }
      if (isInHeading) { /* ... */ }
      if (isInUnorderedList) { /* ... */ }
      if (isInOrderedList) { /* ... */ }
    }

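• With all of these approaches it is hard to get an overview of the grammar
• Modularization is difficult
• → If only we could build the parser by stacking up small parts...
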
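Enter yacc
• One of the parser generators
• Generates a parser from grammar rules that resemble BNF
• Builds one rule by combining several other rules
• Converts each rule into program values in a callback style (reduction, "reduce")
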
https://tools.ietf.org/html/rfc7230
    HTTP-Message = start-line
                   *( header-field CRLF )
                   CRLF
                   [ message-body ]
    start-line   = request-line / status-line

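yacc
• It is good that the grammar can be described in an abstract form, BNF
• More portable than a DSL embedded in one language
• The lexer (lexical analyzer) has to be implemented separately
• The callback parts of the grammar rules are not highlighted in editors (there is probably a good workaround)
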
Coming up next
• We found that yacc looks promising
• Can Go and yacc be combined?
• Can we really build a Hatena notation parser?

https://git.io/v7gcD github.com/aereal/gohn
gohn
• Written in Go w/goyacc
• Pronounced `gone`
• The main notations are already implemented

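gohn's design
• Reads Hatena notation from standard input,
• and writes the AST, serialized as JSON, to standard output
• → Conversion to HTML is implemented separately
• Very UNIX-like
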
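The stdin-to-JSON shape of this design can be sketched as below. This is not gohn's code: the `Node` type and the toy `parse` function (which only understands `*` headings and plain paragraphs) are hypothetical stand-ins, and gohn's real node shapes are defined by its published schema.json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// Node is a hypothetical AST node for this sketch only.
type Node struct {
	Type  string `json:"type"`
	Level int    `json:"level,omitempty"`
	Text  string `json:"text"`
}

// parse is a toy stand-in for a real parser: it recognizes heading
// lines ("*", "**", ...) and treats other non-blank lines as paragraphs.
func parse(input string) []Node {
	nodes := []Node{}
	for _, line := range strings.Split(input, "\n") {
		level := 0
		for level < len(line) && line[level] == '*' {
			level++
		}
		text := strings.TrimSpace(line[level:])
		switch {
		case text == "":
			continue
		case level > 0:
			nodes = append(nodes, Node{Type: "heading", Level: level, Text: text})
		default:
			nodes = append(nodes, Node{Type: "paragraph", Text: text})
		}
	}
	return nodes
}

func main() {
	// Read the whole document from standard input.
	src, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Write the AST as JSON to standard output; rendering is someone else's job.
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(parse(string(src))); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because the output is plain JSON, an HTML renderer can be a separate program in any language, connected with a pipe.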
AST
• Serialized as JSON
• The JSON schema is published
• It should also be possible to generate an HTML converter automatically from the schema
• https://github.com/aereal/gohn/blob/master/schema.json

Go and yacc
• There is a tool called goyacc
• go get golang.org/x/tools/cmd/goyacc
• Actions can be written in Go

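Go and lexical analysis
• Lexical analysis = returning what meaning the characters just read carry
• The standard package text/scanner is convenient
• Its behavior can be customized
• It records the position of the most recently consumed token, which makes building error messages easy
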
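A small sketch of text/scanner with customized behavior and position tracking. The `Token` type and `tokenize` function are hypothetical names for this example; the choice to report newlines as tokens is an assumption that suits a line-oriented notation, not something taken from gohn.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// Token pairs the scanned text with the position where it started.
type Token struct {
	Line, Column int
	Text         string
}

// tokenize scans src and records where each token begins.
func tokenize(src string) []Token {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	// Customize the scanner: recognize only identifiers and integers,
	// and report newlines as tokens instead of skipping them.
	s.Mode = scanner.ScanIdents | scanner.ScanInts
	s.Whitespace ^= 1 << '\n'

	var tokens []Token
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		// s.Position is the start of the most recently scanned token.
		tokens = append(tokens, Token{
			Line:   s.Position.Line,
			Column: s.Position.Column,
			Text:   s.TokenText(),
		})
	}
	return tokens
}

func main() {
	for _, t := range tokenize("* Heading\n- item 1") {
		fmt.Printf("%d:%d %q\n", t.Line, t.Column, t.Text)
	}
}
```

The recorded line and column can be put directly into an error message such as "2:1: unexpected token".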
Demo

Advanced topics

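HTTP notation
    [http://example.com/]
    # <a href="http://example.com/">
    #   http://example.com/
    # </a>
    [http://127.0.0.1/:title=My IP]
    # <a href="http://127.0.0.1/">
    #   My IP
    # </a>
• A notation that is converted into an anchor link
• Options for the expansion can be written at the end, following a `:`
• `:` can also appear as part of the URL
• → Reading only the next character is not enough to decide whether :title is starting
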
The first `:` is treated as part of the scheme and ignored
    if !l.seenColon {
      l.seenColon = true
      return false // maybe part of URL
    } else {
      return true
    }
https://github.com/aereal/gohn/blob/master/parser/lex.go#L100

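The effect of that flag can be tried in isolation with the sketch below. Only the `seenColon` idea comes from the slide; the `linkLexer` type, `isOptionStart`, and `splitLink` are hypothetical names, and the real lexer in gohn is structured differently. Note the simplification: a URL with an explicit port (a second `:`) would be split too early.

```go
package main

import "fmt"

// linkLexer holds the single bit of state the slide describes.
type linkLexer struct {
	seenColon bool
}

// isOptionStart is called for every ':' and reports whether it begins
// an option. The first ':' is assumed to belong to the URL scheme.
func (l *linkLexer) isOptionStart() bool {
	if !l.seenColon {
		l.seenColon = true
		return false // maybe part of URL
	}
	return true
}

// splitLink splits the text between '[' and ']' into the URL and its options.
func splitLink(body string) (url string, options []string) {
	l := &linkLexer{}
	var parts []string
	start := 0
	for i := 0; i < len(body); i++ {
		if body[i] == ':' && l.isOptionStart() {
			parts = append(parts, body[start:i])
			start = i + 1
		}
	}
	parts = append(parts, body[start:])
	return parts[0], parts[1:]
}

func main() {
	url, options := splitLink("http://127.0.0.1/:title=My IP")
	fmt.Println(url)     // http://127.0.0.1/
	fmt.Println(options) // [title=My IP]
}
```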
Recursive rules
• How to write a rule that consists of N > 1 child rules
• Just be careful not to get the order of append wrong

    http_options:
      http_option
      {
        $$ = []string{$1}
      }
    | http_option http_options
      {
        options := $2
        $$ = append([]string{$1}, options...)
      }

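Why the append order matters can be seen by imitating the reductions in plain Go. In a right-recursive rule the innermost `http_options` is reduced first, so each step must put its own `$1` in front of the already-built list. The functions `reduceRight` and `reduceWrong` are hypothetical and only simulate the two actions.

```go
package main

import "fmt"

// reduceRight mimics the slide's actions: the tail ($2) is reduced first,
// then the current item ($1) is placed in front of it.
func reduceRight(items []string) []string {
	if len(items) == 0 {
		return nil // the grammar itself requires at least one option
	}
	if len(items) == 1 {
		return []string{items[0]} // http_options: http_option
	}
	rest := reduceRight(items[1:]) // $2
	return append([]string{items[0]}, rest...)
}

// reduceWrong appends $1 after $2 instead, which reverses the list.
func reduceWrong(items []string) []string {
	if len(items) == 0 {
		return nil
	}
	if len(items) == 1 {
		return []string{items[0]}
	}
	rest := reduceWrong(items[1:])
	return append(rest, items[0])
}

func main() {
	in := []string{"title", "image", "bookmark"}
	fmt.Println(reduceRight(in)) // [title image bookmark]
	fmt.Println(reduceWrong(in)) // [bookmark image title]
}
```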
Testing
• Table-driven tests are recommended
• https://github.com/golang/go/wiki/TableDrivenTests
• I think functional tests of the parser, lexer included, are sufficient
• https://github.com/aereal/gohn/blob/master/parser/parser_test.go#L17

Debugging
• It is handy to define a method that looks up a token's name (string) from its identifier (int)
• Useful whether you print or use a debugger
• https://github.com/aereal/gohn/blob/master/parser/lex.go#L29

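A minimal sketch of such a reverse lookup. The token constants and their values here are hypothetical; in a real goyacc project the identifiers come from the `%token` declarations in the grammar file, and gohn's own helper is the one linked above.

```go
package main

import "fmt"

// Hypothetical token identifiers for this sketch only.
const (
	TEXT = iota + 1
	COLON
	LBRACKET
	RBRACKET
)

// tokenNames maps each identifier back to a readable name.
var tokenNames = map[int]string{
	TEXT:     "TEXT",
	COLON:    "COLON",
	LBRACKET: "LBRACKET",
	RBRACKET: "RBRACKET",
}

// tokenName returns the name for tok, or a placeholder for unknown values.
func tokenName(tok int) string {
	if name, ok := tokenNames[tok]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN(%d)", tok)
}

func main() {
	for _, tok := range []int{LBRACKET, TEXT, COLON, TEXT, RBRACKET} {
		fmt.Println(tok, tokenName(tok))
	}
}
```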
Summary

Go/goyacc is convenient
• Go is well suited to building complex CLIs portably
• goyacc (yacc) is well suited to parsers for complex grammars

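Lightweight markup languages are hard
• What is easy for humans to read and write differs from what is easy for machines to read and write
• Rather than a parser that enforces strict grammar rules, might one that corrects errors be more practical?
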
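Building parsers is fun
• A goal where what I can do, what I want to do, and what I care about are well balanced
• The Web and text processing
• You can keep stacking up small goals a little at a time
• "Today I got the list notation implemented!"
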
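If this caught your interest
• Writing a JSON parser first seems like a good start
• Then extend it toward the RFC, relaxed JSON, etc.
• Next, try writing a lexer by hand
• After that, try writing the parser by hand too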