Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Goで実装する軽量マークアップ言語パーサー / Gohn: parser written in Go
Search
aereal
August 04, 2017
3
3.8k
Goで実装する軽量マークアップ言語パーサー / Gohn: parser written in Go
talked at builderscon tokyo 2017
aereal
August 04, 2017
Tweet
Share
More Decks by aereal
See All by aereal
盆栽転じて家具となる / Bonsai and Furnitures
aereal
0
4.8k
How to send distibuted traces to Datadog using build own OpenTelemetry-Lambda distribution
aereal
3
280
好きな技術《コト》で、 生きていく技術 / life with what you like
aereal
5
4k
qron: Cloud Native Cron Alternativeの今
aereal
2
2.7k
自動作曲入門 / introduction to programatic music composition
aereal
1
530k
はてなブログ タグとCDK / The epic of AWS CDK and Hatena Blog Tag
aereal
2
200k
はてなブログ タグの技術選択 / The technical details of Hatena Blog Tag
aereal
3
200k
ブログサービスのHTTPS化を支えたAWSで作るピタゴラスイッチ / The construction of large scale TLS certificates management system with AWS
aereal
3
400k
AWSではてなブログの常時HTTPS配信をバーンとやる話 / The Epic of migration from HTTP to HTTPS on Hatena Blog with AWS
aereal
14
18k
Featured
See All Featured
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
Done Done
chrislema
184
16k
Visualization
eitanlees
146
16k
GitHub's CSS Performance
jonrohan
1031
460k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.4k
The Cost Of JavaScript in 2023
addyosmani
51
8.5k
Music & Morning Musume
bryan
46
6.6k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
281
13k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.3k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
130
19k
Transcript
GoͰ࣮͢Δ ܰྔϚʔΫΞοϓݴޠ ύʔαʔ id:aereal @ builderscon tokyo 2017
͢͜ͱ • ܰྔϚʔΫΞοϓݴޠͱͯͳه๏ʹ͍ͭͯ • ςΩετॲཧͱύʔαʔδΣωϨʔλʔͷඞཁੑ • Go/goyaccʹΑΔͯͳه๏ύʔαʔͷհ • goyaccͷԠ༻ࣝ
ࣗݾհ • id:aereal • GitHub: aereal • גࣜձࣾͯͳ ΞϓϦέʔγϣϯΤϯδχΞ
⚠͓͜ͱΘΓ⚠ • αʔϏεΛ৮͍ͬͯͯײͨ͡ ݸਓతͳ՝ҙࣝʹجͮ͘ϓϥΠϕʔτϫʔΫͰ͢ • αʔϏεʹ࠾༻͞ΕΔ͔ෆ໌
ࢀߟใ • http://b.hatena.ne.jp/aereal/2017gokyoto/ • ͯͳϒοΫϚʔΫͰλάΛ͚ͯϒΫϚ͍ͯ͠·͢
ܰྔϚʔΫΞοϓݴޠͱ ͯͳه๏
ܰྔϚʔΫΞοϓݴޠͱ • LML = Lightweight Markup Language • HTMLXMLͱϓϨʔϯςΩετͷதؒʹ͋Δ •
Markdown, Textile, ͯͳه๏, etc.
ܰྔϚʔΫΞοϓݴޠͱ • LML = Lightweight Markup Language • HTMLXMLͱϓϨʔϯςΩετͷதؒʹ͋Δ •
Markdown, Textile, ͯͳه๏, etc.
ͯͳه๏ͱ • ͯͳ͕ఏڙ͢Δ͍͔ͭ͘ͷαʔϏεͰ͑ΔLML • ͯͳϒϩάɺͯͳμΠΞϦʔɺetc. • HTMLʹม͞ΕΔศརͳه๏ • org-modeͱͪΐͬͱࣅ͍ͯΔจ๏
* ݟग़͠1 ** ݟग़͠2 [http://127.0.0.1/:title=΅͘ͷIPͰ͢] - Ruby - Perl -
Go + ى + ঝ + స + ݁
<h1>ݟग़͠1</h1> <h2>ݟग़͠2</h2> <p> <a href="http://127.0.0.1/">΅͘ͷIPͰ͢</a> </p> <ul> <li>Ruby</li> <li>Perl</li> <li>Go</li>
</ul> <ol> <li>ى</li> <li>ঝ</li> <li>స</li> <li>݁</li> </ol>
࣮͍Ζ͍Ζ • ͯͳϒϩάɺͯͳμΠΞϦʔɺͯͳάϧʔϓ • Text-Hatena (CPAN) • Text-Xatena (CPAN) •
chris4403/WikiTextConverter • motemen/pandoc
࣮͍Ζ͍Ζ • ༷ ≈ ࣮ • ࣮͕͍Ζ͍Ζ͋Δ • ͭ·Γ •
࣮ͷ͚༷͕ͩଘࡏ͢Δ • ༷ΛΔʹPerlͱਖ਼نදݱΛಡΈղ͘ඞཁ͕͋Δ
खࠒͳ࣮͕ແͯ͘ࠔΔ • PerlҎ֎Ͱॻ͔ΕͨΞϓϦέʔγϣϯͰ ͯͳه๏Λ͑ΔΑ͏ʹ͍ͨ͠ɺ͚Ͳ…… • Perlͷ֦ுਖ਼نදݱΛۦ͍ͯ͠ΔͷͰҠ২େม • HTMLม·ͰΔύʔαʔ͕ଟ͍
ϙʔλϏϦςΟ • ϒϥβͰϥΠϒϓϨϏϡʔͱ͔͍ͨ͠͡ΌΜ • ೖྗʹର͢Δग़ྗ (AST) ͚ͩΛܾΊ͍ͨ • PerlGoScala, JavaScriptͦͷଞͰॻ͖͍ͨ
HTMLม·ͰΓͨ͘ͳ͍ • ଟ͘ͷύʔαʔ࣮͕HTMLม·Ͱߦ͏ • ҰํɺೖྗʹͲΕ͘Β͍HTMLΛڐՄ͢Δ͔ αʔϏεຖ (!= ύʔαʔ࣮ຖ) ʹҟͳΔ •
→ ύʔαʔͱHTMLมΛ͍ͨ͠
͜Μͳͯͳه๏ύʔαʔ͕ ΄͍͠ • ϦϑΝϨϯεͨΓ͏Δૉͳ࣮ • = ਖ਼نදݱͰͳΜͱ͔͠Α͏ͱ͍͗ͯ͢͠ͳ͍ • ύʔε݁Ռ͕HTMLͰͳ͘தؒදݱ͕ಘΒΕΔ
ࡾߦͰ·ͱΊΔͱ • AST͘Ε!!!
࣍ճ༧ࠂ • ಛఆͷݴޠʹґଘ͠ͳ͍ ྑ͍͔Μ͡ͷςΩετॲཧάοζͳ͍ͷ͔ • ͨͩ͠ (֦ு) ਖ਼نදݱҎ֎ • Αͦ͞͏ͳςΩετॲཧٕज़Λ୳͠ʹ͍͖·͢
ςΩετॲཧͱ ύʔαʔδΣωϨʔλʔ
ςΩετॲཧͻͱΊ͙Γ • ςΩετॲཧͷςΫχοΫΛ͍Ζ͍Ζհ • έʔεʹΑͬͯύʔαʔΛॻ͘·Ͱͳ͔ͬͨΓ͢Δ
τʔΫϯͷग़ݱҐஔ "id:aereal".substring(3) // => "aereal"
τʔΫϯͷग़ݱҐஔ • τʔΫϯͷग़ݱҐஔ͕ݻఆͳΒ͜Ε͘Β͍Ͱ • Մมͩͱഁ͢Δ • ͓ͦͯ͠Αͦͷจ๏ՄมͷτʔΫϯ͔Γ
ਖ਼نදݱ /id:(.+)/.match("id:aereal")[1] // => "aereal"
ਖ਼نදݱ • ׅހͷඇରԠݕग़Ͱ͖ͳ͍ • (POSIXͷਖ਼نදݱͰෆՄɺ Perlͷ֦ுਖ਼نදݱͰͰ͖ͨͣ) • ҰຊΓ͕Ͱ͖ͳ͔ͬͨΒɺ ޙड़ͷঢ়ଶཧΛߦ͏ඞཁ͕͋Δ
ঢ়ଶભҠΛཧ var isInIdNotation = false; while (1) { if (isInIdNotation)
{ var name = readText(); // => "aereal" } else { switch (readChar()) { case ':': isInIdNotation = true; default: // ... } } }
ঢ়ଶભҠΛཧ var isInIdNotation = false; var isInHeading = false; var
isInUnorderedList = false; var isInOrderedList = false; while (1) { if (isInIdNotation) if (isInHeading) if (isInUnorderedList) if (isInOrderedList) }
None
• ͲΕจ๏Λ၆ᛌͮ͠Β͍ • ϞδϡʔϧԽ͕͍͠ • → খ͍͞෦ΛੵΈ্͍͛ͯ͘ελΠϧͰ࡞ΕͨΒ……
ͦ͜Ͱyacc • ύʔαʔδΣωϨʔλʔͷ1ͭ • BNFʹࣅͨߏจنଇ͔ΒύʔαʔΛੜ͢Δ • ෳͷنଇΛΈ߹Θͤͯ1ͭͷنଇΛ࡞Γ্͛Δ • ίʔϧόοΫελΠϧͰ نଇΛϓϩάϥϜʹม͢Δ
(ؐݩɺreduce)
https://tools.ietf.org/html/rfc7230 HTTP-Message = start-line *( header-field CRLF) CRLF [ message-body]
start-line = request-line / status-line
yacc • BNFͱ͍͏நతͳํ๏Ͱطड़Ͱ͖Δͷ͕Α͍ • ݴޠDSLʹରͯ͠ϙʔλϏϦςΟͰ༏Δ • ϨΩαʔ (ࣈ۟ղੳث) ผ్࣮͢Δඞཁ͕͋Δ •
ߏจنଇͷίʔϧόοΫ෦͕ ΤσΟλͰϋΠϥΠτ͞Εͳ͍ (ͳʹ͔͍͍ํ๏͋Γͦ͏)
࣍ճ༧ࠂ • yaccΑͦ͞͏ͱ͍͏͜ͱ͕Θ͔ͬͨ • GoͱyaccΛΈ߹ΘͤΒΕΔͷ͔ • ͨͯͯ͠ͳه๏ύʔαʔΛ࡞Δ͜ͱ͕Ͱ͖Δͷ͔
https://git.io/v7gcD github.com/aereal/gohn
gohn • Written in Go w/goyacc • pronounce as `gone`
• ओཁͳه๏࣮ࡁΈ
gohnͷσβΠϯ • ඪ४ೖྗ͔Βͯͳه๏Λड͚औΓɺ • ඪ४ग़ྗʹASTΛJSONʹγϦΞϥΠζͯ͠ग़ྗ͢Δ • → HTMLͷมผ్࣮͢Δ • ͱͯUNIXత
AST • JSONʹγϦΞϥΠζ • JSON schemaΛެ։͍ͯ͠Δ • εΩʔϚ͔ΒHTMLมثΛࣗಈੜ͢Δ͜ͱͰ͖ͦ͏ • https://github.com/aereal/gohn/blob/master/schema.json
Goͱyacc • goyaccͱ͍͏πʔϧ͕͋Δ • go get golang.org/x/tools/cmd/goyacc • ΞΫγϣϯΛGoͰॻ͚Δ
Goͱࣈ۟ղੳ • ࣈ۟ղੳ = ಡΜͩจࣈ͕ͲΜͳҙຯΛ࣋ͭͷ͔ฦ͢ • text/scannerͱ͍͏ඪ४ύοέʔδ͕ศར • ڍಈΛΧελϚΠζͰ͖Δ •
τʔΫϯΛফඅͨ͠࠷ޙͷҐஔΛهͯ͘͠ΕΔͷͰ Τϥʔϝοηʔδͷߏஙָ͕
σϞ
Ԡ༻ฤ
HTTPه๏ [http://example.com/] # <a href="http://example.com/"> # http://example.com/ # </a> [http://127.0.0.1/:title=΅͘ͷIP]
# <a href="http://example.com/"> # ΅͘ͷIP # </a>
HTTPه๏ • ΞϯΧʔϦϯΫʹม͞ΕΔه๏ • ඌʹల։࣌ͷΦϓγϣϯΛ `:` ʹଓ͚ͯطड़Ͱ͖Δ • `:` URLͷҰ෦ʹݱΕΔ͜ͱ͕͋Δ
• → ࣍ͷ1จࣈΛಡΉ͚ͩͰ:titleͷ։͔࢝அͰ͖ͳ͍
࠷ॳʹݱΕΔ `:` εΩʔϜ෦ͱݟͳͯ͠ແࢹ͢Δ͜ͱʹ if !l.seenColon { l.seenColon = true return
false // maybe part of URL } else { return true } https://github.com/aereal/gohn/blob/master/parser/ lex.go#L100
࠶ؼతͳϧʔϧ • N > 1ͷࢠنଇ͔ΒͳΔنଇͷॻ͖ํ • appendͷॱ൪͚ͩؒҧ͑ͳ͍Α͏ʹ
http_options: http_option { $$ = []string{$1} } | http_option http_options
{ options := $2 $$ = append([]string{$1}, options...) }
ςετ • Table-driven tests͕Φεεϝ • https://github.com/golang/go/wiki/TableDrivenTests • lexerΛؚΉparserͷػೳςετ͚ͩͰेͩͱࢥ͏ • https://github.com/aereal/gohn/blob/master/parser/
parser_test.go#L17
σόοά • tokenͷࣝผࢠ (int) ͔Β໊લ (string) Λ ٯҾ͖͢ΔϝιουΛఆ͓ٛͯ͘͠ͱศར • print͢ΔʹͤΑσόοΨΛ͏ʹͤΑ
• https://github.com/aereal/gohn/blob/master/parser/ lex.go#L29
·ͱΊ
Go/goyaccศར • GoෳࡶͳCLIΛϙʔλϒϧʹ࡞Δͷʹ͍͍ͯΔ • goyacc (yacc) ෳࡶͳจ๏ͷύʔαʔʹ͍͍ͯΔ
ܰྔϚʔΫΞοϓݴޠ ͍͠ • ਓؒʹͱͬͯͷಡΈॻ͖͢͠͞ͱ ػցʹͱͬͯͷಡΈॻ͖͢͠͞ҟͳΔ • ݫ֨ͳจ๏نଇʹैΘͤΔύʔαʔΑΓ ޡΓగਖ਼ͯ͘͠ΕΔ΄͏͕࣮༻తͳͷͰ?
ύʔαʔ࡞Γָ͍͠ • Ͱ͖Δ͜ͱɺΓ͍ͨ͜ͱɺؔ৺ͷ͋Δ͜ͱ͕ ͏·͘όϥϯε͞Εͨඪ • WebͱςΩετॲཧ • খ͞ͳඪΛগͣͭ͠ੵΈॏͶ͍͚ͯΔ • ʮࠓϦετه๏ͷ࣮͕Ͱ͖ͨͧʯ
ڵຯΛ࣋ͬͯ͘Εͨਓ • ·ͣJSONͷύʔαʔΛॻ͍ͯΈΔͱΑͦ͞͏ • RFC, relaxed JSON, etc. ʹൃలͤͯ͞ΈΔ •
࣍ࣈ۟ղੳثΛखॻ͖ͯ͠ΈΔ • ࣍ߏจղੳثखॻ͖ͯ͠ΈΔ