Session slides from Fukuoka Ruby Kaigi 02, November 2017. A talk about how much fun it is to write a parser.
Writing a PARSER in RUBY (by hand or with a library, that is the question) 2017.Nov Fukuoka Ruby Kaigi 02 Yuki Torii
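Yuki Torii: self-introduction • Works at Everyleaf Corporation (株式会社万葉) • Rails application engineer • RailsGirls Tokyo organizer • Translations: • "Programming Elixir" by Dave Thomas (Ohmsha), co-translated with Koichi Sasada • The "ルビィのぼうけん" (Hello Ruby) series by Linda Liukas (Shoeisha)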
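Yuki Torii: self-introduction • Born in Fukuoka, Higashi-ku • Hakozaki Elementary School • Chikushi Jogakuen Junior High School • Chikushi Jogakuen High School => Tokyo from university onward • Became a programmer before I knew it • 2012 Fukuoka Ruby Kaigi 01 LT • 2015 RailsGirls Fukuoka coach (timeline label: Fukuoka period)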
Writing a PARSER in RUBY: Contents • Why I ended up writing a parser • Design • Parsers are hard! • A gem called Treetop • A hand-written parser is still fun (though you probably shouldn't write one)
Why I ended up writing a PARSER: it started as a passing whim
I want pattern matching (*2) like ELIXIR's (*1) in RUBY too... *1 Elixir: a programming language that runs on the Erlang VM. *2 Pattern matching: a cool feature that Elixir has.
The pattern matching I want to write in RUBY: match with =~ and then use the pattern variables
The pattern matching I want to write in RUBY: match with === (so that it can be written in a CASE statement)
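If this (*1) actually ran, wouldn't it be more persuasive...? (*2) *1 At first I was only dreaming about the spec. *2 In the end, it is unclear whether it was persuasive.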
So I tried it. RubyKaigi talk on YouTube: HTTPS://WWW.YOUTUBE.COM/WATCH?V=1M4IPJH0K0E&INDEX=19&T=6S&LIST=PL
The diff to the RUBY interpreter's C code: this is all I touched. Files: compile.c, parse.y
Design (if it works, that is good enough; performance is out of scope). Pipeline: Ruby script -> Parse -> Compile -> Ruby byte code -> Evaluator
Design (if it works, that is good enough; performance is out of scope). Diagram labels: Ruby script -> Parse -> Compile -> Ruby byte code -> Evaluator • PatternMatching, a Ruby class (today's talk is about this part!!) • %p([a, 'bc']) =~ [3, 'bc'] • pattern string "[a, 'bc']" • variable list ["a"] • definition of the variables • some fiddling to get hold of the Binding • Parse pattern: build the AST • pattern_match obj: check whether it matches, then assign the variables
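What the PARSER (of the PATTERN MATCHING class) does: for example, when the pattern `%p([a, 'bc'])` is given, it receives the string "[a, 'bc']" and works out that... • the kind is "array" • the first element is the variable a • the second element is the string 'bc' • the list of required pattern variables is [a] ...and turns that into a structure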
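PARSE flow for %p([a, 'bc']) =~ [3, 'bc']: string "[a, 'bc']" -> Tokenize -> tokens [ and a and , and 'bc' and ] -> build AST -> AST: Array Node containing Variable Node (a) and String Node ('bc'). *1 This time, the pattern variable list is also built while the AST is being built.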
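The "build AST" step of this flow could be sketched roughly as follows. This is a minimal recursive-descent sketch, not the code from the talk: the node structs, the `PatternParser` class, and the `[type, text]` token format are all hypothetical names chosen for illustration, and only arrays, variables, and single-quoted strings are handled. It also collects the pattern variable list while building the AST, as the footnote on the slide describes.

```ruby
# Hypothetical node types, named after the "Array Node" etc. on the slide.
ArrayNode    = Struct.new(:elements)
VariableNode = Struct.new(:name)
StringNode   = Struct.new(:value)

# Builds an AST from tokens shaped like [type, text] (an assumed format).
class PatternParser
  attr_reader :variables

  def initialize(tokens)
    @tokens = tokens.dup
    @variables = []   # pattern variable names, collected while parsing
  end

  def parse
    node = parse_value
    raise ArgumentError, "unexpected trailing tokens" unless @tokens.empty?
    node
  end

  private

  def parse_value
    type, text = @tokens.shift
    case type
    when :array_open
      parse_array
    when :variable
      @variables << text unless @variables.include?(text)
      VariableNode.new(text)
    when :string
      StringNode.new(text[1..-2])   # strip the surrounding quotes
    else
      raise ArgumentError, "unexpected token: #{text.inspect}"
    end
  end

  # After an array open bracket, everything up to the matching close
  # bracket belongs to this array.
  def parse_array
    elements = []
    until peek_type == :array_close
      expect(:comma) unless elements.empty?
      elements << parse_value
    end
    expect(:array_close)
    ArrayNode.new(elements)
  end

  def peek_type
    @tokens.first&.first
  end

  def expect(type)
    actual, text = @tokens.shift
    raise ArgumentError, "expected #{type}, got #{text.inspect}" unless actual == type
  end
end

tokens = [[:array_open, "["], [:variable, "a"], [:comma, ","],
          [:string, "'bc'"], [:array_close, "]"]]
parser = PatternParser.new(tokens)
ast = parser.parse   # ArrayNode holding VariableNode("a") and StringNode("bc")
```

Nested arrays fall out of the recursion for free, which is one reason this shape is a common starting point for hand-written parsers.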
PARSE flow for %p({status: 200, users: [a, b] }) =~ {status: 200, users: [1, 3] }: string "{status: 200, users: [a, b] }" -> Tokenize -> tokens { and status: and 200 and , and users: and [ and a and , and b and ] and } -> build AST -> AST: Hash Node with key: Symbol Node (:status), val: Integer Node (200); key: Symbol Node (:users), val: Array Node containing Variable Node (a) and Variable Node (b)
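What we ultimately want is the AST. For %p({status: 200, users: [a, b] }) =~ {status: 200, users: [1, 3] }, the AST is: Hash Node with key: Symbol Node (:status), val: Integer Node (200); key: Symbol Node (:users), val: Array Node containing Variable Node (a) and Variable Node (b). Walk the AST for {status: 200, users: [a, b] } and check whether the value matches the pattern: • Is the match target a hash in the first place? • Is the val for key status: 200? • Is the val for key users: an array? • Does the array have 2 elements? • Store the array's 1st element in variable a • Store the array's 2nd element in variable b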
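The walk described on this slide could look roughly like the sketch below. It is an illustration under assumptions, not the talk's implementation: the node structs, `match_node`, and `pattern_match` are hypothetical names, a single `LiteralNode` stands in for the Integer, Symbol, and String nodes, and bindings are returned as a plain Hash rather than assigned through a Binding.

```ruby
# Hypothetical node types for illustration.
HashNode     = Struct.new(:pairs)      # pairs: array of [key_node, value_node]
ArrayNode    = Struct.new(:elements)
VariableNode = Struct.new(:name)
LiteralNode  = Struct.new(:value)      # stands in for Integer/Symbol/String nodes

# Walks the AST alongside the value; records variable bindings as it goes.
def match_node(node, value, bindings)
  case node
  when VariableNode
    bindings[node.name] = value        # a variable matches anything
    true
  when LiteralNode
    node.value == value
  when ArrayNode
    value.is_a?(Array) &&
      value.size == node.elements.size &&
      node.elements.zip(value).all? { |child, item| match_node(child, item, bindings) }
  when HashNode
    value.is_a?(Hash) &&
      node.pairs.all? { |key_node, value_node|
        value.key?(key_node.value) &&
          match_node(value_node, value[key_node.value], bindings)
      }
  else
    false
  end
end

# Returns a Hash of bindings on success, nil when the value does not match.
def pattern_match(ast, value)
  bindings = {}
  match_node(ast, value, bindings) ? bindings : nil
end

# Hand-built AST for the pattern {status: 200, users: [a, b]}
ast = HashNode.new([
  [LiteralNode.new(:status), LiteralNode.new(200)],
  [LiteralNode.new(:users),
   ArrayNode.new([VariableNode.new("a"), VariableNode.new("b")])]
])

result = pattern_match(ast, { status: 200, users: [1, 3] })
# result binds "a" to 1 and "b" to 3
```

Discarding the partial bindings when any check fails keeps a failed match from leaking half-assigned variables, which is a simplification chosen here rather than something the slides specify.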
It was hard (especially TOKENIZE)
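Tokenize: string "[a, 'bc']" -> Tokenize -> tokens [ and a and , and 'bc' and ]. Token and type: [ = array open bracket; a = variable; , = comma; 'bc' = string; ] = array close bracket. The AST is built by looking at each token's type, along the lines of "oh, an array open bracket has arrived, so from here until the array closes we are inside the array".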
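What I used: StringScanner#scan. Tokenize • StringScanner#scan • Scans the string from the head; when the regular expression matches, it returns the matched part and advances the index to just past it. Regular expression and type: /\[/ = array open bracket; /[a-z_][a-z0-9_]*/ = variable; /,/ = comma; /'.*?'/ = string; /\]/ = array close bracket. Scanning the string "[a, 'bc']": the remaining input goes "[a, 'bc']" -> "a, 'bc']" -> "'bc']" -> "]", producing the tokens [ a , 'bc' ]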
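A tokenizer built on `StringScanner#scan` with the regular expressions from this slide might look like the following. This is a sketch rather than the code from the talk: the `TOKEN_RULES` table, the `tokenize` method, and the symbol names for the token types are assumptions made for illustration. Whitespace is skipped with `/\s+/` so that any run of spaces between tokens is tolerated.

```ruby
require "strscan"

# Regular expressions from the slide, paired with illustrative type names.
TOKEN_RULES = [
  [:array_open,  /\[/],
  [:variable,    /[a-z_][a-z0-9_]*/],
  [:comma,       /,/],
  [:string,      /'.*?'/],
  [:array_close, /\]/]
].freeze

# Returns an array of [type, text] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)   # one or more whitespace characters

    # scan returns the matched text and advances the position,
    # or returns nil and leaves the position unchanged.
    type, _pattern = TOKEN_RULES.find { |_, pattern| scanner.scan(pattern) }
    unless type
      raise ArgumentError,
            "cannot tokenize at position #{scanner.pos}: #{scanner.rest.inspect}"
    end
    tokens << [type, scanner.matched]
  end
  tokens
end

tokens = tokenize("[a, 'bc']")
# tokens: array_open "[", variable "a", comma ",", string "'bc'", array_close "]"
```

Because every rule is tried against the same position with one fixed rule set, this shape cannot switch rules partway through a token, which is the limitation the later slide about string interpolation points at.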
At commit time. Example: a bug where tokenizing fails when there are two or more consecutive spaces: "[a, 'bc']" vs. "[a,  'bc']"
Not supported yet (I will keep at it). Example: a hash written the natural way, with the {} omitted, does not work: %p({ user: 1, from: 'Fukuoka'}) vs. %p( user: 1, from: 'Fukuoka' )
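Only a single set of regular expressions can be applied when TOKENIZING. Example: cases where one token type needs its own tokenizing rules are not handled: "Name is #{user.name}"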
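• It can now parse patterns up to about %p( [x, :y, { "array" => [5, v] }] ) • Writing a parser from scratch by hand is quite hard going • There is a limit to how far it can be extended (for me, at least)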
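When you try writing a PARSER... • Things you have always read and written without a second thought, such as `[1, 2, 3]` and `{status: 200, users:[1, 2] }`, suddenly appear in front of you as "strings that are about to be interpreted (and do not yet have any meaning)" • Spaces, commas: everything has a meaning • The parser in Ruby itself is amazing • Humans are amazing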
Truly an experience of "meeting RUBY once more"
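By the way, there is a gem called Treetop: https://github.com/cjheath/treetop • You write a .treetop file that defines grammar rules in its own PEG-based notation, using regular expressions and the like • Pass that file to the tt command and it generates a Ruby parser file from it • By requiring the generated Ruby file, you can use a parser that builds syntax nodes, in other words an AST • Nested rules are also easy to write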
Shouldn't I have just used TREETOP from the start...?
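Comparison table: hand-written PARSER vs. TREETOP. Columns: Hand-written, Treetop. Rows: refinement of the notation; rule nesting; resistance to bugs; getting to meet Ruby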
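A hand-written PARSER is still fun (and this is not just sour grapes) • When I started, I only vaguely understood what a "parser" even was • Had I picked up Treetop at that stage, I would probably have been defeated by its level of abstraction and unable to use it well • Now, even without knowing how to use it, I can read the Ruby parser that Treetop generates and understand what it is getting at • The experience of seeing all of your own code as nothing but strings is priceless • Reinventing the wheel is fine; what matters is that the wheel gets assembled in your own mind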
* Once you try to do anything beyond a certain level of complexity you hit a dead end, so it is about time to switch over *
Let's keep meeting RUBY, again and again. Thank you very much.