
Learning AST Parsing Through Jiro-Style Ramen Calls (二郎系ラーメンのコールで学ぶ AST 解析)

memory
April 12, 2024

PHP Conference Odawara 2024
English Title: How to study an AST; for example, consider the process similar to ordering JIRO Ramen.

Transcript

  1. 

  2. What is an AST?
    - AST stands for Abstract Syntax Tree.
    - Many people probably do not immediately grasp the word "AST" or the term "abstract syntax tree".
    - This section starts from the basics: what role an AST plays and what it is useful for.
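  3. How a programming language is interpreted
    Lexical Analysis -> Parse -> Compile -> Execute
    Compile: the source is translated into machine code. Depending on the language (PHP, for example), it may instead be converted into an intermediate language so that a VM can handle it more easily.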
  4. How a programming language is interpreted
    Lexical Analysis -> Parse -> Compile -> Execute
    Languages that carry out these steps one after another at run time are interpreted languages.
    Rust, Go, and similar languages are translated into machine code. PHP and Ruby are first translated into an intermediate language (opcodes) and then executed on the Zend VM and RubyVM respectively.
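The four stages on this slide can be observed directly in CPython, which, like PHP and Ruby, compiles source to an intermediate form and runs it on a VM. The following is a minimal sketch in Python rather than PHP; the variable names and the sample statement are illustrative assumptions, while `tokenize`, `ast.parse`, `compile`, and `exec` are standard-library facilities.

```python
import ast
import io
import tokenize

source = "result = (1 + 2) * 3"

# 1. Lexical analysis: split the source text into tokens.
#    (ast.parse tokenizes internally; this step is shown separately for illustration.)
WANTED = {tokenize.NAME, tokenize.NUMBER, tokenize.OP}
lexemes = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in WANTED
]

# 2. Parse: build an abstract syntax tree from the source.
tree = ast.parse(source)

# 3. Compile: turn the tree into bytecode for Python's VM.
code = compile(tree, "<demo>", "exec")

# 4. Execute: run the bytecode and read the result back.
namespace = {}
exec(code, namespace)

print(lexemes)
print(namespace["result"])
```

Each stage consumes the output of the previous one, which is the same shape as the Lexical Analysis -> Parse -> Compile -> Execute flow in the slide.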
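  5. How a programming language is interpreted
    Lexical Analysis -> Parse -> Compile -> Execute
    Compiled languages always include a step that compiles to an intermediate language (or an executable file) ※1.
    ※1 Java is compiled to an intermediate language for the JVM, while C/C++, Rust, and others are compiled to machine code.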
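  6. How a programming language is interpreted
    Lexical Analysis -> Parse -> Compile -> Execute
    One useful criterion for calling a language a compiled language is whether the compiled file can be distributed and then run directly on the OS. C, C++, Java, Rust, and the like are compiled languages ※1.
    ※1 Go is also called a compiled language, but it has features such as go run that perform the whole sequence for you, and year by year the boundary between interpreted and compiled languages seems to be fading. Likewise, a language that needs a transpiler, such as TypeScript, looks like a compiled language at first glance, but in practice it runs as JavaScript, an interpreted language. It is fine to treat this area as "that is just how it is".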
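  7. The role of an AST
    - Now that the overall flow of language processing is clear, let us return to the role of an AST.
    - After parsing, the concrete syntax is abstracted further and given meaning (turned into a data structure). The abstract syntax tree (AST) is one form of that process, so an AST is also a kind of data structure.
    - Concrete syntax still contains the parentheses, semicolons, and other symbols that matter in the program text. An abstract syntax tree is easiest to understand as what remains once those are removed.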
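  8. The role of an AST
    - Noisy information makes a programming language hard to process. For example, when handling (a + b) * c, it is tedious to deal with the parentheses of (a + b) every time during parsing.
    - It is better to have, from the start, a data structure that already matches the meaning "(a + b) is evaluated before * c". A tree is a convenient way to express this.
    Example: (a + b) * c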
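To make the point about parentheses concrete, the sketch below uses Python's standard `ast` module as a stand-in for a PHP parser. The helper `shape` is a hypothetical name introduced here; it reduces a parsed expression to nested tuples so the tree structure is easy to compare.

```python
import ast


def shape(node):
    """Summarize an expression tree as nested tuples (operator, left, right)."""
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, shape(node.left), shape(node.right))
    if isinstance(node, ast.Name):
        return node.id
    raise TypeError(f"unsupported node: {type(node).__name__}")


# mode="eval" parses a single expression; .body is its root node.
grouped = shape(ast.parse("(a + b) * c", mode="eval").body)
plain = shape(ast.parse("a + b * c", mode="eval").body)

print(grouped)  # ('Mult', ('Add', 'a', 'b'), 'c')
print(plain)    # ('Add', 'a', ('Mult', 'b', 'c'))
```

Neither tree contains a node for the parentheses. Their only effect is on which operation ends up nested inside the other, which is exactly the priority information the slide says the data structure should carry.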
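  9. Turning a Jiro-style ramen call into an AST: the flow from "Want garlic?" to execution
    Lexical Analysis -> Parse -> Compile -> Execute (serving the ramen)
    Staff: "Want garlic?" (ニンニクいれますか？)
    Customer: 「ニンニクマシマシ野菜マシマシカラメ」 (double extra garlic, double extra vegetables, stronger flavor)
    Jiro-style ramen is "interpreter ramen", because the staff handle everything from lexical analysis through parsing to execution.
    There is no real compile step. If there were one, it would perhaps be freeze-drying (?).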
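  10. RAMEN: T_GARLIC T_ADD T_ADD T_VEG T_ADD T_ADD T_STRONG T_ADD
    - ニンニク (garlic) becomes T_GARLIC
    - 野菜 (vegetables) becomes T_VEG (ETABLE)
    - カラ(メ) (stronger flavor) becomes T_STRONG
    - マシ (extra) becomes T_ADD
    - The メ in カラメ is also an addition, so treating it as T_ADD is fine.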
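The mapping on this slide can be sketched as a small longest-match lexer. This is an illustrative Python sketch, not the speaker's implementation: the token names come from the slide, while `TOKEN_TABLE` and `tokenize_call` are hypothetical names.

```python
# Surface text -> token name (token names taken from the slide).
TOKEN_TABLE = {
    "ニンニク": "T_GARLIC",
    "野菜": "T_VEG",
    "カラ": "T_STRONG",
    "マシ": "T_ADD",
    "メ": "T_ADD",  # the メ of カラメ is treated as an addition
}


def tokenize_call(call: str) -> list[str]:
    """Split a ramen call into tokens, always trying the longest entry first."""
    candidates = sorted(TOKEN_TABLE, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(call):
        for text in candidates:
            if call.startswith(text, pos):
                tokens.append(TOKEN_TABLE[text])
                pos += len(text)
                break
        else:
            # No table entry matches at this position.
            raise ValueError(f"unexpected input at position {pos}: {call[pos:]!r}")
    return tokens


tokens = tokenize_call("ニンニクマシマシ野菜マシマシカラメ")
print(tokens)
```

For the call on the slide this yields one T_GARLIC, one T_VEG, one T_STRONG, and five T_ADD tokens, in spoken order.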
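  11. RAMEN: T_GARLIC T_ADD T_ADD T_VEG T_ADD T_ADD T_STRONG T_ADD
    - The whole sequence: topping information for the ramen
    - T_GARLIC T_ADD T_ADD: an instruction about the amount of garlic
    - T_VEG T_ADD T_ADD: an instruction about the amount of vegetables
    - T_STRONG T_ADD: an instruction about the strength of the flavor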
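  12. RAMEN
      Attr\Garlic (ADD: 2)
      Attr\Veg (ADD: 2)
      Attr\Strong (ADD: 1)
    - The two T_ADD tokens become information belonging to T_GARLIC.
    - The two T_ADD tokens become information belonging to T_VEG.
    - The T_ADD token becomes information belonging to T_STRONG.
    - Attr\Garlic is additional information belonging to RAMEN.
    - Attr\Strong is information belonging to RAMEN.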
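  13. RAMEN
      Attr\Garlic (ADD: 2)
      Attr\Veg (ADD: 2)
      Attr\Strong (ADD: 1)
    - Attr\Garlic is additional information that has a parameter called ADD.
    - Attr\Veg is additional information that has a parameter called ADD.
    - Attr\Strong is additional information that has a parameter called ADD.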
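The step from the flat token list to the RAMEN tree can be sketched as follows. This is a minimal Python illustration under stated assumptions: the node labels (`Attr\Garlic`, `Attr\Veg`, `Attr\Strong`, `ADD`) come from the slides, while the classes `Attr` and `Ramen`, the table `ATTR_FOR_TOKEN`, and the function `parse_call` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    name: str     # label from the slide, e.g. "Attr\\Garlic"
    add: int = 0  # the ADD parameter: how many T_ADD tokens followed


@dataclass
class Ramen:
    attrs: list = field(default_factory=list)


ATTR_FOR_TOKEN = {
    "T_GARLIC": "Attr\\Garlic",
    "T_VEG": "Attr\\Veg",
    "T_STRONG": "Attr\\Strong",
}


def parse_call(tokens):
    """Attach each T_ADD to the most recent topping token."""
    ramen = Ramen()
    for token in tokens:
        if token in ATTR_FOR_TOKEN:
            ramen.attrs.append(Attr(ATTR_FOR_TOKEN[token]))
        elif token == "T_ADD":
            if not ramen.attrs:
                raise SyntaxError("T_ADD must follow a topping token")
            ramen.attrs[-1].add += 1
        else:
            raise SyntaxError(f"unknown token: {token}")
    return ramen


tokens = ["T_GARLIC", "T_ADD", "T_ADD", "T_VEG", "T_ADD", "T_ADD", "T_STRONG", "T_ADD"]
ramen = parse_call(tokens)
for attr in ramen.attrs:
    print(attr.name, "ADD:", attr.add)
```

The repeated T_ADD tokens disappear from the result and survive only as a count on their parent node, which mirrors how an AST drops surface detail while keeping meaning.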
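  14. How lexers and parsers work: an example of a deterministic finite automaton (DFA)
    States: S1 (initial state), S2, S3, S4, S5, ..., S..N (◎ marks the accepting state)
    Input: Σ = {ニ, ン, ニ, ク}
    Transitions: ニ -> ン -> ニ -> ク
    A real automaton also has branches for 野 菜 and カ ラメ, and many more states, but they are omitted here for reasons of space.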
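  15. How lexers and parsers work: an example of a deterministic finite automaton (DFA)
    States: S1 (initial state), S2, S3, S4
    Input: Σ = {ニ, ン, ニ, ク}
    Transitions: ニ -> ン -> ニ -> ク
    The input ends in the accepting state, so it is "accepted" (treated as a token).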
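  16. How lexers and parsers work: an example of a deterministic finite automaton (DFA)
    States: S1 (initial state), S2, S3, S4
    Input: Σ = {e, c, h, o}
    Transitions: e -> c -> h -> o
    The input ends in the accepting state, so it is "accepted" (treated as a token).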
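A DFA like the one on these slides is commonly written as a transition table plus a loop. The Python sketch below recognizes the keyword `echo`; the state names `q0` to `q4` are illustrative assumptions and are not meant to match the slide's S1 to S4 numbering.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    ("q0", "e"): "q1",
    ("q1", "c"): "q2",
    ("q2", "h"): "q3",
    ("q3", "o"): "q4",
}
START = "q0"
ACCEPTING = {"q4"}


def accepts(word: str) -> bool:
    """Run the DFA over word; accept only if it ends in an accepting state."""
    state = START
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            # No transition defined for this symbol: reject immediately.
            return False
    return state in ACCEPTING


print(accepts("echo"))   # True
print(accepts("ech"))    # False: input ended before the accepting state
print(accepts("echoo"))  # False: no transition out of q4
```

A lexer runs many such automata (or one merged automaton) over the input and emits a token each time an accepting state is reached.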
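  17. How lexers and parsers work
    - At the moment the automaton reaches an accepting state for ニンニク or マシ, the input is split off as a token such as T_GARLIC or T_ADD.
    - Morphological analysis works in roughly the same way, except that it uses a corpus (something like a dictionary that includes okurigana and so on). That topic is skipped here because it would take the talk too far afield.
    - After the input is split into tokens, it is turned into an abstract syntax tree (AST) using a method such as the LL or LR parsing mentioned earlier. For a simple implementation, LL parsing is recommended because it is comparatively easy to write.
    Example: ニンニク -> T_GARLIC, マシ -> T_ADD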
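  18. How lexers and parsers work
    - A parser builds the syntax tree from symbols such as + and - that appear in the middle of a statement. With LL parsing in particular, these rules are written down in a notation such as BNF. To parse a language like PHP, parser generators such as Yacc (Yet Another Compiler Compiler) or Bison are used, which accept definitions in a BNF-like form.
    - PHP itself has one: https://github.com/php/php-src/blob/php-8.3.4/Zend/zend_language_parser.y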
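  19. How lexers and parsers work
    - Of course, it is also possible to skip the generator, implement the parser by sheer effort, and build the abstract syntax tree yourself.
    - Last year I threw together a calculator in PHP that can evaluate complex expressions such as 4 * sqrt((5! + 2 ** 8) - (4! + 2 ** 8)) + 5. It is a sample that goes from lexical analysis through parsing (into an abstract syntax tree) to execution in one pass, so it may be a useful reference.
    - https://gist.github.com/m3m0r7/a5a5af3eb8d118fae78f6a9e18657ec4
    ※1 PHP itself also has one: https://github.com/php/php-src/blob/php-8.3.4/Zend/zend_language_parser.y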
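As a rough idea of what a hand-written LL-style (recursive descent) parser looks like, here is a deliberately small Python sketch. It is not the calculator from the gist above: it supports only integers, + - * /, and parentheses, and every name in it is an assumption made for illustration. The grammar it implements is `expr := term (('+'|'-') term)*`, `term := factor (('*'|'/') factor)*`, `factor := NUMBER | '(' expr ')'`.

```python
import operator
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(source):
    """Lexical analysis: split the source into number and symbol tokens."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns an int or an (op, left, right) tuple."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(source):
    parser = Parser(tokenize(source))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node


def evaluate(node):
    """Execute the tree bottom-up."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tree = parse("(1 + 2) * 3")
print(tree, "=", evaluate(tree))
```

Operator precedence falls out of which rule calls which: `expr` calls `term`, so multiplication binds tighter than addition without any explicit priority table.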
  20. Let's translate this to PHP
    - The best-known tool for producing an abstract syntax tree (AST) in PHP is probably nikic/PHP-Parser. PHPStan and others use it (※1).
    - So how does nikic/PHP-Parser actually work?
    - In fact, it works in much the same way as parsing the Ramen Jiro call shown earlier. What does that mean? The following pages explain.
    ※1: https://github.com/phpstan/phpstan-src/blob/1.11.x/composer.json#L24
  21. 

  22. Namespace
      stmts: Echo (…stmts)
        exprs: BinaryOp\Concat (…exprs)
          left: BinaryOp\Plus
            left: BinaryOp\Plus
              left: Int (value: 1)
              right: Int (value: 2)
            right: Int (value: 3)
          right: String (value: \n)
    (Note) The Stmt_, Expr_, and Scalar_ prefixes are omitted.
  23. Namespace
      stmts: Echo (…stmts)
        exprs: BinaryOp\Concat (…exprs)
          left: BinaryOp\Plus
            left: Int (value: 3)
            right: Int (value: 3)
          right: String (value: \n)
    Addition performed by BinaryOp\Plus.
  24. Namespace
      stmts: Echo (…stmts)
        exprs: BinaryOp\Concat (…exprs)
          left: Int (value: 6)
          right: String (value: \n)
    Addition performed by BinaryOp\Plus.
  25. Namespace
      stmts: Echo (…stmts)
        exprs: BinaryOp\Concat (…exprs)
          String (value: 6\n)
    BinaryOp\Concat joins 6 and \n.
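The bottom-up reduction shown across the last four slides can be sketched as a recursive evaluator. The Python classes below only borrow the node labels from the slides (`Int`, `String`, `Plus`, `Concat`, `Echo`); they are hypothetical stand-ins and not the nikic/PHP-Parser API, and the tree is built by hand on the assumption that the slides depict an echo of 1 + 2 + 3 concatenated with a newline.

```python
from dataclasses import dataclass


@dataclass
class Int:
    value: int


@dataclass
class String:
    value: str


@dataclass
class Plus:      # stands in for BinaryOp\Plus
    left: object
    right: object


@dataclass
class Concat:    # stands in for BinaryOp\Concat
    left: object
    right: object


@dataclass
class Echo:
    exprs: list


def evaluate(node):
    """Reduce a node to a plain value, children first."""
    if isinstance(node, (Int, String)):
        return node.value
    if isinstance(node, Plus):
        return evaluate(node.left) + evaluate(node.right)
    if isinstance(node, Concat):
        return str(evaluate(node.left)) + str(evaluate(node.right))
    raise TypeError(f"unsupported node: {type(node).__name__}")


def run_echo(stmt: Echo) -> str:
    """Return what the Echo statement would output."""
    return "".join(str(evaluate(expr)) for expr in stmt.exprs)


tree = Echo([Concat(Plus(Plus(Int(1), Int(2)), Int(3)), String("\n"))])
output = run_echo(tree)
print(repr(output))  # '6\n'
```

The recursion visits the innermost `Plus` first (1 + 2), then the outer `Plus` (3 + 3), and finally `Concat`, which matches the order in which the nodes collapse on the slides.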