
Learning AST Parsing Through Jiro-Style Ramen Calls (二郎系ラーメンのコールで学ぶ AST 解析)

memory
April 12, 2024


PHP Conference Odawara 2024 (PHP カンファレンス小田原 2024)
English Title: How to study an AST; for example, consider the process similar to ordering JIRO Ramen.


Transcript

  1. 

  2. What is an AST?

    - AST stands for Abstract Syntax Tree.
    - Many people probably find that neither the word "AST" nor the term "abstract syntax tree" immediately clicks.
    - This section explains what role an AST plays and what it is useful for.
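  3. How a programming language is processed

    Lexical analysis -> Parse -> Compile -> Execute

    Compile: the source is translated into machine code. In some languages (PHP, for example) it is instead converted into an intermediate language that is easier to handle on a VM.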
  4. How a programming language is processed

    Lexical analysis -> Parse -> Compile -> Execute

    Languages that carry out these steps one after another are interpreted languages.

    Rust, Go, and similar languages are translated into machine code. PHP and Ruby are first translated into an intermediate language (opcodes) and then executed on the Zend VM and RubyVM respectively.
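  5. How a programming language is processed

    Lexical analysis -> Parse -> Compile -> Execute

    A compiled language "always" has a step that compiles the source into an intermediate language (or an executable file) (*1).

    *1 Java is compiled into an intermediate language for the JVM, while C/C++, Rust, and others are compiled into machine code.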
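  6. How a programming language is processed

    Lexical analysis -> Parse -> Compile -> Execute

    One useful criterion for deciding whether a language is a compiled language is whether the compiled file can be distributed and then run purely on the OS. C, C++, Java, Rust, and others are compiled languages (*1).

    *1 Go is also called a compiled language, but it ships features such as go run that perform the whole sequence for you, and the boundary between interpreted and compiled languages seems to blur a little more every year. Likewise, a language that needs a transpiler, such as TypeScript, looks like a compiled language at first glance, but in practice it runs as JavaScript, an interpreted language. It is fine to treat this area as "that is just how it is".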
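  7. The role of the AST

    - Now that the overall flow of processing a programming language is clear, let's return to the role of the AST.
    - The abstract syntax tree (AST) is one way of taking the concrete syntax produced by parsing, abstracting it further, and giving it meaning (turning it into a data structure). In other words, an AST is also a kind of data structure.
    - An easy way to think about it: the concrete syntax still contains the parentheses, semicolons, and other symbols that matter in the program text, and the abstract syntax tree is what remains once they are removed.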
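  8. The role of the AST

    - Noisy information makes a programming language awkward to process. When handling (a + b) * c, for example, it is tedious to deal with the parenthesized (a + b) every single time during parsing.
    - It is better to have, from the start, a data structure that already encodes the meaning "(a + b) is evaluated before * c". A tree is a convenient way to express this.

    Example expression: (a + b) * c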
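The point about (a + b) * c can be checked with any language that exposes its own AST. The following is a minimal sketch using Python's standard `ast` module as a stand-in (an assumption made for illustration; the talk itself is about PHP). It shows that the parentheses do not appear as nodes and that the grouping survives only as the shape of the tree.

```python
import ast

# Parse the expression from the slide into Python's own AST.
tree = ast.parse("(a + b) * c", mode="eval")
root = tree.body  # the top-level expression node (a BinOp)

# The parentheses are gone; the grouping is expressed by tree shape:
# the multiplication is the root and the addition is its left child.
print(type(root.op).__name__)       # Mult
print(type(root.left.op).__name__)  # Add
print(root.right.id)                # c
```

Because the addition is already nested under the multiplication, a later stage can evaluate the tree bottom-up without ever looking at parentheses again.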
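  9. The flow from "ニンニクいれますか？" ("Would you like garlic?") to execution
    (Turning a Jiro-style ramen call into an AST)

    Lexical analysis -> Parse -> Compile -> Execute (serving the ramen)

    - Question from the staff: ニンニクいれますか？
    - The call: 「ニンニクマシマシ野菜マシマシカラメ」 (extra-extra garlic, extra-extra vegetables, stronger flavor)
    - Jiro-style ramen is "interpreter ramen": the staff handle everything from lexical analysis through parsing to execution.
    - There is no real compile step. If there were one, perhaps it would be freeze-drying (?).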
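  10. Tokens for RAMEN: T_GARLIC T_ADD T_ADD T_VEG T_ADD T_ADD T_STRONG T_ADD

    - ニンニク (garlic) becomes T_GARLIC
    - 野菜 (vegetables) becomes T_VEG (ETABLE)
    - カラ(メ) (stronger flavor) becomes T_STRONG
    - マシ (more) becomes T_ADD
    - The 「メ」 in カラメ also means "add more", so T_ADD works for it as well.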
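The mapping above can be turned into a very small lexer. The following Python sketch is an illustration under stated assumptions: the table `TOKEN_TABLE` and the function `tokenize` are hypothetical names, and the longest-match scanning strategy is one simple choice, not necessarily what the speaker implemented.

```python
# Hypothetical token table built from the mapping on the slide.
TOKEN_TABLE = {
    "ニンニク": "T_GARLIC",
    "野菜": "T_VEG",
    "カラ": "T_STRONG",
    "マシ": "T_ADD",
    "メ": "T_ADD",
}


def tokenize(call: str) -> list[str]:
    """Split a call into tokens, always trying the longest known word first."""
    words = sorted(TOKEN_TABLE, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(call):
        for word in words:
            if call.startswith(word, pos):
                tokens.append(TOKEN_TABLE[word])
                pos += len(word)
                break
        else:
            # No known word starts at this position.
            raise ValueError(f"unexpected character at {pos}: {call[pos]!r}")
    return tokens


print(tokenize("ニンニクマシマシ野菜マシマシカラメ"))
```

The printed list has one T_GARLIC, one T_VEG, one T_STRONG, and five T_ADD tokens, in the order the words were spoken.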
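  11. Tokens for RAMEN: T_GARLIC T_ADD T_ADD T_VEG T_ADD T_ADD T_STRONG T_ADD

    Labels on the diagram:
    - Topping information for the ramen
    - Quantity instruction for 「ニンニク」
    - Quantity instruction for 「野菜」
    - Instruction for the strength of the flavor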
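  12. Tree:

    RAMEN
      Attr\Garlic (ADD: 2)
      Attr\Veg (ADD: 2)
      Attr\Strong (ADD: 1)

    - The two T_ADD tokens become information belonging to T_GARLIC.
    - The two T_ADD tokens become information belonging to T_VEG.
    - The T_ADD token becomes information belonging to T_STRONG.
    - Attr\Garlic is additional information belonging to RAMEN.
    - Attr\Strong is information belonging to RAMEN.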
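  13. Tree:

    RAMEN
      Attr\Garlic (ADD: 2)
      Attr\Veg (ADD: 2)
      Attr\Strong (ADD: 1)

    - Attr\Garlic is additional information with a parameter called ADD.
    - Attr\Veg is additional information with a parameter called ADD.
    - Attr\Strong is additional information with a parameter called ADD.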
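The step from the flat token list to the RAMEN tree can be sketched as follows. This is a minimal Python illustration, not the speaker's implementation: the classes `Ramen` and `Attr`, the table `ATTR_NAMES`, and the function `parse` are hypothetical names chosen to mirror the labels on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    name: str     # e.g. "Attr\\Garlic"
    add: int = 0  # the ADD parameter: how many T_ADD tokens followed


@dataclass
class Ramen:
    attrs: list[Attr] = field(default_factory=list)


# Hypothetical mapping from topping tokens to attribute node names.
ATTR_NAMES = {
    "T_GARLIC": "Attr\\Garlic",
    "T_VEG": "Attr\\Veg",
    "T_STRONG": "Attr\\Strong",
}


def parse(tokens: list[str]) -> Ramen:
    """Attach every T_ADD to the most recent topping attribute."""
    ramen = Ramen()
    for token in tokens:
        if token in ATTR_NAMES:
            ramen.attrs.append(Attr(ATTR_NAMES[token]))
        elif token == "T_ADD":
            if not ramen.attrs:
                raise SyntaxError("T_ADD must follow a topping token")
            ramen.attrs[-1].add += 1
        else:
            raise SyntaxError(f"unknown token: {token}")
    return ramen


ramen = parse(["T_GARLIC", "T_ADD", "T_ADD",
               "T_VEG", "T_ADD", "T_ADD",
               "T_STRONG", "T_ADD"])
for attr in ramen.attrs:
    print(attr.name, "ADD:", attr.add)
```

The T_ADD tokens no longer exist as separate items after parsing; they survive only as the `add` count on each attribute, which is the same kind of simplification an AST applies to parentheses.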
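  14. An example of a deterministic finite automaton (DFA)
    (How lexers and parsers work)

    - States: S1 (initial state), S2, S3, S4, S5, ... S..N. ◎ marks an accepting state.
    - Transitions shown for the input: ニ, ン, ニ, ク
    - Input alphabet: Σ = {ニ, ン, ニ, ク}
    - Other branches begin with 野 (followed by 菜) and カ (followed by ラメ). In reality these exist as well, and there are many more states, but their explanation is omitted for lack of space.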
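  15. An example of a deterministic finite automaton (DFA)

    - States: S1 (initial state), S2, S3, S4
    - Transitions for the input: ニ, ン, ニ, ク
    - Input alphabet: Σ = {ニ, ン, ニ, ク}
    - The input ends in an accepting state, so it is "accepted" (treated as a token).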
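  16. An example of a deterministic finite automaton (DFA)

    - States: S1 (initial state), S2, S3, S4
    - Transitions for the input: e, c, h, o
    - Input alphabet: Σ = {e, c, h, o}
    - The input ends in an accepting state, so it is "accepted" (treated as a token).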
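A DFA like the one for echo can be written down as a transition table. The following Python sketch is illustrative only: the state names and the choice of S5 as the single accepting state are assumptions, and any missing transition is treated as an implicit rejecting state.

```python
# Transition table: (current state, input character) -> next state.
# State names are hypothetical; S5 is the only accepting state here.
TRANSITIONS = {
    ("S1", "e"): "S2",
    ("S2", "c"): "S3",
    ("S3", "h"): "S4",
    ("S4", "o"): "S5",
}
START = "S1"
ACCEPTING = {"S5"}


def accepts(text: str) -> bool:
    """Return True if the DFA ends in an accepting state after reading text."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


print(accepts("echo"))  # True
print(accepts("ech"))   # False
```

A real lexer combines many such automata (one per keyword or token pattern) into a single machine and emits a token each time an accepting state is reached.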
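  17. How lexers and parsers work

    - When ニンニク or マシ brings the automaton into an accepting state, the input is split off as a token such as T_GARLIC or T_ADD.
    - Morphological analysis works in roughly the same way, except that it uses a corpus (something like a dictionary that includes okurigana and the like). That topic would take us too far afield, so it is skipped here.
    - Once the input has been split into tokens, a method such as the LL or LR parsing mentioned earlier turns them into an abstract syntax tree (AST). For a simple implementation, LL parsing is recommended because it is comparatively easy to implement.

    Example: ニンニク -> T_GARLIC, マシ -> T_ADD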
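  18. How lexers and parsers work

    - The parser builds the syntax tree from the + and - and similar symbols that appear in the middle of a statement. With LL parsing in particular, these rules are usually written in a notation such as BNF. To parse a language like PHP, parser generators such as Yacc (Yet Another Compiler Compiler) or Bison are used; they let you define the grammar in a form similar to BNF.
    - PHP itself has such a grammar file: https://github.com/php/php-src/blob/php-8.3.4/Zend/zend_language_parser.y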
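  19. How lexers and parsers work

    - Of course, you can also skip the parser generator, implement the parser by hand through sheer effort, and build the abstract syntax tree yourself.
    - A calculator I roughly put together in PHP last year may be a useful reference. It is a sample that performs lexical analysis, parsing (conversion into an abstract syntax tree), and execution in one go, and it can evaluate complex expressions such as 4 * sqrt((5! + 2 ** 8) - (4! + 2 ** 8)) + 5.
    - https://gist.github.com/m3m0r7/a5a5af3eb8d118fae78f6a9e18657ec4

    *1 PHP itself also has one: https://github.com/php/php-src/blob/php-8.3.4/Zend/zend_language_parser.y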
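To give a feel for what a hand-written parser looks like, here is a minimal recursive-descent (LL-style) sketch in Python. It is not the calculator from the gist: it only supports non-negative integers, + - * /, and parentheses, and every name in it (`tokenize`, `Parser`, `parse`) is hypothetical. AST nodes are plain tuples of the form (operator, left, right).

```python
import re

# Grammar handled by this sketch (BNF-like):
#   expr   ::= term   (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"


def tokenize(src: str) -> list[str]:
    tokens = re.findall(r"\d+|[-+*/()]", src)
    if "".join(tokens) != src.replace(" ", ""):
        raise SyntaxError("unexpected character in input")
    return tokens


class Parser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.next()
        if token == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node  # the parentheses themselves leave no node behind
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token: {token!r}")


def parse(src: str):
    parser = Parser(tokenize(src))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return node


print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
```

Each grammar rule becomes one method, which is why this style is comparatively easy to write by hand. Operator precedence falls out of the nesting of `expr`, `term`, and `factor`.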
  20. Applying this to PHP

    - The best-known tool for turning PHP into an abstract syntax tree (AST) is probably nikic/PHP-Parser. PHPStan and others use it (*1).
    - So how does nikic/PHP-Parser actually work?
    - In fact, it works in roughly the same way as parsing the Ramen Jiro call earlier. What does that mean? The following slides explain.

    *1: https://github.com/phpstan/phpstan-src/blob/1.11.x/composer.json#L24
  21. 

  22. AST (note: the Stmt_, Expr_, and Scalar_ prefixes are omitted)

    Namespace
      stmts: Echo, …stmts
        exprs: BinaryOp\Concat, …exprs
          left: BinaryOp\Plus
            left: BinaryOp\Plus
              left: Int (value: 1)
              right: Int (value: 2)
            right: Int (value: 3)
          right: String (value: \n)
  23. AST after the first addition by BinaryOp\Plus

    Namespace
      stmts: Echo, …stmts
        exprs: BinaryOp\Concat, …exprs
          left: BinaryOp\Plus
            left: Int (value: 3)
            right: Int (value: 3)
          right: String (value: \n)
  24. AST after the second addition by BinaryOp\Plus

    Namespace
      stmts: Echo, …stmts
        exprs: BinaryOp\Concat, …exprs
          left: Int (value: 6)
          right: String (value: \n)
  25. AST after BinaryOp\Concat joins 6 and \n

    Namespace
      stmts: Echo, …stmts
        exprs: String (value: 6\n)
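The reduction shown on the last four slides is a bottom-up evaluation of the tree. The following Python sketch reproduces it with hypothetical stand-in classes (`Int`, `String`, `Plus`, `Concat`) that only borrow their names from the slides; they are not the nikic/PHP-Parser API, and the Namespace and Echo wrappers are left out.

```python
from dataclasses import dataclass


@dataclass
class Int:
    value: int


@dataclass
class String:
    value: str


@dataclass
class Plus:
    left: object
    right: object


@dataclass
class Concat:
    left: object
    right: object


def evaluate(node):
    """Evaluate children first, then combine them at the parent node."""
    if isinstance(node, (Int, String)):
        return node.value
    if isinstance(node, Plus):
        return evaluate(node.left) + evaluate(node.right)
    if isinstance(node, Concat):
        return str(evaluate(node.left)) + str(evaluate(node.right))
    raise TypeError(f"unknown node: {node!r}")


# Tree for the expression 1 + 2 + 3 . "\n" as drawn on the slides.
tree = Concat(Plus(Plus(Int(1), Int(2)), Int(3)), String("\n"))
print(repr(evaluate(tree)))  # '6\n'
```

The recursion visits the innermost Plus first (1 + 2), then the outer Plus (3 + 3), and finally Concat, which matches the order of the slides.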