From seed to Twig: Soft introduction to the Compiling Theory

Compiler Theory Fundamentals

What is Compilation?
Compilation is the process of translating code from one language to another language. This translation usually moves from a higher-level (human-friendly) language to a lower-level (machine-friendly) language.

Why Study Compilation for Twig?
Understanding compiler theory is critical for developers working with Twig, providing benefits in several key areas:
• Performance Optimization: Better understanding of caching mechanisms, execution flow, and bottleneck identification.
• Debugging and Troubleshooting: Facilitates error tracing, advanced debugging, and generated code inspection.
• Security Considerations: Aids in understanding escape contexts, implementing the sandbox, and preventing injection vulnerabilities.
• Template Design and Architecture: Enables the creation of better abstractions, custom extensions, and template optimization.
• System Integration: Optimizes framework integration, build processes, and custom loaders.

The Twig Compilation Pipeline

The pipeline converts Twig templates into compiled PHP in several stages, with tokens and the AST (Abstract Syntax Tree) as intermediate representations. When the speaker appears to say "Ice-Tea", he means AST.

Stage 1: LEXER (Lexical Analysis / Tokenization)
This stage involves the Tokenizer, which, in Twig, is the same component as the Lexer.
• Aim: To break down the input stream of characters into tokens, which are the smallest meaningful units in the language syntax.
• Mechanism: The lexer transforms raw source code (a string of characters) into meaningful tokens. It reads the input character by character and groups them into lexical units.
• Core Functions: Tokenization, Classification, Whitespace & Comment Handling, and Error Detection.
• Output: The TokenStream.
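
To make the lexer's job concrete, here is a minimal sketch in Python of a toy tokenizer for a Twig-like syntax. It is not Twig's Lexer: the Token class, the token type names, and the tokenize and lex_inner functions are all assumptions made for illustration. It shows tokenization, classification, whitespace handling, and basic error detection.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    type: str   # hypothetical type names, e.g. "TEXT", "VAR_START", "NAME"
    value: str

# Opening delimiter -> (start token type, closing delimiter, end token type).
OPENERS = {"{{": ("VAR_START", "}}", "VAR_END"),
           "{%": ("BLOCK_START", "%}", "BLOCK_END")}
OPENER_RE = re.compile(r"\{\{|\{%")

# Lexical units recognised between delimiters; whitespace has no group name.
INNER_RE = re.compile(
    r"\s+|(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<PUNCT>[.|()+\-*/])")

def lex_inner(code, offset):
    """Classify the characters found between an opening and closing delimiter."""
    tokens, i = [], 0
    while i < len(code):
        m = INNER_RE.match(code, i)
        if m is None:  # error detection: character outside the toy language
            raise SyntaxError(f"Unexpected {code[i]!r} at offset {offset + i}")
        if m.lastgroup:  # whitespace matches without a group and is skipped
            tokens.append(Token(m.lastgroup, m.group()))
        i = m.end()
    return tokens

def tokenize(source):
    """Turn a template string into a flat list of tokens ending with EOF."""
    tokens, pos = [], 0
    while pos < len(source):
        m = OPENER_RE.search(source, pos)
        if m is None:  # no more delimiters: the rest is plain text
            tokens.append(Token("TEXT", source[pos:]))
            break
        if m.start() > pos:
            tokens.append(Token("TEXT", source[pos:m.start()]))
        start_type, closer, end_type = OPENERS[m.group()]
        end = source.find(closer, m.end())
        if end == -1:  # error detection: delimiter never closed
            raise SyntaxError(f"Unclosed {m.group()!r} at offset {m.start()}")
        tokens.append(Token(start_type, m.group()))
        tokens.extend(lex_inner(source[m.end():end], m.end()))
        tokens.append(Token(end_type, closer))
        pos = end + len(closer)
    tokens.append(Token("EOF", ""))
    return tokens
```

Calling tokenize("Hello {{ user.name|upper }}!") yields a TEXT token, a VAR_START, the names and punctuation of the expression, a VAR_END, a final TEXT token, and EOF. The flat list plays the role of the TokenStream in this sketch.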

Stage 2: PARSER (Syntax Analysis / Parsing)
• Aim: To transform the token stream into an AST. The Parser gives meaning to the tokens by organizing them according to Twig's grammatical rules.
• Mechanism: Verifies that the tokens follow the grammatical rules and organizes them into a hierarchical structure, the AST (Abstract Syntax Tree).
• Workflow:
◦ The MAIN PARSER reads tokens sequentially.
◦ It delegates parsing tasks to specialized TokenParsers based on tag names (e.g., IfTokenParser, ForTokenParser, BlockTokenParser).
◦ It uses the ExpressionParser for variables, operators, functions, and filters.
◦ Output: A hierarchy of Node objects (the AST).
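
The workflow above can be sketched with a toy recursive-descent parser in Python. This is an illustration under stated assumptions, not Twig's Parser: the node classes, the token tuples, and the method names (subparse, parse_if, parse_expression) are hypothetical, and the expression parser only understands dotted names. It shows the main loop reading tokens sequentially and delegating to a tag-specific parser looked up by tag name.

```python
from dataclasses import dataclass, field

@dataclass
class TextNode:
    data: str

@dataclass
class PrintNode:
    expr: str

@dataclass
class IfNode:
    condition: str
    body: list
    else_body: list = field(default_factory=list)

class Parser:
    """Toy parser over (type, value) token tuples ending with ("EOF", "")."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0
        self.tag_parsers = {"if": self.parse_if}  # registry keyed by tag name

    def peek(self, offset=0):
        return self.tokens[self.pos + offset]

    def next(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, type_, value=None):
        token = self.next()
        if token[0] != type_ or (value is not None and token[1] != value):
            raise SyntaxError(f"Expected {type_} {value or ''}, got {token}")
        return token

    def parse_expression(self):
        # Stand-in for an expression parser: dotted names only.
        parts = [self.expect("NAME")[1]]
        while self.peek() == ("PUNCT", "."):
            self.next()
            parts.append(self.expect("NAME")[1])
        return ".".join(parts)

    def subparse(self, stop_tags=()):
        nodes = []
        while True:
            kind, value = self.peek()
            if kind == "EOF":
                if stop_tags:
                    raise SyntaxError(f"Expected one of {stop_tags} before EOF")
                return nodes
            if kind == "TEXT":
                self.next()
                nodes.append(TextNode(value))
            elif kind == "VAR_START":
                self.next()
                nodes.append(PrintNode(self.parse_expression()))
                self.expect("VAR_END")
            elif kind == "BLOCK_START":
                tag = self.peek(1)[1]
                if tag in stop_tags:
                    return nodes  # leave the stop tag for the caller
                if tag not in self.tag_parsers:
                    raise SyntaxError(f"Unknown tag {tag!r}")
                self.next()  # consume BLOCK_START
                self.next()  # consume the tag name
                nodes.append(self.tag_parsers[tag]())  # delegate by tag name
            else:
                raise SyntaxError(f"Unexpected token {kind} {value!r}")

    def parse_if(self):
        condition = self.parse_expression()
        self.expect("BLOCK_END")
        body = self.subparse(stop_tags=("else", "endif"))
        self.expect("BLOCK_START")
        else_body = []
        if self.next()[1] == "else":
            self.expect("BLOCK_END")
            else_body = self.subparse(stop_tags=("endif",))
            self.expect("BLOCK_START")
            self.expect("NAME", "endif")
        self.expect("BLOCK_END")
        return IfNode(condition, body, else_body)
```

Given the tokens for {% if user.isAdmin %}Admin{% else %}Guest{% endif %}, Parser(tokens).subparse() returns a single IfNode whose condition is "user.isAdmin" and whose two branches each hold one TextNode.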

Stage 3: SEMANTIC ANALYSIS
While not typically discussed as a separate component, this analysis phase is handled primarily by the Parser in Twig's architecture.
• Key Checks:
◦ Variable scope validation: Checking that variables are used within their proper scope.
◦ Function and filter validation: Verifying existence and correct parameter usage.
◦ Type checking: Ensuring operations are performed on compatible types.
◦ Security checks: Enforcing sandbox restrictions and checking for potentially unsafe operations.
◦ Namespace resolution: Resolving imports and references to external templates.
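
As an illustration of one of these checks, function and filter validation, here is a small Python sketch that walks a toy expression tree and reports unknown filters or wrong argument counts. The Name and Filter classes, the KNOWN_FILTERS table, and its arity limits are assumptions for the example; they do not describe how Twig stores or validates its filters.

```python
from dataclasses import dataclass

@dataclass
class Name:
    name: str

@dataclass
class Filter:
    node: object          # the expression the filter is applied to
    name: str
    args: tuple = ()

# Hypothetical registry: filter name -> (min extra args, max extra args).
KNOWN_FILTERS = {"upper": (0, 0), "default": (0, 1), "join": (0, 2)}

def check_filters(node, errors=None):
    """Collect semantic errors for every Filter node in the tree."""
    errors = [] if errors is None else errors
    if isinstance(node, Filter):
        if node.name not in KNOWN_FILTERS:
            errors.append(f'Unknown "{node.name}" filter')
        else:
            low, high = KNOWN_FILTERS[node.name]
            if not low <= len(node.args) <= high:
                errors.append(
                    f'Filter "{node.name}" takes {low} to {high} arguments, '
                    f"got {len(node.args)}")
        # Recurse into the filtered expression and the arguments.
        check_filters(node.node, errors)
        for arg in node.args:
            check_filters(arg, errors)
    return errors
```

The tokens for user|uper are syntactically valid, so a check like this is what catches the misspelled filter name: the error comes from meaning, not from grammar.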

Stage 4: OPTIMIZATION
The optimization stage is a refining process that occurs after the structure (AST) is built but before it is converted to executable code.
• Mechanism: Twig's optimizer works through a series of "node visitors" that traverse the AST using the Visitor design pattern.
• Typical Optimizations:
◦ Constant Expression Folding
◦ Expression Simplification
◦ Node Merging
◦ Dead Code Elimination
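
Constant expression folding with a node visitor can be sketched as follows in Python. This is a toy model, not Twig's OptimizerNodeVisitor: the node classes, the traverse function, and the leave_node method name are assumptions chosen for the example. Children are visited first, then each visitor may replace the node on the way back up.

```python
import operator
from dataclasses import dataclass

@dataclass
class Constant:
    value: object

@dataclass
class Name:
    name: str

@dataclass
class Binary:
    op: str
    left: object
    right: object

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

class ConstantFoldingVisitor:
    """Replace a Binary node with a Constant when both operands are constants."""

    def leave_node(self, node):
        if (isinstance(node, Binary) and node.op in OPS
                and isinstance(node.left, Constant)
                and isinstance(node.right, Constant)):
            return Constant(OPS[node.op](node.left.value, node.right.value))
        return node

def traverse(node, visitors):
    """Depth-first walk: rebuild children first, then let visitors rewrite."""
    if isinstance(node, Binary):
        node = Binary(node.op,
                      traverse(node.left, visitors),
                      traverse(node.right, visitors))
    for visitor in visitors:
        node = visitor.leave_node(node)
    return node
```

For the expression price + 2 * 3, the inner multiplication is folded into Constant(6) while the addition is kept, because price is only known at render time.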

Stage 5: COMPILER (Code Generation)
• Aim: To convert the AST, which is abstract and formal, into concrete PHP code.
• Input/Root Node: The input is the ModuleNode, a special node class that acts as the root of the AST. The ModuleNode maintains all structural information (inheritance, blocks, macros) and serves as the bridge to the final PHP execution environment.
• Mechanism: Every Node of the AST is compiled into PHP code, starting from the ModuleNode.
• Core Functions: Generating PHP code from nodes, managing the compilation context, and handling indentation and variable scope.
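• Output: The final PHP code is dumped into a generated template class, shown on the slide as __Template_[Hash].php. In Twig itself, generated class names start with __TwigTemplate_ followed by a hash.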
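
The code generation step can be sketched in Python with a toy compiler that walks nodes from a root ModuleNode and writes PHP source text line by line. Everything here is an assumption for illustration: the Compiler helper, the node classes, the php_string function, the display method, and the emitted PHP are much simpler than what Twig really generates, and the class name only imitates the __Template_[Hash] naming shown on the slide.

```python
import hashlib
from dataclasses import dataclass

def php_string(text):
    """Render text as a single-quoted PHP string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

class Compiler:
    """Collects output lines and tracks the current indentation level."""

    def __init__(self):
        self.lines, self.level = [], 0

    def write(self, line):
        self.lines.append("    " * self.level + line)
        return self

    def indent(self):
        self.level += 1
        return self

    def outdent(self):
        self.level -= 1
        return self

    def subcompile(self, node):
        node.compile(self)  # each node knows how to emit its own PHP
        return self

    def source(self):
        return "\n".join(self.lines) + "\n"

@dataclass
class TextNode:
    data: str
    def compile(self, c):
        c.write(f"echo {php_string(self.data)};")

@dataclass
class PrintNode:
    name: str
    def compile(self, c):
        c.write(f"echo $context[{php_string(self.name)}] ?? '';")

@dataclass
class IfNode:
    condition: str
    body: list
    def compile(self, c):
        c.write(f"if ($context[{php_string(self.condition)}] ?? false) {{").indent()
        for child in self.body:
            c.subcompile(child)
        c.outdent().write("}")

@dataclass
class ModuleNode:
    name: str
    body: list
    def compile(self, c):
        # Hypothetical naming scheme: prefix plus a short hash of the template name.
        digest = hashlib.sha256(self.name.encode()).hexdigest()[:8]
        c.write("<?php").write(f"class __Template_{digest}").write("{").indent()
        c.write("public function display(array $context): void").write("{").indent()
        for child in self.body:
            c.subcompile(child)
        c.outdent().write("}").outdent().write("}")
```

Compiling ModuleNode("hello.twig", [TextNode("Hello "), IfNode("admin", [PrintNode("name")])]) with Compiler().subcompile(module).source() returns the text of a PHP class whose display method echoes the static text and wraps the print statement in an if block. Keeping indentation inside the Compiler, rather than in each node, is what lets nested nodes produce readable output without knowing their own depth.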

--------------------------------------------------------------------------------
Resources
The presentation references the following resources for further study:
https://twig.symfony.com/doc/3.x/internals.html
https://github.com/twigphp/Twig

Perussel Nicolas

April 15, 2025

Transcript

  1. From seed to Twig: Soft introduction to the Compiling Theory

    Drupal Dev Days, April 2025
    Nicolas Perussel, CTO PHP, @mamoot
  2. Nicolas PERUSSEL aka @mamoot

    More than 20 years XP with PHP
    Polyglot: Ruby, Typescript, Python, C
    My specialities:
    • Technical Architecture
    • Facilitator
    • Lead technical teams
    Mario fan / Play Piano / Warhammer painting / Gaston Lagaffe ultra fan!
    PHP CTO at
  3. What is Compilation?

    Nicolas Perussel • From Seed to Twig | Drupal Dev Days 2025
    Compilation is the process of translating code from one language to another language.
    • Usually from a higher-level language to a lower-level language
    • Allows us to write in human-friendly languages while computers execute machine-friendly code
    Example 1, Example 2, Example 3
  4. Why Study Compilation for Twig?

    COMPILER THEORY FUNDAMENTALS
    • UNDERSTANDING PERFORMANCE OPTIMIZATION: Caching mechanisms, Execution flow, Bottleneck identification
    • DEBUGGING AND TROUBLESHOOTING: Error tracing, Advanced debugging, Generated code inspection
    • SECURITY CONSIDERATIONS: Understanding escape contexts, Preventing injection vulnerabilities, Sandbox implementation
    • TEMPLATE DESIGN AND ARCHITECTURE: Better abstractions, Custom extensions, Template optimization
    • SYSTEM INTEGRATION: Framework integration, Build process optimization, Custom loaders
  5. From this minimal example…

    The Twig Template
    The minimal bootstrap
  6. We will study the Compilation Pipeline!

    STAGES: TOKENIZER, PARSER, SEMANTIC ANALYZER, OPTIMIZER, CODE GENERATION
    DATA: TWIG TEMPLATES, TOKENS, AST, COMPILED PHP
    COMPONENTS: LEXER (WITH TWIG, IT'S THE SAME COMPONENT AS THE TOKENIZER)
  7. ekino • TECHNICAL SQUAD • [RESTRICTED]

    This is Ice-Tea. I like it, BUT when you hear this word from my mouth, compile it to « AST ».
  8. STAGE 1: Lexical Analysis (Tokenization)

    The Tokenization mechanism is simple. Only 3 steps!
    1. Create the Twig Environment
    2. Load Twig content file
    3. Generate tokens and organise them into a TokenStream
  9. STAGE 1: Lexical Analysis (Tokenization)

    To generate the TokenStream, we need to have a LEXER.
    Why? The aim is to break down the input stream of characters into tokens, which are the smallest meaningful units in the language syntax.
  10. STAGE 1: The LEXER

    The lexer's primary responsibility is to transform raw source code (just a string of characters) into meaningful tokens. It's essentially the component that reads the input character by character and groups these characters into lexical units (tokens) that have meaning in the language.
    Core functions
    • Tokenization
    • Classification
    • Whitespace & Comment Handling
    • Error Detection
  11. STAGE 2: Syntax Analysis (Parsing)

    To generate the AST, we need to have a PARSER.
    Why? The aim is to transform the token stream into a tree that follows the grammar. The Parser gives meaning to these tokens by organizing them according to Twig's grammatical rules.
  12. STAGE 2: Syntax Analysis (Parsing)

    From Tokens To AST
    • Takes the sequence of tokens produced by the Lexer
    • Verifies that these tokens follow the grammatical rules of the Twig language
    • Organizes these tokens into a hierarchical structure (the AST - Abstract Syntax Tree)
    • Gives meaning to the relationships between the different elements of the code
  13. STAGE 2: PARSER workflow

    Diagram components: LEXER, TOKEN STREAM, MAIN PARSER, Token Stream Iterator, AST (NODES)
    Expression Parser: Variables, Operators, Functions, Filters, …
    TokenParser Registry: IfTokenParser, ForTokenParser, SetTokenParser, BlockTokenParser, …
    • Parser reads tokens sequentially
    • Delegates to specialized TokenParsers based on tag names
    • Uses ExpressionParser for parsing expressions
    • Builds a hierarchy of Node objects
  14. STAGE 2: PARSING example

    Parsing Process (simplified):
    1. Main parser sees BLOCK_START followed by if tag
    2. Delegates to IfTokenParser
    3. IfTokenParser uses ExpressionParser to parse user.isAdmin
    4. Creates an IfNode with:
       ◦ Condition: user.isAdmin expression
       ◦ Body: Nodes for the "if" branch content
       ◦ Else: Nodes for the "else" branch content
    5. Returns the constructed IfNode to the main parser
    Calls shown on the slide: $twig->tokenize($srcTpl), then $twig->parse($tokenStream)
  15. STAGE 3: SEMANTIC ANALYSIS
  16. STAGE 3: Semantic Analysis

    The semantic analyzer isn't typically discussed as a separate component in Twig's architecture, but the semantic analysis phase is indeed handled primarily by the Parser in the Twig template engine.
    • Type checking: Ensuring that operations are performed on compatible types.
    • Variable scope validation: Checking that variables are used within their proper scope.
    • Function and filter validation: Verifying that functions and filters exist and are used with correct parameters.
    • Security checks: Enforcing sandbox restrictions and checking for potentially unsafe operations.
    • Custom tag semantics: Ensuring that custom tags are used correctly according to their definitions.
    • Context awareness: Understanding the context of certain expressions (like inside a loop vs. outside).
    • Namespace resolution: Resolving imports and references to external templates.
  17. STAGE 4: OPTIMIZER
  18. STAGE 4: Optimization

    The optimization stage is like a refining process that happens after the raw structure is built but before it's converted to executable code.
    The NodeVisitor Pattern: Twig's optimizer works through a series of "node visitors" that traverse the AST using the Visitor design pattern.
  19. STAGE 4: Optimization

    Ex: The OptimizerNodeVisitor
    • Constant Expression Folding
    • Expression Simplification
    • Node Merging
    • Dead Code Elimination
  20. STAGE 5: Compiler

    To compile the AST into PHP, we need to have a COMPILER.
    Why? Because we need to convert something which is abstract and formal into something concrete.
  21. STAGE 5: Compiler

    Core functions
    • Managing the compilation context
    • Generating PHP code from nodes
    • Handling indentation and code formatting
    • Managing imports and variable scope
    • Writing the final PHP code
  22. STAGE 5: The ModuleNode

    The Root of the Twig AST
    The ModuleNode is a special node class that acts as the root node of the Abstract Syntax Tree (AST) for any Twig template.
    In the Context of Twig's Architecture
    1. It's the output of the Parser phase, representing the fully parsed template.
    2. It's the input to the Compiler phase, which transforms it into PHP code.
    3. It maintains all the structural information about a template (inheritance, blocks, macros).
    The ModuleNode essentially serves as the bridge between Twig's syntax and the final PHP execution environment.
  23. STAGE 5: Node compiling

    Each Node of the AST is « compiled » into PHP code, from the ModuleNode (root node) to the last one.
    The PHP code is then dumped into a custom __Template_[Hash].php class.
  24. A picture is worth a thousand words

    Fred R. Barnard
  25. Add a new Token language
  26. Let's write the custom Node

    We are writing PHP line by line
    FROM / TO
  27. Two resources

    https://twig.symfony.com/doc/3.x/internals.html
    https://github.com/twigphp/Twig