UP Lecture 04

Compilers
Lexer Design
(202402)

Javier Gonzalez-Sanchez

December 07, 2023

Transcript

  1. Dr. Javier Gonzalez-Sanchez | Compilers | 3 jgs High-Level Languages

    [Figure: compilation and execution pipeline. Elements shown: source code (int x; int foo () { read (x); print (5); } main () { foo (); }), Lexer, Parser, Semantic Analyzer, Code Generation, intermediate code (OPR 19, AX; STO x, AX; LIT 5, AX; OPR 21, AX; LOD #e1, AX; CAL 1, AX; OPR 0, AX), Virtual Machine (interpreter), Assembler, binary machine code. Phase labels: compilation, execution.]
  2. Key Ideas

    Lexical, Alphabet, Symbol, String (items), Word, Token, Rules, Regular Expression, Deterministic Finite Automata
  3. Key Ideas

    Lexical, Alphabet, Symbol, String (items), Word, Token, Rules, Regular Expression (text), Deterministic Finite Automata (visual)
  4. Programming a Lexer

    Regular Expression: Operator, Delimiter, Integer, Float, ID, String, Char
  5. Programming a Lexer

    1. Read a file; split it into lines using System.lineSeparator (enter).
    2. For each line, read character by character and use each character as an input for a Deterministic Finite Automaton.
    3. Concatenate the characters, creating the largest STRING possible. Stop when a delimiter, whitespace, operator, or quotation mark is found and the current state allows it. If there are more characters in the line, create a new line with those characters and go to step 2.
    4. For each WORD, report its TOKEN. Report ERROR as the token value for STRINGs that are not valid WORDs (i.e., wrong items).
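The splitting part of these steps can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the course's reference implementation: the class name `WordSplitter`, the `DELIMITERS` and `OPERATORS` sets, and the quote handling are all hypothetical, and the sketch cuts words on boundary characters only, without consulting the automaton's current state as step 3 requires. It also splits lines with the `\R` line-break pattern rather than `System.lineSeparator()` so that the example behaves the same on every platform.

```java
import java.util.ArrayList;
import java.util.List;

class WordSplitter {
    // Hypothetical character sets; the slides do not fix their exact contents.
    static final String DELIMITERS = "(){}[];,";
    static final String OPERATORS = "+-*/%<>=!";

    static boolean isBoundary(char c) {
        return Character.isWhitespace(c)
                || DELIMITERS.indexOf(c) >= 0
                || OPERATORS.indexOf(c) >= 0;
    }

    // Moves the pending characters, if any, into the word list.
    static void flush(StringBuilder current, List<String> words) {
        if (current.length() > 0) {
            words.add(current.toString());
            current.setLength(0);
        }
    }

    // Splits one line into words; delimiters and operators become one-character words.
    static List<String> splitLine(String line) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '"' || c == '\'') {
                // Assumption: a quoted word runs to the matching quote or to the end of the line.
                flush(current, words);
                int close = line.indexOf(c, i + 1);
                int end = (close < 0) ? line.length() : close + 1;
                words.add(line.substring(i, end));
                i = end;
            } else if (isBoundary(c)) {
                flush(current, words);
                if (!Character.isWhitespace(c)) {
                    words.add(String.valueOf(c));
                }
                i++;
            } else {
                current.append(c); // keep growing the largest possible string
                i++;
            }
        }
        flush(current, words);
        return words;
    }

    // Splits a whole source text line by line.
    static List<String> splitSource(String source) {
        List<String> words = new ArrayList<>();
        for (String line : source.split("\\R")) {
            words.addAll(splitLine(line));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitSource("int x;\nprint (5);"));
        // prints [int, x, ;, print, (, 5, ), ;]
    }
}
```

Multi-character operators such as `==` come out as two one-character words here; a fuller lexer would let the automaton decide where a word ends.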
  6. Deterministic Finite Automata

    ▪ A DFA consists of a finite set of states (graphically represented as circles) and transition arrows that dictate how the automaton moves between states. A subset of the states is designated as acceptance states (or final states).
    ▪ As it reads each symbol from an input, the DFA deterministically transitions to a new state based on the current state and the symbol, following predefined transition rules.
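To make the definition concrete, here is a small Java sketch of a DFA: a finite set of states, a set of acceptance states, and a transition function that returns exactly one next state for each (state, symbol) pair. It is instantiated with the hexadecimal automaton (`^0[xX][0-9A-Fa-f]+$`) from the Hexadecimal Values slide. The class and method names are hypothetical, and treating S3 as the only acceptance state is an assumption, since the transcript does not mark acceptance states.

```java
import java.util.Set;

class HexDfa {
    // State names follow the hexadecimal table in the slides; SE is the error state.
    enum State { S0, S1, S2, S3, SE }

    // Assumption: S3 is the only acceptance state.
    static final Set<State> ACCEPTING = Set.of(State.S3);

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Transition function: each (state, symbol) pair has exactly one successor.
    static State next(State s, char c) {
        switch (s) {
            case S0: return c == '0' ? State.S1 : State.SE;
            case S1: return (c == 'x' || c == 'X') ? State.S2 : State.SE;
            case S2:
            case S3: return isHexDigit(c) ? State.S3 : State.SE;
            default: return State.SE; // once in SE, stay in SE
        }
    }

    // Runs the automaton over the whole word and checks the final state.
    static boolean accepts(String word) {
        State s = State.S0;
        for (char c : word.toCharArray()) {
            s = next(s, c);
        }
        return ACCEPTING.contains(s);
    }

    public static void main(String[] args) {
        System.out.println(accepts("0x1F")); // true
        System.out.println(accepts("0x"));   // false: no digit after the prefix
    }
}
```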
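  7. Integer Values

    Regular expression: ^[0-9]+$

    | State | 1-9 | 0  | …  | Delimiter, operator, whitespace, quotation mark |
    |-------|-----|----|----|--------------------------------------------------|
    | S0    | S1  | SE | SE | Stop                                             |
    | S1    | S1  | S1 | SE | Stop                                             |
    | SE    | SE  | SE | SE | Stop                                             |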
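  8. Hexadecimal Values

    Regular expression: ^0[xX][0-9A-Fa-f]+$

    | State | x,X | 0  | 1-9 A-F a-f | …  | Delimiter, operator, whitespace, quotation mark |
    |-------|-----|----|-------------|----|--------------------------------------------------|
    | S0    | SE  | S1 | SE          | SE | Stop                                             |
    | S1    | S2  | SE | SE          | SE | Stop                                             |
    | S2    | SE  | S3 | S3          | SE | Stop                                             |
    | S3    | SE  | S3 | S3          | SE | Stop                                             |
    | SE    | SE  | SE | SE          | SE | Stop                                             |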
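  9. Binary Values

    Regular expression: ^0[bB][01]+$

    | State | B,b | 0  | 1  | …  | Delimiter, operator, whitespace, quotation mark |
    |-------|-----|----|----|----|--------------------------------------------------|
    | S0    | SE  | S1 | SE | SE | Stop                                             |
    | S1    | S2  | SE | SE | SE | Stop                                             |
    | S2    | SE  | S3 | S3 | SE | Stop                                             |
    | S3    | SE  | S3 | S3 | SE | Stop                                             |
    | SE    | SE  | SE | SE | SE | Stop                                             |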
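The three regular expressions shown on the Integer, Hexadecimal, and Binary Values slides can be tried directly with `java.util.regex`. The sketch below is one possible way to do that; the class name `LiteralClassifier`, the method `classify`, and the token names it returns are hypothetical (the slides only name ERROR as a token value).

```java
import java.util.regex.Pattern;

class LiteralClassifier {
    // Regular expressions copied from the slides.
    static final Pattern INTEGER = Pattern.compile("^[0-9]+$");
    static final Pattern HEXADECIMAL = Pattern.compile("^0[xX][0-9A-Fa-f]+$");
    static final Pattern BINARY = Pattern.compile("^0[bB][01]+$");

    // Returns a token name for the word, or ERROR when no expression matches.
    static String classify(String word) {
        if (BINARY.matcher(word).matches()) return "BINARY";
        if (HEXADECIMAL.matcher(word).matches()) return "HEXADECIMAL";
        if (INTEGER.matcher(word).matches()) return "INTEGER";
        return "ERROR";
    }

    public static void main(String[] args) {
        System.out.println(classify("42"));    // INTEGER
        System.out.println(classify("0x1F"));  // HEXADECIMAL
        System.out.println(classify("0b101")); // BINARY
        System.out.println(classify("0b102")); // ERROR
    }
}
```

Note that `^[0-9]+$` also matches words with leading zeros such as `007`, whereas the integer transition table sends a leading `0` to SE, so the expression and the table do not accept exactly the same words.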
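  10. Integer, Hexadecimal, and Binary Values

    | State | B,b | X,x | 0   | 1   | 2-9 | A-F a-f | …  | Delimiter, operator, whitespace, quotation mark |
    |-------|-----|-----|-----|-----|-----|---------|----|--------------------------------------------------|
    | S0    | SE  | SE  | S1  | IS1 | IS1 | SE      | SE | Stop                                             |
    | S1    | BS2 | HS2 | SE  | SE  | SE  | SE      | SE | Stop                                             |
    | BS2   | SE  | SE  | BS3 | BS3 | SE  | SE      | SE | Stop                                             |
    | HS2   | SE  | SE  | HS3 | HS3 | HS3 | HS3     | SE | Stop                                             |
    | BS3   | SE  | SE  | BS3 | BS3 | SE  | SE      | SE | Stop                                             |
    | HS3   | SE  | SE  | HS3 | HS3 | HS3 | HS3     | SE | Stop                                             |
    | IS1   | SE  | SE  | IS1 | IS1 | IS1 | SE      | SE | Stop                                             |
    | SE    | SE  | SE  | SE  | SE  | SE  | SE      | SE | Stop                                             |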
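A table like this maps naturally onto a two-dimensional array indexed by state and input column. The Java sketch below transcribes the combined table as it appears in the slide and is only a sketch: the class name `NumberDfa`, the token names, and the choice of acceptance states (S1 and IS1 for integers, BS3 for binary, HS3 for hexadecimal) are assumptions, because the transcript does not mark acceptance states. The delimiter column (Stop) is left out on the assumption that words have already been split.

```java
class NumberDfa {
    // Row order follows the combined table: S0, S1, BS2, HS2, BS3, HS3, IS1, SE.
    static final int S0 = 0, S1 = 1, BS2 = 2, HS2 = 3, BS3 = 4, HS3 = 5, IS1 = 6, SE = 7;

    // Columns: 0 = B,b; 1 = X,x; 2 = '0'; 3 = '1'; 4 = 2-9; 5 = A-F,a-f; 6 = anything else.
    static final int[][] TABLE = {
        /* S0  */ {SE,  SE,  S1,  IS1, IS1, SE,  SE},
        /* S1  */ {BS2, HS2, SE,  SE,  SE,  SE,  SE},
        /* BS2 */ {SE,  SE,  BS3, BS3, SE,  SE,  SE},
        /* HS2 */ {SE,  SE,  HS3, HS3, HS3, HS3, SE},
        /* BS3 */ {SE,  SE,  BS3, BS3, SE,  SE,  SE},
        /* HS3 */ {SE,  SE,  HS3, HS3, HS3, HS3, SE},
        /* IS1 */ {SE,  SE,  IS1, IS1, IS1, SE,  SE},
        /* SE  */ {SE,  SE,  SE,  SE,  SE,  SE,  SE},
    };

    // Maps a character to its table column. 'b' and 'B' are checked first,
    // so they land in the B,b column rather than the A-F,a-f column.
    static int column(char c) {
        if (c == 'b' || c == 'B') return 0;
        if (c == 'x' || c == 'X') return 1;
        if (c == '0') return 2;
        if (c == '1') return 3;
        if (c >= '2' && c <= '9') return 4;
        if ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')) return 5;
        return 6;
    }

    // Runs the table over the word and maps the final state to a token name.
    static String token(String word) {
        int state = S0;
        for (int i = 0; i < word.length(); i++) {
            state = TABLE[state][column(word.charAt(i))];
        }
        switch (state) {
            case S1:
            case IS1: return "INTEGER";     // assumed acceptance states
            case BS3: return "BINARY";      // assumed acceptance state
            case HS3: return "HEXADECIMAL"; // assumed acceptance state
            default:  return "ERROR";
        }
    }

    public static void main(String[] args) {
        System.out.println(token("42"));    // INTEGER
        System.out.println(token("0b101")); // BINARY
        System.out.println(token("0xFF"));  // HEXADECIMAL
        System.out.println(token("0x1B"));  // ERROR with the table as transcribed
    }
}
```

Because `B,b` has its own column and the hexadecimal rows send that column to SE, a literal such as `0x1B` is rejected by the table as transcribed; the slides do not say whether that is intended.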
  11. And it can continue

    [Figure: state diagram. States: Start, Operator, Delimiter, Integer, Float, ID, String, Char. Transition labels: {+,-,*,/,%,<,>,=,!,…}, {(, ), {, }, [, ]}, {0-9}, {$, _, 0-9, a-z}, {‘}, {“}, {\.}, {.}, {a-z}, {_}, {$}.]
  12. Question

    Which tokens (lexical rules) are needed for a programming language?
  13. Drafting a Lexer

    ▪ Identifiers =
    ▪ Keywords =
    ▪ Operators =
    ▪ Delimiters =
    ▪ Float =
    ▪ Integer =
    ▪ Hexadecimal =
    ▪ Octal =
    ▪ Binary =
    ▪ String =
    ▪ Char =
  14. Compilers

    Javier Gonzalez-Sanchez, Ph.D. [email protected] Spring 2025

    Copyright. These slides can only be used as study material for the Compilers course at Universidad Panamericana. They cannot be distributed or used for another purpose.