UP Lecture 04

Compilers
Lexer Design
(202402)

Javier Gonzalez-Sanchez

December 07, 2023

Transcript

  1. Dr. Javier Gonzalez-Sanchez | Compilers | Key Ideas

    Lexical: Alphabet, Symbol, String (items), Word, Token. Rules: Regular Expression (text), Deterministic Finite Automata (visual).
  2. Programming a Lexer

    Regular Expression: Operator, Delimiter, Integer, Float, ID, String, Char
  3. Programming a Lexer

    1. Read a file; split it into lines using System.lineSeparator (enter).
    2. For each line, read character by character and use each character as input for a Deterministic Finite Automaton.
    3. Concatenate the characters, creating the longest STRING possible. Stop at a delimiter, whitespace, operator, apostrophe, or quotation mark, provided the current state allows it. Store the item and its TOKEN, or ERROR.
    4. If there are more characters in the line, go to step 2.
    5. For each WORD, report its TOKEN. Report ERROR as the token value for invalid items.
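
The five steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the course's implementation: the class name `LexerLoopSketch`, the single-character `OPERATORS` and `DELIMITERS` sets, and the pluggable `classify` function (standing in for the DFA) are all hypothetical. String and character literals (the apostrophe and quotation-mark cases) are left out, and lines are split with the regex `\R` rather than `System.lineSeparator` so the demo input behaves the same on every platform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class LexerLoopSketch {
    // Hypothetical single-character sets; not the course's full operator/delimiter list.
    static final String OPERATORS = "+-*/%=<>!";
    static final String DELIMITERS = "(){}[];,";

    /** Splits text into words and pairs each word with a token name. */
    static List<String[]> tokenize(String text, Function<String, String> classify) {
        List<String[]> result = new ArrayList<>();
        // Step 1: split the input into lines.
        for (String line : text.split("\\R")) {
            StringBuilder word = new StringBuilder();
            // Step 2: read the line character by character.
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                boolean isOperator = OPERATORS.indexOf(c) >= 0;
                boolean isDelimiter = DELIMITERS.indexOf(c) >= 0;
                if (Character.isWhitespace(c) || isOperator || isDelimiter) {
                    // Step 3: a stop character ends the current word.
                    flush(word, classify, result);
                    if (isOperator) result.add(new String[] {String.valueOf(c), "OPERATOR"});
                    if (isDelimiter) result.add(new String[] {String.valueOf(c), "DELIMITER"});
                } else {
                    word.append(c); // keep growing the longest possible word
                }
            }
            flush(word, classify, result); // the end of a line also ends a word
        }
        return result;
    }

    // Stores the pending word with its token (or ERROR) and resets the buffer.
    private static void flush(StringBuilder word, Function<String, String> classify,
                              List<String[]> out) {
        if (word.length() > 0) {
            out.add(new String[] {word.toString(), classify.apply(word.toString())});
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Placeholder classifier (assumption): digits only -> INTEGER, anything else -> ERROR.
        Function<String, String> classify = w -> w.matches("[0-9]+") ? "INTEGER" : "ERROR";
        // Step 5: report each word with its token.
        for (String[] t : tokenize("12+ 7;\nab", classify)) {
            System.out.println(t[0] + "\t" + t[1]);
        }
    }
}
```

Passing the classifier in as a function keeps the scanning loop separate from the automaton, so a table-driven DFA can later replace the placeholder without touching the loop.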
  4. Deterministic Finite Automata

    ▪ A DFA consists of a finite set of states (graphically represented as circles) and transition arrows that dictate how the automaton moves between states. A subset of states are designated as acceptance states (or final states).
    ▪ As it reads each symbol from an input, the DFA deterministically transitions to a new state based on the current state and the symbol, following predefined transition rules.
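
One way to picture the definition above in code is a map from (state, symbol) to the next state plus a set of acceptance states. The sketch below is illustrative only: the class `DfaSketch`, its method names, and the two-state `digits()` example automaton are assumptions made for this note and do not come from the slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class DfaSketch {
    // transitions.get(state).get(symbol) is the single next state, if an arrow exists.
    private final Map<String, Map<Character, String>> transitions = new HashMap<>();
    private final String start;
    private final Set<String> accepting;

    DfaSketch(String start, Set<String> accepting) {
        this.start = start;
        this.accepting = accepting;
    }

    /** Adds one transition arrow: from --symbol--> to. */
    void addTransition(String from, char symbol, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
    }

    /** Runs the automaton over the input; a missing arrow rejects the input. */
    boolean accepts(String input) {
        String state = start;
        for (char symbol : input.toCharArray()) {
            Map<Character, String> row = transitions.get(state);
            if (row == null || !row.containsKey(symbol)) {
                return false;
            }
            state = row.get(symbol); // exactly one next state: deterministic
        }
        return accepting.contains(state);
    }

    /** Hypothetical example: S0 --digit--> S1, S1 --digit--> S1, with S1 accepting. */
    static DfaSketch digits() {
        DfaSketch dfa = new DfaSketch("S0", Set.of("S1"));
        for (char d = '0'; d <= '9'; d++) {
            dfa.addTransition("S0", d, "S1");
            dfa.addTransition("S1", d, "S1");
        }
        return dfa;
    }
}
```

Here a missing arrow plays the role that an explicit error state would play in a complete transition table.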
  5. Integer Values | ^[0-9]+$

    State   1-9   0     …     Delimiter, operator, whitespace, quotation mark
    S0      S1    SE    SE    Stop
    S1      S1    S1    SE    Stop
    SE      SE    SE    SE    Stop
  6. Hexadecimal Values | ^0[xX][0-9A-Fa-f]+$

    State   x,X   0     1-9 A-F a-f   …     Delimiter, operator, whitespace, quotation mark
    S0      SE    S1    SE            SE    Stop
    S1      S2    SE    SE            SE    Stop
    S2      SE    S3    S3            SE    Stop
    S3      SE    S3    S3            SE    Stop
    SE      SE    SE    SE            SE    Stop
  7. Binary Values | ^0[bB][01]+$

    State   B,b   0     1     …     Delimiter, operator, whitespace, quotation mark
    S0      SE    S1    SE    SE    Stop
    S1      S2    SE    SE    SE    Stop
    S2      SE    S3    S3    SE    Stop
    S3      SE    S3    S3    SE    Stop
    SE      SE    SE    SE    SE    Stop
  8. Integer, Hexadecimal, and Binary Values

    State   B,b   X,x   0     1     2-9   A-F a-f   …     Delimiter, operator, whitespace, quotation mark
    S0      SE    SE    S1    IS1   IS1   SE        SE    Stop
    S1      BS2   HS2   SE    SE    SE    SE        SE    Stop
    BS2     SE    SE    BS3   BS3   SE    SE        SE    Stop
    HS2     SE    SE    HS3   HS3   HS3   HS3       SE    Stop
    BS3     SE    SE    BS3   BS3   SE    SE        SE    Stop
    HS3     SE    SE    HS3   HS3   HS3   HS3       SE    Stop
    IS1     SE    SE    IS1   IS1   IS1   SE        SE    Stop
    SE      SE    SE    SE    SE    SE    SE        SE    Stop
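
A combined table like this one maps naturally onto a two-dimensional array indexed by state and input column. The Java sketch below transcribes the table as shown and rests on several assumptions that the slide does not state: the class and method names are hypothetical, the Stop column is handled by the caller (the word has already been cut at the delimiter), and the acceptance states are taken to be S1 and IS1 (INTEGER), BS3 (BINARY), and HS3 (HEXADECIMAL). Because `b` and `B` fall in the first column, a hexadecimal word containing them leads to SE in this transcription; a complete lexer would likely need to route that column to HS3 from HS2 and HS3.

```java
class NumberDfaSketch {
    // State numbers follow the row order of the table.
    static final int S0 = 0, S1 = 1, BS2 = 2, HS2 = 3, BS3 = 4, HS3 = 5, IS1 = 6, SE = 7;

    // Columns: B,b | X,x | 0 | 1 | 2-9 | A-F a-f | any other character
    static final int[][] TABLE = {
        /* S0  */ {SE,  SE,  S1,  IS1, IS1, SE,  SE},
        /* S1  */ {BS2, HS2, SE,  SE,  SE,  SE,  SE},
        /* BS2 */ {SE,  SE,  BS3, BS3, SE,  SE,  SE},
        /* HS2 */ {SE,  SE,  HS3, HS3, HS3, HS3, SE},
        /* BS3 */ {SE,  SE,  BS3, BS3, SE,  SE,  SE},
        /* HS3 */ {SE,  SE,  HS3, HS3, HS3, HS3, SE},
        /* IS1 */ {SE,  SE,  IS1, IS1, IS1, SE,  SE},
        /* SE  */ {SE,  SE,  SE,  SE,  SE,  SE,  SE},
    };

    /** Maps a character to its column; b/B and x/X are checked before the hex letters. */
    static int column(char c) {
        if (c == 'b' || c == 'B') return 0;
        if (c == 'x' || c == 'X') return 1;
        if (c == '0') return 2;
        if (c == '1') return 3;
        if (c >= '2' && c <= '9') return 4;
        if ((c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')) return 5;
        return 6;
    }

    /** Runs the table over a complete word and names the token for the final state. */
    static String classify(String word) {
        int state = S0;
        for (int i = 0; i < word.length(); i++) {
            state = TABLE[state][column(word.charAt(i))];
        }
        switch (state) {
            case S1:   // assumption: a lone "0" counts as an integer
            case IS1:
                return "INTEGER";
            case BS3:
                return "BINARY";
            case HS3:
                return "HEXADECIMAL";
            default:
                return "ERROR";
        }
    }
}
```

With these assumptions, `classify("42")` yields INTEGER, `classify("0b1010")` yields BINARY, `classify("0xFF")` yields HEXADECIMAL, and `classify("05")` yields ERROR, since the table sends S1 to SE on any digit.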
  9. Question

    Which tokens (lexical rules) are needed for a programming language?
  10. Drafting a Lexer

    ▪ Identifiers =
    ▪ Keywords =
    ▪ Integer =
    ▪ Hexadecimal =
    ▪ Octal =
    ▪ Binary =
    ▪ Float =
    ▪ Char =
    ▪ String =
    ▪ Boolean =
    ▪ Operators =
    ▪ Delimiters =
  11. Identifiers

    ▪ User-defined names for variables, methods
    ▪ Pattern: (letter | _ | $) (letter | digit | _ | $)*
    ▪ Examples: count, studentName, MAX_SIZE, computeAverage
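
The identifier pattern above can be checked with a regular expression. In this small sketch, "letter" is assumed to mean the ASCII letters A-Z and a-z and "digit" the characters 0-9; the slide does not say whether other letters count. The class name `IdentifierSketch` is hypothetical.

```java
import java.util.regex.Pattern;

class IdentifierSketch {
    // (letter | _ | $) (letter | digit | _ | $)* with ASCII letters and digits assumed.
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    /** True when the whole word matches the identifier pattern. */
    static boolean isIdentifier(String word) {
        return IDENTIFIER.matcher(word).matches();
    }
}
```

`matches()` requires the entire word to fit the pattern, so `MAX_SIZE` passes while `2fast` and the empty string do not.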
  12. Keywords (Reserved Words)

    ▪ final
    ▪ if, else, switch, case, default
    ▪ for, while, do, break, continue
    ▪ return
    ▪ void, int, double, boolean, char
    ▪ true, false, null

    ▪ constante
    ▪ si, sino, seleccionar, caso, predeterminado
    ▪ para, mientras, hacer, romper, continuar
    ▪ retornar
    ▪ vacío, entero, doble, lógico, carácter
    ▪ verdadero, falso, nulo
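
Since every keyword also fits the identifier pattern, a common approach is to recognize the word as an identifier first and then look it up in a set of reserved words. The sketch below assumes that approach; the slides do not prescribe it. The class name is hypothetical, and only the English list plus a few ASCII-only Spanish keywords are included to keep the example short.

```java
import java.util.Set;

class KeywordSketch {
    // Subset of the reserved words listed above (no duplicates, as Set.of requires).
    static final Set<String> KEYWORDS = Set.of(
        "final", "if", "else", "switch", "case", "default",
        "for", "while", "do", "break", "continue", "return",
        "void", "int", "double", "boolean", "char",
        "true", "false", "null",
        "si", "sino", "para", "mientras", "hacer", "retornar");

    /** Assumes the word already matched the identifier pattern. */
    static String classify(String word) {
        return KEYWORDS.contains(word) ? "KEYWORD" : "IDENTIFIER";
    }
}
```

The lookup is exact and case-sensitive, so `while` is a keyword but `whileLoop` and `While` remain identifiers. Whether `true`, `false`, and `null` should be reported as keywords or as literals is a design decision left open here.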
  13. Operator + Delimiter

    ▪ Arithmetic: + - * / %
    ▪ Relational: == != < > <= >=
    ▪ Logical: && || !
    ▪ Assignment: = += -= *= /= %=
    ▪ Increment / Decrement: ++ --
    ▪ Conditional: ? :
    ▪ ( ) Parentheses
    ▪ { } Braces
    ▪ [ ] Brackets
    ▪ ; Statement terminator
    ▪ , Separator
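
Several operators in this list share a first character (for example `<` and `<=`, or `+`, `++`, and `+=`), so a lexer usually tries the two-character form before falling back to the one-character form. The following is a minimal sketch of that longest-match rule; the class and method names are hypothetical, and the approach is an assumption rather than something the slides specify.

```java
import java.util.Set;

class OperatorSketch {
    // Two-character operators from the list above.
    static final Set<String> TWO_CHAR = Set.of(
        "==", "!=", "<=", ">=", "&&", "||",
        "+=", "-=", "*=", "/=", "%=", "++", "--");
    // One-character operators from the list above.
    static final String ONE_CHAR = "+-*/%<>!=?:";

    /** Returns the longest operator starting at pos, or null if there is none. */
    static String operatorAt(String line, int pos) {
        if (pos + 2 <= line.length() && TWO_CHAR.contains(line.substring(pos, pos + 2))) {
            return line.substring(pos, pos + 2);
        }
        if (pos < line.length() && ONE_CHAR.indexOf(line.charAt(pos)) >= 0) {
            return line.substring(pos, pos + 1);
        }
        return null;
    }
}
```

Note that a lone `&` or `|` is not in the list, so `operatorAt` returns null for it.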
  14. Literals

    ▪ Integer literals: 42, 0, 123, 0xFF, 0b1010, 05, 5, 50
    ▪ Floating-point literals: 3.14, 2.0, 1e-9
    ▪ Character literals: 'a', '\n', '\u0041'
    ▪ String literals: "Hello", "CS 307", "Line\nBreak"
    ▪ Boolean literals: true, false
    ▪ Null literal: null
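
The floating-point examples above (3.14, 2.0, 1e-9) suggest two shapes: digits with a fractional part and an optional exponent, or digits with a mandatory exponent. The regular expression below is one possible rule that covers exactly those shapes; it is an assumption made for illustration, since the slides give examples but no pattern, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class FloatSketch {
    // digits '.' digits [exponent]   or   digits exponent
    static final Pattern FLOAT = Pattern.compile(
        "[0-9]+\\.[0-9]+([eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+");

    /** True when the whole word matches the assumed floating-point pattern. */
    static boolean isFloat(String word) {
        return FLOAT.matcher(word).matches();
    }
}
```

A word such as `1e-9` contains the operator character `-`, so a character-by-character lexer can only keep reading past it if the current state expects an exponent sign.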
  15. Homework

    Work with your team and complete a DFA for ALL tokens in our language.
  16. Lexer

    State   B,b   0     1     …     Delimiter, operator, whitespace, quotation mark
    S0      SE    S1    SE    SE    Stop
    S1      S2    SE    SE    SE    Stop
    S2      SE    S3    S3    SE    Stop
    S3      SE    S3    S3    SE    Stop
    SE      SE    SE    SE    SE    Stop
  17. Lexer

    Review this code: https://github.com/javiergs/TheLexer
  18. Compilers. Javier Gonzalez-Sanchez, Ph.D. Spring 2026.

    Copyright. These slides can only be used as study material for the Compilers course at Universidad Panamericana. They cannot be distributed or used for another purpose.