UP Lecture 05

Compilers
Lexer Implementation
(202402)

Javier Gonzalez-Sanchez

December 08, 2023

Transcript

  1. Dr. Javier Gonzalez-Sanchez | Compilers | Key Ideas

    Lexical, Alphabet, Symbol, String, Word, Token, Rules, Regular Expression, Deterministic Finite Automata
  2. Programming a Lexer

    1. Read a file and split it into lines using System.lineSeparator() (enter).
    2. For each line, read character by character and use each character as input for a Deterministic Finite Automaton.
    3. Concatenate the characters, creating the largest STRING possible. Stop when you reach a delimiter, white space, operator, or quotation mark and the current state allows it. If there are more characters in the line, create a new line with those characters and go to step 2.
    4. For each WORD, report its TOKEN. Report ERROR as the token value for STRINGs that are not valid words (wrong items).
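Step 1 can be sketched in Java as follows. This is a minimal sketch, not the course's implementation: the class name `LineReader` and the methods `splitLines` and `readLines` are hypothetical, and `Files.readString` assumes Java 11 or newer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for step 1; names are illustrative, not from the slides.
class LineReader {

    // Split a text into lines on the platform line separator.
    // Pattern.quote makes the separator a literal, since split() takes a regex.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split(Pattern.quote(System.lineSeparator())));
    }

    // Read a whole file (Java 11+) and split it into lines.
    static List<String> readLines(Path file) throws IOException {
        return splitLines(Files.readString(file));
    }

    public static void main(String[] args) {
        String sample = "0b101 + 0B1" + System.lineSeparator() + "0b2;";
        for (String line : splitLines(sample)) {
            System.out.println(line);
        }
    }
}
```

One caveat: a file written on another platform may use a different line separator than `System.lineSeparator()` returns. `Files.readAllLines` recognizes `\n`, `\r`, and `\r\n` and may be a safer choice in that case.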
  3. Start With a DFA

    State | B,b | 0  | 1  | ... | Delimiter, operator, whitespace, quotation mark
    S0    | SE  | S1 | SE | SE  | Stop
    S1    | S2  | SE | SE | SE  | Stop
    S2    | SE  | S3 | S3 | SE  | Stop
    S3    | SE  | S3 | S3 | SE  | Stop
    SE    | SE  | SE | SE | SE  | Stop
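The transition table on this slide can be stored directly as a two-dimensional array. The sketch below makes several assumptions: the class name `BinaryDfaTable`, the column order, and the `column` and `run` helpers are all hypothetical. The `...` column is read here as "any other character". The "Stop" column is not encoded, because the caller is expected to end the word before passing in a delimiter, operator, whitespace, or quotation mark.

```java
// Hypothetical table-driven encoding of the DFA; names are illustrative.
class BinaryDfaTable {
    // State indices, in the row order of the slide's table.
    static final int S0 = 0, S1 = 1, S2 = 2, S3 = 3, SE = 4;
    static final String[] NAMES = {"S0", "S1", "S2", "S3", "SE"};

    // Columns: 0 = 'B' or 'b', 1 = '0', 2 = '1', 3 = any other character.
    static final int[][] TABLE = {
        /* S0 */ {SE, S1, SE, SE},
        /* S1 */ {S2, SE, SE, SE},
        /* S2 */ {SE, S3, S3, SE},
        /* S3 */ {SE, S3, S3, SE},
        /* SE */ {SE, SE, SE, SE},
    };

    // Map a character to its column in TABLE.
    static int column(char c) {
        switch (c) {
            case 'B':
            case 'b':
                return 0;
            case '0':
                return 1;
            case '1':
                return 2;
            default:
                return 3;
        }
    }

    // Run the DFA over one word and return the name of the final state.
    static String run(String word) {
        int state = S0;
        for (char c : word.toCharArray()) {
            state = TABLE[state][column(c)];
        }
        return NAMES[state];
    }

    public static void main(String[] args) {
        System.out.println(run("0b101")); // S3
        System.out.println(run("0b2"));   // SE
    }
}
```

Under this reading the DFA recognizes binary literals such as `0b101` or `0B1`, with S3 as the state reached by a well-formed literal. Treating S3 as the only accepting state is an inference from the table, not something the slide states.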
  4. Transitions (state/input = next state)

    s0/0 = s1
    s1/B = s2
    s1/b = s2
    s2/0 = s3
    s2/1 = s3
    s3/0 = s3
    s3/1 = s3
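The transitions listed on this slide could also be kept in a map keyed by "state/symbol", which mirrors the slide's notation. This is a sketch under assumptions: the class `Dfa`, its methods `addTransition`, `next`, and `binary`, and the error-state name `"se"` are hypothetical and are not taken from the course repository.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-based DFA; names are illustrative.
class Dfa {
    // Key: "state/symbol", value: next state.
    private final Map<String, String> transitions = new HashMap<>();

    void addTransition(String state, char symbol, String next) {
        transitions.put(state + "/" + symbol, next);
    }

    // Any pair without a listed transition goes to the error state "se".
    String next(String state, char symbol) {
        return transitions.getOrDefault(state + "/" + symbol, "se");
    }

    // Build the DFA from the seven transitions on the slide.
    static Dfa binary() {
        Dfa dfa = new Dfa();
        dfa.addTransition("s0", '0', "s1");
        dfa.addTransition("s1", 'B', "s2");
        dfa.addTransition("s1", 'b', "s2");
        dfa.addTransition("s2", '0', "s3");
        dfa.addTransition("s2", '1', "s3");
        dfa.addTransition("s3", '0', "s3");
        dfa.addTransition("s3", '1', "s3");
        return dfa;
    }

    public static void main(String[] args) {
        Dfa dfa = binary();
        System.out.println(dfa.next("s0", '0')); // s1
        System.out.println(dfa.next("s0", '1')); // se
    }
}
```

Only the valid transitions need to be listed. Every missing entry falls through to the error state, so none of the SE cells of the table have to be written out.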
  5. Class Lexer

    State | B,b | 0  | 1  | ... | Delimiter, operator, whitespace, quotation mark
    S0    | SE  | S1 | SE | SE  | Stop
    S1    | S2  | SE | SE | SE  | Stop
    S2    | SE  | S3 | S3 | SE  | Stop
    S3    | SE  | S3 | S3 | SE  | Stop
    SE    | SE  | SE | SE | SE  | Stop
  6. Algorithm

    1. Initialize currentState to “s0”.
    2. Create an empty string to store words.
    3. Create an index to track the position in the input line.
    4. Loop through each character in the input line.
       ▪ if (the character is not an operator, delimiter, or space) {
           - get the next state from the DFA,
           - append the character to the current token,
           - and update currentState.
       ▪ } otherwise {
           - If currentState is an accepting state, store the token with its state name; otherwise, store it as an error.
           - If the currentCharacter is an operator, store it as an “OPERATOR” token.
           - If the currentCharacter is a delimiter, store it as a “DELIMITER” token.
           - Reset currentState to “s0” and clear the string storage.
         }
    5. After processing all characters, check the last string/word and store it accordingly.
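The algorithm on this slide can be sketched in Java as follows. This is a minimal sketch, not the code in the course repository. The class `SimpleLexer`, the operator and delimiter sets, and the choice of `s3` as the only accepting state are assumptions. Skipping empty words (for example, between two consecutive spaces) is also an addition that the slide does not mention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer following the slide's five steps; names are illustrative.
class SimpleLexer {
    // Assumed character classes; the slides do not list them.
    static final String OPERATORS = "+-*/=<>";
    static final String DELIMITERS = "(){}[];,";

    // DFA for binary literals, using the transitions from the earlier slides.
    static String nextState(String state, char c) {
        switch (state) {
            case "s0":
                return c == '0' ? "s1" : "se";
            case "s1":
                return (c == 'B' || c == 'b') ? "s2" : "se";
            case "s2":
            case "s3":
                return (c == '0' || c == '1') ? "s3" : "se";
            default:
                return "se";
        }
    }

    // Assumption: s3 is the only accepting state.
    static boolean isAccepting(String state) {
        return state.equals("s3");
    }

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        String currentState = "s0";                 // step 1
        StringBuilder word = new StringBuilder();   // step 2
        for (int index = 0; index < line.length(); index++) { // steps 3 and 4
            char c = line.charAt(index);
            boolean isOperator = OPERATORS.indexOf(c) >= 0;
            boolean isDelimiter = DELIMITERS.indexOf(c) >= 0;
            if (!isOperator && !isDelimiter && !Character.isWhitespace(c)) {
                currentState = nextState(currentState, c);
                word.append(c);
            } else {
                store(word, currentState, tokens);
                if (isOperator) tokens.add("OPERATOR " + c);
                if (isDelimiter) tokens.add("DELIMITER " + c);
                currentState = "s0";
                word.setLength(0);
            }
        }
        store(word, currentState, tokens);          // step 5
        return tokens;
    }

    // Store the word under its state name if accepted, otherwise as ERROR.
    // Empty words are skipped (an addition to the slide's algorithm).
    private static void store(StringBuilder word, String state, List<String> tokens) {
        if (word.length() == 0) return;
        tokens.add((isAccepting(state) ? state : "ERROR") + " " + word);
    }

    public static void main(String[] args) {
        for (String token : tokenize("0b101 + 0B2;")) {
            System.out.println(token);
        }
    }
}
```

For the input `0b101 + 0B2;` this sketch reports `s3 0b101`, `OPERATOR +`, `ERROR 0B2`, and `DELIMITER ;`. Quotation marks are not handled here, since step 4 of the slide only tests for operators, delimiters, and spaces.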
  7. Homework

    Review this code: https://github.com/javiergs/TheLexer
  8. Compilers

    Javier Gonzalez-Sanchez, Ph.D. [email protected] Spring 2024

    Copyright. These slides can only be used as study material for the Compilers course at Universidad Panamericana. They cannot be distributed or used for another purpose.