
UP Lecture 04

Compilers
Lexer
(202402)

Javier Gonzalez-Sanchez

December 07, 2023

Transcript

  1. Dr. Javier Gonzalez-Sanchez | Compilers | 4 jgs High-Level Languages

    [Figure: source code passes through the Lexer, Parser, Semantic Analyzer, and Code Generation; the generated code goes to a Virtual Machine (interpreter) or to an Assembler that produces binary code; labels: compilation, execution]

    ```
    // source code
    int x;
    int foo () { read (x); print (5); }
    main () { foo (); }
    ```

    Generated code shown on the slide: OPR 19, AX; STO x, AX; LIT 5, AX; OPR 21, AX; LOD #e1,AX; CAL 1, AX; OPR 0, AX
  2. Keywords

    Lexical, Alphabet, Symbol, String, Word, Token, Rules, Regular Expression, Deterministic Finite Automata (DFA)
  3. DFA | Examples

    [Figure: one DFA per rule, each beginning at a Start state; transition labels per rule:]
    Operator: {+, -, *, /, %, <, >, =, !, …}
    Delimiter: {(, ), {, }, [, ]}
    Integer: {0-9}
    Float: {0-9}, {\.}
    ID: {a-z}, {_}, {$} for the first character, then {$, _, 0-9, a-z}
    String: { " }, {.}, { " }
    Char: { ' }, {.}, { ' }
  4. Programming a Lexer

    Regular Expression
    [Figure: one diagram per rule: Operator, Delimiter, Integer, Float, ID, String, Char]
  5. Programming a Lexer

    Regular Expression
    [Figure: one diagram per rule: Operator, Delimiter, Integer, Float, ID, String, Char]
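
The same rules can also be written as regular expressions. The class below is a minimal sketch using java.util.regex, not the course's definitions: the patterns are an assumed reading of the DFA examples (FLOAT as digits, a dot, digits; identifiers accept letters in both cases; ";" counted as a delimiter because the sample output reports it), and the class name RegexRules is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class RegexRules {
    // One pattern per rule, tried in insertion order; the first full match wins.
    // These patterns are assumptions based on the DFA examples.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        RULES.put("INTEGER", Pattern.compile("[0-9]+"));
        RULES.put("FLOAT", Pattern.compile("[0-9]+\\.[0-9]+"));
        RULES.put("ID", Pattern.compile("[a-zA-Z_$][a-zA-Z0-9_$]*"));
        RULES.put("STRING", Pattern.compile("\"[^\"]*\""));
        RULES.put("CHAR", Pattern.compile("'.'"));
        RULES.put("OPERATOR", Pattern.compile("[+\\-*/%<>=!]"));
        RULES.put("DELIMITER", Pattern.compile("[(){}\\[\\];]"));
    }

    // Returns the name of the first rule that matches the whole word, or ERROR.
    static String classify(String word) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(word).matches()) {
                return rule.getKey();
            }
        }
        return "ERROR";
    }

    public static void main(String[] args) {
        String[] words = {"42", "3.14", "$xx", "\"hi\"", "'a'", "%", "(", "23WE"};
        for (String w : words) {
            System.out.println(classify(w) + " " + w);
        }
    }
}
```

Regular expressions are compact, but each rule is matched separately; the state-machine approach on the following slides folds all rules into one table that is walked once per character.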
  6. Using IF-ELSE

    It is not a good idea! (Rich Sharpe, February 13th, 2008; posted in Software Quality, Software Quality Metrics.)
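  7. Using a State Machine

    Step 1: Put the DFA in a Table
    [Figure: DFA with states S0, S1, S2, S3]

    | State | b  | 0  | 1  | ... | Delimiter, operator, whitespace, quotation mark |
    |-------|----|----|----|-----|-------------------------------------------------|
    | S0    | SE | S1 | SE | SE  | Stop |
    | S1    | S2 | SE | SE | SE  | Stop |
    | S2    | SE | S3 | S3 | SE  | Stop |
    | S3    | SE | S3 | S3 | SE  | Stop |
    | SE    | SE | SE | SE | SE  | Stop |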
  8. Using a State Machine

    Step 2: Put the Table in Java (same table as in Step 1)

    ```java
    // constants: input columns of the table (b, 0, 1, other, delimiter)
    private static final int ZERO = 1;
    private static final int ONE = 2;
    private static final int B = 0;
    private static final int OTHER = 3;
    private static final int DELIMITER = 4;
    // constants: special states (ERROR is row 4, i.e. SE; STOP is not a row)
    private static final int ERROR = 4;
    private static final int STOP = -2;
    // table as a 2D array: one row per state
    private static int[][] stateTable = {
      {ERROR, 1,     ERROR, ERROR, STOP},  // S0
      {2,     ERROR, ERROR, ERROR, STOP},  // S1
      {ERROR, 3,     3,     ERROR, STOP},  // S2
      {ERROR, 3,     3,     ERROR, STOP},  // S3
      {ERROR, ERROR, ERROR, ERROR, STOP}   // SE
    };
    ```
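
The table still needs a way to turn a character into a column index. A minimal sketch of calculateNextState (the function called by the algorithm slide), assuming the last column covers whitespace, both quote characters, the delimiters ( ) { } [ ] ; and the operators + - * / % < > = !. The helper name column and the class name BinaryTable are hypothetical.

```java
final class BinaryTable {
    // Column indexes, as on the slide.
    static final int B = 0, ZERO = 1, ONE = 2, OTHER = 3, DELIMITER = 4;
    // Special states: row 4 is SE; STOP is a marker, not a row.
    static final int ERROR = 4, STOP = -2;

    static final int[][] stateTable = {
        {ERROR, 1, ERROR, ERROR, STOP},     // S0
        {2, ERROR, ERROR, ERROR, STOP},     // S1
        {ERROR, 3, 3, ERROR, STOP},         // S2
        {ERROR, 3, 3, ERROR, STOP},         // S3
        {ERROR, ERROR, ERROR, ERROR, STOP}  // SE
    };

    // Assumed character sets for the last column.
    static final String DELIMITERS = "(){}[];";
    static final String OPERATORS = "+-*/%<>=!";

    // Hypothetical helper: maps a character to its column in stateTable.
    static int column(char c) {
        if (c == 'b') return B;
        if (c == '0') return ZERO;
        if (c == '1') return ONE;
        if (Character.isWhitespace(c) || c == '"' || c == '\''
                || DELIMITERS.indexOf(c) >= 0 || OPERATORS.indexOf(c) >= 0) {
            return DELIMITER;
        }
        return OTHER;
    }

    // Next state for the current state and input character (STOP = -2).
    static int calculateNextState(int state, char c) {
        return stateTable[state][column(c)];
    }

    public static void main(String[] args) {
        int state = 0;
        for (char c : "0b101".toCharArray()) {
            state = calculateNextState(state, c);
        }
        System.out.println(state == 3 ? "BINARY" : "not binary");
    }
}
```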
  9. Using a State Machine

    Step 3: Algorithm

    ```
    void splitLine (line) {
      state = S0;
      String string = "";
      do {
        l = line.readNextLetter();
        go = calculateNextState(state, l);
        if (go != STOP) {
          string = string + l;
          state = go;
        }
      } while (line.hasLetters() && go != STOP);
      if (state == S3)
        print("It is a BINARY number");
      else
        print("error");
      // l is the last letter read: the one that caused STOP, if any
      if (isDelimiter(l))
        print("Also, there is a DELIMITER");
      else if (isOperator(l))
        print("Also, there is an OPERATOR");
      // loop: continue with the letters that were not consumed
      if (line.hasLetters())
        splitLine(line - string);
    }
    ```
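
A runnable, iterative sketch of splitLine in Java, assuming the last table column covers whitespace, both quote characters, the delimiters ( ) { } [ ] ; and the operators + - * / % < > = !. It loops instead of calling itself on line - string and, unlike the pseudocode, it does not print "error" when a stop character has no word before it (for example the + in "0b2 + 0b11"). The class name BinarySplitter and the report format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BinarySplitter {
    // Columns: b, 0, 1, any other character, stop character.
    static final int B = 0, ZERO = 1, ONE = 2, OTHER = 3, DELIMITER = 4;
    // S3 accepts a binary number, row 4 is SE, STOP ends the current word.
    static final int S3 = 3, ERROR = 4, STOP = -2;
    static final int[][] stateTable = {
        {ERROR, 1, ERROR, ERROR, STOP},     // S0
        {2, ERROR, ERROR, ERROR, STOP},     // S1
        {ERROR, 3, 3, ERROR, STOP},         // S2
        {ERROR, 3, 3, ERROR, STOP},         // S3
        {ERROR, ERROR, ERROR, ERROR, STOP}  // SE
    };
    static final String DELIMITERS = "(){}[];";   // assumed set
    static final String OPERATORS = "+-*/%<>=!";  // assumed set

    static int column(char c) {
        if (c == 'b') return B;
        if (c == '0') return ZERO;
        if (c == '1') return ONE;
        if (Character.isWhitespace(c) || c == '"' || c == '\''
                || DELIMITERS.indexOf(c) >= 0 || OPERATORS.indexOf(c) >= 0) {
            return DELIMITER;
        }
        return OTHER;
    }

    // One report entry per word and per delimiter or operator.
    static List<String> splitLine(String line) {
        List<String> report = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            int state = 0;
            StringBuilder word = new StringBuilder();
            // Build the largest word: read until the table answers STOP or the line ends.
            while (i < line.length()) {
                int go = stateTable[state][column(line.charAt(i))];
                if (go == STOP) {
                    break;
                }
                word.append(line.charAt(i));
                state = go;
                i++;
            }
            if (word.length() > 0) {
                report.add((state == S3 ? "BINARY " : "ERROR ") + word);
            }
            // Report the character that caused STOP (whitespace and quotes are skipped).
            if (i < line.length()) {
                char c = line.charAt(i);
                if (DELIMITERS.indexOf(c) >= 0) {
                    report.add("DELIMITER " + c);
                } else if (OPERATORS.indexOf(c) >= 0) {
                    report.add("OPERATOR " + c);
                }
                i++;
            }
        }
        return report;
    }

    public static void main(String[] args) {
        splitLine("0b101;0b2 + 0b11").forEach(System.out::println);
    }
}
```

For "0b101;0b2 + 0b11" this reports BINARY 0b101, DELIMITER ;, ERROR 0b2, OPERATOR +, BINARY 0b11.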
  10. Programming Assignment #1

    1. Read a file; split it into lines using System.lineSeparator().
    2. For each line, read character by character and use each character as an input for the state machine.
    3. Concatenate the characters, creating the largest STRING possible. Stop when you reach a delimiter, whitespace, operator, or quotation mark and the current state allows it. If there are more characters in the line, create a new line with those characters and go to step 2.
    4. For each STRING and WORD, report its TOKEN, or ERROR, as appropriate.
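
Steps 1 to 3 can be sketched as follows. Every row of the state tables has Stop in the delimiter column, so in this sketch word boundaries come from the stop characters alone; classifying each word (step 4) is left out, and quotation marks are emitted as single pieces, so string literals are not kept together. The class name, method names, and character sets are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class AssignmentDriver {
    static final String DELIMITERS = "(){}[];";   // assumed set
    static final String OPERATORS = "+-*/%<>=!";  // assumed set

    // Delimiter, operator, whitespace, or quotation mark: the "Stop" column of the table.
    static boolean isStop(char c) {
        return Character.isWhitespace(c) || c == '"' || c == '\''
                || DELIMITERS.indexOf(c) >= 0 || OPERATORS.indexOf(c) >= 0;
    }

    // Step 1: split the file content with the platform line separator.
    static String[] toLines(String content) {
        return content.split(Pattern.quote(System.lineSeparator()));
    }

    // Steps 2 and 3: read character by character, keep the largest word,
    // and emit each non-whitespace stop character as its own piece.
    static List<String> toPieces(String line) {
        List<String> pieces = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (!isStop(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {
                pieces.add(word.toString());
                word.setLength(0);
            }
            if (!Character.isWhitespace(c)) {
                pieces.add(String.valueOf(c));
            }
        }
        if (word.length() > 0) {
            pieces.add(word.toString());
        }
        return pieces;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to a small sample.
        String content = args.length > 0
                ? Files.readString(Path.of(args[0]))
                : "hello;world cse340" + System.lineSeparator() + "x = 0b101;";
        for (String line : toLines(content)) {
            System.out.println(toPieces(line));
        }
    }
}
```

Splitting on System.lineSeparator() only works when the file uses the platform's own line endings; the regular expression "\\R" (any line break, available since Java 8) is a more tolerant choice.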
  11. Lexer – Step by Step

    States and the token they report: 1 = INTEGER, 2 = INTEGER, 3 = IDENTIFIER, 5 = OCTAL, 7 = INTEGER, 8 = IDENTIFIER, 9 = BINARY, 10 = HEXADECIMAL
    Columns: 0 | $ | _ | [1] | [2-7] | [8-9] | [A] | B | [C-F] | [G-W] | X | [Y-Z]
    Lowercase letters use the same columns: [a-z] = [A] B [C-F] [G-W] X [Y-Z]; [a-f] = [A] B [C-F]
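  12. Lexer – Step by Step

    | State | 0 | $ | _ | [1] | [2-7] | [8-9] | A | B | [C-F] | [G-W] | X | [Y-Z] | ... | Delimiter, operator, whitespace, quotation mark |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | S0 | S1 | S3 | S3 | S2 | S2 | S2 | S3 | S3 | S3 | S3 | S3 | S3 | SE | Stop |
    | S1 | S5 | SE | SE | S5 | S5 | SE | SE | S4 | SE | SE | S6 | SE | SE | Stop |
    | S2 | S7 | SE | SE | S7 | S7 | S7 | SE | SE | SE | SE | SE | SE | SE | Stop |
    | S3 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | SE | Stop |
    | S4 | S9 | SE | SE | S9 | SE | SE | SE | SE | SE | SE | SE | SE | SE | Stop |
    | S5 | S5 | SE | SE | S5 | S5 | SE | SE | SE | SE | SE | SE | SE | SE | Stop |
    | S6 | S10 | SE | SE | S10 | S10 | S10 | S10 | S10 | S10 | SE | SE | SE | SE | Stop |
    | S7 | S7 | SE | SE | S7 | S7 | S7 | SE | SE | SE | SE | SE | SE | SE | Stop |
    | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | S8 | SE | Stop |
    | S9 | S9 | SE | SE | S9 | SE | SE | SE | SE | SE | SE | SE | SE | SE | Stop |
    | S10 | S10 | SE | SE | S10 | S10 | S10 | S10 | S10 | S10 | SE | SE | SE | SE | Stop |
    | SE | SE | SE | SE | SE | SE | SE | SE | SE | SE | SE | SE | SE | SE | Stop |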
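
A sketch of this table in Java, using the column mapping and the state-to-token list from the previous slide. Row 11 stands for SE, and the "..." column is taken to mean any other character. The classify method runs the DFA over a word that has already been cut at a stop character, so the Stop column is not stored. The class and method names are hypothetical.

```java
import java.util.Map;

final class LexerTableSketch {
    static final int SE = 11; // row index of the error state

    // Rows: S0..S10, then SE. Columns, in the slide's order:
    // 0, $, _, [1], [2-7], [8-9], A, B, [C-F], [G-W], X, [Y-Z], any other character.
    static final int[][] TABLE = {
        {1,  3,  3,  2,  2,  2,  3,  3,  3,  3,  3,  3,  SE}, // S0
        {5,  SE, SE, 5,  5,  SE, SE, 4,  SE, SE, 6,  SE, SE}, // S1
        {7,  SE, SE, 7,  7,  7,  SE, SE, SE, SE, SE, SE, SE}, // S2
        {8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  SE}, // S3
        {9,  SE, SE, 9,  SE, SE, SE, SE, SE, SE, SE, SE, SE}, // S4
        {5,  SE, SE, 5,  5,  SE, SE, SE, SE, SE, SE, SE, SE}, // S5
        {10, SE, SE, 10, 10, 10, 10, 10, 10, SE, SE, SE, SE}, // S6
        {7,  SE, SE, 7,  7,  7,  SE, SE, SE, SE, SE, SE, SE}, // S7
        {8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  SE}, // S8
        {9,  SE, SE, 9,  SE, SE, SE, SE, SE, SE, SE, SE, SE}, // S9
        {10, SE, SE, 10, 10, 10, 10, 10, 10, SE, SE, SE, SE}, // S10
        {SE, SE, SE, SE, SE, SE, SE, SE, SE, SE, SE, SE, SE}  // SE
    };

    // Accepting states and their tokens; ending in any other state is an ERROR.
    static final Map<Integer, String> ACCEPTING = Map.of(
            1, "INTEGER", 2, "INTEGER", 3, "IDENTIFIER", 5, "OCTAL",
            7, "INTEGER", 8, "IDENTIFIER", 9, "BINARY", 10, "HEXADECIMAL");

    // Maps a character to its column; lowercase ASCII letters share the uppercase columns.
    static int column(char c) {
        char u = (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : c;
        if (u == '0') return 0;
        if (u == '$') return 1;
        if (u == '_') return 2;
        if (u == '1') return 3;
        if (u >= '2' && u <= '7') return 4;
        if (u == '8' || u == '9') return 5;
        if (u == 'A') return 6;
        if (u == 'B') return 7;
        if (u >= 'C' && u <= 'F') return 8;
        if (u >= 'G' && u <= 'W') return 9;
        if (u == 'X') return 10;
        if (u == 'Y' || u == 'Z') return 11;
        return 12;
    }

    // Runs the DFA over one word and reports the token of the state it ends in.
    static String classify(String word) {
        int state = 0;
        for (char c : word.toCharArray()) {
            state = TABLE[state][column(c)];
        }
        return ACCEPTING.getOrDefault(state, "ERROR");
    }

    public static void main(String[] args) {
        String[] words = {"2013", "05", "0b101", "0xFF", "cse340", "$xx", "23WE", "0xfffff.34.45"};
        for (String w : words) {
            System.out.println(classify(w) + " " + w);
        }
    }
}
```

A lone "0b" or "0x" ends in S4 or S6, which report no token, so it comes out as ERROR.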
  13. Programming Assignment #1

    • Only BINARY, DELIMITER, and OPERATOR are implemented. You will implement the rest of the required tokens (rules).
  14. Programming Assignment #1

    • Lexer.java is the only file that you are allowed to modify.
  15. Code | input.txt

    ```
    hello;world cse340 asu 2013/05/31 // end boolean $xx= ((((((((23WE + 44 - 3 / 2 % 45 <=17) > 0xfffff.34.45;
    ```
  16. Code | output.txt

    ```
    IDENTIFIER hello
    DELIMITER ;
    IDENTIFIER world
    IDENTIFIER cse340
    IDENTIFIER asu
    INTEGER 2013
    OPERATOR /
    OCTAL 05
    OPERATOR /
    INTEGER 31
    OPERATOR /
    OPERATOR /
    IDENTIFIER end
    KEYWORD boolean
    IDENTIFIER $xx
    OPERATOR =
    DELIMITER (
    DELIMITER (
    DELIMITER (
    DELIMITER (
    DELIMITER (
    DELIMITER (
    DELIMITER (
    DELIMITER (
    ERROR 23WE
    OPERATOR +
    INTEGER 44
    OPERATOR -
    INTEGER 3
    OPERATOR /
    INTEGER 2
    OPERATOR %
    INTEGER 45
    OPERATOR <
    OPERATOR =
    INTEGER 17
    DELIMITER )
    OPERATOR >
    ERROR 0xfffff.34.45
    DELIMITER ;
    ```
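
As an end-to-end check, the sketch below combines the Step by Step table with splitting at stop characters and reproduces output.txt from the input.txt above. Assumptions, labeled in the code too: the keyword set (the sample only confirms that boolean is a keyword and end is not), the delimiter and operator sets, and skipping quotation marks; strings, chars, and floats are not handled. The class name MiniLexer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MiniLexer {
    static final int SE = 11;
    // Rows S0..S10, SE. Columns: 0 $ _ [1] [2-7] [8-9] A B [C-F] [G-W] X [Y-Z] other.
    static final int[][] TABLE = {
        {1, 3, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3, SE},
        {5, SE, SE, 5, 5, SE, SE, 4, SE, SE, 6, SE, SE},
        {7, SE, SE, 7, 7, 7, SE, SE, SE, SE, SE, SE, SE},
        {8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, SE},
        {9, SE, SE, 9, SE, SE, SE, SE, SE, SE, SE, SE, SE},
        {5, SE, SE, 5, 5, SE, SE, SE, SE, SE, SE, SE, SE},
        {10, SE, SE, 10, 10, 10, 10, 10, 10, SE, SE, SE, SE},
        {7, SE, SE, 7, 7, 7, SE, SE, SE, SE, SE, SE, SE},
        {8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, SE},
        {9, SE, SE, 9, SE, SE, SE, SE, SE, SE, SE, SE, SE},
        {10, SE, SE, 10, 10, 10, 10, 10, 10, SE, SE, SE, SE},
        {SE, SE, SE, SE, SE, SE, SE, SE, SE, SE, SE, SE, SE}
    };
    static final Map<Integer, String> ACCEPTING = Map.of(
            1, "INTEGER", 2, "INTEGER", 3, "IDENTIFIER", 5, "OCTAL",
            7, "INTEGER", 8, "IDENTIFIER", 9, "BINARY", 10, "HEXADECIMAL");
    // Hypothetical keyword set: only "boolean" is confirmed by the sample output.
    static final Set<String> KEYWORDS = Set.of(
            "boolean", "int", "float", "char", "string", "void", "if", "else", "while", "return");
    static final String DELIMITERS = "(){}[];";   // assumed set
    static final String OPERATORS = "+-*/%<>=!";  // assumed set

    static int column(char c) {
        char u = (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : c;
        if (u == '0') return 0;
        if (u == '$') return 1;
        if (u == '_') return 2;
        if (u == '1') return 3;
        if (u >= '2' && u <= '7') return 4;
        if (u == '8' || u == '9') return 5;
        if (u == 'A') return 6;
        if (u == 'B') return 7;
        if (u >= 'C' && u <= 'F') return 8;
        if (u >= 'G' && u <= 'W') return 9;
        if (u == 'X') return 10;
        if (u == 'Y' || u == 'Z') return 11;
        return 12;
    }

    // Keywords first, then the DFA.
    static String classify(String word) {
        if (KEYWORDS.contains(word)) {
            return "KEYWORD";
        }
        int state = 0;
        for (char c : word.toCharArray()) {
            state = TABLE[state][column(c)];
        }
        return ACCEPTING.getOrDefault(state, "ERROR");
    }

    static boolean isStop(char c) {
        return Character.isWhitespace(c) || c == '"' || c == '\''
                || DELIMITERS.indexOf(c) >= 0 || OPERATORS.indexOf(c) >= 0;
    }

    // One "TOKEN lexeme" entry per word, delimiter, and operator.
    static List<String> lex(String line) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        // One extra iteration with a space flushes the last word.
        for (int i = 0; i <= line.length(); i++) {
            char c = i < line.length() ? line.charAt(i) : ' ';
            if (!isStop(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {
                out.add(classify(word.toString()) + " " + word);
                word.setLength(0);
            }
            if (DELIMITERS.indexOf(c) >= 0) {
                out.add("DELIMITER " + c);
            } else if (OPERATORS.indexOf(c) >= 0) {
                out.add("OPERATOR " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "hello;world cse340 asu 2013/05/31 // end boolean $xx= "
                + "((((((((23WE + 44 - 3 / 2 % 45 <=17) > 0xfffff.34.45;";
        lex(input).forEach(System.out::println);
    }
}
```

Because "<=" is split at each operator character, it comes out as OPERATOR < followed by OPERATOR =, which matches the expected output.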
  17. Lexical Analysis

    int x = 5; float y = "hello; String@z="9.5”;intx=cse340;if(x> 14) while (5 == 5) if (int a) a = 1; x = x; for ( ; ; );y = 13.45.0;int me =99999000001111222000000111111222 223443483045830948;while { x != 9} ();int {x} = 10; "hello "world" bye"
    Number of STRINGS: ? ? ? ? ? ? ? ? ?
  18. Lexical Analysis

    int x = 5; float y = "hello; String@z="9.5”;intx=cse340;if(x> 14) while (5 == 5) if (int a) a = 1; x = x; for ( ; ; );y = 13.45.0;int me =99999000001111222000000111111222 223443483045830948;while { x != 9} ();int {x} = 10; ”hello "world" bye"
    Number of STRINGS: 9 12 3 18 12 2 6 12 3
  19. Lexical Analysis

    int x = 5; float y = "hello; String @z = "9.5"; int x = cse340; if ( x > 14) while (5 == 5) if (int a) a = 1; x = x; for ( ; ; ); y = 13.45.0; int me = 99999000001111222000000111111222223443483045830948; while { x != 9} (); int {x} = 10;
  20. Homework: Programming Assignment #1

    Develop a Lexical Analyzer by coding a DFA.
  21. Compilers | Javier Gonzalez-Sanchez, Ph.D. | Spring 2024

    Copyright. These slides can only be used as study material for the Compilers course at Universidad Panamericana. They cannot be distributed or used for another purpose.