Slide 3
Regex-based Tokenizers
● Simple to implement and understand, and reasonably fast
● Regexes match basic constructs such as numbers, keywords, and operators
● Often rely on common patterns in programming language grammars
○ Keywords, paired delimiters, strings, comments
○ Example: without additional features, Emacs only supports delimiters of up to two characters
● Can add features until it highlights everything you want
○ Multi-line strings/comments: special paired delimiter functionality
○ String escapes: rules that apply only between delimiters
○ Heredocs: reuse a capture group from the start regex in the end regex
● Give language constructs standardized names so themes can style them:
○ Vim examples: Comment, String, Keyword, Type, Function
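A minimal sketch of the basic approach, assuming Python's `re` module and a made-up C-like token set. Each rule becomes a named group in one combined regex, and the group names double as Vim-style highlight names that a theme could map to colors. `TOKEN_RULES`, `MASTER`, `tokenize`, and the catch-all `Other` name are hypothetical and not taken from any editor.

```python
import re

# Hypothetical rules for a small C-like language. Each name is a Vim-style
# highlight group (except the made-up catch-all "Other").
TOKEN_RULES = [
    ("Comment", r"//[^\n]*"),
    ("String", r'"(?:[^"\\\n]|\\.)*"'),
    ("Keyword", r"\b(?:if|else|while|return)\b"),
    ("Type", r"\b(?:int|char|void)\b"),
    ("Number", r"\b\d+(?:\.\d+)?\b"),
    ("Operator", r"[-+*/=<>!]=?|&&|\|\|"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Whitespace", r"\s+"),
    ("Other", r"."),
]

# One alternation with a named group per rule. Python's re tries alternatives
# left to right, so earlier rules win when several could match at a position.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))

def tokenize(source):
    """Yield (highlight_name, text) pairs, skipping whitespace."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "Whitespace":
            yield m.lastgroup, m.group()
```

Rule order is the main design lever here: `Comment` must precede `Operator` so `//` is not split into two `/` tokens, and `Keyword` must precede `Identifier` (the `\b` anchors keep `iffy` from being read as the keyword `if`).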
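The heredoc feature can be sketched in the same style, assuming shell-style `<<WORD` syntax where the region ends at a line containing only `WORD`. The end regex is compiled at match time from the start regex's capture group, which a fixed pair of delimiters cannot express. `HEREDOC_START` and `find_heredoc` are illustrative names, and real shells have variants (such as `<<-` or quoted words) that this sketch ignores.

```python
import re

# Hypothetical start rule: `<<WORD` (shell-style heredoc); group 1 captures WORD.
HEREDOC_START = re.compile(r"<<([A-Za-z_]\w*)")

def find_heredoc(source):
    """Return (start, end) offsets of the first heredoc region, or None."""
    start = HEREDOC_START.search(source)
    if start is None:
        return None
    # Build the end regex from the start regex's capture group: the same word
    # alone on its own line. re.escape guards against regex metacharacters.
    word = re.escape(start.group(1))
    end_re = re.compile(rf"^{word}$", re.MULTILINE)
    end = end_re.search(source, start.end())
    if end is None:
        # Unterminated heredoc: treat the rest of the input as the region.
        return start.start(), len(source)
    return start.start(), end.end()
```

Because the terminator comes from the capture, `cat <<END` is closed by `END` and not by an earlier line reading `EOF`.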
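Multi-line strings and string escapes can both come from a region rule that keeps state between its delimiters. This sketch assumes a hypothetical `Region` type with start and end patterns plus inner rules that only apply inside the region. At each position the inner rules are tried before the end delimiter, so an escaped quote is consumed instead of closing the string, and the scan runs across newlines. `Region`, `match_inner`, `highlight_region`, and the `SpecialChar` name are illustrative, not any editor's API.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Region:
    """Hypothetical region rule: delimiters plus rules active only between them."""
    name: str
    start: re.Pattern
    end: re.Pattern
    inner: list = field(default_factory=list)  # (name, compiled pattern) pairs

def match_inner(source, pos, rules):
    """Return (name, match) for the first inner rule matching at pos, else None."""
    for name, pat in rules:
        m = pat.match(source, pos)
        if m and m.end() > pos:
            return name, m
    return None

def highlight_region(source, region):
    """Return [(region_name, start, end), *inner_spans] for the first region."""
    m = region.start.search(source)
    if m is None:
        return []
    spans = []
    pos = m.end()
    while pos < len(source):
        # Inner rules (e.g. escapes) take priority over the end delimiter.
        hit = match_inner(source, pos, region.inner)
        if hit:
            name, im = hit
            spans.append((name, im.start(), im.end()))
            pos = im.end()
            continue
        em = region.end.match(source, pos)
        if em:
            pos = em.end()
            break
        pos += 1  # ordinary character, newlines included
    # If no end delimiter was found, the region runs to the end of the input.
    return [(region.name, m.start(), pos)] + spans

# Hypothetical double-quoted string with backslash escapes.
STRING = Region("String", re.compile(r'"'), re.compile(r'"'),
                [("SpecialChar", re.compile(r"\\."))])
```

Reversing the priority (checking the end delimiter first) would let `\"` terminate the string early, which is exactly the bug that rules scoped to the region avoid.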