All About Regular Expressions

Jade Allen
September 23, 2022

Delivered at Strange Loop 2022 in St. Louis, MO. This talk discusses the theory, origin, and history of the concept of regular expressions, as well as their unusual syntax and the different approaches to searching text for matching patterns (depth-first or breadth-first).

Transcript

  1. OVERVIEW • ORIGIN, THEORY AND BASIC ALGEBRA • DETERMINISTIC AND NONDETERMINISTIC FINITE AUTOMATA • WAIT. WHAT? WHY ARE WE TALKING ABOUT EDITORS? • FERAL REGULAR EXPRESSIONS
  2. DETERMINISTIC FINITE AUTOMATA • EACH STATE TRANSITION IS UNIQUELY DETERMINED BY THE SOURCE STATE AND INPUT SYMBOL • READING AN INPUT SYMBOL IS REQUIRED FOR EACH STATE TRANSITION • ADVANTAGE: EASY TO IMPLEMENT AS A PROGRAM • DRAWBACK: STATE EXPLOSION
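The "easy to implement as a program" point on slide 2 can be illustrated with a minimal Python sketch. The automaton here is an assumption made for illustration: a hypothetical DFA for the pattern a(b*|d), since the diagram used in the talk is not part of this transcript. The state names and the function `dfa_matches` are likewise invented for the example.

```python
from typing import Dict, Set, Tuple

# Hypothetical DFA for the pattern a(b*|d); not taken from the slides.
# Each (state, symbol) pair maps to exactly one next state, which is
# what makes the automaton deterministic.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "in_bs",
    ("in_bs", "b"): "in_bs",
    ("seen_a", "d"): "seen_d",
}
ACCEPTING: Set[str] = {"seen_a", "in_bs", "seen_d"}


def dfa_matches(text: str) -> bool:
    """Return True if the whole input drives the DFA into an accepting state."""
    state = "start"
    for symbol in text:
        key = (state, symbol)
        if key not in TRANSITIONS:
            # No transition is defined for this symbol: reject immediately.
            return False
        # Exactly one input symbol is consumed per state transition.
        state = TRANSITIONS[key]
    return state in ACCEPTING


for candidate in ("ad", "abbbbbbbb", "adcd"):
    print(candidate, dfa_matches(candidate))
```

Each step is a single table lookup, so matching takes time proportional to the input length. The cost shows up in the table instead: converting a nondeterministic automaton to a deterministic one can multiply the number of states, which is the "state explosion" drawback named on the slide.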
  3. WILL IT MATCH? • INPUT: AD – MATCH? ✅ • INPUT: ABBBBBBBB – MATCH? • INPUT: ADCD – MATCH? 🤔
  4. WILL IT MATCH? • INPUT: AD – MATCH? • INPUT: ABBBBBBBB – MATCH? ✅ • INPUT: ADCD – MATCH? 🤔
  5. WILL IT MATCH? • INPUT: AD – MATCH? • INPUT: ABBBBBBBB – MATCH? • INPUT: ADCD – MATCH? ❌ 🤔
  6. I HAVE NO ^ AND I MUST MATCH • BACK IN THE DAY, NO ONE HAD A “STANDARD KEYBOARD” (*NOR A STANDARD DISPLAY TERMINAL, BUT THAT IS A FUTURE TALK) • YOU GET WHAT YOU GOT, AND YOU WERE GRATEFUL • EURO-FLAVORED KEYBOARDS MAY OR MAY NOT HAVE “STANDARD” US ASCII CHARACTERS
  7. INTO THE WILD… • HENRY SPENCER WROTE AND PUBLISHED AN EGREP-COMPATIBLE “NEARLY PUBLIC DOMAIN” REGULAR EXPRESSION LIBRARY IN 1986
  8. RE2

  9. SUMMARY • Regular expressions originated from mathematical concepts in matching sequences from sets of symbols from alphabets • Implementations came from editors and compilers • Unix provided facilities to use regex from tools • Awk provided a way to use them to trigger actions in a dynamic script • Perl added a bunch of features people liked …until they didn’t