All About Regular Expressions

Jade Allen
September 23, 2022

Delivered at Strange Loop 2022 in St. Louis, MO. This talk discusses the theory, origin, and history of the concept of regular expressions, as well as their unusual syntax and the different approaches to searching text for matching patterns (depth-first or breadth-first).

Transcript

  1. ALL ABOUT REGULAR EXPRESSIONS JADE ALLEN @AVOCADOSUPERFAN

  2. OVERVIEW • ORIGIN, THEORY AND BASIC ALGEBRA • DETERMINISTIC AND NONDETERMINISTIC FINITE AUTOMATA • WAIT. WHAT? WHY ARE WE TALKING ABOUT EDITORS? • FERAL REGULAR EXPRESSIONS
  3. MCCULLOCH-PITTS NEURON MODEL

  4. STEPHEN KLEENE

  5. ALGEBRA OF REGULAR EVENTS

  7. DETERMINISTIC FINITE AUTOMATA • EACH STATE TRANSITION IS UNIQUELY DETERMINED BY THE SOURCE STATE AND INPUT SYMBOL • READING AN INPUT SYMBOL IS REQUIRED FOR EACH STATE TRANSITION • ADVANTAGE: EASY TO IMPLEMENT AS A PROGRAM • DRAWBACK: STATE EXPLOSION
  8. REGULAR EXPRESSION SEARCH ALGORITHM

  9. Σ = { A, B, C, D } A(B|C)*D

  10. WILL IT MATCH? • INPUT: AD – MATCH? • INPUT: ABBBBBBBB – MATCH? • INPUT: ADCD – MATCH? 🤔
  11. WILL IT MATCH? • INPUT: AD – MATCH? ✅ • INPUT: ABBBBBBBB – MATCH? • INPUT: ADCD – MATCH? 🤔
  13. WILL IT MATCH? • INPUT: AD – MATCH? • INPUT: ABBBBBBBB – MATCH? ✅ • INPUT: ADCD – MATCH? 🤔
  15. WILL IT MATCH? • INPUT: AD – MATCH? • INPUT: ABBBBBBBB – MATCH? • INPUT: ADCD – MATCH? ❌ 🤔
  16. HMM…

  17. INPUT / OUTPUT FORMATS BEFORE TIME SHARING

  18. AN ONLINE EDITOR

  20. FROM WHERE DOES THE SYNTAX COME?

  22. EGREP (1978) • TURING AWARD WINNER 2020 • AWK (LEX, YACC)
  23. I HAVE NO ^ AND I MUST MATCH • BACK IN THE DAY, NO ONE HAD A “STANDARD KEYBOARD” (*NOR A STANDARD DISPLAY TERMINAL, BUT THAT IS A FUTURE TALK) • YOU GET WHAT YOU GOT, AND YOU WERE GRATEFUL • EURO-FLAVORED KEYBOARDS MAY OR MAY NOT HAVE “STANDARD” US ASCII CHARACTERS
  24. INTO THE WILD… • HENRY SPENCER WROTE AND PUBLISHED AN EGREP-COMPATIBLE “NEARLY PUBLIC DOMAIN” REGULAR EXPRESSION LIBRARY IN 1986
  25. PERL (1987)

  26. MAXIMALIST REGEX “TOO MUCH IS NEVER ENOUGH!”

  27. PCRE

  28. REGEX CONSIDERED HARMFUL

  29. RE2

  30. SUMMARY • Regular expressions originated from mathematical concepts in matching sequences from sets of symbols from alphabets • Implementations came from editors and compilers • Unix provided facilities to use regex from tools • Awk provided a way to use them to trigger actions in a dynamic script • Perl added a bunch of features people liked …until they didn’t
  31. THANK YOU!

  32. BIBLIOGRAPHY • [1] MCCULLOCH, W.S., AND PITTS, W., A LOGICAL CALCULUS OF THE IDEAS IMMANENT IN NERVOUS ACTIVITY, BULLETIN OF MATHEMATICAL BIOPHYSICS, 1943. • [2] KLEENE, S.C., REPRESENTATION OF EVENTS IN NERVE NETS AND FINITE AUTOMATA, RAND PROJECT MEMO, RM-704, 1951. • [3] RABIN, M.O., AND SCOTT, D., FINITE AUTOMATA AND THEIR DECISION PROBLEMS, IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 1959. • [4] THOMPSON, K., REGULAR EXPRESSION SEARCH ALGORITHM, CACM, JUNE 1968.
  33. BIBLIOGRAPHY • [5] DEUTSCH, P., AND LAMPSON, B., AN ONLINE EDITOR, CACM, DECEMBER 1967. • [6] RITCHIE, D., AND THOMPSON, K., QED TEXT EDITOR, BELL LABS TECHNICAL MEMORANDUM, 1970. • [7] AHO, A., AND CORASICK, M., EFFICIENT STRING MATCHING: AN AID TO BIBLIOGRAPHIC SEARCH, CACM, JUNE 1975. • [8] SPENCER, H., USENIX POST, JANUARY 1986, REGEXP(3) (GOOGLE.COM).
  34. BIBLIOGRAPHY • [9] FRIEDL, J., MASTERING REGULAR EXPRESSIONS, O’REILLY, 1997. • [10] HAZEL, P., FROM PUNCHED CARDS TO FLAT SCREENS (CIHK.PDF, GOOGLE DRIVE), 2017. • [11] COX, R., REGULAR EXPRESSION MATCHING CAN BE SIMPLE AND FAST (SWTCH.COM), 2007. • [12] COX, R., REGULAR EXPRESSION MATCHING IN THE WILD (SWTCH.COM), 2010.
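
Slide 7 describes a deterministic finite automaton as a machine in which each transition is fixed by the source state and the input symbol, and slide 9 gives the pattern A(B|C)*D over the alphabet { A, B, C, D }. The following Python is a minimal sketch of how such a machine might be written as a lookup table. It is not taken from the talk: the state names ("start", "loop", "done") and the function name dfa_accepts are hypothetical, and the sketch assumes the whole input must match the pattern rather than searching for a match anywhere in the text.

```python
# Hypothetical DFA for A(B|C)*D over the alphabet {A, B, C, D}.
# Each (state, symbol) pair maps to at most one next state, so the
# transition is uniquely determined by the source state and input symbol.
START = "start"
ACCEPTING = {"done"}
TRANSITIONS = {
    ("start", "A"): "loop",
    ("loop", "B"): "loop",
    ("loop", "C"): "loop",
    ("loop", "D"): "done",
}


def dfa_accepts(text):
    """Return True if the entire text is accepted by the DFA (assumed semantics)."""
    state = START
    for symbol in text:
        # Exactly one input symbol is consumed per transition.
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            # No transition defined: the machine rejects immediately.
            return False
    return state in ACCEPTING


if __name__ == "__main__":
    for candidate in ["AD", "ABCBD", "ADCD"]:
        print(candidate, dfa_accepts(candidate))
```

Under this whole-input assumption the sketch accepts AD and rejects ADCD, because no transition leaves the accepting state. A search-style matcher that looks for the pattern anywhere in the input may answer differently for some strings.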