Regex in Depth

Explains the history of regex in depth and covers the core regex features in a practice-oriented way!

Abdur-Rahmaan Janhangeer

November 20, 2022

Transcript

  1. Regex In Depth

  2. 2

  3. Python Mauritius UserGroup (pymug)
    More info: mscc.mu/python-mauritius-usergroup-pymug/
    Why                      Where
    code                     github.com/pymug
    share events             twitter.com/pymugdotcom
    ping professionals       linkedin.com/company/pymug
    all info                 pymug.com
    tell friends by liking   facebook.com/pymug

  4. Abdur-Rahmaan Janhangeer
    Python
    Helps people get into open source

  5. Regex In Depth

  6. History of regex
    (from notes)

  7. import re

    pattern = re.compile(r'eee')
    # match() only tries the pattern at the start of the string
    result = pattern.match('eeee')

    print(result.group())  # eee
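
    match() returns None when the pattern does not match at position 0, so calling .group() straight on the result raises AttributeError for a non-matching input. A minimal sketch of guarding against that with the same r'eee' pattern; the second input 'ee' is just an illustrative value, not from the slide:

```python
import re

pattern = re.compile(r'eee')

for text in ['eeee', 'ee']:
    result = pattern.match(text)
    # match() gives a Match object on success and None otherwise
    if result:
        print(text, '->', result.group())
    else:
        print(text, '-> no match')
# eeee -> eee
# ee -> no match
```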

  8. import re

    pattern = re.compile(r'ike')
    # search() scans the string and returns the first match anywhere
    result = pattern.search('like sike')

    print(result.group())  # ike
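
    The same pattern behaves differently under match(), search() and fullmatch(). A short sketch contrasting the three on the slide's 'like sike' string:

```python
import re

pattern = re.compile(r'ike')
text = 'like sike'

# match() only tries position 0, and 'like' starts with 'l'
print(pattern.match(text))                   # None
# search() scans forward and stops at the first match
print(pattern.search(text).span())           # (1, 4)
# fullmatch() succeeds only if the pattern covers the whole string
print(pattern.fullmatch('ike') is not None)  # True
```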

  9. import re

    pattern = re.compile(r'ike')
    # findall() returns every non-overlapping match as a list of strings
    result = pattern.findall('like sike')

    print(result)  # ['ike', 'ike']
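
    findall() only returns the matched text. When the positions matter too, finditer() yields one Match object per hit; a small sketch on the same input:

```python
import re

pattern = re.compile(r'ike')

# finditer() yields Match objects lazily, so each hit's position is available
for m in pattern.finditer('like sike'):
    print(m.group(), m.start(), m.end())
# ike 1 4
# ike 6 9
```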

  10. import re

    pattern = r'ike'
    string = 'like sike'

    # module-level shortcut: compiles the pattern and runs findall() in one call
    result = re.findall(pattern, string)

    print(result)  # ['ike', 'ike']

  11. pattern = re.compile(r"...", re.MULTILINE)  # flags go in the second argument
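
    The slide leaves the pattern as "...". As an illustration of what re.MULTILINE actually changes, here is a sketch with a made-up two-line string; the last call also shows that flags can be combined with |:

```python
import re

text = 'first line\nsecond line'

# By default ^ anchors only at the very start of the string
print(re.findall(r'^\w+', text))                # ['first']
# With re.MULTILINE, ^ also anchors right after every newline
print(re.findall(r'^\w+', text, re.MULTILINE))  # ['first', 'second']
# Flags combine with |: line starts that begin with s or S
print(re.findall(r'^s\w+', 'First\nSecond', re.MULTILINE | re.IGNORECASE))  # ['Second']
```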

  12. Patterns

  13. pattern = r"letter"
    string = "i wrote some letters from letters"

    re.findall(pattern, string)
    # ['letter', 'letter']

  14. pattern = r"l.tt.r"  # . matches any single character except a newline
    string = '''i wrote some letters
    from letters from the latter'''

    re.findall(pattern, string)
    # ['letter', 'letter', 'latter']
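
    Two details about the dot are worth trying: it skips newlines unless re.DOTALL is passed, and a literal dot has to be escaped. A sketch with made-up strings:

```python
import re

text = 'l\ntter letter'

# . does not match a newline by default...
print(re.findall(r'l.tt.r', text))             # ['letter']
# ...but it does under re.DOTALL
print(re.findall(r'l.tt.r', text, re.DOTALL))  # ['l\ntter', 'letter']
# To match a literal dot, escape it
print(re.findall(r'\.', 'a.b.c'))              # ['.', '.']
```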

  15. \d - any digit
    pattern = r"\d\d\d.\d\d\d\d\d\d\d"
    string = '''Personal: +230 5764321,
    Office: +230 6712345'''

    re.findall(pattern, string)
    # ['230 5764321', '230 6712345']

  16. \s - whitespace (space, tab, newline, ...)
    \D - non-digit (the opposite of r'\d')
    try: r"\+\d\d\d\s\d\d\d\d\d\d\d"
         r".\d\d\d\s\d\d\d\d\d\d\d"
    pattern = r"\d\d\d\D\d\d\d\D\d\d\d\d"
    string = '''Personal: +230 576-4321,
    Office: +230 671 2345'''

    re.findall(pattern, string)
    # ['230 576-4321', '230 671 2345']
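
    A sketch of the "try" pattern with an escaped plus sign, run against the single-separator phone string from the \d slide, plus a check that \s covers more than the space character (the 'a b\tc\nd' string is made up):

```python
import re

# Phone-number string from the \d slide (one space after the country code)
string = '''Personal: +230 5764321,
Office: +230 6712345'''

# \+ matches a literal plus sign; \s matches the space after the country code
print(re.findall(r"\+\d\d\d\s\d\d\d\d\d\d\d", string))
# ['+230 5764321', '+230 6712345']

# \s is broader than ' ': tabs and newlines match too
print(re.findall(r"\s", 'a b\tc\nd'))
# [' ', '\t', '\n']
```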

  17. pattern = r"\d{3}\D\d{3}\D\d{4}"  # \d{3} is shorthand for \d\d\d
    string = '''Personal: +230 576-4321,
    Office: +230 671 2345'''

    re.findall(pattern, string)
    # ['230 576-4321', '230 671 2345']

  18. \w - word character (letter, digit or underscore), \W - any other character
    pattern = r"\w"
    string = '''abc123_!^%£&£%$'''

    re.findall(pattern, string)
    # ['a', 'b', 'c', '1', '2', '3', '_']
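
    \W picks up exactly the characters \w leaves out. The sketch below also shows that in Python 3, \w is Unicode-aware by default, and that re.ASCII narrows it to [a-zA-Z0-9_] ('café' is an illustrative input):

```python
import re

string = '''abc123_!^%£&£%$'''

# \W is the complement of \w
print(re.findall(r"\W", string))
# ['!', '^', '%', '£', '&', '£', '%', '$']

# \w matches Unicode letters such as 'é' unless re.ASCII is given
print(re.findall(r"\w", 'café'))            # ['c', 'a', 'f', 'é']
print(re.findall(r"\w", 'café', re.ASCII))  # ['c', 'a', 'f']
```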

  19. ^ start, $ end
    pattern = r"^\d$"
    string = '''234234234234234234'''

    re.findall(pattern, string)
    # [] because ^\d$ only matches a string that is exactly one digit
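
    To accept the whole run of digits, the anchored pattern needs a quantifier. A sketch with ^\d+$, plus fullmatch(), which expresses the same "entire string" check without anchors ('234a' is an illustrative input):

```python
import re

string = '''234234234234234234'''

# + lets the anchored pattern span every digit between ^ and $
print(re.findall(r"^\d+$", string))        # ['234234234234234234']
# fullmatch() requires the whole string to match, no anchors needed
print(bool(re.fullmatch(r"\d+", string)))  # True
print(bool(re.fullmatch(r"\d+", '234a')))  # False
```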

  20. [abcd] set of chars (matches any one of them)
    pattern = r"l[aeiou]ce"
    string = '''lice lace leece lyce lvce'''

    re.findall(pattern, string)
    # ['lice', 'lace']
    try ranges: [a-z] [A-Z] [0-9] [a-zA-Z]
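
    Ranges follow character codes, which makes A-z (lowercase z) a classic slip: in ASCII the characters [ \ ] ^ _ ` sit between 'Z' and 'a'. A quick sketch with a made-up string:

```python
import re

# [A-z] silently includes [ \ ] ^ _ ` as well as the letters
print(re.findall(r"[A-z]", 'a_[Z'))     # ['a', '_', '[', 'Z']
# Spell out both letter ranges instead
print(re.findall(r"[a-zA-Z]", 'a_[Z'))  # ['a', 'Z']
```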

  21. [^exclude] any char NOT in the set
    pattern = r"l[^aeiou]ce"
    string = '''lice lace leece lyce lvce'''

    re.findall(pattern, string)
    # ['lyce', 'lvce']

  22. Repeat: x{1,3} matches x one to three times (no spaces inside the braces)
    pattern = r"lo{2,5} and behold"
    string = '''loo and behold lo and behold
    looooo and behold looo and behold
    loooooo and behold'''

    re.findall(pattern, string)
    # ['loo and behold', 'looooo and behold', 'looo and behold']
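
    Python also accepts the brace quantifier with one bound left out. A sketch comparing the forms on a made-up word list; fullmatch() is used so each word has to match end to end, and matching() is a hypothetical helper for this example:

```python
import re

words = 'lo loo looo loooooo'.split()

def matching(pattern):
    # hypothetical helper: keep the words the pattern matches completely
    return [w for w in words if re.fullmatch(pattern, w)]

print(matching(r"lo{2}"))    # ['loo']                     exactly 2
print(matching(r"lo{2,}"))   # ['loo', 'looo', 'loooooo']  2 or more
print(matching(r"lo{,2}"))   # ['lo', 'loo']               0 to 2
print(matching(r"lo{2,5}"))  # ['loo', 'looo']             2 to 5
```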

  23. 0 or more: *
    pattern = r"lo* and behold"
    string = '''
    loo and behold
    lo and behold
    looooo and behold
    looo and behold
    loooooo and behold
    l and behold'''

    re.findall(pattern, string)
    # ['loo and behold', 'lo and behold', 'looooo and behold',
    #  'looo and behold', 'loooooo and behold', 'l and behold']

  24. at least once: +
    pattern = r"lo+ and behold"
    string = '''
    loo and behold
    lo and behold
    looooo and behold
    looo and behold
    loooooo and behold
    l and behold'''

    re.findall(pattern, string)
    # ['loo and behold', 'lo and behold', 'looooo and behold',
    #  'looo and behold', 'loooooo and behold']

  25. 0 or 1: ?
    pattern = r"lo? and behold"
    string = '''
    loo and behold
    lo and behold
    looooo and behold
    looo and behold
    loooooo and behold
    l and behold'''

    re.findall(pattern, string)
    # ['lo and behold', 'l and behold']
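
    The three shorthands above are brace quantifiers in disguise: * is {0,}, + is {1,} and ? is {0,1}. A sketch that checks the equivalence on a made-up string:

```python
import re

text = 'l lo loo'

# each shorthand should find exactly what its brace form finds
for short, long in [('lo*', 'lo{0,}'), ('lo+', 'lo{1,}'), ('lo?', 'lo{0,1}')]:
    assert re.findall(short, text) == re.findall(long, text)
    print(short, re.findall(short, text))
# lo* ['l', 'lo', 'loo']
# lo+ ['lo', 'loo']
# lo? ['l', 'lo', 'lo']
```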

  26. Capturing groups (...)
    pattern = re.compile(
        r"i bought (cat)?\s?(dog)?\s?(mouse)?")
    string = '''
    i bought cat
    i bought dog
    i bought mouse
    i bought cat dog
    i bought cat mouse
    '''

    # with groups, findall returns one tuple per match ('' for unused groups)
    pattern.findall(string)
    # [('cat', '', ''), ('', 'dog', ''), ('', '', 'mouse'),
    #  ('cat', 'dog', ''), ('cat', '', 'mouse')]
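
    On a single Match object, the same groups are read with groups() and group(n); there, a group that did not take part is None rather than ''. A sketch with the slide's pattern on one illustrative line:

```python
import re

pattern = re.compile(r"i bought (cat)?\s?(dog)?\s?(mouse)?")
m = pattern.search('i bought dog')

print(m.groups())  # (None, 'dog', None)
print(m.group(2))  # dog
# group(0) is the whole match
print(m.group(0))  # i bought dog
```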

  27. (?:...) non-capturing group
    pattern = re.compile(
        r"i bought (?:cat)?\s?(?:dog)?\s?(?:mouse)?")
    string = '''
    i bought cat
    i bought dog
    i bought mouse
    i bought cat dog
    i bought cat mouse
    '''

    # without capturing groups, findall returns the whole matches
    pattern.findall(string)
    # ['i bought cat \n', 'i bought dog\n', 'i bought mouse',
    #  'i bought cat dog\n', 'i bought cat mouse']

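  28. Or: |
    pattern = re.compile(
        r"(?:Mr|Mrs){1}\.\s\w*")  # {1} is redundant: a group matches once by default
    string = '''
    Mrs. sam Dr. sam Mr. Sam Miss. Sam
    '''

    pattern.findall(string)
    # ['Mrs. sam', 'Mr. Sam']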
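
    Alternatives are tried left to right, and the first one that lets the rest of the pattern succeed wins. The sketch shows that order on its own, then extends the alternation with Dr (an illustrative addition, not part of the slide):

```python
import re

string = 'Mrs. sam Dr. sam Mr. Sam Miss. Sam'

# With nothing after the alternation, 'Mr' is tried first and already succeeds
print(re.match(r"Mr|Mrs", 'Mrs').group())  # Mr

# Adding Dr as another alternative also picks up 'Dr. sam'
pattern = re.compile(r"(?:Mrs|Mr|Dr)\.\s\w*")
print(pattern.findall(string))  # ['Mrs. sam', 'Dr. sam', 'Mr. Sam']
```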

  29. Capturing and backref
    import re

    # \1 refers back to whatever group 1 captured, here 'a'
    pattern = re.compile(r"(a)l\1")
    string = '''
    ala alo ali
    '''

    result = pattern.search(string)

    print(result.group())  # ala
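
    A typical use of a backreference is spotting a word typed twice in a row. A sketch with a made-up sentence; \b keeps the match from starting or ending inside a longer word:

```python
import re

text = 'this is is a test test'

# (\w+) captures a word and \1 demands the same word again after whitespace
print(re.findall(r"\b(\w+)\s+\1\b", text))  # ['is', 'test']
```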

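  30. (?=...) look ahead: match only if followed by ..., but do not
    include it in the returned match
    pattern = re.compile(
        r"c(?=[aeiou])")
    string = '''
    coo ca ci cw
    '''

    pattern.findall(string)
    # ['c', 'c', 'c']
    (?!...) negative look ahead: not followed by
    (?<=...) look behind: preceded by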
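
    The other lookaround forms on the same 'coo ca ci cw' string, as a sketch. The negative look behind (?<!...) is not listed on the slide but follows the same pattern; note that Python requires look behind patterns to have a fixed width:

```python
import re

string = 'coo ca ci cw'

print(re.findall(r"c(?=[aeiou])", string))   # ['c', 'c', 'c']
# (?!...): only the c in 'cw' is not followed by a vowel
print(re.findall(r"c(?![aeiou])", string))   # ['c']
# (?<=...): vowels that come right after a c
print(re.findall(r"(?<=c)[aeiou]", string))  # ['o', 'a', 'i']
# (?<!...): vowels NOT preceded by c (the second o in 'coo')
print(re.findall(r"(?<!c)[aeiou]", string))  # ['o']
```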

  31. Misc
    \b word boundary
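
    \b is zero-width: it matches the position between a word character and a non-word character, and \B matches any other position. A sketch with a made-up string. It also shows why the deck writes patterns as raw strings: without the r prefix, Python turns '\b' into a backspace character before re ever sees it:

```python
import re

text = 'cat concat cats cat'

# 'cat' standing alone as a word
print(re.findall(r"\bcat\b", text))  # ['cat', 'cat']
# \B: 'cat' NOT at the start of a word (the end of 'concat')
print(re.findall(r"\Bcat", text))    # ['cat']
# No r prefix: '\b' is a backspace, so this looks for literal backspaces
print(re.findall("\bcat\b", text))   # []
```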