Regular expressions basics/正規表現の基本

Kishikawa Katsumi

July 26, 2022

Transcript

  1. Regular expressions basics

  2. Swift Regex

  3. https://swiftregex.com/

  4. What is a regular expression?
    • A general-purpose notation for describing a set of strings (a pattern)
    • [bc]ook matches book or cook
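
As a quick illustration of a pattern standing for a set of strings, the sketch below checks a few strings against [bc]ook with Python's standard re module. The candidate strings are made up for this example and are not from the slides.

```python
import re

# Pattern from the slide: "b" or "c", followed by the literal text "ook".
pattern = re.compile(r"[bc]ook")

# Hypothetical candidate strings, chosen only for illustration.
candidates = ["book", "cook", "look", "ook"]

# fullmatch requires the whole string to belong to the set the pattern describes.
matched = [s for s in candidates if pattern.fullmatch(s)]
print(matched)  # ['book', 'cook']
```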

  5. Literal Characters
    • a
    • Jack is a boy,
    • cat
    • About cats and dogs

  6. Special Characters (Meta Characters)
    • 12 characters have a special meaning; to use one as a literal it must be escaped (e.g. 1\+1=2)
    • Backslash \
    • Caret ^
    • Dollar sign $
    • Dot (period) .
    • Pipe |
    • Question mark ?
    • Asterisk *
    • Plus +
    • Opening parenthesis (
    • Closing parenthesis )
    • Opening square bracket [
    • Opening curly brace {
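
A minimal sketch of why escaping matters, using the 1\+1=2 example from the slide and Python's re module. The variable names are arbitrary, and re.escape is shown only as one convenient way to escape plain text.

```python
import re

text = "1+1=2"

# Unescaped, "+" is a quantifier: the pattern means "one or more 1s, then 1=2".
unescaped = re.search(r"1+1=2", text)        # no match in "1+1=2"
repeated = re.fullmatch(r"1+1=2", "111=2")   # but it does match "111=2"

# Escaped, "\+" is a literal plus sign.
escaped = re.search(r"1\+1=2", text)

# re.escape adds the backslashes when the pattern comes from plain text.
auto = re.escape(text)

print(unescaped)                   # None
print(escaped.group())             # 1+1=2
print(bool(re.fullmatch(auto, text)))  # True
```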

  7. Non-Printable Characters (Control Characters, Escape sequence)
    • \t
      • Matches a tab
    • \n
      • Matches a line feed (newline)

  8. Character Classes (Character Sets)
    • Matches one character out of several
    • To match a or e, write [ae]
    • (Example) gr[ae]y
      • Matches gray or grey
    • A character class matches exactly one character
    • The order of the characters inside the brackets does not matter

  9. Character Classes (Character Sets)
    • A hyphen inside a character class specifies a range
    • [0-9]
      • Matches a single digit from 0 to 9
    • [0-9a-fA-F]
      • Matches a single hexadecimal digit, upper or lower case
    • Negated Character Classes
    • [^0-9\r\n]
      • Matches any character that is not a digit or a line break
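
The character class patterns above can be tried out with a short sketch in Python's re module. The input strings are invented for illustration.

```python
import re

# gr[ae]y: the class stands for exactly one character, "a" or "e".
spellings = re.findall(r"gr[ae]y", "gray grey graey gry")
print(spellings)   # ['gray', 'grey']

# [0-9a-fA-F]: one hexadecimal digit, upper or lower case.
hex_digits = re.findall(r"[0-9a-fA-F]", "0x1F g9")
print(hex_digits)  # ['0', '1', 'F', '9']

# [^0-9\r\n]: any single character that is not a digit or a line break.
others = re.findall(r"[^0-9\r\n]", "a1\nb2")
print(others)      # ['a', 'b']
```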

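  10. Shorthand Character Classes
    • Predefined notation for writing commonly used character classes concisely
    • \d is shorthand for [0-9]
      • In Unicode-aware engines it also matches digits from other scripts (e.g. full-width digits); whether kanji numerals or circled numerals match depends on the engine, since many limit \d to Unicode decimal digits (category Nd)
    • \w "word character": same as [A-Za-z0-9_] (note that the underscore is included)
      • In Unicode-aware engines it also matches letters and digits from other scripts
    • \s "whitespace character": matches whitespace, [ \t\r\n\f]
      • In Unicode-aware engines it also matches every character in the Unicode "separator" category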
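
How far the shorthands reach depends on the engine. The sketch below shows the behaviour of Python's re module only, where str patterns are Unicode-aware by default and re.ASCII restricts them. Other engines (including Swift Regex) may differ, and the sample strings are made up.

```python
import re

sample = "7７①三"  # ASCII 7, full-width 7, circled 1, kanji numeral 3

# Python's \d matches Unicode decimal digits (category Nd) by default.
digits_unicode = re.findall(r"\d", sample)
print(digits_unicode)  # ['7', '７']

# With re.ASCII, \d is exactly [0-9].
digits_ascii = re.findall(r"\d", sample, re.ASCII)
print(digits_ascii)    # ['7']

# \w includes the underscore.
words = re.findall(r"\w+", "snake_case x1")
print(words)           # ['snake_case', 'x1']

# \s covers spaces, tabs and line breaks.
parts = re.split(r"\s+", "a \t b\nc")
print(parts)           # ['a', 'b', 'c']
```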

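  11. Dot (period)
    The Dot Matches (Almost) Any Character
    • Matches any single character except a line break
    • In "dot matches all" or "single line" mode (the name varies by programming language and regex engine), it matches any single character, including line breaks
    • gr.y matches gray, grey, gr%y, and so on
    • The dot is powerful and matches almost anything, so do not overuse it
      • Use a character class or a negated character class instead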
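
A small sketch of the dot's behaviour in Python's re module, where the "dot matches all" mode is called re.DOTALL. The test string is invented for this example.

```python
import re

text = "gray grey gr%y gr\ny"

# By default the dot does not match a line break.
default = re.findall(r"gr.y", text)
print(default)   # ['gray', 'grey', 'gr%y']

# With DOTALL the dot matches the line break as well.
dotall = re.findall(r"gr.y", text, re.DOTALL)
print(dotall)    # ['gray', 'grey', 'gr%y', 'gr\ny']

# A character class states the intent precisely and avoids accidental matches.
specific = re.findall(r"gr[ae]y", text)
print(specific)  # ['gray', 'grey']
```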

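  12. Anchors
    • Match a position rather than a character
    • ^
      • Matches at the start of the string
    • $
      • Matches at the end of the string
    • Most regex engines have a "multi-line" mode, in which ^ also matches right after a line break and $ right before one
    • \b
      • Matches at a word boundary
      • A word boundary is a position between a character that \w can match and one that it cannot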
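
The anchors can be sketched in Python's re module, where multi-line mode is the re.MULTILINE flag. The sample texts are assumptions made for the example.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
print(starts_default)    # ['first']

# With MULTILINE, ^ also matches after each line break, and $ before each one.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)
print(starts_multiline)  # ['first', 'second']
print(ends_multiline)    # ['line', 'line']

# \b on both sides keeps "cat" from matching inside "concat" or "cats".
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")
print(whole_word)        # ['cat', 'cat']
```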

  13. Alternation
    • Logical OR
    • cat|dog
      • About cats and dogs
    • cat|dog|mouse|fish
      • You can chain as many alternatives as you like
    • cat|dog food
      • Matches cat or dog food
      • To match cat food or dog food, group the alternation: (cat|dog) food
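
The precedence of | is easy to get wrong, so here is a minimal sketch in Python's re module comparing the ungrouped and grouped patterns. The list of test phrases is made up.

```python
import re

# Each alternative is tried at every position in the text.
animals = re.findall(r"cat|dog", "About cats and dogs")
print(animals)  # ['cat', 'dog']

phrases = ["cat", "cat food", "dog food"]

# Ungrouped: the alternatives are "cat" and "dog food".
loose = [p for p in phrases if re.fullmatch(r"cat|dog food", p)]
print(loose)    # ['cat', 'dog food']

# Grouped: the alternatives are "cat" and "dog", and " food" follows either one.
grouped = [p for p in phrases if re.fullmatch(r"(cat|dog) food", p)]
print(grouped)  # ['cat food', 'dog food']
```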

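  14. Repetition
    • Question mark "?"
      • Optional (zero or one occurrence)
      • colou?r matches color or colour
    • Asterisk "*"
      • Zero or more repetitions
      • <[A-Za-z][A-Za-z0-9]*>
      • Matches an HTML tag without attributes
    • Plus "+"
      • One or more repetitions
    • Curly braces "{n,m}"
      • Repeats a specified number of times
      • \b[1-9][0-9]{3}\b
      • Matches a number from 1000 to 9999
      • \b[1-9][0-9]{2,4}\b
      • Matches a number from 100 to 99999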
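
The quantifier examples can be checked with a short sketch in Python's re module. The numeric pattern is written with \b on both sides, and the input strings are assumptions chosen to include near misses.

```python
import re

# "?" makes the preceding "u" optional.
optional = re.findall(r"colou?r", "color colour colouur")
print(optional)     # ['color', 'colour']

# "*" allows zero or more further name characters; attributes are not allowed.
tags = re.findall(r"<[A-Za-z][A-Za-z0-9]*>", "<p><h1><a href='#'>")
print(tags)         # ['<p>', '<h1>']

# {3}: exactly three more digits after the leading 1-9, so 1000 to 9999.
four_digits = re.findall(r"\b[1-9][0-9]{3}\b", "999 1000 9999 10000 0123")
print(four_digits)  # ['1000', '9999']

# {2,4}: two to four more digits, so 100 to 99999.
ranged = re.findall(r"\b[1-9][0-9]{2,4}\b", "99 100 99999 100000")
print(ranged)       # ['100', '99999']
```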

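  15. Grouping and Capturing
    • Enclosing part of a pattern in parentheses groups it
    • A quantifier can be applied to a group
    • Set(Value)?
      • Matches Set or SetValue
    • Ordinary parentheses create a capturing group
    • When Set(Value)? matches SetValue, group 1 contains Value
    • If the capture is not needed, Set(?:Value)? creates a non-capturing group
    • Do not confuse the question mark right after the opening parenthesis with the question mark quantifier (zero or one occurrence)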
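
A minimal sketch of capturing versus non-capturing groups, using the Set(Value)? example with Python's re module. Variable names are arbitrary.

```python
import re

# Capturing group: group 1 holds whatever (Value) matched.
with_value = re.fullmatch(r"Set(Value)?", "SetValue")
print(with_value.group(1))     # Value

# The optional group did not take part in the match, so group 1 is None.
without_value = re.fullmatch(r"Set(Value)?", "Set")
print(without_value.group(1))  # None

# Non-capturing group: same matching behaviour, but no group is recorded.
non_capturing = re.fullmatch(r"Set(?:Value)?", "SetValue")
print(non_capturing.groups())  # ()
```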

  16. Backreferences
    • Match the same text that a capturing group captured
    • The result of a capturing group's match can be reused
    • <([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
      • Matches an HTML tag pair (the opening tag name captured by the group is reused in the closing tag)
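
To see a backreference at work, the sketch below runs the HTML tag pattern with Python's re module. It writes the closing tag as </\1>, which is assumed to be the intended form of the pattern, and the HTML snippet is invented. This is an illustration of backreferences, not a robust way to parse HTML.

```python
import re

# \1 must match exactly the text captured by group 1 (the opening tag name).
pattern = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>")

# Hypothetical input: one well-formed element, one with a mismatched closing tag.
html = '<B class="x">bold</B><I>broken</B>'

matches = [m.group(0) for m in pattern.finditer(html)]
tag_names = [m.group(1) for m in pattern.finditer(html)]
print(matches)    # ['<B class="x">bold</B>']
print(tag_names)  # ['B']
```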

  17. Named Groups and Backreferences
    • Tracking captures by number is tedious, and the numbers shift when groups are added or removed, so groups can be given names
    • Syntax (named group)
      • (?P<name>group)
    • Syntax (backreference)
      • (?P=name)
    • <(?P<tag>[A-Z][A-Z0-9]*)\b[^>]*>.*?</(?P=tag)>
      • Matches an HTML tag pair (same as <([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>)
    • Syntax (named capture, .NET)
      • (?<name>group) or (?'name'group)
    • Syntax (reference by name, .NET)
      • \k<name> or \k'name'
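
The (?P<name>...) and (?P=name) forms are the ones Python's re module accepts, so the named version of the tag pattern can be sketched directly. The group name tag and the sample input are illustrative, and the closing tag is written as </(?P=tag)> on the assumption that this is the intended pattern.

```python
import re

# (?P<tag>...) names the group; (?P=tag) refers back to what it captured.
pattern = re.compile(r"<(?P<tag>[A-Z][A-Z0-9]*)\b[^>]*>.*?</(?P=tag)>")

match = pattern.search("<H1>Title</H1>")
print(match.group(0))      # <H1>Title</H1>
print(match.group("tag"))  # H1

# Named groups are still numbered, so the same capture is also group 1.
print(match.group(1))      # H1
```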

  18. Lookahead and Lookbehind
    Lookaround (Lookahead/Lookback(Lookbehind))
    • Special groups that, like anchors, assert a position: they check what surrounds the match without making it part of the result
    • (Example) \d+(?=€)
      • Matches an integer that is followed by "€"
      • Matches the 30 in 1 turkey costs 30€
    • Syntax (positive lookahead)
      • X(?=Y)
    • Syntax (negative lookahead)
      • X(?!Y)
    • Syntax (positive lookbehind)
      • (?<=Y)X
    • Syntax (negative lookbehind)
      • (?<!Y)X
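
The four lookaround forms can be sketched with Python's re module. The first pattern is the one from the slide; the other patterns and the second sample string are assumptions added for illustration. The \b anchors keep the negative forms from matching part of a number.

```python
import re

euros = "1 turkey costs 30€"
dollars = "cost: $40, qty 7"

# Positive lookahead: digits that are followed by "€" (the "€" is not consumed).
ahead = re.findall(r"\d+(?=€)", euros)
print(ahead)       # ['30']

# Negative lookahead: whole numbers that are not followed by "€".
not_ahead = re.findall(r"\b\d+\b(?!€)", euros)
print(not_ahead)   # ['1']

# Positive lookbehind: digits that are preceded by "$".
behind = re.findall(r"(?<=\$)\d+", dollars)
print(behind)      # ['40']

# Negative lookbehind: whole numbers that are not preceded by "$".
not_behind = re.findall(r"(?<!\$)\b\d+", dollars)
print(not_behind)  # ['7']
```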

  19. References
    • Regular-Expressions.info: https://www.regular-expressions.info/
    • Swift Regex: https://swiftregex.com/