Regular expressions basics/正規表現の基本

Kishikawa Katsumi

July 26, 2022

Transcript

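  1. Special Characters (Meta Characters)
    • Twelve characters have a special meaning. To use one of them as a literal, escape it with a backslash (for example, 1\+1=2).
    • Backslash \
    • Caret ^
    • Dollar sign $
    • Dot (period) .
    • Pipe |
    • Question mark ?
    • Asterisk *
    • Plus +
    • Opening parenthesis (
    • Closing parenthesis )
    • Opening square bracket [
    • Opening curly brace {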
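The escaping rule on this slide can be tried out with a small Python sketch. It uses the standard re module; the sample strings and variable names are made up for illustration, and other engines may differ in detail.

```python
import re

text = "1+1=2"

# Unescaped, "+" is a quantifier: the pattern means "one or more 1s, then 1=2".
unescaped_hit = re.search(r"1+1=2", text)

# Escaped, "\+" matches a literal plus sign.
escaped_hit = re.search(r"1\+1=2", text)

# re.escape() inserts the backslashes when the literal text comes from data.
literal = "(a+b)*c"
auto_escaped = re.compile(re.escape(literal))

print(unescaped_hit)                          # None
print(escaped_hit.group())                    # 1+1=2
print(bool(auto_escaped.fullmatch(literal)))  # True
```

The unescaped pattern does match a string such as "111=2", which shows that the plus sign was read as a quantifier rather than as a character.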
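  2. Character Classes (Character Sets)
    • A character class matches one character out of several candidates.
    • To match either a or e, write [ae].
    • Example: gr[ae]y matches gray or grey.
    • A character class always matches exactly one character.
    • The order of the characters inside the brackets does not matter.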
  3. Character Classes (Character Sets)
    • A hyphen inside a character class specifies a range.
    • [0-9] matches a single digit from 0 to 9.
    • [0-9a-fA-F] matches a single hexadecimal digit, regardless of case.
    • Negated Character Classes
    • [^0-9\r\n] matches any character that is neither a digit nor a line break.
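The character-class rules from the two slides above can be checked with a short Python sketch. The input strings and variable names are invented for this example.

```python
import re

# [ae] matches exactly one character, so "graey" and "griy" are rejected.
gray_hits = re.findall(r"gr[ae]y", "gray grey griy graey")

# Order inside the brackets is irrelevant: [ea] behaves like [ae].
order_same = re.findall(r"gr[ea]y", "gray grey") == ["gray", "grey"]

# Ranges can be combined in one class: one hexadecimal digit, either case.
hex_hits = re.findall(r"[0-9a-fA-F]", "0x1fG")

# A leading caret negates the class: anything except digits and line breaks.
negated_hits = re.findall(r"[^0-9\r\n]", "a1\nb2")

print(gray_hits)     # ['gray', 'grey']
print(order_same)    # True
print(hex_hits)      # ['0', '1', 'f']
print(negated_hits)  # ['a', 'b']
```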
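  4. Shorthand Character Classes
    • Predefined notations that make frequently used character classes easier to write.
    • \d is shorthand for [0-9].
      • In Unicode-aware environments it also matches digits from other scripts. The slide cites kanji numerals and circled numerals, but many engines match only the Unicode decimal-digit category (for example, full-width digits), so check your engine.
    • \w is the "word character" class, equivalent to [A-Za-z0-9_] (note that the underscore is included).
      • In Unicode-aware environments it matches letters and digits from many scripts.
    • \s is the "whitespace character" class, equivalent to [ \t\r\n\f].
      • In Unicode-aware environments it matches every character in the Unicode "separator" category.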
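How far the shorthand classes reach depends on the engine. The sketch below shows the behaviour of Python's re module only, which is Unicode-aware for str patterns by default and can be restricted with re.ASCII; the sample strings are made up, and other engines may give different results.

```python
import re

# ASCII digit, full-width digit, kanji numeral, circled numeral.
sample = "1３三①"

# Python's \d follows the Unicode decimal-digit category, so the kanji
# numeral and the circled numeral are not matched here.
digits_unicode = re.findall(r"\d", sample)
digits_ascii = re.findall(r"\d", sample, flags=re.ASCII)

# \w includes the underscore; in Unicode mode it also matches kana.
words_unicode = re.findall(r"\w+", "snake_case あ-1")
words_ascii = re.findall(r"\w+", "snake_case あ-1", flags=re.ASCII)

# \s in Unicode mode also matches the ideographic space U+3000.
spaces_unicode = re.findall(r"\s", "a\u3000b")
spaces_ascii = re.findall(r"\s", "a\u3000b", flags=re.ASCII)

print(digits_unicode, digits_ascii)  # ['1', '３'] ['1']
print(words_unicode, words_ascii)    # ['snake_case', 'あ', '1'] ['snake_case', '1']
print(len(spaces_unicode), len(spaces_ascii))  # 1 0
```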
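  5. The Dot Matches (Almost) Any Character
    • The dot matches any single character except a line break.
    • In "dot matches all" or "single line" mode (the name varies by programming language and regex engine), it matches any single character, including line breaks.
    • gr.y matches gray, grey, gr%y, and so on.
    • The dot is powerful because it matches almost anything, so do not overuse it.
    • Use a character class or a negated character class instead.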
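A brief Python sketch of the dot's behaviour follows. In Python's re module the "dot matches all" mode is the re.DOTALL flag; the quoted-string example is an assumption added here to illustrate why a negated class is often the safer choice.

```python
import re

# By default the dot does not match a newline, so "gr\ny" is skipped.
dot_hits = re.findall(r"gr.y", "gray grey gr%y gr\ny")

# With DOTALL the dot also matches the newline.
dotall_hits = re.findall(r"gr.y", "gr\ny", flags=re.DOTALL)

# A greedy ".*" runs past the first closing quote ...
quoted_dot = re.findall(r'".*"', 'say "a" and "b"')
# ... while a negated class stops at it.
quoted_class = re.findall(r'"[^"\r\n]*"', 'say "a" and "b"')

print(dot_hits)      # ['gray', 'grey', 'gr%y']
print(dotall_hits)   # ['gr\ny']
print(quoted_dot)    # ['"a" and "b"']
print(quoted_class)  # ['"a"', '"b"']
```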
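  6. Anchors
    • Anchors match a position, not a character.
    • ^ matches at the start of the string.
    • $ matches at the end of the string.
    • Most regex engines have a "multi-line" mode in which ^ also matches after a line break and $ also matches before a line break.
    • \b matches at a word boundary.
    • A word boundary is the position between a character that \w can match and one that it cannot.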
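The anchors can be observed with the following Python sketch. It assumes Python's re module, where multi-line mode is the re.MULTILINE flag; the sample text is invented.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each line break ...
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
# ... and $ also matches right before each line break.
ends_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \b needs a word character on one side and a non-word character (or the
# string edge) on the other, so "concat" and "cat_1" are not matched.
boundary_starts = [m.start() for m in re.finditer(r"\bcat\b", "cat concat cat_1 cat.")]

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(ends_multiline)    # ['line', 'line']
print(boundary_starts)   # [0, 17]
```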
  7. Alternation
    • Alternation is a logical OR.
    • cat|dog matches cat and dog in "About cats and dogs".
    • cat|dog|mouse|fish: you can chain as many alternatives as you like.
    • cat|dog food matches cat or dog food.
    • To match cat food or dog food, group the alternation: (cat|dog) food.
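The precedence of the pipe can be seen in a short Python sketch. The sample sentence is made up, and finditer with group(0) is used so that the whole match is reported even when the pattern contains a capturing group.

```python
import re

simple_hits = re.findall(r"cat|dog", "About cats and dogs")

text = "cat food, dog food"

# The pipe has the lowest precedence: this reads as "cat" OR "dog food".
loose_hits = [m.group(0) for m in re.finditer(r"cat|dog food", text)]

# Grouping limits the alternation to the part inside the parentheses.
grouped_hits = [m.group(0) for m in re.finditer(r"(cat|dog) food", text)]

print(simple_hits)   # ['cat', 'dog']
print(loose_hits)    # ['cat', 'dog food']
print(grouped_hits)  # ['cat food', 'dog food']
```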
  8. Repetition
    • Question mark "?": optional
      • colou?r matches color or colour.
    • Asterisk "*": zero or more repetitions
      • <[A-Za-z][A-Za-z0-9]*> matches an HTML tag without attributes.
    • Plus "+": one or more repetitions
    • Curly braces "{n,m}": a specified number of repetitions
      • \b[1-9][0-9]{3}\b matches a number from 1000 to 9999.
      • \b[1-9][0-9]{2,4}\b matches a number from 100 to 99999.
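Each quantifier on this slide can be exercised with a small Python sketch. The test strings are invented for this example.

```python
import re

# "?" makes the preceding "u" optional.
optional_hits = re.findall(r"colou?r", "color colour colouur")

# "*" allows zero or more further characters in the tag name.
tag_hits = re.findall(r"<[A-Za-z][A-Za-z0-9]*>", "<p><h1><a href='x'><1>")

# "+" requires at least one digit.
plus_hits = re.findall(r"[0-9]+", "a1b22c")

# {3} means exactly three more digits; the \b anchors reject longer numbers.
four_digit = re.findall(r"\b[1-9][0-9]{3}\b", "999 1000 9999 10000 0123")

# {2,4} means two to four more digits.
three_to_five = re.findall(r"\b[1-9][0-9]{2,4}\b", "99 100 99999 100000")

print(optional_hits)  # ['color', 'colour']
print(tag_hits)       # ['<p>', '<h1>']
print(plus_hits)      # ['1', '22']
print(four_digit)     # ['1000', '9999']
print(three_to_five)  # ['100', '99999']
```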
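  9. Grouping and Capturing
    • Enclosing part of a pattern in parentheses groups it.
    • A quantifier can be applied to a group.
    • Set(Value)? matches Set or SetValue.
    • Ordinary parentheses create a capturing group.
    • When Set(Value)? matches SetValue, group 1 contains Value.
    • If no capture is needed, Set(?:Value)? creates a non-capturing group.
    • Do not confuse the question mark that follows the opening parenthesis in (?: with the question mark quantifier that makes the preceding item optional.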
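The difference between capturing and non-capturing groups can be sketched in Python as follows. The variable names are illustrative only.

```python
import re

capturing = re.compile(r"Set(Value)?")
non_capturing = re.compile(r"Set(?:Value)?")

with_value = capturing.fullmatch("SetValue")
without_value = capturing.fullmatch("Set")

print(with_value.group(0))     # SetValue (the whole match)
print(with_value.group(1))     # Value (contents of group 1)
print(without_value.group(1))  # None (the optional group did not participate)

# The non-capturing variant matches the same strings but stores no group.
print(capturing.groups, non_capturing.groups)       # 1 0
print(non_capturing.fullmatch("SetValue").groups())  # ()
```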
  10. Named Groups and Backreferences
    • Referring to captures by number is tedious, and the numbers shift when groups are added or removed, so groups can be given names.
    • Syntax (named group): (?P<name>group)
    • Syntax (backreference): (?P=name)
    • <(?P<tag>[A-Z][A-Z0-9]*)\b[^>]*>.*?</(?P=tag)> matches an HTML tag (same as <([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>).
    • Syntax (named capture, .NET): (?<name>group) or (?'name'group)
    • Syntax (reference by name, .NET): \k<name> or \k'name'
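The (?P<name>...) and (?P=name) forms are the ones Python's re module uses, so the HTML-tag pattern from this slide can be run as written. In this sketch the sample markup is invented; the .NET forms are not exercised here.

```python
import re

named = re.compile(r"<(?P<tag>[A-Z][A-Z0-9]*)\b[^>]*>.*?</(?P=tag)>")
numbered = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>")

# The second element opens with <I> but closes with </B>, so the
# backreference rejects it.
text = '<B class="x">bold</B> and <I>mixed</B>'

named_match = named.search(text)
numbered_match = numbered.search(text)
match_count = len(list(named.finditer(text)))

print(named_match.group(0))      # <B class="x">bold</B>
print(named_match.group("tag"))  # B (accessed by name)
print(numbered_match.group(1))   # B (accessed by number)
print(match_count)               # 1
```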
  11. Lookaround (Lookahead/Lookbehind)
    • A special kind of group that, like an anchor, constrains the position of a match without becoming part of it.
    • Example: \d+(?=€) matches an integer that is followed by "€".
      • In "1 turkey costs 30€" it matches 30.
    • Syntax (positive lookahead): X(?=Y)
    • Syntax (negative lookahead): X(?!Y)
    • Syntax (positive lookbehind): (?<=Y)X
    • Syntax (negative lookbehind): (?<!Y)X
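The four lookaround forms can be compared in one Python sketch. The first example is the one from the slide; the price string and the extra \b anchors are assumptions added here so that a number cannot be matched partially through backtracking.

```python
import re

text = "1 turkey costs 30€"

# Positive lookahead: digits that are followed by "€" (the sign is not consumed).
followed_by_euro = re.findall(r"\d+(?=€)", text)

# Negative lookahead: whole numbers that are NOT followed by "€".
not_followed = re.findall(r"\b\d+\b(?!€)", text)

prices = "$45 and 30€"

# Positive lookbehind: digits that are preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: whole numbers that are NOT preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(followed_by_euro)  # ['30']
print(not_followed)      # ['1']
print(after_dollar)      # ['45']
print(not_after_dollar)  # ['30']
```

Without the \b anchors, the negative lookahead \d+(?!€) would still match the 3 in 30€, because the engine backtracks to a shorter run of digits that is not directly followed by the sign.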