Regular expressions basics
Swift Regex
https://swiftregex.com/
What Is a Regular Expression?
• A general-purpose notation for describing a set of strings (a pattern)
• [bc]ook matches book or cook
Literal Characters
• a
• Jack is a boy
• cat
• About cats and dogs
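Special Characters (Metacharacters)
• 12 characters have a special meaning (to use one as a literal it must be escaped, e.g. 1\+1=2)
• Backslash \
• Caret ^
• Dollar sign $
• Dot (period) .
• Pipe |
• Question mark ?
• Asterisk *
• Plus +
• Opening parenthesis (
• Closing parenthesis )
• Opening square bracket [
• Opening curly brace {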
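The effect of escaping can be sketched as follows. Python's re module is used here only as an assumption, since the slides do not name a language; the variable names are illustrative.

```python
import re

# Unescaped: "+" is a quantifier, so this means one or more "1", then "1=2".
unescaped = re.compile(r"1+1=2")
# Escaped: "\+" matches a literal plus sign.
escaped = re.compile(r"1\+1=2")

print(unescaped.search("1+1=2"))          # None: the "+" in the text is never matched
print(unescaped.search("111=2").group())  # 111=2
print(escaped.search("1+1=2").group())    # 1+1=2

# re.escape() escapes the metacharacters in arbitrary literal text.
literal = re.compile(re.escape("1+1=2"))
print(literal.fullmatch("1+1=2") is not None)  # True
```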
Non-Printable Characters (Control Characters, Escape Sequences)
• \t
• Matches a tab
• \n
• Matches a line feed (newline)
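Character Classes (Character Sets)
• Match one character out of several candidates
• To match a or e, write [ae]
• Example: gr[ae]y
• Matches gray or grey
• A character class matches exactly one character
• The order of the characters inside the brackets does not matter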
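Character Classes (Character Sets)
• A hyphen inside a character class specifies a range
• [0-9]
• Matches a single digit from 0 to 9
• [0-9a-fA-F]
• Matches a single hexadecimal digit, upper or lower case
• Negated Character Classes
• [^0-9\r\n]
• Matches any character that is neither a digit nor a line break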
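The character-class patterns above can be tried out with a short sketch. Python's re module and the sample strings are assumptions chosen for illustration.

```python
import re

# A class matches exactly one character from the listed set.
grays = re.findall(r"gr[ae]y", "gray grey griy")
print(grays)  # ['gray', 'grey']

# Ranges: a single hexadecimal digit, upper or lower case.
hex_digits = re.findall(r"[0-9a-fA-F]", "x7Fg")
print(hex_digits)  # ['7', 'F']

# Negated class: anything that is neither a digit nor a line break.
others = re.findall(r"[^0-9\r\n]", "a1\nb2")
print(others)  # ['a', 'b']
```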
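Shorthand Character Classes
• Predefined notations that make frequently used character classes easy to write
• \d is shorthand for [0-9]
• In Unicode-aware engines it also matches decimal digits from other scripts, such as full-width digits
• \w "word character": same as [A-Za-z0-9_] (note that the underscore is included)
• In Unicode-aware engines it matches letters and digits from many scripts
• \s "whitespace character": matches whitespace, [ \t\r\n\f]
• In Unicode-aware engines it also matches every character in the Unicode "separator" category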
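How the shorthands behave with and without Unicode support can be sketched in Python, whose re module is Unicode-aware for str patterns by default and ASCII-only with the re.ASCII flag. The sample text is an assumption; other engines may differ in detail.

```python
import re

# "\uff17" is FULLWIDTH DIGIT SEVEN, a non-ASCII decimal digit.
text = "id_42 \uff17\tok"

unicode_digits = re.findall(r"\d", text)
ascii_digits = re.findall(r"\d", text, re.ASCII)
print(unicode_digits)  # ['4', '2', '７']
print(ascii_digits)    # ['4', '2']

# \w includes the underscore; in Unicode mode it also covers the full-width digit.
unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, re.ASCII)
print(unicode_words)  # ['id_42', '７', 'ok']
print(ascii_words)    # ['id_42', 'ok']

# \s matches the space and the tab.
spaces = re.findall(r"\s", text)
print(spaces)  # [' ', '\t']
```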
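The Dot Matches (Almost) Any Character
• Matches any single character except a line break
• With "dot matches all" or "single line" mode (the name depends on the programming language or regex engine), it matches any single character including line breaks
• gr.y matches gray, grey, gr%y, and so on
• The dot is powerful and matches almost anything, so do not overuse it
• Use a character class or a negated character class instead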
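A small sketch of the dot's behaviour, again assuming Python's re module, where "dot matches all" mode is called re.DOTALL. The quoted-string example is an illustration added here to show why a negated class is often safer than a dot.

```python
import re

# By default the dot does not match a line break.
default_dot = re.findall(r"gr.y", "gray grey gr%y gr\ny")
print(default_dot)  # ['gray', 'grey', 'gr%y']

# In "dot matches all" mode it does.
dotall = re.findall(r"gr.y", "gr\ny", re.DOTALL)
print(dotall)  # ['gr\ny']

# The dot is greedy enough to run across both quoted strings ...
quoted = 'say "a" and "b"'
with_dot = re.findall(r'".*"', quoted)
print(with_dot)  # ['"a" and "b"']

# ... while a negated class stops at the next quote.
with_class = re.findall(r'"[^"\r\n]*"', quoted)
print(with_class)  # ['"a"', '"b"']
```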
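Anchors
• Match a position rather than a character
• ^
• Matches at the start of the string
• $
• Matches at the end of the string
• Most regex engines have a "multi-line" mode, in which ^ also matches after a line break and $ before a line break
• \b
• Matches at a word boundary
• A word boundary is the position between a character that \w can match and one that it cannot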
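Because anchors match positions, it helps to look at match offsets. The following sketch assumes Python's re module, where "multi-line" mode is re.MULTILINE; the sample text is made up for illustration.

```python
import re

text = "cat\ncatalog cat"

def starts(pattern, flags=0):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

# ^ matches only at the start of the string by default.
caret_default = starts(r"^cat")
# In multi-line mode ^ also matches right after a line break.
caret_multiline = starts(r"^cat", re.MULTILINE)
print(caret_default, caret_multiline)  # [0] [0, 4]

# $ matches at the end of the string; in multi-line mode also before a line break.
dollar_default = starts(r"cat$")
dollar_multiline = starts(r"cat$", re.MULTILINE)
print(dollar_default, dollar_multiline)  # [12] [0, 12]

# \b keeps "cat" from matching inside "catalog".
whole_words = starts(r"\bcat\b")
print(whole_words)  # [0, 12]
```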
Alternation
• Logical OR
• cat|dog
• About cats and dogs
• cat|dog|mouse|fish
• Any number of alternatives can be chained
• cat|dog food
• Matches cat or dog food
• To match cat food or dog food, group the alternation: (cat|dog) food
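The precedence of | can be checked with a brief sketch (Python's re module assumed). fullmatch is used so that the whole input must match the pattern.

```python
import re

animals = re.findall(r"cat|dog", "About cats and dogs")
print(animals)  # ['cat', 'dog']

# Without a group, the alternatives are "cat" and "dog food".
ungrouped = re.compile(r"cat|dog food")
print(ungrouped.fullmatch("cat food"))  # None
print(ungrouped.fullmatch("dog food") is not None)  # True

# With a group, " food" applies to both alternatives.
grouped = re.compile(r"(cat|dog) food")
print(grouped.fullmatch("cat food") is not None)  # True
```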
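Repetition
• Question mark "?"
• Optional (zero or one occurrence)
• colou?r matches color or colour
• Asterisk "*"
• Zero or more repetitions
• <[A-Za-z][A-Za-z0-9]*>
• Matches an HTML tag without attributes
• Plus "+"
• One or more repetitions
• Curly braces "{n,m}"
• A specified number of repetitions
• \b[1-9][0-9]{3}\b
• Matches a number from 1000 to 9999
• \b[1-9][0-9]{2,4}\b
• Matches a number from 100 to 99999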
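The quantifiers on this slide can be exercised with a few sample strings. This is a sketch assuming Python's re module, and it writes the four-digit pattern with \b on both sides.

```python
import re

spellings = re.findall(r"colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']

# Only tags without attributes match.
tags = re.findall(r"<[A-Za-z][A-Za-z0-9]*>", "<p><h1><a href='x'>")
print(tags)  # ['<p>', '<h1>']

# Exactly three digits after the leading non-zero digit.
four_digits = re.findall(r"\b[1-9][0-9]{3}\b", "999 1000 9999 10000 0123")
print(four_digits)  # ['1000', '9999']

# Between two and four digits after the leading non-zero digit.
three_to_five = re.findall(r"\b[1-9][0-9]{2,4}\b", "99 100 99999 100000")
print(three_to_five)  # ['100', '99999']
```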
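Grouping and Capturing
• Enclosing part of a pattern in parentheses groups it
• A quantifier can be applied to a group
• Set(Value)?
• Matches Set or SetValue
• Ordinary parentheses create a capturing group
• When Set(Value)? matches SetValue, group 1 contains Value
• If the capture is not needed, Set(?:Value)? creates a non-capturing group
• Do not confuse the question mark right after an opening parenthesis with the question mark quantifier that makes the preceding item optional (zero or one occurrence)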
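A minimal sketch of capturing versus non-capturing groups, assuming Python's re module, where an optional group that did not take part in the match is reported as None.

```python
import re

capturing = re.compile(r"Set(Value)?")

with_value = capturing.fullmatch("SetValue")
print(with_value.group(1))  # Value

without_value = capturing.fullmatch("Set")
print(without_value.group(1))  # None: the optional group did not participate

# (?:...) groups without capturing, so the pattern has no numbered groups.
non_capturing = re.compile(r"Set(?:Value)?")
print(capturing.groups, non_capturing.groups)  # 1 0
print(non_capturing.fullmatch("SetValue") is not None)  # True
```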
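Backreferences
• Match the same text that a capturing group captured (matched)
• The result matched by a capturing group can be reused
• <([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
• Matches an HTML element (the opening tag name matched by the capturing group is reused in the closing tag)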
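The backreference pattern can be sketched as below, assuming Python's re module and assuming the closing tag is written as </\1>. re.IGNORECASE is added so that lowercase tag names match; the sample HTML is made up.

```python
import re

# \1 must repeat whatever group 1 captured as the opening tag name.
element = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>", re.IGNORECASE)

hit = element.search('<b class="x">bold</b> <i>it</i>')
print(hit.group(0))  # <b class="x">bold</b>
print(hit.group(1))  # b

# A mismatched closing tag does not satisfy the backreference.
miss = element.search("<b>bold</i>")
print(miss)  # None
```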
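Named Groups and Backreferences
• Tracking captures by number is tedious, and the numbers shift when groups are added or removed, so groups can be given names
• Syntax (named group)
• (?P<name>group)
• Syntax (backreference)
• (?P=name)
• <(?P<tag>[A-Z][A-Z0-9]*)\b[^>]*>.*?</(?P=tag)>
• Matches an HTML element (same as <([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>)
• Syntax (named capture, .NET)
• (?<name>group) or (?'name'group)
• Syntax (reference by name, .NET)
• \k<name> or \k'name'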
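A sketch of the named-group form in Python, whose re module uses the (?P<name>...) and (?P=name) syntax; the .NET-style \k<name> reference is not accepted by Python's re. The group name tag and the sample HTML are illustrative.

```python
import re

# (?P<tag>...) names the group; (?P=tag) refers back to it by name.
element = re.compile(
    r"<(?P<tag>[A-Z][A-Z0-9]*)\b[^>]*>.*?</(?P=tag)>",
    re.IGNORECASE,
)

hit = element.search("<h1>Title</h1><p>text</p>")
print(hit.group(0))      # <h1>Title</h1>
print(hit.group("tag"))  # h1
# A named group is still reachable by its number.
print(hit.group(1))      # h1
print(hit.groupdict())   # {'tag': 'h1'}
```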
Lookaround (Lookahead and Lookbehind)
• Special groups that, like anchors, constrain where a match may occur and are not part of the match result
• Example: \d+(?=€)
• Matches digits that are followed by "€"
• Matches the 30 in 1 turkey costs 30€
• Syntax (positive lookahead)
• X(?=Y)
• Syntax (negative lookahead)
• X(?!Y)
• Syntax (positive lookbehind)
• (?<=Y)X
• Syntax (negative lookbehind)
• (?<!Y)X
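The four lookaround forms can be compared on one string. This is a sketch assuming Python's re module; the sample text extends the slide's example with a dollar price, and \b is added in the negative cases so that a number is not matched partially.

```python
import re

text = "1 turkey costs 30€, or $35"

# Positive lookahead: digits followed by "€" (the "€" is not part of the match).
euro_amounts = re.findall(r"\d+(?=€)", text)
print(euro_amounts)  # ['30']

# Negative lookahead: whole numbers not followed by "€".
not_euro = re.findall(r"\b\d+\b(?!€)", text)
print(not_euro)  # ['1', '35']

# Positive lookbehind: digits preceded by "$".
dollar_amounts = re.findall(r"(?<=\$)\d+", text)
print(dollar_amounts)  # ['35']

# Negative lookbehind: numbers not preceded by "$".
not_dollar = re.findall(r"(?<!\$)\b\d+", text)
print(not_dollar)  # ['1', '30']
```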
References• Regular-Expressions.info https://www.regular-expressions.info/• Swift Regex https://swiftregex.com/