Looking into Ruby's Regular Expressions

Yasuhiroki
September 26, 2018

A talk in the "I looked into it" genre.

Transcript

  1. Extracting hashtags
     $ cat text
     #たぐ # #タグ　#たぐ #
     $ cat text | ruby -ne 'p $_.scan(/????/)'
     ["#たぐ", "#タグ", "#たぐ", "#"]
  2. Extracting hashtags
     $ cat text
     #たぐ # #タグ　#たぐ #
     $ cat text | ruby -ne 'p $_.scan(/#[^#\s]+/)'
     ["#たぐ", "#タグ　", "#たぐ", "#"]
     The full-width space is not stripped!
  3. Extracting hashtags
     $ cat text | ruby -ne 'p $_.scan(/#[^#\s]+/)'
     ["#たぐ", "#タグ　", "#たぐ", "#"]
     The full-width space is not stripped.
     $ cat text | \
       ruby -ne 'p $_.scan(/#[^#[:space:]]+/)'
     ["#たぐ", "#タグ", "#たぐ", "#"]
     This one works.
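
The difference between the two patterns on slide 3 can be reproduced without a file. The snippet below is a minimal sketch: the sample string is an assumption (a simplified version of the slide's input, with the full-width space written as `\u3000` so it is visible), and the variable names are made up for illustration.

```ruby
# Assumed sample line: an ASCII space after the first tag,
# a full-width space (U+3000) after the second.
text = "#たぐ #タグ\u3000#たぐ"

# By default \s covers only ASCII whitespace, so U+3000 stays attached to the tag.
with_backslash_s = text.scan(/#[^#\s]+/)

# The POSIX bracket [[:space:]] also covers Unicode whitespace such as U+3000.
with_posix_bracket = text.scan(/#[^#[:space:]]+/)

p with_backslash_s
p with_posix_bracket
```

Note that the POSIX bracket has to sit inside a bracket expression, so the negated class is written `[^#[:space:]]` with two closing brackets.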
  4. Character Property
     • \p{} lets you specify a Unicode Character Property
     • Hiragana, katakana, kanji, emoji, special characters: all sorts of things can be matched
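
As a small illustration of `\p{}`, the sketch below splits a mixed string by script. The sample string and variable names are assumptions for illustration; `Hiragana`, `Katakana`, and `Han` are Unicode script names that Ruby's regexp engine accepts.

```ruby
# Assumed sample: hiragana, katakana, kanji, then ASCII letters.
sample = "ひらがなカタカナ漢字abc"

# Each \p{...} matches characters belonging to the named Unicode script.
hiragana = sample.scan(/\p{Hiragana}+/)
katakana = sample.scan(/\p{Katakana}+/)
kanji    = sample.scan(/\p{Han}+/)

p hiragana
p katakana
p kanji
```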
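  5. Absence operator
     • (?~) expresses "does not contain this string"
     • (?~abc) means "does not contain the string abc"
       • ab and ac are allowed
     • For reference: [^abc] means "contains none of the characters a, b, c"
       • ab and ac are not allowed either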
  6. Extracting comments
     "/* ここの実装は最悪です(*^o^*)/ */"
       .match(%r{/\*(?~\*/)\*/})
     => #<MatchData "/* ここの実装は最悪です(*^o^*)/ */">
     "/* ここの実装は最悪です(*^o^*)/ */"
       .match(%r{/\*[^\*]*\*+(([^\*/][^\*]*)\*+)*/})
     => #<MatchData "/* ここの実装は最悪です(*^o^*)/ */" 1:")/ *" 2:")/ ">
     Without (?~) this gets a bit painful.
     See: https://qiita.com/k-takata/items/4e45121081c83d3d5bfd
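
The contrast between `(?~abc)` and `[^abc]` from slides 5 and 6 can be checked directly. This is a minimal sketch assuming a Ruby version that supports the absence operator (2.4.1 or later, as far as I know); the input strings and variable names are made up for illustration.

```ruby
# (?~abc) matches any string that does not contain "abc" as a substring.
absent_allows_ab_ac   = /\A(?~abc)\z/.match?("ab ac")
absent_rejects_abc    = /\A(?~abc)\z/.match?("xabcx")

# [^abc]* rejects every single a, b, or c, so "ab ac" fails as well.
negated_class_matches = /\A[^abc]*\z/.match?("ab ac")

# Assumed sample source with two C-style comments.
source = "/* first */ code /* second (*^o^*)/ */"
comments = source.scan(%r{/\*(?~\*/)\*/})

p absent_allows_ab_ac, absent_rejects_abc, negated_class_matches
p comments
```

Because the absence operator stops before the first `*/`, each comment is captured separately instead of one match running from the first `/*` to the last `*/`.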
  7. What does the Onigmo documentation say?
     • \s
       • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
       • Line_Separator, Paragraph_Separator, Space_Separator
     • [:space:]
       • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
       • Line_Separator, Paragraph_Separator, Space_Separator
  8. What does the Onigmo documentation say?
     • \s
       • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
       • Line_Separator, Paragraph_Separator, Space_Separator
       • Whether non-ASCII characters are included depends on the ONIG_OPTION_ASCII_RANGE option.
     • [:space:]
       • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
       • Line_Separator, Paragraph_Separator, Space_Separator
       • Whether non-ASCII characters are matched depends on the ONIG_OPTION_ASCII_RANGE option and the ONIG_OPTION_POSIX_BRACKET_ALL_RANGE option.
  9. The character set options give a hint
     • d: Default (compatible with Ruby 1.9.3)
       \w, \d, \s do not match non-ASCII characters.
       POSIX brackets follow the rules of each encoding.
     • u: Unicode
       The ONIG_OPTION_ASCII_RANGE option is turned off.
       \w (\W), \d (\D), \s (\S), \b (\B), and POSIX brackets follow the rules of each encoding.
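
The options on slide 9 can be tried as inline flags in a Ruby regexp. The sketch below is an illustration under the assumption that `(?u)` is accepted inline, as described in the Onigmo syntax documentation; the variable names are made up. With the default setting `\s` should not match a full-width space, while `(?u)\s` and `[[:space:]]` should.

```ruby
# A full-width space (U+3000), written as an escape so it is visible.
fullwidth_space = "\u3000"

# Default (d): \s is limited to ASCII whitespace.
default_s  = fullwidth_space.match?(/\s/)

# Unicode (u): the ASCII range restriction is switched off for \s.
unicode_s  = fullwidth_space.match?(/(?u)\s/)

# POSIX bracket: follows the encoding's rules even under the default option.
posix_space = fullwidth_space.match?(/[[:space:]]/)

p default_s, unicode_s, posix_space
```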