
Looking into Ruby's regular expressions (Rubyの正規表現を調べてみた)

Yasuhiroki
September 26, 2018


A talk in the "I looked into it" genre.


Transcript

  1. Looking into Ruby's regular expressions
    @yasuhiroki (tw: @duck_yasuhiroki)

  2. About me
    • Yasuhiroki (Twitter: @duck_yasuhiroki)
    • エーテンラボ株式会社 (A10 Lab Inc.)
    • Server-side engineer
    • AWS
    • Ruby on Rails
    • Occasionally does Android too

  3. What prompted this talk

  4. Extracting hashtags
    $ cat text
    タグ #タグ ##タグ# ## #グ#たぐ##tag g タグ #た# #たぐ # #タグ　#たぐ #

  5. Extracting hashtags
    $ cat text
    #たぐ # #タグ　#たぐ #
    $ cat text | ruby -ne 'p $_.scan(/????/)'
    ["#たぐ", "#タグ", "#たぐ", "#"]

  6. Extracting hashtags
    $ cat text
    #たぐ # #タグ　#たぐ #
    $ cat text | ruby -ne 'p $_.scan(/#[^#\s]+/)'
    ["#たぐ", "#タグ　", "#たぐ", "#"]
    The full-width space is not stripped!

  7. Extracting hashtags
    $ cat text | ruby -ne 'p $_.scan(/#[^#\s]+/)'
    ["#たぐ", "#タグ　", "#たぐ", "#"]
    The full-width space is not stripped
    $ cat text | \
      ruby -ne 'p $_.scan(/#[^#[:space:]]+/)'
    ["#たぐ", "#タグ", "#たぐ", "#"]
    This one works

  8. Why do \s and [:space:] behave differently?
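The difference the slides run into can be reproduced in a few lines. This is a minimal sketch, assuming a UTF-8 string; the variable names are illustrative and the sample text is a shortened stand-in for the file shown on the slides, with `\u3000` written out so the full-width space is visible.

```ruby
# Sample text: the second tag is followed by U+3000 (full-width space).
text = "#たぐ #タグ\u3000#たぐ"

# \s is ASCII-only by default, so U+3000 is kept as part of the tag.
with_backslash_s = text.scan(/#[^#\s]+/)

# The POSIX bracket [[:space:]] also covers U+3000, so the tag ends there.
with_posix_space = text.scan(/#[^#[:space:]]+/)

p with_backslash_s
p with_posix_space
```

Note the doubled brackets: `[:space:]` is only valid inside a character class, so the negated class is written `[^#[:space:]]`.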

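  9. Looking into Ruby's regular expressions

  10. Assumptions for this talk
    • You know Ruby
    • You know a little about regular expressions
    • You can read something like /Ebisu\.rb#\d+/
    • (By the way) it matches "Ebisu.rb#18"

  11. Looking into Ruby's regular expressions

  12. Ruby's regular expression engine
    • Onigmo (鬼雲) https://github.com/k-takata/Onigmo/
    • Ruby's regular expression engine
    • Adopted since Ruby 2.0
    • Also used in Perl?
    • Not used anywhere else?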
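  13. Ruby's regular expressions
    • Picking out the things that made me go "oh!" while researching
    • Character Property
    • Subexpression calls
    • Absence operator
    • Lookahead and lookbehind

  14. Ruby's regular expressions
    • Picking out the things that made me go "oh!" while researching
    • Character Property
    • Subexpression calls
    • Absence operator
    • Lookahead and lookbehind

  15. Character Property
    • \p{} lets you specify a Unicode Character Property
    • Hiragana, katakana, kanji, emoji, special characters: all sorts of things can be matched

  16. Matching hiragana and katakana
    "みんチャレ".match(/\p{Hiragana}+/)
    => #<MatchData "みん">
    "みんチャレ".match(/\p{Katakana}+/)
    => #<MatchData "チャレ">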
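As a small extension of the slide above, the same property syntax can split a mixed string by script. This is only a sketch: the second sample string and the use of `\p{Han}` for kanji are additions for illustration, not taken from the slides.

```ruby
# Pick out runs of characters by Unicode script property.
name = "みんチャレ"

hiragana = name[/\p{Hiragana}+/]  # first run of hiragana
katakana = name[/\p{Katakana}+/]  # first run of katakana

# \p{Han} covers kanji; scan returns every run, not only the first.
kanji = "正規表現を調べる".scan(/\p{Han}+/)

p hiragana, katakana, kanji
```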

  17. Matching emoji
    "ぐーぱんち".match(/\p{Emoji}/)
    => #<MatchData "">

  18. Matching emoji with a skin tone
    "ぐー$ぱんち".match(/\p{Emoji}/)
    => #<MatchData "">
    The color gets stripped off
    "ぐー$ぱんち".match(/\p{Emoji}\p{Emoji_Modifier}/)
    #=> #<MatchData "$">
    Specify the color (skin) as well and it works
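The skin-tone behavior can be sketched with explicit code points, which avoids depending on how the emoji is rendered. The choice of U+1F44A (fist) and U+1F3FD (medium skin tone) is an assumption for illustration, and the sketch assumes a Ruby version whose regex engine knows the Emoji properties, as the one used for the slides evidently did.

```ruby
# U+1F44A (fist emoji) followed by U+1F3FD (a skin tone modifier).
punch = "ぐー\u{1F44A}\u{1F3FD}ぱんち"

# \p{Emoji} alone matches a single code point: the base emoji only.
base_only = punch[/\p{Emoji}/]

# Allowing an optional modifier keeps the skin tone attached, and still
# matches emoji that carry no modifier at all.
with_tone = punch[/\p{Emoji}\p{Emoji_Modifier}?/]

p base_only.codepoints.map { |cp| format("U+%04X", cp) }
p with_tone.codepoints.map { |cp| format("U+%04X", cp) }
```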
  19. Don't cry… (´°̥̥̥ω°̥̥̥｀)
    "(´°̥̥̥ω°̥̥̥｀)".gsub(/\p{Combining_Mark}/, '')
    => "(´°ω°｀)"
    Replace the combining characters with an empty string and the tears are wiped away!
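The same trick works on ordinary text with accents. A minimal sketch, assuming UTF-8 input; the sample word is illustrative. Precomposed characters such as U+00E9 contain no separate combining mark, so the second half first decomposes the string with `String#unicode_normalize`.

```ruby
# "e" followed by U+0301 (combining acute accent): five code points.
decomposed = "cafe\u0301"
stripped = decomposed.gsub(/\p{Combining_Mark}/, "")

# A precomposed U+00E9 has no combining mark to remove, so decompose
# it to NFD first and then strip the marks.
precomposed = "caf\u00E9"
untouched = precomposed.gsub(/\p{Combining_Mark}/, "")
normalized = precomposed.unicode_normalize(:nfd).gsub(/\p{Combining_Mark}/, "")

p stripped, untouched, normalized
```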
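  20. Ruby's regular expressions
    • Picking out the things that made me go "oh!" while researching
    • Character Property
    • Subexpression calls
    • Absence operator
    • Lookahead and lookbehind

  21. Subexpression calls
    • \g<name> calls the group's expression itself
    • This is different from backreferences such as \1 and \2

  22. Melos, being ordered around
    "走れメロス怒れメロス伻れメロス"
      .match(/(.れメロス)\g<1>\g<1>/)
    => #<MatchData "走れメロス怒れメロス伻れメロス" 1:"伻れメロス">
    (.れメロス)\g<1>\g<1> is the same pattern as (.れメロス)(.れメロス)(.れメロス)
    "走れメロス寝るメロス伻れメロス"
      .match(/(.れメロス)\g<1>\g<1>/)
    => nil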
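The contrast with backreferences may be easier to see on ASCII input. This is a sketch with made-up patterns: `\g<1>` re-runs the pattern of group 1, while `\1` demands the exact text that group 1 captured. The named group `octet` is a hypothetical example of reusing a sub-pattern.

```ruby
# \g<1> re-runs the pattern of group 1; \1 requires the same text again.
call_pattern    = /\A(\d{3})-\g<1>\z/
backref_pattern = /\A(\d{3})-\1\z/

call_differs    = call_pattern.match?("123-456")     # pattern reused
backref_differs = backref_pattern.match?("123-456")  # text is not repeated
backref_same    = backref_pattern.match?("123-123")  # text is repeated

# Named groups are called the same way, which keeps repetitive
# patterns short. This checks the shape only, not the 0..255 range.
ipv4_like = /\A(?<octet>\d{1,3})(?:\.\g<octet>){3}\z/
ipv4_ok = ipv4_like.match?("192.168.0.1")

p call_differs, backref_differs, backref_same, ipv4_ok
```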
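  23. Ruby's regular expressions
    • Picking out the things that made me go "oh!" while researching
    • Character Property
    • Subexpression calls
    • Absence operator
    • Lookahead and lookbehind

  24. Absence operator
    • (?~) expresses "does not contain this string"
    • (?~abc) means "does not contain the string abc"
      • ab and ac are allowed
    • For comparison, [^abc] matches a character that is none of a, b, c
      • ab and ac are rejected as well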
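The distinction between the two forms can be checked directly. A minimal sketch, assuming a Ruby version that supports the absence operator; the anchors `\A` and `\z` and the sample source line are additions for illustration.

```ruby
# (?~abc) matches any string that does not contain "abc" as a substring.
no_abc = /\A(?~abc)\z/
allows_ab   = no_abc.match?("ab")     # "ab" alone is fine
rejects_abc = no_abc.match?("xabcx")  # contains "abc"

# [^abc]* rejects any string that contains a, b, or c at all.
no_a_b_c = /\A[^abc]*\z/
rejects_ab = no_a_b_c.match?("ab")

# Practical use: pick out each C-style comment separately, because the
# comment body may not run past the first "*/".
source = "int a; /* first */ int b; /* second */"
comments = source.scan(%r{/\*(?~\*/)\*/})

p allows_ab, rejects_abc, rejects_ab, comments
```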
  25. Extracting comments
    "/* ここの実装は最悪です(*^o^*)/ */"
      .match(%r{/\*(?~\*/)\*/})
    => #<MatchData "/* ここの実装は最悪です(*^o^*)/ */">
    "/* ここの実装は最悪です(*^o^*)/ */"
      .match(%r{/\*[^\*]*\*+(([^\*/][^\*]*)\*+)*/})
    => #<MatchData "/* ここの実装は最悪です(*^o^*)/ */" 1:")/ *" 2:")/ ">
    Without (?~) it gets a bit painful
    See: https://qiita.com/k-takata/items/4e45121081c83d3d5bfd

  26. Ruby's regular expressions
    • Picking out the things that made me go "oh!" while researching
    • Character Property
    • Subexpression calls
    • Absence operator
    • Lookahead and lookbehind

  27. Lookahead and lookbehind
    • (?=), (?<=), and so on
    • Used when something should be part of the match condition but not part of the matched result
    • Depending on how you use them, they can act like AND
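The AND-like use of lookahead mentioned above can be sketched as a small helper. `contains_all?` is a hypothetical name introduced here, not something from the talk or from Ruby itself: it chains one `(?=.*word)` per word at the start of the string, so every word must appear somewhere, in any order.

```ruby
# Hypothetical helper: true when `text` contains every word, in any order.
def contains_all?(text, words)
  # One lookahead per word; each looks ahead from the start of the text.
  lookaheads = words.map { |word| "(?=.*#{Regexp.escape(word)})" }.join
  # MULTILINE lets "." cross newlines, so multi-line text works too.
  Regexp.new("\\A#{lookaheads}", Regexp::MULTILINE).match?(text)
end

tokyo = "東京都渋谷区恵比寿1-8-5"
kyoto = "京都府渋谷区恵比寿1-8-5"

p contains_all?(tokyo, %w[東京 渋谷 恵比寿])
p contains_all?(kyoto, %w[東京 渋谷 恵比寿])

# Lookbehind: require "恵比寿" before the match without including it.
street_number = tokyo[/(?<=恵比寿)\S+/]
p street_number
```

Building the pattern from a word list keeps the lookaheads in step with the data; `Regexp.escape` guards against words that contain regex metacharacters.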
  28. Getting only the street number in Ebisu
    "東京都渋谷区恵比寿1-8-5 東洋ビル 3階"
      .match(/(?<=恵比寿)\S+/)
    => #<MatchData "1-8-5">

  29. Matching only when it is Ebisu, in Shibuya, in Tokyo
    "東京都渋谷区恵比寿1-8-5 東洋ビル 3階"
      .match(/(?=.*東京)(?=.*渋谷)(?=.*恵比寿).*/)
    => #<MatchData "東京都渋谷区恵比寿1-8-5 東洋ビル 3階">
    "京都府渋谷区恵比寿1-8-5 東洋ビル 3階"
      .match(/(?=.*東京)(?=.*渋谷)(?=.*恵比寿).*/)
    => nil
    With Kyoto (京都) it does not match
  30. Bonus
    • I looked into why \s and [:space:] differ

  31. What does the Onigmo documentation say?
    • \s
      • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
      • Line_Separator, Paragraph_Separator, Space_Separator
    • [:space:]
      • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
      • Line_Separator, Paragraph_Separator, Space_Separator

  32. What does the Onigmo documentation say?
    • \s
      • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
      • Line_Separator, Paragraph_Separator, Space_Separator
      • Whether characters outside ASCII are included depends on the ONIG_OPTION_ASCII_RANGE option.
    • [:space:]
      • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
      • Line_Separator, Paragraph_Separator, Space_Separator
      • Whether it matches characters outside ASCII depends on the ONIG_OPTION_ASCII_RANGE option and the ONIG_OPTION_POSIX_BRACKET_ALL_RANGE option.

  33. The character-set options offer a hint
    • d: Default (Ruby 1.9.3 compatible)
      \w, \d, \s do not match non-ASCII characters.
      POSIX brackets follow the rules of each encoding.
    • u: Unicode
      The ONIG_OPTION_ASCII_RANGE option is turned off.
      \w (\W), \d (\D), \s (\S), \b (\B), and POSIX brackets follow the rules of each encoding.

  34. Behavior of \s and [:space:]
    • By default
      • \s: ASCII characters only
      • [:space:]: non-ASCII characters as well

  35. What if you specify a character-set option?
    "#タグ　".scan(/(?u)#[^#\s]+/)
    => ["#タグ"]
    With this, \s handles the full-width space too
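The three behaviors described in the bonus section can be compared on the full-width space alone. A minimal sketch, assuming a UTF-8 string; the variable names are illustrative.

```ruby
# U+3000 is the full-width (ideographic) space used in Japanese text.
ideographic_space = "\u3000"

default_s = ideographic_space.match?(/\s/)            # default: ASCII-only \s
posix     = ideographic_space.match?(/[[:space:]]/)   # POSIX bracket
unicode_s = ideographic_space.match?(/(?u)\s/)        # Unicode option on

# With (?u), the original hashtag pattern stops at the full-width space.
tags = "#タグ\u3000".scan(/(?u)#[^#\s]+/)

p default_s, posix, unicode_s, tags
```

Writing `(?u)` at the start of the pattern applies the option to the whole expression, so an existing `\s`-based pattern can be kept as is.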

  36. References
    • [Regular expressions (正規表現)](https://docs.ruby-lang.org/ja/latest/doc/spec=2fregexp.html)
    • [Regexp class (Regexpクラス)](https://docs.ruby-lang.org/ja/latest/class/Regexp.html)
    • [Emoji Properties](http://unicode.org/reports/tr51/#Emoji_Properties)
    • [Onigmo (鬼雲)](https://github.com/k-takata/Onigmo/)
    • [The story of implementing the absence operator in Onigmo (鬼雲に非包含オペレータを実装した話)](https://qiita.com/k-takata/items/4e45121081c83d3d5bfd)
    • [Regular expression notes (正規表現メモ)](http://www.kt.rim.or.jp/~kbk/regex/regex.html)

  37. Thank you very much