Slide 1

Slide 1 text

I Looked into Ruby's Regular Expressions @yasuhiroki (tw: @duck_yasuhiroki)

Slide 2

Slide 2 text

Self-introduction
• Yasuhiroki (Twitter: @duck_yasuhiroki)
• A10 Lab Inc. (エーテンラボ株式会社)
• Server-side engineer
• AWS
• Ruby on Rails
• Occasionally does Android too

Slide 3

Slide 3 text

What prompted this talk

Slide 4

Slide 4 text

I want to extract hashtags
$ cat text
タグ #タグ ##タグ# ## #グ#たぐ##tag g タグ #た# #たぐ # #タグ　#たぐ #

Slide 5

Slide 5 text

I want to extract hashtags
$ cat text
#たぐ # #タグ　#たぐ #
$ cat text | ruby -ne 'p $_.scan(/????/)'
["#たぐ", "#タグ", "#たぐ", "#"]

Slide 6

Slide 6 text

I want to extract hashtags
$ cat text
#たぐ # #タグ　#たぐ #
$ cat text | ruby -ne 'p $_.scan(/#[^#\s]+/)'
["#たぐ", "#タグ　", "#たぐ", "#"]
The full-width space is not stripped!

Slide 7

Slide 7 text

I want to extract hashtags
$ cat text | ruby -ne 'p $_.scan(/#[^#\s]+/)'
["#たぐ", "#タグ　", "#たぐ", "#"]
The full-width space is not stripped
$ cat text | \
  ruby -ne 'p $_.scan(/#[^#[:space:]]+/)'
["#たぐ", "#タグ", "#たぐ", "#"]
This one works
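The difference between the two patterns can be reproduced without a text file. The sketch below is only an illustration: the helper names `extract_hashtags` and `extract_hashtags_ascii` are made up here, and the sample line is the one from the slide with the full-width space written as `\u3000`.

```ruby
# Extract hashtags, treating any Unicode whitespace as a delimiter.
# (Illustrative helper, not from the talk.)
def extract_hashtags(text)
  text.scan(/#[^#[:space:]]+/)
end

# Same idea with \s, which only covers ASCII whitespace by default.
def extract_hashtags_ascii(text)
  text.scan(/#[^#\s]+/)
end

line = "#たぐ # #タグ\u3000#たぐ #" # \u3000 is the full-width space

p extract_hashtags_ascii(line) # the second tag keeps its trailing U+3000
p extract_hashtags(line)       # the trailing U+3000 is excluded
```

Because `+` requires at least one character after `#`, a bare `#` followed by whitespace is not returned by either helper.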

Slide 8

Slide 8 text

Why do \s and [:space:] behave differently?

Slide 9

Slide 9 text

I looked into Ruby's regular expressions

Slide 10

Slide 10 text

Assumptions for this talk
• You know Ruby
• You know a little about regular expressions
• You can read something like /Ebisu\.rb#\d+/
• (By the way) it matches "Ebisu.rb#18"

Slide 11

Slide 11 text

I looked into Ruby's regular expressions

Slide 12

Slide 12 text

Ruby's regular expression engine
• Onigmo (鬼雲) https://github.com/k-takata/Onigmo/
• Ruby's regular expression engine
• Adopted as of Ruby 2.0
• Also used in Perl?
• Not used anywhere else?

Slide 13

Slide 13 text

Ruby's regular expressions
• A pick of the things that made me go "oh!" while researching
• Character Property
• Subexpression calls
• Absence operator
• Lookahead and lookbehind

Slide 14

Slide 14 text

Ruby's regular expressions
• A pick of the things that made me go "oh!" while researching
• Character Property
• Subexpression calls
• Absence operator
• Lookahead and lookbehind

Slide 15

Slide 15 text

Character Property
• \p{} lets you specify a Unicode Character Property
• You can match hiragana, katakana, kanji, emoji, special characters, and more

Slide 16

Slide 16 text

Matching hiragana and katakana
"みんチャレ".match(/\p{Hiragana}+/)
=> #
"みんチャレ".match(/\p{Katakana}+/)
=> #
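As a small sketch of script properties in use, the helper below splits a string into runs of hiragana, katakana, and kanji. The name `script_runs` and the second sample string are made up for illustration; `\p{Han}` is the script property that covers kanji.

```ruby
# Split a string into runs of each Japanese script.
# (Illustrative helper, not from the talk.)
def script_runs(text)
  {
    hiragana: text.scan(/\p{Hiragana}+/),
    katakana: text.scan(/\p{Katakana}+/),
    kanji:    text.scan(/\p{Han}+/)
  }
end

p script_runs("みんチャレ")
p script_runs("みんチャレで習慣化") # hypothetical sample containing kanji
```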

Slide 17

Slide 17 text

Matching emoji
"ぐーぱんち".match(/\p{Emoji}/)
=> #

Slide 18

Slide 18 text

Matching emoji with a skin tone
"ぐー$ぱんち".match(/\p{Emoji}/)
=> #
The color gets stripped off
"ぐー$ぱんち".match(/\p{Emoji}\p{Emoji_Modifier}/)
#=> #
Specifying the color (skin) as well makes it work
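A minimal sketch of the same idea, under two assumptions: the sample emoji is a stand-in (U+1F44A followed by the skin tone modifier U+1F3FB), and the Ruby in use is recent enough to know the `Emoji` and `Emoji_Modifier` property names. The helper names are made up for illustration.

```ruby
# Hypothetical sample text: U+1F44A followed by U+1F3FB (a skin tone modifier).
def sample_text
  "ぐー\u{1F44A}\u{1F3FB}ぱんち"
end

# Returns the first emoji code point only; a following modifier is left out.
def first_emoji(text)
  text[/\p{Emoji}/]
end

# Returns the first emoji together with its skin tone modifier, if one follows.
def first_emoji_with_tone(text)
  text[/\p{Emoji}\p{Emoji_Modifier}?/]
end

p first_emoji(sample_text).codepoints.map { |c| c.to_s(16) }
p first_emoji_with_tone(sample_text).codepoints.map { |c| c.to_s(16) }
```

The `?` after `\p{Emoji_Modifier}` is an addition to the slide's pattern so that emoji without a skin tone still match.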

Slide 19

Slide 19 text

Don't cry... (´°̥̥̥ω°̥̥̥｀)
"(´°̥̥̥ω°̥̥̥｀)".gsub(/\p{Combining_Mark}/, '')
=> "(´°ω°｀)"
Replacing the combining characters with an empty string wipes the tears away!
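The tear-wiping trick can be wrapped in a helper. This is only a sketch: `strip_combining_marks` is a made-up name, and the tears are written as `\u0325` (COMBINING RING BELOW) on the assumption that this is the combining character used in the emoticon.

```ruby
# Remove every combining mark (Unicode general category M) from a string.
# (Illustrative helper, not from the talk.)
def strip_combining_marks(text)
  text.gsub(/\p{Combining_Mark}/, "")
end

crying = "(´°\u0325\u0325\u0325ω°\u0325\u0325\u0325｀)"

p strip_combining_marks(crying)
p strip_combining_marks("e\u0301") # "e" followed by COMBINING ACUTE ACCENT
```

Precomposed characters such as U+00E9 are single code points without a separate mark, so this helper leaves them untouched.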

Slide 20

Slide 20 text

Ruby's regular expressions
• A pick of the things that made me go "oh!" while researching
• Character Property
• Subexpression calls
• Absence operator
• Lookahead and lookbehind

Slide 21

Slide 21 text

Subexpression calls
• \g<name> calls the group's expression itself
• This differs from backreferences such as \1 and \2

Slide 22

Slide 22 text

Melos being ordered around
"走れメロス怒れメロス伻れメロス".match(/(.れメロス)\g<1>\g<1>/)
=> #
(.れメロス)\g<1>\g<1> is the same as (.れメロス)(.れメロス)(.れメロス)
"走れメロス寝るメロス伻れメロス".match(/(.れメロス)\g<1>\g<1>/)
=> nil
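To make the contrast with backreferences concrete, the sketch below checks the same input with `\g<1>` (re-runs the group's pattern) and with `\1` (requires the same text again). The method names are made up for illustration, and the patterns are anchored with `\A` and `\z` so that the whole string must consist of three commands.

```ruby
# True when the text is three commands, each matching the pattern ".れメロス".
def three_commands_by_call?(text)
  /\A(.れメロス)\g<1>\g<1>\z/.match?(text)
end

# True only when the very same command text is repeated three times.
def three_commands_by_backref?(text)
  /\A(.れメロス)\1\1\z/.match?(text)
end

mixed    = "走れメロス怒れメロス走れメロス"
repeated = "走れメロス走れメロス走れメロス"

p three_commands_by_call?(mixed)       # the pattern is reused, so the verbs may differ
p three_commands_by_backref?(mixed)    # the captured text must repeat, so this fails
p three_commands_by_backref?(repeated)
```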

Slide 23

Slide 23 text

Ruby's regular expressions
• A pick of the things that made me go "oh!" while researching
• Character Property
• Subexpression calls
• Absence operator
• Lookahead and lookbehind

Slide 24

Slide 24 text

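Absence operator
• (?~) expresses "does not contain this string"
• (?~abc) means "does not contain the string abc"
  • ab and ac are allowed
• For comparison, [^abc] means "contains none of the characters a, b, c"
  • ab and ac are not allowed either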
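The two meanings can be checked side by side. This is a minimal sketch with made-up method names; it assumes a Ruby version that supports the absence operator (2.4.1 or later, as far as I know).

```ruby
# True when the whole string does not contain the substring "abc".
def without_abc?(text)
  /\A(?~abc)\z/.match?(text)
end

# True when the whole string contains none of the characters a, b, c.
def without_a_b_c_chars?(text)
  /\A[^abc]*\z/.match?(text)
end

p without_abc?("ab ac")         # "abc" never appears as a substring
p without_abc?("xabcx")         # contains "abc"
p without_a_b_c_chars?("ab ac") # contains the characters a, b, c
p without_a_b_c_chars?("xyz")
```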

Slide 25

Slide 25 text

Extracting a comment
"/* ここの実装は最悪です(*^o^*)/ */".match(%r{/\*(?~\*/)\*/})
=> #
"/* ここの実装は最悪です(*^o^*)/ */".match(%r{/\*[^\*]*\*+(([^\*/][^\*]*)\*+)*/})
=> #
Without (?~) it gets a bit painful
※ https://qiita.com/k-takata/items/4e45121081c83d3d5bfd
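The same pattern also works with `scan` when a source string holds several comments. A minimal sketch, assuming a Ruby version with absence operator support; `block_comments` and the sample source line are made up for illustration.

```ruby
# Collect every /* ... */ comment using the absence operator.
# (?~\*/) matches any text that does not contain "*/".
def block_comments(source)
  source.scan(%r{/\*(?~\*/)\*/})
end

src = "a = 1; /* first */ b = 2; /* second (*^o^*)/ */"

p block_comments(src)
```

Each comment ends at its own first `*/`, so the two comments are returned separately rather than as one long match.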

Slide 26

Slide 26 text

Ruby's regular expressions
• A pick of the things that made me go "oh!" while researching
• Character Property
• Subexpression calls
• Absence operator
• Lookahead and lookbehind

Slide 27

Slide 27 text

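Lookahead and lookbehind
• (?=), (?<=), and so on
• Use them for something that is part of the match condition but should not be included in the matched result
• Depending on how you use them, they can act like AND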

Slide 28

Slide 28 text

Getting only the block number in Ebisu
"東京都渋谷区恵比寿1-8-5 東洋ビル 3階".match(/(?<=恵比寿)\S+/)
=> #
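A small sketch comparing the lookbehind with an ordinary capture group, which gives the same result for this address. Both helper names are made up for illustration.

```ruby
# Lookbehind: "恵比寿" must precede the match but is not part of it.
def ebisu_block_number(address)
  address[/(?<=恵比寿)\S+/]
end

# Alternative without lookbehind: match "恵比寿" too, then take capture group 1.
def ebisu_block_number_by_capture(address)
  address[/恵比寿(\S+)/, 1]
end

address = "東京都渋谷区恵比寿1-8-5 東洋ビル 3階"

p ebisu_block_number(address)
p ebisu_block_number_by_capture(address)
```

The lookbehind form is handy with methods such as `scan` or `gsub`, where only the matched part should be returned or replaced.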

Slide 29

Slide 29 text

Match only when it is Ebisu, in Shibuya, in Tokyo
"東京都渋谷区恵比寿1-8-5 東洋ビル 3階".match(/(?=.*東京)(?=.*渋谷)(?=.*恵比寿).*/)
=> #
"京都府渋谷区恵比寿1-8-5 東洋ビル 3階".match(/(?=.*東京)(?=.*渋谷)(?=.*恵比寿).*/)
=> nil
Kyoto does not match
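The AND-like use of lookahead generalizes to any list of words. The sketch below builds the pattern at runtime; `contains_all?` is a made-up helper, and the `\A` anchor and `Regexp::MULTILINE` flag are additions to the slide's pattern so that the check runs once from the start and `.` also crosses newlines.

```ruby
# True when the text contains every given word, in any order.
# Each word becomes one lookahead, and all lookaheads must succeed
# at the start of the string.
def contains_all?(text, *words)
  lookaheads = words.map { |w| "(?=.*#{Regexp.escape(w)})" }.join
  Regexp.new("\\A#{lookaheads}", Regexp::MULTILINE).match?(text)
end

p contains_all?("東京都渋谷区恵比寿1-8-5 東洋ビル 3階", "東京", "渋谷", "恵比寿")
p contains_all?("京都府渋谷区恵比寿1-8-5 東洋ビル 3階", "東京", "渋谷", "恵比寿")
```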

Slide 30

Slide 30 text

Bonus
• I looked into why \s and [:space:] differ

Slide 31

Slide 31 text

What does the Onigmo documentation say?
• \s
  • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
  • Line_Separator, Paragraph_Separator, Space_Separator
• [:space:]
  • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
  • Line_Separator, Paragraph_Separator, Space_Separator

Slide 32

Slide 32 text

What does the Onigmo documentation say?
• \s
  • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
  • Line_Separator, Paragraph_Separator, Space_Separator
  • Whether non-ASCII characters are included depends on the ONIG_OPTION_ASCII_RANGE option.
• [:space:]
  • 0009, 000A, 000B, 000C, 000D, 0085(NEL)
  • Line_Separator, Paragraph_Separator, Space_Separator
  • Whether non-ASCII characters are matched depends on the ONIG_OPTION_ASCII_RANGE option and the ONIG_OPTION_POSIX_BRACKET_ALL_RANGE option.

Slide 33

Slide 33 text

A hint in the character set options
• d: Default (Ruby 1.9.3 compatible)
  \w, \d, \s do not match non-ASCII characters.
  POSIX brackets follow the rules of each encoding.
• u: Unicode
  The ONIG_OPTION_ASCII_RANGE option is turned off.
  \w (\W), \d (\D), \s (\S), \b (\B), and POSIX brackets follow the rules of each encoding.

Slide 34

Slide 34 text

Behavior of \s and [:space:]
• Default
  • \s: ASCII characters only
  • [:space:]: non-ASCII characters as well

Slide 35

Slide 35 text

What if we specify the character set option?
"#タグ　".scan(/(?u)#[^#\s]+/)
=> ["#タグ"]
Now \s handles the full-width space too
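The three behaviors discussed in the bonus section can be checked against a single character. A minimal sketch: `whitespace_matchers` is a made-up helper, and the results describe Ruby's default settings as summarized on the slides.

```ruby
# Report which whitespace patterns match the given one-character string.
def whitespace_matchers(char)
  {
    default_backslash_s: /\s/.match?(char),          # ASCII range only by default
    posix_space:         /[[:space:]]/.match?(char), # follows the encoding's rules
    unicode_backslash_s: /(?u)\s/.match?(char)       # (?u) turns the ASCII range off
  }
end

p whitespace_matchers("\u3000") # full-width space
p whitespace_matchers(" ")      # ordinary ASCII space
```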

Slide 36

Slide 36 text

References
• [Regular expressions](https://docs.ruby-lang.org/ja/latest/doc/spec=2fregexp.html)
• [Regexp class](https://docs.ruby-lang.org/ja/latest/class/Regexp.html)
• [Emoji Properties](http://unicode.org/reports/tr51/#Emoji_Properties)
• [Onigmo (鬼雲)](https://github.com/k-takata/Onigmo/)
• [How the absence operator was implemented in Onigmo](https://qiita.com/k-takata/items/4e45121081c83d3d5bfd)
• [Regular expression notes](http://www.kt.rim.or.jp/~kbk/regex/regex.html)

Slide 37

Slide 37 text

Thank you very much