Remove AS::Mb::Unicode::UnicodeDatabase

Slides for Ginza Ruby Kaigi 01 (ぎんざRuby会議01)

https://ginzarb.github.io/kaigi01/

Fumiaki MATSUSHIMA

August 05, 2017

Transcript

  1. @mtsmfm
    I wanted to remove
    ActiveSupport::Multibyte::Unicode::UnicodeDatabase

  2. Fumiaki
    MATSUSHIMA
    GitHub, Twitter
    @mtsmfm
    Web Developer

  3. https://www.quipper.com/

  4. https://ninirb.github.io

  5. https://www.meetup.com/ja-JP/GraphQL-Tokyo/

  6. http://rubykaigi.org/2017/speakers

  7. http://contributors.rubyonrails.org/

  8. Do you know which file
    is the largest in Rails?

  9. $ find vendor/bundle/gems/acti* -type f -exec du -h -a {} + | sort -h -r | head -n 10
    1.1M vendor/bundle/gems/activesupport-5.1.2/lib/active_support/values/unicode_tables.dat
    104K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_helper.rb
    100K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/associations.rb
    76K vendor/bundle/gems/actionpack-5.1.2/lib/action_dispatch/routing/mapper.rb
    60K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/date_helper.rb
    52K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/connection_adapters/abstract/schema_statements.rb
    44K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/migration.rb
    44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_tag_helper.rb
    44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_options_helper.rb
    40K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/relation/query_methods.rb

  10. 1.1M!

  11. active_support/values/unicode_tables.dat

  12. https://github.com/rails/rails/blob/16f2b2044eaaa54b7bc205ef9af1689a152b2fdf/activesupport/lib/active_support/multibyte/unicode.rb

  13. The largest file in Rails:
    the .dat file for
    ActiveSupport::Multibyte::Unicode::UnicodeDatabase

  14. https://github.com/rails/rails/pull/26743

  15. @mtsmfm
    I wanted to remove
    ActiveSupport::Multibyte::Unicode::UnicodeDatabase

  16. http://agile.esm.co.jp/news/2016-04-08-rails-study-session.html

  17. In-house Rails study session

    OSS patch session

  18. https://speakerdeck.com/a_matsuda/3x-rails

  19. https://speakerdeck.com/a_matsuda/3x-rails?slide=156

  24. What can AS::Mb::Unicode
    do in the first place?

  26. This talk is based on the Rails v5.0.0.1 codebase,
    which was current when I opened the PR
    (not much has changed since).
    That was shortly before Ruby 2.4 was released.

  27. - Normalize
    - Case mapping
    - Pack/unpack grapheme
    - Tidy bytes

  28. - Normalize
    - Case mapping
    - Pack/unpack grapheme
    - Tidy bytes (does not use AS::Mb::Unicode::UnicodeDatabase)

  29. - Normalize
    - Case mapping
    - Pack/unpack grapheme

  30. What is Unicode normalization?

  31. Decompose
    ‘が’ → [‘か’, ‘゛’]
    Compose
    [‘か’, ‘゛’] → ‘が’
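
The decomposition and composition on this slide can be tried with Ruby's built-in `String#unicode_normalize`, as a minimal sketch. The slide writes the voiced sound mark as ‘゛’ for readability; the code point that NFD produces is the combining mark U+3099, which is a detail of Unicode rather than something the deck states.

```ruby
# Decompose and compose が with String#unicode_normalize (Ruby 2.2 or later).
composed = "\u304C"                               # が as a single code point

# NFD splits it into か (U+304B) plus the combining voiced sound mark (U+3099).
decomposed = composed.unicode_normalize(:nfd)
p decomposed.codepoints                           # => [12363, 12441]

# NFC puts the two code points back together.
recomposed = decomposed.unicode_normalize(:nfc)
p recomposed == composed                          # => true
```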

  32. Methods related to normalization
    - AS::Mb::Unicode#normalize
    - AS::Mb::Unicode#decompose
    - AS::Mb::Unicode#compose
    - AS::Mb::Unicode#reorder_characters

  33. Unicode normalization
    - NFD
    - NFC
    - NFKD
    - NFKC
    N = Normalization, F = Form, D = Decompose, C = Compose

  36. “In NFKC and NFKD, a K is
    used to stand for
    compatibility to avoid
    confusion with the C
    standing for composition.”
    http://unicode.org/reports/tr15/

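  37. Unicode normalization
    - NFD
    - NFC
    - NFKD
    - NFKC
    N = Normalization, F = Form, D = Decompose, C = Compose,
    K = (C)ompatibility (compatibility equivalence)

  38. Equivalence in Unicode normalization
    - Canonical equivalence (the forms without K)
      - Reversible
    - Compatibility equivalence (the forms with K)
      - Looser; not reversible

  40. Canonical equivalence
    ‘㈱’ != [‘(’, ‘株’, ‘)’]
    Compatibility equivalence
    ‘㈱’ == [‘(’, ‘株’, ‘)’]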
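
The difference between the two equivalences can be checked with `String#unicode_normalize`. The following is a small sketch, with variable names chosen only for illustration, comparing ‘㈱’ with the three-character spelling under both kinds of normalization and showing which direction can be undone.

```ruby
# Canonical vs. compatibility equivalence (Ruby 2.2 or later).
kabu        = "\u3231"      # ㈱ as a single code point
spelled_out = "(\u682A)"    # ( 株 ) as three code points

# Canonical forms (NFD/NFC) leave ㈱ untouched, so the two strings differ.
p kabu.unicode_normalize(:nfc) == spelled_out.unicode_normalize(:nfc)    # => false

# Compatibility forms (NFKD/NFKC) map ㈱ to "(株)", so the two compare equal.
p kabu.unicode_normalize(:nfkc) == spelled_out.unicode_normalize(:nfkc)  # => true

# The compatibility mapping is one-way: composing again never yields ㈱.
p kabu.unicode_normalize(:nfkc).unicode_normalize(:nfc) == kabu          # => false

# A canonical decomposition, by contrast, can be undone.
ga = "\u304C"               # が
p ga.unicode_normalize(:nfd).unicode_normalize(:nfc) == ga               # => true
```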

  41. Methods related to normalization
    - AS::Mb::Unicode#normalize
    - AS::Mb::Unicode#decompose
    - AS::Mb::Unicode#compose
    - AS::Mb::Unicode#reorder_characters

  42. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L285-L301

  43. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L159-L177

  44. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L180-L236

  45. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L143-L136

  46. Methods related to normalization
    - AS::Mb::Unicode#normalize
    - AS::Mb::Unicode#decompose
    - AS::Mb::Unicode#compose
    - AS::Mb::Unicode#reorder_characters
    Helper methods for use by #normalize
    (why are they public...?)

  48. What about Ruby itself?

  49. https://docs.ruby-lang.org/ja/search/

  50. https://docs.ruby-lang.org/ja/search/query:unicode/query:normalize/

  51. Found it!

  52. String#unicode_normalize
    [1] pry(main)> '株'.codepoints
    => [26666]
    [2] pry(main)> '㈱'.codepoints
    => [12849]
    [3] pry(main)> '㈱'.unicode_normalize(:nfc).codepoints
    => [12849]
    [4] pry(main)> '㈱'.unicode_normalize(:nfd).codepoints
    => [12849]
    [5] pry(main)> '㈱'.unicode_normalize(:nfkc).codepoints
    => [40, 26666, 41]
    [6] pry(main)> '㈱'.unicode_normalize(:nfkd).codepoints
    => [40, 26666, 41]

  53. https://github.com/rails/rails/pull/26743/files?diff=split

  56. Ruby is handy!

  57. - Normalize
    - Case mapping
    - Pack/unpack grapheme

  59. ‘A’ ↔ ‘a’
    ‘Ä’ ↔ ‘ä’

  60. Methods related to case mapping
    - AS::Mb::Unicode#downcase
    - AS::Mb::Unicode#upcase
    - AS::Mb::Unicode#swapcase

  61. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L303-L313

  62. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L392-L402

  63. What about Ruby itself?

  64. http://rubykaigi.org/2016/presentations/duerst.html

  65. https://www.ruby-lang.org/en/news/2016/09/08/ruby-2-4-0-preview2-released/

  66. $ docker run -e LANG=C.UTF-8 --rm ruby:2.3 \
    ruby -e "p 'Ä'.downcase == 'ä'"
    false
    $ docker run -e LANG=C.UTF-8 --rm ruby:2.4 \
    ruby -e "p 'Ä'.downcase == 'ä'"
    true
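
The same check extends to the other two methods listed earlier. This is a minimal sketch that assumes Ruby 2.4 or later; on Ruby 2.3 the non-ASCII comparisons are expected to come out false, as in the docker run above.

```ruby
# Unicode-aware case mapping built into String (assumes Ruby 2.4 or later).
upper = "\u00C4"   # Ä
lower = "\u00E4"   # ä

p upper.downcase == lower               # => true
p lower.upcase == upper                 # => true
p "\u00C4b".swapcase == "\u00E4B"       # => true

# The :ascii option limits the mapping to A-Z/a-z, so Ä is left as it is.
p upper.downcase(:ascii) == upper       # => true
```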

  67. https://github.com/rails/rails/pull/26743/files?diff=split

  68. Ruby is handy!!

  69. https://www.sw.it.aoyama.ac.jp/2016/pub/RubyKaigi/

  70. https://bugs.ruby-lang.org/issues/10084

  71. - Normalize
    - Case mapping
    - Pack/unpack grapheme

  72. What is a grapheme?

  73. Grapheme
    ≈ the unit of one character

  74. ぎんざ

  75. [‘き’, ‘゛’, ‘ん’, ‘ざ’]

  76. Split by character
    [[‘き’], [‘゛’], [‘ん’], [‘ざ’]]
    Split by grapheme
    [[‘き’, ‘゛’], [‘ん’], [‘ざ’]]

  77. Methods related to
    pack/unpack grapheme
    - AS::Mb::Unicode#pack_graphemes
    - AS::Mb::Unicode#unpack_graphemes

  78. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L138-L140

  79. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L80-L133

  80. What about Ruby itself?

  81. https://docs.ruby-lang.org/ja/search/

  82. https://docs.ruby-lang.org/ja/search/query:grapheme

  83. /\X/
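
A short sketch of how `/\X/` gives the grapheme split from the ぎんざ example. The string is built with the first character in decomposed form (き followed by the combining mark U+3099), which is an assumption made here so that the two ways of splitting differ.

```ruby
# Split ぎんざ by code point and by grapheme cluster.
# The first character is written decomposed: き (U+304D) + U+3099.
ginza = "\u304D\u3099\u3093\u3056"

by_codepoint = ginza.chars          # one element per code point
by_grapheme  = ginza.scan(/\X/)     # one element per grapheme cluster

p by_codepoint.size                 # => 4
p by_grapheme.size                  # => 3
p by_grapheme.map(&:codepoints)     # => [[12365, 12441], [12435], [12374]]
```

Ruby 2.5, released after this talk, also added `String#grapheme_clusters`, which returns the same kind of split without a regular expression.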

  84. https://github.com/rails/rails/pull/26743/files

  86. Ruby's built-in features are handy!!!

  87. Or so I thought,
    but the tests don't pass

  89. https://github.com/k-takata/Onigmo/issues/46

  90. https://bugs.ruby-lang.org/issues/12831

  92. Landed in 2.4
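
As a hedged illustration of what a stricter `/\X/` changes, the sketch below uses a pair of Hangul jamo, which the Unicode segmentation rules treat as one grapheme cluster. The choice of input is an assumption made for illustration; the slides do not say which test cases failed before 2.4, and behaviour on older Rubies is not verified here.

```ruby
# Two conjoining Hangul jamo: leading consonant U+1100 + vowel U+1161.
# Neither is a combining mark, but together they form one grapheme cluster.
jamo = "\u1100\u1161"

clusters = jamo.scan(/\X/)
p jamo.chars.size      # => 2
p clusters.size        # => 1 on Ruby 2.4 and later
```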

  93. https://github.com/rails/rails/pull/26743/files

  94. - Normalize
    - Case mapping
    - Pack/unpack grapheme

  96. https://github.com/rails/rails/pull/26743

  97. Why it can't be merged

  98. Rails 5 supports
    Ruby 2.2.2 and later

  99. - Normalize
      - Available since Ruby 2.2
    - Case mapping
      - Available since Ruby 2.4
    - Pack/unpack grapheme
      - Available since Ruby 2.0
      - However, the Unicode tests
        only pass from 2.4
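
The version table above can be written down as a small check. `builtin_unicode_features` is a hypothetical helper invented for this sketch; it is not part of Rails or of the PR, and it only encodes the version numbers listed on the slide.

```ruby
require "rubygems" # Gem::Version; normally loaded already

# Hypothetical helper: which of the three features can be delegated to
# Ruby itself on a given Ruby version, following the table on the slide.
def builtin_unicode_features(ruby_version = RUBY_VERSION)
  version = Gem::Version.new(ruby_version)
  {
    normalize:    version >= Gem::Version.new("2.2"),
    case_mapping: version >= Gem::Version.new("2.4"),
    # /\X/ exists since 2.0, but the Unicode tests only pass from 2.4.
    graphemes:    version >= Gem::Version.new("2.4"),
  }
end

on_222 = builtin_unicode_features("2.2.2")
p on_222[:normalize]      # => true
p on_222[:case_mapping]   # => false

on_240 = builtin_unicode_features("2.4.0")
p on_240.values.all?      # => true
```

With Ruby 2.2.2 as the minimum, only normalization can rely on Ruby itself, which matches the reason given for the PR not being mergeable.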

  100. If it does get merged, it will be
    when the required Ruby version is raised
    ≈ Rails 6?

  101. You don't have to wait for Rails;
    you can already use these
    in your own development

  102. Maybe Ruby itself
    can already do that

  103. Summary
    - Once Rails 6 arrives, UnicodeDatabase might be removable,
      bringing us closer to 3x Rails
    - Thanks to the work of many people, things that used to
      need a gem are increasingly possible in Ruby itself

  104. Credits
    Background pattern from subtlepatterns.com
    Emoji artwork provided by Emoji One