Remove AS::Mb::Unicode::UnicodeDatabase

Slides presented at Ginza Ruby Kaigi 01

https://ginzarb.github.io/kaigi01/

Fumiaki MATSUSHIMA

August 05, 2017

Transcript

  1. I wanted to remove ActiveSupport::Multibyte::Unicode::UnicodeDatabase @mtsmfm

  2. Fumiaki MATSUSHIMA GitHub, Twitter @mtsmfm Web Developer

  3. https://www.quipper.com/

  4. https://ninirb.github.io

  5. https://www.meetup.com/ja-JP/GraphQL-Tokyo/

  6. http://rubykaigi.org/2017/speakers

  7. http://contributors.rubyonrails.org/

  8. Do you know which file is the largest in Rails?

  9. $ find vendor/bundle/gems/acti* -type f -exec du -h -a {} + | sort -h -r | head -n 10

    1.1M vendor/bundle/gems/activesupport-5.1.2/lib/active_support/values/unicode_tables.dat
    104K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_helper.rb
    100K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/associations.rb
    76K vendor/bundle/gems/actionpack-5.1.2/lib/action_dispatch/routing/mapper.rb
    60K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/date_helper.rb
    52K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/connection_adapters/abstract/schema_statements.rb
    44K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/migration.rb
    44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_tag_helper.rb
    44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_options_helper.rb
    40K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/relation/query_methods.rb
  10. $ find vendor/bundle/gems/acti* -type f -exec du -h -a {} + | sort -h -r | head -n 10

    1.1M vendor/bundle/gems/activesupport-5.1.2/lib/active_support/values/unicode_tables.dat
    104K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_helper.rb
    100K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/associations.rb
    76K vendor/bundle/gems/actionpack-5.1.2/lib/action_dispatch/routing/mapper.rb
    60K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/date_helper.rb
    52K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/connection_adapters/abstract/schema_statements.rb
    44K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/migration.rb
    44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_tag_helper.rb
    44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_options_helper.rb
    40K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/relation/query_methods.rb

    1.1M!
  11. active_support/values/unicode_tables.dat

  12. https://github.com/rails/rails/blob/16f2b2044eaaa54b7bc205ef9af1689a152b2fdf/activesupport/lib/active_support/multibyte/unicode.rb

  13. The largest file in Rails ↓ the .dat file for ActiveSupport::Multibyte::Unicode::UnicodeDatabase

  14. https://github.com/rails/rails/pull/26743

  15. I wanted to remove ActiveSupport::Multibyte::Unicode::UnicodeDatabase @mtsmfm

  16. http://agile.esm.co.jp/news/2016-04-08-rails-study-session.html

  17. In-house Rails study session ↓ OSS patch meetup

  18. https://speakerdeck.com/a_matsuda/3x-rails

  19. https://speakerdeck.com/a_matsuda/3x-rails?slide=156

  20. https://speakerdeck.com/a_matsuda/3x-rails?slide=156

  21. https://speakerdeck.com/a_matsuda/3x-rails?slide=156

  22. None
  23. None
  24. What can AS::Mb::Unicode do in the first place?

  25. None
  26. This talk is based on the Rails v5.0.0.1 codebase, as it was when I opened the PR (not much has changed since). At that time, Ruby 2.4 was just about to be released.
  27. - Normalize - Case mapping - Pack/unpack grapheme - Tidy bytes
  28. - Normalize - Case mapping - Pack/unpack grapheme - Tidy bytes (does not use AS::Mb::Unicode::UnicodeDatabase)
  29. - Normalize - Case mapping - Pack/unpack grapheme

  30. What is Unicode normalization?

  31. Decompose: ‘が’ → [‘か’, ‘゛’] Compose: [‘か’, ‘゛’] → ‘が’
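
In plain Ruby (2.2 or later), this decompose/compose round trip can be sketched with String#unicode_normalize. The sketch assumes that the slide's ‘゛’ stands for the combining voiced sound mark U+3099; the variable names are illustrative only.

```ruby
# 'が' as a single precomposed code point (U+304C).
composed = "\u304C"

# NFD (canonical decomposition) splits it into 'か' (U+304B)
# followed by the combining voiced sound mark (U+3099).
decomposed = composed.unicode_normalize(:nfd)
p decomposed.codepoints.map { |cp| format("U+%04X", cp) }
# => ["U+304B", "U+3099"]

# NFC (canonical composition) puts the pair back together.
recomposed = decomposed.unicode_normalize(:nfc)
p recomposed == composed
# => true
```
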

  32. Normalize-related methods - AS::Mb::Unicode#normalize - AS::Mb::Unicode#decompose - AS::Mb::Unicode#compose - AS::Mb::Unicode#reorder_characters

  33. Unicode normalization - NFD - NFC - NFKD - NFKC

    Normalization Form Decompose Compose
  34. Unicode normalization - NFD - NFC - NFKD - NFKC

    Normalization Form Decompose Compose
  35. Unicode normalization - NFD - NFC - NFKD - NFKC

    Normalization Form Decompose Compose
  36. “In NFKC and NFKD, a K is used to stand

    for compatibility to avoid confusion with the C standing for composition.” http://unicode.org/reports/tr15/
  37. Unicode normalization - NFD - NFC - NFKD - NFKC

    Normalization Form Decompose Compose K(C)ompatibility (compatibility equivalence)
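  38. Equivalence in Unicode normalization - Canonical equivalence (the forms without K) - reversible - Compatibility equivalence (the forms with K) - looser; not reversible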
  39. Canonical equivalence: ‘㈱’ != [‘(’, ‘株’, ‘)’] Compatibility equivalence: ‘㈱’ == [‘(’, ‘株’, ‘)’]
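
The difference between the two kinds of equivalence, and why the compatibility forms cannot be undone, can be illustrated roughly as follows. This is a sketch that assumes Ruby 2.2 or later; the variable names are made up for the example.

```ruby
kabu = "\u3231" # '㈱'

# The canonical forms (NFD/NFC) leave '㈱' untouched...
p kabu.unicode_normalize(:nfd) == kabu
# => true

# ...while the compatibility forms replace it with '(', '株', ')'.
flattened = kabu.unicode_normalize(:nfkc)
p flattened.codepoints
# => [40, 26666, 41]

# No normalization form turns the three characters back into '㈱'.
p %i[nfc nfd nfkc nfkd].map { |form| flattened.unicode_normalize(form) == kabu }
# => [false, false, false, false]
```
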
  40. Normalize-related methods - AS::Mb::Unicode#normalize - AS::Mb::Unicode#decompose - AS::Mb::Unicode#compose - AS::Mb::Unicode#reorder_characters

  41. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L285-L301

  42. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L159-L177

  43. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L180-L236

  44. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L143-L136

  45. Normalize-related methods - AS::Mb::Unicode#normalize - AS::Mb::Unicode#decompose - AS::Mb::Unicode#compose - AS::Mb::Unicode#reorder_characters

    Helper methods used by #normalize (why are they public...?)
  46. Normalize-related methods - AS::Mb::Unicode#normalize - AS::Mb::Unicode#decompose - AS::Mb::Unicode#compose - AS::Mb::Unicode#reorder_characters

  47. What about Ruby itself?

  48. https://docs.ruby-lang.org/ja/search/

  49. https://docs.ruby-lang.org/ja/search/query:unicode/query:normalize/

  50. Found it!

  51. String#unicode_normalize

    [1] pry(main)> '株'.codepoints
    => [26666]
    [2] pry(main)> '㈱'.codepoints
    => [12849]
    [3] pry(main)> '㈱'.unicode_normalize(:nfc).codepoints
    => [12849]
    [4] pry(main)> '㈱'.unicode_normalize(:nfd).codepoints
    => [12849]
    [5] pry(main)> '㈱'.unicode_normalize(:nfkc).codepoints
    => [40, 26666, 41]
    [6] pry(main)> '㈱'.unicode_normalize(:nfkd).codepoints
    => [40, 26666, 41]
  52. https://github.com/rails/rails/pull/26743/files?diff=split

  53. https://github.com/rails/rails/pull/26743/files?diff=split

  54. https://github.com/rails/rails/pull/26743/files?diff=split

  55. Ruby is handy!

  56. - Normalize ✔ - Case mapping - Pack/unpack grapheme

  57. ‘A’ ‘a’

  58. ‘A’ ‘a’ ‘Ä’ ‘ä’

  59. Case mapping-related methods - AS::Mb::Unicode#downcase - AS::Mb::Unicode#upcase - AS::Mb::Unicode#swapcase

  60. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L303-L313

  61. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L392-L402

  62. What about Ruby itself?

  63. http://rubykaigi.org/2016/presentations/duerst.html

  64. https://www.ruby-lang.org/en/news/2016/09/08/ruby-2-4-0-preview2-released/

  65. $ docker run -e LANG=C.UTF-8 --rm ruby:2.3 ruby -e "p 'Ä'.downcase == 'ä'"

    false
    $ docker run -e LANG=C.UTF-8 --rm ruby:2.4 ruby -e "p 'Ä'.downcase == 'ä'"
    true
  66. https://github.com/rails/rails/pull/26743/files?diff=split
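
The Ruby 2.4 behaviour shown above extends to the other two case-mapping methods as well. The following is a small sketch, assuming Ruby 2.4 or later, with illustrative variable names:

```ruby
upper = "\u00C4" # 'Ä'
lower = "\u00E4" # 'ä'

# Since Ruby 2.4, case mapping covers non-ASCII characters too.
p upper.downcase == lower
# => true
p lower.upcase == upper
# => true

mixed = upper + lower
p mixed.swapcase == lower + upper
# => true

# The :ascii option limits the mapping to A-Z / a-z, as before 2.4.
p upper.downcase(:ascii) == upper
# => true
```
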

  67. Ruby is handy!!

  68. https://www.sw.it.aoyama.ac.jp/2016/pub/RubyKaigi/

  69. https://bugs.ruby-lang.org/issues/10084

  70. - Normalize ✔ - Case mapping ✔ - Pack/unpack grapheme

  71. What is a grapheme?

  72. Grapheme ≒ the unit of a written character: あ が ゛

  73. ぎんざ

  74. [‘き’, ‘゛’, ‘ん’, ‘ざ’]

  75. Split by character: [[‘き’], [‘゛’], [‘ん’], [‘ざ’]] Split by grapheme: [[‘き’, ‘゛’], [‘ん’], [‘ざ’]]

  76. Pack/unpack grapheme-related methods - AS::Mb::Unicode#pack_graphemes - AS::Mb::Unicode#unpack_graphemes

  77. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L138-L140

  78. https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L80-L133

  79. What about Ruby itself?

  80. https://docs.ruby-lang.org/ja/search/

  81. https://docs.ruby-lang.org/ja/search/query:grapheme

  82. /\X/
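
As a rough sketch of how /\X/ can stand in for unpacking and packing graphemes, using the ‘ぎんざ’ example from the earlier slides. It assumes the ‘゛’ is the combining mark U+3099, and the variable names are illustrative rather than anything taken from the PR.

```ruby
# 'き' + combining voiced sound mark (U+3099), then 'ん' and precomposed 'ざ'.
ginza = "\u304D\u3099\u3093\u3056"

# Splitting by code point gives four elements...
p ginza.chars.size
# => 4

# ...while /\X/ matches one extended grapheme cluster at a time.
clusters = ginza.scan(/\X/)
p clusters.size
# => 3

# "Unpacked" form: one array of code points per grapheme.
unpacked = clusters.map(&:codepoints)
p unpacked
# => [[12365, 12441], [12435], [12374]]

# "Packed" form: the code points joined back into the original string.
p unpacked.flatten.pack("U*") == ginza
# => true
```
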

  83. https://github.com/rails/rails/pull/26743/files

  84. https://github.com/rails/rails/pull/26743/files

  85. Ruby's built-in features are handy!!!

  86. Or so I thought, but the tests don't pass

  87. None
  88. https://github.com/k-takata/Onigmo/issues/46

  89. https://bugs.ruby-lang.org/issues/12831

  90. https://bugs.ruby-lang.org/issues/12831

  91. Landed in 2.4

  92. https://github.com/rails/rails/pull/26743/files

  93. - Normalize - Case mapping - Pack/unpack grapheme ✔ ✔

  94. None
  95. https://github.com/rails/rails/pull/26743

  96. Why it can't be merged

  97. Rails 5 supports Ruby 2.2.2 and later

  98. - Normalize - since Ruby 2.2 - Case mapping - since Ruby 2.4 - Pack/unpack grapheme - since Ruby 2.0 - however, the Unicode tests only pass from 2.4
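
The version constraints above can be expressed as a small check. This is a hypothetical helper written for illustration (MINIMUM_RUBY and native_unicode_support are not part of Rails or the PR); it uses the 2.4 threshold for graphemes because that is where the Unicode tests pass.

```ruby
# Minimum Ruby versions, taken from the slide above.
MINIMUM_RUBY = {
  normalize:    "2.2",
  case_mapping: "2.4",
  graphemes:    "2.4" # /\X/ exists since 2.0, but the Unicode tests pass from 2.4
}.freeze

# Returns a hash of feature => true/false for the given Ruby version.
def native_unicode_support(ruby_version = RUBY_VERSION)
  current = Gem::Version.new(ruby_version)
  MINIMUM_RUBY.map { |feature, min| [feature, current >= Gem::Version.new(min)] }.to_h
end

# On Ruby 2.2.2, the minimum supported by Rails 5, only normalize is available.
p native_unicode_support("2.2.2")

# On the Ruby running this script:
p native_unicode_support
```

With Ruby 2.2.2 as the lower bound, two of the three features are missing, which is why the built-in replacements cannot be used unconditionally in Rails 5.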
  99. If it gets merged, it will be when the required Ruby version is raised ≒ Rails 6?

  100. You don't have to wait for Rails; you can already use these in your own development

  101. Maybe Ruby itself can already do that

  102. Summary - With Rails 6, UnicodeDatabase might be removable, bringing us closer to 3x Rails

    - Thanks to the work of many people, things that used to be done in gems are increasingly possible in Ruby itself
  103. Credits Background pattern from subtlepatterns.com Emoji artwork provided by Emoji One