Slide 1

Slide 1 text

@mtsmfm I wanted to remove ActiveSupport::Multibyte::Unicode::UnicodeDatabase

Slide 2

Slide 2 text

Fumiaki MATSUSHIMA GitHub, Twitter @mtsmfm Web Developer

Slide 3

Slide 3 text

https://www.quipper.com/

Slide 4

Slide 4 text

https://ninirb.github.io

Slide 5

Slide 5 text

https://www.meetup.com/ja-JP/GraphQL-Tokyo/

Slide 6

Slide 6 text

http://rubykaigi.org/2017/speakers

Slide 7

Slide 7 text

http://contributors.rubyonrails.org/

Slide 8

Slide 8 text

Do you know which file is the largest in Rails?

Slide 9

Slide 9 text

$ find vendor/bundle/gems/acti* -type f -exec du -h -a {} + | sort -h -r | head -n 10
1.1M vendor/bundle/gems/activesupport-5.1.2/lib/active_support/values/unicode_tables.dat
104K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_helper.rb
100K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/associations.rb
76K vendor/bundle/gems/actionpack-5.1.2/lib/action_dispatch/routing/mapper.rb
60K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/date_helper.rb
52K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/connection_adapters/abstract/schema_statements.rb
44K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/migration.rb
44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_tag_helper.rb
44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_options_helper.rb
40K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/relation/query_methods.rb

Slide 10

Slide 10 text

$ find vendor/bundle/gems/acti* -type f -exec du -h -a {} + | sort -h -r | head -n 10
1.1M vendor/bundle/gems/activesupport-5.1.2/lib/active_support/values/unicode_tables.dat
104K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_helper.rb
100K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/associations.rb
76K vendor/bundle/gems/actionpack-5.1.2/lib/action_dispatch/routing/mapper.rb
60K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/date_helper.rb
52K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/connection_adapters/abstract/schema_statements.rb
44K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/migration.rb
44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_tag_helper.rb
44K vendor/bundle/gems/actionview-5.1.2/lib/action_view/helpers/form_options_helper.rb
40K vendor/bundle/gems/activerecord-5.1.2/lib/active_record/relation/query_methods.rb
1.1M!

Slide 11

Slide 11 text

active_support/values/unicode_tables.dat

Slide 12

Slide 12 text

https://github.com/rails/rails/blob/16f2b2044eaaa54b7bc205ef9af1689a152b2fdf/activesupport/lib/active_support/multibyte/unicode.rb

Slide 13

Slide 13 text

The largest file in Rails ↓ the .dat file used by ActiveSupport::Multibyte::Unicode::UnicodeDatabase

Slide 14

Slide 14 text

https://github.com/rails/rails/pull/26743

Slide 15

Slide 15 text

@mtsmfm I wanted to remove ActiveSupport::Multibyte::Unicode::UnicodeDatabase

Slide 16

Slide 16 text

http://agile.esm.co.jp/news/2016-04-08-rails-study-session.html

Slide 17

Slide 17 text

In-house Rails study session ↓ OSS patch session

Slide 18

Slide 18 text

https://speakerdeck.com/a_matsuda/3x-rails

Slide 19

Slide 19 text

https://speakerdeck.com/a_matsuda/3x-rails?slide=156

Slide 20

Slide 20 text

https://speakerdeck.com/a_matsuda/3x-rails?slide=156

Slide 21

Slide 21 text

https://speakerdeck.com/a_matsuda/3x-rails?slide=156

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

What can AS::Mb::Unicode do in the first place?

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

This talk is based on the Rails v5.0.0.1 codebase, which was current when I opened the PR (not much has changed since). That was shortly before Ruby 2.4 was released.

Slide 27

Slide 27 text

- Normalize
- Case mapping
- Pack/unpack grapheme
- Tidy bytes

Slide 28

Slide 28 text

- Normalize
- Case mapping
- Pack/unpack grapheme
- Tidy bytes (does not use AS::Mb::Unicode::UnicodeDatabase)

Slide 29

Slide 29 text

- Normalize
- Case mapping
- Pack/unpack grapheme

Slide 30

Slide 30 text

What is Unicode normalization?

Slide 31

Slide 31 text

Decompose: ‘が’ → [‘か’, ‘゛’]
Compose: [‘か’, ‘゛’] → ‘が’
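To make the two directions concrete, here is a minimal sketch using Ruby's built-in String#unicode_normalize. The helper names nfd_decompose and nfc_compose are hypothetical wrappers invented for this illustration, not the AS::Mb::Unicode methods. Note that the mark produced by decomposition is the combining form U+3099.

```ruby
# Hypothetical helper: split precomposed characters into base + combining mark (NFD).
def nfd_decompose(str)
  str.unicode_normalize(:nfd)
end

# Hypothetical helper: merge base + combining mark back into one code point (NFC).
def nfc_compose(str)
  str.unicode_normalize(:nfc)
end

ga = "\u304C" # 'が' as a single code point

# Show the code points after decomposition: 'か' followed by the combining mark.
p nfd_decompose(ga).codepoints.map { |cp| format("U+%04X", cp) }

# Composing the decomposed string gives back the original string.
p nfc_compose(nfd_decompose(ga)) == ga
```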

Slide 32

Slide 32 text

Normalize-related methods
- AS::Mb::Unicode#normalize
- AS::Mb::Unicode#decompose
- AS::Mb::Unicode#compose
- AS::Mb::Unicode#reorder_characters

Slide 33

Slide 33 text

Unicode normalization
- NFD
- NFC
- NFKD
- NFKC
NF = Normalization Form, D = Decompose, C = Compose

Slide 34

Slide 34 text

Unicode normalization
- NFD
- NFC
- NFKD
- NFKC
NF = Normalization Form, D = Decompose, C = Compose

Slide 35

Slide 35 text

Unicode normalization
- NFD
- NFC
- NFKD
- NFKC
NF = Normalization Form, D = Decompose, C = Compose

Slide 36

Slide 36 text

“In NFKC and NFKD, a K is used to stand for compatibility to avoid confusion with the C standing for composition.” http://unicode.org/reports/tr15/

Slide 37

Slide 37 text

Unicode normalization
- NFD
- NFC
- NFKD
- NFKC
NF = Normalization Form, D = Decompose, C = Compose, K = (C)ompatibility (compatibility equivalence)

Slide 38

Slide 38 text

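Equivalence in Unicode normalization
- Canonical equivalence (the forms without K): reversible
- Compatibility equivalence (the forms with K): looser, not reversible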

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Canonical equivalence: ‘㈱’ != [‘(’, ‘株’, ‘)’]
Compatibility equivalence: ‘㈱’ == [‘(’, ‘株’, ‘)’]
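The difference between the two kinds of equivalence can be sketched with String#unicode_normalize. The helper round_trips? is a hypothetical name used only for this illustration, and the result shown holds for these sample strings rather than for every possible input.

```ruby
# Hypothetical helper: decompose with one form, compose with another,
# and report whether we end up with the original string again.
def round_trips?(str, decomposed_form, composed_form)
  str.unicode_normalize(decomposed_form).unicode_normalize(composed_form) == str
end

ga   = "\u304C" # 'が'
kabu = "\u3231" # '㈱'

# Canonical forms: 'が' survives an NFD -> NFC round trip.
p round_trips?(ga, :nfd, :nfc)

# Canonical forms leave '㈱' untouched, because it has no canonical decomposition.
p kabu.unicode_normalize(:nfd) == kabu

# Compatibility forms expand '㈱' into '(', '株', ')' and cannot restore it.
p kabu.unicode_normalize(:nfkd)
p round_trips?(kabu, :nfkd, :nfkc)
```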

Slide 41

Slide 41 text

Normalize-related methods
- AS::Mb::Unicode#normalize
- AS::Mb::Unicode#decompose
- AS::Mb::Unicode#compose
- AS::Mb::Unicode#reorder_characters

Slide 42

Slide 42 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L285-L301

Slide 43

Slide 43 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L159-L177

Slide 44

Slide 44 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L180-L236

Slide 45

Slide 45 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L143-L136

Slide 46

Slide 46 text

Normalize-related methods
- AS::Mb::Unicode#normalize
- AS::Mb::Unicode#decompose
- AS::Mb::Unicode#compose
- AS::Mb::Unicode#reorder_characters
#decompose, #compose, and #reorder_characters are helper methods used by #normalize (why are they public...?)

Slide 47

Slide 47 text

Normalize-related methods
- AS::Mb::Unicode#normalize
- AS::Mb::Unicode#decompose
- AS::Mb::Unicode#compose
- AS::Mb::Unicode#reorder_characters

Slide 48

Slide 48 text

What about Ruby itself?

Slide 49

Slide 49 text

https://docs.ruby-lang.org/ja/search/

Slide 50

Slide 50 text

https://docs.ruby-lang.org/ja/search/query:unicode/query:normalize/

Slide 51

Slide 51 text

Found it!

Slide 52

Slide 52 text

String#unicode_normalize
[1] pry(main)> '株'.codepoints
=> [26666]
[2] pry(main)> '㈱'.codepoints
=> [12849]
[3] pry(main)> '㈱'.unicode_normalize(:nfc).codepoints
=> [12849]
[4] pry(main)> '㈱'.unicode_normalize(:nfd).codepoints
=> [12849]
[5] pry(main)> '㈱'.unicode_normalize(:nfkc).codepoints
=> [40, 26666, 41]
[6] pry(main)> '㈱'.unicode_normalize(:nfkd).codepoints
=> [40, 26666, 41]

Slide 53

Slide 53 text

https://github.com/rails/rails/pull/26743/files?diff=split

Slide 54

Slide 54 text

https://github.com/rails/rails/pull/26743/files?diff=split

Slide 55

Slide 55 text

https://github.com/rails/rails/pull/26743/files?diff=split

Slide 56

Slide 56 text

Ruby is handy!

Slide 57

Slide 57 text

- Normalize ✔
- Case mapping
- Pack/unpack grapheme

Slide 58

Slide 58 text

‘A’ ‘a’

Slide 59

Slide 59 text

‘A’ ‘a’ ‘Ä’ ‘ä’

Slide 60

Slide 60 text

Case mapping-related methods
- AS::Mb::Unicode#downcase
- AS::Mb::Unicode#upcase
- AS::Mb::Unicode#swapcase

Slide 61

Slide 61 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L303-L313

Slide 62

Slide 62 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L392-L402

Slide 63

Slide 63 text

What about Ruby itself?

Slide 64

Slide 64 text

http://rubykaigi.org/2016/presentations/duerst.html

Slide 65

Slide 65 text

https://www.ruby-lang.org/en/news/2016/09/08/ruby-2-4-0-preview2-released/

Slide 66

Slide 66 text

$ docker run -e LANG=C.UTF-8 --rm ruby:2.3 \
    ruby -e "p 'Ä'.downcase == 'ä'"
false
$ docker run -e LANG=C.UTF-8 --rm ruby:2.4 \
    ruby -e "p 'Ä'.downcase == 'ä'"
true
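The same check can be extended to all three case-mapping methods. This is a small sketch that assumes Ruby 2.4 or later; case_mapping_demo is a hypothetical helper name used only here.

```ruby
# Hypothetical helper: apply Ruby's built-in case mapping methods to a string.
# On Ruby 2.4 and later these handle non-ASCII letters such as 'Ä'.
def case_mapping_demo(str)
  {
    downcase: str.downcase,
    upcase: str.upcase,
    swapcase: str.swapcase
  }
end

sample = "\u00C4b" # 'Äb': uppercase A with diaeresis followed by lowercase b
p case_mapping_demo(sample)
```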

Slide 67

Slide 67 text

https://github.com/rails/rails/pull/26743/files?diff=split

Slide 68

Slide 68 text

Ruby is handy!!

Slide 69

Slide 69 text

https://www.sw.it.aoyama.ac.jp/2016/pub/RubyKaigi/

Slide 70

Slide 70 text

https://bugs.ruby-lang.org/issues/10084

Slide 71

Slide 71 text

- Normalize ✔
- Case mapping ✔
- Pack/unpack grapheme

Slide 72

Slide 72 text

What is a grapheme?

Slide 73

Slide 73 text

Grapheme (書記素) ≒ the unit of a character: あ が ゛

Slide 74

Slide 74 text

ぎんざ

Slide 75

Slide 75 text

[‘き’, ‘゛’, ‘ん’, ‘ざ’]

Slide 76

Slide 76 text

Split by character: [[‘き’], [‘゛’], [‘ん’], [‘ざ’]]
Split by grapheme: [[‘き’, ‘゛’], [‘ん’], [‘ざ’]]

Slide 77

Slide 77 text

Pack/unpack grapheme-related methods
- AS::Mb::Unicode#pack_graphemes
- AS::Mb::Unicode#unpack_graphemes

Slide 78

Slide 78 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L138-L140

Slide 79

Slide 79 text

https://github.com/rails/rails/blob/v5.0.0.1/activesupport/lib/active_support/multibyte/unicode.rb#L80-L133

Slide 80

Slide 80 text

What about Ruby itself?

Slide 81

Slide 81 text

https://docs.ruby-lang.org/ja/search/

Slide 82

Slide 82 text

https://docs.ruby-lang.org/ja/search/query:grapheme

Slide 83

Slide 83 text

/\X/
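As a rough sketch of how the regexp /\X/ can split a string into graphemes, the example below uses the decomposed ‘ぎんざ’ from the earlier slides. The method names unpack_graphemes_sketch and pack_graphemes_sketch are hypothetical and only loosely modelled on the AS::Mb::Unicode methods; this is not the code from the PR.

```ruby
# Hypothetical sketch: split a string into grapheme clusters with /\X/
# and return each cluster as an array of code points.
def unpack_graphemes_sketch(str)
  str.scan(/\X/).map(&:codepoints)
end

# Hypothetical sketch: flatten the clusters and rebuild a UTF-8 string.
def pack_graphemes_sketch(clusters)
  clusters.flatten.pack("U*")
end

# 'き' + combining dakuten (U+3099) + 'ん' + 'ざ'
ginza = "\u304D\u3099\u3093\u3056"

p ginza.chars.length                    # split by character
p unpack_graphemes_sketch(ginza).length # split by grapheme
p pack_graphemes_sketch(unpack_graphemes_sketch(ginza)) == ginza
```

The base character and its combining mark stay together in one cluster, which is the grouping shown in the grapheme split above.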

Slide 84

Slide 84 text

https://github.com/rails/rails/pull/26743/files

Slide 85

Slide 85 text

https://github.com/rails/rails/pull/26743/files

Slide 86

Slide 86 text

Ruby's built-in features are handy!!!

Slide 87

Slide 87 text

Or so I thought, but the tests don't pass

Slide 88

Slide 88 text

No content

Slide 89

Slide 89 text

https://github.com/k-takata/Onigmo/issues/46

Slide 90

Slide 90 text

https://bugs.ruby-lang.org/issues/12831

Slide 91

Slide 91 text

https://bugs.ruby-lang.org/issues/12831

Slide 92

Slide 92 text

It landed in Ruby 2.4

Slide 93

Slide 93 text

https://github.com/rails/rails/pull/26743/files

Slide 94

Slide 94 text

- Normalize ✔
- Case mapping ✔
- Pack/unpack grapheme ✔

Slide 95

Slide 95 text

No content

Slide 96

Slide 96 text

https://github.com/rails/rails/pull/26743

Slide 97

Slide 97 text

Why can't it be merged?

Slide 98

Slide 98 text

Rails 5 supports Ruby 2.2.2 and later

Slide 99

Slide 99 text

- Normalize
  - Since Ruby 2.2
- Case mapping
  - Since Ruby 2.4
- Pack/unpack grapheme
  - Since Ruby 2.0
  - However, the Unicode tests only pass from 2.4
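The version constraints above can be expressed as a small lookup. This is only an illustrative sketch: REQUIRED_RUBY and native_support are hypothetical names, and the grapheme entry uses 2.4 because that is the version from which the Unicode tests pass.

```ruby
# Hypothetical table: minimum Ruby version at which each built-in feature
# can replace the corresponding AS::Mb::Unicode functionality.
REQUIRED_RUBY = {
  normalize: "2.2",
  case_mapping: "2.4",
  graphemes: "2.4" # /\X/ exists since 2.0, but the Unicode tests pass from 2.4
}.freeze

# Hypothetical helper: report which features a given Ruby version covers.
def native_support(ruby_version = RUBY_VERSION)
  current = Gem::Version.new(ruby_version)
  REQUIRED_RUBY.map { |feature, min| [feature, current >= Gem::Version.new(min)] }.to_h
end

# Rails 5 must still run on Ruby 2.2.2, where only normalization is covered.
p native_support("2.2.2")
p native_support("2.4.0")
```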

Slide 100

Slide 100 text

If it gets merged, it will be when the required Ruby version is raised ≒ Rails 6?

Slide 101

Slide 101 text

You don't have to wait for Rails: you can already use these features in your own development

Slide 102

Slide 102 text

Maybe Ruby itself can already do that

Slide 103

Slide 103 text

Summary
- With Rails 6, UnicodeDatabase may become removable, bringing us closer to 3x Rails
- Thanks to the work of many people, things that used to require gems are increasingly possible in Ruby itself

Slide 104

Slide 104 text

Credits Background pattern from subtlepatterns.com Emoji artwork provided by Emoji One