the world of characters

orisano
September 13, 2018

Transcript

 1. The world of "one character"
  @orisano

 2. Everyone,
  you can count characters, right?

 3. a

 4. a => 1

 5. あ

 6. あ => 1

 7. (no text)

 8. 佛 => 1

 9. (no text)

 10. => 1

 11. (no text)

 12. => 1

 13. Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔
  ͫ͗
  ͢
  L̠ͨͧͩ͘
  G̴̻͈͍͔̹
  ̑͗̎̅͛
  ́
  Ǫ̵̹̻̝̳
  ͂̌ ̌͘! ͖̬̰̙̗
  ̿̋ ͥ
  ͥ̂ͣ̐́́͜͞

 14. Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔
  ͫ͗
  ͢
  L̠ͨͧͩ͘
  G̴̻͈͍͔̹
  ̑͗̎̅͛
  ́
  Ǫ̵̹̻̝̳
  ͂̌ ̌͘! ͖̬̰̙̗
  ̿̋ ͥ
  ͥ̂ͣ̐́́͜͞ => 6

 15. Everyone,
  can you count the number of bytes?
  (UTF-8)

 16. a

 17. a => 1

 18. あ

 19. あ => 3

 20. (no text)

 21. 佛 => 4

 22. (no text)

 23. => 4

 24. (no text)

 25. => 18

 26. Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔
  ͫ͗
  ͢
  L̠ͨͧͩ͘
  G̴̻͈͍͔̹
  ̑͗̎̅͛
  ́
  Ǫ̵̹̻̝̳
  ͂̌ ̌͘! ͖̬̰̙̗
  ̿̋ ͥ
  ͥ̂ͣ̐́́͜͞

 27. Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔
  ͫ͗
  ͢
  L̠ͨͧͩ͘
  G̴̻͈͍͔̹
  ̑͗̎̅͛
  ́
  Ǫ̵̹̻̝̳
  ͂̌ ̌͘! ͖̬̰̙̗
  ̿̋ ͥ
  ͥ̂ͣ̐́́͜͞ => 143

 28. How should the "one character"
  you have in mind be counted?

 29. It cannot be counted in bytes
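The UTF-8 byte counts from the earlier slides can be checked with a short sketch. It assumes a JavaScript runtime that provides the standard `TextEncoder` (Node.js or a browser). The helper name `utf8ByteLength` is made up for illustration, and the emoji are written as escapes of the code points listed on later slides.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return new TextEncoder().encode(text).length;
}

// U+1F468 + U+200D + U+1F469 + U+200D + U+1F466
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}";

console.log(utf8ByteLength("a"));         // 1
console.log(utf8ByteLength("\u3042"));    // 3  (あ)
console.log(utf8ByteLength("\u{1F914}")); // 4
console.log(utf8ByteLength(family));      // 18
```

The same glyph-sized "character" ranges from 1 to 18 bytes here, which is why a byte count says little about what a reader sees.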

 30. Unicode is a character set:
  each character corresponds to a number

 31. あ => 3042

 32. 🤔 => 1F914

 33. This number is called
  a code point
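As a small illustration, the code point of a character can be read in JavaScript with the built-in `String.prototype.codePointAt`. The helper name `codePointsHex` is hypothetical and only exists for this sketch.

```javascript
// Hypothetical helper: list the code points of a string as uppercase hex.
function codePointsHex(text) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...text].map((ch) => ch.codePointAt(0).toString(16).toUpperCase());
}

console.log(codePointsHex("\u3042"));    // [ '3042' ]   (あ)
console.log(codePointsHex("\u{1F914}")); // [ '1F914' ]
```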

 34. The way to represent this code point
  as a byte sequence
  is called an encoding

 35. UTF-8, UTF-16, and the like
  are kinds of encoding
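To make the code point versus encoding distinction concrete, here is a minimal sketch that encodes a single code point by hand, once as UTF-8 bytes and once as UTF-16 code units. The helpers `encodeUtf8` and `encodeUtf16` are hypothetical names, and the sketch assumes a valid code point as input (it does no range or surrogate validation).

```javascript
// Hypothetical helper: encode one code point (a number) as UTF-8 bytes.
// Assumes 0 <= cp <= 0x10FFFF and that cp is not a surrogate.
function encodeUtf8(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Hypothetical helper: encode one code point as UTF-16 code units.
function encodeUtf16(cp) {
  if (cp < 0x10000) return [cp];
  const v = cp - 0x10000; // 20 bits, split into a surrogate pair
  return [0xd800 | (v >> 10), 0xdc00 | (v & 0x3ff)];
}

const hex = (nums) => nums.map((n) => n.toString(16)).join(" ");

console.log(hex(encodeUtf8(0x3042)));   // e3 81 82
console.log(hex(encodeUtf16(0x3042)));  // 3042
console.log(hex(encodeUtf8(0x1f914)));  // f0 9f a4 94
console.log(hex(encodeUtf16(0x1f914))); // d83e dd14
```

The code point stays the same in both cases; only the byte-level representation differs between the encodings.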

 36. So for now,
  if we count code points,
  problem solved?

 37. No

 38. 👨‍👩‍👦 =>
  1F468 + 200D + 1F469 +
  200D + 1F466

 39. In fact, multiple code points
  can combine into a single character
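The five-code-point sequence from the slide can be inspected directly. This is only a sketch: the variable names are made up, and the string is built from escapes so it does not depend on how the emoji renders.

```javascript
// U+1F468 + U+200D + U+1F469 + U+200D + U+1F466, as listed on the slide.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}";

// Spreading a string iterates by code point.
const codePoints = [...family].map((ch) =>
  ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints);        // [ '1F468', '200D', '1F469', '200D', '1F466' ]
console.log(codePoints.length); // 5 code points
console.log(family.length);     // 8 UTF-16 code units
```

Neither 5 nor 8 matches the single glyph a reader sees, so counting code points (or UTF-16 code units) does not settle the question either.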

 40. The single character that humans perceive
  is called
  a grapheme (grapheme cluster)

 41. How can we
  extract graphemes
  from a sequence of code points?

 42. There are strict rules for whether
  the position between two code points
  is a grapheme boundary
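To give a feel for what such boundary rules look like, here is a deliberately tiny sketch. It is not the full rule set and not the author's implementation: it only approximates two rules from UAX #29 (GB9, no break before an Extend character or ZWJ, and a loosened GB11, no break between ZWJ and a pictographic character), and it approximates "Extend" with the `Grapheme_Extend` regex property. The function name `toyGraphemes` is hypothetical. Hangul syllables, regional indicators (flags), CR LF, and the other rules are ignored.

```javascript
const ZWJ = "\u200D";

// Unicode property escapes require the `u` flag.
const isExtend = (ch) => /^\p{Grapheme_Extend}$/u.test(ch);
const isPictographic = (ch) => /^\p{Extended_Pictographic}$/u.test(ch);

// Toy splitter: decides, for each code point, whether it starts a new
// cluster or attaches to the previous one.
function toyGraphemes(text) {
  const clusters = [];
  let prev = null; // previous code point, as a string
  for (const ch of text) { // for...of iterates by code point
    const joinsPrevious =
      prev !== null &&
      (isExtend(ch) ||                         // GB9: no break before Extend
        ch === ZWJ ||                          // GB9: no break before ZWJ
        (prev === ZWJ && isPictographic(ch))); // loosened GB11
    if (joinsPrevious) clusters[clusters.length - 1] += ch;
    else clusters.push(ch);
    prev = ch;
  }
  return clusters;
}

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}";
console.log(toyGraphemes(family).length);     // 1
console.log(toyGraphemes("e\u0301a").length); // 2 ("e" + combining acute, then "a")
```

Even this reduced version shows the shape of the approach: the decision is made per pair of adjacent code points, based on their Unicode properties.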

 43. UAX #29
  Unicode Text Segmentation

 44. (no text)

 45. I implemented this in JS
  github.com/orisano/graphemesplit

 46. For details, see UAX #29
  http://unicode.org/reports/tr29/

 47. When a stranger tells you
  "N characters",
  make sure to check what they mean!

 48. 1 byte?
  1 code point?
  1 grapheme cluster?
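The three questions on the final slide can be answered side by side for any string. The sketch below assumes a runtime with the standard `TextEncoder` and `Intl.Segmenter` APIs (recent Node.js and browsers). It uses the built-in segmenter rather than the graphemesplit library mentioned in the deck, the helper name `measure` is made up, and the grapheme count depends on the Unicode data shipped with the runtime.

```javascript
// Hypothetical helper: report the three "lengths" of a string.
function measure(text) {
  // granularity "grapheme" asks for grapheme cluster segmentation.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    bytes: new TextEncoder().encode(text).length, // UTF-8 bytes
    codePoints: [...text].length,                 // Unicode code points
    graphemes: [...segmenter.segment(text)].length, // grapheme clusters
  };
}

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}";

console.log(measure("a"));      // { bytes: 1, codePoints: 1, graphemes: 1 }
console.log(measure("\u3042")); // { bytes: 3, codePoints: 1, graphemes: 1 }
console.log(measure(family));   // { bytes: 18, codePoints: 5, graphemes: 1 }
```

Which of the three numbers is the right "length" depends on the use: storage limits care about bytes, while a limit shown to a user usually means grapheme clusters.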