Characters for Humans

Nova Patch
October 18, 2017

A “character” can mean different things to different people, but the largest disparity is between applications and the humans who use them. Programmers aren’t to blame: our programming languages, libraries, and databases provide little or no support for understanding user-perceived characters. Systems also disagree on the basic unit of a character: some use code points, others use code units, and others still operate on individual bytes by default. This frequently leads to products with a poor experience in some users’ languages, especially written languages that rely on grapheme clusters, which are sequences of code points that compose a single user-perceived character. With the rise in global emoji usage and the rapid evolution of standard emoji sequences, users worldwide increasingly run into this problem, regardless of their language.
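To make the disagreement over units concrete, here is a minimal sketch in Python (chosen only for illustration; it is not taken from the talk). The helper name `unit_counts` and the two sample strings are assumptions made for this example. Each sample is a single user-perceived character, yet none of the three common units reports it consistently as one.

```python
def unit_counts(text: str) -> dict:
    """Count the same string in three different 'character' units."""
    return {
        # Python's str is a sequence of code points.
        "code_points": len(text),
        # UTF-16 code units are two bytes each.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 uses one to four bytes per code point.
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Hypothetical samples: each is one user-perceived character.
samples = {
    "n + combining diaeresis": "n\u0308",
    "jack-o-lantern emoji": "\U0001F383",
}

for label, text in samples.items():
    print(label, unit_counts(text))
```

The first sample counts as 2 code points, 2 UTF-16 code units, and 3 UTF-8 bytes; the second as 1 code point, 2 UTF-16 code units, and 4 UTF-8 bytes.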

This presentation covers:
• Extended grapheme clusters and emoji sequences
• Programming with these user-perceived characters
• Data input, parsing, analysis, formatting, and output
• Setting product requirements for character support
• Examples from Shutterstock’s platforms for content editing and collaboration

Links to all referenced projects and standards:
http://patch.codes/talks/characters-for-humans/

Presented at:
• 2017-10-18: Internationalization & Unicode Conference 41 (IUC41), Santa Clara, CA
• 2017-06-21: The Perl Conference (YAPC::NA), Washington, DC

Transcript

  1. Characters for Humans Nova Patch @novapatch Shutterstock

  2. Characters for Humans Nova Patch @novapatch Shutterstock The \X Files

  3. None
  4. cs da de en es fi fr hu it ja ko nb nl pl pt ru sv th tr zh
  5. None
  6. None
  7. None
  8. None
  9. \p{Egyp}

  10. None
  11. \p{Maya}

  12. None
  13. \p{Symbol}

  14. None
  15. z̢̲̲͓̥̈̄ͬ͆ͥͅ a̩ͧ̃͌̌̋ ͙̼ ̼ ̬ ̹ l̼̀ͯ̎ ̣̭̯ ̬ ͔̫g̅

    ̀͑ͨͦͬo̜̎̑͛͞ ̥ ̪ ̤̰ ̯ ̻
  16. z̢̲̲͓̥̈̄ͬ͆ͥͅ a̩ͧ̃͌̌̋ ͙̼ ̼ ̬ ̹ l̼̀ͯ̎ ̣̭̯ ̬ ͔̫g̅

    ̀͑ͨͦͬo̜̎̑͛͞ ̥ ̪ ̤̰ ̯ ̻ H̏̚ ̷͎̱̺̔̇ͯͫ ̭͚̦E̅ ͤ́ ̚ ̢͈͇̙̍̑ ͈̥ C͎͔̪ͩͬ ͖̭ͅỌ͈͙͉̗̬ͧ ͉M̜ͦ̔ ̠ ̫E͏̖͎͕̼ ̼ ̝ ͓Ș͓͈̻̄̆ͫͅ ͈͔
  17. None
  18. \p{Thai}

  19. None
  20. \p{Emoji}

  21. n ̈

  22. n ̈ 006E ( n ) LATIN SMALL LETTER N 0308 ( ◌̈ ) COMBINING DIAERESIS
  23. 각 1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK 1161 ( ᅡ ) HANGUL JUNGSEONG A 11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK

  24. กํา 0E01 ( ก ) THAI CHARACTER KO KAI 0E33 ( ◌ำ ) THAI CHARACTER SARA AM

  25. நி 0BA8 ( ந ) TAMIL LETTER NA 0BBF ( ◌ி ) TAMIL VOWEL SIGN I
  26. षि 0937 ( ष ) DEVANAGARI LETTER SSA 093F ( ◌ि ) DEVANAGARI VOWEL SIGN I
  27. “Unicode grapheme clusters are atomic units”

  28. \X extended grapheme cluster

  29. None
  30. Unicode Technical Standard #51 Unicode Emoji

  31. \X Unicode 9.0 extended grapheme cluster

  32. substr

  33. substr index

  34. substr index rindex

  35. substr index rindex chop

  36. substr index rindex chop reverse

  37. substr index rindex chop reverse

  38. substr index rindex chop reverse split //

  39. substr index rindex chop reverse split // length

  40. substr index rindex chop reverse split // length uc, lc, &c.
  41. \p Unicode character property

  42. \p{Thai} Script = Thai

  43. \p{Thai} Script_Extensions = Thai

  44. (?=\p{Thai}) positive lookahead

  45. (?=\p{Thai}) \X positive lookahead

  46. The Future

  47. The Future extended grapheme clusters with derived character properties

  48. \X{…}

  49. \X{Thai}

  50. \X{L}

  51. \X{Emoji}

  52. Perl 6

  53. Perl 6 $ perl6 -e 'say "🎃".uniname'

  54. Perl 6 $ perl6 -e 'say "🎃".uniname' JACK-O-LANTERN

  55. Perl 6 $ perl6 -e 'say " ".uniprop("Emoji")'

  56. Perl 6 $ perl6 -e 'say " ".uniprop("Emoji")' True

  57. Perl 6 $ perl6 -e 'say "X X" ~~ /<:Emoji>/'

  58. Perl 6 $ perl6 -e 'say "X X" ~~ /<:Emoji>/' 「 」
  59. Perl 6 "กํา"

  60. Perl 6 "กํา".chars

  61. Perl 6 "กํา".chars ?

  62. Perl 6 "กํา".chars 1

  63. Perl 6 “กํา”.chars 1

  64. Perl 6 “\r\n”.chars ?

  65. Perl 6 “\r\n”.chars 1

  66. Perl 6 < ❤ >

  67. Perl 6 < ❤ >.join

  68. Perl 6 < ❤ >.join.chars ?

  69. Perl 6 < ❤ >.join.chars 3

  70. Perl 6 < ❤ >.join(“\c[ZWJ]”).chars

  71. Perl 6 < ❤ >.join(“\c[ZWJ]”).chars ?

  72. Perl 6 < ❤ >.join(“\c[ZWJ]”).chars 1

  73. Perl 6 “ ❤ \r\nกํา”

  74. Perl 6 “ ❤ \r\nกํา”.substr(1, 1)

  75. Perl 6 “ ❤ \r\nกํา”.substr(1, 1) “\r\n”

  76. None
  77. None
  78. None
  79. None
  80. None
  81. None
  82. "text": " ", "fontSize": 144, "fontWeight": "normal", "fontFamily": "Oswald", "fontStyle": "normal", "lineHeight": 1.16, "underline": false, "overline": false, "linethrough": false, "textAlign": "left", "textBackgroundColor": "",
  83. None
  84. None
  85. None
  86. None
  87. None
  88. None
  89. Product

  90. Create

  91. Quality

  92. Career

  93. None
  94. Characters for Humans Nova Patch @novapatch Shutterstock
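
The slides on \X, CRLF, and ZWJ-joined emoji describe counting and slicing by user-perceived character rather than by code point. As a rough illustration only, the sketch below approximates that behavior in Python using the standard library. It is not the full Unicode extended grapheme cluster algorithm and not the speaker's code: the function name `approx_clusters` and its three rules (attach combining marks, join across ZERO WIDTH JOINER, keep CR LF together) are assumptions for this example. It deliberately ignores cases the slides also show, such as Hangul jamo sequences and THAI CHARACTER SARA AM, as well as regional indicator pairs. Python's built-in `re` module has no `\X`; the third-party `regex` package provides one.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def approx_clusters(text: str) -> list:
    """Split text into approximate user-perceived characters.

    Simplified rules (an assumption of this sketch, not UAX #29):
      1. CR followed by LF stays together.
      2. Combining marks (category M*) and ZWJ attach to the previous
         cluster, unless that cluster ends in CR or LF.
      3. Any code point right after a ZWJ attaches to the previous cluster.
    """
    clusters = []
    for ch in text:
        prev = clusters[-1] if clusters else ""
        is_mark = unicodedata.category(ch).startswith("M")
        attaches = bool(prev) and (
            (prev == "\r" and ch == "\n")
            or (prev[-1] not in "\r\n" and (is_mark or ch == ZWJ))
            or prev[-1] == ZWJ
        )
        if attaches:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Hypothetical sample: woman, heart, woman, with and without ZWJ.
parts = ["\U0001F469", "\u2764", "\U0001F469"]
print(len(approx_clusters("".join(parts))))   # separate symbols
print(len(approx_clusters(ZWJ.join(parts))))  # one joined sequence
print(len(approx_clusters("n\u0308")))        # base letter plus mark
print(len(approx_clusters("\r\n")))           # CRLF
```

With clusters in a list, operations such as length, reversal, or taking the second character work on the list itself instead of on raw code points, for example `approx_clusters(text)[1]`.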