Characters for Humans

Nova Patch
October 18, 2017

A “character” can mean different things to different people, but the largest disparity is between applications and the humans who use them. Programmers aren’t to blame, as our programming languages, libraries, and databases provide little or no support for understanding user-perceived characters. Systems disagree on the basic unit of a character: some use code points, others use code units, and others still operate on individual bytes by default. This frequently leads to products with a poor experience in some users’ languages, especially written languages that use grapheme clusters: sequences of code points that compose a single user-perceived character. With the rise in global emoji usage and the rapid evolution of standard emoji sequences, this problem is increasingly experienced by users worldwide, regardless of their language.
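
To make the disagreement over units concrete, here is a minimal Python sketch that measures the same strings in code points, UTF-16 code units, and UTF-8 bytes. The helper name `unit_counts` and the sample strings are illustrative assumptions, not material from the talk. Each sample is a single user-perceived character, yet none of the three counts is 1.

```python
import unicodedata


def unit_counts(text: str) -> dict:
    """Count one string in three different 'character' units."""
    return {
        "code_points": len(text),  # Python str length is in code points
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Illustrative samples (assumed for this sketch); each one is a single
# user-perceived character built from several code points.
samples = {
    "n + combining diaeresis": "\u006E\u0308",
    "Hangul syllable from three jamo": "\u1100\u1161\u11A8",
    "emoji ZWJ sequence": "\U0001F469\u200D\U0001F469\u200D\U0001F467",
}

for label, text in samples.items():
    print(label, unit_counts(text))

# Normalization helps only when a precomposed form exists:
# the three jamo compose to one code point under NFC, but
# n + diaeresis has no precomposed form and stays at two.
composed_jamo = unicodedata.normalize("NFC", samples["Hangul syllable from three jamo"])
composed_n = unicodedata.normalize("NFC", samples["n + combining diaeresis"])
print(len(composed_jamo), len(composed_n))
```

Python's standard library has no grapheme cluster segmentation, so this sketch only shows how the lower-level units diverge; counting user-perceived characters requires a library that implements the Unicode text segmentation rules.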

This presentation covers:
• Extended grapheme clusters and emoji sequences
• Programming with these user-perceived characters
• Data input, parsing, analysis, formatting, and output
• Setting product requirements for character support
• Examples from Shutterstock’s platforms for content editing and collaboration

Links to all referenced projects and standards:
https://novapatch.is/talks/characters-for-humans/

Presented at:
• 2017-10-18: Internationalization & Unicode Conference 41 (IUC41), Santa Clara, CA
• 2017-06-21: The Perl Conference (YAPC::NA), Washington, DC

Transcript

  1. cs da de en es fi fr hu it ja ko nb nl pl pt ru sv th tr zh
  2. z̢̲̲͓̥̈̄ͬ͆ͥͅ a̩ͧ̃͌̌̋ ͙̼ ̼ ̬ ̹ l̼̀ͯ̎ ̣̭̯ ̬ ͔̫g̅

    ̀͑ͨͦͬo̜̎̑͛͞ ̥ ̪ ̤̰ ̯ ̻ H̏̚ ̷͎̱̺̔̇ͯͫ ̭͚̦E̅ ͤ́ ̚ ̢͈͇̙̍̑ ͈̥ C͎͔̪ͩͬ ͖̭ͅỌ͈͙͉̗̬ͧ ͉M̜ͦ̔ ̠ ̫E͏̖͎͕̼ ̼ ̝ ͓Ș͓͈̻̄̆ͫͅ ͈͔
  3. n̈
    006E ( n ) LATIN SMALL LETTER N
    0308 ( ◌̈ ) COMBINING DIAERESIS
  4. 각
    1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK
    1161 ( ᅡ ) HANGUL JUNGSEONG A
    11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK
  5. กำ
    0E01 ( ก ) THAI CHARACTER KO KAI
    0E33 ( ◌ำ ) THAI CHARACTER SARA AM
  6. நி
    0BA8 ( ந ) TAMIL LETTER NA
    0BBF ( ◌ி ) TAMIL VOWEL SIGN I
  7. षि
    0937 ( ष ) DEVANAGARI LETTER SSA
    093F ( ◌ि ) DEVANAGARI VOWEL SIGN I
  8. "text": " ", "fontSize": 144, "fontWeight": "normal", "fontFamily": "Oswald", "fontStyle": "normal", "lineHeight": 1.16, "underline": false, "overline": false, "linethrough": false, "textAlign": "left", "textBackgroundColor": "",