A “character” can mean different things to different people, but the largest disparity is between applications and the humans who use them. Programmers aren’t to blame: our programming languages, libraries, and databases provide little or no support for understanding user-perceived characters. Systems disagree even on the basic unit of a character. Some use code points, others use code units, and others still operate on individual bytes by default. This frequently leads to products that give a poor experience in some users’ languages, especially written languages that rely on grapheme clusters: sequences of code points that together form a single user-perceived character. With the rise in global emoji usage and the rapid evolution of standard emoji sequences, users worldwide increasingly run into this problem, regardless of their language.
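As a rough illustration of that disagreement (a minimal sketch, not taken from the talk), the Python snippet below counts the same two strings in code points, UTF-16 code units, and UTF-8 bytes. The helper name count_units and the two sample strings are assumptions chosen for this example.

```python
def count_units(text: str) -> dict:
    """Count one string in three units that systems commonly use."""
    return {
        # Python's len() on a str counts code points.
        "code_points": len(text),
        # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte order mark.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 uses 1 to 4 bytes per code point.
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Hypothetical samples: each is a single user-perceived character.
E_ACUTE = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
FAMILY = "\U0001F469\u200D\U0001F469\u200D\U0001F467"  # WOMAN, ZWJ, WOMAN, ZWJ, GIRL

for label, sample in (("e + combining acute", E_ACUTE), ("family emoji", FAMILY)):
    print(label, count_units(sample))
```

A reader sees one character in each sample, yet the counts are 2, 2, and 3 for the accented letter and 5, 8, and 18 for the emoji sequence. None of the three units matches the user-perceived count of 1. Python's standard library does not segment grapheme clusters, so counting them needs additional support; the third-party regex module's \X pattern is one option.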
This presentation covers:
• Extended grapheme clusters and emoji sequences
• Programming with these user-perceived characters
• Data input, parsing, analysis, formatting, and output
• Setting product requirements for character support
• Examples from Shutterstock’s platforms for content editing and collaboration
Links to all referenced projects and standards:
https://novapatch.is/talks/characters-for-humans/
Presented at:
• 2017-10-18: Internationalization & Unicode Conference 41 (IUC41), Santa Clara, CA
• 2017-06-21: The Perl Conference (YAPC::NA), Washington, DC