Decolonizing Unicode: Can We Type Every Word That Has Ever Been Written? - AlterConf Portland 2017

To make the web a fully open and accessible platform, we need to ensure that everyone can communicate online in their native language. However, there are a lot of technical challenges to supporting the entire world’s languages in written text. Unicode aspires to be the international standard for representing text digitally. Let’s look at some of the decisions that were made when designing Unicode, and some of the amusing quirks that they lead to, as we think of the best way to design digital writing systems that are truly universally accessible.

Aditya Mukerjee

November 04, 2017

Transcript

  1. Decolonizing Unicode: Can We Type Every Word That Has Ever Been Written? Aditya Mukerjee, Observability Engineer at Stripe. AlterConf Portland, November 4, 2017
  2. 1. What is decolonization? What is Unicode? 2. What are the problems with Unicode? 3. How can we fix these problems? @chimeracoder
  3. What is decolonization?
  4. Decolonization: process of actively reversing the influence of and damage caused by a hegemonic, imperial, or oppressive power.
  5. What is Unicode?
  6. The ’ problem
  7. H E L L O ! ☺ 72 69 76 76 79 33 9786
  8. Д ∫ ∯ ㈥ ꣷ ﷽"#$% అ Ψ ◔
  9. Who created Unicode?
  10. Unicode Consortium • Started in 1987 by engineers at Xerox and Apple $7,000/year $12,000/year $18,000/year
  11. What matters isn’t just who’s at the table – it’s who’s not at the table
  12. What is a character?
  13. Latin, Greek, and Cyrillic Alphabets
    Character Name | Unicode Number | Character
    Latin Small Letter A | 97 | a
    Greek Small Letter Alpha | 945 | α
    Cyrillic Small Letter A | 1,072 | а
    Acute Accent | 180 | ´
    Latin Small Letter A with Acute [Accent] | 225 | á
  14. Han-based Writing Systems
    Character Name | Chinese (simplified) | Chinese (traditional) | Japanese | Korean | Vietnamese
    ‘edged tool, cutlery, knife edge’ | 刃 | 刃 | 刃 | 刃 | 刃
  15. “catalog” vs. “catalogue”; “a” vs. “α”; “Stephen” vs. “Steven”
  16. Han Unification
  17. Character Name | Unicode number | Chinese (simplified) | Chinese (traditional) | Japanese | Korean | Vietnamese
    ‘edged tool, cutlery, knife edge’ | 20,995 | 刃 | 刃 | 刃 | 刃 | 刃
  18. @chimeracoder

  19. An iterative approach to diversity is incompatible with a commitment to backwards-compatibility. Languages are hard and messy …but…
  20. Emoji
  21. Racemoji
  22. 1. Encoding skin tone into all interactions is not representation 2. Structural biases are more subtle than skin tone
  23. What’s the solution?
  24. 1. Engineering convenience cannot be an excuse to sacrifice authenticity 2. We need qualified engineers and linguists from around the world 3. The writing system for a language must be developed by native speakers
  25. Unicode could be the first time in history that the writing systems for languages are defined by people who don’t even speak those languages
  26. Thank you! Aditya Mukerjee @chimeracoder
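
Slide 7 pairs each character of "HELLO!☺" with a number. Those numbers are Unicode code points, and a minimal Python sketch can reproduce them. The variable names below are illustrative and do not come from the talk.

```python
# Characters from slide 7; the variable names are illustrative only.
text = "HELLO!\u263a"  # \u263a is the smiley shown on the slide

# ord() maps one character to its Unicode code point (an integer).
code_points = [ord(ch) for ch in text]
print(code_points)

# The conventional U+XXXX notation is the same number written in hexadecimal.
labels = [f"U+{n:04X}" for n in code_points]
print(labels)

# chr() is the inverse of ord(), so the round trip restores the original text.
round_trip = "".join(chr(n) for n in code_points)
print(round_trip == text)
```

The decimal values match the slide; most Unicode references write the same numbers in hexadecimal, which is why the smiley is usually cited as U+263A rather than 9786.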
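
The table on slide 13 can be checked against Python's standard `unicodedata` module. This is a sketch only: the character list is copied from the slide, and the decomposition step at the end is an addition that is not in the talk.

```python
import unicodedata

# The five characters from slide 13, written as escapes so that the
# look-alike Latin, Greek, and Cyrillic letters cannot be confused.
chars = ["\u0061", "\u03b1", "\u0430", "\u00b4", "\u00e1"]

# Build (number, official name, character) rows like the slide's table.
table = [(ord(ch), unicodedata.name(ch), ch) for ch in chars]
for number, name, ch in table:
    print(f"{number:>5}  {name:<32}  {ch}")

# Latin "a" and Cyrillic "а" look alike but are different characters.
looks_same_but_differs = "\u0061" != "\u0430"
print(looks_same_but_differs)

# Not on the slide: canonical decomposition (NFD) splits the accented
# letter into a base letter plus a combining mark.
decomposed = [ord(ch) for ch in unicodedata.normalize("NFD", "\u00e1")]
print(decomposed)
```

One detail worth noting: the decomposition yields the combining acute accent (code point 769), which is a separate character from the standalone Acute Accent (180) listed on the slide.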
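
Slide 17 lists a single number, 20,995, for the character in all five Han-based writing systems. The sketch below illustrates what that implies for plain text. The language tags are illustrative labels chosen for this example, not something the talk specifies.

```python
import unicodedata

# The character from slide 17, written as an escape (U+5203).
blade = "\u5203"

# Hypothetical samples: the "same" word as typed by writers of five
# different languages. The language tags are illustrative labels only.
samples = {
    "zh-Hans": blade,
    "zh-Hant": blade,
    "ja": blade,
    "ko": blade,
    "vi": blade,
}

# Every sample contains exactly the same code point, so the text alone
# carries no record of which language, and which glyph shape, was meant.
distinct_code_points = {ord(text) for text in samples.values()}
print(distinct_code_points)

# The official character name is derived from the code point itself.
official_name = unicodedata.name(blade)
print(official_name)
```

Because the stored text is identical in every case, any regional difference in how the character should be drawn has to come from outside the text, for example from the font or from language metadata attached to the document.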