Decolonizing Unicode: Can We Type Every Word That Has Ever Been Written? - AlterConf Portland 2017

To make the web a fully open and accessible platform, we need to ensure that everyone can communicate online in their native language. However, there are a lot of technical challenges to supporting the entire world’s languages in written text. Unicode aspires to be the international standard for representing text digitally. Let’s look at some of the decisions that were made when designing Unicode, and some of the amusing quirks that they lead to, as we think of the best way to design digital writing systems that are truly universally accessible.

Aditya Mukerjee

November 04, 2017

Transcript

  1. Decolonizing Unicode: Can We Type Every Word That Has Ever Been Written? Aditya Mukerjee, Observability Engineer at Stripe. AlterConf Portland, November 4, 2017
  2. 1. What is decolonization? What is Unicode? 2. What are the problems with Unicode? 3. How can we fix these problems? @chimeracoder
  3. Decolonization: process of actively reversing the influence of and damage caused by a hegemonic, imperial, or oppressive power.
  4. Unicode Consortium • Started in 1987 by engineers at Xerox and Apple • $7,000/year, $12,000/year, $18,000/year
  5. What matters isn’t just who’s at the table – it’s who’s not at the table
  6. Latin, Greek, and Cyrillic Alphabets

    Character Name | Unicode Number | Character
    Latin Small Letter A | 97 | a
    Greek Small Letter Alpha | 945 | α
    Cyrillic Small Letter A | 1,072 | а
    Acute Accent | 180 | ´
    Latin Small Letter A with Acute [Accent] | 225 | á
  7. Han-based Writing Systems

    Character Name | Chinese (simplified) | Chinese (traditional) | Japanese | Korean | Vietnamese
    ‘edged tool, cutlery, knife edge’ | 刃 | 刃 | 刃 | 刃 | 刃
  8. Character Name | Unicode Number | Chinese (simplified) | Chinese (traditional) | Japanese | Korean | Vietnamese

    ‘edged tool, cutlery, knife edge’ | 20,995 | 刃 | 刃 | 刃 | 刃 | 刃
  9. Languages are hard and messy …but… An iterative approach to diversity is incompatible with a commitment to backwards compatibility
  10. 1. Encoding skin tone into all interactions is not representation 2. Structural biases are more subtle than skin tone
  11. 1. Engineering convenience cannot be an excuse to sacrifice authenticity 2. We need qualified engineers and linguists from around the world 3. The writing system for a language must be developed by native speakers
  12. Unicode could be the first time in history that the writing systems for languages are defined by people who don’t even speak those languages
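
The "Unicode Number" columns on slides 6 and 8 can be checked with a few lines of Python. This is only a rough sketch and not part of the talk: the helper name `describe` and the `samples` list are made up for illustration, and the characters are written as escapes so that look-alike letters cannot be confused. It relies only on the built-in `ord` and the standard library's `unicodedata.name`.

```python
import unicodedata

# Characters from the slide 6 table plus the Han character from slide 8.
# Escapes are used on purpose: Latin "a" and Cyrillic "a" look identical.
samples = ["\u0061", "\u03b1", "\u0430", "\u00b4", "\u00e1", "\u5203"]


def describe(ch):
    """Return (decimal code point, U+ hex form, official Unicode name)."""
    number = ord(ch)
    return number, f"U+{number:04X}", unicodedata.name(ch)


rows = [describe(ch) for ch in samples]

for number, hex_form, name in rows:
    # Print only ASCII fields so the output is safe on any terminal.
    print(f"{number:>6,}  {hex_form}  {name}")

# Visually similar letters still get separate code points per script.
latin_and_cyrillic_differ = "\u0061" != "\u0430"
```

Note the contrast the slides draw: the Latin, Greek, and Cyrillic letters each receive their own number even where they look alike, while the single number 20,995 (U+5203) on slide 8 is shared by all five Han-based writing systems, so any regional difference in the glyph has to come from the font or language settings rather than from the text itself.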