Slide 1

Slide 1 text

Decolonizing Unicode: Can We Type Every Word That Has Ever Been Written?
Aditya Mukerjee
Observability Engineer at Stripe
AlterConf Portland
November 4, 2017

Slide 2

Slide 2 text

1. What is decolonization? What is Unicode?
2. What are the problems with Unicode?
3. How can we fix these problems?
@chimeracoder

Slide 3

Slide 3 text

What is decolonization?

Slide 4

Slide 4 text

Decolonization: the process of actively reversing the influence of, and the damage caused by, a hegemonic, imperial, or oppressive power.

Slide 5

Slide 5 text

What is Unicode?

Slide 6

Slide 6 text

The ’ problem @chimeracoder

Slide 7

Slide 7 text

H   E   L   L   O   !   ☺
72  69  76  76  79  33  9786
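The mapping on this slide can be reproduced with a short sketch. It assumes Python 3, where `ord` returns a character's Unicode code point as a decimal integer; the variable names are illustrative only.

```python
# Each character in a string maps to one Unicode code point (a number).
text = "HELLO!\u263a"  # \u263a is the smiley from the slide

code_points = [ord(ch) for ch in text]
print(code_points)

# The code point is an abstract number; how it is stored depends on the
# encoding. In UTF-8 the ASCII letters take one byte each, while the
# smiley takes three.
utf8 = text.encode("utf-8")
print(len(text), "characters,", len(utf8), "bytes in UTF-8")
```

The numbers printed for the first line match the row of values on the slide.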

Slide 8

Slide 8 text

Д ∫ ∯ ㈥ ꣷ ﷽"#$% అ Ψ ◔

Slide 9

Slide 9 text

Who created Unicode?

Slide 10

Slide 10 text

Unicode Consortium
• Started in 1987 by engineers at Xerox and Apple
$7,000/year   $12,000/year   $18,000/year

Slide 11

Slide 11 text

What matters isn’t just who’s at the table – it’s who’s not at the table

Slide 12

Slide 12 text

What is a character?

Slide 13

Slide 13 text

Latin, Greek, and Cyrillic Alphabets

Character Name                             Unicode Number   Character
Latin Small Letter A                       97               a
Greek Small Letter Alpha                   945              α
Cyrillic Small Letter A                    1,072            а
Acute Accent                               180              ´
Latin Small Letter A with Acute [Accent]   225              á
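The table can be checked with a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The list name `chars` and the escape-sequence spelling are choices made here for illustration, not part of the talk.

```python
import unicodedata

# The five characters from the table, written as escapes so that
# look-alike letters cannot be confused in the source:
# Latin a, Greek alpha, Cyrillic a, acute accent, Latin a with acute.
chars = ["\u0061", "\u03b1", "\u0430", "\u00b4", "\u00e1"]

for ch in chars:
    # Print the decimal code point, the character, and its official name.
    print(f"{ord(ch):>5}  {ch}  {unicodedata.name(ch)}")

# Latin "a" and Cyrillic "а" look alike on screen but are separate
# characters with separate numbers, so they do not compare equal.
latin_a, cyrillic_a = chars[0], chars[2]
print(latin_a == cyrillic_a)
```

Each of these alphabets gets its own code point for a visually similar letter, which is the contrast the following slides draw with Han-based writing systems.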

Slide 14

Slide 14 text

Han-based Writing Systems

Character Name: ‘edged tool, cutlery, knife edge’

Chinese (simplified)    刃
Chinese (traditional)   刃
Japanese                刃
Korean                  刃
Vietnamese              刃

Slide 15

Slide 15 text

“catalog” vs. “catalogue”
“a” vs. “α”
“Stephen” vs. “Steven”

Slide 16

Slide 16 text

Han Unification

Slide 17

Slide 17 text

Character Name: ‘edged tool, cutlery, knife edge’
Unicode number: 20,995

Chinese (simplified)    刃
Chinese (traditional)   刃
Japanese                刃
Korean                  刃
Vietnamese              刃
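A small sketch of what this single number means in practice, assuming Python 3 and the standard `unicodedata` module. The language labels in the list are hypothetical tags used only for this illustration; Unicode text itself carries no such tag.

```python
import unicodedata

# The character from the slide, written as an escape (U+5203).
ren = "\u5203"
print(ord(ren), hex(ord(ren)), unicodedata.name(ren))

# Hypothetical language labels: whichever language the writer intends,
# the stored bytes are the same, because there is only one code point.
languages = [
    "Chinese (simplified)",
    "Chinese (traditional)",
    "Japanese",
    "Korean",
    "Vietnamese",
]
encoded = {lang: ren.encode("utf-8") for lang in languages}
print(len(set(encoded.values())), "distinct byte sequence(s)")
```

Because the encoded text is identical in every case, which regional form of the glyph appears on screen is decided by the font or by language metadata outside the text, not by the character data.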

Slide 18

Slide 18 text


Slide 19

Slide 19 text

Languages are hard and messy
…but…
An iterative approach to diversity is incompatible with a commitment to backwards-compatibility

Slide 20

Slide 20 text

Emoji

Slide 21

Slide 21 text

Racemoji

Slide 22

Slide 22 text

1. Encoding skin tone into all interactions is not representation
2. Structural biases are more subtle than skin tone

Slide 23

Slide 23 text

What’s the solution?

Slide 24

Slide 24 text

1. Engineering convenience cannot be an excuse to sacrifice authenticity
2. We need qualified engineers and linguists from around the world
3. The writing system for a language must be developed by native speakers

Slide 25

Slide 25 text

Unicode could be the first time in history that the writing systems for languages are defined by people who don’t even speak those languages

Slide 26

Slide 26 text

Thank you! Aditya Mukerjee @chimeracoder