Slide 1

Slide 1 text

Emojis! !"# @madisp

Slide 2

Slide 2 text

@madisp

Slide 3

Slide 3 text

!

Slide 4

Slide 4 text

!"

Slide 5

Slide 5 text

!"#

Slide 6

Slide 6 text

!"#$

Slide 7

Slide 7 text

!"#$%

Slide 8

Slide 8 text

1999 NTT DoCoMo, Shigetaka Kurita

Slide 9

Slide 9 text

→ 2010 Unicode 6.0, 722 characters
→ 2011 iOS 5 released
→ 2013 Android 4.3 released

Slide 10

Slide 10 text

😂 Word of the Year 2015, Oxford Dictionaries

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

!

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Unicode is everywhere

Slide 15

Slide 15 text

Unicode is everywhere == emojis are everywhere

Slide 16

Slide 16 text

Unicode is everywhere https://en.wikipedia.org/wiki/!

Slide 17

Slide 17 text

Unicode is everywhere https://en.wikipedia.org/wiki/!

Slide 18

Slide 18 text

Unicode is everywhere
$ git checkout -b !
Switched to a new branch '!'

Slide 19

Slide 19 text

Unicode is everywhere
$ git checkout -b !
Switched to a new branch '!'
$ git checkout -b "
Switched to a new branch '"'

Slide 20

Slide 20 text

Unicode is everywhere
$ git checkout -b !
Switched to a new branch '!'
$ git checkout -b "
Switched to a new branch '"'
$ git branch -l
  madis/fix-filescan-race
  master
  release-1.x
* "
  !

Slide 21

Slide 21 text

Unicode is (almost) everywhere
→ Payment message: Money !"#
→ Payment message: Money

Slide 22

Slide 22 text

Emojis on the JVM
>>> "!"
!

Slide 23

Slide 23 text

Emojis on the JVM
>>> "!"
!
>>> "!".length
2

Slide 24

Slide 24 text

Emojis on the JVM
>>> "!"
!
>>> "!".length
2
>>> "!".substring(0,1)
?
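
The same behaviour can be reproduced in plain Java. This is a minimal sketch, assuming the emoji is U+1F60E (the example used later in the deck); the class and constant names are made up for illustration.

```java
class EmojiLengthDemo {
    // U+1F60E written as its two UTF-16 code units (assumed example emoji).
    static final String SUNGLASSES = "\uD83D\uDE0E";

    public static void main(String[] args) {
        // length() counts UTF-16 code units, not user-visible characters.
        System.out.println(SUNGLASSES.length()); // 2

        // codePointCount() counts Unicode code points instead.
        System.out.println(SUNGLASSES.codePointCount(0, SUNGLASSES.length())); // 1

        // substring(0, 1) cuts the pair in half and leaves a lone high surrogate,
        // which is why the REPL prints a replacement character.
        String broken = SUNGLASSES.substring(0, 1);
        System.out.println(Character.isHighSurrogate(broken.charAt(0))); // true
    }
}
```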

Slide 25

Slide 25 text

!

Slide 26

Slide 26 text

Text encodings
→ US-ASCII, 7 bits (128 chars)
→ ISO-8859-1 .. 16, 8 bits (256 chars)
→ Unicode!

Slide 27

Slide 27 text

Unicode / UCS-2
→ 16 bits (65536 chars)
→ Windows (wchar_t), Java, JavaScript, Python 2
→ U+0000..U+FFFF
→ Basic Multilingual Plane

Slide 28

Slide 28 text

Unicode / UTF-16
→ 17 planes in total
→ Reserve U+D800..U+DFFF for surrogate pairs
0xD800 = 1101100000000000
0xDFFF = 1101111111111111

Slide 29

Slide 29 text

Unicode / UTF-16
0xD800 = 110110 0000000000
0xDFFF = 110111 1111111111

Slide 30

Slide 30 text

Unicode / UTF-16
0xD800 = 110110 0000000000
0xDFFF = 110111 1111111111
110110 XXXXX XXXXX <-- high surrogate
110111 YYYYY YYYYY <-- low surrogate
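
The six-bit prefixes above are enough to classify a code unit. A rough sketch, assuming a mask of 0xFC00 to isolate the top six bits; the class and method names are hypothetical, and the JDK already offers Character.isHighSurrogate and Character.isLowSurrogate for the same check.

```java
class SurrogateBits {
    // 0xFC00 keeps the top six bits: 110110 marks a high surrogate.
    static boolean isHigh(char c) {
        return (c & 0xFC00) == 0xD800;
    }

    // 110111 in the top six bits marks a low surrogate.
    static boolean isLow(char c) {
        return (c & 0xFC00) == 0xDC00;
    }

    public static void main(String[] args) {
        System.out.println(isHigh('\uD83D')); // true
        System.out.println(isLow('\uDE0E'));  // true
        System.out.println(isHigh('A'));      // false
    }
}
```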

Slide 31

Slide 31 text

UTF-16 Surrogate Pairs
→ 😎 - U+1F60E
→ Subtract 0x10000:
  1F60E - 10000 = F60E
→ Split into 10 bits high and low:
  F60E = 0000111101.1000001110 = 003D.020E
→ Add D800 to high and DC00 to low:
  003D + D800 = D83D
  020E + DC00 = DE0E
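
The steps above can be sketched in Java as follows. The class and method names are illustrative only; in real code Character.toChars does the same job.

```java
class SurrogatePairs {
    // Encodes a supplementary code point (U+10000..U+10FFFF) as a surrogate pair.
    static char[] encode(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1F60E);
        System.out.println(String.format("%04X %04X", (int) pair[0], (int) pair[1])); // D83D DE0E
    }
}
```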

Slide 32

Slide 32 text

UTF-16 Surrogate Pairs
Surrogate pair is D83D DE0E

Slide 33

Slide 33 text

UTF-16 Surrogate Pairs
Surrogate pair is D83D DE0E
>>> "😎"[0].toInt().toString(16)
d83d

Slide 34

Slide 34 text

UTF-16 Surrogate Pairs
Surrogate pair is D83D DE0E
>>> "😎"[0].toInt().toString(16)
d83d
>>> "😎"[1].toInt().toString(16)
de0e

Slide 35

Slide 35 text

UTF-16 Surrogate Pairs
→ Take care with string operations, especially with length() and substring(...)
→ StringBuilder.reverse actually works

Slide 36

Slide 36 text

UTF-16 Surrogate Pairs
static int length(String s) {
    int len = 0;
    for (char c : s.toCharArray()) {
        // count surrogate pairs once
        len += Character.isLowSurrogate(c) ? 0 : 1;
    }
    return len;
}
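
As an alternative to the hand-written loop, the JDK's code point APIs can do the counting and slicing. A small sketch, with hypothetical helper names, that indexes by code point rather than by char:

```java
class CodePointHelpers {
    // Number of code points; a surrogate pair counts once.
    static int length(String s) {
        return s.codePointCount(0, s.length());
    }

    // Substring addressed by code point indices, so a pair is never split.
    static String substring(String s, int begin, int end) {
        int from = s.offsetByCodePoints(0, begin);
        int to = s.offsetByCodePoints(from, end - begin);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE0Eb"; // 'a', U+1F60E, 'b'
        System.out.println(s.length());  // 4
        System.out.println(length(s));   // 3
        System.out.println(substring(s, 1, 2).equals("\uD83D\uDE0E")); // true
        // StringBuilder.reverse keeps surrogate pairs in order.
        System.out.println(new StringBuilder(s).reverse().toString().equals("b\uD83D\uDE0Ea")); // true
    }
}
```

This only protects surrogate pairs. Emojis built from several code points still count as more than one.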

Slide 37

Slide 37 text

Multi-character Emojis
→ Regional indicator symbols A-Z
→ ! + " = #
→ $ + $ = %
→ & + $ = '
→ $ + & = (

Slide 38

Slide 38 text

Multi-character Emojis >>> "!".length 4

Slide 39

Slide 39 text

Multi-character Emojis
>>> "!".length
4
>>> StringBuilder("!").reverse().toString()

Slide 40

Slide 40 text

Multi-character Emojis
>>> "!".length
4
>>> StringBuilder("!").reverse().toString()
"
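
A flag is two regional indicator symbols, each of which is itself a surrogate pair, so reversing the string swaps the two letters and may produce a different flag. The sketch below assumes an uppercase two-letter code and uses GB/BG as its own example (not necessarily the flags on the slide); the class and method names are hypothetical.

```java
class FlagDemo {
    // REGIONAL INDICATOR SYMBOL LETTER A is U+1F1E6; B to Z follow in order.
    static String flag(String countryCode) {
        StringBuilder sb = new StringBuilder();
        for (char c : countryCode.toCharArray()) {
            sb.appendCodePoint(0x1F1E6 + (c - 'A')); // assumes c is in 'A'..'Z'
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String gb = flag("GB");
        System.out.println(gb.length()); // 4

        // reverse() keeps each surrogate pair intact but swaps the two symbols.
        String reversed = new StringBuilder(gb).reverse().toString();
        System.out.println(reversed.equals(flag("BG"))); // true
    }
}
```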

Slide 41

Slide 41 text

Multi-character Emojis
→ Zero-Width Joiner (U+200D)
→ !+ZWJ+"+ZWJ+#+ZWJ+$ = %

Slide 42

Slide 42 text

Multi-character Emojis
>>> "!".length
11

Slide 43

Slide 43 text

Multi-character Emojis
>>> "!".length
11
>>> "!".substring(0, 8)
"
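
The numbers 11 and 8 fall out of the arithmetic: four emojis of two chars each plus three joiners is 11 chars, and the first three emojis with two joiners take 8. A sketch assuming the sequence is man, woman, girl, boy (U+1F468, U+1F469, U+1F467, U+1F466), which is a guess at the slide's emoji; the class and helper names are hypothetical.

```java
class ZwjDemo {
    static final String ZWJ = "\u200D"; // ZERO WIDTH JOINER

    // Turns a code point into a String (two chars for supplementary code points).
    static String cp(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Assumed family sequence: man + ZWJ + woman + ZWJ + girl + ZWJ + boy.
    static final String FAMILY =
            cp(0x1F468) + ZWJ + cp(0x1F469) + ZWJ + cp(0x1F467) + ZWJ + cp(0x1F466);

    public static void main(String[] args) {
        System.out.println(FAMILY.length());                            // 11
        System.out.println(FAMILY.codePointCount(0, FAMILY.length()));  // 7

        // Cutting at 8 chars drops the last joiner and the boy,
        // leaving a valid but different three-person sequence.
        String three = cp(0x1F468) + ZWJ + cp(0x1F469) + ZWJ + cp(0x1F467);
        System.out.println(FAMILY.substring(0, 8).equals(three));       // true
    }
}
```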

Slide 44

Slide 44 text

Multi-character Emojis

Slide 45

Slide 45 text

Unicode versions
→ 6 major Unicode versions
→ 🤦 (Person facepalming, added in 9.0)
→ 🧟 (Zombie, added in 10.0)

Slide 46

Slide 46 text

! added in 6.1

Slide 47

Slide 47 text

! added in 6.0

Slide 48

Slide 48 text

Emoji One (Web) [1]
EmojiCompat (Android) [2]
[1] https://github.com/emojione/emojione
[2] https://developer.android.com/guide/topics/ui/look-and-feel/emoji-compat.html

Slide 49

Slide 49 text

TL;DR
→ Know your platform!
→ Take care with string operations
→ Use a compatibility library
→ https://emojipedia.org/

Slide 50

Slide 50 text

Thanks! ?
twitter.com/madisp
speakerdeck.com/madisp