Emojis! 🎉🙌😎 (GDG Riga 2017)

Madis Pink
September 01, 2017

Over the past decade the popularity of emojis has grown from relative obscurity to a big-budget Hollywood picture 🎞️.

Emojis used to be solely the domain of chat apps, but these days any app with user-generated content is expected to support them.
In this session we'll take a technical look at how emojis are represented in Unicode 🤓, identify common pitfalls 🐛 and find ways to circumvent them (the pitfalls, that is, not the emojis 😎).

Transcript

  1. Emojis! 🎉🙌😎 @madisp

  2. @madisp

  3. !

  4. !"

  5. !"#

  6. !"#$

  7. !"#$%

  8. 1999 NTT DoCoMo, Shigetaka Kurita

  9. → 2010 Unicode 6.0, 722 characters → 2011 iOS 5 released → 2013 Android 4.3 released
  10. 😂 word of the year 2015, Oxford Dictionaries

  11. None
  12. !

  13. None
  14. Unicode is everywhere

  15. Unicode is everywhere == emojis are everywhere

  16. Unicode is everywhere https://en.wikipedia.org/wiki/!

  17. Unicode is everywhere https://en.wikipedia.org/wiki/!

  18. Unicode is everywhere $ git checkout -b ! Switched to a new branch '!'

  19. Unicode is everywhere $ git checkout -b ! Switched to a new branch '!' $ git checkout -b " Switched to a new branch '"'

  20. Unicode is everywhere $ git checkout -b ! Switched to a new branch '!' $ git checkout -b " Switched to a new branch '"' $ git branch -l madis/fix-filescan-race master release-1.x * " !
  21. Unicode is (almost) everywhere → Payment message: Money !"# → Payment message: Money
  22. Emojis on the JVM >>> "!" !

  23. Emojis on the JVM >>> "!" ! >>> "!".length 2

  24. Emojis on the JVM >>> "!" ! >>> "!".length 2 >>> "!".substring(0,1) ?
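
The REPL session above can be reproduced in plain Java. This is a minimal sketch that assumes the emoji is U+1F60E (the slides do not say which one was typed here); `JvmEmojiLength` and `SUNGLASSES` are names made up for the example.

```java
final class JvmEmojiLength {
    // Assumption: U+1F60E, written as its two UTF-16 code units.
    static final String SUNGLASSES = "\uD83D\uDE0E";

    public static void main(String[] args) {
        // length() counts UTF-16 code units, not what the user sees as characters.
        System.out.println(SUNGLASSES.length());                               // 2
        // codePointCount() counts Unicode code points instead.
        System.out.println(SUNGLASSES.codePointCount(0, SUNGLASSES.length())); // 1
        // substring(0, 1) cuts the pair in half and leaves a lone high surrogate,
        // which is why the REPL can only show a replacement glyph for it.
        String half = SUNGLASSES.substring(0, 1);
        System.out.println(Character.isHighSurrogate(half.charAt(0)));         // true
    }
}
```

Any emoji outside the Basic Multilingual Plane should behave the same way.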
  25. !

  26. Text encodings → US-ASCII, 7 bits (128 chars) → ISO-8859-1 .. 16, 8 bits (256 chars) → Unicode!
  27. Unicode / UCS-2 → 16 bits (65536 chars) → Windows (wchar_t), Java, JavaScript, Python 2 → U+0000..U+FFFF → Basic Multilingual Plane
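
The 16-bit limit is visible from Java, where a `char` holds exactly one 16-bit unit. A rough sketch, with `Planes` and `plane` as hypothetical names; plane 0 is the Basic Multilingual Plane and U+1F60E is picked only as a convenient example outside it.

```java
final class Planes {
    /** Plane index of a code point; plane 0 is the Basic Multilingual Plane. */
    static int plane(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point");
        }
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        int aUmlaut = 0x00E4;     // inside U+0000..U+FFFF
        int sunglasses = 0x1F60E; // outside it
        // Prints the plane, then how many Java chars the code point needs.
        System.out.println(plane(aUmlaut) + " " + Character.charCount(aUmlaut));       // 0 1
        System.out.println(plane(sunglasses) + " " + Character.charCount(sunglasses)); // 1 2
        System.out.println(Character.isBmpCodePoint(sunglasses));                      // false
    }
}
```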
  28. Unicode / UTF-16 → 17 planes in total → Reserve U+D800..U+DFFF for surrogate pairs 0xD800 = 1101100000000000 0xDFFF = 1101111111111111
  29. Unicode / UTF-16 0xD800 = 110110 0000000000 0xDFFF = 110111 1111111111

  30. Unicode / UTF-16 0xD800 = 110110 0000000000 0xDFFF = 110111 1111111111 110110 XXXXX XXXXX <-- high surrogate 110111 YYYYY YYYYY <-- low surrogate
  31. UTF-16 Surrogate Pairs → 😎 - U+1F60E → Subtract 0x10000: 1F60E - 10000 = F60E → Split into 10 bits high and low: F60E = 0000111101.1000001110 = 003D.020E → Add D800 to high and DC00 to low: 003D + D800 = D83D, 020E + DC00 = DE0E
  32. UTF-16 Surrogate Pairs Surrogate pair is D83D DE0E

  33. UTF-16 Surrogate Pairs Surrogate pair is D83D DE0E >>> "😎"[0].toInt().toString(16) d83d

  34. UTF-16 Surrogate Pairs Surrogate pair is D83D DE0E >>> "😎"[0].toInt().toString(16) d83d >>> "😎"[1].toInt().toString(16) de0e
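
The subtract, split and add steps from the slides can be written out as a short Java sketch. `SurrogatePairs` and `encode` are illustrative names, not a JDK API; the result is cross-checked against `Character.toChars`, which does the same conversion.

```java
import java.util.Arrays;

final class SurrogatePairs {
    /** Encodes a code point in U+10000..U+10FFFF as {high surrogate, low surrogate}. */
    static char[] encode(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1F60E);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);      // D83D DE0E
        // Cross-check against the JDK's own conversion.
        System.out.println(Arrays.equals(pair, Character.toChars(0x1F60E))); // true
    }
}
```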
  35. UTF-16 Surrogate Pairs → Take care with string operations, especially with length() and substring(...) → StringBuilder.reverse actually works
  36. UTF-16 Surrogate Pairs static int length(String s) { int len = 0; for (char c : s.toCharArray()) { /* count surrogate pairs once */ len += Character.isLowSurrogate(c) ? 0 : 1; } return len; }
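
The same idea extends to `substring`. The sketch below leans on the JDK's `codePointCount` and `offsetByCodePoints`; the class and method names are invented for the example, and indices are assumed to count code points rather than `char`s.

```java
final class CodePointStrings {
    /** Number of code points in s; for well-formed strings this matches the loop-based helper. */
    static int length(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Like substring(), but begin and end count code points instead of UTF-16 code units. */
    static String substring(String s, int beginCp, int endCp) {
        int begin = s.offsetByCodePoints(0, beginCp);
        int end = s.offsetByCodePoints(begin, endCp - beginCp);
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE0Eb"; // 'a', U+1F60E, 'b'
        System.out.println(s.length());                  // 4
        System.out.println(length(s));                   // 3
        // The middle code point comes back whole, as both of its code units.
        System.out.println(substring(s, 1, 2).length()); // 2
    }
}
```

This only protects surrogate pairs. Emojis made of several code points can still be split.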
  37. Multi-character Emojis → Regional symbols A-Z → ! + " = # → $ + $ = % → & + $ = ' → $ + & = (
  38. Multi-character Emojis >>> "!".length 4

  39. Multi-character Emojis >>> "!".length 4 >>> StringBuilder("!").reverse().toString()

  40. Multi-character Emojis >>> "!".length 4 >>> StringBuilder("!").reverse().toString() "
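
A sketch of why reversing a flag yields a different flag. The region codes "SE" and "ES" are assumptions chosen because both orders are valid flags; the slides' actual emojis are not recoverable from this transcript. `RegionalIndicators` and `flag` are illustrative names.

```java
final class RegionalIndicators {
    /** Maps a two-letter region code such as "SE" to its pair of regional indicator symbols. */
    static String flag(String region) {
        StringBuilder sb = new StringBuilder();
        for (char c : region.toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("expected A-Z");
            }
            sb.appendCodePoint(0x1F1E6 + (c - 'A')); // U+1F1E6 is regional indicator A
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sweden = flag("SE");
        System.out.println(sweden.length()); // 4: two code points, two code units each
        // reverse() keeps each surrogate pair intact but swaps the two symbols,
        // so the result is the flag for "ES" rather than "SE".
        String reversed = new StringBuilder(sweden).reverse().toString();
        System.out.println(reversed.equals(flag("ES"))); // true
    }
}
```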

  41. Multi-character Emojis → Zero-Width Joiner (U+200D) → !+ZWJ+"+ZWJ+#+ZWJ+$ = %

  42. Multi-character Emojis >>> "!".length 11

  43. Multi-character Emojis >>> "!".length 11 >>> "!".substring(0, 8) "

  44. Multi-character Emojis
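
The numbers on the ZWJ slides can be checked with a small Java sketch. It assumes a four-person family sequence (man U+1F468, woman U+1F469, girl U+1F467, boy U+1F466), which is one sequence that produces a length of 11; `ZwjSequences` and `join` are hypothetical names.

```java
final class ZwjSequences {
    static final int ZWJ = 0x200D; // zero-width joiner

    /** Joins the given code points with zero-width joiners. */
    static String join(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePoints.length; i++) {
            if (i > 0) {
                sb.appendCodePoint(ZWJ);
            }
            sb.appendCodePoint(codePoints[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String family = join(0x1F468, 0x1F469, 0x1F467, 0x1F466);
        System.out.println(family.length());                             // 11
        System.out.println(family.codePointCount(0, family.length()));   // 7
        // Cutting at index 8 lands on a code point boundary, so nothing is
        // corrupted, yet the boy is silently dropped from the family.
        String cut = family.substring(0, 8);
        System.out.println(cut.equals(join(0x1F468, 0x1F469, 0x1F467))); // true
    }
}
```

Counting code points is not enough here: the sequence is seven code points but renders as a single emoji where the platform supports it.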

  45. Unicode versions → 6 major Unicode versions → 🤦 (Person facepalming, added in 9.0) → (Zombie, added in 10.0)
  46. ! added in 6.1

  47. ! added in 6.0

  48. Emoji One (Web) [1], EmojiCompat (Android) [2] [1] https://github.com/emojione/emojione [2] https://developer.android.com/guide/topics/ui/look-and-feel/emoji-compat.html
  49. TL;DR → Know your platform! → Take care with string operations → Use a compatibility library → https://emojipedia.org/
  50. Thanks! ? twitter.com/madisp speakerdeck.com/madisp