Emojis! 🎉🙌😎 (GDG Riga 2017)

Madis Pink
September 01, 2017

Over the past decade the popularity of emojis has grown from relative obscurity to a big-budget Hollywood picture 🎞️.

Emojis used to be solely the domain of chat apps, but these days any app with user-generated content is expected to support them.
In this session we'll take a technical look at how emojis are represented in Unicode 🤓, identify common pitfalls 🐛 and find ways to circumvent them (pitfalls that is, not emojis 😎).

Transcript

  1. !

  2. !"

  3. !"#

  4. → 2010 Unicode 6.0, 722 characters
    → 2011 iOS 5 released
    → 2013 Android 4.3 released
  5. !

  6. Unicode is everywhere
    $ git checkout -b !
    Switched to a new branch '!'
    $ git checkout -b "
    Switched to a new branch '"'
  7. Unicode is everywhere
    $ git checkout -b !
    Switched to a new branch '!'
    $ git checkout -b "
    Switched to a new branch '"'
    $ git branch -l
      madis/fix-filescan-race
      master
      release-1.x
    * "
      !
  8. !

  9. Unicode / UCS-2
    → 16 bits (65536 chars)
    → Windows (wchar_t), Java, JavaScript, Python 2
    → U+0000..U+FFFF
    → Basic Multilingual Plane
  10. Unicode / UTF-16
    → 17 planes in total
    → Reserve U+D800..U+DFFF for surrogate pairs
    0xD800 = 1101100000000000
    0xDFFF = 1101111111111111
  11. Unicode / UTF-16
    0xD800 = 110110 0000000000
    0xDFFF = 110111 1111111111
    110110 XXXXX XXXXX <-- high surrogate
    110111 YYYYY YYYYY <-- low surrogate
  12. UTF-16 Surrogate Pairs
    → 😎 - U+1F60E
    → Subtract 0x10000: 1F60E - 10000 = F60E
    → Split into 10 bits high and low: F60E = 0000111101.1000001110 = 003D.020E
    → Add D800 to high and DC00 to low: 003D + D800 = D83D, 020E + DC00 = DE0E
  13. UTF-16 Surrogate Pairs
    → Take care with string operations, especially with length() and substring(...)
    → StringBuilder.reverse actually works
  14. UTF-16 Surrogate Pairs
    static int length(String s) {
        int len = 0;
        for (char c : s.toCharArray()) {
            // count surrogate pairs once
            len += Character.isLowSurrogate(c) ? 0 : 1;
        }
        return len;
    }
  15. Multi-character Emojis
    → Regional symbols A-Z
    → ! + " = #
    → $ + $ = %
    → & + $ = '
    → $ + & = (
  16. Unicode versions
    → 6 major Unicode versions
    → 🤦 (Person facepalming, added in 9.0)
    → 🧟 (Zombie, added in 10.0)
  17. TL;DR
    → Know your platform!
    → Take care with string operations
    → Use a compatibility library
    → https://emojipedia.org/
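
The surrogate pair arithmetic walked through on slide 12 can be sketched in Java roughly as follows. The class name `SurrogatePairs` and its `encode`/`decode` helpers are hypothetical names chosen for illustration, not something taken from the talk; in real code `Character.toChars` already does the same job, and the sketch only cross-checks against it.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class SurrogatePairs {

    /** Splits a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair. */
    static char[] encode(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }

    /** Reverses encode(): combines a high and a low surrogate into one code point. */
    static int decode(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1F60E);
        // prints "U+1F60E -> D83D DE0E"
        System.out.printf("U+1F60E -> %04X %04X%n", (int) pair[0], (int) pair[1]);
        // The JDK's own conversion should agree with the manual arithmetic.
        System.out.println(Arrays.equals(pair, Character.toChars(0x1F60E)));
    }
}
```

Doing the arithmetic by hand is mainly useful for understanding the encoding; application code would normally rely on the `Character` methods instead.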
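
Slides 13 and 14 warn about `length()` and `substring(...)` on strings containing surrogate pairs. As an alternative to counting by hand, the standard `String.codePointCount` and `String.offsetByCodePoints` methods can do the bookkeeping. The sketch below assumes that approach; the class `CodePointStrings` and its method names are hypothetical. Note that it counts code points, not user-perceived characters, so multi-code-point emojis still count as more than one.

```java
// Hypothetical helper class, for illustration only.
class CodePointStrings {

    /** Number of Unicode code points in s; a surrogate pair counts once. */
    static int length(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Substring addressed by code point indices instead of char indices. */
    static String substring(String s, int beginCodePoint, int endCodePoint) {
        int begin = s.offsetByCodePoints(0, beginCodePoint);
        int end = s.offsetByCodePoints(begin, endCodePoint - beginCodePoint);
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        // 'a', then U+1F60E as a surrogate pair, then 'b'
        String s = "a\uD83D\uDE0Eb";
        System.out.println(s.length());    // prints 4 (UTF-16 code units)
        System.out.println(length(s));     // prints 3 (code points)
        // Extracts the whole emoji without splitting its surrogate pair.
        String emoji = substring(s, 1, 2);
        System.out.println(emoji.length()); // prints 2
        // StringBuilder.reverse keeps surrogate pairs in the right order.
        String reversed = new StringBuilder(s).reverse().toString();
        System.out.println(reversed.equals("b\uD83D\uDE0Ea")); // prints true
    }
}
```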
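
Slide 15 mentions that flags are built from pairs of regional symbols A-Z. A minimal sketch of that composition is shown below, assuming the Regional Indicator Symbol block that starts at U+1F1E6 for the letter A. The class `RegionalIndicators`, the `flag` method and the example country code "LV" are illustrative assumptions, not taken from the slides. Whether the resulting pair renders as a flag depends on the platform's font.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class RegionalIndicators {

    // REGIONAL INDICATOR SYMBOL LETTER A; B..Z follow consecutively.
    private static final int REGIONAL_INDICATOR_A = 0x1F1E6;

    /** Maps a two-letter country code such as "LV" to its pair of regional indicator symbols. */
    static String flag(String countryCode) {
        if (countryCode.length() != 2) {
            throw new IllegalArgumentException("expected a two-letter code");
        }
        StringBuilder sb = new StringBuilder();
        for (char c : countryCode.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("expected letters A-Z");
            }
            sb.appendCodePoint(REGIONAL_INDICATOR_A + (c - 'A'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String flag = flag("LV");
        // One visible flag, two code points, four UTF-16 code units.
        System.out.println(flag.length());                          // prints 4
        System.out.println(flag.codePointCount(0, flag.length()));  // prints 2
    }
}
```

This is why a single flag can report a length of 4 in Java: each regional indicator lies outside the Basic Multilingual Plane and needs its own surrogate pair.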