Unicode, JavaScript and the Emoji family

stefan judis
November 07, 2016

Transcript

  1-2. Stefan Judis: Frontend Developer, Occasional Teacher, Meetup Organizer
    ❤ Open Source, Performance and Accessibility ❤ @stefanjudis
  3. UNICODE ...
    01 is an international encoding standard
    02 is a mapping from each letter, digit or symbol to a numeric value (its code point)
    03 works across different platforms and programs
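
The mapping from characters to numbers can be observed directly from JavaScript. This is a small sketch. `toUPlus` is a hypothetical formatting helper and is not from the talk. The built-ins `codePointAt()` and `String.fromCodePoint()` convert a character to its code point and back.

```javascript
// Unicode assigns every character a number, its code point.
// codePointAt() reads that number, String.fromCodePoint() goes the other way.

// Hypothetical helper: format a code point in the usual U+XXXX notation.
const toUPlus = (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

for (const char of ["A", "\uFF82", "\u{1F468}"]) {
  const cp = char.codePointAt(0);
  console.log(toUPlus(cp), cp);
}
// U+0041 65
// U+FF82 65410
// U+1F468 128104

console.log(String.fromCodePoint(0x41, 0x1f468) === "A\u{1F468}"); // true
```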
  4. UNICODE - overview -
    1,114,112 code points in 17 planes
    Basic Multilingual Plane: U+0000 to U+FFFF (1 plane)
    Supplementary Planes: U+10000 to U+10FFFF (16 planes)
      U+10000 to U+1FFFF: Supplementary Multilingual Plane (1 plane)
      U+20000 to U+2FFFF: Supplementary Ideographic Plane (1 plane)
      U+30000 to U+DFFFF: unassigned (11 planes)
      U+E0000 to U+EFFFF: Supplementary Special-purpose Plane (1 plane)
      U+F0000 to U+10FFFF: Supplementary Private Use Area (2 planes)
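
Each plane spans 0x10000 (65,536) code points, so a code point's plane number is its value shifted right by 16 bits. This is a minimal sketch with hypothetical helpers `planeOf` and `planeName`. The names follow the overview slide, including its "unassigned" range.

```javascript
// Hypothetical helper: which of the 17 planes a code point falls into.
// Each plane holds 0x10000 code points, so the plane number is cp >> 16.
function planeOf(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + cp);
  }
  return cp >> 16;
}

// Hypothetical helper: plane names as listed in the overview slide.
function planeName(plane) {
  if (plane === 0) return "Basic Multilingual Plane";
  if (plane === 1) return "Supplementary Multilingual Plane";
  if (plane === 2) return "Supplementary Ideographic Plane";
  if (plane === 14) return "Supplementary Special-purpose Plane";
  if (plane === 15 || plane === 16) return "Supplementary Private Use Area";
  return "unassigned";
}

console.log(17 * 0x10000);                // 1114112
console.log(planeName(planeOf(0x26f7)));  // Basic Multilingual Plane
console.log(planeName(planeOf(0x1f468))); // Supplementary Multilingual Plane
```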
  5. UNICODE - Basic Multilingual Plane -
    U+0000 to U+FFFF: characters for almost all modern languages plus a lot of symbols
  6. UNICODE - Supplementary Planes -
    U+10000 to U+10FFFF: everything else
  7. EMOJIS ...
    01 were initially used by Japanese mobile operators
    02 were added to Unicode 6.0 in October 2010
    03 have been supported since OS X 10.7 (Lion) and Windows 8
  8. EMOJIS - overview -
    Most emojis are in the Supplementary Multilingual Plane (U+10000 to U+1FFFF).
  9. EMOJIS - ZWJ sequences -
    ZERO WIDTH JOINER (ZWJ), U+200D: an indicator that a single glyph should be presented for a sequence of characters
  10. EMOJIS - ZWJ sequences -
    Family (man, man, girl): U+1F468 + ZWJ U+200D + U+1F468 + ZWJ U+200D + U+1F467 (5 code points)
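
The family sequence can be written with code point escapes and taken apart again. This is a sketch that only uses built-ins. Spreading a string yields one element per code point. `length` counts UTF-16 code units instead.

```javascript
// man + ZWJ + man + ZWJ + girl, written with code point escapes
const family = "\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}";

// Spreading a string iterates by code point, not by UTF-16 code unit.
const codePoints = [...family].map(
  (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints.join(" ")); // U+1F468 U+200D U+1F468 U+200D U+1F467
console.log(codePoints.length);    // 5
console.log(family.length);        // 8 (three surrogate pairs plus two single-unit ZWJs)
```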
  11-14. EMOJIS - ZWJ sequences -
    woman astronaut (4 code points): woman + ZWJ + rocket
    man artist (4 code points): man + ZWJ + artist palette
    man getting haircut (4 code points): person getting haircut + ZWJ + ♂
    woman mountain biking (4 code points): person mountain biking + ZWJ + ♀
    man singer, a.k.a. "David Bowie": man + ZWJ + microphone, as rendered by Apple and by Google
    A "David Bowie" emoji is not yet supported.
    Sequences degrade gracefully! '\u{1F468}\u{200D}\u{1F3A4}' (man singer) and '\u{1F469}\u{200D}\u{1F3A4}' (woman singer) are displayed as their separate parts on platforms that do not support the sequence.
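
The sequences can be inspected from JavaScript. This is a sketch under a few assumptions. The code point values come from Unicode's emoji data rather than from the slides, which only show the man singer escape. The medium-skin-tone entry (modifier U+1F3FD) is an illustrative addition. Without a skin tone modifier, woman astronaut and man artist are three code points each. The gendered sequences are four because ♂ or ♀ is followed by the variation selector U+FE0F. Adding a modifier is one way a three-part sequence reaches four code points.

```javascript
// Hypothetical table: ZWJ sequences spelled out with code point escapes
// (values from Unicode's emoji data, not taken from the slides).
const ZWJ = "\u200D";
const sequences = {
  "woman astronaut": "\u{1F469}" + ZWJ + "\u{1F680}",
  "woman astronaut, medium skin tone": "\u{1F469}\u{1F3FD}" + ZWJ + "\u{1F680}",
  "man artist": "\u{1F468}" + ZWJ + "\u{1F3A8}",
  "man getting haircut": "\u{1F487}" + ZWJ + "\u2642\uFE0F",
  "woman mountain biking": "\u{1F6B5}" + ZWJ + "\u2640\uFE0F",
  "man singer": "\u{1F468}" + ZWJ + "\u{1F3A4}",
};

for (const [name, seq] of Object.entries(sequences)) {
  // Spreading iterates by code point, so this counts code points.
  const codePoints = [...seq].length;
  // The pieces between ZWJs are roughly what a platform without support
  // for the whole sequence shows side by side (the graceful fallback).
  const parts = seq.split(ZWJ);
  console.log(`${name}: ${codePoints} code points, ${parts.length} parts`);
}
```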
  15-16. EMOJIS - flags -
    26 regional indicators, U+1F1E6 (A) to U+1F1FF (Z), used in pairs to represent regions
    U+1F1E9 U+1F1EA: DE, Germany (2 code points)
    U+1F1EC U+1F1E7: GB, United Kingdom (2 code points)
    U+1F1E8 U+1F1FD: CX, Christmas Island (2 code points)
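
Because the 26 regional indicators run in alphabetical order from U+1F1E6, a two-letter region code maps to a flag sequence by simple offset arithmetic. This is a minimal sketch. `flagFromRegion` and `regionFromFlag` are hypothetical helpers. Whether a given pair is actually displayed as a flag depends on the platform's fonts.

```javascript
// Regional indicator symbols: U+1F1E6 (A) through U+1F1FF (Z).
const REGIONAL_INDICATOR_A = 0x1f1e6;

// Hypothetical helper: turn a two-letter region code like "DE" into a flag sequence.
function flagFromRegion(code) {
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new TypeError("expected two uppercase letters A-Z, got " + code);
  }
  return String.fromCodePoint(
    ...[...code].map((letter) => REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 65)
  );
}

// Hypothetical helper: map each regional indicator back to its letter.
function regionFromFlag(flag) {
  return [...flag]
    .map((ri) => String.fromCharCode(ri.codePointAt(0) - REGIONAL_INDICATOR_A + 65))
    .join("");
}

for (const region of ["DE", "GB", "CX"]) {
  const flag = flagFromRegion(region);
  console.log(region, [...flag].length, regionFromFlag(flag)); // e.g. "DE 2 DE"
}
```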
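  17. EMOJIS - flags -
    www.dwitter.net/d/2708, dweet by @veubeke:

    function u(t) {
      // Dwitter calls u(t) for every frame: x is the canvas 2D context, t the time in seconds
      x.font='96px a'
      S=String.fromCodePoint
      W=e=>x.measureText(e).width
      // derive a pair index from the time: i%26 picks the first letter, i/26 the second
      i=t*4%257|0
      // draw the pair only if it measures narrower than the unsupported pair "AA",
      // i.e. when it is rendered as a single flag glyph
      W(S(F=0x1F1E6,F))>W(_=S(F+i%26,F+i/26|0))&&x.fillText(_,9,99)
    }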
  18. EMOJIS - overview -
    How many emojis are out there?
    2198
    unicode.org/reports/tr51/#Identification
    (excluding incomplete singletons) (excluding duplicates) (including all combined sequences)
  19. JAVASCRIPT - string representation -
    UTF-16, the string format used by JavaScript, uses a single 16-bit code unit to represent the most common characters.
  20. JAVASCRIPT - characters with one code unit -
    Code points \u0000 to \uFFFF can fit into 16 bits:
    ツ ('\uFF82'), '\uF8FF', '\u9731', ⛷ ('\u26F7')
  21. JAVASCRIPT - characters with one code unit -
    Code points \u0000 to \uFFFF can fit into 16 bits:
    'ツ'.length // 1
    '\uF8FF'.length // 1
    '\u9731'.length // 1
    '⛷'.length // 1
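
A way to see the code units behind `length` is to list them. This is a sketch with a hypothetical helper, `codeUnits`, which reads each unit with the built-in `charCodeAt()`. Every character from the slide is a single unit. A code point beyond U+FFFF needs two.

```javascript
// Hypothetical helper: list a string's UTF-16 code units in hex.
// length, indexing and charCodeAt() all work on these code units.
function codeUnits(s) {
  const units = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units.join(" ");
}

console.log(codeUnits("\uFF82"));    // FF82
console.log(codeUnits("\u26F7"));    // 26F7
console.log(codeUnits("\u{1F468}")); // D83D DC68 (outside the 16-bit range: two code units)
```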
  22. JAVASCRIPT - surrogate pairs -
    How can we use code points outside the 16-bit range?
  23-26. JAVASCRIPT - surrogate pairs -
    Surrogate Pairs
    2048 surrogate code points are included in the Basic Multilingual Plane:
    Leading/High Surrogates: U+D800 to U+DBFF
    Trailing/Low Surrogates: U+DC00 to U+DFFF
    Formula to get the code point C from a high surrogate H and a low surrogate L:
    C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
    C = (H - 55296) * 1024 + L - 56320 + 65536
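
The 2048 surrogates split into 1,024 high and 1,024 low ones. The 1,024 × 1,024 = 1,048,576 possible pairs cover exactly the 16 supplementary planes of 65,536 code points each. The slide's formula combines a pair. The inverse, which is not on the slides, splits a code point into its 10-bit halves. This is a minimal sketch with hypothetical helper names.

```javascript
// The slide's formula: combine a high (leading) and low (trailing) surrogate.
function fromSurrogates(high, low) {
  if (high < 0xd800 || high > 0xdbff) throw new RangeError("not a high surrogate");
  if (low < 0xdc00 || low > 0xdfff) throw new RangeError("not a low surrogate");
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

// The inverse: split a supplementary code point into a surrogate pair.
function toSurrogates(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) throw new RangeError("not a supplementary code point");
  const offset = cp - 0x10000;           // 20 bits of payload
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

console.log(fromSurrogates(0xd83d, 0xdc68).toString(16));                // 1f468
console.log(toSurrogates(0x1f468).map((u) => u.toString(16)).join(" ")); // d83d dc68
```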
  27-29. JAVASCRIPT - surrogate pairs -
    Surrogate Pairs, for '\u{1F468}' (man, U+1F468, 128104):
    '\u{1F468}'.charCodeAt(0) // 55357 (U+D83D)
    '\u{1F468}'.charCodeAt(1) // 56424 (U+DC68)
    '\u{1F468}'.length // 2
    0x1F468 = (0xD83D - 0xD800) * 0x400 + 0xDC68 - 0xDC00 + 0x10000
    128104 = (55357 - 55296) * 1024 + 56424 - 56320 + 65536
  30-31. JAVASCRIPT - surrogate pairs -
    charCodeAt() vs codePointAt(), for '\u{1F468}' (U+1F468, 128104):
    '\u{1F468}'.codePointAt(0) // 128104 (U+1F468)
    '\u{1F468}'.codePointAt(1) // 56424 (U+DC68)
    '\u{1F468}'.charCodeAt(0) // 55357 (U+D83D)
    '\u{1F468}'.charCodeAt(1) // 56424 (U+DC68)
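
To show what `codePointAt()` adds on top of `charCodeAt()`, here is a rough re-implementation. This is a sketch only. `codePointAtSketch` is hypothetical, it handles integer indexes only, and the built-in should be used in real code. It combines a high surrogate with a following low surrogate. In every other case it returns the code unit itself, which explains why `codePointAt(1)` above yields the trailing surrogate.

```javascript
// Hypothetical sketch of String.prototype.codePointAt for integer indexes.
function codePointAtSketch(s, index) {
  if (index < 0 || index >= s.length) return undefined;
  const first = s.charCodeAt(index);
  // A high surrogate followed by a low surrogate forms one supplementary code point.
  if (first >= 0xd800 && first <= 0xdbff && index + 1 < s.length) {
    const second = s.charCodeAt(index + 1);
    if (second >= 0xdc00 && second <= 0xdfff) {
      return (first - 0xd800) * 0x400 + (second - 0xdc00) + 0x10000;
    }
  }
  // BMP character, lone surrogate, or an index pointing at a trailing surrogate:
  // the code unit itself is returned.
  return first;
}

const man = "\u{1F468}";
for (const i of [0, 1]) {
  // sketch, built-in codePointAt, built-in charCodeAt
  console.log(i, codePointAtSketch(man, i), man.codePointAt(i), man.charCodeAt(i));
}
// 0 128104 128104 55357
// 1 56424 56424 56424
```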
  32. JAVASCRIPT - String.prototype.length -
    This property returns the number of code units in the string.
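
Because `length` counts code units, counting code points needs the string iterator, for example with `for...of`. This is a minimal sketch. `countCodePoints` is a hypothetical helper, and the sample strings are illustrative.

```javascript
// length counts UTF-16 code units; iterating a string visits code points.
function countCodePoints(s) {
  let count = 0;
  for (const _ of s) count++; // the string iterator yields one code point per step
  return count;
}

const examples = {
  ABC: "ABC",
  skier: "\u26F7",
  man: "\u{1F468}",
  flagDE: "\u{1F1E9}\u{1F1EA}",
};

for (const [name, s] of Object.entries(examples)) {
  console.log(name, s.length, countCodePoints(s));
}
// ABC 3 3
// skier 1 1
// man 2 1
// flagDE 4 2
```

`Array.from(s).length` and `[...s].length` give the same count. Neither matches what a reader perceives as one emoji when a ZWJ sequence or a flag pair is shown as a single glyph.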
  33-34. JAVASCRIPT - the spread operator -
    The spread operator works for every iterable object.
    [...'ABC'] // ['A', 'B', 'C']
    > ''[Symbol.iterator]
    function [Symbol.iterator]() { [native code] }
  35. JAVASCRIPT - the spread operator -
    "[...] iterates over the code points of a String value, returning each code point as a String value."
    (ECMAScript specification, String.prototype [ @@iterator ] ( ))
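
A common place where this difference shows up is reversing a string. This is a small sketch with made-up example strings. `split("")` cuts the string into code units and separates the surrogate pair. Spreading uses `String.prototype[Symbol.iterator]` and keeps each code point whole.

```javascript
const s = "a\u{1F468}b";

// split("") splits into UTF-16 code units and tears the surrogate pair apart.
const byCodeUnit = s.split("").reverse().join("");
// Spreading uses String.prototype[Symbol.iterator], which yields whole code points.
const byCodePoint = [...s].reverse().join("");

console.log(byCodePoint === "b\u{1F468}a"); // true
console.log(byCodeUnit === "b\u{1F468}a");  // false: the surrogates end up in the wrong order
console.log(byCodeUnit.length, byCodePoint.length); // 4 4
```

Iterating by code point keeps surrogate pairs intact. ZWJ sequences and flag pairs are still taken apart into their individual code points, as the five-code-point family shows.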