Slide 1

Slide 1 text

[...'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'] = ['\u{1F468}', '\u{200D}', '\u{1F469}', '\u{200D}', '\u{1F466}'] Unicode, JavaScript and the Emoji family @stefanjudis

Slide 2

Slide 2 text

Stefan Judis Frontend Developer, Occasional Teacher, Meetup Organizer ❤ Open Source, Performance and Accessibility ❤ @stefanjudis

Slide 3

Slide 3 text

cssclass.es

Slide 5

Slide 5 text

'\u{1F468}'.length

Slide 6

Slide 6 text

'\u{1F468}'.length // 2

Slide 7

Slide 7 text

'\u{1F466}\u{1F3FD}'.length

Slide 8

Slide 8 text

'\u{1F466}\u{1F3FD}'.length // 4

Slide 9

Slide 9 text

'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'.length

Slide 10

Slide 10 text

'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'.length // 8

Slide 11

Slide 11 text

[...'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}']

Slide 12

Slide 12 text

[...'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'] ['\u{1F468}', '\u{200D}', '\u{1F469}', '\u{200D}', '\u{1F466}'] length = 5

Slide 13

Slide 13 text

Okay! What's going on here?

Slide 14

Slide 14 text

It's all about Unicode

Slide 15

Slide 15 text

UNICODE ...
01 is an international encoding standard
02 is a mapping from each letter, digit or symbol to a numeric value
03 works across different platforms and programs

Slide 16

Slide 16 text

UNICODE - overview - 1,114,112 code points from U+0000 to U+10FFFF, usually formatted as hexadecimal numbers

Slide 17

Slide 17 text

UNICODE - overview - 1,114,112 code points in 17 planes
Basic Multilingual Plane: U+0000 to U+FFFF (1 plane)
Supplementary Planes: U+10000 to U+10FFFF (16 planes)
  Supplementary Multilingual Plane: U+10000 to U+1FFFF (1 plane)
  Supplementary Ideographic Plane: U+20000 to U+2FFFF (1 plane)
  unassigned: U+30000 to U+DFFFF (11 planes)
  Supplementary Special-purpose Plane: U+E0000 to U+EFFFF (1 plane)
  Supplementary Private Use Area Planes: U+F0000 to U+10FFFF (2 planes)
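Since every plane spans 0x10000 code points, the plane of a code point can be read off with a single division. The following is a minimal sketch; `planeOf` is a hypothetical helper name used only for illustration, it is not part of the talk.

```javascript
// Hypothetical helper (not from the slides): returns the plane number
// (0 to 16) of a Unicode code point. Each plane holds 0x10000 code points.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('not a Unicode code point: ' + codePoint);
  }
  return Math.floor(codePoint / 0x10000);
}

console.log(planeOf(0x26F7));   // 0  (Basic Multilingual Plane)
console.log(planeOf(0x1F468));  // 1  (Supplementary Multilingual Plane)
console.log(planeOf(0x10FFFF)); // 16 (last Supplementary Private Use Area plane)
console.log(17 * 0x10000);      // 1114112 code points in total
```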

Slide 18

Slide 18 text

UNICODE - Basic Multilingual Plane - U+0000 to U+FFFF: characters for almost all modern languages + a lot of symbols

Slide 19

Slide 19 text

UNICODE - Supplementary Planes - U+10000 to U+10FFFF: everything else

Slide 20

Slide 20 text

Emojis

Slide 21

Slide 21 text

EMOJIS ...
01 were initially used by Japanese mobile operators
02 were added to Unicode v6 in October 2010
03 are supported since OS X 10.7 (Lion) and Windows 8

Slide 22

Slide 22 text

EMOJIS - overview - Emojis like U+1F468 and U+1F466 are in the Supplementary Multilingual Plane (U+10000 to U+1FFFF)

Slide 23

Slide 23 text

How many Emojis are out there? EMOJIS - overview -

Slide 24

Slide 24 text

EMOJIS - overview - How many Emojis are out there? It depends on how you count.

Slide 25

Slide 25 text

Modifier Sequences Five modifiers for diversity U+1F3FB U+1F3FC U+1F3FD U+1F3FE U+1F3FF

Slide 26

Slide 26 text

Modifier Sequences Five modifiers for diversity U+1F3FB U+1F3FC U+1F3FD U+1F3FE U+1F3FF
U+1F466 + U+1F3FD = '\u{1F466}\u{1F3FD}' ( 2 code points )
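A modifier sequence is simply a base emoji followed by one of the five modifier code points. A minimal sketch, assuming the base is U+1F466 and the modifier is U+1F3FD as listed on the slide; the constant names are chosen here for illustration.

```javascript
// Illustrative constant names; the code points come from the slide.
const BASE = 0x1F466;     // U+1F466, the base emoji
const MODIFIER = 0x1F3FD; // U+1F3FD, one of the five modifiers

const modified = String.fromCodePoint(BASE, MODIFIER);

console.log([...modified].length); // 2 code points
console.log(modified.length);      // 4 code units (two surrogate pairs)
console.log(modified === '\u{1F466}\u{1F3FD}'); // true
```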

Slide 27

Slide 27 text

EMOJIS - ZWJ sequences - ZERO WIDTH JOINER U+200D: Indicator that a single glyph should be presented for a sequence of characters

Slide 28

Slide 28 text

EMOJIS - ZWJ sequences - U+1F46A ( 1 code point )

Slide 29

Slide 29 text

EMOJIS - ZWJ sequences - '\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}'

Slide 30

Slide 30 text

EMOJIS - ZWJ sequences - U+1F468 + ZWJ U+200D + U+1F468 + ZWJ U+200D + U+1F467 ( 5 code points )
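A ZWJ sequence can be assembled by joining the individual emojis with U+200D. This sketch assumes the sequence on the slide is U+1F468, ZWJ, U+1F468, ZWJ, U+1F467; the variable names are illustrative.

```javascript
const ZWJ = '\u{200D}'; // ZERO WIDTH JOINER

// Assumed order of the members: U+1F468, U+1F468, U+1F467.
const members = ['\u{1F468}', '\u{1F468}', '\u{1F467}'];
const family = members.join(ZWJ);

console.log([...family].length); // 5 code points (3 emojis + 2 ZWJ)
console.log(family.length);      // 8 code units (3 surrogate pairs + 2 ZWJ)
console.log(family === '\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}'); // true
```

Whether the result renders as one glyph depends on the font and platform; the string itself always contains all five code points.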

Slide 31

Slide 31 text

EMOJIS - ZWJ sequences -
woman astronaut ( 4 code points ), joined by ZWJ
man artist ( 4 code points ), joined by ZWJ
man getting hair cut ( 4 code points ), joined by ZWJ with ♂
woman mountain biking ( 4 code points ), joined by ZWJ with ♀

Slide 32

Slide 32 text

EMOJIS - ZWJ sequences - "David Bowie" - Singer - is also a ZWJ sequence (shown as rendered by Apple and Google)

Slide 33

Slide 33 text

EMOJIS - ZWJ sequences - "David Bowie" Emoji is not yet supported.

Slide 34

Slide 34 text

EMOJIS - ZWJ sequences - Sequences degrade gracefully!
'\u{1F468}\u{200D}\u{1F3A4}' falls back to '\u{1F468}' followed by '\u{1F3A4}'
'\u{1F469}\u{200D}\u{1F3A4}' falls back to '\u{1F469}' followed by '\u{1F3A4}'
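Graceful degradation works because the parts of a ZWJ sequence are ordinary emojis on their own. The sketch below splits a sequence on U+200D to list the emojis a platform without support would typically show side by side; `fallbackParts` is a hypothetical helper name, not an existing API.

```javascript
const ZERO_WIDTH_JOINER = '\u{200D}';

// Hypothetical helper: the individual emojis that remain visible
// when a platform cannot render the joined glyph.
function fallbackParts(sequence) {
  return sequence.split(ZERO_WIDTH_JOINER);
}

const manSinger = '\u{1F468}\u{200D}\u{1F3A4}';
const parts = fallbackParts(manSinger);

console.log(parts.length);               // 2
console.log(parts[0] === '\u{1F468}');   // true
console.log(parts[1] === '\u{1F3A4}');   // true
```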

Slide 35

Slide 35 text

EMOJIS - flags - 26 regional indicators ( U+1F1E6 ... U+1F1FF ) used in pairs to represent regions

Slide 36

Slide 36 text

EMOJIS - flags - 26 regional indicators ( U+1F1E6 ... U+1F1FF ) used in pairs to represent regions
DE: U+1F1E9 U+1F1EA ( 2 code points )
GB: U+1F1EC U+1F1E7 ( 2 code points )
CX: U+1F1E8 U+1F1FD ( 2 code points )
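Because the 26 regional indicators are laid out in the same order as A to Z, a flag can be derived from a two-letter region code by offsetting each letter from U+1F1E6. A minimal sketch; `flagFor` is a hypothetical helper name, and it does not check whether the pair is an actual region.

```javascript
const REGIONAL_INDICATOR_A = 0x1F1E6; // U+1F1E6, the indicator for "A"

// Hypothetical helper: maps a two-letter uppercase code to a pair of
// regional indicator code points.
function flagFor(regionCode) {
  if (!/^[A-Z]{2}$/.test(regionCode)) {
    throw new RangeError('expected two uppercase letters: ' + regionCode);
  }
  const codePoints = [...regionCode].map(
    (letter) => REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 'A'.charCodeAt(0)
  );
  return String.fromCodePoint(...codePoints);
}

console.log(flagFor('DE') === '\u{1F1E9}\u{1F1EA}'); // true
console.log(flagFor('GB') === '\u{1F1EC}\u{1F1E7}'); // true
console.log([...flagFor('CX')].length);              // 2 code points
```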

Slide 37

Slide 37 text

EMOJIS - flags - www.dwitter.net/d/2708 Dweet by @veubeke
function() {
  x.font='96px a'
  S=String.fromCodePoint
  W=e=>x.measureText(e).width
  i=t*4%257|0
  W(S(F=0x1F1E6,F))>W(_=S(F+i%26,F+i/26|0))&&x.fillText(_,9,99)
}

Slide 38

Slide 38 text

EMOJIS - overview - How many Emojis are out there? 2198 (excluding incomplete singletons, excluding duplicates, including all combined sequences) unicode.org/reports/tr51/#Identification

Slide 39

Slide 39 text

What about Unicode in JavaScript?

Slide 40

Slide 40 text

JAVASCRIPT - string representation - UTF-16, the string format used by JavaScript, uses a single 16-bit code unit to represent the most common characters.

Slide 41

Slide 41 text

JAVASCRIPT - string representation - one 16-bit code unit = 65,536 possible values, enough for the code points U+0000 to U+FFFF

Slide 42

Slide 42 text

JAVASCRIPT - characters with one code unit - \u0000 - \uFFFF can fit into 16 bits: ﾂ ('\uFF82'), '\uF8FF' (private use), '\u9731', ⛷ ('\u26F7')

Slide 43

Slide 43 text

JAVASCRIPT - characters with one code unit - \u0000 - \uFFFF can fit into 16 bits: '\uFF82'.length, '\uF8FF'.length, '\u9731'.length and '\u26F7'.length are all 1

Slide 44

Slide 44 text

JAVASCRIPT - surrogate pairs - How can we use code points outside the 16-bit range?

Slide 45

Slide 45 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - 2048 surrogate code points included in the Basic Multilingual Plane

Slide 46

Slide 46 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - 2048 surrogate code points included in the Basic Multilingual Plane Leading/High Surrogates U+D800 to U+DBFF

Slide 47

Slide 47 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - 2048 surrogate code points included in the Basic Multilingual Plane
Leading/High Surrogates: U+D800 to U+DBFF
Trailing/Low Surrogates: U+DC00 to U+DFFF

Slide 48

Slide 48 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - 2048 surrogate code points included in the Basic Multilingual Plane
Leading/High Surrogates: U+D800 to U+DBFF
Trailing/Low Surrogates: U+DC00 to U+DFFF
Formula to get code point
C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
C = (H - 55296) * 1024 + L - 56320 + 65536
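The formula can be written out as a pair of functions, one for each direction. This is a sketch under the assumption that the inputs are valid (a high surrogate followed by a low surrogate, or a code point above U+FFFF); `toCodePoint` and `toSurrogatePair` are hypothetical names introduced here.

```javascript
// Hypothetical helper: applies C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000.
function toCodePoint(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Hypothetical helper: the inverse, splitting a supplementary code point
// into its high and low surrogate.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // upper 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // lower 10 bits
  return [high, low];
}

console.log(toCodePoint(0xD83D, 0xDC68)); // 128104 (0x1F468)
console.log(toSurrogatePair(0x1F468));    // [ 55357, 56424 ] (0xD83D, 0xDC68)
```

In everyday code, `String.prototype.codePointAt()` and `String.fromCodePoint()` do this conversion for you.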

Slide 49

Slide 49 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - U+1F468 128104
'\u{1F468}'.length // 2

Slide 50

Slide 50 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - U+1F468 128104
'\u{1F468}'.length // 2
'\u{1F468}'.charCodeAt(0) // U+D83D 55357

Slide 51

Slide 51 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - U+1F468 128104
'\u{1F468}'.length // 2
'\u{1F468}'.charCodeAt(0) // U+D83D 55357
'\u{1F468}'.charCodeAt(1) // U+DC68 56424

Slide 52

Slide 52 text

Surrogate Pairs JAVASCRIPT - surrogate pairs - U+1F468 128104
'\u{1F468}'.length // 2
'\u{1F468}'.charCodeAt(0) // U+D83D 55357
'\u{1F468}'.charCodeAt(1) // U+DC68 56424
0x1F468 = (0xD83D - 0xD800) * 0x400 + 0xDC68 - 0xDC00 + 0x10000
128104 = (55357 - 55296) * 1024 + 56424 - 56320 + 65536

Slide 54

Slide 54 text

charCodeAt() vs codePointAt() JAVASCRIPT - surrogate pairs - U+1F468 128104
'\u{1F468}'.charCodeAt(0) // U+D83D 55357
'\u{1F468}'.charCodeAt(1) // U+DC68 56424
'\u{1F468}'.codePointAt(0) // U+1F468 128104
'\u{1F468}'.codePointAt(1) // U+DC68 56424
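The difference is easiest to see by running both methods on the same string. A small sketch using U+1F468; `toHex` is a hypothetical formatting helper added only to print the values in U+ notation.

```javascript
const man = '\u{1F468}';

// Hypothetical helper: formats a number as U+XXXX.
const toHex = (n) => 'U+' + n.toString(16).toUpperCase().padStart(4, '0');

// charCodeAt() always returns a single 16-bit code unit.
console.log(toHex(man.charCodeAt(0)));  // U+D83D
console.log(toHex(man.charCodeAt(1)));  // U+DC68

// codePointAt() combines a surrogate pair when the index points at its
// high surrogate; at index 1 it only sees the lone low surrogate.
console.log(toHex(man.codePointAt(0))); // U+1F468
console.log(toHex(man.codePointAt(1))); // U+DC68
```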

Slide 56

Slide 56 text

JAVASCRIPT - surrogate pairs - U+1F468 128104
simple Unicode escapes: '\uD83D\uDC68'
Unicode code point escapes: '\u{1F468}'
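Both escape styles produce the same string, as a quick comparison shows. The variable names below are illustrative only.

```javascript
const viaSurrogateEscapes = '\uD83D\uDC68';           // two simple escapes
const viaCodePointEscape = '\u{1F468}';               // one code point escape
const viaFromCodePoint = String.fromCodePoint(0x1F468);

console.log(viaSurrogateEscapes === viaCodePointEscape); // true
console.log(viaCodePointEscape === viaFromCodePoint);    // true
console.log(viaCodePointEscape.length);                  // 2
```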

Slide 57

Slide 57 text

Okay, what's the deal?

Slide 58

Slide 58 text

JAVASCRIPT - String.prototype.length - This property returns the number of code units in the string. String.prototype.length
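To make the distinction concrete, the sketch below compares `length` (code units) with a count of code points obtained by iterating the string. `countCodePoints` is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: counts code points by iterating the string,
// which walks it code point by code point instead of unit by unit.
function countCodePoints(text) {
  let count = 0;
  for (const codePoint of text) {
    count += 1;
  }
  return count;
}

const samples = ['A', '\u26F7', '\u{1F468}'];
for (const sample of samples) {
  console.log(sample.length, countCodePoints(sample));
}
// 1 1
// 1 1
// 2 1
```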

Slide 59

Slide 59 text

JAVASCRIPT - the spread operator - The spread operator works for every iterable object. [...'ABC']

Slide 60

Slide 60 text

JAVASCRIPT - the spread operator - The spread operator works for every iterable object. [...'ABC']
> ''[Symbol.iterator]
function [Symbol.iterator]() { [native code] }

Slide 61

Slide 61 text

JAVASCRIPT - the spread operator - String.prototype [ @@iterator ]( ): "[...] iterates over the code points of a String value, returning each code point as a String value."
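The string iterator is what spread, `Array.from()` and `for...of` all rely on, while `split('')` still cuts by code unit. A short sketch comparing them on the family sequence, assuming it is U+1F468, ZWJ, U+1F469, ZWJ, U+1F466; the variable names are illustrative.

```javascript
const familySequence = '\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}';

const viaSpread = [...familySequence];         // uses String.prototype[Symbol.iterator]
const viaArrayFrom = Array.from(familySequence); // same iterator
const viaSplit = familySequence.split('');     // splits by code unit instead

console.log(viaSpread.length);    // 5
console.log(viaArrayFrom.length); // 5
console.log(viaSplit.length);     // 8 (surrogate pairs are torn apart)
console.log(viaSpread[1] === '\u{200D}'); // true
```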

Slide 62

Slide 62 text

Let's go back to the examples

Slide 63

Slide 63 text

'\u{1F468}'.length // 2
1 code point but 2 code units (surrogate pair)

Slide 64

Slide 64 text

'\u{1F466}\u{1F3FD}'.length // 4
2 code points but 4 code units (2 surrogate pairs): U+1F466 + U+1F3FD

Slide 65

Slide 65 text

'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'.length // 8
5 code points but 8 code units (3 surrogate pairs + 2 ZWJ)

Slide 66

Slide 66 text

[...'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'] ['\u{1F468}', '\u{200D}', '\u{1F469}', '\u{200D}', '\u{1F466}']
U+1F468, U+200D (ZWJ), U+1F469, U+200D (ZWJ), U+1F466
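The labels on this slide can be reproduced by mapping every code point of the string to its U+ notation. A minimal sketch; `describeCodePoints` is a hypothetical helper, and the family sequence is assumed to be U+1F468, ZWJ, U+1F469, ZWJ, U+1F466.

```javascript
// Hypothetical helper: lists the code points of a string in U+XXXX notation.
function describeCodePoints(text) {
  return Array.from(text, (character) =>
    'U+' + character.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const emojiFamily = '\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}';

console.log(describeCodePoints(emojiFamily));
// [ 'U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F466' ]
console.log(emojiFamily.length); // 8
```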

Slide 67

Slide 67 text

Thanks! @stefanjudis Slides ctfl.io/javascript-emoji-family Article ctfl.io/emoji-prototype-dot-length