[...''] = ['', '', '', '', '']
Unicode, JavaScript and
the Emoji family
@stefanjudis
Slide 2
Slide 2 text
Stefan Judis
Frontend Developer, Occasional Teacher, Meetup Organizer
❤ Open Source, Performance and Accessibility ❤
@stefanjudis
Slide 3
Slide 3 text
cssclass.es
Slide 5
Slide 5 text
''.length
Slide 6
Slide 6 text
''.length
2
Slide 7
Slide 7 text
'%'.length
Slide 8
Slide 8 text
'%'.length
4
Slide 9
Slide 9 text
''.length
Slide 10
Slide 10 text
''.length
8
Slide 11
Slide 11 text
[...'']
Slide 12
Slide 12 text
[...'']
['', '', '', '', '']
length = 5
Slide 13
Slide 13 text
Okay!
What's going on here?
Slide 14
Slide 14 text
It's all about
Unicode
Slide 15
Slide 15 text
UNICODE ...
01 is an international encoding standard
02 is a mapping from each letter, digit or symbol to a numeric value
03 works across different platforms and programs
Slide 16
Slide 16 text
1,114,112 code points
usually formatted as hexadecimal numbers from
U+0000 to U+10FFFF
UNICODE
- overview -
Slide 17
Slide 17 text
1,114,112 code points in 17 planes
Basic Multilingual Plane: U+0000 to U+FFFF (1 plane)
Supplementary Planes: U+10000 to U+10FFFF (16 planes)
    Supplementary Multilingual Plane: U+10000 to U+1FFFF (1 plane)
    Supplementary Ideographic Plane: U+20000 to U+2FFFF (1 plane)
    unassigned: U+30000 to U+DFFFF (11 planes)
    Supplementary Special-purpose Plane: U+E0000 to U+EFFFF (1 plane)
    Supplementary Private Use Area Planes: U+F0000 to U+10FFFF (2 planes)
UNICODE
- overview -
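The plane layout can be checked with a little arithmetic: every plane holds 0x10000 code points, so the plane number is the code point shifted right by 16 bits. The following is a minimal sketch; the helper name `planeOf` and the sample code points are illustrative assumptions, not part of the talk.

```javascript
// Hypothetical helper (not from the slides): returns the plane index 0..16.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('not a Unicode code point: ' + codePoint);
  }
  // Each plane holds 0x10000 (65,536) code points.
  return codePoint >> 16;
}

console.log(planeOf(0x0041));   // 0  (Basic Multilingual Plane)
console.log(planeOf(0x1F4A9));  // 1  (Supplementary Multilingual Plane)
console.log(planeOf(0x10FFFF)); // 16 (last Private Use Area plane)
console.log(0x10FFFF + 1);      // 1114112 code points in total
```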
Slide 18
Slide 18 text
characters for almost all modern languages + a lot of symbols
UNICODE
- Basic Multilingual Plane -
Slide 19
Slide 19 text
everything else
UNICODE
- Supplementary Planes -
Slide 20
Slide 20 text
Emojis
Slide 21
Slide 21 text
EMOJIS ...
01 were initially used by Japanese mobile operators
02 were added to Unicode v6 in October 2010
03 have been supported since OS X 10.7 (Lion) and Windows 8
Slide 22
Slide 22 text
%' are in the Supplementary Multilingual Plane
EMOJIS
- overview -
Slide 23
Slide 23 text
How many Emojis are out there?
EMOJIS
- overview -
Slide 24
Slide 24 text
How many Emojis are out there?
EMOJIS
- overview -
It depends how you count.
Slide 25
Slide 25 text
Modifier Sequences
Five modifiers for diversity
U+1F3FB
U+1F3FC
U+1F3FD
U+1F3FE
U+1F3FF
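A modifier sequence is a base emoji directly followed by one of the five modifier code points. The sketch below assumes U+1F44D (thumbs up) as the base; the slide only names the modifiers, so the base emoji is an illustrative choice.

```javascript
// Assumption: U+1F44D (thumbs up) is used as the base emoji for illustration.
const base = 0x1F44D;
const modifiers = [0x1F3FB, 0x1F3FC, 0x1F3FD, 0x1F3FE, 0x1F3FF];

// One string per modifier: base code point followed by the modifier.
const variants = modifiers.map(m => String.fromCodePoint(base, m));

console.log(variants.length);         // 5 variants
console.log(variants[0].length);      // 4 code units
console.log([...variants[0]].length); // 2 code points
```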
EMOJIS
- flags -
...
26 regional indicators used
in pairs to represent regions
U+1F1E6 to U+1F1FF
Slide 36
Slide 36 text
EMOJIS
- flags -
...
26 regional indicators used
in pairs to represent regions
U+1F1E6 to U+1F1FF
🇩🇪 U+1F1E9 U+1F1EA ( 2 code points )
🇬🇧 U+1F1EC U+1F1E7 ( 2 code points )
🇨🇽 U+1F1E8 U+1F1FD ( 2 code points )
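Because the 26 regional indicators map one-to-one onto the letters A to Z, a flag sequence can be derived from a two-letter region code. This is a minimal sketch; the helper name `flagFromRegionCode` is hypothetical, and whether the pair renders as a flag depends on the font and platform.

```javascript
const REGIONAL_INDICATOR_A = 0x1F1E6; // regional indicator for the letter A

// Hypothetical helper: maps e.g. 'DE' to the pair U+1F1E9 U+1F1EA.
function flagFromRegionCode(code) {
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new RangeError('expected two uppercase letters, got: ' + code);
  }
  const indicators = [...code].map(
    ch => REGIONAL_INDICATOR_A + ch.charCodeAt(0) - 'A'.charCodeAt(0)
  );
  return String.fromCodePoint(...indicators);
}

const de = flagFromRegionCode('DE');
const hex = [...de].map(c => 'U+' + c.codePointAt(0).toString(16).toUpperCase());
console.log(hex.join(' ')); // U+1F1E9 U+1F1EA
console.log(de.length);     // 4 code units (2 surrogate pairs)
```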
Slide 37
Slide 37 text
EMOJIS
- flags -
www.dwitter.net/d/2708
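function u(t) {
  // Dwitter provides x (the canvas 2D context) and t (elapsed seconds).
  x.font='96px a'
  S=String.fromCodePoint
  W=e=>x.measureText(e).width // rendered width of a string
  i=t*4%257|0                 // index of the indicator pair tried in this frame
  // Draw the pair only if it renders narrower than the pair of two U+1F1E6
  // indicators, which is assumed to have no single flag glyph.
  W(S(F=0x1F1E6,F))>W(_=S(F+i%26,F+i/26|0))&&x.fillText(_,9,99)
}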
Dweet by @veubeke
Slide 38
Slide 38 text
How many Emojis are out there?
EMOJIS
- overview -
2198
unicode.org/reports/tr51/#Identification
(excluding incomplete singletons)
(excluding duplicates)
(including all combined sequences)
Slide 39
Slide 39 text
What about
Unicode
in
JavaScript
Slide 40
Slide 40 text
JAVASCRIPT
UTF-16, the string format used by JavaScript,
uses a single 16-bit code unit
to represent the most common characters.
- string representation -
\u0000 - \uFFFF
can fit into 16 bits
ツ
('\u30C4')
('\uF8FF')
‚
('\u9731')
⛷
('\u26F7')
JAVASCRIPT
- characters with one code unit -
Slide 43
Slide 43 text
\u0000 - \uFFFF
can fit into 16 bits
'ツ'.length
''.length
'‚'.length
'⛷'.length
1
JAVASCRIPT
- characters with one code unit -
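For characters in the range \u0000 to \uFFFF, the code unit and the code point are the same number, which is why the length is 1. A small sketch using the skier ('\u26F7') from the slide:

```javascript
const skier = '\u26F7';

console.log(skier.length);                                  // 1 code unit
console.log(skier.charCodeAt(0) === skier.codePointAt(0));  // true
console.log(skier.codePointAt(0).toString(16));             // 26f7
console.log('\uFFFF'.length);                               // 1, the last code point of the range
```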
Slide 44
Slide 44 text
How can we use code points
outside the 16-bit range?
JAVASCRIPT
- surrogate pairs -
Slide 45
Slide 45 text
Surrogate Pairs
JAVASCRIPT
- surrogate pairs -
2048 surrogate code points
included in the Basic Multilingual Plane
Slide 46
Slide 46 text
Surrogate Pairs
JAVASCRIPT
- surrogate pairs -
2048 surrogate code points
included in the Basic Multilingual Plane
Leading/High Surrogates
U+D800 to U+DBFF
Slide 47
Slide 47 text
Surrogate Pairs
JAVASCRIPT
- surrogate pairs -
2048 surrogate code points
included in the Basic Multilingual Plane
Leading/High Surrogates Trailing/Low Surrogates
U+D800 to U+DBFF U+DC00 to U+DFFF
Slide 48
Slide 48 text
Surrogate Pairs
JAVASCRIPT
- surrogate pairs -
2048 surrogate code points
included in the Basic Multilingual Plane
Leading/High Surrogates Trailing/Low Surrogates
U+D800 to U+DBFF U+DC00 to U+DFFF
C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
Formula to get code point
C = (H - 55296) * 1024 + L - 56320 + 65536
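The formula can be tried out directly. The sketch below implements it together with its inverse; the function names are hypothetical, and U+1F4A9 is an assumed sample code point that is not taken from the slides.

```javascript
// Surrogate pair -> code point, following the formula above.
function decodeSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Code point above U+FFFF -> [high, low] surrogate pair (the inverse).
function encodeSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

// Assumption: U+1F4A9 serves as the sample code point.
const sample = String.fromCodePoint(0x1F4A9);
const high = sample.charCodeAt(0);
const low = sample.charCodeAt(1);

console.log(high.toString(16), low.toString(16));            // d83d dca9
console.log(decodeSurrogatePair(high, low).toString(16));    // 1f4a9
console.log(encodeSurrogatePair(0x1F4A9).map(u => u.toString(16)).join(' ')); // d83d dca9
```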
JAVASCRIPT
- String.prototype.length -
This property returns
the number of code units in the string.
String.prototype.length
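Since length counts code units, a string with a character outside the Basic Multilingual Plane reports more units than code points. A minimal sketch, assuming U+1F4A9 as the sample character and a hypothetical helper `countCodePoints`:

```javascript
// Hypothetical helper: Array.from uses the string iterator,
// so it yields one element per code point.
function countCodePoints(str) {
  return Array.from(str).length;
}

// Assumption: 'a' plus U+1F4A9 as sample text.
const text = 'a' + String.fromCodePoint(0x1F4A9);

console.log(text.length);           // 3 code units
console.log(countCodePoints(text)); // 2 code points
```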
Slide 59
Slide 59 text
- the spread operator -
The spread operator works for
every iterable object.
[...'ABC']
JAVASCRIPT
Slide 60
Slide 60 text
- the spread operator -
The spread operator works for
every iterable object.
[...'ABC']
JAVASCRIPT
> ''[Symbol.iterator]
function [Symbol.iterator]() { [native code] }
Slide 61
Slide 61 text
- the spread operator -
[...] iterates over the code points of a String value,
returning each code point as a String value.
String.prototype [ @@iterator ]( )
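The difference between iterating by index and iterating with the string iterator can be seen side by side. This sketch again assumes U+1F4A9 as the sample character:

```javascript
// Assumption: sample text with one character outside the BMP.
const text = 'A' + String.fromCodePoint(0x1F4A9) + 'B';

// Index access walks code units and splits the surrogate pair.
const byIndex = [];
for (let i = 0; i < text.length; i++) {
  byIndex.push(text[i]);
}

// for...of uses the string iterator and walks code points.
const byIterator = [];
for (const ch of text) {
  byIterator.push(ch);
}

console.log(byIndex.length);                               // 4
console.log(byIterator.length);                            // 3
console.log(byIndex[1].charCodeAt(0).toString(16));        // d83d (a lone high surrogate)
console.log(byIterator[1].codePointAt(0).toString(16));    // 1f4a9
```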
JAVASCRIPT
Slide 62
Slide 62 text
Let's go back
to the examples
Slide 63
Slide 63 text
''.length
2
1 code point
but
2 code units
(surrogate pair)
Slide 64
Slide 64 text
'%'.length
4
2 code points
but
4 code units
(2 surrogate pairs)
+
Slide 65
Slide 65 text
''.length
8
5 code points
but
8 code units
(3 surrogate pairs)
ZWJ
ZWJ
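The family example can be rebuilt from its parts: three emojis joined by two zero-width joiners (ZWJ, U+200D). The emoji glyphs did not survive in this transcript, so the sketch assumes man (U+1F468), woman (U+1F469) and boy (U+1F466); any three emojis outside the Basic Multilingual Plane give the same counts.

```javascript
const ZWJ = 0x200D; // zero-width joiner, a single code unit in the BMP

// Assumption: man + ZWJ + woman + ZWJ + boy.
const family = String.fromCodePoint(0x1F468, ZWJ, 0x1F469, ZWJ, 0x1F466);

const parts = [...family];

console.log(family.length);                        // 8 code units
console.log(parts.length);                         // 5 code points
console.log(parts.map(p => p.length).join(','));   // 2,1,2,1,2
console.log(family.split('\u200D').length);        // 3 emojis between the joiners
```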