Unicode, JavaScript and the Emoji family
stefan judis
November 07, 2016
Transcript
[...'👨‍👩‍👦'] = ['👨', '\u200D', '👩', '\u200D', '👦']
Unicode, JavaScript and the Emoji family
@stefanjudis
Stefan Judis Frontend Developer, Occasional Teacher, Meetup Organizer ❤ Open
Source, Performance and Accessibility ❤ @stefanjudis
cssclass.es
'👨'.length // 2
'👦🏽'.length // 4
'👨‍👩‍👦'.length // 8
[...'👨‍👩‍👦'] // ['👨', '\u200D', '👩', '\u200D', '👦'], length = 5
Okay! What's going on here?
It's all about Unicode
UNICODE ...
01 is an international encoding standard
02 is a mapping from each letter, digit or symbol to a numeric value
03 works across different platforms and programs
UNICODE - overview -
1,114,112 code points, usually formatted as hexadecimal numbers from U+0000 to U+10FFFF
UNICODE - overview -
1,114,112 code points in 17 planes
Basic Multilingual Plane: U+0000 to U+FFFF (1 plane)
Supplementary Planes: U+10000 to U+10FFFF (16 planes)
  Supplementary Multilingual Plane: U+10000 to U+1FFFF (1 plane)
  Supplementary Ideographic Plane: U+20000 to U+2FFFF (1 plane)
  unassigned: U+30000 to U+DFFFF (11 planes)
  Supplementary Special-purpose Plane: U+E0000 to U+EFFFF (1 plane)
  Supplementary Private Use Area Planes: U+F0000 to U+10FFFF (2 planes)
UNICODE - Basic Multilingual Plane -
U+0000 to U+FFFF: characters for almost all modern languages + a lot of symbols
UNICODE - Supplementary Planes -
U+10000 to U+10FFFF: everything else
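The plane layout above can be checked with a little arithmetic, since every plane spans 0x10000 code points. The following is a minimal sketch; `planeOf` is a hypothetical helper name introduced here for illustration, not something from the slides.

```javascript
// Hypothetical helper: returns the plane index (0 to 16) of a code point.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('not a Unicode code point');
  }
  // Each plane holds 0x10000 (65,536) code points.
  return Math.floor(codePoint / 0x10000);
}

console.log(planeOf(0x26F7));   // 0  (Basic Multilingual Plane)
console.log(planeOf(0x1F468));  // 1  (Supplementary Multilingual Plane)
console.log(planeOf(0x10FFFF)); // 16 (last Private Use Area plane)
console.log(17 * 0x10000);      // 1114112 code points in total
```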
Emojis
EMOJIS ...
01 were initially used by Japanese mobile operators
02 were added to Unicode v6 in October 2010
03 are supported since OS X 10.7 (Lion) and Windows 8
EMOJIS - overview -
Most emojis are in the Supplementary Multilingual Plane (U+10000 to U+1FFFF)
How many Emojis are out there? EMOJIS - overview -
It depends how you count.
Modifier Sequences
Five modifiers for diversity: U+1F3FB, U+1F3FC, U+1F3FD, U+1F3FE, U+1F3FF
👦🏽 = 👦 (U+1F466) + 🏽 (U+1F3FD) (2 code points)
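As a rough illustration of the modifier sequence above, the two code points can be combined with `String.fromCodePoint`. The constant names and the `isModifier` helper are assumptions made for this sketch.

```javascript
// Code points taken from the slide; the names are illustrative only.
const BOY = 0x1F466;
const MEDIUM_SKIN_TONE = 0x1F3FD;

// A base emoji followed directly by a modifier forms a modifier sequence.
const boy = String.fromCodePoint(BOY, MEDIUM_SKIN_TONE);

// Hypothetical helper: true for the five modifiers U+1F3FB to U+1F3FF.
const isModifier = codePoint => codePoint >= 0x1F3FB && codePoint <= 0x1F3FF;

console.log([...boy].length);              // 2 code points
console.log(boy.length);                   // 4 code units
console.log(isModifier(MEDIUM_SKIN_TONE)); // true
console.log(isModifier(BOY));              // false
```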
EMOJIS - ZWJ sequences -
ZERO WIDTH JOINER (U+200D): indicator that a single glyph should be presented for a sequence of characters
EMOJIS - ZWJ sequences -
👪 U+1F46A (1 code point)
EMOJIS - ZWJ sequences -
U+1F468 + ZWJ (U+200D) + U+1F468 + ZWJ (U+200D) + U+1F467 (5 code points)
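A small sketch of how such a ZWJ sequence could be assembled in JavaScript, using the code points listed on the slide. `joinWithZwj` is a hypothetical helper introduced for this example.

```javascript
const ZWJ = '\u200D'; // ZERO WIDTH JOINER

// Hypothetical helper: turns code points into characters and joins them with ZWJ.
const joinWithZwj = (...codePoints) =>
  codePoints.map(cp => String.fromCodePoint(cp)).join(ZWJ);

const family = joinWithZwj(0x1F468, 0x1F468, 0x1F467);

console.log([...family].length); // 5 code points (3 emojis + 2 joiners)
console.log(family.length);      // 8 code units (3 surrogate pairs + 2 joiners)
```

Whether the result is drawn as one glyph depends on the font and platform; the string itself always contains five code points.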
EMOJIS - ZWJ sequences -
woman astronaut: 👩 + ZWJ + 🚀 (3 code points)
man artist: 👨 + ZWJ + 🎨 (3 code points)
man getting hair cut: 💇 + ZWJ + ♂ (4 code points, counting the variation selector U+FE0F)
woman mountain biking: 🚵 + ZWJ + ♀ (4 code points, counting the variation selector U+FE0F)
"David Bowie" (singer; Apple, Google): person + ZWJ + 🎤. The "David Bowie" emoji is not yet supported.
Sequences degrade gracefully! Where the sequence is unsupported, '\u{1F468}\u{200D}\u{1F3A4}' is shown as "👨🎤" and '\u{1F469}\u{200D}\u{1F3A4}' as "👩🎤".
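The graceful degradation follows from how a ZWJ sequence is built: if no single glyph exists, the individual emojis are still there. A minimal sketch, assuming the fallback is simply the sequence without its joiners (`fallbackParts` is a hypothetical name):

```javascript
const singer = '\u{1F468}\u{200D}\u{1F3A4}'; // man + ZWJ + microphone

// Hypothetical helper: the emojis a renderer can fall back to
// when it has no single glyph for the whole sequence.
const fallbackParts = sequence => sequence.split('\u200D');

const parts = fallbackParts(singer);
console.log(parts.length);                                     // 2
console.log(parts.map(p => p.codePointAt(0).toString(16)));    // [ '1f468', '1f3a4' ]
```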
EMOJIS - flags -
26 regional indicators (U+1F1E6 to U+1F1FF), used in pairs to represent regions
🇩🇪 = U+1F1E9 + U+1F1EA (2 code points)
🇬🇧 = U+1F1EC + U+1F1E7 (2 code points)
🇨🇽 = U+1F1E8 + U+1F1FD (2 code points)
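Because the 26 regional indicators map one-to-one to the letters A to Z, a flag can be derived from a two-letter region code. The sketch below assumes uppercase ASCII input; `flagFor` is a hypothetical helper, and whether a given pair renders as a flag depends on the platform.

```javascript
const REGIONAL_INDICATOR_A = 0x1F1E6; // regional indicator symbol letter A

// Hypothetical helper: maps a two-letter region code such as 'DE'
// to its pair of regional indicator code points.
function flagFor(regionCode) {
  if (!/^[A-Z]{2}$/.test(regionCode)) {
    throw new RangeError('expected two uppercase ASCII letters');
  }
  const indicators = [...regionCode].map(
    letter => REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 'A'.charCodeAt(0)
  );
  return String.fromCodePoint(...indicators);
}

console.log(flagFor('DE') === '\u{1F1E9}\u{1F1EA}'); // true
console.log(flagFor('GB') === '\u{1F1EC}\u{1F1E7}'); // true
console.log([...flagFor('CX')].length);              // 2 code points
```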
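EMOJIS - flags - Dweet by @veubeke (www.dwitter.net/d/2708)
function u(t) {
  // x (canvas 2D context) and t (elapsed time) are provided by the Dwitter runtime
  x.font = '96px a';
  S = String.fromCodePoint;
  W = e => x.measureText(e).width;
  i = t * 4 % 257 | 0;
  // Draw the pair only when it measures narrower than the non-flag pair A+A,
  // presumably because it rendered as a single flag glyph
  W(S(F = 0x1F1E6, F)) > W(_ = S(F + i % 26, F + i / 26 | 0)) && x.fillText(_, 9, 99);
}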
How many Emojis are out there? EMOJIS - overview -
2198 (excluding incomplete singletons, excluding duplicates, including all combined sequences)
Source: unicode.org/reports/tr51/#Identification
What about Unicode in JavaScript?
JAVASCRIPT - string representation -
UTF-16, the string format used by JavaScript, uses a single 16-bit code unit to represent the most common characters.
JAVASCRIPT - string representation -
One 16-bit code unit can represent 65536 code points
JAVASCRIPT - characters with one code unit -
\u0000 to \uFFFF fit into 16 bits
'\uFF82'.length // 1 (ﾂ)
'\uF8FF'.length // 1
'\u9731'.length // 1
'\u26F7'.length // 1 (⛷)
JAVASCRIPT - surrogate pairs -
How can we use code points outside the 16-bit range?
JAVASCRIPT - surrogate pairs -
Surrogate Pairs: 2048 surrogate code points included in the Basic Multilingual Plane
Leading/High Surrogates: U+D800 to U+DBFF
Trailing/Low Surrogates: U+DC00 to U+DFFF
Formula to get code point:
C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
C = (H - 55296) * 1024 + L - 56320 + 65536
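The formula translates directly into code. The sketch below also includes the inverse direction, which the slide does not show; both function names are made up for this example.

```javascript
// Hypothetical helper: the slide's formula, C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000.
function fromSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + low - 0xDC00 + 0x10000;
}

// Hypothetical helper: the inverse, splitting a supplementary code point
// into its high and low surrogate.
function toSurrogates(codePoint) {
  const offset = codePoint - 0x10000;  // 20-bit value
  return [
    0xD800 + (offset >> 10),           // upper 10 bits
    0xDC00 + (offset & 0x3FF),         // lower 10 bits
  ];
}

console.log(fromSurrogates(0xD83D, 0xDC68)); // 128104 (0x1F468)
console.log(toSurrogates(0x1F468));          // [ 55357, 56424 ]
```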
JAVASCRIPT - surrogate pairs -
'👨' is U+1F468 (128104)
'👨'.length // 2
'👨'.charCodeAt(0) // 55357 (U+D83D)
'👨'.charCodeAt(1) // 56424 (U+DC68)
0x1F468 = (0xD83D - 0xD800) * 0x400 + 0xDC68 - 0xDC00 + 0x10000
128104 = (55357 - 55296) * 1024 + 56424 - 56320 + 65536
JAVASCRIPT - surrogate pairs - charCodeAt() vs codePointAt()
'👨' is U+1F468 (128104)
'👨'.charCodeAt(0) // 55357 (U+D83D)
'👨'.charCodeAt(1) // 56424 (U+DC68)
'👨'.codePointAt(0) // 128104 (U+1F468)
'👨'.codePointAt(1) // 56424 (U+DC68)
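The comparison can be reproduced by walking over every code unit index. This is only an illustrative sketch; `hex` is a hypothetical formatting helper.

```javascript
const man = '\u{1F468}';

// Hypothetical helper: formats a number in U+XXXX notation.
const hex = n => 'U+' + n.toString(16).toUpperCase();

for (let i = 0; i < man.length; i++) {
  // charCodeAt always returns a single code unit;
  // codePointAt combines a surrogate pair when i points at a high surrogate.
  console.log(i, hex(man.charCodeAt(i)), hex(man.codePointAt(i)));
}
// 0 U+D83D U+1F468
// 1 U+DC68 U+DC68
```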
JAVASCRIPT - surrogate pairs -
U+1F468 (128104)
simple Unicode escapes: '\uD83D\uDC68'
Unicode code point escapes: '\u{1F468}'
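Both escape styles produce the same string, which a quick check confirms. The two factory calls at the end are additions for comparison and do not appear on the slide.

```javascript
const viaSurrogates = '\uD83D\uDC68'; // simple Unicode escapes (one per code unit)
const viaCodePoint = '\u{1F468}';     // Unicode code point escape (ES2015)

console.log(viaSurrogates === viaCodePoint);                           // true
console.log(String.fromCharCode(0xD83D, 0xDC68) === viaCodePoint);     // true
console.log(String.fromCodePoint(0x1F468) === viaCodePoint);           // true
```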
Okay, what's the deal?
JAVASCRIPT - String.prototype.length -
String.prototype.length: "This property returns the number of code units in the string."
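Since `length` counts code units, counting code points needs a different approach. One possible sketch relies on string iteration; `countCodePoints` is a hypothetical helper name.

```javascript
// Hypothetical helper: counts code points by iterating the string,
// which steps over surrogate pairs as a unit.
function countCodePoints(str) {
  let count = 0;
  for (const _codePoint of str) {
    count++;
  }
  return count;
}

const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F466}';

console.log(family.length);           // 8 code units
console.log(countCodePoints(family)); // 5 code points
console.log(countCodePoints('ABC'));  // 3
```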
JAVASCRIPT - the spread operator -
The spread operator works for every iterable object.
[...'ABC']
> ''[Symbol.iterator]
function [Symbol.iterator]() { [native code] }
JAVASCRIPT - the spread operator -
String.prototype[@@iterator](): "[...] iterates over the code points of a String value, returning each code point as a String value."
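To make the iterator's behaviour concrete, here is a rough re-implementation built on `codePointAt`. It is an illustration written for this transcript, not the engine's actual algorithm, and `codePoints` is a hypothetical name.

```javascript
// Hypothetical generator: yields one string per code point,
// advancing two code units whenever a surrogate pair was read.
function* codePoints(str) {
  let i = 0;
  while (i < str.length) {
    const cp = str.codePointAt(i);
    yield String.fromCodePoint(cp);
    i += cp > 0xFFFF ? 2 : 1;
  }
}

const sample = 'A\u{1F468}B';

console.log([...codePoints(sample)].length); // 3
console.log([...sample].length);             // 3, same as the built-in iterator
console.log(sample.split('').length);        // 4, split('') works on code units
```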
Let's go back to the examples
'👨'.length // 2: 1 code point but 2 code units (surrogate pair)
'👦🏽'.length // 4: 2 code points but 4 code units (2 surrogate pairs), 👦 + 🏽
'👨‍👩‍👦'.length // 8: 5 code points but 8 code units (3 surrogate pairs + ZWJ + ZWJ)
[...'👨‍👩‍👦'] // ['👨', '\u200D', '👩', '\u200D', '👦']: U+1F468, U+200D (ZWJ), U+1F469, U+200D (ZWJ), U+1F466
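The labels in the last example can be generated rather than looked up. A short sketch, with `toLabel` as a hypothetical helper:

```javascript
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F466}';

// Hypothetical helper: formats a one-code-point string in U+XXXX notation.
const toLabel = ch =>
  'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');

const labels = [...family].map(toLabel);

console.log(labels);
// [ 'U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F466' ]
```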
Thanks! @stefanjudis
Slides: ctfl.io/javascript-emoji-family
Article: ctfl.io/emoji-prototype-dot-length