Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
the world of characters
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
orisano
September 13, 2018
8
1.5k
the world of characters
orisano
September 13, 2018
Tweet
Share
More Decks by orisano
See All by orisano
OSS Performance Tuning Tips
orisano
8
6.1k
Docker-Compose & BuildKit
orisano
4
1k
Container Build Talk
orisano
3
2.5k
dockerignore talk
orisano
2
7.2k
Better docker image+
orisano
6
6.4k
Socket.IO Introduction
orisano
0
3.3k
Profiling Go Application
orisano
11
8k
Multi-stage Builds Patterns & Practice
orisano
6
5.2k
better docker image
orisano
22
30k
Featured
See All Featured
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
55k
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
1
1.4k
ラッコキーワード サービス紹介資料
rakko
1
2.3M
SEO for Brand Visibility & Recognition
aleyda
0
4.2k
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7k
Speed Design
sergeychernyshev
33
1.5k
DevOps and Value Stream Thinking: Enabling flow, efficiency and business value
helenjbeal
1
100
Writing Fast Ruby
sferik
630
62k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
How to Ace a Technical Interview
jacobian
281
24k
Color Theory Basics | Prateek | Gurzu
gurzu
0
200
Transcript
1จࣈͷੈք @orisano
Έͳ͞Μ จࣈΛ͑ΒΕ·͢ΑͶʁ
a
a => 1
͋
͋ => 1
佛
佛 => 1
None
=> 1
None
=> 1
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞ => 6
Έͳ͞Μ όΠτΛ͑ΒΕ·͔͢ʁ (UTF-8)
a
a => 1
͋
͋ => 3
佛
佛 => 4
None
=> 4
None
=> 18
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞ => 143
͋ͳ͕ͨࢥ͏1จࣈ Ͳ͏͑Δ͖͔ʁ
byteͰ͑ΒΕͳ͍
Unicodeจࣈू߹ จࣈͱ͕ରԠ͢Δ
͋ => 3042
=> 1F914
͜ͷͷ͜ͱΛ ίʔυϙΠϯτ ͱݺͿ
͜ͷίʔυϙΠϯτΛ byteྻͰදݱ͢Δํ๏Λ ΤϯίʔσΟϯάͱ͍͏
UTF-8ͱ͔UTF-16ͱ͔ ΤϯίʔσΟϯάͷҰछ
ͱΓ͋͑ͣ ίʔυϙΠϯτΛ͑Ε ղܾʁ
͍͍͑
=> 1F468 + 200D + 1F469 + 200D + 1F466
࣮ෳͷίʔυϙΠϯτͰ ҰͭͷจࣈʹͳͬͨΓ͢Δ
ਓ͕ؒೝ͍ࣝͯ͠Δ̍จࣈ ॻهૉ(Grapheme cluster) ͱݺΕ͍ͯΔ
Ͳ͏Ε ίʔυϙΠϯτͷྻ͔Β ॻهૉΛऔΓग़ͤΔ͔
ίʔυϙΠϯτ͕ؒ ॻهૉڥքʹͳΔ͔Ͳ͏͔ͷ ݫີͳϧʔϧ͕͋Δ
UAX #29 Unicode Text Segmentation
None
͜ΕΛJSͰ࣮ͯ͠·ͨ͠ github.com/orisano/graphemesplit
ৄ͘͠ UAX #29 Λݟͯ http://unicode.org/reports/tr29/
ݟΒ͵ਓʹʓจࣈͱ ݴΘΕͨͱ͖ʹ ͪΌΜͱ֬ೝ͠Α͏ʂ
1 byte? 1 codepoint? 1 grapheme cluster?