Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ASCII vs. Unicode in Python 2
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Jessica McElroy
November 03, 2014
250
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
ASCII vs. Unicode in Python 2
Jessica McElroy
November 03, 2014
Featured
See All Featured
Code Reviewing Like a Champion
maltzj
528
40k
30 Presentation Tips
portentint
PRO
1
320
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
300
Principles of Awesome APIs and How to Build Them.
keavy
128
17k
How to build a perfect <img>
jonoalderson
1
5.6k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
240
Darren the Foodie - Storyboard
khoart
PRO
3
3.4k
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
65
55k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
201
75k
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
460
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
1.1k
Collaborative Software Design: How to facilitate domain modelling decisions
baasie
1
240
Transcript
ascii vs. ṲἤЇℭ☮Ѐ Tech Talk, Hackbright Fall 2014
Jessica McElroy
All data is stored as bits. Bits are
meaningless unHl they are encoded.
Historically, ASCII has been the most common standard
for encoding bits into text.
H 01001000 A 01000001
C 01000011 K 01001011 ASCII can represent up to 2 ^ 8 (256) characters
But the world needs more than 256 characters!
Enter unicode: The universal character set!
How does unicode work? Think of it like a
database that maps any symbol you can think of to a number (code point) and to a unique name. Symbol/glyph Code Point Unique Name
U+0041 A LATIN CAPITAL LETTER A
U+03A9 GREEK CAPITAL LETTER OMEGA Ω
U+060F ARABIC SIGN MISRA ؏
U+1F648 SEE-‐NO-‐EVIL MONKEY
Python 2 has two different built-‐in types
for text.
>> text_str = “Let’s go to a café.” >>
type(text_str) <type ‘str’> >> text_uni = u“Let’s go to a café.” >> type(text_uni) <type ‘unicode’>
Unicode tends to be more accurate.
>> text_str = “Let’s go to a café.” >>
len(text_str) 20 >> text_uni = u“Let’s go to a café.” >> len(text_uni) 19
>> tomorrow = u”Hasta ma\u00f1ana.” >> print tomorrow Hasta mañana.
>> love = u“I \U00002764 Hackbright!” >> print love I ❤ Hackbright! Unicode escape sequences
ASCII to Unicode and back
Bytes .decode() à unicode Unicode .encode() à bytes
Tips for unicode-‐aware programs
Make a unicode sandwich 1) Decode early 2)
Unicode everywhere 3) Encode late
Sample funcHon Find it here: hhps://gist.github.com/jmcelroy5/81bf945a9aff01ea9f0d
Include Unicode in Unit Tests ґ℮αḓαbℓℯ ßü☂ ηø☂ αṧ¢I
TesHng monkeys: ✞ε✖⊥ ḟ◎я ⊥εṧтḯᾔℊ
Hang in there – Python 3 fixes the
unicode problem!
References hhp://www.unicode.org/charts/ hhp://www.joelonso•ware.com/arHcles/Unicode.html hhp://nedbatchelder.com/text/unipain/unipain.html#1
hhp://blog.notdot.net/2010/07/Ge‚ng-‐unicode-‐right-‐in-‐Python hhps://mathiasbynens.be/notes/javascript-‐unicode hhps://pythonhosted.org/kitchen/unicode-‐frustraHons.html