In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
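The euro problem can be shown with a single byte. This is a minimal sketch that uses Python's built-in codecs; the choice of byte 0xA4 and the comparison with Windows-1252 are illustrative assumptions, not taken from the slides.

```python
# The same byte means different things under different single-byte encodings.
raw = bytes([0xA4])

as_latin1 = raw.decode("iso-8859-1")    # U+00A4 CURRENCY SIGN
as_latin9 = raw.decode("iso-8859-15")   # U+20AC EURO SIGN
print(repr(as_latin1), repr(as_latin9))

# Latin-1 has no slot for the euro sign at all.
try:
    "€".encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)

# Windows-1252 stores the euro sign at yet another byte value.
euro_cp1252 = "€".encode("cp1252")
print(euro_cp1252.hex())
```

Without knowing which of these encodings produced a file, the byte 0xA4 on its own cannot be interpreted reliably.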
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
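A code point is just a number with a name attached. As a small sketch using Python's standard `unicodedata` module (the variable names are illustrative), the lookup for U+0041 goes both ways:

```python
import unicodedata

ch = "A"
code_point = ord(ch)                 # character -> integer code point
label = f"U+{code_point:04X}"        # conventional U+XXXX notation
name = unicodedata.name(ch)          # official Unicode character name

print(label, name)                   # U+0041 LATIN CAPITAL LETTER A

# And back again: integer code point -> character.
round_trip = chr(0x0041)
print(round_trip)
```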
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
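The two spellings of é look identical but compare unequal until they are normalized. A minimal sketch with Python's `unicodedata.normalize` follows. Reading the "´ != ´" slide as the spacing accent U+00B4 versus the combining accent U+0301 is an assumption about what the slide showed.

```python
import unicodedata

decomposed = "\u0065\u0301"    # e + COMBINING ACUTE ACCENT
precomposed = "\u00e9"         # é as a single code point

print(decomposed == precomposed)            # False
print(len(decomposed), len(precomposed))    # 2 1

# NFC composes, NFD decomposes; after normalizing, the strings match.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)

# Assumed reading of the last slide: two different "acute accent" code points.
spacing_acute = "\u00b4"       # ACUTE ACCENT (stands on its own)
combining_acute = "\u0301"     # COMBINING ACUTE ACCENT (attaches to the previous character)
print(spacing_acute != combining_acute)
```

Normalizing both sides to the same form before comparing or hashing strings avoids this class of mismatch.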
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
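The three encoding forms differ mainly in how many bytes they spend per code point. The sketch below is an illustration, not part of the talk: the sample characters, including U+1F600 from outside the Basic Multilingual Plane, are assumptions chosen to show the size differences.

```python
samples = ["A", "Ä", "€", "\U0001F600"]
forms = ("utf-8", "utf-16-be", "utf-32-be")   # big-endian, no byte order mark

sizes = {
    ch: {enc: len(ch.encode(enc)) for enc in forms}
    for ch in samples
}
for ch, per_form in sizes.items():
    print(f"U+{ord(ch):04X}", per_form)

# Outside the BMP, UTF-16 needs a surrogate pair (two 16-bit units).
# UCS-2 has no surrogates and cannot represent this character.
surrogates = "\U0001F600".encode("utf-16-be").hex()
print(surrogates)   # d83dde00
```

UTF-32 is fixed at four bytes, UTF-16 uses two or four, and UTF-8 uses one to four while staying byte-compatible with ASCII.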
What does it look like?
Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18

Encoding comparison
Source: http://perlgeek.de/en/article/encodings-and-unicode
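The byte values in the comparison can be reproduced with Python's built-in codecs. This is a sketch under two assumptions: the UTF-16 column is big-endian without a byte order mark, and `hex_bytes` is a hypothetical helper introduced only for this illustration.

```python
rows = ["A", "Ä", "€", "\uc218"]
encodings = ["ascii", "iso-8859-1", "iso-8859-15", "utf-8", "utf-16-be"]

def hex_bytes(ch: str, enc: str) -> str:
    """Return the encoded bytes as '0x..' pairs, or '-' if not encodable."""
    try:
        return " ".join(f"0x{b:02x}" for b in ch.encode(enc))
    except UnicodeEncodeError:
        return "-"

table = {ch: [hex_bytes(ch, enc) for enc in encodings] for ch in rows}

for ch, cells in table.items():
    print(f"U+{ord(ch):04X}", ch, *cells, sep=" | ")
```

A "-" marks a character that the encoding simply cannot represent, which is the gap Unicode set out to close.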
Remember: Just because someone claims it’s UTF-8, doesn’t mean it is
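One way to act on that reminder is to decode strictly and treat a failure as evidence that the label is wrong. In this minimal sketch, `is_valid_utf8` is a hypothetical helper name and the sample bytes are illustrative.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

# Latin-1 bytes that someone labelled as UTF-8.
mislabelled = "Ä".encode("iso-8859-1")      # b"\xc4"
genuine = "Ä".encode("utf-8")               # b"\xc3\x84"

print(is_valid_utf8(mislabelled))           # False
print(is_valid_utf8(genuine))               # True

# The reverse mistake fails silently and produces mojibake instead.
mojibake = genuine.decode("cp1252")
print(mojibake)                             # Ã„
```

A successful strict decode only shows that the bytes are well-formed UTF-8. It does not prove the author intended UTF-8, so the check catches mislabelled data but cannot confirm a correct label.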