In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
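The euro problem can be sketched in a few lines of Python. This is only an illustration, not part of the talk: the helper name `try_encode` is made up here, and the codec names are the ones Python's standard library ships. It shows that Latin-1 has no euro sign at all, while ISO-8859-15 and Windows-1252 each put it at a different byte.

```python
# Sketch: how three single-byte encodings handle the euro sign.
EURO = "\u20ac"  # €


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


latin1 = try_encode(EURO, "latin-1")      # None: Latin-1 has no euro sign
latin9 = try_encode(EURO, "iso8859-15")   # b'\xa4'
cp1252 = try_encode(EURO, "cp1252")       # b'\x80'

# The same byte read as Latin-1 is the generic currency sign, not the euro.
misread = b"\xa4".decode("latin-1")       # '¤' (U+00A4)

print(latin1, latin9, cp1252, misread)
```

A file written as ISO-8859-15 and read as Latin-1 therefore shows ¤ wherever a € was meant, without any error being raised.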
Shift-JIS
This sucks
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
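As a rough illustration of code points and planes, the sketch below uses Python's standard `unicodedata` module. The function name `describe` is invented for this example. Each plane holds 0x10000 code points, so the plane number is the code point shifted right by 16 bits, and plane 0 is the Basic Multilingual Plane.

```python
import unicodedata


def describe(char):
    """Return (code point label, official character name, plane number)."""
    cp = ord(char)
    plane = cp >> 16  # 0x10000 code points per plane; plane 0 is the BMP
    return "U+{:04X}".format(cp), unicodedata.name(char), plane


print(describe("A"))           # ('U+0041', 'LATIN CAPITAL LETTER A', 0)
print(describe("\u2708"))      # ('U+2708', 'AIRPLANE', 0)
print(describe("\U0001F600"))  # ('U+1F600', 'GRINNING FACE', 1)
```

The airplane symbol itself still lives in the BMP; most emoji added later sit in plane 1.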
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
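The two spellings of é can be compared directly. The following is a minimal sketch using Python's standard `unicodedata.normalize`. Reading "´ != ´" as U+00B4 ACUTE ACCENT versus U+0301 COMBINING ACUTE ACCENT is an assumption, since the transcript does not say which two characters the slide showed.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Both render as é, but they are different code point sequences.
print(decomposed == precomposed)          # False
print(len(decomposed), len(precomposed))  # 2 1

# Normalization converts between the two forms.
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose
print(nfc == precomposed, nfd == decomposed)     # True True

# Assumed reading of the slide: the spacing accent is not the combining accent.
print("\u00b4" == "\u0301")               # False
```

Normalizing both sides to the same form before comparing strings avoids surprises when text comes from different sources.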
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Encoding comparison

Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18

Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it is.
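One way to act on this is to check incoming bytes instead of trusting the label. The sketch below relies on Python's strict decoding, which is the default. The function name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("Ä".encode("utf-8")))    # True
print(is_valid_utf8("Ä".encode("latin-1")))  # False: a lone 0xc4 byte is not valid UTF-8
```

A passing check only shows the bytes are well-formed UTF-8. Pure ASCII passes too, and so does text that was already mangled and then re-encoded, so it does not prove the content is what the sender intended.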