Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
In the beginning was TXT
Search
Markus Wein
October 02, 2014
Programming
0
91
In the beginning was TXT
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Markus Wein
October 02, 2014
Tweet
Share
More Decks by Markus Wein
See All by Markus Wein
Command Line Productivity
cypher
1
120
A crash intro to deliberate practice
cypher
0
110
Keeping Your PostgreSQL Data Save
cypher
0
100
Ghost in the State Machine
cypher
2
290
n Things You Didn't Know About PostgreSQL (Rubyslava & PyVo 2014 Edition)
cypher
1
220
How to Become a Better Developer
cypher
2
1.8k
An Introduction to Rust
cypher
1
8k
How to Become a Better Developer
cypher
1
220
A Very Short Overview of Vagrant
cypher
0
7.7k
Other Decks in Programming
See All in Programming
Devin入門 〜月500ドルから始まるAIチームメイトとの開発生活〜 / Introduction Devin 〜Development With AI Teammates〜
rkaga
6
2.2k
PHPによる"非"構造化プログラミング入門 -本当に熱いスパゲティコードを求めて- by きんじょうひでき
o0h
PRO
0
590
AtCoder Heuristic First-step Vol.1 講義スライド
terryu16
2
410
「その気にさせる」エンジニアが 最強のリーダーになる理由
gimupop
3
420
Duke on CRaC with Jakarta EE
ivargrimstad
0
1k
Generative AI for Beginners .NETの紹介
tomokusaba
1
260
Lambdaの監視、できてますか?Datadogを用いてLambdaを見守ろう
nealle
2
960
Devin , 正しい付き合い方と使い方 / Living and Working with Devin
yukinagae
1
430
JavaOne 2025: Advancing Java Profiling
jbachorik
1
240
AI時代のプログラミング教育 / programming education in ai era
kishida
22
16k
Google Cloudとo11yで実現するアプリケーション開発者主体のDB改善
nnaka2992
1
200
신입 안드로이드 개발자의 AI 스타트업 생존기 (+ Native C++ Code를 Android에서 사용해보기)
dygames
0
450
Featured
See All Featured
Automating Front-end Workflow
addyosmani
1369
200k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
11
580
It's Worth the Effort
3n
184
28k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
330
21k
Making the Leap to Tech Lead
cromwellryan
133
9.1k
KATA
mclloyd
29
14k
Designing Experiences People Love
moore
140
23k
The Pragmatic Product Professional
lauravandoore
32
6.5k
The World Runs on Bad Software
bkeepers
PRO
67
11k
Agile that works and the tools we love
rasmusluckow
328
21k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
50
2.3k
Transcript
In the beginning was TXT
!
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
None
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
(
None
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN SMALL LETTER A)
Source: http://codepoints.net/U+0041
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint Char ASCII Latin-1 ISO-8859-15 UTF-8 UTF-16 U+0041 A 0x41
0x41 0x41 0x41 0x00 0x41 U+00C4 Ä - 0xc4 0xc4 0xc3 0x84 0x00 0xc4 U+20AC € - - 0xa4 0xe3 0x82 0xac 0x20 0xac U+C218 ࣻ - - - 0xec 0x88 0x98 0xc2 0x18 Encoding comparison Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it
is