In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
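A minimal Python sketch of looking up a code point, its official name, and its plane. The helper name `describe` is made up for this illustration; the plane is simply the code point divided by 0x10000, so plane 0 is the Basic Multilingual Plane.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point, Unicode name, and plane of a single character."""
    cp = ord(char)      # integer value of the code point
    plane = cp >> 16    # each plane holds 0x10000 code points
    return f"U+{cp:04X} {unicodedata.name(char)} (plane {plane})"


print(describe("A"))           # U+0041 LATIN CAPITAL LETTER A (plane 0)
print(describe("\u2708"))      # U+2708 AIRPLANE (plane 0)
print(describe("\U0001F600"))  # U+1F600 GRINNING FACE (plane 1)
```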
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
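The two spellings of é can be compared in code. This is a small sketch using Python's standard `unicodedata` module; the variable names are illustrative only. The strings look identical on screen but compare as different until both are normalized to the same form.

```python
import unicodedata

decomposed = "e\u0301"  # U+0065 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                                # False
print(len(decomposed), len(precomposed))                        # 2 1
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Normalizing to one form (NFC is a common choice) before comparing or storing strings avoids this mismatch.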
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18
Encoding comparison
Source: http://perlgeek.de/en/article/encodings-and-unicode
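The byte values in such a comparison can be reproduced with a few lines of Python. This is a sketch, not the method used for the slide: the helper `encoded_bytes` and the list `ENCODINGS` are names invented here, and `utf-16-be` is assumed so that the output is big-endian with no byte order mark.

```python
def encoded_bytes(char: str, encoding: str) -> str:
    """Return the bytes of char in the given encoding, or "-" if unencodable."""
    try:
        return " ".join(f"0x{b:02x}" for b in char.encode(encoding))
    except UnicodeEncodeError:
        return "-"


ENCODINGS = ["ascii", "latin-1", "iso8859-15", "utf-8", "utf-16-be"]

for char in ["A", "\u00c4", "\u20ac", "\uc218"]:
    row = [encoded_bytes(char, enc) for enc in ENCODINGS]
    print(f"U+{ord(char):04X}", char, *row, sep="  ")
```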
Remember: Just because someone claims it’s UTF-8, doesn’t mean it is
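One way to act on this is to check incoming bytes before trusting the label. A minimal sketch in Python, with `is_valid_utf8` as a hypothetical helper name: it attempts a strict decode and reports whether it succeeded.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("Ä".encode("utf-8")))    # True
print(is_valid_utf8("Ä".encode("latin-1")))  # False: a lone 0xc4 is not valid UTF-8
```

Passing this check only shows that the bytes are well-formed UTF-8. It does not prove the sender meant UTF-8, since pure ASCII data, for example, is valid in many encodings at once.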