In the beginning was TXT
Markus Wein
October 02, 2014
Programming
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
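Why the € hurt: a minimal sketch, assuming Python's built-in codecs for Latin-1, ISO-8859-15 and Windows-1252 (the deck itself shows no code). The same byte means different characters depending on which legacy encoding you assume, and the euro sign sits at a different byte in each encoding that has it.

```python
# One byte, three legacy encodings.
raw = bytes([0xA4])

decoded = {
    name: raw.decode(name)
    for name in ("latin-1", "iso-8859-15", "cp1252")
}
for name, char in decoded.items():
    print(f"{name:>12}: {char!r}")

# Encodings that have a euro sign do not agree on where it lives.
euro_bytes = {
    name: "€".encode(name)
    for name in ("iso-8859-15", "cp1252")
}
print(euro_bytes)

# Latin-1 has no euro sign at all, so encoding fails.
try:
    "€".encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
```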
Shift-JIS
This sucks
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
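Code points and planes can be poked at from a REPL. A small sketch, assuming Python's standard `unicodedata` module; the sample characters are picked for illustration and are not from the slides. The plane is simply the code point divided by 0x10000, so everything below U+10000 is in the Basic Multilingual Plane (plane 0).

```python
import unicodedata


def describe(char):
    """Return (code point label, plane number, official name) for one character."""
    code_point = ord(char)
    plane = code_point >> 16  # plane 0 is the Basic Multilingual Plane
    return f"U+{code_point:04X}", plane, unicodedata.name(char)


# "\u2708" is the airplane, "\U0001F600" is an emoji outside the BMP.
for char in ("A", "a", "\u00c4", "\u20ac", "\u2708", "\U0001F600"):
    label, plane, name = describe(char)
    print(f"{label:<8} plane {plane}  {name}")
```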
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
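The two spellings of é look identical but compare as different strings until they are normalized. A minimal sketch, assuming Python's `unicodedata.normalize`; the variable names are illustrative only.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute accent)
precomposed = "\u00e9"   # U+00E9, the single precomposed code point

# Same grapheme on screen, different code point sequences underneath.
equal_raw = decomposed == precomposed
lengths = (len(decomposed), len(precomposed))

# NFC composes where possible, NFD decomposes; either makes them comparable.
equal_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
equal_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(equal_raw, lengths, equal_nfc, equal_nfd)
```

Normalizing both sides to the same form before comparing, sorting or hashing avoids this class of surprise.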
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
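How many bytes each encoding spends per character, as a rough sketch using Python's built-in codecs. The big-endian variants are used so that no byte order mark is counted; the sample characters are illustrative and not from the slides.

```python
ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")


def encoded_sizes(char):
    """Return the number of bytes `char` needs in each encoding."""
    return {enc: len(char.encode(enc)) for enc in ENCODINGS}


# The last sample lies outside the BMP, so UTF-16 needs a surrogate pair.
for char in ("A", "\u00c4", "\u20ac", "\U0001F600"):
    print(f"U+{ord(char):04X}", encoded_sizes(char))
```

UTF-32 is fixed width, UTF-8 grows from one to four bytes, and UTF-16 uses two bytes inside the BMP and four outside it, which is where it differs from the fixed-width UCS-2.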
What does it look like?
Encoding comparison
Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18
Source: http://perlgeek.de/en/article/encodings-and-unicode
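A comparison like this can be regenerated rather than typed by hand. A sketch under the assumption that Python's codec names `ascii`, `latin-1`, `iso-8859-15`, `utf-8` and `utf-16-be` correspond to the five columns; `hex_bytes` is a hypothetical helper, not something from the talk.

```python
ENCODINGS = ("ascii", "latin-1", "iso-8859-15", "utf-8", "utf-16-be")


def hex_bytes(char, encoding):
    """Return the encoded bytes as hex, or '-' if the encoding lacks the character."""
    try:
        return " ".join(f"0x{byte:02x}" for byte in char.encode(encoding))
    except UnicodeEncodeError:
        return "-"


# Characters taken from the comparison: A, Ä, € and the Hangul syllable U+C218.
for char in ("A", "\u00c4", "\u20ac", "\uc218"):
    row = [hex_bytes(char, enc) for enc in ENCODINGS]
    print(f"U+{ord(char):04X}", char, *row, sep="  ")
```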
Remember: Just because someone claims it’s UTF-8, doesn’t mean it is
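One way to check the claim instead of trusting it: try a strict decode. A minimal sketch in Python; `looks_like_utf8` is a hypothetical helper name. Note the asymmetry: invalid UTF-8 is detectable, but decoding UTF-8 bytes as Latin-1 never fails, it just produces garbage.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


latin1_bytes = "\u00c4".encode("latin-1")  # Ä as the single byte 0xc4
utf8_bytes = "\u00c4".encode("utf-8")      # Ä as the two bytes 0xc3 0x84

print(looks_like_utf8(latin1_bytes))
print(looks_like_utf8(utf8_bytes))

# The reverse mistake is silent: every byte sequence is valid Latin-1.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))
```

A clean decode only shows that the bytes are valid UTF-8, not that UTF-8 is what the sender intended, so the declared encoding still matters.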