Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
In the beginning was TXT
Search
Markus Wein
October 02, 2014
Programming
0
120
In the beginning was TXT
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Markus Wein
October 02, 2014
Tweet
Share
More Decks by Markus Wein
See All by Markus Wein
Command Line Productivity
cypher
1
140
A crash intro to deliberate practice
cypher
0
120
Keeping Your PostgreSQL Data Save
cypher
0
130
Ghost in the State Machine
cypher
2
330
n Things You Didn't Know About PostgreSQL (Rubyslava & PyVo 2014 Edition)
cypher
1
250
How to Become a Better Developer
cypher
2
1.8k
An Introduction to Rust
cypher
1
8.3k
How to Become a Better Developer
cypher
1
240
A Very Short Overview of Vagrant
cypher
0
8k
Other Decks in Programming
See All in Programming
それ、本当に安全? ファイルアップロードで見落としがちなセキュリティリスクと対策
penpeen
7
2.1k
公共交通オープンデータ × モバイルUX 複雑な運行情報を 『直感』に変換する技術
tinykitten
PRO
0
190
CSC307 Lecture 02
javiergs
PRO
1
760
Patterns of Patterns
denyspoltorak
0
460
SQL Server 2025 LT
odashinsuke
0
160
AgentCoreとHuman in the Loop
har1101
4
150
Cap'n Webについて
yusukebe
0
160
コントリビューターによるDenoのすゝめ / Deno Recommendations by a Contributor
petamoriken
0
140
從冷知識到漏洞,你不懂的 Web,駭客懂 - Huli @ WebConf Taiwan 2025
aszx87410
2
3.4k
The Art of Re-Architecture - Droidcon India 2025
siddroid
0
160
re:Invent 2025 トレンドからみる製品開発への AI Agent 活用
yoskoh
0
630
Spinner 軸ズレ現象を調べたらレンダリング深淵に飲まれた #レバテックMeetup
bengo4com
1
220
Featured
See All Featured
Done Done
chrislema
186
16k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.4k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Everyday Curiosity
cassininazir
0
120
We Have a Design System, Now What?
morganepeng
54
8k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.8k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.7k
Learning to Love Humans: Emotional Interface Design
aarron
274
41k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
110
Introduction to Domain-Driven Design and Collaborative software design
baasie
1
550
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
34k
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
50
Transcript
In the beginning was TXT
!
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
None
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
(
None
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN SMALL LETTER A)
Source: http://codepoints.net/U+0041
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint Char ASCII Latin-1 ISO-8859-15 UTF-8 UTF-16 U+0041 A 0x41
0x41 0x41 0x41 0x00 0x41 U+00C4 Ä - 0xc4 0xc4 0xc3 0x84 0x00 0xc4 U+20AC € - - 0xa4 0xe3 0x82 0xac 0x20 0xac U+C218 ࣻ - - - 0xec 0x88 0x98 0xc2 0x18 Encoding comparison Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it
is