In the beginning was TXT
Markus Wein
October 02, 2014
Programming
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
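The euro sign shows why the Latin-* family and the Windows code pages are easy to mix up: the same byte decodes to different characters depending on which encoding is assumed. A minimal sketch using Python's built-in codecs (Python is an assumption here; the slides themselves contain no code):

```python
# One byte, three interpretations.
currency_byte = b"\xa4"

as_latin1 = currency_byte.decode("latin-1")      # U+00A4, generic currency sign
as_latin9 = currency_byte.decode("iso-8859-15")  # U+20AC, euro sign in Latin-9

# Windows-1252 places the euro sign at 0x80 instead.
as_cp1252 = b"\x80".decode("cp1252")

print(repr(as_latin1), repr(as_latin9), repr(as_cp1252))
```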
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
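Unicode divides its code space into 17 planes of 65,536 code points each, with the Basic Multilingual Plane being plane 0. As a rough sketch, the plane of a character can be computed by shifting its code point right by 16 bits (the helper name `plane` is made up for this illustration):

```python
def plane(char: str) -> int:
    """Return the Unicode plane (0 to 16) of a single character."""
    return ord(char) >> 16

print(plane("A"))           # plane 0, the Basic Multilingual Plane
print(plane("\u2708"))      # U+2708 AIRPLANE is also in plane 0
print(plane("\U0001F600"))  # U+1F600 lies in plane 1
```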
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
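A code point is just a number assigned to an abstract character. One way to inspect it, sketched here with Python's standard `unicodedata` module:

```python
import unicodedata

char = "A"
code_point = ord(char)              # integer value of the code point
name = unicodedata.name(char)       # official Unicode character name
label = f"U+{code_point:04X}"       # conventional U+XXXX notation

print(label, name)
```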
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
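The two spellings of é render identically but compare as different strings unless they are normalized first. A small sketch using `unicodedata.normalize` from Python's standard library (variable names are illustrative only):

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9, the single precomposed code point

# Same grapheme, different code point sequences.
raw_equal = decomposed == precomposed

# NFC composes, NFD decomposes; after normalization the strings match.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

print(raw_equal, nfc == precomposed, nfd == decomposed)
print(len(decomposed), len(precomposed))
```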
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
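The practical difference between the three encoding forms is how many bytes each code point needs. A sketch that measures this with Python's codecs (the helper `encoded_sizes` is hypothetical; the big-endian codec variants are chosen so that no byte order mark is counted):

```python
def encoded_sizes(char: str) -> dict:
    """Byte length of one character in each Unicode encoding form."""
    # The "-be" variants do not prepend a byte order mark.
    names = ("utf-32-be", "utf-16-be", "utf-8")
    return {name: len(char.encode(name)) for name in names}

for char in ["A", "\u00c4", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(char):04X}", encoded_sizes(char))
```

UTF-32 always uses four bytes, UTF-8 grows from one to four, and UTF-16 needs a surrogate pair (four bytes) for anything outside the Basic Multilingual Plane, which is exactly what UCS-2 cannot represent.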
What does it look like?
Encoding comparison
Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18
Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it is
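One way to act on this advice is to validate incoming bytes instead of trusting the label. A minimal sketch, assuming a strict decode attempt is an acceptable check (`is_valid_utf8` is a made-up helper name):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

really_utf8 = "\u00c4".encode("utf-8")      # b"\xc3\x84"
claimed_utf8 = "\u00c4".encode("latin-1")   # b"\xc4", a truncated UTF-8 sequence

print(is_valid_utf8(really_utf8), is_valid_utf8(claimed_utf8))
```

A failed decode proves the label is wrong. A successful decode only shows the bytes are well-formed UTF-8, not that UTF-8 was what the sender intended.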