Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
In the beginning was TXT
Search
Markus Wein
October 02, 2014
Programming
0
84
In the beginning was TXT
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Markus Wein
October 02, 2014
Tweet
Share
More Decks by Markus Wein
See All by Markus Wein
Command Line Productivity
cypher
1
120
A crash intro to deliberate practice
cypher
0
110
Keeping Your PostgreSQL Data Save
cypher
0
94
Ghost in the State Machine
cypher
2
290
n Things You Didn't Know About PostgreSQL (Rubyslava & PyVo 2014 Edition)
cypher
1
220
How to Become a Better Developer
cypher
2
1.8k
An Introduction to Rust
cypher
1
7.9k
How to Become a Better Developer
cypher
1
220
A Very Short Overview of Vagrant
cypher
0
7.7k
Other Decks in Programming
See All in Programming
CQRS+ES の力を使って効果を感じる / Feel the effects of using the power of CQRS+ES
seike460
PRO
0
240
Flatt Security XSS Challenge 解答・解説
flatt_security
0
730
HTML/CSS超絶浅い説明
yuki0329
0
190
AWS re:Invent 2024個人的まとめ
satoshi256kbyte
0
100
Swiftコンパイラ超入門+async関数の仕組み
shiz
0
170
php-conference-japan-2024
tasuku43
0
430
ゼロからの、レトロゲームエンジンの作り方
tokujiros
3
1k
令和7年版 あなたが使ってよいフロントエンド機能とは
mugi_uno
10
5.2k
2025.01.17_Sansan × DMM.swift
riofujimon
2
560
各クラウドサービスにおける.NETの対応と見解
ymd65536
0
250
盆栽転じて家具となる / Bonsai and Furnitures
aereal
0
1.9k
いりゃあせ、PHPカンファレンス名古屋2025 / Welcome to PHP Conference Nagoya 2025
ttskch
1
180
Featured
See All Featured
The World Runs on Bad Software
bkeepers
PRO
66
11k
Embracing the Ebb and Flow
colly
84
4.5k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
19
2.3k
It's Worth the Effort
3n
183
28k
No one is an island. Learnings from fostering a developers community.
thoeni
19
3.1k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
8
1.2k
A Tale of Four Properties
chriscoyier
157
23k
GraphQLとの向き合い方2022年版
quramy
44
13k
Designing on Purpose - Digital PM Summit 2013
jponch
116
7.1k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
49
2.2k
Build The Right Thing And Hit Your Dates
maggiecrowley
33
2.5k
The Power of CSS Pseudo Elements
geoffreycrofte
74
5.4k
Transcript
In the beginning was TXT
!
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
None
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
(
None
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN SMALL LETTER A)
Source: http://codepoints.net/U+0041
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint Char ASCII Latin-1 ISO-8859-15 UTF-8 UTF-16 U+0041 A 0x41
0x41 0x41 0x41 0x00 0x41 U+00C4 Ä - 0xc4 0xc4 0xc3 0x84 0x00 0xc4 U+20AC € - - 0xa4 0xe3 0x82 0xac 0x20 0xac U+C218 ࣻ - - - 0xec 0x88 0x98 0xc2 0x18 Encoding comparison Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it
is