In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
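The euro sign is a good illustration of why the single-byte encodings disagree. A minimal sketch in Python (the language and the helper name `decode_with` are assumptions for illustration, the slides show no code) that reads the same byte under three of the encodings named above:

```python
def decode_with(data, encodings):
    """Decode the same bytes under several encodings and collect the results."""
    return {enc: data.decode(enc) for enc in encodings}

# The byte 0xA4 is the generic currency sign in Latin-1 and Windows-1252,
# but ISO-8859-15 reassigned it to the euro sign.
print(decode_with(b"\xa4", ["latin-1", "iso-8859-15", "cp1252"]))

# Windows-1252 put the euro sign at 0x80 instead.
print("€".encode("cp1252"))
```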
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
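A code point is simply a number assigned to a character, together with an official name. A minimal Python sketch of looking both up with the standard `unicodedata` module (the helper name `describe` is hypothetical):

```python
import unicodedata

def describe(char):
    """Return the code point in U+XXXX notation and the official Unicode name."""
    return f"U+{ord(char):04X}", unicodedata.name(char)

print(describe("A"))  # ('U+0041', 'LATIN CAPITAL LETTER A')
print(describe("a"))  # ('U+0061', 'LATIN SMALL LETTER A')
```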
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
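The two spellings of é render identically but compare as different strings. A small Python sketch, assuming NFC normalization is an acceptable way to compare them (the helper name `same_text` is hypothetical):

```python
import unicodedata

decomposed = "e\u0301"  # U+0065 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(decomposed == precomposed)           # False
print(len(decomposed), len(precomposed))   # 2 1
print(same_text(decomposed, precomposed))  # True
```

Whether NFC or NFD is the right choice depends on the application; the point is only that both sides need the same form before comparing.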
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Encoding comparison
Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18
Source: http://perlgeek.de/en/article/encodings-and-unicode
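A comparison like this can be reproduced with a few lines of Python. This is a sketch under two assumptions: the codec names are Python's, and UTF-16 is taken as big-endian without a byte order mark (`utf-16-be`). The helper name `encode_hex` is hypothetical.

```python
ENCODINGS = ["ascii", "latin-1", "iso-8859-15", "utf-8", "utf-16-be"]

def encode_hex(char, encoding):
    """Return the encoded bytes as hex, or '-' if the encoding lacks the character."""
    try:
        return " ".join(f"0x{b:02x}" for b in char.encode(encoding))
    except UnicodeEncodeError:
        return "-"

for char in ["A", "Ä", "€", "\uc218"]:
    row = [encode_hex(char, enc) for enc in ENCODINGS]
    print(f"U+{ord(char):04X}", char, *row, sep="  ")
```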
Remember: Just because someone claims it’s UTF-8 doesn’t mean it is
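One way to act on that advice is to check incoming bytes before trusting the label. A minimal Python sketch using a strict decode (the helper name `is_valid_utf8` is hypothetical):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("Ä".encode("utf-8")))    # True
print(is_valid_utf8("Ä".encode("latin-1")))  # False
```

A passing check only shows the bytes are well-formed UTF-8, not that the sender meant them as UTF-8: pure ASCII text, for example, passes under almost any label.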