In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
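A minimal sketch of why the euro sign was a problem for the older single-byte encodings, using only Python's built-in codecs. The byte values are the standard assignments in Latin-1, ISO-8859-15 and Windows-1252; the variable names are illustrative only.

```python
# The same byte means different things depending on the code page.
raw = b"\xa4"
print(raw.decode("latin-1"))       # generic currency sign (U+00A4)
print(raw.decode("iso-8859-15"))   # euro sign (U+20AC)

# Windows-1252 put the euro sign somewhere else entirely.
print(b"\x80".decode("cp1252"))    # euro sign (U+20AC)

# Latin-1 has no slot for the euro sign at all.
try:
    "\u20ac".encode("latin-1")
except UnicodeEncodeError as error:
    print("Latin-1 cannot encode the euro sign:", error.reason)
```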
Shift-JIS
This sucks
Unicode!
✈️ (planes!)
Basic Multilingual Plane
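A small sketch of how a code point maps to a plane, assuming the usual layout of 17 planes with 65,536 code points each. The helper name `plane` is hypothetical, not something from the talk.

```python
def plane(char: str) -> int:
    """Return the Unicode plane (0 to 16) that a single character lives in."""
    return ord(char) >> 16  # each plane spans 0x10000 code points

print(plane("A"))           # 0, the Basic Multilingual Plane
print(plane("\u2708"))      # 0, AIRPLANE (U+2708) is still in the BMP
print(plane("\U0001F600"))  # 1, GRINNING FACE (U+1F600) is outside the BMP
```

Characters outside plane 0 are the ones that need a surrogate pair in UTF-16.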
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
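A code point and its official name can be looked up with Python's standard `unicodedata` module. This is a sketch; the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(char: str) -> str:
    """Format a single character as 'U+XXXX NAME'."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

print(describe("A"))  # U+0041 LATIN CAPITAL LETTER A
print(describe("a"))  # U+0061 LATIN SMALL LETTER A
```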
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
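The two spellings of é render the same but do not compare equal until they are normalized. A minimal sketch using Python's `unicodedata.normalize`; the variable names are illustrative.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9, a single code point

print(decomposed == precomposed)           # False
print(len(decomposed), len(precomposed))   # 2 1

# NFC composes, NFD decomposes; after either one the strings match.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```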
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
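A sketch of how many bytes the same code point takes in each of the three encodings. The big-endian, BOM-free codec variants are chosen here so the byte order mark does not inflate the counts; the helper name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(char: str) -> dict:
    """Return the encoded length in bytes of one character per encoding."""
    encodings = ("utf-32-be", "utf-16-be", "utf-8")
    return {name: len(char.encode(name)) for name in encodings}

print(encoded_sizes("A"))           # {'utf-32-be': 4, 'utf-16-be': 2, 'utf-8': 1}
print(encoded_sizes("\u20ac"))      # {'utf-32-be': 4, 'utf-16-be': 2, 'utf-8': 3}
print(encoded_sizes("\U0001F600"))  # {'utf-32-be': 4, 'utf-16-be': 4, 'utf-8': 4}
```

UTF-32 is fixed width, while UTF-16 and UTF-8 grow with the code point.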
What does it look like?
Encoding comparison

Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18

Source: http://perlgeek.de/en/article/encodings-and-unicode
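The byte values in such a comparison can be regenerated with Python's built-in codecs. This is a sketch, assuming the UTF-16 column means big-endian without a byte order mark; `hex_bytes` and `ENCODINGS` are illustrative names.

```python
ENCODINGS = ["ascii", "latin-1", "iso-8859-15", "utf-8", "utf-16-be"]

def hex_bytes(char: str, encoding: str) -> str:
    """Return the encoded bytes as '0x..' pairs, or '-' if not encodable."""
    try:
        return " ".join(f"0x{byte:02x}" for byte in char.encode(encoding))
    except UnicodeEncodeError:
        return "-"

# One row per character: code point, character, then one cell per encoding.
for char in "A\u00c4\u20ac\uc218":
    cells = [hex_bytes(char, encoding) for encoding in ENCODINGS]
    print(f"U+{ord(char):04X}", char, *cells, sep="  ")
```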
Remember: just because someone claims it's UTF-8 doesn't mean it is.
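One way to act on that advice is to check that the bytes decode as UTF-8 before trusting the label. A minimal sketch; `is_valid_utf8` is a hypothetical helper, and a successful decode only shows the bytes are well-formed UTF-8, not that the sender intended UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8("\u00c4".encode("utf-8")))    # True
print(is_valid_utf8("\u00c4".encode("latin-1")))  # False, a lone 0xc4 is not valid UTF-8

# The reverse mistake fails silently: UTF-8 bytes read as Latin-1 become mojibake.
print("\u00e9".encode("utf-8").decode("latin-1"))  # Ã©
```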