Slide 1

Slide 1 text

Of representation and interpretation
A unified theory
Arnout Boks, @arnoutboks
#odeaandecode, 25-04-2019

Slide 2

Slide 2 text

Hard problems in programming
• Cache invalidation
• Naming things
• Off-by-one errors

Slide 3

Slide 3 text

Hard problems in programming
• String escaping
• Timezones
• Character encoding

Slide 4

Slide 4 text

Many difficulties with these topics are related

Slide 5

Slide 5 text

This talk is…
• My personal view
• Not an absolute truth
• Meant to make you think

Slide 6

Slide 6 text

Data
And its meaning

Slide 7

Slide 7 text

A number

Slide 8

Slide 8 text

A byte

Slide 9

Slide 9 text

Some JSON data
{ "income": 100000 }

Slide 10

Slide 10 text

A word

Slide 11

Slide 11 text

@arnoutboks #odeaandecode A prime number 4931083597028501900275777672390764957284907772150208632080750184097926278850976588645578020 1366007328679544734112831735367831201557535981978545054811571939345877330038009932619505876 4525023820408110189885042615176579941704250889037029119015870030479432826073821469541570330 2279875576818956016240300641115169008728798381942582716745647748166843479284645809291315318 6007001004335318936319343912948604450370991980047709462921558180711169153031876288477878354 1575932891093295447350881882465495060005019006274705305381164278294267474853496525745368151 1706550281905552656221353146310421008662867971144467063669219825861581112515556504813420768 6732340765505485910826956266693066236799702104812396562518006818323653959348395675357557532 4619023481064700987753027956186892925380693305204238149969945456945774138335689906005870832 1812704861133682026515905166351874029X18197693937677852928722109550412925792573818660584501 5055250274994771883129310457698090915304613359419030258813205932277444385255046677902451869 7062627788891979580423065750615669834695617797879659201644051939960716981112615195610276283 2339825791423321726961443744381056485529348876349210309887028787453233132532122678633283702 7925099749969488775936915917644588032718384740235933020374888506755706587919461134193230781 4854436454375113207098606390746417564121635042388002967808558670370387509410769821183765499 2052043682558546422885024299633226853691246485500075591664024729240716450725319674499952944 8434741902107729606820558130923626837987951966199798285525887161096136561780745661592488660 8898164568541721362920846656279131478466791550965154310113538586208196875836883595577893914 5453935681996098808540476590735897289898342504712891841626587896821853808795627903997862944 9397605467534821256750121517082737107646270712467532102483678159400087505452543537

Slide 12

Slide 12 text

Data has a different meaning under different interpretations

Slide 13

Slide 13 text

Functions
A bit of theory

Slide 14

Slide 14 text

Functions
y = f(x)
(diagram: f takes x and produces y)

Slide 15

Slide 15 text

Functions
f: D → R
y = f(x)
(diagram: x in D, y in R)

Slide 16

Slide 16 text

Functions

Slide 17

Slide 17 text

Functions
round: float → int
4 = round(3.84)

Slide 18

Slide 18 text

Functions
f: D → R
(diagram: inputs x1 and x2 in D, output y in R)

Slide 19

Slide 19 text

Functions
round: float → int
(diagram: inputs 3.84 and 4.35, output 4)

Slide 20

Slide 20 text

Functions
f: D → R
(diagram: input x in D, outputs y1 and y2 in R)

Slide 21

Slide 21 text

Pure functions
f: D → R
(diagram: input x in D, outputs y1 and y2 in R)

Slide 22

Slide 22 text

Impure functions
• Randomness
• State (global or local)
• IO
• Filesystem
• Network
• System clock
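
A minimal PHP sketch of the difference, not taken from the slides. Both function names are made up for illustration; the second one reads the system clock, one of the sources of impurity listed above.

```php
<?php
declare(strict_types=1);

// Pure: the result depends only on the argument, so the same input
// always maps to the same output.
function round_to_int(float $x): int
{
    return (int) round($x);
}

// Impure: the result also depends on the system clock, so two calls
// with the same argument may return different values.
function seconds_since(int $unix_timestamp): int
{
    return time() - $unix_timestamp;
}

var_dump(round_to_int(3.84)); // int(4)
var_dump(round_to_int(4.35)); // int(4)
```

Two different inputs may share an output (3.84 and 4.35 both give 4), but a pure function never gives two outputs for one input.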

Slide 23

Slide 23 text

Functions
f: D → R
(diagram: input x, output ?)

Slide 24

Slide 24 text

Functions (in mathematics)
f: D → R
(diagram: x, with no output shown)

Slide 25

Slide 25 text

Functions (in programming)
f: D → R
(diagram: input x, result Exception!)
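
As an illustration of an input without a valid output, here is a small PHP sketch. The function `real_sqrt` is hypothetical; it only shows the pattern of rejecting part of the declared parameter type with an exception.

```php
<?php
declare(strict_types=1);

// The parameter type says "any float", but a real square root only
// exists for non-negative values. For the rest of the type the
// function has no output to give, so it throws.
function real_sqrt(float $x): float
{
    if ($x < 0) {
        throw new DomainException("No real square root for $x");
    }
    return sqrt($x);
}

var_dump(real_sqrt(9.0)); // float(3)

try {
    real_sqrt(-1.0);
} catch (DomainException $e) {
    echo "Exception!\n";
}
```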

Slide 26

Slide 26 text

Representation & Interpretation

Slide 27

Slide 27 text

Level of abstraction
round: float → int
(diagram axis: abstraction)

Slide 28

Slide 28 text

Level of abstraction
serialize_money: Money → string
(diagram axis: abstraction)

Slide 29

Slide 29 text

Level of abstraction
encode_as_utf16: string → byte[]
(diagram axis: abstraction)

Slide 30

Slide 30 text

Representation
• Translation of a high-level concept to a lower-level representation
• All input values supported
• Lossless (ideally)
• Same data, just in a different form
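
A hedged sketch of such a representation function in PHP, assuming the mbstring extension is available. The name `encode_as_utf16` follows the earlier slide; choosing big-endian byte order and taking UTF-8 as the input form are assumptions made here.

```php
<?php
declare(strict_types=1);

// Represent text (held here as UTF-8) as UTF-16 big-endian bytes.
// Every valid input has a representation, and nothing is lost.
function encode_as_utf16(string $text_utf8): string
{
    return mb_convert_encoding($text_utf8, 'UTF-16BE', 'UTF-8');
}

$bytes = encode_as_utf16("caf\u{E9}"); // "café"
echo bin2hex($bytes), "\n";            // 00630061006600e9
```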

Slide 31

Slide 31 text

Level of abstraction
decode_from_utf16: byte[] → string
(diagram axis: abstraction)

Slide 32

Slide 32 text

Level of abstraction
parse_money: string → Money
(diagram axis: abstraction)

Slide 33

Slide 33 text

Interpretation
• Translation of a low-level representation (back) to a higher-level concept
• Usually not all input values supported
• Same data, just in a different form
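
A possible shape for `parse_money` in PHP. The accepted format (a currency symbol, a space, a whole amount) and the array return value are assumptions for this sketch; the point is that most strings are not valid representations and must be rejected.

```php
<?php
declare(strict_types=1);

// Interprets strings such as "€ 10" and returns [currency, amount].
// Assumes this source file is saved as UTF-8.
function parse_money(string $text): array
{
    if (preg_match('/^(€|\$) (\d+)$/u', $text, $m) !== 1) {
        throw new InvalidArgumentException("Not a money value: '$text'");
    }
    return [$m[1], (int) $m[2]];
}

var_dump(parse_money('€ 10') === ['€', 10]); // bool(true)

try {
    parse_money('10'); // no currency: cannot be interpreted as Money
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```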

Slide 34

Slide 34 text

Why do we do this?

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

Transmission of data
Money → string → byte[] → light pulses, magnetic grains, etc. → …
(diagram axis: abstraction)

Slide 38

Slide 38 text

Escaping
A different view

Slide 39

Slide 39 text

Escaping for SQL

Slide 44

Slide 44 text

String escaping is NOT a security measure

Slide 45

Slide 45 text

String escaping is just properly representing data

Slide 46

Slide 46 text

Escaping as representation
db_escape_string: string → SQLFragment (?)
"O'Shea" → "O\'Shea"

Slide 47

Slide 47 text

Escaping as representation
"John O\'Shea"
Better: "'John O\'Shea'"

Slide 48

Slide 48 text

Escaping as representation
as_sql_string_literal: string → SQLStringLiteral
"O'Shea" → "'O\'Shea'"
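
One way `as_sql_string_literal` could look in PHP. This is a sketch, not the talk's implementation: the class layout and method names are invented, and the backslash-style escaping assumes a database that accepts backslash escapes. Real code would normally rely on the database driver (for example `PDO::quote()`) or on prepared statements.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: a string already represented as a SQL
// string literal, surrounding quotes included.
final class SQLStringLiteral
{
    private $sql;

    private function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    // Assumption: backslash escapes, as shown on the slide.
    public static function fromString(string $value): self
    {
        $escaped = str_replace(['\\', "'"], ['\\\\', "\\'"], $value);
        return new self("'" . $escaped . "'");
    }

    public function toSql(): string
    {
        return $this->sql;
    }
}

function as_sql_string_literal(string $value): SQLStringLiteral
{
    return SQLStringLiteral::fromString($value);
}

echo as_sql_string_literal("O'Shea")->toSql(), "\n"; // 'O\'Shea'
```

Returning a distinct type instead of a plain string makes it harder to escape the same value twice, or to forget the quotes.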

Slide 49

Slide 49 text

Chained escaping as representation
$html_tag = "…Show greeting…";
$sql = "INSERT INTO html_examples VALUES("
    . as_sql_string_literal($html_tag) . ")";

Slide 50

Slide 50 text

Premature representation
foreach ($_REQUEST as $key => $value) {
    $_REQUEST[$key] = htmlentities(
        db_escape_string($value)
    );
}
// ‘Now the input data is safe’
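
A small PHP demonstration of what this kind of up-front "sanitising" does to a single value. `addslashes()` stands in for the slide's `db_escape_string()`, which is an assumption made only to keep the sketch self-contained.

```php
<?php
declare(strict_types=1);

$input = "O'Shea";

// Premature: represented for SQL and for HTML before we know where
// the value is going.
$premature = htmlentities(addslashes($input), ENT_QUOTES, 'UTF-8');
echo $premature, "\n"; // O\&#039;Shea

// The value is now wrong for anything that is not "SQL inside HTML":
// a length check, a plain-text e-mail, a JSON response.
var_dump(strlen($input));     // int(6)
var_dump(strlen($premature)); // int(12)

// Representing late, once per destination, keeps the original intact.
$for_html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $for_html, "\n"; // O&#039;Shea
```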

Slide 52

Slide 52 text

Premature representation
foreach ($_REQUEST as $key => $value) {
    if (strpos($value, "DROP TABLE") !== false) {
        die("SQL injection!");
    }
}

Slide 54

Slide 54 text

What goes wrong and how to prevent that

Slide 55

Slide 55 text

Misrepresentation
serialize_money_ok: Money → string
(€ 10) → "€ 10"

Slide 56

Slide 56 text

Misrepresentation
serialize_money_bad: Money → string
(€ 10) → "10"

Slide 57

Slide 57 text

Misrepresentation
serialize_money_bad: Money → string
(€ 10) → "10"
($ 10) → "10"

Slide 58

Slide 58 text

Misrepresentation
(€ 10) → "10"
($ 10) → "10"
"10" → Money ?
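
The same situation as a PHP sketch. The `Money` class here is a hypothetical minimal stand-in (currency symbol plus whole amount); the two serialisers follow the names on the slides.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal Money value.
final class Money
{
    public $currency;
    public $amount;

    public function __construct(string $currency, int $amount)
    {
        $this->currency = $currency;
        $this->amount = $amount;
    }
}

// Lossless: everything needed to rebuild the Money is in the string.
function serialize_money_ok(Money $m): string
{
    return $m->currency . ' ' . $m->amount;
}

// Lossy: the currency is dropped.
function serialize_money_bad(Money $m): string
{
    return (string) $m->amount;
}

$euros = new Money('€', 10);
$dollars = new Money('$', 10);

var_dump(serialize_money_ok($euros) === serialize_money_ok($dollars));   // bool(false)
var_dump(serialize_money_bad($euros) === serialize_money_bad($dollars)); // bool(true)
```

Once two different values share one representation, no interpretation function can tell them apart again.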

Slide 59

Slide 59 text

That doesn’t happen to me!
DateTime → string
2022-03-14 06:28:00 Europe/Amsterdam → 2022-03-14T06:28:00+01:00

Slide 60

Slide 60 text

That doesn’t happen to me!
fraction → float
1/3 → 0.33333333333333

Slide 61

Slide 61 text

That doesn’t happen to me!
encode_as_iso8859_1: string → byte[]
⛽☂️ → ?

Slide 62

Slide 62 text

Lossy representations cannot be re-interpreted
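
A round trip through ISO-8859-1 shows the loss in PHP, assuming the mbstring extension. What the unsupported character becomes depends on mbstring's substitute-character setting (a question mark by default), so the sketch only checks that the original does not come back.

```php
<?php
declare(strict_types=1);

$original = "caf\u{E9} \u{26FD}"; // "café ⛽", held as UTF-8

// ISO-8859-1 has "é" but no fuel pump, so that character is replaced.
$latin1 = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// Interpreting the bytes again cannot bring the lost character back.
$back = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($back === $original); // bool(false)
```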

Slide 63

Slide 63 text

Misinterpretation
byte[] → string
[…] → "café"

Slide 64

Slide 64 text

Misinterpretation
(diagram: byte[] […] and string "café"; labels: utf8, decode_from_utf8, Exception!)

Slide 65

Slide 65 text

Misinterpretation
(diagram: byte[] […] and string "café"; labels: iso8859-1, utf8)

Slide 66

Slide 66 text

Misinterpretation
(diagram: byte[] […] and strings "café" and "café"; labels: iso8859-1, utf8)
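
Both kinds of misinterpretation can be reproduced in a few lines of PHP, assuming the mbstring extension. The byte values are the standard UTF-8 and ISO-8859-1 encodings of "café"; the variable names are made up.

```php
<?php
declare(strict_types=1);

$utf8_bytes = "caf\xC3\xA9"; // "café" encoded as UTF-8
$latin1_bytes = "caf\xE9";   // "café" encoded as ISO-8859-1

// ISO-8859-1 bytes interpreted as UTF-8: invalid, because 0xE9 starts
// a multi-byte sequence that never completes. This error is detectable.
var_dump(mb_check_encoding($latin1_bytes, 'UTF-8')); // bool(false)

// UTF-8 bytes interpreted as ISO-8859-1: every byte is a valid
// ISO-8859-1 character, so nothing fails. The result is just wrong.
$misread = mb_convert_encoding($utf8_bytes, 'UTF-8', 'ISO-8859-1');
var_dump($misread === "caf\u{C3}\u{A9}"); // bool(true), displays as "cafÃ©"
```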

Slide 67

Slide 67 text

Duck typing
(diagram: byte[] […]; labels: iso8859-1, utf8)

Slide 68

Slide 68 text

Duck typing
(diagram: byte[] […]; labels: iso8859-1, utf8, HOW?)

Slide 69

Slide 69 text

The meaning of data comes from our interpretation of it…

Slide 70

Slide 70 text

…which may very well be wrong

Slide 71

Slide 71 text

Interpretation hints
Store/transmit desired interpretation along with data
Examples:
• HTTP headers
  • Content-Type
  • Content-Encoding
  • Content-Language
• UTF-8 byte order mark
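
As an example of reading such a hint, the PHP sketch below pulls the `charset` parameter out of a Content-Type header value. The function name is hypothetical and the pattern is deliberately lenient; it is not a full HTTP header parser.

```php
<?php
declare(strict_types=1);

// Returns the lower-cased charset named in a Content-Type value,
// or null when the header carries no such hint.
function charset_from_content_type(string $header_value): ?string
{
    $pattern = '/;\s*charset\s*=\s*"?([^";\s]+)"?/i';
    if (preg_match($pattern, $header_value, $m) === 1) {
        return strtolower($m[1]);
    }
    return null;
}

var_dump(charset_from_content_type('text/html; charset=UTF-8')); // string(5) "utf-8"
var_dump(charset_from_content_type('application/json'));         // NULL
```

Without the hint, the receiver has to guess the interpretation.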

Slide 72

Slide 72 text

Interpretation hints
Store/transmit desired interpretation along with data
Examples:
• Music staves

Slide 73

Slide 73 text

Interpretation hints
(diagram: byte[] […]; labels: utf16 LE, utf16 BE)

Slide 74

Slide 74 text

Interpretation hints
(diagram: byte[] […]; labels: utf16 LE with BOM, utf16 BE with BOM)
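
A sketch of how a byte order mark can be used as an interpretation hint in PHP. The function name is hypothetical. The byte values are the standard UTF-16 BOMs; the check ignores that `FF FE 00 00` is also the UTF-32 LE mark.

```php
<?php
declare(strict_types=1);

// Returns the encoding announced by a UTF-16 byte order mark, or null
// when the data does not start with one.
function utf16_encoding_from_bom(string $bytes): ?string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    if ($bom === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    return null;
}

var_dump(utf16_encoding_from_bom("\xFF\xFEc\x00")); // string(8) "UTF-16LE"
var_dump(utf16_encoding_from_bom("\xFE\xFF\x00c")); // string(8) "UTF-16BE"
var_dump(utf16_encoding_from_bom("c\x00"));         // NULL
```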

Slide 75

Slide 75 text

Hints by variable naming
Include intended interpretation in variable name:
• $message_utf8
• $username_sql
• $title_html

Slide 76

Slide 76 text

Hints by variable naming

Slide 77

Slide 77 text

Hints by variable naming
…->query("SELECT * FROM users WHERE username = " . $username);
Username is not properly represented for SQL

Slide 78

Slide 78 text

Hints by variable naming
echo "…" . htmlentities($title_html) . "…";
Double escaping, title is already represented as HTML
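
A short PHP sketch of the naming convention at work. The example title and the `<h1>` wrapper are invented for illustration; the suffix is what makes the second concatenation look wrong on sight.

```php
<?php
declare(strict_types=1);

$title = 'Fish & Chips';                                     // plain text
$title_html = htmlspecialchars($title, ENT_QUOTES, 'UTF-8'); // represented as HTML

// Fine: an _html value goes into HTML as-is.
$heading_html = '<h1>' . $title_html . '</h1>';

// Suspicious: escaping something that already carries the _html suffix.
$double_html = '<h1>' . htmlspecialchars($title_html, ENT_QUOTES, 'UTF-8') . '</h1>';

echo $heading_html, "\n"; // <h1>Fish &amp; Chips</h1>
echo $double_html, "\n";  // <h1>Fish &amp;amp; Chips</h1>
```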

Slide 79

Slide 79 text

PHP’s string type
A strange hybrid

Slide 80

Slide 80 text

Role overloading of string
encode_as_utf16: string → byte[]
(diagram axis: abstraction)

Slide 81

Slide 81 text

PHP’s string type is actually a byte array
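
This is easy to observe with the built-in string functions. A minimal sketch, assuming the mbstring extension for the character-aware call:

```php
<?php
declare(strict_types=1);

$word = "caf\u{E9}"; // "café", stored as UTF-8 bytes

var_dump(strlen($word));             // int(5): counts bytes
var_dump(mb_strlen($word, 'UTF-8')); // int(4): counts characters, if told the encoding
var_dump(bin2hex($word[3]));         // string(2) "c3": indexing yields a byte, not "é"
```

The string itself does not know its encoding; every character-level operation needs the interpretation supplied from outside.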

Slide 82

Slide 82 text

Role overloading of string
(diagram: string = byte[]; labels: encode_as_utf16, abstraction)

Slide 83

Slide 83 text

Role overloading of string
(diagram: string = byte[] and ‘real’ string on the abstraction axis; label: use as if utf16)

Slide 84

Slide 84 text

Role overloading of string
(diagram: string = byte[] and ‘real’ string on the abstraction axis; labels: use as if utf16, use as if utf8)

Slide 85

Slide 85 text

More accurate types
• A better type system makes misinterpretation more difficult
• If our programming language does not provide the types, we have to do it ourselves
• Value Objects!

Slide 86

Slide 86 text

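More accurate types
// class header reconstructed from later usage
class UTF8Bytes {
    private $bytes;

    private function __construct(string $bytes) {
        $this->bytes = $bytes;
    }

    public static function fromPHPString(string $bytes) {
        return new self($bytes);
    }

    public function toPHPString(): string {
        return $this->bytes;
    }
    // ...
}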

Slide 89

Slide 89 text

More accurate types
// class header reconstructed from later usage
class RealString {
    private $value_utf16;

    private function __construct(string $value_utf16) {
        $this->value_utf16 = $value_utf16;
    }

    public static function fromUTF8(UTF8Bytes $utf8_bytes) {
        $value_utf16 = mb_convert_encoding(
            $utf8_bytes->toPHPString(), 'UTF-16', 'UTF-8');
        return new self($value_utf16);
    }

    public static function fromISO88591(ISO88591Bytes $iso88591_bytes) {
        // ...
    }
    // ...
}

Slide 90

Slide 90 text

More accurate types
class RealString {
    // ...
    public function toUTF8(): UTF8Bytes {
        $value_utf8 = mb_convert_encoding(
            $this->value_utf16, 'UTF-8', 'UTF-16');
        return UTF8Bytes::fromPHPString($value_utf8);
    }

    public function toISO88591(): ISO88591Bytes {
        $value_iso88591 = mb_convert_encoding(
            $this->value_utf16, 'ISO-8859-1', 'UTF-16');
        return ISO88591Bytes::fromPHPString($value_iso88591);
    }
    // ...
}

Slide 91

Slide 91 text

More accurate types
class RealString {
    // ...
    public function concat(RealString $other): RealString {
        return new RealString($this->value_utf16 . $other->value_utf16);
    }

    public function substring(int $start, int $length): RealString {
        $substr_utf16 = mb_substr($this->value_utf16, $start, $length, 'UTF-16');
        return new RealString($substr_utf16);
    }
    // ...
}

Slide 92

Slide 92 text

More accurate types (usage)
// ... ($string1 and $string2 are RealString instances; $string1 is an assumed name)
$new_string = $string1->concat($string2)->substring(7, 42);
header("Content-Type: text/plain; charset=utf-8");
print $new_string->toUTF8()->toPHPString();

Slide 93

Slide 93 text

Without value objects
(diagram: string = byte[] on the abstraction axis; labels: iso8859-1, utf8)

Slide 94

Slide 94 text

With value objects
(diagram: UTF8Bytes, ISO88591Bytes and RealString on the abstraction axis)

Slide 95

Slide 95 text

Value objects
• Achieve higher levels of abstraction
• Avoid misinterpretation
• No silver bullet!
• Comes with additional overhead

Slide 96

Slide 96 text

Recap
• Data
• Functions
• Interpretation and representation
• String escaping
• Misrepresentation and misinterpretation
• The string type
• Value objects

Slide 97

Slide 97 text

The meaning of data is defined by how we interpret it

Slide 98

Slide 98 text

Feedback & Questions
@arnoutboks @arnoutboks @aboks
Arnout Boks

Slide 99

Slide 99 text

Image Credits
• https://pixabay.com/en/reading-relaxation-glasses-sight-3088491/
• https://fr.m.wikipedia.org/wiki/Fichier:DARPA_Big_Data.jpg
• https://www.flickr.com/photos/elefevre/3936916711
• https://www.flickr.com/photos/perspective/9045532603
• https://www.flickr.com/photos/wwarby/11644168395/
• https://www.flickr.com/photos/opengridscheduler/16480450157
• https://www.flickr.com/photos/cogdog/14401469262
• https://www.flickr.com/photos/paulsimpson1976/3998279762
• https://www.flickr.com/photos/pewari/3499963407