Of representation and interpretation: A unified theory - PHPBenelux meetup

Arnout Boks
September 10, 2020

Many hard problems in programming originate from one single source: not properly distinguishing the representation of data from the way it is interpreted. Have you ever written code that filters $_GET for SQL injection attempts? Struggled with timezones? Tried to get escaping right for JavaScript in HTML? Detected the character encoding of a string? All are examples of this one problem.

In this talk we will look at some examples of the representation-interpretation problem and find the general pattern behind it. We will see how primitive types make it so hard for us to get this right, and how we can use value objects to steer us in the right direction. You’ll start finding many more examples of this pattern and understand them more easily.

Transcript

  1. Of representation and interpretation A unified theory @arnoutboks Arnout Boks

    PHPBenelux 10-09-2020
  2. @arnoutboks #PHPBenelux Hard problems in programming • Cache invalidation •

    Naming things • Off-by-one errors
  3. @arnoutboks #PHPBenelux Hard problems in programming • String escaping •

    Timezones • Character encoding
  4. @arnoutboks #PHPBenelux Many difficulties with these topics are related

  5. @arnoutboks #PHPBenelux This talk is… • My personal view •

    Not an absolute truth • Meant to make you think
  6. Data And its meaning

  7. @arnoutboks #PHPBenelux A number

  8. @arnoutboks #PHPBenelux A byte

  9. @arnoutboks #PHPBenelux Some JSON data { "income":100000 }

  10. @arnoutboks #PHPBenelux A word

  11. @arnoutboks #PHPBenelux A prime number 4931083597028501900275777672390764957284907772150208632080750184097926278850976588645578020 1366007328679544734112831735367831201557535981978545054811571939345877330038009932619505876 4525023820408110189885042615176579941704250889037029119015870030479432826073821469541570330 2279875576818956016240300641115169008728798381942582716745647748166843479284645809291315318 6007001004335318936319343912948604450370991980047709462921558180711169153031876288477878354

    1575932891093295447350881882465495060005019006274705305381164278294267474853496525745368151 1706550281905552656221353146310421008662867971144467063669219825861581112515556504813420768 6732340765505485910826956266693066236799702104812396562518006818323653959348395675357557532 4619023481064700987753027956186892925380693305204238149969945456945774138335689906005870832 1812704861133682026515905166351874029X18197693937677852928722109550412925792573818660584501 5055250274994771883129310457698090915304613359419030258813205932277444385255046677902451869 7062627788891979580423065750615669834695617797879659201644051939960716981112615195610276283 2339825791423321726961443744381056485529348876349210309887028787453233132532122678633283702 7925099749969488775936915917644588032718384740235933020374888506755706587919461134193230781 4854436454375113207098606390746417564121635042388002967808558670370387509410769821183765499 2052043682558546422885024299633226853691246485500075591664024729240716450725319674499952944 8434741902107729606820558130923626837987951966199798285525887161096136561780745661592488660 8898164568541721362920846656279131478466791550965154310113538586208196875836883595577893914 5453935681996098808540476590735897289898342504712891841626587896821853808795627903997862944 9397605467534821256750121517082737107646270712467532102483678159400087505452543537
  12. @arnoutboks #PHPBenelux Data has a different meaning under different interpretations
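As a small illustration of slide 12, the sketch below reads the same two bytes under three interpretations. The byte values and variable names are made up for this example and do not come from the slides; `unpack` is a standard PHP function.

```php
<?php
// The same two bytes, chosen arbitrarily for this illustration.
$bytes = "\x41\x42";

// Interpretation 1: ASCII text.
$as_text = $bytes;                        // "AB"

// Interpretation 2: one unsigned 16-bit big-endian integer.
$as_uint16_be = unpack('n', $bytes)[1];   // 0x4142 = 16706

// Interpretation 3: one unsigned 16-bit little-endian integer.
$as_uint16_le = unpack('v', $bytes)[1];   // 0x4241 = 16961
```

Nothing in the bytes themselves says which of the three readings is the intended one.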

  13. Functions A bit of theory

  14. @arnoutboks #PHPBenelux Functions x y y = f(x) f

  15. @arnoutboks #PHPBenelux Functions D R x y f: D →

    R y = f(x)
  16. @arnoutboks #PHPBenelux Functions

    <?php
    function f(D $x): R {
        $y = do_something_with($x);
        return $y;
    }
  17. @arnoutboks #PHPBenelux Functions float int 3.84 4 round: float →

    int 4 = round(3.84)
  18. @arnoutboks #PHPBenelux Functions D R x1 y f: D →

    R x2
  19. @arnoutboks #PHPBenelux Functions float int 3.84 4 round: float →

    int 4.35
  20. @arnoutboks #PHPBenelux Functions D R x y1 f: D →

    R y2
  21. @arnoutboks #PHPBenelux Pure functions D R x y1 f: D

    → R y2
  22. @arnoutboks #PHPBenelux Impure functions • Randomness • State (global or

    local) • IO • Filesystem • Network • System clock
  23. @arnoutboks #PHPBenelux Functions D R x ? f: D →

    R
  24. @arnoutboks #PHPBenelux Functions (in mathematics) D R x f: D

    → R
  25. @arnoutboks #PHPBenelux Functions (in programming) D R x Exception! f:

    D → R
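The contrast between slides 17 to 19 and slide 25 could be sketched as follows. `round_to_int` wraps PHP's built-in `round`; `parse_percentage` is a hypothetical function invented for this example, standing in for any function that cannot produce a result for every value of its input type.

```php
<?php
// Defined for every input: each float maps to exactly one int.
// Several inputs (3.84 and 4.35) may map to the same output.
function round_to_int(float $x): int
{
    return (int) round($x);
}

// Not defined for every input: only some strings denote a percentage,
// so all other strings are rejected with an exception.
function parse_percentage(string $input): int
{
    if (preg_match('/\A(\d{1,3})%\z/', $input, $matches) !== 1) {
        throw new InvalidArgumentException("Not a percentage: '$input'");
    }
    return (int) $matches[1];
}
```

The declared signature `string → int` promises more than the function can deliver; the exception is how the gap becomes visible at runtime.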
  26. Representation & Interpretation

  27. @arnoutboks #PHPBenelux Level of abstraction float int round: abstraction

  28. @arnoutboks #PHPBenelux Level of abstraction Money string serialize_money: abstraction

  29. @arnoutboks #PHPBenelux Level of abstraction string byte[] encode_as_utf16: abstraction

  30. @arnoutboks #PHPBenelux Representation

    • Translation of a high-level concept to a lower-level representation
    • All input values supported
    • Lossless (ideally)
    • Same data, just in a different form
  31. @arnoutboks #PHPBenelux Level of abstraction byte[] string decode_from_utf16: abstraction

  32. @arnoutboks #PHPBenelux Level of abstraction string Money parse_money: abstraction

  33. @arnoutboks #PHPBenelux Interpretation

    • Translation of a low-level representation (back) to a higher-level concept
    • Usually not all input values supported
    • Same data, just in a different form
  34. @arnoutboks #PHPBenelux Why do we do this?

  35. None
  36. None
  37. @arnoutboks #PHPBenelux Transmission of data Money string abstraction byte[] light

    pulses, magnetic grains, etc. …
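A minimal sketch of the `serialize_money` / `parse_money` pair from slides 28 and 32. The slides only name these functions; the `Money` class, the string format ("EUR 10.00") and the restriction to non-negative amounts are assumptions made to keep the example short.

```php
<?php
// Hypothetical Money value object: a non-negative amount in cents plus a
// three-letter currency code. Not taken from the slides.
final class Money
{
    private int $cents;
    private string $currency;

    public function __construct(int $cents, string $currency)
    {
        if ($cents < 0 || preg_match('/\A[A-Z]{3}\z/', $currency) !== 1) {
            throw new InvalidArgumentException('Invalid money value');
        }
        $this->cents = $cents;
        $this->currency = $currency;
    }

    public function cents(): int { return $this->cents; }
    public function currency(): string { return $this->currency; }
}

// Representation: works for every Money value and keeps the currency.
function serialize_money(Money $money): string
{
    return sprintf(
        '%s %d.%02d',
        $money->currency(),
        intdiv($money->cents(), 100),
        $money->cents() % 100
    );
}

// Interpretation: most strings are not valid money, so those are rejected.
function parse_money(string $input): Money
{
    if (preg_match('/\A([A-Z]{3}) (\d+)\.(\d{2})\z/', $input, $m) !== 1) {
        throw new InvalidArgumentException("Cannot interpret '$input' as money");
    }
    return new Money((int) $m[2] * 100 + (int) $m[3], $m[1]);
}

$ten_euros = new Money(1000, 'EUR');
$as_string = serialize_money($ten_euros);   // "EUR 10.00"
$back = parse_money($as_string);            // equal to $ten_euros again
```

Because the representation keeps both amount and currency, interpreting it again yields the original value; a string such as "10" is rejected instead of being guessed at.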
  38. Escaping A different view

  39. @arnoutboks #PHPBenelux Escaping for SQL

    <?php
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username = '"
        . db_escape_string($username) . "'";
    $result = db_query($sql);
  40. @arnoutboks #PHPBenelux Escaping for SQL

    <?php
    // $_POST['username'] = "'; DROP TABLE users; --"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username = '"
        . db_escape_string($username) . "'";
    $result = db_query($sql);
  41. @arnoutboks #PHPBenelux Escaping for SQL

    <?php
    // $_POST['username'] = "John O'Shea"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username = '"
        . db_escape_string($username) . "'";
    $result = db_query($sql);
  42. @arnoutboks #PHPBenelux Escaping for SQL

    <?php
    // returns "John O'Shea"
    $username = get_username_using_api();
    $sql = "SELECT * FROM users WHERE username = '"
        . db_escape_string($username) . "'";
    $result = db_query($sql);
  43. @arnoutboks #PHPBenelux Escaping for SQL

    <?php
    // $username = "John O'Shea"
    $username = /* (whatever) */;
    $sql = "SELECT * FROM users WHERE username = '"
        . db_escape_string($username) . "'";
    $result = db_query($sql);
  44. @arnoutboks #PHPBenelux String escaping is NOT a security measure

  45. @arnoutboks #PHPBenelux String escaping is just properly representing data

  46. @arnoutboks #PHPBenelux Escaping as representation string SQLFragment (?) db_escape_string: "O'Shea"

    "O\'Shea"
  47. @arnoutboks #PHPBenelux Escaping as representation

    <?php
    db_escape_string("John O'Shea");
    // -> "John O\'Shea"

    Better:

    <?php
    as_sql_string_literal("John O'Shea");
    // -> "'John O\'Shea'"
  48. @arnoutboks #PHPBenelux Escaping as representation string SQLStringLiteral as_sql_string_literal: "O'Shea" "'O\'Shea'"

  49. @arnoutboks #PHPBenelux Chained escaping as representation

    <?php
    $name = "John O'Shea";
    $js_name = as_js_string_literal($name);
    $js_script = "showGreeting(" . $js_name . ")";
    $html_tag = "<a onclick=" . as_html_attribute_value($js_script)
        . ">Show greeting</a>";
    $sql = "INSERT INTO html_examples VALUES("
        . as_sql_string_literal($html_tag) . ")";
  50. @arnoutboks #PHPBenelux Premature representation

    <?php
    foreach ($_REQUEST as $key => $value) {
        $_REQUEST[$key] = htmlentities(
            db_escape_string($value)
        );
    }
    // ‘Now the input data is safe’
  51. @arnoutboks #PHPBenelux Premature representation

    <?php
    foreach ($_REQUEST as $key => $value) {
        $_REQUEST[$key] = htmlentities(
            db_escape_string($value)
        );
    }
    // ‘Now the input data is safe’
  52. @arnoutboks #PHPBenelux Premature representation

    <?php
    foreach ($_REQUEST as $key => $value) {
        if (strpos($value, "DROP TABLE") !== false) {
            die("SQL injection!");
        }
    }
  53. @arnoutboks #PHPBenelux Premature representation

    <?php
    foreach ($_REQUEST as $key => $value) {
        if (strpos($value, "DROP TABLE") !== false) {
            die("SQL injection!");
        }
    }
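The helper names on slide 49 are not built-in PHP functions. One possible way to implement two of them with standard functions is sketched below; the choice of `json_encode` and `htmlspecialchars`, and the flags passed to them, are assumptions of this example and not the speaker's implementation. The SQL step is left out because a correct `as_sql_string_literal` depends on the database and connection in use.

```php
<?php
// Represent a PHP string as a JavaScript string literal (delimiters included).
// The JSON_HEX_* flags escape <, >, &, ' and " inside the string as \uXXXX.
function as_js_string_literal(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Represent a string as a double-quoted HTML attribute value (quotes included).
function as_html_attribute_value(string $value): string
{
    return '"' . htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '"';
}

$name = "John O'Shea";
$js_name = as_js_string_literal($name);
$js_script = 'showGreeting(' . $js_name . ')';
$html_tag = '<a onclick=' . as_html_attribute_value($js_script) . '>Show greeting</a>';
```

Each step represents its input for exactly one target language, at the moment the value is embedded in that language. That is the opposite of the premature representation on slides 50 to 53, where escaping is applied before the destination is known.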
  54. What goes wrong and how to prevent that

  55. @arnoutboks #PHPBenelux Misrepresentation Money string serialize_money_ok: (€ 10) "€ 10"

  56. @arnoutboks #PHPBenelux Misrepresentation Money string serialize_money_bad: (€ 10) "10"

  57. @arnoutboks #PHPBenelux Misrepresentation Money string serialize_money_bad: (€ 10) "10" ($

    10)
  58. @arnoutboks #PHPBenelux Misrepresentation Money string (€ 10) "10" ($ 10)

    Money ?
  59. @arnoutboks #PHPBenelux That doesn’t happen to me! DateTime string

    2022-03-14 06:28:00 Europe/Amsterdam 2022-03-14T06:28:00+01:00
  60. @arnoutboks #PHPBenelux That doesn’t happen to me! fraction float 1/3

    0.33333333333333
  61. @arnoutboks #PHPBenelux That doesn’t happen to me! string byte[] ⛽☂️

    ? encode_as_iso8859_1
  62. @arnoutboks #PHPBenelux Lossy representations cannot be re-interpreted
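The encoding case from slide 61 can be tried out with the mbstring extension. This is a sketch with made-up variable names; `roundtrip_via_iso88591` is a hypothetical helper, and the exact replacement byte that `mb_convert_encoding` emits for an unrepresentable character depends on configuration, so the example only checks whether the original text survives.

```php
<?php
// Represent UTF-8 text as ISO-8859-1 bytes, then interpret those bytes again.
function roundtrip_via_iso88591(string $text_utf8): string
{
    $bytes_iso88591 = mb_convert_encoding($text_utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($bytes_iso88591, 'UTF-8', 'ISO-8859-1');
}

$cafe = "caf\u{E9}";   // "café": every character exists in ISO-8859-1
$fuel = "\u{26FD}";    // "⛽": has no ISO-8859-1 representation

$cafe_survives = roundtrip_via_iso88591($cafe) === $cafe;   // true
$fuel_survives = roundtrip_via_iso88591($fuel) === $fuel;   // false
```

Once the fuel pump has been replaced by a substitute character, no interpretation of the resulting bytes can bring it back.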

  63. @arnoutboks #PHPBenelux Misinterpretation byte[] […] string "café"

  64. @arnoutboks #PHPBenelux Misinterpretation byte[] […] string "café" utf8 decode_from_utf8 Exception!

  65. @arnoutboks #PHPBenelux Misinterpretation byte[] […] string "café" iso8859-1 utf8

  66. @arnoutboks #PHPBenelux Misinterpretation byte[] […] string "café" iso8859-1 utf8 "café"

  67. @arnoutboks #PHPBenelux Duck typing byte[] […] iso8859-1 utf8

  68. @arnoutboks #PHPBenelux Duck typing byte[] […] iso8859-1 utf8 HOW?

  69. @arnoutboks #PHPBenelux The meaning of data comes from our interpretation

    of it…
  70. @arnoutboks #PHPBenelux …which may very well be wrong
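Both directions of misinterpretation from slides 63 to 66 can be reproduced with mbstring. The variable names are illustrative; the byte sequences are the standard UTF-8 and ISO-8859-1 encodings of "café".

```php
<?php
$cafe_utf8 = "caf\xC3\xA9";    // "café" represented as UTF-8
$cafe_iso88591 = "caf\xE9";    // "café" represented as ISO-8859-1

// ISO-8859-1 bytes interpreted as UTF-8: the bytes are not valid UTF-8,
// so a careful decoder can at least notice the problem.
$is_valid_utf8 = mb_check_encoding($cafe_iso88591, 'UTF-8');         // false

// UTF-8 bytes interpreted as ISO-8859-1: every byte sequence is valid
// ISO-8859-1, so nothing fails and the result is silently wrong.
$misread = mb_convert_encoding($cafe_utf8, 'UTF-8', 'ISO-8859-1');   // "cafÃ©"
```

The second case is the more dangerous one: the interpretation succeeds, and only a human reading the output sees that it is wrong.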

  71. @arnoutboks #PHPBenelux Interpretation hints Store/transmit desired interpretation along with data

    Examples: • Music staves
  72. @arnoutboks #PHPBenelux Interpretation hints Store/transmit desired interpretation along with data

    Examples: • HTTP headers • Content-Type • Content-Encoding • Content-Language • Unicode byte order mark
  73. @arnoutboks #PHPBenelux Interpretation hints utf16 LE utf16 BE […]

  74. @arnoutboks #PHPBenelux Interpretation hints utf16 LE with BOM utf16 BE

    with BOM […]
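A byte order mark as an interpretation hint (slides 73 and 74) might be used like this. `decode_utf16_with_bom` is a hypothetical helper written for this example; it relies on the mbstring extension and returns the text as UTF-8.

```php
<?php
// Interpret UTF-16 bytes, using the leading byte order mark to decide
// between little-endian and big-endian.
function decode_utf16_with_bom(string $bytes): string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        $encoding = 'UTF-16LE';
    } elseif ($bom === "\xFE\xFF") {
        $encoding = 'UTF-16BE';
    } else {
        // Without the hint we would have to guess, so refuse instead.
        throw new InvalidArgumentException('No byte order mark found');
    }
    return mb_convert_encoding(substr($bytes, 2), 'UTF-8', $encoding);
}

$from_le = decode_utf16_with_bom("\xFF\xFE\x41\x00");   // "A"
$from_be = decode_utf16_with_bom("\xFE\xFF\x00\x41");   // "A"
```

Without the mark, the two payloads `"\x41\x00"` and `"\x00\x41"` are indistinguishable from valid text in the other byte order.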
  75. @arnoutboks #PHPBenelux Hints by variable naming Include intended interpretation in

    variable name: • $message_utf8 • $username_sql • $title_html
  76. @arnoutboks #PHPBenelux Hints by variable naming

    <?php
    header("Content-Type: text/plain; charset=iso-8859-1");
    print $message_utf8;

    Charset mismatch between header and content
  77. @arnoutboks #PHPBenelux Hints by variable naming

    <?php
    $database->query("SELECT * FROM users WHERE username = " . $username);

    Username is not properly represented for SQL
  78. @arnoutboks #PHPBenelux Hints by variable naming

    <?php
    print "<h1>" . htmlentities($title_html) . "</h1>";

    Double escaping, title is already represented as HTML
  79. PHP’s string type A strange hybrid

  80. @arnoutboks #PHPBenelux Role overloading of string string byte[] encode_as_utf16: abstraction

  81. @arnoutboks #PHPBenelux PHP’s string type is actually a byte array

  82. @arnoutboks #PHPBenelux Role overloading of string string = byte[] encode_as_utf16:

    abstraction
  83. @arnoutboks #PHPBenelux Role overloading of string string = byte[] abstraction

    ‘real’ string use as if utf16
  84. @arnoutboks #PHPBenelux Role overloading of string string = byte[] abstraction

    ‘real’ string use as if utf16 use as if utf8
  85. @arnoutboks #PHPBenelux More accurate types • A better type system

    makes mis-interpretation more difficult • If our programming language does not provide the types, we have to do it ourselves • Value Objects!
  86. @arnoutboks #PHPBenelux More accurate types

    <?php
    class UTF8Bytes {
        private string $bytes;

        private function __construct(string $bytes) {
            $this->bytes = $bytes;
        }

        public static function fromPHPString(string $bytes) {
            return new self($bytes);
        }

        public function toPHPString(): string {
            return $this->bytes;
        }

        // ...
    }
  87. @arnoutboks #PHPBenelux More accurate types

    <?php
    class ISO88591Bytes {
        private string $bytes;

        private function __construct(string $bytes) {
            $this->bytes = $bytes;
        }

        public static function fromPHPString(string $bytes) {
            return new self($bytes);
        }

        public function toPHPString(): string {
            return $this->bytes;
        }

        // ...
    }
  88. @arnoutboks #PHPBenelux More accurate types

    <?php
    class RealString {
        private string $value_utf16;

        private function __construct(string $value_utf16) {
            $this->value_utf16 = $value_utf16;
        }

        public static function fromUTF8(UTF8Bytes $utf8_bytes) {
            $value_utf16 = mb_convert_encoding(
                $utf8_bytes->toPHPString(), 'UTF-16', 'UTF-8');
            return new self($value_utf16);
        }

        public static function fromISO88591(ISO88591Bytes $iso88591_bytes) {
            // ...
        }

        // ...
    }
  89. @arnoutboks #PHPBenelux More accurate types

    <?php
    class RealString {
        // ...

        public function toUTF8(): UTF8Bytes {
            $value_utf8 = mb_convert_encoding(
                $this->value_utf16, 'UTF-8', 'UTF-16');
            return UTF8Bytes::fromPHPString($value_utf8);
        }

        public function toISO88591(): ISO88591Bytes {
            $value_iso88591 = mb_convert_encoding(
                $this->value_utf16, 'ISO-8859-1', 'UTF-16');
            return ISO88591Bytes::fromPHPString($value_iso88591);
        }

        // ...
    }
  90. @arnoutboks #PHPBenelux More accurate types

    <?php
    class RealString {
        // ...

        public function concat(RealString $other): RealString {
            return new RealString($this->value_utf16 . $other->value_utf16);
        }

        public function substring(int $start, int $length): RealString {
            $substr_utf16 = mb_substr($this->value_utf16, $start, $length, 'UTF-16');
            return new RealString($substr_utf16);
        }

        // ...
    }
  91. @arnoutboks #PHPBenelux More accurate types (usage)

    <?php
    $string1_utf8 = UTF8Bytes::fromPHPString(
        file_get_contents("file1.txt"));
    $string1 = RealString::fromUTF8($string1_utf8);

    $string2_iso88591 = ISO88591Bytes::fromPHPString(
        file_get_contents("file2.txt"));
    $string2 = RealString::fromISO88591($string2_iso88591);

    // The original charsets do not matter anymore now...
    $new_string = $string1->concat($string2)->substring(7, 42);

    header("Content-Type: text/plain; charset=utf-8");
    print $new_string->toUTF8()->toPHPString();
  92. @arnoutboks #PHPBenelux Without value objects iso8859-1 utf8 string = byte[]

    abstraction
  93. @arnoutboks #PHPBenelux With value objects abstraction UTF8Bytes ISO88591Bytes RealString

  94. @arnoutboks #PHPBenelux Value objects • Achieve higher levels of abstraction

    • Avoid misinterpretation • No silver bullet! • Comes with additional overhead
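The same value-object idea could also be applied to the double-escaping mistake from slide 78. `HtmlFragment` below is a hypothetical class invented for this sketch and does not appear in the slides; it assumes UTF-8 input and uses the built-in `htmlspecialchars`.

```php
<?php
// A string that is known to be represented as HTML.
final class HtmlFragment
{
    private string $html;

    private function __construct(string $html)
    {
        $this->html = $html;
    }

    // Represent plain text as HTML. This is the only place where escaping
    // happens, and it only accepts a plain string.
    public static function fromText(string $text): self
    {
        return new self(htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }

    // Combine fragments without escaping them again.
    public static function heading(HtmlFragment $content): self
    {
        return new self('<h1>' . $content->html . '</h1>');
    }

    // Deliberately not __toString(), so a fragment is never silently
    // converted back into an untyped string.
    public function toString(): string
    {
        return $this->html;
    }
}

$title = HtmlFragment::fromText('Tom & Jerry');
$page_heading = HtmlFragment::heading($title)->toString();
```

Passing `$title` to `fromText()` a second time raises a `TypeError`, because an `HtmlFragment` is not a `string`. The type system catches the mistake that the `$title_html` naming convention could only hint at, at the cost of the extra wrapping code mentioned on slide 94.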
  95. @arnoutboks #PHPBenelux Recap • Data • Functions • Interpretation and

    representation • String escaping • Misrepresentation and –interpretation • The string type • Value objects
  96. @arnoutboks #PHPBenelux The meaning of data is defined by how

    we interpret it
  97. @arnoutboks #PHPBenelux Questions @arnoutboks @arnoutboks @aboks Arnout Boks Please leave

    your feedback on joind.in: https://joind.in/talk/109ec We’re hiring!
  98. @arnoutboks #PHPBenelux Image Credits

    • https://pixabay.com/en/reading-relaxation-glasses-sight-3088491/
    • https://fr.m.wikipedia.org/wiki/Fichier:DARPA_Big_Data.jpg
    • https://www.flickr.com/photos/elefevre/3936916711
    • https://www.flickr.com/photos/perspective/9045532603
    • https://www.flickr.com/photos/wwarby/11644168395/
    • https://www.flickr.com/photos/opengridscheduler/16480450157
    • https://www.flickr.com/photos/cogdog/14401469262
    • https://www.flickr.com/photos/paulsimpson1976/3998279762
    • https://www.flickr.com/photos/pewari/3499963407