Of representation and interpretation: A unified theory - phpCE 2018

Joind.in: https://joind.in/event/php-central-europe-conference-2018/of-representation-and-interpretation-a-unified-theory
Video recording: https://www.youtube.com/watch?v=Xa4VtQNmwKM

Many hard problems in programming stem from a single source: not properly distinguishing the representation of data from the way it is interpreted. Have you ever written code that filters $_GET for SQL injection attempts? Struggled with timezones? Tried to get escaping right for JavaScript in HTML? Had to detect the character encoding of a string? All are examples of this one problem. In this talk we will look at some examples of the representation-interpretation problem and find the general pattern behind it. We will see how primitive types make it so hard for us to get this right, and how we can use value objects to steer us in the right direction. You’ll start finding many more examples of this pattern and understand them more easily.

Arnout Boks

October 28, 2018

Transcript

  1. 5.

    @arnoutboks #phpce18
    This talk is…
    • My personal view
    • Not an absolute truth
    • Meant to make you think
  2. 11.

    @arnoutboks #phpce18 A prime number 4931083597028501900275777672390764957284907772150208632080750184097926278850976588645578020 1366007328679544734112831735367831201557535981978545054811571939345877330038009932619505876 4525023820408110189885042615176579941704250889037029119015870030479432826073821469541570330 2279875576818956016240300641115169008728798381942582716745647748166843479284645809291315318 6007001004335318936319343912948604450370991980047709462921558180711169153031876288477878354

    1575932891093295447350881882465495060005019006274705305381164278294267474853496525745368151 1706550281905552656221353146310421008662867971144467063669219825861581112515556504813420768 6732340765505485910826956266693066236799702104812396562518006818323653959348395675357557532 4619023481064700987753027956186892925380693305204238149969945456945774138335689906005870832 1812704861133682026515905166351874029X18197693937677852928722109550412925792573818660584501 5055250274994771883129310457698090915304613359419030258813205932277444385255046677902451869 7062627788891979580423065750615669834695617797879659201644051939960716981112615195610276283 2339825791423321726961443744381056485529348876349210309887028787453233132532122678633283702 7925099749969488775936915917644588032718384740235933020374888506755706587919461134193230781 4854436454375113207098606390746417564121635042388002967808558670370387509410769821183765499 2052043682558546422885024299633226853691246485500075591664024729240716450725319674499952944 8434741902107729606820558130923626837987951966199798285525887161096136561780745661592488660 8898164568541721362920846656279131478466791550965154310113538586208196875836883595577893914 5453935681996098808540476590735897289898342504712891841626587896821853808795627903997862944 9397605467534821256750121517082737107646270712467532102483678159400087505452543537
  3. 16.
  4. 22.

    Impure functions
    • Randomness
    • State (global or local)
    • IO
    • Filesystem
    • Network
    • System clock
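
    A minimal sketch of the difference, using the system clock from the list above as the source of impurity. The function names `is_weekend_now()` and `is_weekend()` are made up for illustration:

    ```php
    <?php
    // Impure: reads the system clock, so the answer depends on when it runs.
    function is_weekend_now(): bool
    {
        return is_weekend(new DateTimeImmutable());
    }

    // Pure: the same input always gives the same output, so it is easy to test.
    function is_weekend(DateTimeImmutable $moment): bool
    {
        // 'N' is the ISO-8601 day of the week: 1 (Monday) to 7 (Sunday).
        return (int) $moment->format('N') >= 6;
    }
    ```

    Pushing the clock read to the edge (`is_weekend_now()`) keeps the logic itself pure.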
  5. 30.

    Representation
    • Translation of a high-level concept to a lower-level representation
    • All input values supported
    • Lossless (ideally)
    • Same data, just in a different form
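
    To make these properties concrete, a small sketch with a deliberately simple pair of levels: an integer (the concept) and its decimal string (the representation). The helper name `int_to_decimal()` is hypothetical:

    ```php
    <?php
    // Representation: int -> decimal string. It is total (defined for every int)
    // and lossless, so the original value can always be recovered.
    function int_to_decimal(int $n): string
    {
        return (string) $n;
    }

    foreach ([0, -17, 42, 1000000] as $n) {
        // Reading the string back yields the same int: nothing was lost.
        assert((int) int_to_decimal($n) === $n);
    }
    ```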
  6. 33.

    Interpretation
    • Translation of a low-level representation (back) to a higher-level concept
    • Usually not all input values supported
    • Same data, just in a different form
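
    The reverse direction, sketched with the same int/string pair. Interpretation is partial, so the (hypothetical) helper `decimal_to_int()` has to be able to say "no"; PHP's `filter_var()` with `FILTER_VALIDATE_INT` does the actual check. A plain `(int)` cast shows what happens when unsupported inputs are silently accepted anyway:

    ```php
    <?php
    // Interpretation: decimal string -> int. Many strings have no meaning
    // as an int, so the result is nullable.
    function decimal_to_int(string $s): ?int
    {
        $n = filter_var($s, FILTER_VALIDATE_INT);
        return $n === false ? null : $n;
    }

    $ok = decimal_to_int('42');        // 42
    $rejected = decimal_to_int('12abc'); // null: not a valid int representation
    $guessed = (int) '12abc';          // 12: the cast misinterprets instead of rejecting
    ```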
  7. 35.
  8. 36.
  9. 39.

    Escaping for SQL
    <?php
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username = '" . db_escape_string($username) . "'";
    $result = db_query($sql);
  10. 40.

    Escaping for SQL
    <?php
    // $_POST['username'] = "'; DROP TABLE users; --"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username = '" . db_escape_string($username) . "'";
    $result = db_query($sql);
  11. 41.

    Escaping for SQL
    <?php
    // $_POST['username'] = "John O'Shea"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username = '" . db_escape_string($username) . "'";
    $result = db_query($sql);
  12. 42.

    Escaping for SQL
    <?php
    // returns "John O'Shea"
    $username = get_username_using_api();
    $sql = "SELECT * FROM users WHERE username = '" . db_escape_string($username) . "'";
    $result = db_query($sql);
  13. 43.

    Escaping for SQL
    <?php
    // $username = "John O'Shea"
    $username = /* (whatever) */;
    $sql = "SELECT * FROM users WHERE username = '" . db_escape_string($username) . "'";
    $result = db_query($sql);
  14. 47.

    Escaping as representation
    <?php
    db_escape_string("John O'Shea"); // -> "John O\'Shea"
    Better:
    <?php
    as_sql_string_literal("John O'Shea"); // -> "'John O\'Shea'"
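
    One possible sketch of `as_sql_string_literal()`, which the slide names but does not define. It assumes an ANSI-style SQL dialect where a quote inside a literal is written as two quotes and backslashes are not special (for example SQLite, or PostgreSQL with `standard_conforming_strings` on), so its output differs from the MySQL-style `\'` on the slide. In real code, the database driver's own quoting or prepared statements are the safer choice:

    ```php
    <?php
    // Hypothetical helper: represents any PHP string as a complete SQL string
    // literal, surrounding quotes included, so callers cannot forget them.
    // Assumes ANSI quoting rules: ' inside the literal is written as ''.
    function as_sql_string_literal(string $value): string
    {
        return "'" . str_replace("'", "''", $value) . "'";
    }

    $sql = "SELECT * FROM users WHERE username = " . as_sql_string_literal("John O'Shea");
    echo $sql, PHP_EOL; // SELECT * FROM users WHERE username = 'John O''Shea'
    ```

    Because the function works on every input, where the value came from ($_POST, an API, anywhere) no longer matters.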
  15. 49.

    Chained escaping as representation
    <?php
    $name = "John O'Shea";
    $js_name = as_js_string_literal($name);
    $js_script = "var name = " . $js_name . ";";
    $html_tag = "<script>" . as_html_text($js_script) . "</script>";
    $sql = "INSERT INTO html_examples VALUES(" . as_sql_string_literal($html_tag) . ")";
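
    A runnable sketch of such a chain, with hypothetical helper implementations. One caveat: in HTML5, `<script>` is a raw text element, so character references inside it are not decoded; this sketch therefore puts the JavaScript in an event-handler attribute, where they are. The SQL helper again assumes ANSI quoting (`''`):

    ```php
    <?php
    // JavaScript: json_encode() of a PHP string yields a valid JS string literal.
    function as_js_string_literal(string $value): string
    {
        return json_encode($value, JSON_THROW_ON_ERROR);
    }

    // HTML attribute value: escapes &, <, >, " and '.
    function as_html_attribute(string $value): string
    {
        return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }

    // SQL string literal, assuming ANSI quoting rules.
    function as_sql_string_literal(string $value): string
    {
        return "'" . str_replace("'", "''", $value) . "'";
    }

    $name = "John O'Shea";
    // Each step represents the previous result in the next, outer language.
    $js_script = "alert(" . as_js_string_literal($name) . ");";
    $html_tag = '<button onclick="' . as_html_attribute($js_script) . '">Greet</button>';
    $sql = "INSERT INTO html_examples VALUES(" . as_sql_string_literal($html_tag) . ")";

    echo $sql, PHP_EOL;
    ```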
  16. 50.

    Premature representation
    <?php
    foreach ($_REQUEST as $key => $value) {
        $_REQUEST[$key] = htmlentities(db_escape_string($value));
    }
    // ‘Now the input data is safe’
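
    A small sketch of why this backfires: the "safe" value is only right for one output, and wrong everywhere else. `addslashes()` stands in for the slide's `db_escape_string()` purely to reproduce its effect; it is not a safe SQL escaping function:

    ```php
    <?php
    $raw = "O'Shea & Co";

    // Premature: one escaped version meant to serve every future use.
    $premature = htmlspecialchars(addslashes($raw), ENT_QUOTES, 'UTF-8');

    // Later used in a plain-text email, the escaping leaks into the output.
    $email_body = "Dear " . $premature; // Dear O\&#039;Shea &amp; Co

    // Representing at the point of output instead: each target gets its own form.
    $email_body_ok = "Dear " . $raw;                                         // plain text
    $html_ok = "<p>" . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . "</p>"; // HTML
    ```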
  17. 51.

  18. 52.

    Premature representation
    <?php
    foreach ($_REQUEST as $key => $value) {
        if (strpos($value, "DROP TABLE") !== false) {
            die("SQL injection!");
        }
    }
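
    Wrapping the slide's check in a function (the name `looks_like_sql_injection()` is made up) makes both failure modes easy to see: it misses attacks that are spelled differently, and it rejects harmless text that merely mentions the phrase:

    ```php
    <?php
    // The slide's input filter, unchanged apart from being a function.
    function looks_like_sql_injection(string $value): bool
    {
        return strpos($value, "DROP TABLE") !== false;
    }

    // False negative: strpos() is case-sensitive.
    $missed = looks_like_sql_injection("'; drop table users; --");
    // False positive: a legitimate forum question is blocked.
    $blocked = looks_like_sql_injection("How do I DROP TABLE in MySQL?");
    ```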
  19. 53.

  20. 59.

    That doesn’t happen to me!
    DateTime: 2022-03-14 06:28:00 Europe/Prague
    string: 2022-03-14T06:28:00+01:00
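
    A sketch of what gets lost: formatting the `DateTime` as an ISO 8601 string keeps only the current UTC offset, not the timezone. Interpreting that string again gives a fixed-offset timezone, which ignores Prague's switch to summer time (UTC+02:00 from the last Sunday of March):

    ```php
    <?php
    // The DateTime knows its timezone (Europe/Prague), not just an offset.
    $original = new DateTimeImmutable('2022-03-14 06:28:00', new DateTimeZone('Europe/Prague'));

    // Representation as a string: only the offset survives.
    $as_string = $original->format(DATE_ATOM); // 2022-03-14T06:28:00+01:00

    // Interpretation of that string: a fixed offset, no timezone rules.
    $reparsed = new DateTimeImmutable($as_string);

    // One month later, the two values no longer agree.
    $month = new DateInterval('P1M');
    $from_original = $original->add($month)->format(DATE_ATOM); // 2022-04-14T06:28:00+02:00
    $from_string = $reparsed->add($month)->format(DATE_ATOM);   // 2022-04-14T06:28:00+01:00
    ```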
  21. 71.

    Interpretation hints
    Store/transmit the desired interpretation along with the data
    Examples:
    • HTTP headers
      • Content-Type
      • Content-Encoding
      • Content-Language
    • UTF-8 byte order mark
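
    Two of these hints, read back in a simplified sketch. The helper names are hypothetical, and `charset_from_content_type()` deliberately skips parts of the full media-type grammar (such as escaped characters in quoted values):

    ```php
    <?php
    const UTF8_BOM = "\xEF\xBB\xBF";

    // True if the bytes start with the UTF-8 byte order mark.
    function has_utf8_bom(string $bytes): bool
    {
        return strncmp($bytes, UTF8_BOM, 3) === 0;
    }

    // Simplified: extracts the charset parameter from a Content-Type value.
    function charset_from_content_type(string $content_type): ?string
    {
        if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $content_type, $m) === 1) {
            return strtolower($m[1]);
        }
        return null; // no hint: the interpretation must come from elsewhere
    }
    ```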
  22. 75.

    Hints by variable naming
    Include the intended interpretation in the variable name:
    • $message_utf8
    • $username_sql
    • $title_html
  23. 77.

    Hints by variable naming
    <?php
    $database->query("SELECT * FROM users WHERE username = " . $username);
    Username is not properly represented for SQL
  24. 78.

    Hints by variable naming
    <?php
    print "<h1>" . htmlentities($title_html) . "</h1>";
    Double escaping: the title is already represented as HTML
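
    A short sketch of the naming convention applied consistently, and of what the double escaping from the slide produces (`htmlspecialchars()` behaves like `htmlentities()` here, since the title is plain ASCII):

    ```php
    <?php
    $title = "Tom & Jerry";                                      // plain text
    $title_html = htmlspecialchars($title, ENT_QUOTES, 'UTF-8'); // represented as HTML

    // Correct: $title_html is already HTML, so it is used as-is.
    $correct = "<h1>" . $title_html . "</h1>";

    // Double escaping: the & of "&amp;" is escaped a second time.
    $double_escaped = "<h1>" . htmlspecialchars($title_html, ENT_QUOTES, 'UTF-8') . "</h1>";
    ```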
  25. 84.

    Role overloading of string
    string = byte[] (abstraction)
    • ‘real’ string
    • use as if UTF-16
    • use as if UTF-8
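
    A sketch of the same two bytes read in three roles, using PHP's byte-based `strlen()` and the mbstring functions with an explicit encoding:

    ```php
    <?php
    $bytes = "\xC3\xA9"; // "é" encoded as UTF-8

    $as_byte_array = strlen($bytes);              // 2: PHP strings count bytes
    $as_utf8 = mb_strlen($bytes, 'UTF-8');        // 1: one character, "é"
    $as_latin1 = mb_strlen($bytes, 'ISO-8859-1'); // 2: two characters, "Ã" and "©"

    // Misinterpretation in practice: treating UTF-8 bytes as ISO-8859-1
    // and converting them "to UTF-8" produces mojibake ("Ã©").
    $mojibake = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
    ```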
  26. 85.

    More accurate types
    • A better type system makes misinterpretation more difficult
    • If our programming language does not provide the types, we have to do it ourselves
    • Value Objects!
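
    The same idea applied to the escaping examples from earlier, as a sketch with hypothetical `PlainText` and `Html` value objects (typed properties need PHP 7.4 or later). Because `render_heading()` only accepts `Html`, plain text cannot be printed unescaped, and there is no raw string left to escape a second time:

    ```php
    <?php
    final class Html
    {
        private string $html;

        private function __construct(string $html) { $this->html = $html; }

        // Only for markup the caller knows is already valid HTML.
        public static function fromTrustedString(string $html): self { return new self($html); }

        public function toString(): string { return $this->html; }
    }

    final class PlainText
    {
        private string $text;

        private function __construct(string $text) { $this->text = $text; }

        public static function fromString(string $text): self { return new self($text); }

        // The only way from plain text to HTML goes through escaping.
        public function toHtml(): Html
        {
            return Html::fromTrustedString(htmlspecialchars($this->text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));
        }
    }

    function render_heading(Html $title): string
    {
        return "<h1>" . $title->toString() . "</h1>";
    }

    echo render_heading(PlainText::fromString("Tom & Jerry")->toHtml()), PHP_EOL;
    ```

    Passing a `PlainText` directly to `render_heading()` fails with a `TypeError` instead of producing broken HTML.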
  27. 86.

    More accurate types
    <?php
    class UTF8Bytes
    {
        private string $bytes;
        private function __construct(string $bytes)
        {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes)
        {
            return new self($bytes);
        }
        public function toPHPString(): string
        {
            return $this->bytes;
        }
        // ...
    }
  28. 87.

    More accurate types
    <?php
    class UTF8Bytes
    {
        private string $bytes; // <- PHP 7.4: typed properties
        private function __construct(string $bytes)
        {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes)
        {
            return new self($bytes);
        }
        public function toPHPString(): string
        {
            return $this->bytes;
        }
        // ...
    }
  29. 88.

    More accurate types
    <?php
    class ISO88591Bytes
    {
        private string $bytes;
        private function __construct(string $bytes)
        {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes)
        {
            return new self($bytes);
        }
        public function toPHPString(): string
        {
            return $this->bytes;
        }
        // ...
    }
  30. 89.

    More accurate types
    <?php
    class RealString
    {
        private string $value_utf16;
        private function __construct(string $value_utf16)
        {
            $this->value_utf16 = $value_utf16;
        }
        public static function fromUTF8(UTF8Bytes $utf8_bytes)
        {
            $value_utf16 = mb_convert_encoding(
                $utf8_bytes->toPHPString(), 'UTF-16', 'UTF-8');
            return new self($value_utf16);
        }
        public static function fromISO88591(ISO88591Bytes $iso88591_bytes)
        {
            // ...
        }
        // ...
    }
  31. 90.

    More accurate types
    <?php
    class RealString
    {
        // ...
        public function toUTF8(): UTF8Bytes
        {
            $value_utf8 = mb_convert_encoding(
                $this->value_utf16, 'UTF-8', 'UTF-16');
            return UTF8Bytes::fromPHPString($value_utf8);
        }
        public function toISO88591(): ISO88591Bytes
        {
            $value_iso88591 = mb_convert_encoding(
                $this->value_utf16, 'ISO-8859-1', 'UTF-16');
            return ISO88591Bytes::fromPHPString($value_iso88591);
        }
        // ...
    }
  32. 91.

    More accurate types
    <?php
    class RealString
    {
        // ...
        public function concat(RealString $other): RealString
        {
            return new RealString($this->value_utf16 . $other->value_utf16);
        }
        public function substring(int $start, int $length): RealString
        {
            $substr_utf16 = mb_substr($this->value_utf16, $start, $length, 'UTF-16');
            return new RealString($substr_utf16);
        }
        // ...
    }
  33. 92.

    More accurate types (usage)
    <?php
    $string1_utf8 = UTF8Bytes::fromPHPString(
        file_get_contents("file1.txt"));
    $string1 = RealString::fromUTF8($string1_utf8);
    $string2_iso88591 = ISO88591Bytes::fromPHPString(
        file_get_contents("file2.txt"));
    $string2 = RealString::fromISO88591($string2_iso88591);
    // The original charsets do not matter anymore now...
    $new_string = $string1->concat($string2)->substring(7, 42);
    header("Content-Type: text/plain; charset=utf-8");
    print $new_string->toUTF8()->toPHPString();
  34. 95.

    Recap
    • Data
    • Functions
    • Interpretation and representation
    • String escaping
    • Misrepresentation and -interpretation
    • The string type
    • Value objects
  35. 97.

    Feedback & Questions
    @arnoutboks @arnoutboks @aboks Arnout Boks
    Please leave your feedback on joind.in: https://joind.in/talk/221c8
  36. 98.

    Image Credits
    • https://pixabay.com/en/reading-relaxation-glasses-sight-3088491/
    • https://fr.m.wikipedia.org/wiki/Fichier:DARPA_Big_Data.jpg
    • https://www.flickr.com/photos/elefevre/3936916711
    • https://www.flickr.com/photos/perspective/9045532603
    • https://www.flickr.com/photos/wwarby/11644168395/
    • https://www.flickr.com/photos/opengridscheduler/16480450157
    • https://www.flickr.com/photos/cogdog/14401469262
    • https://www.flickr.com/photos/paulsimpson1976/3998279762
    • https://www.flickr.com/photos/pewari/3499963407