Of representation and interpretation: A unified theory - Dutch PHP Conference 2019

Joind.in: https://joind.in/talk/4aaf0
Video recording: https://www.youtube.com/watch?v=K2zS6vbBb9A

Many hard problems in programming stem from a single source: failing to distinguish the representation of data from the way it is interpreted. Have you ever written code that filters $_GET for SQL injection attempts? Struggled with timezones? Tried to get escaping right for JavaScript in HTML? Detected the character encoding of a string? All of these are examples of this one problem.

In this talk we will look at some examples of the representation-interpretation problem and find the general pattern behind it. We will see how primitive types make it so hard for us to get this right, and how we can use value objects to steer us in the right direction. You’ll start finding many more examples of this pattern and understand them more easily.

Arnout Boks

June 07, 2019

Transcript

  1. Of representation and interpretation
    A unified theory
    @arnoutboks
    Arnout Boks
    #dpc19
    07-06-2019

  2. @arnoutboks #dpc19
    Hard problems in programming
    • Cache invalidation
    • Naming things
    • Off-by-one errors

  3. Hard problems in programming
    • String escaping
    • Timezones
    • Character encoding

  4. Many difficulties with
    these topics are related

  5. This talk is…
    • My personal view
    • Not an absolute truth
    • Meant to make you think

  6. Data
    And its meaning

  7. A number

  8. A byte

  9. Some JSON data
    {
    "income":100000
    }

  10. A word

  11. A prime number
    4931083597028501900275777672390764957284907772150208632080750184097926278850976588645578020
    1366007328679544734112831735367831201557535981978545054811571939345877330038009932619505876
    4525023820408110189885042615176579941704250889037029119015870030479432826073821469541570330
    2279875576818956016240300641115169008728798381942582716745647748166843479284645809291315318
    6007001004335318936319343912948604450370991980047709462921558180711169153031876288477878354
    1575932891093295447350881882465495060005019006274705305381164278294267474853496525745368151
    1706550281905552656221353146310421008662867971144467063669219825861581112515556504813420768
    6732340765505485910826956266693066236799702104812396562518006818323653959348395675357557532
    4619023481064700987753027956186892925380693305204238149969945456945774138335689906005870832
    1812704861133682026515905166351874029X18197693937677852928722109550412925792573818660584501
    5055250274994771883129310457698090915304613359419030258813205932277444385255046677902451869
    7062627788891979580423065750615669834695617797879659201644051939960716981112615195610276283
    2339825791423321726961443744381056485529348876349210309887028787453233132532122678633283702
    7925099749969488775936915917644588032718384740235933020374888506755706587919461134193230781
    4854436454375113207098606390746417564121635042388002967808558670370387509410769821183765499
    2052043682558546422885024299633226853691246485500075591664024729240716450725319674499952944
    8434741902107729606820558130923626837987951966199798285525887161096136561780745661592488660
    8898164568541721362920846656279131478466791550965154310113538586208196875836883595577893914
    5453935681996098808540476590735897289898342504712891841626587896821853808795627903997862944
    9397605467534821256750121517082737107646270712467532102483678159400087505452543537

  12. Data has a different meaning
    under different interpretations

  13. Functions
    A bit of theory

  14. Functions
    y = f(x)
    (f maps an input x to an output y)

  15. Functions
    f: D → R
    y = f(x), with x in D and y in R

  16. Functions
    function f(D $x): R {
    $y = do_something_with($x);
    return $y;
    }

  17. Functions
    round: float → int
    4 = round(3.84)

  18. Functions
    f: D → R
    f(x1) = y and f(x2) = y: two inputs, one output

  19. Functions
    round: float → int
    round(3.84) = 4 and round(4.35) = 4

  20. Functions
    f: D → R
    One input, two outputs: f(x) = y1 or f(x) = y2?

  21. Pure functions
    f: D → R
    The same x always gives the same output: never y1 one time and y2 another

  22. Impure functions
    • Randomness
    • State (global or local)
    • IO
      • Filesystem
      • Network
      • System clock
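
A minimal PHP sketch of the pure/impure distinction. The function names and the 21% VAT rate are illustrative assumptions, not part of the talk: `add_vat` depends only on its argument, while `greeting` also reads the system clock.

```php
<?php
// Illustrative only: function names and the 21% VAT rate are assumptions.

// Pure: the output depends only on the input, so the same amount
// always maps to the same result.
function add_vat(int $net_cents): int
{
    return intdiv($net_cents * 121, 100);
}

// Impure: the output also depends on the system clock, so two calls
// with the same (empty) input can return different results.
function greeting(): string
{
    $hour = (int) date('G'); // current hour, 0-23
    return $hour < 12 ? 'Good morning' : 'Good afternoon';
}

echo add_vat(1000), PHP_EOL; // 1210, every time
echo greeting(), PHP_EOL;    // depends on when you run it
```

Testing `add_vat` needs nothing but inputs and expected outputs; testing `greeting` requires controlling the clock.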

  23. Functions
    f: D → R
    f(x) = ? (no output for x)

  24. Functions (in mathematics)
    D R
    x
    f: D → R

  25. Functions (in programming)
    D R
    x
    Exception!
    f: D → R

  26. Representation &
    Interpretation

  27. Level of abstraction
    round: float → int

  28. Level of abstraction
    serialize_money: Money → string

  29. Level of abstraction
    encode_as_utf16: string → byte[]

  30. Representation
    • Translation of a high-level concept
    to a lower-level representation
    • All input values supported
    • Lossless (ideally)
    • Same data, just in a different form

  31. Level of abstraction
    decode_from_utf16: byte[] → string

  32. Level of abstraction
    parse_money: string → Money

  33. Interpretation
    • Translation of a low-level representation
    (back) to a higher-level concept
    • Usually not all input values supported
    • Same data, just in a different form
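
The two directions can be sketched in PHP with a hypothetical `Money` value object. The string format (`"EUR 1000"`: ISO 4217 code plus amount in cents) and the implementations of `serialize_money` and `parse_money` are assumptions for illustration, not the talk's code. Representation accepts every `Money`; interpretation rejects strings outside its domain.

```php
<?php
// Hypothetical value object: amount in minor units plus ISO 4217 code.
final class Money
{
    public int $cents;
    public string $currency;

    public function __construct(int $cents, string $currency)
    {
        $this->cents = $cents;
        $this->currency = $currency;
    }
}

// Representation: defined for every Money value, and lossless because
// both the amount and the currency end up in the string.
function serialize_money(Money $money): string
{
    return sprintf('%s %d', $money->currency, $money->cents);
}

// Interpretation: only strings like "EUR 1000" are in the domain;
// anything else is rejected with an exception.
function parse_money(string $text): Money
{
    if (preg_match('/\A([A-Z]{3}) (-?\d+)\z/', $text, $m) !== 1) {
        throw new InvalidArgumentException("Not a Money representation: $text");
    }
    return new Money((int) $m[2], $m[1]);
}

$text = serialize_money(new Money(1000, 'EUR')); // "EUR 1000"
$back = parse_money($text);                      // EUR, 1000 cents again

try {
    parse_money('10'); // no currency: outside the domain
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Not a Money representation: 10
}
```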

  34. Why do we do this?

  35. Transmission of data
    Money → string → byte[] → light pulses, magnetic grains, etc.
    (each step goes down one level of abstraction)

  36. Escaping
    A different view

  37. Escaping for SQL
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  38. Escaping for SQL
    // $_POST['username'] = "'; DROP TABLE users; --"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  39. Escaping for SQL
    // $_POST['username'] = "John O'Shea"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  40. Escaping for SQL
    // returns "John O'Shea"
    $username = get_username_using_api();
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  41. Escaping for SQL
    // $username = "John O'Shea"
    $username = /* (whatever) */;
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  42. String escaping is NOT
    a security measure

  43. String escaping is just
    properly representing data

  44. Escaping as representation
    db_escape_string: string → SQLFragment (?)
    "O'Shea" → "O\'Shea"

  45. Escaping as representation
    db_escape_string("John O'Shea");
    // -> "John O\'Shea"
    Better:
    as_sql_string_literal("John O'Shea");
    // -> "'John O\'Shea'"
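
A minimal sketch of `as_sql_string_literal`, assuming standard SQL quoting, where a quote inside a literal is written as two single quotes. The slides show MySQL-style backslash escaping instead; the real rules depend on the database and its settings, so this hypothetical implementation is an illustration only, and prepared statements remain the safer choice in practice.

```php
<?php
// Hypothetical implementation: represent any PHP string as a complete
// SQL string literal (including the surrounding quotes), using standard
// SQL quoting ('' for a quote). It ignores driver-specific rules such as
// MySQL backslash escapes, so it is NOT safe for MySQL in its default mode.
function as_sql_string_literal(string $value): string
{
    return "'" . str_replace("'", "''", $value) . "'";
}

$sql = 'SELECT * FROM users WHERE username = '
    . as_sql_string_literal("John O'Shea");
echo $sql, PHP_EOL;
// SELECT * FROM users WHERE username = 'John O''Shea'
```

Because the result already includes the quotes, the caller cannot forget them, and the return value is a whole literal rather than an "escaped fragment".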

  46. Escaping as representation
    as_sql_string_literal: string → SQLStringLiteral
    "O'Shea" → "'O\'Shea'"

  47. Chained escaping as representation
    $name = "John O'Shea";
    $js_name = as_js_string_literal($name);
    $js_script = "showGreeting(" . $js_name . ")";
    $html_tag = "Show greeting";
    $sql = "INSERT INTO html_examples VALUES("
    . as_sql_string_literal($html_tag) . ")";
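
The HTML markup on this slide did not survive in the transcript, so the sketch below assumes a hypothetical `<button onclick="…">` element. Each layer is represented for the layer it is embedded in: a JavaScript string literal via `json_encode`, an HTML attribute value via `htmlspecialchars`, and an SQL string literal via standard-SQL quote doubling. All three helper implementations are assumptions, not the talk's code.

```php
<?php
// Layer 1: PHP string -> JavaScript string literal (quotes included).
function as_js_string_literal(string $value): string
{
    return json_encode($value, JSON_THROW_ON_ERROR);
}

// Layer 2: arbitrary text -> value for a double-quoted HTML attribute.
function as_html_attribute_value(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Layer 3: arbitrary text -> SQL string literal (standard SQL quoting,
// assumed; prefer prepared statements in real code).
function as_sql_string_literal(string $value): string
{
    return "'" . str_replace("'", "''", $value) . "'";
}

$name = "John O'Shea";
$js_name = as_js_string_literal($name);        // "John O'Shea"
$js_script = 'showGreeting(' . $js_name . ')'; // showGreeting("John O'Shea")
$html_tag = '<button onclick="' . as_html_attribute_value($js_script)
    . '">Show greeting</button>';
$sql = 'INSERT INTO html_examples VALUES('
    . as_sql_string_literal($html_tag) . ')';
echo $sql, PHP_EOL;
```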

  48. Premature representation
    foreach ($_REQUEST as $key => $value) {
    $_REQUEST[$key] = htmlentities(
    db_escape_string($value)
    );
    }
    // ‘Now the input data is safe’
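
To see why escaping everything on input goes wrong, here is a sketch that substitutes PHP's built-in `addslashes` for the slide's `db_escape_string` (an assumption: that function is not a PHP built-in). The value ends up represented for SQL and HTML at once, which is correct for neither.

```php
<?php
// Simulated request data (stands in for $_REQUEST).
$request = ['username' => "John O'Shea"];

// Premature representation: escape for SQL and HTML before we know how
// the value will be used. addslashes stands in for db_escape_string.
foreach ($request as $key => $value) {
    $request[$key] = htmlentities(addslashes($value), ENT_QUOTES, 'UTF-8');
}

echo $request['username'], PHP_EOL; // John O\&#039;Shea

// Every later use now sees this mangled form: a greeting e-mail, a
// length check, a JSON API response, or even the HTML page itself,
// where it renders as John O\'Shea.
```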

  50. Premature representation
    foreach ($_REQUEST as $key => $value) {
    if (
    strpos($value, "DROP TABLE")
    !== false
    ) {
    die("SQL injection!");
    }
    }
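
A blacklist like this inspects raw input without knowing how it will be interpreted later, so it is both too strict and too lax. In the sketch below, the helper name `looks_like_injection` and the sample inputs are made up for illustration: a harmless message is rejected while a lowercase payload slips through.

```php
<?php
// Same check as on the slide, wrapped in a hypothetical helper.
function looks_like_injection(string $value): bool
{
    return strpos($value, "DROP TABLE") !== false;
}

// False positive: a legitimate forum post about SQL is rejected.
var_dump(looks_like_injection("How do I DROP TABLE safely?")); // bool(true)

// False negative: SQL keywords are case-insensitive, strpos is not.
var_dump(looks_like_injection("'; drop table users; --"));     // bool(false)
```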

  52. What goes wrong
    and how to prevent that

  53. Misrepresentation
    serialize_money_ok: Money → string
    (€ 10) → "€ 10"

  54. Misrepresentation
    serialize_money_bad: Money → string
    (€ 10) → "10"

  55. Misrepresentation
    serialize_money_bad: Money → string
    (€ 10) → "10"
    ($ 10) → "10"

  56. Misrepresentation
    (€ 10) → "10" ← ($ 10)
    "10" → Money: (€ 10) or ($ 10)?
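
In code, the problem on these slides is that the bad serializer maps two different values to the same string, so no parser can tell them apart afterwards. A sketch with a hypothetical `Money` class (amount in cents plus ISO 4217 code) and assumed implementations of the two serializers:

```php
<?php
// Hypothetical value object: amount in cents plus ISO 4217 currency code.
final class Money
{
    public int $cents;
    public string $currency;

    public function __construct(int $cents, string $currency)
    {
        $this->cents = $cents;
        $this->currency = $currency;
    }
}

// Keeps the currency: distinct values get distinct strings.
// %F is used instead of %f so the output does not depend on the locale.
function serialize_money_ok(Money $m): string
{
    return sprintf('%s %.2F', $m->currency, $m->cents / 100);
}

// Drops the currency: € 10 and $ 10 both become "10.00".
function serialize_money_bad(Money $m): string
{
    return sprintf('%.2F', $m->cents / 100);
}

$euros = new Money(1000, 'EUR');
$dollars = new Money(1000, 'USD');

echo serialize_money_ok($euros), ' / ', serialize_money_ok($dollars), PHP_EOL;
// EUR 10.00 / USD 10.00
echo serialize_money_bad($euros), ' / ', serialize_money_bad($dollars), PHP_EOL;
// 10.00 / 10.00  (the currency can no longer be recovered)
```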

  57. That doesn’t happen to me!
    DateTime → string
    2022-03-14 06:28:00 Europe/Amsterdam → "2022-03-14T06:28:00+01:00"
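
The same loss can be reproduced with PHP's built-in date classes. This sketch formats the slide's timestamp with `DATE_ATOM` (ISO 8601 with a UTC offset), parses it back, and shows that only the offset survives. The one-month calculation is an illustrative addition: Amsterdam switched to summer time on 27 March 2022, which the parsed value no longer knows about.

```php
<?php
$local = new DateTimeImmutable(
    '2022-03-14 06:28:00',
    new DateTimeZone('Europe/Amsterdam')
);

// Representation: the string keeps the UTC offset, not the region.
$text = $local->format(DATE_ATOM);
echo $text, PHP_EOL; // 2022-03-14T06:28:00+01:00

// Interpretation: the parsed value carries a fixed offset.
$parsed = new DateTimeImmutable($text);
echo $parsed->getTimezone()->getName(), PHP_EOL; // +01:00

// Same instant in time...
var_dump($parsed == $local); // bool(true)

// ...but date arithmetic no longer follows Amsterdam's DST rules.
echo $local->modify('+1 month')->format(DATE_ATOM), PHP_EOL;
// 2022-04-14T06:28:00+02:00
echo $parsed->modify('+1 month')->format(DATE_ATOM), PHP_EOL;
// 2022-04-14T06:28:00+01:00
```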

  58. That doesn’t happen to me!
    fraction → float
    1/3 → 0.33333333333333
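
A short PHP sketch of why this representation cannot be re-interpreted. Here `number_format` is used only to reproduce the slide's 14-digit decimal text. Parsing that text back yields a number that is close to 1/3 but not equal to it.

```php
<?php
// Render 1/3 with 14 decimals, like the slide.
$text = number_format(1 / 3, 14, '.', '');
echo $text, PHP_EOL; // 0.33333333333333

// Interpreting that text again does not give back 1/3.
$back = (float) $text;
var_dump($back * 3 == 1); // bool(false)
```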

  59. That doesn’t happen to me!
    encode_as_iso8859_1: string → byte[]
    "⛽☂️" → ?
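
A sketch using the mbstring extension, assuming the slide's characters are U+26FD (fuel pump), U+2602 (umbrella) and the emoji variation selector U+FE0F. None of them exist in ISO-8859-1, so mbstring replaces them with a substitute character and the round trip cannot recover the original. For comparison, "café" survives because é is part of ISO-8859-1.

```php
<?php
// Requires the mbstring extension.
$original = "\u{26FD}\u{2602}\u{FE0F}"; // ⛽☂️ as UTF-8 bytes

// Lossy representation: unrepresentable characters are substituted.
$latin1 = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// Converting back cannot restore what was lost.
$roundtrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($roundtrip === $original); // bool(false)

// "café" fits in ISO-8859-1, so it round-trips unchanged.
$cafe = "caf\u{E9}";
$cafe_latin1 = mb_convert_encoding($cafe, 'ISO-8859-1', 'UTF-8');
var_dump(mb_convert_encoding($cafe_latin1, 'UTF-8', 'ISO-8859-1') === $cafe); // bool(true)
```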

  60. Lossy representations
    cannot be re-interpreted

  61. Misinterpretation
    byte[]
    […]
    string
    "café"

  62. Misinterpretation
    byte[]
    […]
    string
    "café"
    utf8
    decode_from_utf8
    Exception!

  63. Misinterpretation
    byte[]
    […]
    string
    "café"
    iso8859-1
    utf8

  64. Misinterpretation
    byte[]
    […]
    string
    "café"
    iso8859-1
    utf8
    "café"

  65. Duck typing
    byte[]
    […]
    iso8859-1
    utf8

  66. Duck typing
    byte[]
    […]
    iso8859-1
    utf8
    HOW?
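
The misinterpretation and duck-typing slides can be reproduced with mbstring (the byte values are standard; the variable names are illustrative). ISO-8859-1 bytes fail as UTF-8, UTF-8 bytes read as ISO-8859-1 silently become different text, and some byte sequences are valid under both encodings, so validity checks alone cannot reveal the intended one.

```php
<?php
// "café" as ISO-8859-1 bytes (é = 0xE9) and as UTF-8 bytes (é = 0xC3 0xA9).
$latin1_bytes = "caf\xE9";
$utf8_bytes = "caf\xC3\xA9";

// Misinterpretation 1: ISO-8859-1 bytes are not valid UTF-8, because
// 0xE9 starts a multi-byte sequence that never completes.
var_dump(mb_check_encoding($latin1_bytes, 'UTF-8')); // bool(false)

// Misinterpretation 2: UTF-8 bytes read as ISO-8859-1 decode without
// complaint, but to different text.
$misread = mb_convert_encoding($utf8_bytes, 'UTF-8', 'ISO-8859-1');
echo $misread, PHP_EOL; // cafÃ©

// Duck typing: the UTF-8 bytes are valid under both interpretations,
// so checking validity cannot tell us which one the writer intended.
var_dump(mb_check_encoding($utf8_bytes, 'UTF-8'));      // bool(true)
var_dump(mb_check_encoding($utf8_bytes, 'ISO-8859-1')); // bool(true)

// mb_detect_encoding therefore has to guess between the candidates.
$guess = mb_detect_encoding($utf8_bytes, ['UTF-8', 'ISO-8859-1'], true);
echo $guess, PHP_EOL;
```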

  67. The meaning of data comes
    from our interpretation of it…

  68. …which may very
    well be wrong

  69. Interpretation hints
    Store/transmit desired interpretation along with data
    Examples:
    • Music staves

  70. Interpretation hints
    Store/transmit desired interpretation along with data
    Examples:
    • HTTP headers
      • Content-Type
      • Content-Encoding
      • Content-Language
    • Unicode byte order mark

  71. Interpretation hints
    utf16 LE utf16 BE
    […]

  72. Interpretation hints
    utf16 LE
    with BOM
    utf16 BE
    with BOM
    […]
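
A sketch of a byte order mark acting as an interpretation hint. The helper `utf16_encoding_from_bom` is hypothetical; the BOM byte values (FF FE for little endian, FE FF for big endian) are the standard ones. It ignores UTF-32 BOMs, whose little-endian form also starts with FF FE.

```php
<?php
// Hypothetical helper: map a UTF-16 byte order mark to an mbstring
// encoding name, or return null when there is no UTF-16 BOM.
function utf16_encoding_from_bom(string $bytes): ?string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    if ($bom === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    return null;
}

// "hi" in UTF-16 little endian, preceded by its BOM.
$data = "\xFF\xFE" . "h\x00i\x00";

$encoding = utf16_encoding_from_bom($data); // 'UTF-16LE'
$text = mb_convert_encoding(substr($data, 2), 'UTF-8', $encoding);
echo $text, PHP_EOL; // hi

// Without the hint, reading the same bytes as big endian gives other text.
$wrong = mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16BE');
var_dump($wrong === 'hi'); // bool(false)
```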

  73. Hints by variable naming
    Include intended interpretation in variable name:
    • $message_utf8
    • $username_sql
    • $title_html

  74. Hints by variable naming
    header("Content-Type: text/plain; charset=iso-8859-1");
    print $message_utf8;
    Charset mismatch between header and content

  75. Hints by variable naming
    $database->query("SELECT * FROM users
    WHERE username = " . $username);
    Username is not properly represented for SQL

  76. Hints by variable naming
    print "" . htmlentities($title_html) .
    "";
    Double escaping, title is already represented as HTML

  77. PHP’s string type
    A strange hybrid

  78. Role overloading of string
    encode_as_utf16: string → byte[]

  79. PHP’s string type is
    actually a byte array
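
A quick way to see this in PHP: the built-in string functions count and slice bytes, and only the mbstring functions apply an interpretation. The `\u{E9}` escape stores é as the two UTF-8 bytes C3 A9.

```php
<?php
$cafe = "caf\u{E9}"; // "café", stored as UTF-8 bytes

echo strlen($cafe), PHP_EOL;             // 5: strlen counts bytes
echo mb_strlen($cafe, 'UTF-8'), PHP_EOL; // 4: characters, if read as UTF-8
echo bin2hex($cafe), PHP_EOL;            // 636166c3a9

// Byte-oriented functions can cut a character in half.
$cut = substr($cafe, 0, 4);
var_dump(mb_check_encoding($cut, 'UTF-8')); // bool(false)
```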

  80. Role overloading of string
    string = byte[]
    encode_as_utf16

  81. Role overloading of string
    string = byte[], used as if it were a ‘real’ string in utf16

  82. Role overloading of string
    string = byte[], used as if it were a ‘real’ string:
    as if utf16 in some places, as if utf8 in others

  83. More accurate types
    • A better type system makes misinterpretation
    more difficult
    • If our programming language does not provide
    the types, we have to do it ourselves
    • Value Objects!

  84. More accurate types
    class UTF8Bytes {
        private string $bytes;
        private function __construct(string $bytes) {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes): self {
            return new self($bytes);
        }
        public function toPHPString(): string {
            return $this->bytes;
        }
        // ...
    }

  85. More accurate types
    class ISO88591Bytes {
        private string $bytes;
        private function __construct(string $bytes) {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes): self {
            return new self($bytes);
        }
        public function toPHPString(): string {
            return $this->bytes;
        }
        // ...
    }

  86. More accurate types
    class RealString {
        private string $value_utf16;
        private function __construct(string $value_utf16) {
            $this->value_utf16 = $value_utf16;
        }
        public static function fromUTF8(UTF8Bytes $utf8_bytes): self {
            $value_utf16 = mb_convert_encoding(
                $utf8_bytes->toPHPString(), 'UTF-16', 'UTF-8');
            return new self($value_utf16);
        }
        public static function fromISO88591(ISO88591Bytes $iso88591_bytes): self {
            // ...
        }
        // ...
    }

  87. More accurate types
    class RealString {
        // ...
        public function toUTF8(): UTF8Bytes {
            $value_utf8 = mb_convert_encoding(
                $this->value_utf16, 'UTF-8', 'UTF-16');
            return UTF8Bytes::fromPHPString($value_utf8);
        }
        public function toISO88591(): ISO88591Bytes {
            $value_iso88591 = mb_convert_encoding(
                $this->value_utf16, 'ISO-8859-1', 'UTF-16');
            return ISO88591Bytes::fromPHPString($value_iso88591);
        }
        // ...
    }

  88. More accurate types
    class RealString {
        // ...
        public function concat(RealString $other): RealString {
            return new RealString($this->value_utf16 .
                $other->value_utf16);
        }
        public function substring(int $start, int $length): RealString {
            $substr_utf16 = mb_substr($this->value_utf16,
                $start, $length, 'UTF-16');
            return new RealString($substr_utf16);
        }
        // ...
    }

  89. More accurate types (usage)
    $string1_utf8 = UTF8Bytes::fromPHPString(
        file_get_contents("file1.txt"));
    $string1 = RealString::fromUTF8($string1_utf8);
    $string2_iso88591 = ISO88591Bytes::fromPHPString(
        file_get_contents("file2.txt"));
    $string2 = RealString::fromISO88591($string2_iso88591);
    // The original charsets do not matter anymore now...
    $new_string = $string1->concat($string2)->substring(7, 42);
    header("Content-Type: text/plain; charset=utf-8");
    print $new_string->toUTF8()->toPHPString();
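
One possible refinement, which is not part of the talk's code: let the `UTF8Bytes` factory reject byte strings that are not valid UTF-8, so a misinterpretation fails at the boundary instead of deep inside the program. The class below is a trimmed, hypothetical variant of the slide's `UTF8Bytes`; the validation and the choice of `InvalidArgumentException` are assumptions.

```php
<?php
final class UTF8Bytes
{
    private string $bytes;

    private function __construct(string $bytes)
    {
        $this->bytes = $bytes;
    }

    // Interpretation step: only accept bytes that really are UTF-8.
    public static function fromPHPString(string $bytes): self
    {
        if (!mb_check_encoding($bytes, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        return new self($bytes);
    }

    public function toPHPString(): string
    {
        return $this->bytes;
    }
}

$ok = UTF8Bytes::fromPHPString("caf\u{E9}"); // valid UTF-8: accepted

try {
    UTF8Bytes::fromPHPString("caf\xE9"); // ISO-8859-1 bytes: rejected
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Input is not valid UTF-8
}
```

The trade-off is an extra check on every construction, which fits the slide's point that value objects come with some overhead.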

  90. Without value objects
    string = byte[], interpreted as either iso8859-1 or utf8

  91. With value objects
    RealString (higher level of abstraction)
    UTF8Bytes, ISO88591Bytes (lower level of abstraction)

  92. Value objects
    • Achieve higher levels of abstraction
    • Avoid misinterpretation
    • No silver bullet!
    • Comes with additional overhead

  93. Recap
    • Data
    • Functions
    • Interpretation and representation
    • String escaping
    • Misrepresentation and misinterpretation
    • The string type
    • Value objects

  94. The meaning of data is defined
    by how we interpret it

  95. Feedback & Questions
    @arnoutboks
    @arnoutboks
    @aboks
    Arnout Boks
    Please leave your feedback on joind.in:
    https://joind.in/talk/4aaf0
    We’re hiring!

  96. Image Credits
    • https://pixabay.com/en/reading-relaxation-glasses-sight-3088491/
    • https://fr.m.wikipedia.org/wiki/Fichier:DARPA_Big_Data.jpg
    • https://www.flickr.com/photos/elefevre/3936916711
    • https://www.flickr.com/photos/perspective/9045532603
    • https://www.flickr.com/photos/wwarby/11644168395/
    • https://www.flickr.com/photos/opengridscheduler/16480450157
    • https://www.flickr.com/photos/cogdog/14401469262
    • https://www.flickr.com/photos/paulsimpson1976/3998279762
    • https://www.flickr.com/photos/pewari/3499963407
