Of representation and interpretation: A unified theory - phpDay 2019

Video recording: https://www.youtube.com/watch?v=Uvy26uMys2c
Joind.in: https://joind.in/talk/85813

Many hard problems in programming originate from one single source: not properly distinguishing the representation of data from the way it is interpreted. Have you ever written code that filters $_GET for SQL injection attempts? Struggled with timezones? Tried to get escaping right for JavaScript in HTML? Detected the character encoding of a string? All are examples of this one problem. In this talk we will look at some examples of the representation-interpretation problem and find the general pattern behind it. We will see how primitive types make it so hard for us to get this right, and how we can use value objects to steer us in the right direction. Once you notice the pattern, you’ll be able to reason about and solve these problems much more easily. Contains: math, character sets, strong opinions on string escaping, and an almost illegal slide.

Arnout Boks

May 11, 2019

Transcript

  1. Of representation and interpretation
    A unified theory
    @arnoutboks
    Arnout Boks
    #phpday
    11-05-2019

  2. Hard problems in programming
    • Cache invalidation
    • Naming things
    • Off-by-one errors

  3. Hard problems in programming
    • String escaping
    • Timezones
    • Character encoding

  4. Many difficulties with
    these topics are related

  5. This talk is…
    • My personal view
    • Not an absolute truth
    • Meant to make you think

  6. Data
    And its meaning

  7. A number

  8. A byte

  9. Some JSON data
    {
    "income":100000
    }

  10. A word

  11. A prime number
    4931083597028501900275777672390764957284907772150208632080750184097926278850976588645578020
    1366007328679544734112831735367831201557535981978545054811571939345877330038009932619505876
    4525023820408110189885042615176579941704250889037029119015870030479432826073821469541570330
    2279875576818956016240300641115169008728798381942582716745647748166843479284645809291315318
    6007001004335318936319343912948604450370991980047709462921558180711169153031876288477878354
    1575932891093295447350881882465495060005019006274705305381164278294267474853496525745368151
    1706550281905552656221353146310421008662867971144467063669219825861581112515556504813420768
    6732340765505485910826956266693066236799702104812396562518006818323653959348395675357557532
    4619023481064700987753027956186892925380693305204238149969945456945774138335689906005870832
    1812704861133682026515905166351874029X18197693937677852928722109550412925792573818660584501
    5055250274994771883129310457698090915304613359419030258813205932277444385255046677902451869
    7062627788891979580423065750615669834695617797879659201644051939960716981112615195610276283
    2339825791423321726961443744381056485529348876349210309887028787453233132532122678633283702
    7925099749969488775936915917644588032718384740235933020374888506755706587919461134193230781
    4854436454375113207098606390746417564121635042388002967808558670370387509410769821183765499
    2052043682558546422885024299633226853691246485500075591664024729240716450725319674499952944
    8434741902107729606820558130923626837987951966199798285525887161096136561780745661592488660
    8898164568541721362920846656279131478466791550965154310113538586208196875836883595577893914
    5453935681996098808540476590735897289898342504712891841626587896821853808795627903997862944
    9397605467534821256750121517082737107646270712467532102483678159400087505452543537

  12. Data has a different meaning
    under different interpretations

  13. Functions
    A bit of theory

  14. Functions
    x
    y
    y = f(x)
    f

  15. Functions
    D R
    x
    y
    f: D → R
    y = f(x)

  16. Functions
    function f(D $x): R {
        $y = do_something_with($x);
        return $y;
    }

  17. Functions
    float int
    3.84
    4
    round: float → int
    4 = round(3.84)

  18. Functions
    D R
    x1
    y
    f: D → R
    x2

  19. Functions
    float int
    3.84
    4
    round: float → int
    4.35

  20. Functions
    D R
    x
    y1
    f: D → R
    y2

  21. Pure functions
    D R
    x
    y1
    f: D → R
    y2

  22. Impure functions
    • Randomness
    • State (global or local)
    • IO
    • Filesystem
    • Network
    • System clock
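
    The difference between the two kinds of function can be sketched in a few lines of PHP. The names round_to_int and next_id are made up for this illustration and do not come from the talk; the static counter stands in for any of the sources of impurity listed above.

```php
<?php
// Pure: the result depends only on the argument.
function round_to_int(float $x): int
{
    return (int) round($x);
}

// Impure (hypothetical example): the result also depends on hidden local state.
function next_id(string $prefix): string
{
    static $counter = 0;
    $counter++;
    return $prefix . $counter;
}

// Several inputs may share one output, but one input never yields two outputs.
var_dump(round_to_int(3.84) === round_to_int(4.35)); // bool(true), both are 4

// The same input gives different outputs on successive calls.
var_dump(next_id('user-') === next_id('user-'));     // bool(false)
```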

  23. Functions
    D R
    x
    ?
    f: D → R

  24. Functions (in mathematics)
    D R
    x
    f: D → R

  25. Functions (in programming)
    D R
    x
    Exception!
    f: D → R
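
    A minimal sketch of such a function in PHP, assuming a hypothetical parse_percentage helper (not part of the talk): the declared domain is every string, yet only some strings have a result, and the rest end in an exception.

```php
<?php
// Hypothetical example: D = string, R = int, but only some strings are valid.
function parse_percentage(string $text): int
{
    if (preg_match('/^(\d{1,3})%$/', $text, $m) !== 1 || (int) $m[1] > 100) {
        throw new InvalidArgumentException("Not a percentage: '$text'");
    }
    return (int) $m[1];
}

var_dump(parse_percentage('42%')); // int(42)

try {
    parse_percentage('lots');
} catch (InvalidArgumentException $e) {
    // There is no y in R for this x, so the function cannot return.
    echo "No result for this input: ", $e->getMessage(), "\n";
}
```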

  26. Representation &
    Interpretation

  27. Level of abstraction
    float int
    round: abstraction

  28. Level of abstraction
    Money
    string
    serialize_money: abstraction

  29. Level of abstraction
    string
    byte[]
    encode_as_utf16: abstraction

  30. Representation
    • Translation of a high-level concept
    to a lower-level representation
    • All input values supported
    • Lossless (ideally)
    • Same data, just in a different form

  31. Level of abstraction
    byte[]
    string
    decode_from_utf16: abstraction

  32. Level of abstraction
    string
    Money
    parse_money: abstraction

  33. Interpretation
    • Translation of a low-level representation
    (back) to a higher-level concept
    • Usually not all input values supported
    • Same data, just in a different form
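
    The asymmetry between the two directions can be sketched with the serialize_money / parse_money pair from the earlier slides. The Money class and the "EUR 1000" text format below are assumptions made for this sketch; the talk does not show an implementation.

```php
<?php
// Hypothetical minimal value object; not the Money class from the talk.
final class Money
{
    private string $currency; // assumed to be an ISO 4217 code such as "EUR"
    private int $cents;

    public function __construct(string $currency, int $cents)
    {
        $this->currency = $currency;
        $this->cents = $cents;
    }

    public function currency(): string { return $this->currency; }
    public function cents(): int { return $this->cents; }
}

// Representation: every Money value has a string form, and nothing is lost.
function serialize_money(Money $money): string
{
    return $money->currency() . ' ' . $money->cents();
}

// Interpretation: only some strings can be read back as Money.
function parse_money(string $text): Money
{
    if (preg_match('/^([A-Z]{3}) (-?\d+)$/', $text, $m) !== 1) {
        throw new InvalidArgumentException("Cannot interpret '$text' as Money");
    }
    return new Money($m[1], (int) $m[2]);
}

$ten_euros = new Money('EUR', 1000);
$text = serialize_money($ten_euros);             // "EUR 1000"
var_dump(parse_money($text) == $ten_euros);      // bool(true): lossless round trip
```

    Calling parse_money("10") throws, because that string carries no currency to interpret.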

  34. Why do we do this?

  35. (image slide, no transcript text)

  36. (image slide, no transcript text)

  37. Transmission of data
    Money
    string
    abstraction
    byte[]
    light pulses,
    magnetic grains,
    etc.

  38. Escaping
    A different view

  39. Escaping for SQL
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  40. Escaping for SQL
    // $_POST['username'] = "'; DROP TABLE users; --"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  41. Escaping for SQL
    // $_POST['username'] = "John O'Shea"
    $username = $_POST['username'];
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  42. Escaping for SQL
    // returns "John O'Shea"
    $username = get_username_using_api();
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  43. Escaping for SQL
    // $username = "John O'Shea"
    $username = /* (whatever) */;
    $sql = "SELECT * FROM users WHERE username =
    '" . db_escape_string($username) . "'";
    $result = db_query($sql);

  44. String escaping is NOT
    a security measure

  45. String escaping is just
    properly representing data
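
    One way to read this in code is a function that turns a string into its SQL form and says so in its return type. The sketch below is an assumption-laden illustration: SqlStringLiteral is a hypothetical class, and the quoting rule used is the standard-SQL one (a single quote is doubled), whereas the escaping function on the slides produces backslash-style output. Which rule is correct depends on the database.

```php
<?php
// Hypothetical value object marking a string as "already represented as SQL".
final class SqlStringLiteral
{
    private string $sql;

    private function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    // Assumption: standard-SQL quoting, where ' inside a literal becomes ''.
    public static function fromString(string $value): self
    {
        return new self("'" . str_replace("'", "''", $value) . "'");
    }

    public function toSql(): string
    {
        return $this->sql;
    }
}

function as_sql_string_literal(string $value): SqlStringLiteral
{
    return SqlStringLiteral::fromString($value);
}

// Where the value came from ($_POST, an API, a constant) does not matter.
$literal = as_sql_string_literal("John O'Shea");
echo "SELECT * FROM users WHERE username = " . $literal->toSql(), "\n";
// SELECT * FROM users WHERE username = 'John O''Shea'
```

    The private constructor means a SqlStringLiteral can only come into existence through the representation step, so an unrepresented string cannot be passed off as one.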

  46. Escaping as representation
    string
    SQLFragment (?)
    db_escape_string:
    "O'Shea"
    "O\'Shea"

  47. Escaping as representation
    db_escape_string("John O'Shea");
    // -> "John O\'Shea"
    Better:
    as_sql_string_literal("John O'Shea");
    // -> "'John O\'Shea'"

  48. Escaping as representation
    string
    SQLStringLiteral
    as_sql_string_literal:
    "O'Shea"
    "'O\'Shea'"

  49. Chained escaping as representation
    $name = "John O'Shea";
    $js_name = as_js_string_literal($name);
    $js_script = "showGreeting(" . $js_name . ")";
    $html_tag = "Show greeting";
    $sql = "INSERT INTO html_examples VALUES("
    . as_sql_string_literal($html_tag) . ")";
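
    The first links of such a chain can be sketched with PHP built-ins. The markup of the $html_tag line above did not survive in this transcript, so the onclick attribute below is an assumption, as are the helper bodies: json_encode stands in for as_js_string_literal and htmlspecialchars for the HTML step. The final SQL step is left out because its quoting rule depends on the database.

```php
<?php
// Stand-in: json_encode yields a valid JavaScript string literal for UTF-8 input.
function as_js_string_literal(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Hypothetical helper: represent text as the value of a quoted HTML attribute.
function as_html_attribute_value(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$name      = "John O'Shea";
$js_name   = as_js_string_literal($name);                 // string -> JS literal
$js_script = "showGreeting(" . $js_name . ")";            // JS literal -> JS code
$html_tag  = '<a onclick="' . as_html_attribute_value($js_script)
           . '">Show greeting</a>';                       // JS code -> HTML

echo $html_tag, "\n";
```

    Each step represents the output of the previous one in the next format, and a consumer undoes the steps in reverse order: the browser decodes the attribute, then the JavaScript engine reads the literal.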

  50. Premature representation
    foreach ($_REQUEST as $key => $value) {
        $_REQUEST[$key] = htmlentities(
            db_escape_string($value)
        );
    }
    // ‘Now the input data is safe’

  51. Premature representation
    foreach ($_REQUEST as $key => $value) {
        $_REQUEST[$key] = htmlentities(
            db_escape_string($value)
        );
    }
    // ‘Now the input data is safe’

  52. Premature representation
    foreach ($_REQUEST as $key => $value) {
        if (
            strpos($value, "DROP TABLE")
            !== false
        ) {
            die("SQL injection!");
        }
    }

  53. Premature representation
    foreach ($_REQUEST as $key => $value) {
        if (
            strpos($value, "DROP TABLE")
            !== false
        ) {
            die("SQL injection!");
        }
    }
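
    A small sketch of what goes wrong with both approaches, using only built-in functions. The sample inputs are invented for the illustration, and htmlspecialchars is used in place of htmlentities so that the output is the same across PHP versions.

```php
<?php
$input = "Tom & Jerry";

// Premature: represent the value as HTML at the input boundary...
$stored = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// ...and every consumer that does not want HTML now sees different data.
var_dump($stored === $input);                // bool(false)
var_dump(strlen($input), strlen($stored));   // int(11) int(15)

// A template that correctly represents its data as HTML escapes it again.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n"; // Tom &amp;amp; Jerry

// The keyword filter also rejects perfectly legitimate text.
$comment = "How do I DROP TABLE safely in a migration?";
var_dump(strpos($comment, "DROP TABLE") !== false);        // bool(true)
```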

  54. What goes wrong
    and how to prevent that

  55. Misrepresentation
    Money
    string
    serialize_money_ok:
    (€ 10)
    "€ 10"

  56. Misrepresentation
    Money
    string
    serialize_money_bad:
    (€ 10)
    "10"

  57. Misrepresentation
    Money
    string
    serialize_money_bad:
    (€ 10)
    "10"
    ($ 10)

  58. Misrepresentation
    Money
    string
    (€ 10)
    "10"
    ($ 10)
    Money
    ?
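
    The collision on these slides can be reproduced in a few lines. For brevity this sketch models money as a currency code plus an amount rather than as a Money object; the function bodies are assumptions, only the names come from the slides.

```php
<?php
// Keeps everything needed to interpret the string again.
function serialize_money_ok(string $currency, int $amount): string
{
    return $currency . ' ' . $amount;
}

// Drops the currency: two different values end up with the same representation.
function serialize_money_bad(string $currency, int $amount): string
{
    return (string) $amount;
}

$euros   = ['EUR', 10];
$dollars = ['USD', 10];

var_dump(serialize_money_ok(...$euros) === serialize_money_ok(...$dollars));   // bool(false)
var_dump(serialize_money_bad(...$euros) === serialize_money_bad(...$dollars)); // bool(true)
```

    Given only "10", no parse function can decide between the two originals, which is the question mark on the slide.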

  59. That doesn’t happen to me!
    DateTime
    string
    2022-03-14 06:28:00
    Europe/Rome
    2022-03-14T\
    06:28:00+01:00
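
    What is lost here can be shown with PHP's own date classes: the string keeps the UTC offset but not the timezone, so the daylight saving rules are gone after re-interpretation. A sketch using the values from the slide (the "+1 month" step is an addition for the illustration):

```php
<?php
$original = new DateTimeImmutable('2022-03-14 06:28:00', new DateTimeZone('Europe/Rome'));

// Representation as an ISO 8601 string keeps the offset, not the zone.
$text   = $original->format(DATE_ATOM);    // 2022-03-14T06:28:00+01:00
$parsed = new DateTimeImmutable($text);

echo $original->getTimezone()->getName(), "\n"; // Europe/Rome
echo $parsed->getTimezone()->getName(), "\n";   // +01:00

// Both denote the same instant, but they behave differently from here on:
$a = $original->modify('+1 month'); // 06:28 in Rome, by then at +02:00
$b = $parsed->modify('+1 month');   // 06:28 at a fixed +01:00
var_dump($a->getTimestamp() === $b->getTimestamp()); // bool(false)
```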

  60. That doesn’t happen to me!
    fraction
    float
    1/3
    0.33333333333333

  61. That doesn’t happen to me!
    string
    byte[]
    ⛽☂️
    ?
    encode_as_iso8859_1

  62. Lossy representations
    cannot be re-interpreted

  63. Misinterpretation
    byte[]
    […]
    string
    "café"

  64. Misinterpretation
    byte[]
    […]
    string
    "café"
    utf8
    decode_from_utf8
    Exception!

  65. Misinterpretation
    byte[]
    […]
    string
    "café"
    iso8859-1
    utf8

  66. Misinterpretation
    byte[]
    […]
    string
    "café"
    iso8859-1
    utf8
    "café"

  67. Duck typing
    byte[]
    […]
    iso8859-1
    utf8

  68. Duck typing
    byte[]
    […]
    iso8859-1
    utf8
    HOW?

  69. The meaning of data comes
    from our interpretation of it…

  70. …which may very
    well be wrong
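
    Both kinds of wrong interpretation from the "café" slides can be reproduced with the mbstring extension (assumed to be installed). The bytes are written as escape sequences so that the example does not depend on the encoding of the source file.

```php
<?php
$cafe_utf8      = "caf\xC3\xA9"; // "café" represented as UTF-8 (5 bytes)
$cafe_iso8859_1 = "caf\xE9";     // "café" represented as ISO-8859-1 (4 bytes)

// ISO-8859-1 bytes interpreted as UTF-8: this input is not supported at all.
var_dump(mb_check_encoding($cafe_iso8859_1, 'UTF-8')); // bool(false)

// UTF-8 bytes interpreted as ISO-8859-1: always "works", but yields other text.
$misread = mb_convert_encoding($cafe_utf8, 'UTF-8', 'ISO-8859-1');
var_dump($misread === "caf\xC3\x83\xC2\xA9");          // bool(true), shown as "cafÃ©"
```

    The second case is the more dangerous one: every byte sequence is valid ISO-8859-1, so nothing signals that the interpretation was wrong.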

  71. Interpretation hints
    Store/transmit desired interpretation along with data
    Examples:
    • HTTP headers
        • Content-Type
        • Content-Encoding
        • Content-Language
    • UTF-8 byte order mark

  72. Interpretation hints
    Store/transmit desired interpretation along with data
    Examples:
    • Music staves

  73. Interpretation hints
    utf16 LE utf16 BE
    […]

  74. Interpretation hints
    utf16 LE
    with BOM
    utf16 BE
    with BOM
    […]
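
    Reading such a hint could look roughly like this. The function name is made up for the sketch, and it assumes the bytes are already known to be some form of UTF-16; the same two leading bytes also start a UTF-32 LE byte order mark, which is not handled here.

```php
<?php
// Returns the interpretation hinted at by a UTF-16 byte order mark, or null
// when the data carries no hint and the reader has to know (or guess).
function utf16_encoding_from_bom(string $bytes): ?string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    if ($bom === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    return null;
}

var_dump(utf16_encoding_from_bom("\xFF\xFE\x41\x00")); // string(8) "UTF-16LE"
var_dump(utf16_encoding_from_bom("\xFE\xFF\x00\x41")); // string(8) "UTF-16BE"
var_dump(utf16_encoding_from_bom("\x41\x00"));         // NULL
```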

  75. Hints by variable naming
    Include intended interpretation in variable name:
    • $message_utf8
    • $username_sql
    • $title_html

  76. Hints by variable naming
    header("Content-Type: text/plain; charset=iso-8859-1");
    print $message_utf8;
    Charset mismatch between header and content

  77. Hints by variable naming
    $database->query("SELECT * FROM users
    WHERE username = " . $username);
    Username is not properly represented for SQL

  78. Hints by variable naming
    print "" . htmlentities($title_html) .
    "";
    Double escaping, title is already represented as HTML

  79. PHP’s string type
    A strange hybrid

  80. Role overloading of string
    string
    byte[]
    encode_as_utf16: abstraction

  81. PHP’s string type is
    actually a byte array
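
    A quick way to see this, assuming the mbstring extension is available for the character-aware call:

```php
<?php
$cafe = "caf\u{00E9}"; // PHP stores this as the UTF-8 bytes 63 61 66 C3 A9

var_dump(strlen($cafe));             // int(5): counts bytes
var_dump(mb_strlen($cafe, 'UTF-8')); // int(4): counts characters, once told the interpretation
var_dump(bin2hex($cafe[3]));         // string(2) "c3": indexing yields a byte, half of "é"
```

    The built-in string operations work on bytes; the character-level view only appears when the caller supplies the intended interpretation.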

  82. Role overloading of string
    string
    = byte[]
    encode_as_utf16: abstraction

  83. Role overloading of string
    string
    = byte[]
    abstraction
    ‘real’ string
    use as if
    utf16

  84. Role overloading of string
    string
    = byte[]
    abstraction
    ‘real’ string
    use as if
    utf16
    use as if
    utf8

  85. More accurate types
    • A better type system makes mis-interpretation
    more difficult
    • If our programming language does not provide
    the types, we have to do it ourselves
    • Value Objects!

  86. More accurate types
    class UTF8Bytes {
        private string $bytes;
        private function __construct(string $bytes) {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes) {
            return new self($bytes);
        }
        public function toPHPString(): string {
            return $this->bytes;
        }
        // ...
    }

  87. More accurate types
    class UTF8Bytes {
        private string $bytes; // <- PHP 7.4: typed properties
        private function __construct(string $bytes) {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes) {
            return new self($bytes);
        }
        public function toPHPString(): string {
            return $this->bytes;
        }
        // ...
    }

  88. More accurate types
    class ISO88591Bytes {
        private string $bytes;
        private function __construct(string $bytes) {
            $this->bytes = $bytes;
        }
        public static function fromPHPString(string $bytes) {
            return new self($bytes);
        }
        public function toPHPString(): string {
            return $this->bytes;
        }
        // ...
    }

  89. More accurate types
    class RealString {
        private string $value_utf16;
        private function __construct(string $value_utf16) {
            $this->value_utf16 = $value_utf16;
        }
        public static function fromUTF8(UTF8Bytes $utf8_bytes) {
            $value_utf16 = mb_convert_encoding(
                $utf8_bytes->toPHPString(), 'UTF-16', 'UTF-8');
            return new self($value_utf16);
        }
        public static function fromISO88591(ISO88591Bytes $iso88591_bytes) {
            // ...
        }
        // ...
    }

  90. More accurate types
    class RealString {
        // ...
        public function toUTF8(): UTF8Bytes {
            $value_utf8 = mb_convert_encoding(
                $this->value_utf16, 'UTF-8', 'UTF-16');
            return UTF8Bytes::fromPHPString($value_utf8);
        }
        public function toISO88591(): ISO88591Bytes {
            $value_iso88591 = mb_convert_encoding(
                $this->value_utf16, 'ISO-8859-1', 'UTF-16');
            return ISO88591Bytes::fromPHPString($value_iso88591);
        }
        // ...
    }

  91. More accurate types
    class RealString {
        // ...
        public function concat(RealString $other): RealString {
            return new RealString($this->value_utf16 .
                $other->value_utf16);
        }
        public function substring(int $start, int $length): RealString {
            $substr_utf16 = mb_substr($this->value_utf16,
                $start, $length, 'UTF-16');
            return new RealString($substr_utf16);
        }
        // ...
    }

  92. More accurate types (usage)
    $string1_utf8 = UTF8Bytes::fromPHPString(
    file_get_contents("file1.txt"));
    $string1 = RealString::fromUTF8($string1_utf8);
    $string2_iso88591 = ISO88591Bytes::fromPHPString(
    file_get_contents("file2.txt"));
    $string2 = RealString::fromISO88591($string2_iso88591);
    // The original charsets do not matter anymore now...
    $new_string = $string1->concat($string2)->substring(7, 42);
    header("Content-Type: text/plain; charset=utf-8");
    print $new_string->toUTF8()->toPHPString();

  93. Without value objects
    iso8859-1
    utf8
    string
    = byte[]
    abstraction

  94. With value objects
    abstraction
    UTF8Bytes ISO88591Bytes
    RealString

  95. Value objects
    • Achieve higher levels of abstraction
    • Avoid misinterpretation
    • No silver bullet!
    • Comes with additional overhead

  96. Recap
    • Data
    • Functions
    • Interpretation and representation
    • String escaping
    • Misrepresentation and misinterpretation
    • The string type
    • Value objects

  97. The meaning of data is defined
    by how we interpret it

  98. Feedback & Questions
    @arnoutboks
    @arnoutboks
    @aboks
    Arnout Boks
    Please leave your feedback on joind.in:
    https://joind.in/talk/85813

  99. Image Credits
    • https://pixabay.com/en/reading-relaxation-glasses-sight-3088491/
    • https://fr.m.wikipedia.org/wiki/Fichier:DARPA_Big_Data.jpg
    • https://www.flickr.com/photos/elefevre/3936916711
    • https://www.flickr.com/photos/perspective/9045532603
    • https://www.flickr.com/photos/wwarby/11644168395/
    • https://www.flickr.com/photos/opengridscheduler/16480450157
    • https://www.flickr.com/photos/cogdog/14401469262
    • https://www.flickr.com/photos/paulsimpson1976/3998279762
    • https://www.flickr.com/photos/pewari/3499963407
