Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Longhorn PHP 2023: Database Abstractions and Where They Leak

Sergei Morozov
November 03, 2023
36

Longhorn PHP 2023: Database Abstractions and Where They Leak

Doctrine DBAL is the most popular relational database abstraction library for PHP. Many developers prefer using a database abstraction since they usually provide a better developer experience than working directly with database drivers. However, despite the best effort of their designers, abstractions often make developers unhappy when it comes to using non-trivial database features.

Per Joel Spolsky[1], "all non-trivial abstractions, to some degree, are leaky". As a quite complex one, Doctrine DBAL leaks in a variety of aspects: drivers, SQL dialects, and schema management specifics.

In this talk, I will highlight the most notable inconsistencies of Doctrine DBAL APIs and provide recommendations on dealing with them.

[1] https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/

Sergei Morozov

November 03, 2023
Tweet

Transcript

  1. Database Abstractions
    and Where They Leak
    Sergei Morozov
    Longhorn PHP 2023 morozov

    View full-size slide

  2. “ All non-trivial abstractions,
    to some degree, are leaky.
    – Joel Spolsky
    2

    Photo from Joel on Software

    View full-size slide

  3. My Journey With
    Database Abstractions
    3
    – Enterprise application developer
    – Maintainer of  Doctrine DBAL since 2017
    – Retired DBAL 2, released DBAL 3
    and DBAL 4 RC1
    – User of and contributor to         Debezium

    View full-size slide

  4. 29%
    of functional issues and
    pull-requests are platform- or
    driver-specific
    4
    51%
    of integration tests pass on
    all four major platforms
    only

    View full-size slide

  5. 5
    Why Use
    Database
    Abstractions

    View full-size slide

  6. 6
    – No de facto standard
    database platform
    – Portability increases reach
    – Enterprise customers
    prefer enterprise platforms
    Code Portability
    Photo by Ketut Subiyanto on Pexels

    View full-size slide

  7. 7
    function getGroupUsernames(
    $conn, $groupId
    ) {
    $query = <<<'SQL'
    SELECT username
    FROM users
    WHERE group_id=?
    SQL;
    $stmt = $conn->prepare($query);
    $stmt->bind_param('i', $groupId);
    $stmt->execute();
    $stmt->bind_result($username);
    $usernames = [];
    while ($stmt->fetch() === true) {
    $usernames[] = $username;
    }
    return $usernames;
    }
    Concrete mysqli Code †
    † – Error handling omitted for brevity. Do not try this at home!

    View full-size slide

  8. Abstract Code
    8
    function getGroupUsernames($conn, $groupId) {
    $query = <<<'SQL'
    SELECT username FROM users WHERE group_id=?
    SQL;
    return $conn->fetchFirstColumn($query, [$groupId]);
    }

    View full-size slide

  9. 9
    So,
    Where
    Are the
    Leaks?

    View full-size slide

  10. Drivers allows the application to talk to the database.
    10
    Driver
    Abstraction

    View full-size slide

  11. 11
    POST /gift-cards insert()
    1 row affected
    lastInsertId()
    12345
    Location: /gift-cards/12345
    Connection::lastInsertId()

    View full-size slide

  12. Connection::lastInsertId()
    12
    $conn->insert('gift_cards', [
    'code' => generateCode(),
    ]);
    return $conn->lastInsertId();

    View full-size slide

  13. Connection::lastInsertId()
    13
    (1/3) ✔ It works on MySQL
    (2/3) ✔ It works on PostgreSQL
    (3/3) ✔ It works on SQLite
    You support open-source platforms:

    View full-size slide

  14. Connection::lastInsertId()
    14
    (1/2) ✘ It works with oci8
    The ID is always FALSE
    (2/2) ✘ It works with pdo_oci
    SQLSTATE[IM001]: Driver does
    not support lastInsertId()
    Run tests on Oracle Database:

    View full-size slide

  15. Connection::lastInsertId()
    15
    $seqName = $platform
    ->getIdentitySequenceName('gift_cards', 'id');
    return $conn->lastInsertId($seqName);
    Pass the sequence name:

    View full-size slide

  16. Connection::lastInsertId()
    16
    (1/4) ✘ It works on MySQL
    (2/4) ✔ It works on PostgreSQL
    (3/4) ✘ It works on SQLite
    (4/4) ✔ It works on Oracle Database
    Test on all platforms:
    Operation AbstractPlatform::getIdentitySequenceName()
    is not supported by platform.

    View full-size slide

  17. Connection::lastInsertId()
    17
    if ($platform->usesSequenceEmulatedIdentityColumns()) {
    $seqName = $platform
    ->getIdentitySequenceName('gift_cards', 'id');
    return $conn->lastInsertId($seqName);
    } else {
    return $conn->lastInsertId();
    }
    Branch based on support for sequences:

    View full-size slide

  18. Connection::lastInsertId()
    18
    Works on all platforms!
    (1/4) ✔ It works on MySQL
    (2/4) ✔ It works on PostgreSQL
    (3/4) ✔ It works on SQLite
    (4/4) ✔ It works on Oracle Database

    View full-size slide

  19. 19
    Photo by Andrea Piacquadio on Pexels
    Alice [2:40pm]:
    Hello! Thank you for using our product.
    How can I assist you?
    Bob [2:42pm]:
    I'm having trouble redeeming my gift
    card. I receive an error message saying,
    "The card has already been redeemed."

    View full-size slide

  20. Connection::lastInsertId()
    20
    → New POST request from User #1
    → New POST request from User #2
    ← Gift card #12345 created for User #1
    ← Gift card #12345 created for User #2
    → New POST request from User #3
    ← Gift card #12347 created for User #3
    Logs:

    View full-size slide

  21. Connection::lastInsertId()
    21
    SELECT GIFT_CARDS_SEQ.CURRVAL
    FROM DUAL
    How it works internally:

    View full-size slide

  22. Connection::lastInsertId()
    22
    INSERT INTO GIFT_CARDS
    VALUES (NULL, :CODE)
    RETURNING ID INTO :ID
    How it should work instead:

    View full-size slide

  23. Connection::lastInsertId()
    23
    if ($platform instanceof OraclePlatform) {
    return createViaLastInsertId($conn);
    } else {
    return createViaReturningInto($conn);
    }
    Resulting code:

    View full-size slide

  24. Connection::lastInsertId()
    24
    Leaks:
    1. Hidden violations of the API contract
    2. Consumers should be aware of the
    target platform
    Mitigations:
    a) Use custom code for Oracle Database
    b) Generate identifiers in the application
    (e.g. UUIDv7), not in the database

    View full-size slide

  25. BLOB Columns
    25

    View full-size slide

  26. BLOB Is a Special Type
    INT (4 bytes)
    DATETIME (8 bytes)
    UUID (16 bytes)
    BLOB (up to 4 GB … 128 TB)
    26

    View full-size slide

  27. Storing BLOBs as Streams
    27
    $stream = fopen($_FILES['image']['tmp_name'], 'rb');
    $connection->executeStatement(
    'INSERT INTO images (image) VALUES (?)',
    [$stream],
    [ParameterType::LARGE_OBJECT],
    );

    View full-size slide

  28. Retrieving BLOBs as Streams
    28
    $stream = $connection->fetchOne(
    'SELECT image FROM images WHERE id = ?',
    [$id],
    );
    fpassthru($stream);

    View full-size slide

  29. 29
    Icon by Becris on Flaticon
    → GET /images/01H0E04TSJB28ZD6PT6BZ2Q7S6
    ← HTTP 1.1 200 OK
    Content-Type: image/jpeg
    ...
    → POST /images
    Content-Type: image/jpeg
    ...
    ← HTTP 1.1 201 Created
    Location: /images/01H0E0DBKDM73ESYWBCQYYYFXC
    Medical Information Service

    View full-size slide

  30. Testing With Small Files
    30
    (1/4) ✔ It uploads on MySQL
    (2/4) ✔ It downloads on MySQL
    (3/4) ✔ It uploads on PostgreSQL
    (4/4) ✔ It downloads on PostgreSQL

    View full-size slide

  31. Testing With Large Files
    31
    (1/4) ✔ It uploads on MySQL
    (2/4) ✘ It downloads on MySQL
    (3/4) ✘ It uploads on PostgreSQL
    (4/4) ✘ It downloads on PostgreSQL
    Fatal error: Allowed memory size of 134217728 bytes exhausted
    (tried to allocate 54 bytes)
    Photo by Umanoide on Unsplash

    View full-size slide

  32. DBAL Memory Usage Issues
    32
    Driver Write Overhead Read Overhead
    3x
    0
    Not supported
    mysqli
    pdo_mysql
    pgsql
    pdo_pgsql
    sqlsrv
    pdo_sqlsrv
    oci8
    pdo_oci
    0
    4x
    4x
    1x
    5x
    1x
    2x
    3x
    2x
    0

    View full-size slide

  33. Memory Usage With mysqli
    33
    if (is_string($value)) {
    $fp = fopen('php://temp', 'rb+');
    fwrite($fp, $value);
    fseek($fp, 0);
    $value = $fp;
    }

    View full-size slide

  34. 34
    $stream = sqlsrv_get_field(
    $statement,
    0,
    SQLSRV_PHPTYPE_STREAM(SQLSRV_ENC_BINARY)
    );
    Retrieving BLOBs With sqlsrv

    View full-size slide

  35. 35
    $stmt = oci_parse($conn, ‘INSERT INTO IMAGES (IMAGE)
    VALUES (EMPTY_BLOB()) RETURNING IMAGE INTO :image’);
    oci_bind_by_name($stmt, ':image', $lob, -1,
    OCI_B_BLOB);
    oci_execute($stmt, OCI_NO_AUTO_COMMIT);
    while (!feof($stream)) {
    $buf = fread($fp, 8192);
    $lob->write($buf);
    }
    oci_commit($conn);
    Storing BLOBs With oci8

    View full-size slide

  36. 36
    $oid = pg_lo_create($conn);
    $lo = pg_lo_open($conn, $oid, 'w');
    $fp = fopen($file, 'rb');
    while (!feof($fp)) {
    $buf = fread($fp, 8192);
    pg_lo_write($fd, $buf);
    }
    Storing BLOBs With pgsql
    – BYTEA ≠ BLOB
    – BYTEA ≈ VARBINARY
    – Large Objects API

    View full-size slide

  37. BLOB Columns
    37

    Leaks:
    1. Platform-specific behavior
    2. Hidden costs
    Mitigations:
    a) Implement an API for your use case
    b) Use a specialized object store

    View full-size slide

  38. Allows you to define data types in a platform-agnostic way.
    38
    Type
    Abstraction
    Photo by Daan Mooij on Unsplash

    View full-size slide

  39. 39
    → GET /ips?domain=amazon.com
    ← ["54.239.28.85", "205.251.242.103"]
    → GET /domains?ip=217.147.83.68
    ← ["newforestclinics.co.uk",
    "elmscreative.co.uk"]
    → GET /domains?mask=99.84.0.0/16
    ← ["aws.amazon.com", ...]
    Domain Name Service
    Icon by Freepik on Flaticon

    View full-size slide

  40. IP Address Conversion
    40
    "50.19.225.50"
    840_163_634
    ip2long()
    long2ip()
    unsigned
    32-bit int

    View full-size slide

  41. Signed and Unsigned Variants
    41

    View full-size slide

  42. class Record
    {
    #[ORM\Column(
    type: 'string',
    length: 255,
    )]
    private string $domain;
    #[ORM\Column(
    type: 'integer',
    options: ['unsigned' => true],
    )]
    private int $ip;
    }
    42
    Database
    Schema
    DNS icons on Flaticon
    Maps and converts
    4-byte integer values

    View full-size slide

  43. Testing on MySQL
    43
    (1/2) ✔ It maps yahoo.com
    to 74.6.231.21
    (2/2) ✔ It maps google.com
    to 142.251.46.238

    View full-size slide

  44. Testing on PostgreSQL
    44
    (1/2) ✔ It maps yahoo.com
    to 74.6.231.21
    (2/2) ✘ It maps google.com
    to 142.251.46.238
    Numeric value out of range: 7 ERROR: value "2398826222" is out
    of range for type integer

    View full-size slide

  45. Support For Unsigned Integers
    45
    $ git grep --name-only "unsigned" \
    -- src/Platforms
    src/Platforms/AbstractMySQLPlatform.php
    src/Platforms/SQLitePlatform.php

    View full-size slide

  46. Unsigned Integers
    46

    Leaks:
    1. Platform-specific behavior
    2. Input-specific behavior
    Mitigations:
    a) Use a type with larger capacity
    b) Convert unsigned to signed and back

    View full-size slide

  47. Type Conversion
    47
    function toSignedInt32($unsigned) {
    if ($unsigned < 0x80000000) {
    return $unsigned;
    }
    return $unsigned - 0x100000000;
    }
    function toUnsignedInt32($signed) {
    if ($signed >= 0) {
    return $signed;
    }
    return $signed + 0x100000000;
    }

    View full-size slide

  48. Allow you to define schema and build queries as objects.
    48
    Schema & Query
    Abstractions

    View full-size slide

  49. 49
    Business Process Engine
    $table = new Table('rules');
    $table->addColumn('id', 'integer');
    $table->addColumn('name', 'string');
    $table->addColumn('ignore', 'boolean');
    Icon by Freepik on Flaticon

    View full-size slide

  50. 50
    Business Process Engine
    $connection->insert('rules', [
    'id' => 1,
    'name' => 'Ignore me',
    'ignore' => true,
    ]);
    $connection->insert('rules', [
    'id' => 2,
    'name' => 'Do not ignore me',
    'ignore' => false,
    ]);
    Icon by Freepik on Flaticon

    View full-size slide

  51. Testing on PostgreSQL
    51
    (1/2) ✔ It deploys schema
    (2/2) ✔ It creates rules

    View full-size slide

  52. Testing on MySQL
    52
    (1/2) ✔ It deploys schema
    (2/2) ✘ It creates rules
    You have an error in your SQL syntax [...] near 'ignore = ?' at line 1

    View full-size slide

  53. INSERT IGNORE on MySQL
    53
    INSERT IGNORE INTO
    subscribers(email)
    VALUES
    ('[email protected]'),
    ('[email protected]');

    View full-size slide

  54. Column Name Semantics
    54
    Schema Manager
    The name is a value
    Query Builder
    The name is an SQL expression
    Transmission icons on Flaticon

    View full-size slide

  55. Reserved Keywords: The summary
    55

    Leaks:
    1. Platform-specific behavior
    2. Use case-specific behavior
    Mitigations:
    a) Do not use reserved keywords
    b) Quote names manually (this will have
    side effects on Oracle and DB2)
    c) Use table and column prefixes

    View full-size slide

  56. AbstractPlatform Leaks by Design
    56
    prefersIdentityColumns()
    supportsAlterTable()
    supportsColumnCollation()
    supportsColumnLengthIndexes()
    supportsCommentOnStatement()
    supportsCreateDropDatabase()
    supportsForeignKeyConstraints()
    supportsGettingAffectedRows()
    supportsIdentityColumns()
    supportsIndexes()
    supportsInlineColumnComments()
    supportsLimitOffset()
    supportsPartialIndexes()
    supportsPrimaryConstraints()
    supportsReleaseSavepoints()
    supportsSavepoints()
    supportsSchemas()
    supportsSequences()
    supportsTransactions()
    supportsViews()
    See the Tell, Don’t Ask principle.

    View full-size slide

  57. 57
    Icon icons8.com

    View full-size slide

  58. Doctrine DBAL is
    popular despite its
    API design flaws
    Yin Yang Vectors by Vecteezy
    58

    View full-size slide

  59. Doctrine DBAL API is
    flawed because it is
    popular
    Yin Yang Vectors by Vecteezy
    59

    View full-size slide

  60. General-Purpose DBAL
    60
    Photo by Patrick on Unsplash

    View full-size slide

  61. 61
    Specialized DBAL
    “Home wifi upgrade” by Lars Plougmann (CC BY 2.0)

    View full-size slide

  62. 62
    “Swiss Army Knife” by Dave Taylor (CC BY 2.0)
    Popular General-Purpose DBAL

    View full-size slide

  63. How Can
    We Mitigate
    Leaks?
    63

    View full-size slide

  64. – Avoid database-centric
    architectures
    – Implement business logic in the
    application
    – Consider views, triggers and stored
    procedures technical debt
    Treat Database as Storage
    64

    View full-size slide

  65. Put Abstractions at the Proper Level
    CRUD
    Query Builder
    UPSERT
    MySQL PostgreSQL
    MySQL UPSERT
    Query Builder
    PgSQL UPSERT
    Query Builder
    Driver
    DBAL

    Your abstraction
    65

    View full-size slide

  66. Do Not Mock Things
    You Do Not Own
    Business Logic
    IRepository
    DbalRepository
    DBAL
    Database
    3rd party
    Do not mock this
    Mock this…
    Write integration
    tests for this
    Write unit
    tests for this
    66
    FakeRepository
    …or fake it

    View full-size slide

  67. joind.in/talk/cc304
    Leave your feedback on Joind.in:
    Thank You!
    67

    View full-size slide

  68. Credits
    Special thanks to all the people who made and released these
    awesome resources for free:
    ╺ Presentation template by SlidesCarnival
    68

    View full-size slide