Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Database Abstractions and Where They Leak

Database Abstractions and Where They Leak

Doctrine DBAL is the most popular relational database abstraction library for PHP. Many developers prefer using a database abstraction since they usually provide a better developer experience than working directly with database drivers. However, despite the best effort of their designers, abstractions often make developers unhappy when it comes to using non-trivial database features.

Per Joel Spolsky[1], "all non-trivial abstractions, to some degree, are leaky". As a quite complex one, Doctrine DBAL leaks in a variety of aspects: drivers, SQL dialects, and schema management specifics.

In this talk, I will highlight the most notable inconsistencies of Doctrine DBAL APIs and provide recommendations on dealing with them.

[1] https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/

Sergei Morozov

May 16, 2023
Tweet

More Decks by Sergei Morozov

Other Decks in Programming

Transcript

  1. “ All non-trivial abstractions, to some degree, are leaky. –

    Joel Spolsky 2 “ Photo from Joel on Software
  2. My Journey With Database Abstractions 3 – Maintainer of  Doctrine

    DBAL since 2017 – Retired DBAL 2, released DBAL 3; hope to release DBAL 4 – User of and contributor to    Debezium
  3. 29% of functional issues are platform- or driver-specific 4 51%

    of integration tests pass on all four major platforms only
  4. 6 – No de facto standard database platform – Portability

    increases reach – Enterprise customers prefer enterprise platforms Code Portability Photo by Ketut Subiyanto on Pexels
  5. 7 function getMessageSubjects( $conn, $userId ) { $query = <<<'SQL'

    SELECT subject FROM messages WHERE user_id=? SQL; $stmt = $conn->prepare($query); $stmt->bind_param('i', $userId); $stmt->execute(); $stmt->bind_result($subject); $subjects = []; while ($stmt->fetch() === true) { $subjects[] = $subject; } return $subjects; } Concrete mysqli Code † † – Error handling omitted for brevity. Do not try this at home!
  6. Abstract Code 8 function getMessageSubjects($conn, $userId) { $query = <<<'SQL'

    SELECT subject FROM messages WHERE user_id=? SQL; return $conn->fetchFirstColumn($query, [$userId]); }
  7. Connection::lastInsertId() 13 (1/3) ✔ It works on MySQL (2/3) ✔

    It works on PostgreSQL (3/3) ✔ It works on SQLite You support open-source platforms:
  8. Connection::lastInsertId() 14 (1/2) ✘ It works with oci8 The ID

    is always FALSE (2/2) ✘ It works with pdo_oci SQLSTATE[IM001]: Driver does not support lastInsertId() Run tests on Oracle Database:
  9. Connection::lastInsertId() 16 (1/4) ✘ It works on MySQL (2/4) ✔

    It works on PostgreSQL (3/4) ✘ It works on SQLite (4/4) ✔ It works on Oracle Database Test on all platforms: Operation AbstractPlatform::getIdentitySequenceName() is not supported by platform.
  10. Connection::lastInsertId() 17 if ($platform->usesSequenceEmulatedIdentityColumns()) { $seqName = $platform ->getIdentitySequenceName('gift_cards', 'id');

    return $conn->lastInsertId($seqName); } else { return $conn->lastInsertId(); } Branch based on support for sequences:
  11. Connection::lastInsertId() 18 Works on all platforms! (1/4) ✔ It works

    on MySQL (2/4) ✔ It works on PostgreSQL (3/4) ✔ It works on SQLite (4/4) ✔ It works on Oracle Database
  12. 19 Photo by Andrea Piacquadio on Pexels Alice [2:40pm]: Hello!

    Thank you for using our product. How can I assist you? Bob [2:42pm]: I'm having trouble redeeming my gift card. I receive an error message saying, "The card has already been redeemed."
  13. Connection::lastInsertId() 20 → New POST request from User #1 →

    New POST request from User #2 ← Gift card #12345 created for User #1 ← Gift card #12345 created for User #2 → New POST request from User #3 ← Gift card #12347 created for User #3 Logs:
  14. Connection::lastInsertId() 24 Leaks: 1. Consuming code should be aware of

    the target platform 2. Platform-specific behavior Mitigations: a) Use custom code for Oracle Database b) Generate identifiers in the application (UUID, ULID), not in the database
  15. BLOB Is a Special Type INT (4 bytes) DATETIME (8

    bytes) UUID (16 bytes) BLOB (up to 4 GB … 128 TB) 26
  16. Storing BLOBs as Streams 27 $stream = fopen($_FILES['image']['tmp_name'], 'rb'); $connection->executeStatement(

    'INSERT INTO images (image) VALUES (?)', [$stream], [ParameterType::LARGE_OBJECT], );
  17. Retrieving BLOBs as Streams 28 $stream = $connection->fetchOne( 'SELECT image

    FROM images WHERE id = ?', [$id], ); fpassthru($stream);
  18. 29 Icon by Becris on Flaticon → GET /images/01H0E04TSJB28ZD6PT6BZ2Q7S6 ←

    HTTP 1.1 200 OK Content-Type: image/jpeg ... → POST /images Content-Type: image/jpeg ... ← HTTP 1.1 201 Created Location: /images/01H0E0DBKDM73ESYWBCQYYYFXC Medical Information Service
  19. Testing With Small Files 30 (1/4) ✔ It uploads on

    MySQL (2/4) ✔ It downloads on MySQL (3/4) ✔ It uploads on PostgreSQL (4/4) ✔ It downloads on PostgreSQL
  20. Testing With Large Files 31 (1/4) ✔ It uploads on

    MySQL (2/4) ✘ It downloads on MySQL (3/4) ✘ It uploads on PostgreSQL (4/4) ✘ It downloads on PostgreSQL Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 54 bytes) Photo by Umanoide on Unsplash
  21. DBAL Memory Usage Issues 32 Driver Write Overhead Read Overhead

    3x 0 Not supported mysqli pdo_mysql pgsql pdo_pgsql sqlsrv pdo_sqlsrv oci8 pdo_oci 0 4x 4x 1x 5x 1x 2x 3x 2x 0
  22. Memory Usage With mysqli 33 if (is_string($value)) { $fp =

    fopen('php://temp', 'rb+'); fwrite($fp, $value); fseek($fp, 0); $value = $fp; }
  23. 35 $stmt = oci_parse($conn, ‘INSERT INTO IMAGES (IMAGE) VALUES (EMPTY_BLOB())

    RETURNING IMAGE INTO :image’); oci_bind_by_name($stmt, ':image', $lob, -1, OCI_B_BLOB); oci_execute($stmt, OCI_NO_AUTO_COMMIT); while (!feof($stream)) { $buf = fread($fp, 8192); $lob->write($buf); } oci_commit($conn); Storing BLOBs With oci8
  24. 36 $oid = pg_lo_create($conn); $lo = pg_lo_open($conn, $oid, 'w'); $fp

    = fopen($file, 'rb'); while (!feof($fp)) { $buf = fread($fp, 8192); pg_lo_write($fd, $buf); } Storing BLOBs With pgsql – BYTEA ≠ BLOB – BYTEA ≈ VARBINARY – Large Objects API
  25. BLOB Columns 37 Leaks: 1. Platform-specific behavior 2. Hidden costs

    Mitigations: a) Implement an API for your use case b) Use a specialized object store
  26. Allows you to define data types in a platform-agnostic way.

    38 Type Abstraction Photo by Daan Mooij on Unsplash
  27. 39 → GET /ips?domain=amazon.com ← ["54.239.28.85", "205.251.242.103"] → GET /domains?ip=217.147.83.68

    ← ["newforestclinics.co.uk", "elmscreative.co.uk"] → GET /domains?mask=99.84.0.0/16 ← ["aws.amazon.com", ...] Domain Name Service Icon by Freepik on Flaticon
  28. class Record { #[ORM\Column( type: 'string', length: 255, )] private

    string $domain; #[ORM\Column( type: 'integer', options: ['unsigned' => true], )] private int $ip; } 42 Database Schema DNS icons on Flaticon Maps and converts 4-byte integer values “
  29. Testing on MySQL 43 (1/2) ✔ It maps yahoo.com to

    74.6.231.21 (2/2) ✔ It maps google.com to 142.251.46.238
  30. Testing on PostgreSQL 44 (1/2) ✔ It maps yahoo.com to

    74.6.231.21 (2/2) ✘ It maps google.com to 142.251.46.238 Numeric value out of range: 7 ERROR: value "2398826222" is out of range for type integer
  31. Support For Unsigned Integers 45 $ git grep --name-only "unsigned"

    \ -- src/Platforms src/Platforms/AbstractMySQLPlatform.php src/Platforms/SQLitePlatform.php
  32. Unsigned Integers 46 Leaks: 1. Platform-specific behavior 2. Input-specific behavior

    Mitigations: a) Use a type with larger capacity b) Convert unsigned to signed and back
  33. Type Conversion 47 function toSignedInt32($unsigned) { if ($unsigned < 0x80000000)

    { return $unsigned; } return $unsigned - 0x100000000; } function toUnsignedInt32($signed) { if ($signed >= 0) { return $signed; } return $signed + 0x100000000; }
  34. 49 Business Process Engine $table = new Table('rules'); $table->addColumn('id', 'integer');

    $table->addColumn('name', 'string'); $table->addColumn('ignore', 'boolean'); Icon by Freepik on Flaticon
  35. 50 Business Process Engine $connection->insert('rules', [ 'id' => 1, 'name'

    => 'Ignore me', 'ignore' => true, ]); $connection->insert('rules', [ 'id' => 2, 'name' => 'Do not ignore me', 'ignore' => false, ]); Icon by Freepik on Flaticon
  36. Testing on MySQL 52 (1/2) ✔ It deploys schema (2/2)

    ✘ It creates rules You have an error in your SQL syntax [...] near 'ignore = ?' at line 1
  37. Column Name Semantics 54 Schema Manager The name is a

    value Query Builder The name is an SQL expression Transmission icons on Flaticon
  38. Reserved Keywords: The summary 55 Leaks: 1. Platform-specific behavior 2.

    Use case-specific behavior Mitigations: a) Do not use reserved keywords b) Quote names manually (this will have side effects on Oracle and DB2) c) Use table and column prefixes
  39. AbstractPlatform Leaks by Design 56 prefersIdentityColumns() supportsAlterTable() supportsColumnCollation() supportsColumnLengthIndexes() supportsCommentOnStatement()

    supportsCreateDropDatabase() supportsForeignKeyConstraints() supportsGettingAffectedRows() supportsIdentityColumns() supportsIndexes() supportsInlineColumnComments() supportsLimitOffset() supportsPartialIndexes() supportsPrimaryConstraints() supportsReleaseSavepoints() supportsSavepoints() supportsSchemas() supportsSequences() supportsTransactions() supportsViews() See the Tell, Don’t Ask principle.
  40. – Avoid database-centric architectures – Implement business logic in the

    application – Consider views, triggers and stored procedures technical debt Treat Database as Storage 64
  41. Put Abstractions at the Proper Level CRUD Query Builder UPSERT

    MySQL PostgreSQL MySQL UPSERT Query Builder PgSQL UPSERT Query Builder Driver DBAL ✘ Your abstraction 65
  42. Do Not Mock Things You Do Not Own Business Logic

    IRepository DbalRepository DBAL Database 3rd party Do not mock this Mock this Write integration tests for this Write unit tests for this 66
  43. Credits Special thanks to all the people who made and

    released these awesome resources for free: ╺ Presentation template by SlidesCarnival 68