Slide 1

Slide 1 text

Database Abstractions and Where They Leak Sergei Morozov Longhorn PHP 2023 morozov

Slide 2

Slide 2 text

“ All non-trivial abstractions, to some degree, are leaky. – Joel Spolsky 2 “ Photo from Joel on Software

Slide 3

Slide 3 text

My Journey With Database Abstractions 3 – Enterprise application developer – Maintainer of  Doctrine DBAL since 2017 – Retired DBAL 2, released DBAL 3 and DBAL 4 RC1 – User of and contributor to         Debezium

Slide 4

Slide 4 text

29% of functional issues and pull-requests are platform- or driver-specific 4 51% of integration tests pass on all four major platforms only

Slide 5

Slide 5 text

5 Why Use Database Abstractions

Slide 6

Slide 6 text

6 – No de facto standard database platform – Portability increases reach – Enterprise customers prefer enterprise platforms Code Portability Photo by Ketut Subiyanto on Pexels

Slide 7

Slide 7 text

7 function getGroupUsernames( $conn, $groupId ) { $query = <<<'SQL' SELECT username FROM users WHERE group_id=? SQL; $stmt = $conn->prepare($query); $stmt->bind_param('i', $groupId); $stmt->execute(); $stmt->bind_result($username); $usernames = []; while ($stmt->fetch() === true) { $usernames[] = $username; } return $usernames; } Concrete mysqli Code † † – Error handling omitted for brevity. Do not try this at home!

Slide 8

Slide 8 text

Abstract Code 8 function getGroupUsernames($conn, $groupId) { $query = <<<'SQL' SELECT username FROM users WHERE group_id=? SQL; return $conn->fetchFirstColumn($query, [$groupId]); }

Slide 9

Slide 9 text

9 So, Where Are the Leaks?

Slide 10

Slide 10 text

Drivers allows the application to talk to the database. 10 Driver Abstraction

Slide 11

Slide 11 text

11 POST /gift-cards insert() 1 row affected lastInsertId() 12345 Location: /gift-cards/12345 Connection::lastInsertId()

Slide 12

Slide 12 text

Connection::lastInsertId() 12 $conn->insert('gift_cards', [ 'code' => generateCode(), ]); return $conn->lastInsertId();

Slide 13

Slide 13 text

Connection::lastInsertId() 13 (1/3) ✔ It works on MySQL (2/3) ✔ It works on PostgreSQL (3/3) ✔ It works on SQLite You support open-source platforms:

Slide 14

Slide 14 text

Connection::lastInsertId() 14 (1/2) ✘ It works with oci8 The ID is always FALSE (2/2) ✘ It works with pdo_oci SQLSTATE[IM001]: Driver does not support lastInsertId() Run tests on Oracle Database:

Slide 15

Slide 15 text

Connection::lastInsertId() 15 $seqName = $platform ->getIdentitySequenceName('gift_cards', 'id'); return $conn->lastInsertId($seqName); Pass the sequence name:

Slide 16

Slide 16 text

Connection::lastInsertId() 16 (1/4) ✘ It works on MySQL (2/4) ✔ It works on PostgreSQL (3/4) ✘ It works on SQLite (4/4) ✔ It works on Oracle Database Test on all platforms: Operation AbstractPlatform::getIdentitySequenceName() is not supported by platform.

Slide 17

Slide 17 text

Connection::lastInsertId() 17 if ($platform->usesSequenceEmulatedIdentityColumns()) { $seqName = $platform ->getIdentitySequenceName('gift_cards', 'id'); return $conn->lastInsertId($seqName); } else { return $conn->lastInsertId(); } Branch based on support for sequences:

Slide 18

Slide 18 text

Connection::lastInsertId() 18 Works on all platforms! (1/4) ✔ It works on MySQL (2/4) ✔ It works on PostgreSQL (3/4) ✔ It works on SQLite (4/4) ✔ It works on Oracle Database

Slide 19

Slide 19 text

19 Photo by Andrea Piacquadio on Pexels Alice [2:40pm]: Hello! Thank you for using our product. How can I assist you? Bob [2:42pm]: I'm having trouble redeeming my gift card. I receive an error message saying, "The card has already been redeemed."

Slide 20

Slide 20 text

Connection::lastInsertId() 20 → New POST request from User #1 → New POST request from User #2 ← Gift card #12345 created for User #1 ← Gift card #12345 created for User #2 → New POST request from User #3 ← Gift card #12347 created for User #3 Logs:

Slide 21

Slide 21 text

Connection::lastInsertId() 21 SELECT GIFT_CARDS_SEQ.CURRVAL FROM DUAL How it works internally:

Slide 22

Slide 22 text

Connection::lastInsertId() 22 INSERT INTO GIFT_CARDS VALUES (NULL, :CODE) RETURNING ID INTO :ID How it should work instead:

Slide 23

Slide 23 text

Connection::lastInsertId() 23 if ($platform instanceof OraclePlatform) { return createViaLastInsertId($conn); } else { return createViaReturningInto($conn); } Resulting code:

Slide 24

Slide 24 text

Connection::lastInsertId() 24 Leaks: 1. Hidden violations of the API contract 2. Consumers should be aware of the target platform Mitigations: a) Use custom code for Oracle Database b) Generate identifiers in the application (e.g. UUIDv7), not in the database

Slide 25

Slide 25 text

BLOB Columns 25

Slide 26

Slide 26 text

BLOB Is a Special Type INT (4 bytes) DATETIME (8 bytes) UUID (16 bytes) BLOB (up to 4 GB … 128 TB) 26

Slide 27

Slide 27 text

Storing BLOBs as Streams 27 $stream = fopen($_FILES['image']['tmp_name'], 'rb'); $connection->executeStatement( 'INSERT INTO images (image) VALUES (?)', [$stream], [ParameterType::LARGE_OBJECT], );

Slide 28

Slide 28 text

Retrieving BLOBs as Streams 28 $stream = $connection->fetchOne( 'SELECT image FROM images WHERE id = ?', [$id], ); fpassthru($stream);

Slide 29

Slide 29 text

29 Icon by Becris on Flaticon → GET /images/01H0E04TSJB28ZD6PT6BZ2Q7S6 ← HTTP 1.1 200 OK Content-Type: image/jpeg ... → POST /images Content-Type: image/jpeg ... ← HTTP 1.1 201 Created Location: /images/01H0E0DBKDM73ESYWBCQYYYFXC Medical Information Service

Slide 30

Slide 30 text

Testing With Small Files 30 (1/4) ✔ It uploads on MySQL (2/4) ✔ It downloads on MySQL (3/4) ✔ It uploads on PostgreSQL (4/4) ✔ It downloads on PostgreSQL

Slide 31

Slide 31 text

Testing With Large Files 31 (1/4) ✔ It uploads on MySQL (2/4) ✘ It downloads on MySQL (3/4) ✘ It uploads on PostgreSQL (4/4) ✘ It downloads on PostgreSQL Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 54 bytes) Photo by Umanoide on Unsplash

Slide 32

Slide 32 text

DBAL Memory Usage Issues 32 Driver Write Overhead Read Overhead 3x 0 Not supported mysqli pdo_mysql pgsql pdo_pgsql sqlsrv pdo_sqlsrv oci8 pdo_oci 0 4x 4x 1x 5x 1x 2x 3x 2x 0

Slide 33

Slide 33 text

Memory Usage With mysqli 33 if (is_string($value)) { $fp = fopen('php://temp', 'rb+'); fwrite($fp, $value); fseek($fp, 0); $value = $fp; }

Slide 34

Slide 34 text

34 $stream = sqlsrv_get_field( $statement, 0, SQLSRV_PHPTYPE_STREAM(SQLSRV_ENC_BINARY) ); Retrieving BLOBs With sqlsrv

Slide 35

Slide 35 text

35 $stmt = oci_parse($conn, ‘INSERT INTO IMAGES (IMAGE) VALUES (EMPTY_BLOB()) RETURNING IMAGE INTO :image’); oci_bind_by_name($stmt, ':image', $lob, -1, OCI_B_BLOB); oci_execute($stmt, OCI_NO_AUTO_COMMIT); while (!feof($stream)) { $buf = fread($fp, 8192); $lob->write($buf); } oci_commit($conn); Storing BLOBs With oci8

Slide 36

Slide 36 text

36 $oid = pg_lo_create($conn); $lo = pg_lo_open($conn, $oid, 'w'); $fp = fopen($file, 'rb'); while (!feof($fp)) { $buf = fread($fp, 8192); pg_lo_write($fd, $buf); } Storing BLOBs With pgsql – BYTEA ≠ BLOB – BYTEA ≈ VARBINARY – Large Objects API

Slide 37

Slide 37 text

BLOB Columns 37 Leaks: 1. Platform-specific behavior 2. Hidden costs Mitigations: a) Implement an API for your use case b) Use a specialized object store

Slide 38

Slide 38 text

Allows you to define data types in a platform-agnostic way. 38 Type Abstraction Photo by Daan Mooij on Unsplash

Slide 39

Slide 39 text

39 → GET /ips?domain=amazon.com ← ["54.239.28.85", "205.251.242.103"] → GET /domains?ip=217.147.83.68 ← ["newforestclinics.co.uk", "elmscreative.co.uk"] → GET /domains?mask=99.84.0.0/16 ← ["aws.amazon.com", ...] Domain Name Service Icon by Freepik on Flaticon

Slide 40

Slide 40 text

IP Address Conversion 40 "50.19.225.50" 840_163_634 ip2long() long2ip() unsigned 32-bit int

Slide 41

Slide 41 text

Signed and Unsigned Variants 41

Slide 42

Slide 42 text

class Record { #[ORM\Column( type: 'string', length: 255, )] private string $domain; #[ORM\Column( type: 'integer', options: ['unsigned' => true], )] private int $ip; } 42 Database Schema DNS icons on Flaticon Maps and converts 4-byte integer values “

Slide 43

Slide 43 text

Testing on MySQL 43 (1/2) ✔ It maps yahoo.com to 74.6.231.21 (2/2) ✔ It maps google.com to 142.251.46.238

Slide 44

Slide 44 text

Testing on PostgreSQL 44 (1/2) ✔ It maps yahoo.com to 74.6.231.21 (2/2) ✘ It maps google.com to 142.251.46.238 Numeric value out of range: 7 ERROR: value "2398826222" is out of range for type integer

Slide 45

Slide 45 text

Support For Unsigned Integers 45 $ git grep --name-only "unsigned" \ -- src/Platforms src/Platforms/AbstractMySQLPlatform.php src/Platforms/SQLitePlatform.php

Slide 46

Slide 46 text

Unsigned Integers 46 Leaks: 1. Platform-specific behavior 2. Input-specific behavior Mitigations: a) Use a type with larger capacity b) Convert unsigned to signed and back

Slide 47

Slide 47 text

Type Conversion 47 function toSignedInt32($unsigned) { if ($unsigned < 0x80000000) { return $unsigned; } return $unsigned - 0x100000000; } function toUnsignedInt32($signed) { if ($signed >= 0) { return $signed; } return $signed + 0x100000000; }

Slide 48

Slide 48 text

Allow you to define schema and build queries as objects. 48 Schema & Query Abstractions

Slide 49

Slide 49 text

49 Business Process Engine $table = new Table('rules'); $table->addColumn('id', 'integer'); $table->addColumn('name', 'string'); $table->addColumn('ignore', 'boolean'); Icon by Freepik on Flaticon

Slide 50

Slide 50 text

50 Business Process Engine $connection->insert('rules', [ 'id' => 1, 'name' => 'Ignore me', 'ignore' => true, ]); $connection->insert('rules', [ 'id' => 2, 'name' => 'Do not ignore me', 'ignore' => false, ]); Icon by Freepik on Flaticon

Slide 51

Slide 51 text

Testing on PostgreSQL 51 (1/2) ✔ It deploys schema (2/2) ✔ It creates rules

Slide 52

Slide 52 text

Testing on MySQL 52 (1/2) ✔ It deploys schema (2/2) ✘ It creates rules You have an error in your SQL syntax [...] near 'ignore = ?' at line 1

Slide 53

Slide 53 text

INSERT IGNORE on MySQL 53 INSERT IGNORE INTO subscribers(email) VALUES ('john@example.com'), ('jane@example.com');

Slide 54

Slide 54 text

Column Name Semantics 54 Schema Manager The name is a value Query Builder The name is an SQL expression Transmission icons on Flaticon

Slide 55

Slide 55 text

Reserved Keywords: The summary 55 Leaks: 1. Platform-specific behavior 2. Use case-specific behavior Mitigations: a) Do not use reserved keywords b) Quote names manually (this will have side effects on Oracle and DB2) c) Use table and column prefixes

Slide 56

Slide 56 text

AbstractPlatform Leaks by Design 56 prefersIdentityColumns() supportsAlterTable() supportsColumnCollation() supportsColumnLengthIndexes() supportsCommentOnStatement() supportsCreateDropDatabase() supportsForeignKeyConstraints() supportsGettingAffectedRows() supportsIdentityColumns() supportsIndexes() supportsInlineColumnComments() supportsLimitOffset() supportsPartialIndexes() supportsPrimaryConstraints() supportsReleaseSavepoints() supportsSavepoints() supportsSchemas() supportsSequences() supportsTransactions() supportsViews() See the Tell, Don’t Ask principle.

Slide 57

Slide 57 text

57 Icon icons8.com

Slide 58

Slide 58 text

Doctrine DBAL is popular despite its API design flaws Yin Yang Vectors by Vecteezy 58

Slide 59

Slide 59 text

Doctrine DBAL API is flawed because it is popular Yin Yang Vectors by Vecteezy 59

Slide 60

Slide 60 text

General-Purpose DBAL 60 Photo by Patrick on Unsplash

Slide 61

Slide 61 text

61 Specialized DBAL “Home wifi upgrade” by Lars Plougmann (CC BY 2.0)

Slide 62

Slide 62 text

62 “Swiss Army Knife” by Dave Taylor (CC BY 2.0) Popular General-Purpose DBAL

Slide 63

Slide 63 text

How Can We Mitigate Leaks? 63

Slide 64

Slide 64 text

– Avoid database-centric architectures – Implement business logic in the application – Consider views, triggers and stored procedures technical debt Treat Database as Storage 64

Slide 65

Slide 65 text

Put Abstractions at the Proper Level CRUD Query Builder UPSERT MySQL PostgreSQL MySQL UPSERT Query Builder PgSQL UPSERT Query Builder Driver DBAL ✘ Your abstraction 65

Slide 66

Slide 66 text

Do Not Mock Things You Do Not Own Business Logic IRepository DbalRepository DBAL Database 3rd party Do not mock this Mock this… Write integration tests for this Write unit tests for this 66 FakeRepository …or fake it

Slide 67

Slide 67 text

joind.in/talk/cc304 Leave your feedback on Joind.in: Thank You! 67

Slide 68

Slide 68 text

Credits Special thanks to all the people who made and released these awesome resources for free: ╺ Presentation template by SlidesCarnival 68