WSC Database Workshop - Speaker Deck

Slide 1

Slide 1 text

Modern Database Design

Slide 2

Slide 2 text

Getting To Know You Have you… • …worked with SQL databases? • …worked with PostgreSQL? • …worked with MongoDB?

Slide 3

Slide 3 text

Getting To Know You Do you… • …use Laravel? • …use Symfony? • …no framework? • …use Doctrine?

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

This Workshop • It uses some PHP • The main focus is databases • Database normalisation • Advanced PostgreSQL types and usage in PHP • Using MongoDB with PHP

Slide 6

Slide 6 text

The Setup Address Book Data { "name": "Andreas Braun", "email": [ { "label": "Private", "address": "[email protected]" }, { "label": "Work", "address": "[email protected]" } ], "phone": [ { "label": "Mobile", "number": "+491234567890" } ], "address": [ { "label": "Home", "line1": "...", "postCode": "...", "city": "...", "country": "..." } ] }

Slide 7

Slide 7 text

Exercise 1

Slide 8

Slide 8 text

Exercise 1 Design address book schema • Name (full name, optionally fi rst/last name) • Phone numbers • Email addresses • Addresses • Add labels (e.g. work, private) to data entries

Slide 9

Slide 9 text

Discuss Results

Slide 10

Slide 10 text

Database Normalisation

Slide 11

Slide 11 text

Database Normalisation Why normalise databases? • Reduce data redundancy • Improve data integrity • To use query languages like SQL • Codd, "A Relational Model of Data for Large Shared Data Banks", p. 381 • Avoid data anomalies • Minimise redesign when extending structure

Slide 12

Slide 12 text

Database Normalisation The World in 1970 • The Unix epoch started on Jan 1st • Intel released a DRAM chip with 1024 bit capacity • Apple and Microsoft did not exist, Linus Torvalds was just born (Dec 1969) • IBM released the System/370, a 32-bit mainframe

Slide 13

Slide 13 text

Source: IBM

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Not every normal form makes sense these days

Slide 16

Slide 16 text

Database Normalisation Reasons to not do it • Duplicate data to avoid joins to related tables • Group related, non-scalar data (e.g. embedded objects) • Store pre-aggregated data to avoid heavy loads

Slide 17

Slide 17 text

Your tools should help you, not hold you back!

Slide 18

Slide 18 text

Advanced Features

Slide 19

Slide 19 text

Advanced Features in PgSQL Check Constraints CREATE TABLE entry ( id UUID PRIMARY KEY, name TEXT CHECK (VALUE ~ '^.+ .+$') );

Slide 20

Slide 20 text

Advanced Features in PgSQL Domain Types CREATE DOMAIN phoneNumber AS TEXT CHECK (VALUE ~ '^[\d,\*#+ ]+$'); CREATE DOMAIN labelType AS VARCHAR(15) NOT NULL;

Slide 21

Slide 21 text

Advanced Features in PgSQL Composite Types CREATE TYPE emailType AS ( label labelType, address TEXT ); CREATE TABLE entry ( id UUID PRIMARY KEY, name TEXT NOT NULL, email emailType );

Slide 22

Slide 22 text

Advanced Features in PgSQL Composite Types (2) ROW('Home', '[email protected]') CAST(ROW('Home', '[email protected]') AS emailType) ROW('Home', '[email protected]')::emailType

Slide 23

Slide 23 text

Advanced Features in PgSQL Arrays CREATE TABLE entry ( id UUID PRIMARY KEY, name TEXT NOT NULL, emails TEXT[] );

Slide 24

Slide 24 text

Advanced Features in PgSQL Arrays (2) ARRAY['[email protected]', '[email protected]']

Slide 25

Slide 25 text

Advanced Features in PgSQL Composite Arrays CREATE TABLE entry ( id UUID PRIMARY KEY, name TEXT NOT NULL, emails emailType[] ); ARRAY[ ROW('Home', '[email protected]')::emailType, ROW('Work', '[email protected]')::emailType ]

Slide 26

Slide 26 text

What About JSON?

Slide 27

Slide 27 text

Exercise 2

Slide 28

Slide 28 text

Exercise 2a Use pgSQL features • Redesign your original database to fi t all data into a single table • Use composite types for embedded objects • Use arrays when storing lists of data • Create domain types and constraints if you’d like • Use sql fi ddle to build schema and insert a contact

Slide 29

Slide 29 text

Exercise 2b Write some queries • Find a record by email address • Add an email entry to a record • Remove an email entry from a record • Use UNNEST, ARRAY_APPEND, EXCEPT

Slide 30

Slide 30 text

Solutions

Slide 31

Slide 31 text

Exercise 2b Find a contact by email address SELECT contacts.* FROM contacts, UNNEST(emails) AS email WHERE email.address = '[email protected]';

Slide 32

Slide 32 text

Exercise 2b Add an email address UPDATE contacts SET emails = ARRAY_APPEND(emails, ROW('Other', '[email protected]')::emailType) WHERE id = 'fe40ef27-db32-46b6-9755-94220b67e935';

Slide 33

Slide 33 text

Exercise 2b Remove an email address UPDATE contacts SET emails = ARRAY( SELECT UNNEST(emails) EXCEPT ( SELECT ROW(label, address)::emailType FROM UNNEST(emails) as email WHERE email.address = '[email protected]' ) ) WHERE id = 'fe40ef27-db32-46b6-9755-94220b67e935';

Slide 34

Slide 34 text

What about PHP?

Slide 35

Slide 35 text

What about PHP? Inserting Data $sql = <<query($sql);

Slide 36

Slide 36 text

What about PHP? Reading Data $query = $connection->query('SELECT * FROM contacts'); $query->execute(); var_dump($query->fetch(PDO::FETCH_OBJ));

Slide 37

Slide 37 text

What about PHP? Oops… class stdClass#3 (5) { public $id => int(1) public $name => string(13) "Andreas Braun" public $email => string(27) "{"(Work,[email protected])"}" public $phone => string(26) "{"(Mobile,+491234567890)"}" public $address => string(56) "{"(Home,\"Street 1\",\"\",12345,\"Home City\",Germany)"}" }

Slide 38

Slide 38 text

What about PHP? What works with PDO? • Check constraints and domain types work • Arrays and composite types are returned as CSV • Row-based results don’t allow for complex structures • JSON can be a workaround (works with Doctrine)

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Let’s try MongoDB

Slide 41

Slide 41 text

pecl install mongodb

Slide 42

Slide 42 text

composer init —-name=wsc/address-book --require=mongodb/mongodb:^1.16 --require=ext-mongodb:^1.16 -n

Slide 43

Slide 43 text

mongodb+srv:// wsc:websummercamp@ wsc.jss3jsi.mongodb.net/? retryWrites=true&w=majority

Slide 44

Slide 44 text

MongoDB Some help to get started alcaeus->contacts; $collection->insertOne($data); var_dump($collection->findOne());

Slide 45

Slide 45 text

Exercise 3

Slide 46

Slide 46 text

Exercise 3 Write some queries in MongoDB • Find a record by email address • Add an email entry to a record • Remove an email entry from a record • Use $push and $pull

Slide 47

Slide 47 text

Exercise 3 Find a contact by email address $collection->findOne(['email.address' => '[email protected]']);

Slide 48

Slide 48 text

Exercise 3 Add an email address $collection->updateOne( ['name' => 'Andreas Braun'], ['$push' => [ 'email' => [ 'label' => 'Other', 'address' => '[email protected]', ], ]], );

Slide 49

Slide 49 text

Exercise 3 Remove an email address $collection->updateOne( ['name' => 'Andreas Braun'], ['$pull' => [ 'email' => ['address' => '[email protected]'], ]], );

Slide 50

Slide 50 text

Working With Larger Datasets

Slide 51

Slide 51 text

Working With Large Datasets From CSV… tankerkoenig-data/prices/2023/01 [master|✔] ❯❯❯ wc -l 2023-01-01-prices.csv 324618 2023-01-01-prices.csv

Slide 52

Slide 52 text

Working With Large Datasets From CSV… date,station_uuid,diesel,e5,e10,dieselchange,e5change,e10change 2023-01-01 00:00:10+01,4a6773b3-2c38-453c-812b-665be4ff6c3b,1.809,1.819,0.000,0,1,0 2023-01-01 00:00:10+01,24b93ce4-c8a6-4deb-836d-c061cb5b8654,1.839,1.729,1.669,1,0,0 2023-01-01 00:00:10+01,bce4b7c7-5f26-45bd-9906-c83388b24631,1.839,1.729,1.669,1,1,1 2023-01-01 00:00:10+01,a15722b3-7a77-49f9-a222-79c494391294,1.949,1.889,1.829,1,1,1 2023-01-01 00:00:10+01,790098a9-d408-4eaa-aef7-803399c062f2,1.809,1.689,1.629,1,1,1 2023-01-01 00:00:10+01,9542a4b5-86c5-4daf-a7a9-b3041622934c,1.819,1.669,1.609,1,1,1 2023-01-01 00:00:10+01,425a9715-1a1a-47ea-bdce-c6e4f0c3fc11,1.944,1.884,1.824,1,1,1 2023-01-01 00:00:10+01,5938437a-ea69-4513-9e78-e4bf52291ef1,1.819,1.669,1.609,1,1,1 2023-01-01 00:00:10+01,2ee78f2f-b4da-4bd9-9fe2-da7fc145b8d3,1.779,1.659,1.599,1,1,1 2023-01-01 00:00:10+01,dfc32f99-aa65-4fd4-b16e-a2cc53f9be15,1.949,1.889,1.829,1,1,1 ...

Slide 53

Slide 53 text

Working With Large Datasets …to a document { "date": "2023-01-01 00:00:10+01", "station_uuid": "4a6773b3-2c38-453c-812b-665be4ff6c3b", "diesel": "1.809", "e5": "1.819", "e10": "0.000", "dieselchange": "0", "e5change": "1", "e10change": "0" }

Slide 54

Slide 54 text

Working With Large Datasets …a messy document { "_id": "44e2bdb7-13e3-4156-8576-8326cdd20459", "name": "bft Tankstelle", "brand": "", "street": "Schellengasse ", "house_number": "53", "post_code": "36304", "city": "Alsfeld", "latitude": "50.7520089", "longitude": "9.2790394", "first_active": "1970-01-01 01:00:00+01" }

Slide 55

Slide 55 text

Working With Large Datasets Cleaning up the mess • Use sensible data types (everything is a string) • Structure data to accommodate queries • What do we want to query? • Price statistics per fuel type, post code, and brand • Let’s write some PHP • Or not…

Slide 56

Slide 56 text

Working With Large Datasets Cleaning up with PHP foreach ($collection->find() as $document) { $collection->updateOne( ['_id' => $document->_id], [ 'date' => new UTCDateTime(strtotime($document->date) * 1000), 'diesel' => (float) $document->diesel, 'dieselchange' => (bool) $document->dieselchange, 'e10' => (float) $document->e10, 'e10change' => (bool) $document->e10change, 'e5' => (float) $document->e5, 'e5change' => (bool) $document->e5change, ] ); }

Slide 57

Slide 57 text

Aggregate Is The New Find

Slide 58

Slide 58 text

Working With Large Datasets Using aggregation pipeline $cursor = $collection->aggregate([ [ '$addFields' => [ 'date' => ['$dateFromString' => ['dateString' => '$date']], 'diesel' => ['$toDouble' => '$diesel'], 'dieselchange' => ['$toBool' => ['$toInt' => '$dieselchange']], 'e10' => ['$toDouble' => '$e10'], 'e10change' => ['$toBool' => ['$toInt' => '$e10change']], 'e5' => ['$toDouble' => '$e5'], 'e5change' => ['$toBool' => ['$toInt' => '$e5change']] ]], ]);

Slide 59

Slide 59 text

Working With Large Datasets Remember this? SELECT request_mnth, round(avg(result), 2) AS number FROM (SELECT request_mnth, CAST(abs(division_result_one-division_result_two) AS decimal) AS result FROM (SELECT to_char(CAST(request_date AS date), 'YYYY-MM') AS request_mnth, distance_to_travel/monetary_cost AS division_result_one, sum(distance_to_travel) OVER (PARTITION BY to_char(CAST(request_date AS date), 'YYYY-MM')) / sum(monetary_cost) OVER (PARTITION BY to_char(CAST(request_date AS date), 'YYYY-MM')) AS division_result_two FROM uber_request_logs) a) b GROUP BY request_mnth

Slide 60

Slide 60 text

Working With Large Datasets Is your data prepared correctly? • Pretend it’s January 1st, 1 pm • You’re in Kissing, Germany (48.3 N, 10.983333 E) • Find the cheapest station selling E10 in a 5km radius