WSC Database Workshop

Workshop held at Web Summer Camp Croatia

alcaeus

October 09, 2023

Transcript

  1. Getting To Know You
     Have you…
     • …worked with SQL databases?
     • …worked with PostgreSQL?
     • …worked with MongoDB?

  2. Getting To Know You
     Do you…
     • …use Laravel?
     • …use Symfony?
     • …use no framework?
     • …use Doctrine?

  3. This Workshop
     • It uses some PHP
     • The main focus is databases
     • Database normalisation
     • Advanced PostgreSQL types and usage in PHP
     • Using MongoDB with PHP

  4. The Setup: Address Book Data
     {
       "name": "Andreas Braun",
       "email": [
         { "label": "Private", "address": "[email protected]" },
         { "label": "Work", "address": "[email protected]" }
       ],
       "phone": [
         { "label": "Mobile", "number": "+491234567890" }
       ],
       "address": [
         { "label": "Home", "line1": "...", "postCode": "...", "city": "...", "country": "..." }
       ]
     }

  5. Exercise 1: Design an address book schema
     • Name (full name, optionally first/last name)
     • Phone numbers
     • Email addresses
     • Addresses
     • Add labels (e.g. work, private) to data entries

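     One possible normalised design for Exercise 1, as a sketch only (the table and column
     names below are assumptions, not taken from the slides): each kind of labelled contact
     detail goes into its own child table that references the contact.

     $connection->query(<<<SQL
     CREATE TABLE contacts (
         id         UUID PRIMARY KEY,
         full_name  TEXT NOT NULL,
         first_name TEXT,
         last_name  TEXT
     );

     CREATE TABLE contact_emails (
         id         UUID PRIMARY KEY,
         contact_id UUID NOT NULL REFERENCES contacts (id),
         label      VARCHAR(15) NOT NULL,
         address    TEXT NOT NULL
     );

     -- contact_phones and contact_addresses follow the same pattern:
     -- a label column plus a foreign key to contacts
     SQL);
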
  6. Database Normalisation: Why normalise databases?
     • Reduce data redundancy
     • Improve data integrity
     • To use query languages like SQL (Codd, "A Relational Model of Data for Large Shared Data Banks", p. 381)
     • Avoid data anomalies
     • Minimise redesign when extending the structure

  7. Database Normalisation: The World in 1970
     • The Unix epoch started on Jan 1st
     • Intel released a DRAM chip with 1024 bit capacity
     • Apple and Microsoft did not exist; Linus Torvalds was just born (Dec 1969)
     • IBM released the System/370, a 32-bit mainframe

  8. Database Normalisation: Reasons not to do it
     • Duplicate data to avoid joins to related tables
     • Group related, non-scalar data (e.g. embedded objects)
     • Store pre-aggregated data to avoid heavy loads

  9. Advanced Features in PgSQL: Check Constraints
     CREATE TABLE entry (
         id   UUID PRIMARY KEY,
         -- a column CHECK constraint references the column by name
         -- (VALUE is only valid in CREATE DOMAIN)
         name TEXT CHECK (name ~ '^.+ .+$')
     );

 10. Advanced Features in PgSQL: Domain Types
     CREATE DOMAIN phoneNumber AS TEXT
         CHECK (VALUE ~ '^[\d,\*#+ ]+$');

     CREATE DOMAIN labelType AS VARCHAR(15) NOT NULL;

 11. Advanced Features in PgSQL: Composite Types
     CREATE TYPE emailType AS (
         label   labelType,
         address TEXT
     );

     CREATE TABLE entry (
         id    UUID PRIMARY KEY,
         name  TEXT NOT NULL,
         email emailType
     );

 12. Advanced Features in PgSQL: Arrays
     CREATE TABLE entry (
         id     UUID PRIMARY KEY,
         name   TEXT NOT NULL,
         emails TEXT[]
     );

 13. Advanced Features in PgSQL: Composite Arrays
     CREATE TABLE entry (
         id     UUID PRIMARY KEY,
         name   TEXT NOT NULL,
         emails emailType[]
     );

     ARRAY[
         ROW('Home', '[email protected]')::emailType,
         ROW('Work', '[email protected]')::emailType
     ]

 14. Exercise 2a: Use PgSQL features
     • Redesign your original database to fit all data into a single table
     • Use composite types for embedded objects
     • Use arrays when storing lists of data
     • Create domain types and constraints if you’d like
     • Use SQL Fiddle to build the schema and insert a contact

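     A possible single-table answer to Exercise 2a, combining the composite types and arrays
     from the previous slides. This is only a sketch: the phoneType and addressType definitions
     are assumptions modelled on emailType and on the insert shown on slide 19.

     $connection->query(<<<SQL
     CREATE TYPE phoneType AS (
         label  labelType,
         number phoneNumber
     );

     CREATE TYPE addressType AS (
         label    labelType,
         line1    TEXT,
         line2    TEXT,
         postCode TEXT,
         city     TEXT,
         country  TEXT
     );

     CREATE TABLE contacts (
         id      UUID PRIMARY KEY,
         name    TEXT NOT NULL,
         email   emailType[],
         phone   phoneType[],
         address addressType[]
     );
     SQL);
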
 15. Exercise 2b: Write some queries
     • Find a record by email address
     • Add an email entry to a record
     • Remove an email entry from a record
     • Use UNNEST, ARRAY_APPEND, EXCEPT

 16. Exercise 2b: Find a contact by email address
     SELECT contacts.*
     FROM contacts, UNNEST(emails) AS email
     WHERE email.address = '[email protected]';

 17. Exercise 2b: Add an email address
     UPDATE contacts
     SET emails = ARRAY_APPEND(emails, ROW('Other', '[email protected]')::emailType)
     WHERE id = 'fe40ef27-db32-46b6-9755-94220b67e935';

 18. Exercise 2b: Remove an email address
     UPDATE contacts
     SET emails = ARRAY(
         SELECT UNNEST(emails)
         EXCEPT (
             SELECT ROW(label, address)::emailType
             FROM UNNEST(emails) AS email
             WHERE email.address = '[email protected]'
         )
     )
     WHERE id = 'fe40ef27-db32-46b6-9755-94220b67e935';

 19. What about PHP? Inserting Data
     $sql = <<<SQL
     INSERT INTO contacts (name, email, phone, address)
     VALUES (
         'Andreas Braun',
         ARRAY[ROW('Work', '[email protected]')::emailType],
         ARRAY[ROW('Mobile', '+491234567890')::phoneType],
         ARRAY[ROW('Home', 'Street 1', '', '12345', 'Home City', 'Germany')::addressType]
     )
     SQL;

     $connection->query($sql);

 20. What about PHP? Reading Data
     // PDO::query() executes the statement directly; no separate execute() call is needed
     $query = $connection->query('SELECT * FROM contacts');
     var_dump($query->fetch(PDO::FETCH_OBJ));

 21. What about PHP? Oops…
     class stdClass#3 (5) {
       public $id => int(1)
       public $name => string(13) "Andreas Braun"
       public $email => string(27) "{"(Work,[email protected])"}"
       public $phone => string(26) "{"(Mobile,+491234567890)"}"
       public $address => string(56) "{"(Home,\"Street 1\",\"\",12345,\"Home City\",Germany)"}"
     }

 22. What about PHP? What works with PDO?
     • Check constraints and domain types work
     • Arrays and composite types are returned as CSV-like strings
     • Row-based results don’t allow for complex structures
     • JSON can be a workaround (works with Doctrine)

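     One way the JSON workaround could look, as a sketch (the to_jsonb() calls and column
     aliases are assumptions, not part of the slides): let PostgreSQL serialise the composite
     arrays to JSON and decode them in PHP.

     $sql = <<<SQL
     SELECT id,
            name,
            to_jsonb(email)   AS email,
            to_jsonb(phone)   AS phone,
            to_jsonb(address) AS address
     FROM contacts
     SQL;

     $row = $connection->query($sql)->fetch(PDO::FETCH_OBJ);

     // json_decode() turns the JSON text into nested objects instead of CSV-like strings
     $emails = json_decode($row->email);
     var_dump($emails[0]->address ?? null);
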
 23. MongoDB: Some help to get started
     <?php

     require 'vendor/autoload.php';

     $url = '...';
     $client = new MongoDB\Client($url);

     $data = json_decode(file_get_contents(__DIR__ . '/data.json'));

     $collection = $client->alcaeus->contacts;
     $collection->insertOne($data);

     var_dump($collection->findOne());

 24. Exercise 3: Write some queries in MongoDB
     • Find a record by email address
     • Add an email entry to a record
     • Remove an email entry from a record
     • Use $push and $pull

 25. Exercise 3: Add an email address
     $collection->updateOne(
         ['name' => 'Andreas Braun'],
         ['$push' => [
             'email' => [
                 'label' => 'Other',
                 'address' => '[email protected]',
             ],
         ]],
     );

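     The remaining Exercise 3 queries are not shown on the slides; sketches of how they could
     look, based on the document structure from slide 4 (the filter values are placeholders):

     // Find a record by email address: match on a field inside the embedded email array
     $contact = $collection->findOne(['email.address' => '[email protected]']);

     // Remove an email entry: $pull deletes every array element matching the condition
     $collection->updateOne(
         ['name' => 'Andreas Braun'],
         ['$pull' => [
             'email' => ['address' => '[email protected]'],
         ]],
     );
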
 26. Working With Large Datasets: From CSV…
     date,station_uuid,diesel,e5,e10,dieselchange,e5change,e10change
     2023-01-01 00:00:10+01,4a6773b3-2c38-453c-812b-665be4ff6c3b,1.809,1.819,0.000,0,1,0
     2023-01-01 00:00:10+01,24b93ce4-c8a6-4deb-836d-c061cb5b8654,1.839,1.729,1.669,1,0,0
     2023-01-01 00:00:10+01,bce4b7c7-5f26-45bd-9906-c83388b24631,1.839,1.729,1.669,1,1,1
     2023-01-01 00:00:10+01,a15722b3-7a77-49f9-a222-79c494391294,1.949,1.889,1.829,1,1,1
     2023-01-01 00:00:10+01,790098a9-d408-4eaa-aef7-803399c062f2,1.809,1.689,1.629,1,1,1
     2023-01-01 00:00:10+01,9542a4b5-86c5-4daf-a7a9-b3041622934c,1.819,1.669,1.609,1,1,1
     2023-01-01 00:00:10+01,425a9715-1a1a-47ea-bdce-c6e4f0c3fc11,1.944,1.884,1.824,1,1,1
     2023-01-01 00:00:10+01,5938437a-ea69-4513-9e78-e4bf52291ef1,1.819,1.669,1.609,1,1,1
     2023-01-01 00:00:10+01,2ee78f2f-b4da-4bd9-9fe2-da7fc145b8d3,1.779,1.659,1.599,1,1,1
     2023-01-01 00:00:10+01,dfc32f99-aa65-4fd4-b16e-a2cc53f9be15,1.949,1.889,1.829,1,1,1
     ...

 27. Working With Large Datasets: …to a document
     {
       "date": "2023-01-01 00:00:10+01",
       "station_uuid": "4a6773b3-2c38-453c-812b-665be4ff6c3b",
       "diesel": "1.809",
       "e5": "1.819",
       "e10": "0.000",
       "dieselchange": "0",
       "e5change": "1",
       "e10change": "0"
     }

 28. Working With Large Datasets: …a messy document
     {
       "_id": "44e2bdb7-13e3-4156-8576-8326cdd20459",
       "name": "bft Tankstelle",
       "brand": "",
       "street": "Schellengasse ",
       "house_number": "53",
       "post_code": "36304",
       "city": "Alsfeld",
       "latitude": "50.7520089",
       "longitude": "9.2790394",
       "first_active": "1970-01-01 01:00:00+01"
     }

 29. Working With Large Datasets: Cleaning up the mess
     • Use sensible data types (everything is a string)
     • Structure data to accommodate queries
     • What do we want to query?
       • Price statistics per fuel type, post code, and brand
     • Let’s write some PHP
     • Or not…

 30. Working With Large Datasets: Cleaning up with PHP
     foreach ($collection->find() as $document) {
         $collection->updateOne(
             ['_id' => $document->_id],
             // updateOne() requires an update operator, hence the $set wrapper
             ['$set' => [
                 'date' => new MongoDB\BSON\UTCDateTime(strtotime($document->date) * 1000),
                 'diesel' => (float) $document->diesel,
                 'dieselchange' => (bool) $document->dieselchange,
                 'e10' => (float) $document->e10,
                 'e10change' => (bool) $document->e10change,
                 'e5' => (float) $document->e5,
                 'e5change' => (bool) $document->e5change,
             ]],
         );
     }

 31. Working With Large Datasets: Using aggregation pipeline
     $cursor = $collection->aggregate([
         ['$addFields' => [
             'date' => ['$dateFromString' => ['dateString' => '$date']],
             'diesel' => ['$toDouble' => '$diesel'],
             'dieselchange' => ['$toBool' => ['$toInt' => '$dieselchange']],
             'e10' => ['$toDouble' => '$e10'],
             'e10change' => ['$toBool' => ['$toInt' => '$e10change']],
             'e5' => ['$toDouble' => '$e5'],
             'e5change' => ['$toBool' => ['$toInt' => '$e5change']],
         ]],
     ]);

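     As written, the pipeline only returns converted documents through the cursor. To persist
     the cleanup, a $merge stage could be appended; the following is a sketch (the 'prices'
     collection name is an assumption, and merging back into the collection being aggregated
     requires MongoDB 4.4 or newer).

     $collection->aggregate([
         ['$addFields' => [
             // same conversions as above; only one is repeated here for brevity
             'date' => ['$dateFromString' => ['dateString' => '$date']],
         ]],
         // $merge writes the converted documents back, replacing each matched document by _id
         ['$merge' => [
             'into' => 'prices',
             'on' => '_id',
             'whenMatched' => 'replace',
             'whenNotMatched' => 'discard',
         ]],
     ]);
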
 32. Working With Large Datasets: Remember this?
     SELECT request_mnth, round(avg(result), 2) AS number
     FROM (
         SELECT request_mnth,
                CAST(abs(division_result_one - division_result_two) AS decimal) AS result
         FROM (
             SELECT to_char(CAST(request_date AS date), 'YYYY-MM') AS request_mnth,
                    distance_to_travel / monetary_cost AS division_result_one,
                    sum(distance_to_travel) OVER (PARTITION BY to_char(CAST(request_date AS date), 'YYYY-MM'))
                        / sum(monetary_cost) OVER (PARTITION BY to_char(CAST(request_date AS date), 'YYYY-MM')) AS division_result_two
             FROM uber_request_logs
         ) a
     ) b
     GROUP BY request_mnth

 33. Working With Large Datasets: Is your data prepared correctly?
     • Pretend it’s January 1st, 1 pm
     • You’re in Kissing, Germany (48.3 N, 10.983333 E)
     • Find the cheapest station selling E10 in a 5 km radius

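     One way this final exercise could be approached, as a rough sketch only. It assumes the
     station documents have been given a GeoJSON 'location' field with a 2dsphere index, that
     the converted prices live in a 'prices' collection keyed by station_uuid, and that
     $stations is a handle to the station collection; none of this is defined on the slides.

     $result = $stations->aggregate([
         // $geoNear must be the first stage; distances are in metres for GeoJSON points
         ['$geoNear' => [
             'near' => ['type' => 'Point', 'coordinates' => [10.983333, 48.3]],
             'distanceField' => 'distance',
             'maxDistance' => 5000,
         ]],
         // pull in the most recent E10 price reported before 1 pm for each station
         ['$lookup' => [
             'from' => 'prices',
             'let' => ['stationId' => '$_id'],
             'pipeline' => [
                 ['$match' => ['$expr' => ['$and' => [
                     ['$eq' => ['$station_uuid', '$$stationId']],
                     ['$lte' => ['$date', new MongoDB\BSON\UTCDateTime(strtotime('2023-01-01 13:00:00+01') * 1000)]],
                     ['$gt' => ['$e10', 0]],
                 ]]]],
                 ['$sort' => ['date' => -1]],
                 ['$limit' => 1],
             ],
             'as' => 'latestPrice',
         ]],
         ['$unwind' => '$latestPrice'],
         // cheapest E10 price wins
         ['$sort' => ['latestPrice.e10' => 1]],
         ['$limit' => 1],
     ]);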