WSC Database Workshop

Workshop held at Web Summer Camp Croatia

alcaeus

October 09, 2023

Transcript

  1. Getting To Know You
     Have you…
     • …worked with SQL databases?
     • …worked with PostgreSQL?
     • …worked with MongoDB?

  2. Getting To Know You
     Do you…
     • …use Laravel?
     • …use Symfony?
     • …use no framework?
     • …use Doctrine?

  3. This Workshop
     • It uses some PHP
     • The main focus is databases
     • Database normalisation
     • Advanced PostgreSQL types and usage in PHP
     • Using MongoDB with PHP

  4. The Setup: Address Book Data
     {
       "name": "Andreas Braun",
       "email": [
         { "label": "Private", "address": "[email protected]" },
         { "label": "Work", "address": "[email protected]" }
       ],
       "phone": [
         { "label": "Mobile", "number": "+491234567890" }
       ],
       "address": [
         { "label": "Home", "line1": "...", "postCode": "...", "city": "...", "country": "..." }
       ]
     }

  5. Exercise 1: Design an address book schema
     • Name (full name, optionally first/last name)
     • Phone numbers
     • Email addresses
     • Addresses
     • Add labels (e.g. work, private) to data entries

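     One possible normalised design for Exercise 1, as a sketch only (the table and column
     names below are assumptions, not taken from the slides): each kind of labelled contact
     detail goes into its own child table that references the contact.

     $connection->query(<<<SQL
     CREATE TABLE contacts (
         id         UUID PRIMARY KEY,
         full_name  TEXT NOT NULL,
         first_name TEXT,
         last_name  TEXT
     );

     CREATE TABLE contact_emails (
         id         UUID PRIMARY KEY,
         contact_id UUID NOT NULL REFERENCES contacts (id),
         label      VARCHAR(15) NOT NULL,
         address    TEXT NOT NULL
     );

     -- contact_phones and contact_addresses follow the same pattern:
     -- a label column plus a foreign key to contacts
     SQL);
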
  6. Database Normalisation: Why normalise databases?
     • Reduce data redundancy
     • Improve data integrity
     • To use query languages like SQL (Codd, "A Relational Model of Data for Large Shared Data Banks", p. 381)
     • Avoid data anomalies
     • Minimise redesign when extending the structure

  7. Database Normalisation: The World in 1970
     • The Unix epoch started on Jan 1st
     • Intel released a DRAM chip with 1024 bit capacity
     • Apple and Microsoft did not exist; Linus Torvalds was just born (Dec 1969)
     • IBM released the System/370, a 32-bit mainframe

  8. Database Normalisation: Reasons not to do it
     • Duplicate data to avoid joins to related tables
     • Group related, non-scalar data (e.g. embedded objects)
     • Store pre-aggregated data to avoid heavy loads

  9. Advanced Features in PgSQL: Check Constraints
     CREATE TABLE entry (
         id   UUID PRIMARY KEY,
         -- a column CHECK constraint references the column by name
         -- (VALUE is only valid in CREATE DOMAIN)
         name TEXT CHECK (name ~ '^.+ .+$')
     );

 10. Advanced Features in PgSQL: Domain Types
     CREATE DOMAIN phoneNumber AS TEXT
         CHECK (VALUE ~ '^[\d,\*#+ ]+$');

     CREATE DOMAIN labelType AS VARCHAR(15) NOT NULL;

 11. Advanced Features in PgSQL: Composite Types
     CREATE TYPE emailType AS (
         label   labelType,
         address TEXT
     );

     CREATE TABLE entry (
         id    UUID PRIMARY KEY,
         name  TEXT NOT NULL,
         email emailType
     );

 12. Advanced Features in PgSQL: Arrays
     CREATE TABLE entry (
         id     UUID PRIMARY KEY,
         name   TEXT NOT NULL,
         emails TEXT[]
     );

 13. Advanced Features in PgSQL: Composite Arrays
     CREATE TABLE entry (
         id     UUID PRIMARY KEY,
         name   TEXT NOT NULL,
         emails emailType[]
     );

     ARRAY[
         ROW('Home', '[email protected]')::emailType,
         ROW('Work', '[email protected]')::emailType
     ]

 14. Exercise 2a: Use PgSQL features
     • Redesign your original database to fit all data into a single table
     • Use composite types for embedded objects
     • Use arrays when storing lists of data
     • Create domain types and constraints if you’d like
     • Use SQL Fiddle to build the schema and insert a contact

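     A possible single-table answer to Exercise 2a, combining the composite types and arrays
     from the previous slides. This is only a sketch: the phoneType and addressType definitions
     are assumptions modelled on emailType and on the insert shown on slide 19.

     $connection->query(<<<SQL
     CREATE TYPE phoneType AS (
         label  labelType,
         number phoneNumber
     );

     CREATE TYPE addressType AS (
         label    labelType,
         line1    TEXT,
         line2    TEXT,
         postCode TEXT,
         city     TEXT,
         country  TEXT
     );

     CREATE TABLE contacts (
         id      UUID PRIMARY KEY,
         name    TEXT NOT NULL,
         email   emailType[],
         phone   phoneType[],
         address addressType[]
     );
     SQL);
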
 15. Exercise 2b: Write some queries
     • Find a record by email address
     • Add an email entry to a record
     • Remove an email entry from a record
     • Use UNNEST, ARRAY_APPEND, EXCEPT

 16. Exercise 2b: Find a contact by email address
     SELECT contacts.*
     FROM contacts, UNNEST(emails) AS email
     WHERE email.address = '[email protected]';

 17. Exercise 2b: Add an email address
     UPDATE contacts
     SET emails = ARRAY_APPEND(emails, ROW('Other', '[email protected]')::emailType)
     WHERE id = 'fe40ef27-db32-46b6-9755-94220b67e935';

 18. Exercise 2b: Remove an email address
     UPDATE contacts
     SET emails = ARRAY(
         SELECT UNNEST(emails)
         EXCEPT (
             SELECT ROW(label, address)::emailType
             FROM UNNEST(emails) AS email
             WHERE email.address = '[email protected]'
         )
     )
     WHERE id = 'fe40ef27-db32-46b6-9755-94220b67e935';

 19. What about PHP? Inserting Data
     $sql = <<<SQL
     INSERT INTO contacts (name, email, phone, address)
     VALUES (
         'Andreas Braun',
         ARRAY[ROW('Work', '[email protected]')::emailType],
         ARRAY[ROW('Mobile', '+491234567890')::phoneType],
         ARRAY[ROW('Home', 'Street 1', '', '12345', 'Home City', 'Germany')::addressType]
     )
     SQL;

     $connection->query($sql);

 20. What about PHP? Reading Data
     // PDO::query() executes the statement directly; no separate execute() call is needed
     $query = $connection->query('SELECT * FROM contacts');
     var_dump($query->fetch(PDO::FETCH_OBJ));

 21. What about PHP? Oops…
     class stdClass#3 (5) {
       public $id => int(1)
       public $name => string(13) "Andreas Braun"
       public $email => string(27) "{"(Work,[email protected])"}"
       public $phone => string(26) "{"(Mobile,+491234567890)"}"
       public $address => string(56) "{"(Home,\"Street 1\",\"\",12345,\"Home City\",Germany)"}"
     }

 22. What about PHP? What works with PDO?
     • Check constraints and domain types work
     • Arrays and composite types are returned as CSV-like strings
     • Row-based results don’t allow for complex structures
     • JSON can be a workaround (works with Doctrine)

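     One way the JSON workaround could look, as a sketch (the to_jsonb() calls and column
     aliases are assumptions, not part of the slides): let PostgreSQL serialise the composite
     arrays to JSON and decode them in PHP.

     $sql = <<<SQL
     SELECT id,
            name,
            to_jsonb(email)   AS email,
            to_jsonb(phone)   AS phone,
            to_jsonb(address) AS address
     FROM contacts
     SQL;

     $row = $connection->query($sql)->fetch(PDO::FETCH_OBJ);

     // json_decode() turns the JSON text into nested objects instead of CSV-like strings
     $emails = json_decode($row->email);
     var_dump($emails[0]->address ?? null);
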
 23. MongoDB: Some help to get started
     <?php

     require 'vendor/autoload.php';

     $url = '...';
     $client = new MongoDB\Client($url);

     $data = json_decode(file_get_contents(__DIR__ . '/data.json'));

     $collection = $client->alcaeus->contacts;
     $collection->insertOne($data);

     var_dump($collection->findOne());

 24. Exercise 3: Write some queries in MongoDB
     • Find a record by email address
     • Add an email entry to a record
     • Remove an email entry from a record
     • Use $push and $pull

 25. Exercise 3: Add an email address
     $collection->updateOne(
         ['name' => 'Andreas Braun'],
         ['$push' => [
             'email' => [
                 'label' => 'Other',
                 'address' => '[email protected]',
             ],
         ]],
     );

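     The remaining Exercise 3 queries are not shown on the slides; sketches of how they could
     look, based on the document structure from slide 4 (the filter values are placeholders):

     // Find a record by email address: match on a field inside the embedded email array
     $contact = $collection->findOne(['email.address' => '[email protected]']);

     // Remove an email entry: $pull deletes every array element matching the condition
     $collection->updateOne(
         ['name' => 'Andreas Braun'],
         ['$pull' => [
             'email' => ['address' => '[email protected]'],
         ]],
     );
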
 26. Working With Large Datasets: From CSV…
     date,station_uuid,diesel,e5,e10,dieselchange,e5change,e10change
     2023-01-01 00:00:10+01,4a6773b3-2c38-453c-812b-665be4ff6c3b,1.809,1.819,0.000,0,1,0
     2023-01-01 00:00:10+01,24b93ce4-c8a6-4deb-836d-c061cb5b8654,1.839,1.729,1.669,1,0,0
     2023-01-01 00:00:10+01,bce4b7c7-5f26-45bd-9906-c83388b24631,1.839,1.729,1.669,1,1,1
     2023-01-01 00:00:10+01,a15722b3-7a77-49f9-a222-79c494391294,1.949,1.889,1.829,1,1,1
     2023-01-01 00:00:10+01,790098a9-d408-4eaa-aef7-803399c062f2,1.809,1.689,1.629,1,1,1
     2023-01-01 00:00:10+01,9542a4b5-86c5-4daf-a7a9-b3041622934c,1.819,1.669,1.609,1,1,1
     2023-01-01 00:00:10+01,425a9715-1a1a-47ea-bdce-c6e4f0c3fc11,1.944,1.884,1.824,1,1,1
     2023-01-01 00:00:10+01,5938437a-ea69-4513-9e78-e4bf52291ef1,1.819,1.669,1.609,1,1,1
     2023-01-01 00:00:10+01,2ee78f2f-b4da-4bd9-9fe2-da7fc145b8d3,1.779,1.659,1.599,1,1,1
     2023-01-01 00:00:10+01,dfc32f99-aa65-4fd4-b16e-a2cc53f9be15,1.949,1.889,1.829,1,1,1
     ...

 27. Working With Large Datasets: …to a document
     {
       "date": "2023-01-01 00:00:10+01",
       "station_uuid": "4a6773b3-2c38-453c-812b-665be4ff6c3b",
       "diesel": "1.809",
       "e5": "1.819",
       "e10": "0.000",
       "dieselchange": "0",
       "e5change": "1",
       "e10change": "0"
     }

 28. Working With Large Datasets: …a messy document
     {
       "_id": "44e2bdb7-13e3-4156-8576-8326cdd20459",
       "name": "bft Tankstelle",
       "brand": "",
       "street": "Schellengasse ",
       "house_number": "53",
       "post_code": "36304",
       "city": "Alsfeld",
       "latitude": "50.7520089",
       "longitude": "9.2790394",
       "first_active": "1970-01-01 01:00:00+01"
     }

 29. Working With Large Datasets: Cleaning up the mess
     • Use sensible data types (everything is a string)
     • Structure data to accommodate queries
     • What do we want to query?
       • Price statistics per fuel type, post code, and brand
     • Let’s write some PHP
     • Or not…

 30. Working With Large Datasets: Cleaning up with PHP
     foreach ($collection->find() as $document) {
         $collection->updateOne(
             ['_id' => $document->_id],
             // updateOne() requires an update operator, hence the $set wrapper
             ['$set' => [
                 'date' => new MongoDB\BSON\UTCDateTime(strtotime($document->date) * 1000),
                 'diesel' => (float) $document->diesel,
                 'dieselchange' => (bool) $document->dieselchange,
                 'e10' => (float) $document->e10,
                 'e10change' => (bool) $document->e10change,
                 'e5' => (float) $document->e5,
                 'e5change' => (bool) $document->e5change,
             ]],
         );
     }

 31. Working With Large Datasets: Using aggregation pipeline
     $cursor = $collection->aggregate([
         ['$addFields' => [
             'date' => ['$dateFromString' => ['dateString' => '$date']],
             'diesel' => ['$toDouble' => '$diesel'],
             'dieselchange' => ['$toBool' => ['$toInt' => '$dieselchange']],
             'e10' => ['$toDouble' => '$e10'],
             'e10change' => ['$toBool' => ['$toInt' => '$e10change']],
             'e5' => ['$toDouble' => '$e5'],
             'e5change' => ['$toBool' => ['$toInt' => '$e5change']],
         ]],
     ]);

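     As written, the pipeline only returns converted documents through the cursor. To persist
     the cleanup, a $merge stage could be appended; the following is a sketch (the 'prices'
     collection name is an assumption, and merging back into the collection being aggregated
     requires MongoDB 4.4 or newer).

     $collection->aggregate([
         ['$addFields' => [
             // same conversions as above; only one is repeated here for brevity
             'date' => ['$dateFromString' => ['dateString' => '$date']],
         ]],
         // $merge writes the converted documents back, replacing each matched document by _id
         ['$merge' => [
             'into' => 'prices',
             'on' => '_id',
             'whenMatched' => 'replace',
             'whenNotMatched' => 'discard',
         ]],
     ]);
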
 32. Working With Large Datasets: Remember this?
     SELECT request_mnth, round(avg(result), 2) AS number
     FROM (
         SELECT request_mnth,
                CAST(abs(division_result_one - division_result_two) AS decimal) AS result
         FROM (
             SELECT to_char(CAST(request_date AS date), 'YYYY-MM') AS request_mnth,
                    distance_to_travel / monetary_cost AS division_result_one,
                    sum(distance_to_travel) OVER (PARTITION BY to_char(CAST(request_date AS date), 'YYYY-MM'))
                        / sum(monetary_cost) OVER (PARTITION BY to_char(CAST(request_date AS date), 'YYYY-MM')) AS division_result_two
             FROM uber_request_logs
         ) a
     ) b
     GROUP BY request_mnth

 33. Working With Large Datasets: Is your data prepared correctly?
     • Pretend it’s January 1st, 1 pm
     • You’re in Kissing, Germany (48.3 N, 10.983333 E)
     • Find the cheapest station selling E10 in a 5 km radius

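     One way this final exercise could be approached, as a rough sketch only. It assumes the
     station documents have been given a GeoJSON 'location' field with a 2dsphere index, that
     the converted prices live in a 'prices' collection keyed by station_uuid, and that
     $stations is a handle to the station collection; none of this is defined on the slides.

     $result = $stations->aggregate([
         // $geoNear must be the first stage; distances are in metres for GeoJSON points
         ['$geoNear' => [
             'near' => ['type' => 'Point', 'coordinates' => [10.983333, 48.3]],
             'distanceField' => 'distance',
             'maxDistance' => 5000,
         ]],
         // pull in the most recent E10 price reported before 1 pm for each station
         ['$lookup' => [
             'from' => 'prices',
             'let' => ['stationId' => '$_id'],
             'pipeline' => [
                 ['$match' => ['$expr' => ['$and' => [
                     ['$eq' => ['$station_uuid', '$$stationId']],
                     ['$lte' => ['$date', new MongoDB\BSON\UTCDateTime(strtotime('2023-01-01 13:00:00+01') * 1000)]],
                     ['$gt' => ['$e10', 0]],
                 ]]]],
                 ['$sort' => ['date' => -1]],
                 ['$limit' => 1],
             ],
             'as' => 'latestPrice',
         ]],
         ['$unwind' => '$latestPrice'],
         // cheapest E10 price wins
         ['$sort' => ['latestPrice.e10' => 1]],
         ['$limit' => 1],
     ]);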