$30 off During Our Annual Pro Sale. View Details »

Thoughts About Normal and Abnormal Data (PyCon UK 2017)

Markus H
October 27, 2017

Thoughts About Normal and Abnormal Data (PyCon UK 2017)

Markus H

October 27, 2017
Tweet

More Decks by Markus H

Other Decks in Technology

Transcript

  1. Thoughts About Normal and
    Abnormal Data
    Markus Holtermann
    @m_holtermann
    markusholtermann.eu

    View Slide

  2. @m_holtermann
    I am
    Markus Holtermann
    ● Senior Software Engineer at LaterPay
    ● Django Core Developer

    View Slide

  3. @m_holtermann
    How do we store our data?

    View Slide

  4. @m_holtermann
    Files
    CC-BY-NC 2.0 by Tim Gee
    https://flic.kr/p/rZm63

    View Slide

  5. @m_holtermann
    Document
    Stores CC-BY-SA 4.0 by Susan Gerbic
    https://commons.wikimedia.org/wiki/File%3AArchive_Room.JPG

    View Slide

  6. @m_holtermann
    Copyright Geek Batman
    https://www.youtube.com/watch?v=gPDx_IwdYMY

    View Slide

  7. @m_holtermann
    Name Home planet Gender
    Padmé Naboo Female
    Luke Tatooine Male
    Leia Alderaan, Naboo Female

    View Slide

  8. @m_holtermann
    First Normal Form
    (1NF)

    View Slide

  9. @m_holtermann
    PersonID Name Home planet Gender
    1 Padmé Naboo Female
    2 Luke Tatooine Male
    3 Leia Alderaan Female
    3 Leia Naboo Female

    View Slide

  10. @m_holtermann
    PersonID Name Home planet Gender
    3 Leia Alderaan Female
    3 Leia Naboo Male
    Update Anomalies

    View Slide

  11. @m_holtermann
    Second Normal Form
    (2NF)

    View Slide

  12. @m_holtermann
    PersonID Name Home planet Gender
    1 Padmé Naboo Female
    2 Luke Tatooine Male
    3 Leia Alderaan Female
    3 Leia Naboo Female

    View Slide

  13. @m_holtermann
    PersonID Planet Name
    1 Naboo
    2 Tatooine
    3 Alderaan
    3 Naboo
    PersonID Name Gender
    1 Padmé Female
    2 Luke Male
    3 Leia Female

    View Slide

  14. @m_holtermann
    PersonID Planet Name
    1 Naboo
    2 Tatooine
    3 Alderaan
    3 Naboo
    ??? Dagobah
    Insert Anomalies

    View Slide

  15. @m_holtermann
    Deletion Anomalies
    PersonID Planet Name
    1 Naboo
    2 Tatooine
    3 Alderaan
    3 Naboo
    PersonID Name Gender
    1 Padmé Female
    2 Luke Male
    3 Leia Female

    View Slide

  16. @m_holtermann
    Third Normal Form
    (3NF)

    View Slide

  17. @m_holtermann
    PlanetID Name Water
    10 Naboo 85%
    11 Tatooine 1%
    12 Alderaan 78%
    13 Dagobah 88%
    PersonID Name Gender
    1 Padmé Female
    2 Luke Male
    3 Leia Female
    PersonID PlanetID
    1 10
    2 11
    3 10
    3 12

    View Slide

  18. @m_holtermann
    Database normalization is
    great!

    View Slide

  19. @m_holtermann
    Always?

    View Slide

  20. @m_holtermann
    Yet Another Wiki

    View Slide

  21. @m_holtermann
    Page
    + PageID
    Name
    Slug
    Revision
    + RevisionID
    PageID
    Text
    Date
    Database Schema

    View Slide

  22. @m_holtermann
    Task 1: Fetch a single page
    and its current revision

    View Slide

  23. @m_holtermann
    Task 2: Fetch all page titles
    and
    the date of their current
    revision

    View Slide

  24. Task 1: Fetch a single page
    SELECT *
    FROM page
    INNER JOIN revision
    ON page.page_id = revision.page_id
    WHERE page.slug = 'some-slug'
    ORDER BY revision.date DESC
    LIMIT 1;

    View Slide

  25. Task 2: Fetch all pages
    SELECT page.name, last_revs.date
    FROM page
    INNER JOIN (
    SELECT revision.page_id, MAX(revision.date) date
    FROM revision
    GROUP BY revision.page_id
    ) last_revs
    ON page.page_id = last_revs.page_id;

    View Slide

  26. @m_holtermann
    Benchmark Environment
    ● Intel i7-6600U, 2.60GHz
    ● 8 GB Memory
    ● PostgreSQL 9.6.5
    ● 10k pages, 6m revisions

    View Slide

  27. @m_holtermann
    Task 1: Fetch a single page
    Concurrent queries 10
    Pages per connection 1000
    Queries per page 10
    Queries total 100000

    View Slide

  28. @m_holtermann
    Task 2: Fetch all pages
    Concurrent queries 1
    Queries per connection 10
    Queries total 10

    View Slide

  29. @m_holtermann
    Task 1: Fetch a single page

    View Slide

  30. @m_holtermann
    Task 2: Fetch all pages

    View Slide

  31. @m_holtermann
    Rae Knowler
    https://speakerdeck.com/bellisk/unsafe-at-any-speed-pycon-uk-26th-october-2017

    View Slide

  32. @m_holtermann
    Database Schema
    Page
    + PageID
    Name
    Slug
    LastRevision
    Revision
    + RevisionID
    PageID
    Text
    Date

    View Slide

  33. Task 1: Fetch a single page
    SELECT *
    FROM page
    INNER JOIN revision
    ON page.last_revision_id = revision.revision_id
    WHERE page.slug = 'some-slug';

    View Slide

  34. Task 2: Fetch all pages
    SELECT page.name, revision.date
    FROM page
    INNER JOIN revision
    ON page.last_revision_id = revision.revision_id;

    View Slide

  35. @m_holtermann
    Task 1: Fetch a single page

    View Slide

  36. @m_holtermann
    Task 2: Fetch all pages

    View Slide

  37. @m_holtermann
    Conclusion

    View Slide

  38. Thanks
    Markus Holtermann
    @m_holtermann
    markusholtermann.eu

    View Slide