Slide 1

Slide 1 text

Thoughts About Normal and Abnormal Data Markus Holtermann @m_holtermann markusholtermann.eu

Slide 2

Slide 2 text

@m_holtermann I am Markus Holtermann ● Senior Software Engineer at LaterPay ● Django Core Developer

Slide 3

Slide 3 text

@m_holtermann How do we store our data?

Slide 4

Slide 4 text

@m_holtermann Files CC-BY-NC 2.0 by Tim Gee https://flic.kr/p/rZm63

Slide 5

Slide 5 text

@m_holtermann Document Stores CC-BY-SA 4.0 by Susan Gerbic https://commons.wikimedia.org/wiki/File%3AArchive_Room.JPG

Slide 6

Slide 6 text

@m_holtermann Copyright Geek Batman https://www.youtube.com/watch?v=gPDx_IwdYMY

Slide 7

Slide 7 text

@m_holtermann Name Home planet Gender Padmé Naboo Female Luke Tatooine Male Leia Alderaan, Naboo Female

Slide 8

Slide 8 text

@m_holtermann First Normal Form (1NF)

Slide 9

Slide 9 text

@m_holtermann PersonID Name Home planet Gender 1 Padmé Naboo Female 2 Luke Tatooine Male 3 Leia Alderaan Female 3 Leia Naboo Female

Slide 10

Slide 10 text

@m_holtermann PersonID Name Home planet Gender 3 Leia Alderaan Female 3 Leia Naboo Male Update Anomalies

Slide 11

Slide 11 text

@m_holtermann Second Normal Form (2NF)

Slide 12

Slide 12 text

@m_holtermann PersonID Name Home planet Gender 1 Padmé Naboo Female 2 Luke Tatooine Male 3 Leia Alderaan Female 3 Leia Naboo Female

Slide 13

Slide 13 text

@m_holtermann PersonID Planet Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo PersonID Name Gender 1 Padmé Female 2 Luke Male 3 Leia Female

Slide 14

Slide 14 text

@m_holtermann PersonID Planet Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo ??? Dagobah Insert Anomalies

Slide 15

Slide 15 text

@m_holtermann Deletion Anomalies PersonID Planet Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo PersonID Name Gender 1 Padmé Female 2 Luke Male 3 Leia Female

Slide 16

Slide 16 text

@m_holtermann Third Normal Form (3NF)

Slide 17

Slide 17 text

@m_holtermann PlanetID Name Water 10 Naboo 85% 11 Tatooine 1% 12 Alderaan 78% 13 Dagobah 88% PersonID Name Gender 1 Padmé Female 2 Luke Male 3 Leia Female PersonID PlanetID 1 10 2 11 3 10 3 12

Slide 18

Slide 18 text

@m_holtermann Database normalization is great!

Slide 19

Slide 19 text

@m_holtermann Always?

Slide 20

Slide 20 text

@m_holtermann Yet Another Wiki

Slide 21

Slide 21 text

@m_holtermann Page + PageID Name Slug Revision + RevisionID PageID Text Date Database Schema

Slide 22

Slide 22 text

@m_holtermann Task 1: Fetch a single page and its current revision

Slide 23

Slide 23 text

@m_holtermann Task 2: Fetch all page titles and the date of their current revision

Slide 24

Slide 24 text

Task 1: Fetch a single page SELECT * FROM page INNER JOIN revision ON page.page_id = revision.page_id WHERE page.slug = 'some-slug' ORDER BY revision.date DESC LIMIT 1;

Slide 25

Slide 25 text

Task 2: Fetch all pages SELECT page.name, last_revs.date FROM page INNER JOIN ( SELECT revision.page_id, MAX(revision.date) date FROM revision GROUP BY revision.page_id ) last_revs ON page.page_id = last_revs.page_id;

Slide 26

Slide 26 text

@m_holtermann Benchmark Environment ● Intel i7-6600U, 2.60GHz ● 8 GB Memory ● PostgreSQL 9.6.5 ● 10k pages, 6m revisions

Slide 27

Slide 27 text

@m_holtermann Task 1: Fetch a single page Concurrent queries 10 Pages per connection 1000 Queries per page 10 Queries total 100000

Slide 28

Slide 28 text

@m_holtermann Task 2: Fetch all pages Concurrent queries 1 Queries per connection 10 Queries total 10

Slide 29

Slide 29 text

@m_holtermann Task 1: Fetch a single page

Slide 30

Slide 30 text

@m_holtermann Task 2: Fetch all pages

Slide 31

Slide 31 text

@m_holtermann Rae Knowler https://speakerdeck.com/bellisk/unsafe-at-any-speed-pycon-uk-26th-october-2017

Slide 32

Slide 32 text

@m_holtermann Database Schema Page + PageID Name Slug LastRevision Revision + RevisionID PageID Text Date

Slide 33

Slide 33 text

Task 1: Fetch a single page SELECT * FROM page INNER JOIN revision ON page.last_revision_id = revision.revision_id WHERE page.slug = 'some-slug';

Slide 34

Slide 34 text

Task 2: Fetch all pages SELECT page.name, revision.date FROM page INNER JOIN revision ON page.last_revision_id = revision.revision_id;

Slide 35

Slide 35 text

@m_holtermann Task 1: Fetch a single page

Slide 36

Slide 36 text

@m_holtermann Task 2: Fetch all pages

Slide 37

Slide 37 text

@m_holtermann Conclusion

Slide 38

Slide 38 text

Thanks Markus Holtermann @m_holtermann markusholtermann.eu