Upgrade to Pro — share decks privately, control downloads, hide ads and more …

You Should(n't) Normalize Your Database

Markus H
October 18, 2014

You Should(n't) Normalize Your Database

Markus H

October 18, 2014
Tweet

More Decks by Markus H

Other Decks in Technology

Transcript

  1. YOU SHOULD(N'T) NORMALIZE YOUR DATABASE / Markus Holtermann @m_holtermann

  2. ABOUT ME Markus Holtermann M.Sc. student of Computer Science @

    TU Berlin EuroPython 2014 local organizer Part of the server team ubuntuusers.de
  3. HOW DO WE STORE OUR DATA? Files Relational Database Document

    Store / NoSQL
  4. FIRST APPROACH Name Home planet Gender Padmé Naboo Female Luke

    Tatooine Male Leia Alderaan, Naboo Female Problem Multiple values in a single cell
  5. FIRST NORMAL FORM (1NF)

  6. REQUIREMENTS (1NF) Only atomic values Each row is unique i.e.

    it has a primary key
  7. EXAMPLE (1NF) PersonID Name Home planet Gender 1 Padmé Naboo

    Female 2 Luke Tatooine Male 3 Leia Alderaan Female 3 Leia Naboo Female
  8. PROBLEM (1NF) Table suffers from update anomalies PersonID Name Home

    planet Gender 3 Leia Alderaan Female 3 Leia Naboo Male
  9. SECOND NORMAL FORM (2NF)

  10. REQUIREMENTS (2NF) 1NF All non-key attributes are fully functional dependent

    on the primary key
  11. EXAMPLE (2NF) PersonID Name Gender 1 Padmé Female 2 Luke

    Male 3 Leia Female PersonID Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo
  12. PROBLEMS (2NF) Table suffers from insert anomalies PersonID Name 1

    Naboo 2 Tatooine 3 Alderaan 3 Naboo ___ Dagobah
  13. PROBLEMS (2NF) Table suffers from deletion anomalies PersonID Name Gender

    1 Padmé Female 2 Luke Male 3 Leia Female PersonID Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo
  14. THIRD NORMAL FORM (3NF)

  15. REQUIREMENTS (3NF) 2NF All attributes are dependent on the primary

    key (and nothing else)
  16. EXAMPLE (3NF) PersonID Name Gender 1 Padmé Female 2 Luke

    Male 3 Leia Female PersonID PlanetID 1 10 2 11 3 10 3 12 PlanetID Name Water 10 Naboo 85% 11 Tatooine 1% 12 Alderaan 78% 13 Dagobah 88%
  17. PROBLEMS (3NF) Normally no anomalies Solvable by Boyce–Codd Normal Form

    (BCNF) If BCNF doesn't solve it: 4NF or 5NF
  18. REAL WORLD EXAMPLE

  19. YET ANOTHER WIKI

  20. SPECIFICATION A Page with a name and unique slug and

    text Unlimited number of revisions of the page text Latest revision of a page is determined by creation date
  21. DATABASE SCHEMA (1) Page + PageID Name Slug Revision +

    RevisionID PageID Text Version Date
  22. DJANGO MODEL f r o m d j a n

    g o . d b i m p o r t m o d e l s f r o m d j a n g o . u t i l s . f u n c t i o n a l i m p o r t c a c h e d _ p r o p e r t y f r o m d j a n g o . u t i l s . t i m e z o n e i m p o r t n o w c l a s s P a g e ( m o d e l s . M o d e l ) : n a m e = m o d e l s . C h a r F i e l d ( ' N a m e ' , m a x _ l e n g t h = 2 5 5 ) s l u g = m o d e l s . S l u g F i e l d ( ' S l u g ' , m a x _ l e n g t h = 2 5 5 , u n i q u e = T r u e ) @ c a c h e d _ p r o p e r t y d e f l a s t _ r e v ( s e l f ) : r e t u r n s e l f . r e v i s i o n _ s e t . f i r s t ( ) c l a s s R e v i s i o n ( m o d e l s . M o d e l ) : p a g e = m o d e l s . F o r e i g n K e y ( P a g e , o n _ d e l e t e = m o d e l s . P R O T E C T ) t e x t = m o d e l s . T e x t F i e l d ( ' T e x t ' ) v e r s i o n _ d a t e = m o d e l s . D a t e T i m e F i e l d ( ' D a t e ' , d e f a u l t = n o w ) c l a s s M e t a : o r d e r i n g = ( ' - v e r s i o n _ d a t e ' , )
  23. DATABASE SCHEMA (2) Page + PageID Name Slug Last Revision

    Revision + RevisionID PageID Text Version Date
  24. DJANGO MODEL f r o m d j a n

    g o . d b i m p o r t m o d e l s , t r a n s a c t i o n f r o m d j a n g o . u t i l s . t i m e z o n e i m p o r t n o w c l a s s P a g e ( m o d e l s . M o d e l ) : n a m e = m o d e l s . C h a r F i e l d ( ' N a m e ' , m a x _ l e n g t h = 2 5 5 ) s l u g = m o d e l s . S l u g F i e l d ( ' S l u g ' , m a x _ l e n g t h = 2 5 5 , u n i q u e = T r u e ) l a s t _ r e v = m o d e l s . F o r e i g n K e y ( ' R e v i s i o n ' , o n _ d e l e t e = m o d e l s . P R O T E C T , n u l l = T r u e , u n i q u e = T r u e , r e l a t e d _ n a m e = ' + ' ) c l a s s R e v i s i o n ( m o d e l s . M o d e l ) : p a g e = m o d e l s . F o r e i g n K e y ( P a g e , o n _ d e l e t e = m o d e l s . P R O T E C T ) t e x t = m o d e l s . T e x t F i e l d ( ' T e x t ' ) v e r s i o n _ d a t e = m o d e l s . D a t e T i m e F i e l d ( ' D a t e ' , d e f a u l t = n o w ) c l a s s M e t a : o r d e r i n g = ( ' - v e r s i o n _ d a t e ' , )
  25. BENCHMARKS

  26. BENCHMARK DATA PostgreSQL 9.3.5 6,300 pages approx. 6 million revisions

    in total 1 to 2,000 revisions per page
  27. TASK 1 SHOW A SINGLE PAGE AND ITS CURRENT REVISION

  28. NORMALIZED SCHEMA f r o m d j a n

    g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f s h o w ( r e q u e s t , s l u g ) : p a g e = g e t _ o b j e c t _ o r _ 4 0 4 ( P a g e , s l u g = s l u g ) r e v i s i o n = p a g e . l a s t _ r e v c o n t e x t = { ' p a g e ' : p a g e , ' r e v i s i o n ' : r e v i s i o n , } r e t u r n r e n d e r ( r e q u e s t , ' s h o w . h t m l ' , c o n t e x t )
  29. DENORMALIZED SCHEMA f r o m d j a n

    g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d e n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f s h o w ( r e q u e s t , s l u g ) : p a g e = g e t _ o b j e c t _ o r _ 4 0 4 ( P a g e , s l u g = s l u g ) r e v i s i o n = p a g e . l a s t _ r e v c o n t e x t = { ' p a g e ' : p a g e , ' r e v i s i o n ' : r e v i s i o n , } r e t u r n r e n d e r ( r e q u e s t , ' s h o w . h t m l ' , c o n t e x t )
  30. RESULTS normalized denormalized Requests / minute 13,000 15,000 Concurrent connections

    24 24 | 32 Quotient 541.67 625 | 468.75
  31. TASK 2 SHOW THE TITLES AND LAST VERSION DATE OF

    EVERY PAGE
  32. NORMALIZED SCHEMA f r o m d j a n

    g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d j a n g o . d b i m p o r t m o d e l s f r o m n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f l i s t _ v i e w ( r e q u e s t ) : p a g e s = P a g e . o b j e c t s . a l l ( ) r e t u r n r e n d e r ( r e q u e s t , ' l i s t . h t m l ' , { ' p a g e s ' : p a g e s } ) 6,301 QUERIES! ONE PER PAGE!
  33. NORMALIZED SCHEMA (SLIGHTLY OPTIMIZED) f r o m d j

    a n g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d j a n g o . d b i m p o r t m o d e l s f r o m n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f l i s t _ v i e w ( r e q u e s t ) : p a g e s = P a g e . o b j e c t s . a l l ( ) . a n n o t a t e ( l a s t _ r e v _ v e r s i o n _ d a t e = m o d e l s . M a x ( ' r e v i s i o n _ _ v e r s i o n _ d a t e ' ) ) r e t u r n r e n d e r ( r e q u e s t , ' l i s t . h t m l ' , { ' p a g e s ' : p a g e s } )
  34. DENORMALIZED SCHEMA f r o m d j a n

    g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d e n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f l i s t _ v i e w ( r e q u e s t ) : p a g e s = P a g e . o b j e c t s . s e l e c t _ r e l a t e d ( ' l a s t _ r e v ' ) . a l l ( ) r e t u r n r e n d e r ( r e q u e s t , ' l i s t . h t m l ' , { ' p a g e s ' : p a g e s } )
  35. RESULTS normalized denormalized Requests / minute 12 | 24 88

    Concurrent connections 4 4 Quotient 3 | 6 22
  36. QUESTIONS?