Slide 1

Slide 1 text

YOU SHOULD(N'T) NORMALIZE YOUR DATABASE / Markus Holtermann @m_holtermann

Slide 2

Slide 2 text

ABOUT ME Markus Holtermann M.Sc. student of Computer Science @ TU Berlin EuroPython 2014 local organizer Part of the server team ubuntuusers.de

Slide 3

Slide 3 text

HOW DO WE STORE OUR DATA? Files Relational Database Document Store / NoSQL

Slide 4

Slide 4 text

FIRST APPROACH Name Home planet Gender Padmé Naboo Female Luke Tatooine Male Leia Alderaan, Naboo Female Problem Multiple values in a single cell

Slide 5

Slide 5 text

FIRST NORMAL FORM (1NF)

Slide 6

Slide 6 text

REQUIREMENTS (1NF) Only atomic values Each row is unique i.e. it has a primary key

Slide 7

Slide 7 text

EXAMPLE (1NF) PersonID Name Home planet Gender 1 Padmé Naboo Female 2 Luke Tatooine Male 3 Leia Alderaan Female 3 Leia Naboo Female

Slide 8

Slide 8 text

PROBLEM (1NF) Table suffers from update anomalies PersonID Name Home planet Gender 3 Leia Alderaan Female 3 Leia Naboo Male

Slide 9

Slide 9 text

SECOND NORMAL FORM (2NF)

Slide 10

Slide 10 text

REQUIREMENTS (2NF) 1NF All non-key attributes are fully functional dependent on the primary key

Slide 11

Slide 11 text

EXAMPLE (2NF) PersonID Name Gender 1 Padmé Female 2 Luke Male 3 Leia Female PersonID Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo

Slide 12

Slide 12 text

PROBLEMS (2NF) Table suffers from insert anomalies PersonID Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo ___ Dagobah

Slide 13

Slide 13 text

PROBLEMS (2NF) Table suffers from deletion anomalies PersonID Name Gender 1 Padmé Female 2 Luke Male 3 Leia Female PersonID Name 1 Naboo 2 Tatooine 3 Alderaan 3 Naboo

Slide 14

Slide 14 text

THIRD NORMAL FORM (3NF)

Slide 15

Slide 15 text

REQUIREMENTS (3NF) 2NF All attributes are dependent on the primary key (and nothing else)

Slide 16

Slide 16 text

EXAMPLE (3NF) PersonID Name Gender 1 Padmé Female 2 Luke Male 3 Leia Female PersonID PlanetID 1 10 2 11 3 10 3 12 PlanetID Name Water 10 Naboo 85% 11 Tatooine 1% 12 Alderaan 78% 13 Dagobah 88%

Slide 17

Slide 17 text

PROBLEMS (3NF) Normally no anomalies Solvable by Boyce–Codd Normal Form (BCNF) If BCNF doesn't solve it: 4NF or 5NF

Slide 18

Slide 18 text

REAL WORLD EXAMPLE

Slide 19

Slide 19 text

YET ANOTHER WIKI

Slide 20

Slide 20 text

SPECIFICATION A Page with a name and unique slug and text Unlimited number of revisions of the page text Latest revision of a page is determined by creation date

Slide 21

Slide 21 text

DATABASE SCHEMA (1) Page + PageID Name Slug Revision + RevisionID PageID Text Version Date

Slide 22

Slide 22 text

DJANGO MODEL f r o m d j a n g o . d b i m p o r t m o d e l s f r o m d j a n g o . u t i l s . f u n c t i o n a l i m p o r t c a c h e d _ p r o p e r t y f r o m d j a n g o . u t i l s . t i m e z o n e i m p o r t n o w c l a s s P a g e ( m o d e l s . M o d e l ) : n a m e = m o d e l s . C h a r F i e l d ( ' N a m e ' , m a x _ l e n g t h = 2 5 5 ) s l u g = m o d e l s . S l u g F i e l d ( ' S l u g ' , m a x _ l e n g t h = 2 5 5 , u n i q u e = T r u e ) @ c a c h e d _ p r o p e r t y d e f l a s t _ r e v ( s e l f ) : r e t u r n s e l f . r e v i s i o n _ s e t . f i r s t ( ) c l a s s R e v i s i o n ( m o d e l s . M o d e l ) : p a g e = m o d e l s . F o r e i g n K e y ( P a g e , o n _ d e l e t e = m o d e l s . P R O T E C T ) t e x t = m o d e l s . T e x t F i e l d ( ' T e x t ' ) v e r s i o n _ d a t e = m o d e l s . D a t e T i m e F i e l d ( ' D a t e ' , d e f a u l t = n o w ) c l a s s M e t a : o r d e r i n g = ( ' - v e r s i o n _ d a t e ' , )

Slide 23

Slide 23 text

DATABASE SCHEMA (2) Page + PageID Name Slug Last Revision Revision + RevisionID PageID Text Version Date

Slide 24

Slide 24 text

DJANGO MODEL f r o m d j a n g o . d b i m p o r t m o d e l s , t r a n s a c t i o n f r o m d j a n g o . u t i l s . t i m e z o n e i m p o r t n o w c l a s s P a g e ( m o d e l s . M o d e l ) : n a m e = m o d e l s . C h a r F i e l d ( ' N a m e ' , m a x _ l e n g t h = 2 5 5 ) s l u g = m o d e l s . S l u g F i e l d ( ' S l u g ' , m a x _ l e n g t h = 2 5 5 , u n i q u e = T r u e ) l a s t _ r e v = m o d e l s . F o r e i g n K e y ( ' R e v i s i o n ' , o n _ d e l e t e = m o d e l s . P R O T E C T , n u l l = T r u e , u n i q u e = T r u e , r e l a t e d _ n a m e = ' + ' ) c l a s s R e v i s i o n ( m o d e l s . M o d e l ) : p a g e = m o d e l s . F o r e i g n K e y ( P a g e , o n _ d e l e t e = m o d e l s . P R O T E C T ) t e x t = m o d e l s . T e x t F i e l d ( ' T e x t ' ) v e r s i o n _ d a t e = m o d e l s . D a t e T i m e F i e l d ( ' D a t e ' , d e f a u l t = n o w ) c l a s s M e t a : o r d e r i n g = ( ' - v e r s i o n _ d a t e ' , )

Slide 25

Slide 25 text

BENCHMARKS

Slide 26

Slide 26 text

BENCHMARK DATA PostgreSQL 9.3.5 6,300 pages approx. 6 million revisions in total 1 to 2,000 revisions per page

Slide 27

Slide 27 text

TASK 1 SHOW A SINGLE PAGE AND ITS CURRENT REVISION

Slide 28

Slide 28 text

NORMALIZED SCHEMA f r o m d j a n g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f s h o w ( r e q u e s t , s l u g ) : p a g e = g e t _ o b j e c t _ o r _ 4 0 4 ( P a g e , s l u g = s l u g ) r e v i s i o n = p a g e . l a s t _ r e v c o n t e x t = { ' p a g e ' : p a g e , ' r e v i s i o n ' : r e v i s i o n , } r e t u r n r e n d e r ( r e q u e s t , ' s h o w . h t m l ' , c o n t e x t )

Slide 29

Slide 29 text

DENORMALIZED SCHEMA f r o m d j a n g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d e n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f s h o w ( r e q u e s t , s l u g ) : p a g e = g e t _ o b j e c t _ o r _ 4 0 4 ( P a g e , s l u g = s l u g ) r e v i s i o n = p a g e . l a s t _ r e v c o n t e x t = { ' p a g e ' : p a g e , ' r e v i s i o n ' : r e v i s i o n , } r e t u r n r e n d e r ( r e q u e s t , ' s h o w . h t m l ' , c o n t e x t )

Slide 30

Slide 30 text

RESULTS normalized denormalized Requests / minute 13,000 15,000 Concurrent connections 24 24 | 32 Quotient 541.67 625 | 468.75

Slide 31

Slide 31 text

TASK 2 SHOW THE TITLES AND LAST VERSION DATE OF EVERY PAGE

Slide 32

Slide 32 text

NORMALIZED SCHEMA f r o m d j a n g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d j a n g o . d b i m p o r t m o d e l s f r o m n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f l i s t _ v i e w ( r e q u e s t ) : p a g e s = P a g e . o b j e c t s . a l l ( ) r e t u r n r e n d e r ( r e q u e s t , ' l i s t . h t m l ' , { ' p a g e s ' : p a g e s } ) 6,301 QUERIES! ONE PER PAGE!

Slide 33

Slide 33 text

NORMALIZED SCHEMA (SLIGHTLY OPTIMIZED) f r o m d j a n g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d j a n g o . d b i m p o r t m o d e l s f r o m n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f l i s t _ v i e w ( r e q u e s t ) : p a g e s = P a g e . o b j e c t s . a l l ( ) . a n n o t a t e ( l a s t _ r e v _ v e r s i o n _ d a t e = m o d e l s . M a x ( ' r e v i s i o n _ _ v e r s i o n _ d a t e ' ) ) r e t u r n r e n d e r ( r e q u e s t , ' l i s t . h t m l ' , { ' p a g e s ' : p a g e s } )

Slide 34

Slide 34 text

DENORMALIZED SCHEMA f r o m d j a n g o . s h o r t c u t s i m p o r t g e t _ o b j e c t _ o r _ 4 0 4 , r e n d e r f r o m d e n o r m a l i z e d . m o d e l s i m p o r t P a g e d e f l i s t _ v i e w ( r e q u e s t ) : p a g e s = P a g e . o b j e c t s . s e l e c t _ r e l a t e d ( ' l a s t _ r e v ' ) . a l l ( ) r e t u r n r e n d e r ( r e q u e s t , ' l i s t . h t m l ' , { ' p a g e s ' : p a g e s } )

Slide 35

Slide 35 text

RESULTS normalized denormalized Requests / minute 12 | 24 88 Concurrent connections 4 4 Quotient 3 | 6 22

Slide 36

Slide 36 text

QUESTIONS?