$30 off During Our Annual Pro Sale. View Details »

You Should(n't) Normalize Your Database

Markus H
October 18, 2014

You Should(n't) Normalize Your Database

Markus H

October 18, 2014
Tweet

More Decks by Markus H

Other Decks in Technology

Transcript

  1. YOU SHOULD(N'T)
    NORMALIZE YOUR
    DATABASE
    /
    Markus Holtermann @m_holtermann

    View Slide

  2. ABOUT ME
    Markus Holtermann
    M.Sc. student of Computer Science @ TU Berlin
    EuroPython 2014 local organizer
    Part of the server team
    ubuntuusers.de

    View Slide

  3. HOW DO WE STORE OUR DATA?
    Files
    Relational Database
    Document Store / NoSQL

    View Slide

  4. FIRST APPROACH
    Name Home planet Gender
    Padmé Naboo Female
    Luke Tatooine Male
    Leia Alderaan, Naboo Female
    Problem
    Multiple values in a single cell

    View Slide

  5. FIRST NORMAL FORM
    (1NF)

    View Slide

  6. REQUIREMENTS (1NF)
    Only atomic values
    Each row is unique i.e. it has a primary key

    View Slide

  7. EXAMPLE (1NF)
    PersonID Name Home planet Gender
    1 Padmé Naboo Female
    2 Luke Tatooine Male
    3 Leia Alderaan Female
    3 Leia Naboo Female

    View Slide

  8. PROBLEM (1NF)
    Table suffers from update anomalies
    PersonID Name Home planet Gender
    3 Leia Alderaan Female
    3 Leia Naboo Male

    View Slide

  9. SECOND NORMAL FORM
    (2NF)

    View Slide

  10. REQUIREMENTS (2NF)
    1NF
    All non-key attributes are fully functional dependent on the
    primary key

    View Slide

  11. EXAMPLE (2NF)
    PersonID Name Gender
    1 Padmé Female
    2 Luke Male
    3 Leia Female
    PersonID Name
    1 Naboo
    2 Tatooine
    3 Alderaan
    3 Naboo

    View Slide

  12. PROBLEMS (2NF)
    Table suffers from insert anomalies
    PersonID Name
    1 Naboo
    2 Tatooine
    3 Alderaan
    3 Naboo
    ___ Dagobah

    View Slide

  13. PROBLEMS (2NF)
    Table suffers from deletion anomalies
    PersonID Name Gender
    1 Padmé Female
    2 Luke Male
    3 Leia Female
    PersonID Name
    1 Naboo
    2 Tatooine
    3 Alderaan
    3 Naboo

    View Slide

  14. THIRD NORMAL FORM
    (3NF)

    View Slide

  15. REQUIREMENTS (3NF)
    2NF
    All attributes are dependent on the primary key (and nothing
    else)

    View Slide

  16. EXAMPLE (3NF)
    PersonID Name Gender
    1 Padmé Female
    2 Luke Male
    3 Leia Female
    PersonID PlanetID
    1 10
    2 11
    3 10
    3 12
    PlanetID Name Water
    10 Naboo 85%
    11 Tatooine 1%
    12 Alderaan 78%
    13 Dagobah 88%

    View Slide

  17. PROBLEMS (3NF)
    Normally no anomalies
    Solvable by Boyce–Codd Normal Form (BCNF)
    If BCNF doesn't solve it: 4NF or 5NF

    View Slide

  18. REAL WORLD EXAMPLE

    View Slide

  19. YET ANOTHER WIKI

    View Slide

  20. SPECIFICATION
    A Page with a name and unique slug and text
    Unlimited number of revisions of the page text
    Latest revision of a page is determined by creation date

    View Slide

  21. DATABASE SCHEMA (1)
    Page
    + PageID
    Name
    Slug
    Revision
    + RevisionID
    PageID
    Text
    Version Date

    View Slide

  22. DJANGO MODEL
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    d
    b i
    m
    p
    o
    r
    t m
    o
    d
    e
    l
    s
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    u
    t
    i
    l
    s
    .
    f
    u
    n
    c
    t
    i
    o
    n
    a
    l i
    m
    p
    o
    r
    t c
    a
    c
    h
    e
    d
    _
    p
    r
    o
    p
    e
    r
    t
    y
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    u
    t
    i
    l
    s
    .
    t
    i
    m
    e
    z
    o
    n
    e i
    m
    p
    o
    r
    t n
    o
    w
    c
    l
    a
    s
    s P
    a
    g
    e
    (
    m
    o
    d
    e
    l
    s
    .
    M
    o
    d
    e
    l
    )
    :
    n
    a
    m
    e = m
    o
    d
    e
    l
    s
    .
    C
    h
    a
    r
    F
    i
    e
    l
    d
    (
    '
    N
    a
    m
    e
    '
    , m
    a
    x
    _
    l
    e
    n
    g
    t
    h
    =
    2
    5
    5
    )
    s
    l
    u
    g = m
    o
    d
    e
    l
    s
    .
    S
    l
    u
    g
    F
    i
    e
    l
    d
    (
    '
    S
    l
    u
    g
    '
    , m
    a
    x
    _
    l
    e
    n
    g
    t
    h
    =
    2
    5
    5
    , u
    n
    i
    q
    u
    e
    =
    T
    r
    u
    e
    )
    @
    c
    a
    c
    h
    e
    d
    _
    p
    r
    o
    p
    e
    r
    t
    y
    d
    e
    f l
    a
    s
    t
    _
    r
    e
    v
    (
    s
    e
    l
    f
    )
    :
    r
    e
    t
    u
    r
    n s
    e
    l
    f
    .
    r
    e
    v
    i
    s
    i
    o
    n
    _
    s
    e
    t
    .
    f
    i
    r
    s
    t
    (
    )
    c
    l
    a
    s
    s R
    e
    v
    i
    s
    i
    o
    n
    (
    m
    o
    d
    e
    l
    s
    .
    M
    o
    d
    e
    l
    )
    :
    p
    a
    g
    e = m
    o
    d
    e
    l
    s
    .
    F
    o
    r
    e
    i
    g
    n
    K
    e
    y
    (
    P
    a
    g
    e
    , o
    n
    _
    d
    e
    l
    e
    t
    e
    =
    m
    o
    d
    e
    l
    s
    .
    P
    R
    O
    T
    E
    C
    T
    )
    t
    e
    x
    t = m
    o
    d
    e
    l
    s
    .
    T
    e
    x
    t
    F
    i
    e
    l
    d
    (
    '
    T
    e
    x
    t
    '
    )
    v
    e
    r
    s
    i
    o
    n
    _
    d
    a
    t
    e = m
    o
    d
    e
    l
    s
    .
    D
    a
    t
    e
    T
    i
    m
    e
    F
    i
    e
    l
    d
    (
    '
    D
    a
    t
    e
    '
    , d
    e
    f
    a
    u
    l
    t
    =
    n
    o
    w
    )
    c
    l
    a
    s
    s M
    e
    t
    a
    :
    o
    r
    d
    e
    r
    i
    n
    g = (
    '
    -
    v
    e
    r
    s
    i
    o
    n
    _
    d
    a
    t
    e
    '
    , )

    View Slide

  23. DATABASE SCHEMA (2)
    Page
    + PageID
    Name
    Slug
    Last Revision
    Revision
    + RevisionID
    PageID
    Text
    Version Date

    View Slide

  24. DJANGO MODEL
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    d
    b i
    m
    p
    o
    r
    t m
    o
    d
    e
    l
    s
    , t
    r
    a
    n
    s
    a
    c
    t
    i
    o
    n
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    u
    t
    i
    l
    s
    .
    t
    i
    m
    e
    z
    o
    n
    e i
    m
    p
    o
    r
    t n
    o
    w
    c
    l
    a
    s
    s P
    a
    g
    e
    (
    m
    o
    d
    e
    l
    s
    .
    M
    o
    d
    e
    l
    )
    :
    n
    a
    m
    e = m
    o
    d
    e
    l
    s
    .
    C
    h
    a
    r
    F
    i
    e
    l
    d
    (
    '
    N
    a
    m
    e
    '
    , m
    a
    x
    _
    l
    e
    n
    g
    t
    h
    =
    2
    5
    5
    )
    s
    l
    u
    g = m
    o
    d
    e
    l
    s
    .
    S
    l
    u
    g
    F
    i
    e
    l
    d
    (
    '
    S
    l
    u
    g
    '
    , m
    a
    x
    _
    l
    e
    n
    g
    t
    h
    =
    2
    5
    5
    , u
    n
    i
    q
    u
    e
    =
    T
    r
    u
    e
    )
    l
    a
    s
    t
    _
    r
    e
    v = m
    o
    d
    e
    l
    s
    .
    F
    o
    r
    e
    i
    g
    n
    K
    e
    y
    (
    '
    R
    e
    v
    i
    s
    i
    o
    n
    '
    , o
    n
    _
    d
    e
    l
    e
    t
    e
    =
    m
    o
    d
    e
    l
    s
    .
    P
    R
    O
    T
    E
    C
    T
    ,
    n
    u
    l
    l
    =
    T
    r
    u
    e
    , u
    n
    i
    q
    u
    e
    =
    T
    r
    u
    e
    , r
    e
    l
    a
    t
    e
    d
    _
    n
    a
    m
    e
    =
    '
    +
    '
    )
    c
    l
    a
    s
    s R
    e
    v
    i
    s
    i
    o
    n
    (
    m
    o
    d
    e
    l
    s
    .
    M
    o
    d
    e
    l
    )
    :
    p
    a
    g
    e = m
    o
    d
    e
    l
    s
    .
    F
    o
    r
    e
    i
    g
    n
    K
    e
    y
    (
    P
    a
    g
    e
    , o
    n
    _
    d
    e
    l
    e
    t
    e
    =
    m
    o
    d
    e
    l
    s
    .
    P
    R
    O
    T
    E
    C
    T
    )
    t
    e
    x
    t = m
    o
    d
    e
    l
    s
    .
    T
    e
    x
    t
    F
    i
    e
    l
    d
    (
    '
    T
    e
    x
    t
    '
    )
    v
    e
    r
    s
    i
    o
    n
    _
    d
    a
    t
    e = m
    o
    d
    e
    l
    s
    .
    D
    a
    t
    e
    T
    i
    m
    e
    F
    i
    e
    l
    d
    (
    '
    D
    a
    t
    e
    '
    , d
    e
    f
    a
    u
    l
    t
    =
    n
    o
    w
    )
    c
    l
    a
    s
    s M
    e
    t
    a
    :
    o
    r
    d
    e
    r
    i
    n
    g = (
    '
    -
    v
    e
    r
    s
    i
    o
    n
    _
    d
    a
    t
    e
    '
    , )

    View Slide

  25. BENCHMARKS

    View Slide

  26. BENCHMARK DATA
    PostgreSQL 9.3.5
    6,300 pages
    approx. 6 million revisions in total
    1 to 2,000 revisions per page

    View Slide

  27. TASK 1
    SHOW A SINGLE PAGE AND ITS CURRENT
    REVISION

    View Slide

  28. NORMALIZED SCHEMA
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    s
    h
    o
    r
    t
    c
    u
    t
    s i
    m
    p
    o
    r
    t g
    e
    t
    _
    o
    b
    j
    e
    c
    t
    _
    o
    r
    _
    4
    0
    4
    , r
    e
    n
    d
    e
    r
    f
    r
    o
    m n
    o
    r
    m
    a
    l
    i
    z
    e
    d
    .
    m
    o
    d
    e
    l
    s i
    m
    p
    o
    r
    t P
    a
    g
    e
    d
    e
    f s
    h
    o
    w
    (
    r
    e
    q
    u
    e
    s
    t
    , s
    l
    u
    g
    )
    :
    p
    a
    g
    e = g
    e
    t
    _
    o
    b
    j
    e
    c
    t
    _
    o
    r
    _
    4
    0
    4
    (
    P
    a
    g
    e
    , s
    l
    u
    g
    =
    s
    l
    u
    g
    )
    r
    e
    v
    i
    s
    i
    o
    n = p
    a
    g
    e
    .
    l
    a
    s
    t
    _
    r
    e
    v
    c
    o
    n
    t
    e
    x
    t = {
    '
    p
    a
    g
    e
    '
    : p
    a
    g
    e
    ,
    '
    r
    e
    v
    i
    s
    i
    o
    n
    '
    : r
    e
    v
    i
    s
    i
    o
    n
    ,
    }
    r
    e
    t
    u
    r
    n r
    e
    n
    d
    e
    r
    (
    r
    e
    q
    u
    e
    s
    t
    , '
    s
    h
    o
    w
    .
    h
    t
    m
    l
    '
    , c
    o
    n
    t
    e
    x
    t
    )

    View Slide

  29. DENORMALIZED SCHEMA
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    s
    h
    o
    r
    t
    c
    u
    t
    s i
    m
    p
    o
    r
    t g
    e
    t
    _
    o
    b
    j
    e
    c
    t
    _
    o
    r
    _
    4
    0
    4
    , r
    e
    n
    d
    e
    r
    f
    r
    o
    m d
    e
    n
    o
    r
    m
    a
    l
    i
    z
    e
    d
    .
    m
    o
    d
    e
    l
    s i
    m
    p
    o
    r
    t P
    a
    g
    e
    d
    e
    f s
    h
    o
    w
    (
    r
    e
    q
    u
    e
    s
    t
    , s
    l
    u
    g
    )
    :
    p
    a
    g
    e = g
    e
    t
    _
    o
    b
    j
    e
    c
    t
    _
    o
    r
    _
    4
    0
    4
    (
    P
    a
    g
    e
    , s
    l
    u
    g
    =
    s
    l
    u
    g
    )
    r
    e
    v
    i
    s
    i
    o
    n = p
    a
    g
    e
    .
    l
    a
    s
    t
    _
    r
    e
    v
    c
    o
    n
    t
    e
    x
    t = {
    '
    p
    a
    g
    e
    '
    : p
    a
    g
    e
    ,
    '
    r
    e
    v
    i
    s
    i
    o
    n
    '
    : r
    e
    v
    i
    s
    i
    o
    n
    ,
    }
    r
    e
    t
    u
    r
    n r
    e
    n
    d
    e
    r
    (
    r
    e
    q
    u
    e
    s
    t
    , '
    s
    h
    o
    w
    .
    h
    t
    m
    l
    '
    , c
    o
    n
    t
    e
    x
    t
    )

    View Slide

  30. RESULTS
    normalized denormalized
    Requests /
    minute
    13,000 15,000
    Concurrent
    connections
    24 24 | 32
    Quotient 541.67 625 | 468.75

    View Slide

  31. TASK 2
    SHOW THE TITLES AND LAST VERSION DATE
    OF EVERY PAGE

    View Slide

  32. NORMALIZED SCHEMA
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    s
    h
    o
    r
    t
    c
    u
    t
    s i
    m
    p
    o
    r
    t g
    e
    t
    _
    o
    b
    j
    e
    c
    t
    _
    o
    r
    _
    4
    0
    4
    , r
    e
    n
    d
    e
    r
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    d
    b i
    m
    p
    o
    r
    t m
    o
    d
    e
    l
    s
    f
    r
    o
    m n
    o
    r
    m
    a
    l
    i
    z
    e
    d
    .
    m
    o
    d
    e
    l
    s i
    m
    p
    o
    r
    t P
    a
    g
    e
    d
    e
    f l
    i
    s
    t
    _
    v
    i
    e
    w
    (
    r
    e
    q
    u
    e
    s
    t
    )
    :
    p
    a
    g
    e
    s = P
    a
    g
    e
    .
    o
    b
    j
    e
    c
    t
    s
    .
    a
    l
    l
    (
    )
    r
    e
    t
    u
    r
    n r
    e
    n
    d
    e
    r
    (
    r
    e
    q
    u
    e
    s
    t
    , '
    l
    i
    s
    t
    .
    h
    t
    m
    l
    '
    , {
    '
    p
    a
    g
    e
    s
    '
    : p
    a
    g
    e
    s
    }
    )
    6,301 QUERIES! ONE PER PAGE!

    View Slide

  33. NORMALIZED SCHEMA
    (SLIGHTLY OPTIMIZED)
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    s
    h
    o
    r
    t
    c
    u
    t
    s i
    m
    p
    o
    r
    t g
    e
    t
    _
    o
    b
    j
    e
    c
    t
    _
    o
    r
    _
    4
    0
    4
    , r
    e
    n
    d
    e
    r
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    d
    b i
    m
    p
    o
    r
    t m
    o
    d
    e
    l
    s
    f
    r
    o
    m n
    o
    r
    m
    a
    l
    i
    z
    e
    d
    .
    m
    o
    d
    e
    l
    s i
    m
    p
    o
    r
    t P
    a
    g
    e
    d
    e
    f l
    i
    s
    t
    _
    v
    i
    e
    w
    (
    r
    e
    q
    u
    e
    s
    t
    )
    :
    p
    a
    g
    e
    s = P
    a
    g
    e
    .
    o
    b
    j
    e
    c
    t
    s
    .
    a
    l
    l
    (
    )
    .
    a
    n
    n
    o
    t
    a
    t
    e
    (
    l
    a
    s
    t
    _
    r
    e
    v
    _
    v
    e
    r
    s
    i
    o
    n
    _
    d
    a
    t
    e
    =
    m
    o
    d
    e
    l
    s
    .
    M
    a
    x
    (
    '
    r
    e
    v
    i
    s
    i
    o
    n
    _
    _
    v
    e
    r
    s
    i
    o
    n
    _
    d
    a
    t
    e
    '
    )
    )
    r
    e
    t
    u
    r
    n r
    e
    n
    d
    e
    r
    (
    r
    e
    q
    u
    e
    s
    t
    , '
    l
    i
    s
    t
    .
    h
    t
    m
    l
    '
    , {
    '
    p
    a
    g
    e
    s
    '
    : p
    a
    g
    e
    s
    }
    )

    View Slide

  34. DENORMALIZED SCHEMA
    f
    r
    o
    m d
    j
    a
    n
    g
    o
    .
    s
    h
    o
    r
    t
    c
    u
    t
    s i
    m
    p
    o
    r
    t g
    e
    t
    _
    o
    b
    j
    e
    c
    t
    _
    o
    r
    _
    4
    0
    4
    , r
    e
    n
    d
    e
    r
    f
    r
    o
    m d
    e
    n
    o
    r
    m
    a
    l
    i
    z
    e
    d
    .
    m
    o
    d
    e
    l
    s i
    m
    p
    o
    r
    t P
    a
    g
    e
    d
    e
    f l
    i
    s
    t
    _
    v
    i
    e
    w
    (
    r
    e
    q
    u
    e
    s
    t
    )
    :
    p
    a
    g
    e
    s = P
    a
    g
    e
    .
    o
    b
    j
    e
    c
    t
    s
    .
    s
    e
    l
    e
    c
    t
    _
    r
    e
    l
    a
    t
    e
    d
    (
    '
    l
    a
    s
    t
    _
    r
    e
    v
    '
    )
    .
    a
    l
    l
    (
    )
    r
    e
    t
    u
    r
    n r
    e
    n
    d
    e
    r
    (
    r
    e
    q
    u
    e
    s
    t
    , '
    l
    i
    s
    t
    .
    h
    t
    m
    l
    '
    , {
    '
    p
    a
    g
    e
    s
    '
    : p
    a
    g
    e
    s
    }
    )

    View Slide

  35. RESULTS
    normalized denormalized
    Requests /
    minute
    12 | 24 88
    Concurrent
    connections
    4 4
    Quotient 3 | 6 22

    View Slide

  36. QUESTIONS?

    View Slide