Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Database Normalisation

AllenHeard
January 11, 2017

Database Normalisation

Year 13 Lesson

AllenHeard

January 11, 2017
Tweet

More Decks by AllenHeard

Other Decks in Education

Transcript

  1. Introduction ▪ Desired characteristics of a database include it being

    efficient in terms of storage and easy to maintain. ▪ The first point, of storage, means redundant data should be avoided and the second point, of maintenance, means that a good design will logically separate data into tables. ▪ Normalisation is a design method that can be used to achieve this. ▪ What is normalisation? ▪ “a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems” (wikipedia.com)
  2. Rules ▪ Normalisation provides rules that help: ▪ organise the

    data efficiently. ▪ eliminate redundant data. ▪ ensure that only related data are stored in a table.
  3. Normal Forms ▪ Over time some good rules have been

    developed that allows a database to be designed with different levels of efficiency. ▪ These sets of rules are called the 'Normal Forms' and are numbered from 0 to 5. So there is a first, second, third, fourth and fifth normal form. ▪ The short hand for these are 0NF, 1NF, 2NF, 3NF, 4NF, 5NF ▪ You do not need to be concerned with the fourth and fifth normal forms as these are not required on this course. We will only discuss first, second and third normal forms. ▪ You should be aware that normal forms are only guidelines for the database designer - they do not have to follow them! ▪ The designer will choose the form needed depending on what kind of database is to be developed. They may decide not to normalise the database at all, in which case it is called an 'unnormalised database' or 0NF.
  4. First Normal Form – 1NF ▪ For a database to

    be in first normal form (1NF), the following rules have to be met for each table in the database – There are no columns with repeated or similar data – Each data item cannot be broken down any further – Each row is unique i.e. it has a primary key – Each field has an unique name
  5. Atomic Data ▪ 'Atomic' is the word used to describe

    a data item that cannot be broken down any further. ▪ Examples of atomic data – An identity number – National Insurance number NY344599 – ISBN book reference e.g.1-931841-62-4 – Stock code PN10B – A first name, ’Dylan' – A surname, ’Roberts' – A telephone number – A school name: 'Abington Hall Comprehensive School' – A complete description 'A fountain pen is a writing instrument'. ▪ The point of the two last examples indicates that 'atomic' does not mean 'single word'. It simply means that it would make no sense to reduce the data item any further as it would lose its meaning.
  6. Non-Atomic Data ▪ Examples of non-atomic data – A full

    name: John Hunt when a first name and surname is present in the database – A full address: 6 Picton Road, London, WR1 4PG – Data that are part of a larger dataset e.g Oxford Book club and Coventry Health club ▪ In the last example, the database in question is storing club details and there are many different kind of clubs (Health, Book, Chess etc) being stored. So it would make sense to split this data into {Location, Club type}. So the context of the database needs to be understood when deciding if data is atomic or not.
  7. Repeated Data ▪ In this case the database is trying

    to store contact telephone numbers for each person. The designer has created three fields to hold telephone numbers. This is what is meant by 'repeating data'. The telephone numbers are the same kind of data.
  8. 1NF Example ▪ Reminder: Rules for first normal form –

    There are no columns with repeated or similar data – Each data item cannot be broken down any further. – Each row is unique i.e. it has a primary key – Each field has an unique name ▪ Which of these tables are NOT in first normal form? 1 2 3 4
  9. 1NF Example Table 1. This is not in 1NF. There

    is no primary key defined and so this record cannot be guaranteed to be unique. Full name is redundant - data is not atomic - as it is simply a combination of Firstname and Surname.
  10. 1NF Example Table 2. This is in at least 1NF.

    It has a primary key identified by the underline. The data is atomic. Each field has an unique name. There is no repeat data.
  11. 1NF Example Table 3. This is not in 1NF. It

    has a primary key, so it passes that test Data is atomic The colour the shoe can come in is being repeated - and furthermore the fields have the same name - so not in 1NF
  12. 1NF Example Table 4. This is in 1NF as it

    meets all the rules for the first normal form. ▪ Question to ask yourself to spot 1NF – Does it have a primary key? – Are each field name unique? – Is the data atomic? – Are there repeating / redundant fields?
  13. 1NF Example ▪ Suppose a designer has been asked to

    compile a database for a fan club web site. Fans visit the web site to find like-minded friends. ▪ The entities to be stored are ▪ This indicates that each band has many fans. ▪ Each person is a fan of only one group. Band Fan
  14. 1NF Example BAND FAN The attributes of band are: •band

    id •band name •musictype The attributes of a fan are: •fan id •firstname •surname •email address(es) The database needs to be in first normal form.
  15. 1NF Example ▪ First Attempt ▪ This is the first

    time this person has designed a database and is not really sure of how to go about it. He designs the FAN table and loads it with the following records ▪ He has correctly set up a primary key. He also used a foreign key to refer to the band. ▪ However this is not in 1NF because Mary has two email addresses loaded into the email field. The data is not atomic. Loading data in this way is also going to make it very difficult to extract email addresses. Also the data length of the email field now has to be long enough to handle many email addresses, this is very inefficient and would be a problem if exceeded.
  16. 1NF Example ▪ Second Attempt ▪ He soon realises this

    is not a good idea. So then he decides to create two email fields ▪ This is also a poor approach - note that email2 is not being used in Tom's record and so is causing wasted storage, so not 1NF which seeks to avoid wasted / redundant data. ▪ Another problem is what if a fan has many more emails? Adding more and more email fields will make the wasted storage even worse. ▪ Another problem is that the query to extract email addresses is far more complex than it needs to be as it has to examine each email field.
  17. 1NF Example ▪ Solution ▪ After trying out various ideas,

    he comes up with a good solution - create another entity called 'email' and use a foreign key in that table to link the fan and email tables together. The ER diagram is as follows: Band Fan Email
  18. 1NF Example ▪ The ER diagram shows that each fan

    can have many emails, but an email can only belong to one fan. ▪ The FAN and EMAIL table now look like this ▪ FAN ▪ EMAIL ▪ Mary (FanID = 2) has two entries in the email table. There is no problem adding even more emails for her. Extracting emails is now simple as there is only one email column. There is no wasted storage. ▪ The tables are now in first normal form (1NF) as they obey the following rules – Each table has a primary key – Each field name is unique – Data is atomic – No repeating / redundant fields
  19. Second Normal Form – 2NF ▪ Most tables tend to

    have a single-attribute (i.e. simple) primary key. Like this: CUSTOMER ▪ But sometimes a table has a primary key made up of more than one attribute i.e. it has a compound primary key. CONCERT ▪ The table above is using both the venue and artist as the compound primary key.
  20. Second Normal Form – 2NF ▪ It is in this

    situation that the extra rule for second normal form comes in handy. The rule states: – Non-key attributes must depend on every part of the primary key – The table must already be in first normal form ▪ So inherently, any table that is already in 1NF and has a simple primary key is automatically in second normal form as well. ▪ Consider the Concert example again - this is NOT in second normal form. Notice the attribute called Style. This is describing the style of artist - it has nothing to do with where the concert took place! And so its value does not depend on EVERY part of the primary key, so the rule for second normal form is not being met. ▪ The reason for this rule is to ensure there is no redundant data being stored.
  21. Second Normal Form – 2NF ▪ For example, let's add

    another Girls Aloud concert to the table: ▪ Notice that the 'girl band' value is being repeated and so is causing the database to be bigger than it needs to be.
  22. Forming a 2NF Database ▪ Continuing with the concert table

    CONCERT ▪ This table needs to be split so that non-dependent attributes are removed and only stored once.
  23. Forming a 2NF Database ▪ In this case a 'style'

    table is formed that has Artist as the simple primary key CONCERT STYLE ▪ Now the rule for 2NF is being met by both tables - every non-key attribute is depending on the complete primary key. There is no redundant data.
  24. Third Normal Form – 3NF ▪ For a database to

    be in third normal form, the following rules have to be met – It is already in 2NF – There are no non-key attributes that depend on another non-key attribute ▪ What this is trying to do is to spot yet another source of redundant data. ▪ If the value of an attribute can be obtained by simply making use of another attribute in the table, then it does not need to be there. Loading that attribute into another table and linking to it will make the database smaller. ▪ To clarify, consider the table below: CONCERT ▪ Notice that the country could be obtained by referring to the City - if the concert was in London then you know it is also in the UK - no need to look at the primary key!
  25. Third Normal Form – 3NF ▪ So to make this

    database into third normal form, split the table as follows CONCERT COUNTRIES ▪ The new table called COUNTRIES has City as the primary key and country as an attribute. The Concert table has City as a foreign key. So now you can obtain the country in which any particular concert took place and there is no redundant data.
  26. 3NF Example 1 ▪ Reminder, 3NF means: – It is

    already in 2NF – There are no non-key attributes that depend on another non-key attribute CUSTOMER ▪ This is not in strict 3NF as the City could be obtained from the Post code attribute. If you created a table containing postcodes then city could be derived.
  27. 3NF Example 1 ▪ This is not in strict 3NF

    as the City could be obtained from the Post code attribute. If you created a table containing postcodes then city could be derived. CUSTOMER POSTCODES
  28. 3NF Example 2 ▪ The Description of what the certificate

    means could be obtained from the certificate attribute - it does not need to refer to the primary key VideoID. So split it out and use the primary key / secondary key approach.
  29. 3NF Example 3 CLIENT CINEMAS ▪ In this case the

    database is almost in 3NF - for some reason the Cinema Address is being repeated in the Client table, even though it can be obtained from the Cinemas table. So simply remove the column from the client table
  30. Benefits of Normalisation ▪ 1. The database does not have

    redundant data, it is smaller in size so less money needs to be spent on storage ▪ 2. Because there is less data to search through, it is much faster to run a query on the data ▪ 3. Because there is no data duplication there is better data integrity and less risk of mistakes. ▪ 4. Because there is no data duplication there is less chance of storing two or more different copies of the data ▪ 5. One change can be made which can instantly be cascaded across any related records.
  31. Problems with Normalisation ▪ 1. You need to be careful

    with trying to make data atomic. Just because you can split some types of data further, it isn't always necessarily the correct thing to do. For example, telephone number might contain the code followed by the number 01234 567890. It wouldn't be sensible to separate out these two items. ▪ 2. You can end up with more tables than an un-normalised database ▪ 3. The more tables and the more complex the database, the slower queries can be to run ▪ 4. It is necessary to assign more relationships to interact with larger numbers of tables ▪ 5. With more tables, setting up queries can become more complex