Slide 1

Slide 1 text

DATABASE NORMALISATION

Slide 2

Slide 2 text

Introduction ■ Desired characteristics of a database include it being efficient in terms of storage and easy to maintain. ■ The first point, of storage, means redundant data should be avoided and the second point, of maintenance, means that a good design will logically separate data into tables. ■ Normalisation is a design method that can be used to achieve this. ■ What is normalisation? ■ “a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems” (wikipedia.com)

Slide 3

Slide 3 text

Rules ■ Normalisation provides rules that help: ■ organise the data efficiently. ■ eliminate redundant data. ■ ensure that only related data are stored in a table.

Slide 4

Slide 4 text

Normal Forms ■ Over time some good rules have been developed that allows a database to be designed with different levels of efficiency. ■ These sets of rules are called the 'Normal Forms' and are numbered from 0 to 5. So there is a first, second, third, fourth and fifth normal form. ■ The short hand for these are 0NF, 1NF, 2NF, 3NF, 4NF, 5NF ■ You do not need to be concerned with the fourth and fifth normal forms as these are not required on this course. We will only discuss first, second and third normal forms. ■ You should be aware that normal forms are only guidelines for the database designer - they do not have to follow them! ■ The designer will choose the form needed depending on what kind of database is to be developed. They may decide not to normalise the database at all, in which case it is called an 'unnormalised database' or 0NF.

Slide 5

Slide 5 text

First Normal Form – 1NF ■ For a database to be in first normal form (1NF), the following rules have to be met for each table in the database – There are no columns with repeated or similar data – Each data item cannot be broken down any further – Each row is unique i.e. it has a primary key – Each field has an unique name

Slide 6

Slide 6 text

Atomic Data ■ 'Atomic' is the word used to describe a data item that cannot be broken down any further. ■ Examples of atomic data – An identity number – National Insurance number NY344599 – ISBN book reference e.g.1-931841-62-4 – Stock code PN10B – A first name, ’Dylan' – A surname, ’Roberts' – A telephone number – A school name: 'Abington Hall Comprehensive School' – A complete description 'A fountain pen is a writing instrument'. ■ The point of the two last examples indicates that 'atomic' does not mean 'single word'. It simply means that it would make no sense to reduce the data item any further as it would lose its meaning.

Slide 7

Slide 7 text

Non-Atomic Data ■ Examples of non-atomic data – A full name: John Hunt when a first name and surname is present in the database – A full address: 6 Picton Road, London, WR1 4PG – Data that are part of a larger dataset e.g Oxford Book club and Coventry Health club ■ In the last example, the database in question is storing club details and there are many different kind of clubs (Health, Book, Chess etc) being stored. So it would make sense to split this data into {Location, Club type}. So the context of the database needs to be understood when deciding if data is atomic or not.

Slide 8

Slide 8 text

Repeated Data ■ In this case the database is trying to store contact telephone numbers for each person. The designer has created three fields to hold telephone numbers. This is what is meant by 'repeating data'. The telephone numbers are the same kind of data.

Slide 9

Slide 9 text

1NF Example ■ Reminder: Rules for first normal form – There are no columns with repeated or similar data – Each data item cannot be broken down any further. – Each row is unique i.e. it has a primary key – Each field has an unique name ■ Which of these tables are NOT in first normal form? 1 2 3 4

Slide 10

Slide 10 text

1NF Example Table 1. This is not in 1NF. There is no primary key defined and so this record cannot be guaranteed to be unique. Full name is redundant - data is not atomic - as it is simply a combination of Firstname and Surname.

Slide 11

Slide 11 text

1NF Example Table 2. This is in at least 1NF. It has a primary key identified by the underline. The data is atomic. Each field has an unique name. There is no repeat data.

Slide 12

Slide 12 text

1NF Example Table 3. This is not in 1NF. It has a primary key, so it passes that test Data is atomic The colour the shoe can come in is being repeated - and furthermore the fields have the same name - so not in 1NF

Slide 13

Slide 13 text

1NF Example Table 4. This is in 1NF as it meets all the rules for the first normal form. ■ Question to ask yourself to spot 1NF – Does it have a primary key? – Are each field name unique? – Is the data atomic? – Are there repeating / redundant fields?

Slide 14

Slide 14 text

1NF Example ■ Suppose a designer has been asked to compile a database for a fan club web site. Fans visit the web site to find like-minded friends. ■ The entities to be stored are ■ This indicates that each band has many fans. ■ Each person is a fan of only one group. Band Fan

Slide 15

Slide 15 text

1NF Example BAND FAN The attributes of band are: •band id •band name •musictype The attributes of a fan are: •fan id •firstname •surname •email address(es) The database needs to be in first normal form.

Slide 16

Slide 16 text

1NF Example ■ First Attempt ■ This is the first time this person has designed a database and is not really sure of how to go about it. He designs the FAN table and loads it with the following records ■ He has correctly set up a primary key. He also used a foreign key to refer to the band. ■ However this is not in 1NF because Mary has two email addresses loaded into the email field. The data is not atomic. Loading data in this way is also going to make it very difficult to extract email addresses. Also the data length of the email field now has to be long enough to handle many email addresses, this is very inefficient and would be a problem if exceeded.

Slide 17

Slide 17 text

1NF Example ■ Second Attempt ■ He soon realises this is not a good idea. So then he decides to create two email fields ■ This is also a poor approach - note that email2 is not being used in Tom's record and so is causing wasted storage, so not 1NF which seeks to avoid wasted / redundant data. ■ Another problem is what if a fan has many more emails? Adding more and more email fields will make the wasted storage even worse. ■ Another problem is that the query to extract email addresses is far more complex than it needs to be as it has to examine each email field.

Slide 18

Slide 18 text

1NF Example ■ Solution ■ After trying out various ideas, he comes up with a good solution - create another entity called 'email' and use a foreign key in that table to link the fan and email tables together. The ER diagram is as follows: Band Fan Email

Slide 19

Slide 19 text

1NF Example ■ The ER diagram shows that each fan can have many emails, but an email can only belong to one fan. ■ The FAN and EMAIL table now look like this ■ FAN ■ EMAIL ■ Mary (FanID = 2) has two entries in the email table. There is no problem adding even more emails for her. Extracting emails is now simple as there is only one email column. There is no wasted storage. ■ The tables are now in first normal form (1NF) as they obey the following rules – Each table has a primary key – Each field name is unique – Data is atomic – No repeating / redundant fields

Slide 20

Slide 20 text

Second Normal Form – 2NF ■ Most tables tend to have a single-attribute (i.e. simple) primary key. Like this: CUSTOMER ■ But sometimes a table has a primary key made up of more than one attribute i.e. it has a compound primary key. CONCERT ■ The table above is using both the venue and artist as the compound primary key.

Slide 21

Slide 21 text

Second Normal Form – 2NF ■ It is in this situation that the extra rule for second normal form comes in handy. The rule states: – Non-key attributes must depend on every part of the primary key – The table must already be in first normal form ■ So inherently, any table that is already in 1NF and has a simple primary key is automatically in second normal form as well. ■ Consider the Concert example again - this is NOT in second normal form. Notice the attribute called Style. This is describing the style of artist - it has nothing to do with where the concert took place! And so its value does not depend on EVERY part of the primary key, so the rule for second normal form is not being met. ■ The reason for this rule is to ensure there is no redundant data being stored.

Slide 22

Slide 22 text

Second Normal Form – 2NF ■ For example, let's add another Girls Aloud concert to the table: ■ Notice that the 'girl band' value is being repeated and so is causing the database to be bigger than it needs to be.

Slide 23

Slide 23 text

Forming a 2NF Database ■ Continuing with the concert table CONCERT ■ This table needs to be split so that non-dependent attributes are removed and only stored once.

Slide 24

Slide 24 text

Forming a 2NF Database ■ In this case a 'style' table is formed that has Artist as the simple primary key CONCERT STYLE ■ Now the rule for 2NF is being met by both tables - every non-key attribute is depending on the complete primary key. There is no redundant data.

Slide 25

Slide 25 text

Third Normal Form – 3NF ■ For a database to be in third normal form, the following rules have to be met – It is already in 2NF – There are no non-key attributes that depend on another non-key attribute ■ What this is trying to do is to spot yet another source of redundant data. ■ If the value of an attribute can be obtained by simply making use of another attribute in the table, then it does not need to be there. Loading that attribute into another table and linking to it will make the database smaller. ■ To clarify, consider the table below: CONCERT ■ Notice that the country could be obtained by referring to the City - if the concert was in London then you know it is also in the UK - no need to look at the primary key!

Slide 26

Slide 26 text

Third Normal Form – 3NF ■ So to make this database into third normal form, split the table as follows CONCERT COUNTRIES ■ The new table called COUNTRIES has City as the primary key and country as an attribute. The Concert table has City as a foreign key. So now you can obtain the country in which any particular concert took place and there is no redundant data.

Slide 27

Slide 27 text

3NF Example 1 ■ Reminder, 3NF means: – It is already in 2NF – There are no non-key attributes that depend on another non-key attribute CUSTOMER ■ This is not in strict 3NF as the City could be obtained from the Post code attribute. If you created a table containing postcodes then city could be derived.

Slide 28

Slide 28 text

3NF Example 1 ■ This is not in strict 3NF as the City could be obtained from the Post code attribute. If you created a table containing postcodes then city could be derived. CUSTOMER POSTCODES

Slide 29

Slide 29 text

3NF Example 2 ■ The Description of what the certificate means could be obtained from the certificate attribute - it does not need to refer to the primary key VideoID. So split it out and use the primary key / secondary key approach.

Slide 30

Slide 30 text

3NF Example 3 CLIENT CINEMAS ■ In this case the database is almost in 3NF - for some reason the Cinema Address is being repeated in the Client table, even though it can be obtained from the Cinemas table. So simply remove the column from the client table

Slide 31

Slide 31 text

Benefits of Normalisation ■ 1. The database does not have redundant data, it is smaller in size so less money needs to be spent on storage ■ 2. Because there is less data to search through, it is much faster to run a query on the data ■ 3. Because there is no data duplication there is better data integrity and less risk of mistakes. ■ 4. Because there is no data duplication there is less chance of storing two or more different copies of the data ■ 5. One change can be made which can instantly be cascaded across any related records.

Slide 32

Slide 32 text

Problems with Normalisation ■ 1. You need to be careful with trying to make data atomic. Just because you can split some types of data further, it isn't always necessarily the correct thing to do. For example, telephone number might contain the code followed by the number 01234 567890. It wouldn't be sensible to separate out these two items. ■ 2. You can end up with more tables than an un-normalised database ■ 3. The more tables and the more complex the database, the slower queries can be to run ■ 4. It is necessary to assign more relationships to interact with larger numbers of tables ■ 5. With more tables, setting up queries can become more complex