Relational Database Design - Lecture 4 - Introduction to Databases (1007156ANR)

Introduction to Databases
Relational Database Design
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
beatsigner.com
2 December 2005

March 6, 2019
Relational Database Design
▪ There are two major relational database design approaches
▪ Top-down design
  ▪ develop a conceptual model (e.g. ER model)
  ▪ reduction (mapping) of the conceptual model to relation schemas
  ▪ use normalisation as a validation technique to check the quality of the resulting relation schemas
    - a relational database schema resulting from the mapping of a good ER model (with the correct entity sets) normally requires no further normalisation
▪ Bottom-up design
  ▪ design by decomposition
  ▪ use normalisation to iteratively create (decompose) a set of relations starting with a single relation

Relational Database Design ...
▪ A relation schema might contain certain dependencies, in which case it should be decomposed (normalised) into multiple smaller relation schemas
  ▪ this normalisation process is based on functional dependencies and multivalued dependencies
▪ Sometimes multiple relations resulting from an ER to relation schema reduction might be merged to save some join query operations
  ▪ we have to ensure that the resulting larger relation schema does not introduce new undesirable dependencies

Reduction
▪ A conceptual ER model can be reduced to a set of relation schemas (relational database schema)
▪ The quality of the resulting set of relation schemas depends on the quality of the original ER design (there is no magic)
▪ In the following we discuss the reduction of the different ER model concepts introduced earlier

Strong Entity Sets
▪ A strong entity set E with only simple attributes a1, ..., an is mapped to a relation R with attributes a1, ..., an
  ▪ the primary key of the entity set E becomes the primary key of the relation R
[ER diagram: entity set Employees with attributes id and name]
Relation schema: Employee (id, name)
employee = (Employee)
  id    name
  1234  Beat Signer
  1576  Lode Hoste
  3212  Sandra Trullemans
  ...   ...

Composite Attributes
▪ For each component of a composite attribute, we create an attribute ai in the relation R
  ▪ no special attribute is created for the composite attribute itself
[ER diagram: entity set Employees with attributes id, name and the composite attribute address (street, city)]
Employee (id, name, street, city)

Multivalued Attributes
▪ Multivalued attributes are treated separately since a relation should only contain attributes with atomic values
  ▪ for each multivalued attribute ai of an entity set E, we create a new relation S containing the attribute ai as well as the primary key attributes of the relation R that is created for the entity set E
    - define a foreign key constraint to the original relation R
[ER diagram: entity set Employees with attributes id, name and the multivalued attribute phone]
Phones (id, phone)
phones = (Phones)
  id    phone
  1234  032 2 612 1337
  1234  032 2 612 3123
  1576  032 2 623 8765
  ...   ...
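The mapping of a multivalued attribute can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries and the helper name `split_multivalued` are hypothetical, and the pairing of names with ids is taken from the Employee example relation.

```python
# Hypothetical in-memory entities; ids, names and phone numbers follow the slide examples.
employees = [
    {"id": 1234, "name": "Beat Signer", "phone": ["032 2 612 1337", "032 2 612 3123"]},
    {"id": 1576, "name": "Lode Hoste", "phone": ["032 2 623 8765"]},
]


def split_multivalued(entities, key, multivalued):
    """Return (r, s): r without the multivalued attribute,
    s with one (key, value) row per value of the multivalued attribute."""
    r = [{a: v for a, v in e.items() if a != multivalued} for e in entities]
    s = [(e[key], value) for e in entities for value in e[multivalued]]
    return r, s


employee, phones = split_multivalued(employees, "id", "phone")

for row in phones:
    print(row)  # each row carries the key of its employee (the foreign key)
```

Every row of `phones` repeats the `id` of the owning employee, which is the attribute on which the foreign key constraint to the original relation would be defined.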

Weak Entity Sets
▪ A weak entity set E with attributes a1, ..., an is mapped to a relation R with attributes a1, ..., an combined with the primary key attributes b1, ..., bm of the identifying entity set F
  ▪ the primary key of R is defined by the primary key attributes of the identifying entity set F combined with the discriminator of E
  ▪ a foreign key constraint is defined from the attributes b1, ..., bm to the primary key of the relation that is created for the identifying entity set F

Weak Entity Sets ...
Seat (id, number, colour)
seat = (Seat)
  id   number  colour
  1    1       red
  1    20      black
  4    1       black
  ...  ...     ...

Relationship Sets
▪ A relationship set over the entity sets E1, ..., En with the optional descriptive attributes b1, ..., bm is mapped to a relation R with the primary key attributes of E1, ..., En combined with b1, ..., bm
▪ The primary key of relation R is defined as follows
  ▪ binary many-to-many relationship
    - union of all primary key attributes of E1 and E2
  ▪ binary one-to-one relationship
    - choose the primary key of E1 or E2
  ▪ binary one-to-many or many-to-one relationship
    - choose the primary key of the entity set whose entity instances can only participate once (0..1 or 1..1)

Relationship Sets ...
▪ The primary key of relation R is defined as follows ...
  ▪ n-ary relationship without cardinality constraints
    - union of all primary key attributes of E1, ..., En
  ▪ n-ary relationship with one 0..1 or 1..1 cardinality constraint over the entity set Ej
    - union of all primary key attributes of E1, ..., En, except the primary key of Ej
    - note that we allow only one such 0..1 or 1..1 cardinality constraint for n-ary relationships
▪ A foreign key constraint is defined for each set of primary key attributes (provided by the entity set Ei) to the primary key of the corresponding relation that is defined for Ei

Relationship Sets ...
[ER diagram: Employees (id, name) and Offices (name, address, size) connected by the relationship set LocatedAt (duration); cardinalities 0..* and 0..*]
LocatedAt (id, name, address, duration)
locatedAt = (LocatedAt)
  id    name    address      duration
  1234  10F721  Pleinlaan 2  1
  1576  10F733  Pleinlaan 2  1
  ...   ...     ...          ...

Relationship Sets ...
[ER diagram: Employees (id, name) and Offices (name, address, size) connected by the relationship set LocatedAt (duration); cardinalities 1..1 and 0..*]
LocatedAt (id, name, address, duration)
locatedAt = (LocatedAt)
  id    name    address      duration
  1234  10F721  Pleinlaan 2  1
  1576  10F733  Pleinlaan 2  1
  ...   ...     ...          ...

Weak Entity Existence Relationship
▪ The special relationship set from a weak entity set to its defining entity set is always a many-to-one relationship
  ▪ the special weak entity existence relationship does not have to be mapped to a separate relation since it is already covered by the relation that is created for the weak entity set
    - e.g. potential Offers relation schema already covered by Seat relation schema
Seat (id, number, colour)

Combination of Schemas
▪ Relations resulting from the mapping of a relationship set with a total participation constraint can be integrated with the relation over which the constraint is defined
  ▪ key of the relation with the constraint (1..1) used as primary key
[ER diagram: Employees (id, name) and Offices (name, address, size) connected by the relationship set LocatedAt (duration); cardinalities 1..1 and 0..*]
Employee (id, employeeName, duration, name, address)
Office (name, address, size)

Specialisation and Generalisation
▪ Create a new relation R for each entity subset
  ▪ combine the attributes of the entity set with the primary key attributes of the superclass
[ER diagram: Persons (id, name) with an ISA specialisation into Students (studentID) and Teachers (teaching hours)]
Person (id, name)
Student (id, studentID)
Teacher (id, teachingHours)

Specialisation and Generalisation ...
▪ For a disjoint and total ISA constraint we might omit the separate superclass relation
  ▪ saves some join operations but it is no longer possible to define a foreign key constraint on the id attribute (now at two places)
[ER diagram: Persons (id, name) with a disjoint ISA specialisation into Students (studentID) and Teachers (teaching hours)]
Student (id, name, studentID)
Teacher (id, name, teachingHours)

Aggregations
▪ Like the regular relationship set mapping
  ▪ note that the name attribute is the one from the Companies entity set
[ER diagram: relationship set WorksFor with Employees (id, name), Companies (name, address) and Durations (from, to); the aggregation is linked via Manages to Managers (mId, name)]
Manages (id, from, to, name, address, mId)

Relational Database Design
▪ The goal of relational database design is to create a set of relation schemas that
  ▪ can be used to store information without unnecessary redundancy
  ▪ allow us to easily retrieve information
▪ The quality of the set of schemas resulting from a reduction (top-down design) depends on how good the original ER design was
▪ In a design by decomposition approach (bottom-up design) we need a way to reduce any redundancy via a decomposition process
  ▪ split large relation schemas into multiple smaller relation schemas

Update Anomalies
▪ Insertion anomaly
  ▪ redundant information has to be kept consistent
    - e.g. insertion of a new order for an already existing CD
  ▪ information about a CD can only be inserted if there is an order or we have to populate the customer information (i.e. name and street) with null values
Order (id, name, street, cdName, price)
order = (Order)
  id  name             street            cdName              price
  1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
  2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
  53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
  5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50

Update Anomalies ...
▪ Modification anomaly
  ▪ if we want to modify information about a particular CD, we have to ensure that the information is updated in all redundant entries
    - e.g. modification of the price of the CD named "Falling into Place"
▪ Deletion anomaly
  ▪ if we delete a customer who is the only buyer of a specific CD, we also lose the information about that specific CD
    - e.g. deletion of the customer "Albert Einstein"
  id  name             street            cdName              price
  1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
  2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
  53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
  5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50
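The modification and deletion anomalies can be reproduced on the example relation with plain Python tuples. This is a small sketch, not part of the original slides; the variable names are illustrative.

```python
# The Order relation from the slide: (id, name, street, cdName, price)
orders = [
    (1, "Max Frisch", "Bahnhofstrasse 7", "Falling into Place", 17.90),
    (2, "Eddy Merckx", "Pleinlaan 25", "Falling into Place", 17.90),
    (53, "Albert Einstein", "Bergstrasse 18", "Chromatic", 16.50),
    (5, "Max Frisch", "Bahnhofstrasse 7", "Carcassonne", 15.50),
]

# Modification anomaly: a price change has to touch every redundant entry.
rows_to_update = [o for o in orders if o[3] == "Falling into Place"]

# Deletion anomaly: removing a customer may remove the last trace of a CD.
remaining = [o for o in orders if o[1] != "Albert Einstein"]
lost_cds = {o[3] for o in orders} - {o[3] for o in remaining}

print(len(rows_to_update))  # number of rows that must be kept consistent
print(lost_cds)             # CDs whose name and price are no longer stored
```

Two rows have to be updated for a single price change, and deleting the customer "Albert Einstein" removes all stored information about the CD "Chromatic".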

Normalisation
▪ Normalisation is a formal method to analyse relation schemas based on their keys, functional dependencies (FD) as well as multivalued dependencies (MVD)
  ▪ remove redundancy
  ▪ prevent certain update anomalies
    - insertion, modification and deletion
▪ There exists a set of rules to check if a relation is in a specific normal form
[Figure: normal forms ordered from weakest to strongest: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), Fifth Normal Form (5NF); 1NF to 3NF are the original normal forms described by Codd]

Normalisation ...
▪ A relation that does not conform to a certain degree of normalisation can be decomposed (lossless-join decomposition) into multiple relations that are in the desired normal form
  ▪ can be done automatically
▪ Normalisation is often done in a stepwise manner
  ▪ a higher normal form means a more restricted format and fewer problems with update anomalies
  ▪ note that only the first normal form (1NF) is mandatory for the relational model and all the other normal forms are optional

First Normal Form (1NF)
▪ As we have seen earlier, the ER model supports complex attributes
  ▪ composite attributes
  ▪ multivalued attributes
▪ In the reduction process, we remove this substructure from attributes to create a relational model with atomic attribute values only
▪ A relation schema R is in first normal form (1NF) if the domains D1, ..., Dn of all attributes a1, ..., an of R are atomic
  ▪ no composite attributes or attributes with a set of values
  ▪ the intersection of each row and column contains one and only one value

Functional Dependencies
TeacherDept (teacherID, teacher, salary, department, building, budget)
▪ In this example, there are various sets of attributes that uniquely identify a set of other attributes
  ▪ teacherID → teacher
  ▪ teacherID → salary
  ▪ teacherID → {teacher, salary}
  ▪ {teacherID, teacher} → {salary}
  ▪ department → {building, budget}
  ▪ ...
▪ We say that there is a functional dependency (→) between these two sets of attributes
  ▪ a functional dependency should always hold on a relation schema and not just on a particular relation instance

Functional Dependencies ...
▪ A functional dependency can be used to express constraints (generalisation of keys) over a set of attributes (determinant) that uniquely identify a set of other attributes (dependent attributes)
▪ For a relation schema R with α ⊆ R and β ⊆ R the functional dependency α → β holds on R, if for any r(R)
  ▪ ∀ t1, t2 ∈ r(R) with t1[α] = t2[α] → t1[β] = t2[β]
▪ Note that any K ⊆ R is a superkey if K → R
  ▪ we can use functional dependencies to check whether K is a superkey

Functional Dependencies ...
▪ The relation r(R) satisfies the following set F of functional dependencies
  ▪ A → B
  ▪ C → E
  ▪ ...
▪ A functional dependency α → β is trivial if β ⊆ α
  ▪ trivial dependencies are satisfied by all relations
▪ A full functional dependency has a minimal determinant
  ▪ if the determinant is not minimal, we talk about a partial functional dependency (e.g. AD → B in the example)
▪ For a relation r(R) with α → β and β → γ we say that γ is transitively dependent on α via β
r(R)
  A   B   C   D   E
  a1  b1  c1  d1  e1
  a2  b2  c2  d1  e2
  a2  b2  c3  d1  e3
  a3  b2  c4  d3  e3
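Whether a given relation instance satisfies a functional dependency can be checked directly from the definition (equal determinant values imply equal dependent values). The following Python sketch does this for the example relation r(R); the function name `fd_holds` is an assumption made for illustration. Note that such a check can only refute a dependency: a dependency that is satisfied by one instance does not necessarily hold on the relation schema.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the dependency lhs -> rhs is satisfied by the given rows (dicts)."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


columns = ["A", "B", "C", "D", "E"]
r = [dict(zip(columns, values)) for values in [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c2", "d1", "e2"),
    ("a2", "b2", "c3", "d1", "e3"),
    ("a3", "b2", "c4", "d3", "e3"),
]]

a_to_b = fd_holds(r, ["A"], ["B"])  # satisfied by r
c_to_e = fd_holds(r, ["C"], ["E"])  # satisfied by r
b_to_a = fd_holds(r, ["B"], ["A"])  # violated: b2 occurs with a2 and a3
d_to_e = fd_holds(r, ["D"], ["E"])  # violated: d1 occurs with e1, e2 and e3
```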

Closure of Attributes
▪ For a given relation schema R, a number of functional dependencies and a set of attributes α ⊆ R, the closure α+ is the set of all attributes Bi such that α → Bi
▪ Computing the closure
    Initialise the set s with the attributes of α
    Repeat until the set s does not grow anymore {
      if there is a functional dependency β → γ and β ⊆ s, then add γ to the set s
    }
▪ If the closure α+ contains all attributes of the relation schema R, then the attributes α form a superkey of R
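The closure computation translates almost line by line into Python. The sketch below represents each functional dependency as a pair of attribute sets, which is a representation chosen here for illustration. It uses only the dependencies that are listed explicitly for the TeacherDept example; the elided ones ("...") are not guessed.

```python
def closure(attributes, fds):
    """Return the closure of `attributes` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attributes)
    grew = True
    while grew:  # repeat until the set does not grow anymore
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


schema = {"teacherID", "teacher", "salary", "department", "building", "budget"}
fds = [
    ({"teacherID"}, {"teacher", "salary"}),
    ({"department"}, {"building", "budget"}),
]

teacher_closure = closure({"teacherID"}, fds)
pair_closure = closure({"teacherID", "department"}, fds)

print(sorted(teacher_closure))  # attributes determined by teacherID alone
print(pair_closure == schema)   # True means {teacherID, department} is a superkey
```

Under these two dependencies alone, teacherID determines only teacher and salary, while the pair {teacherID, department} reaches all six attributes and is therefore a superkey.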

Computation of Superkeys
▪ We can test whether α is a superkey for a given relation schema R by checking whether the closure α+ contains all attributes of R
▪ We can further use this approach to find all the superkeys for a relation schema R and a given set of functional dependencies
  ▪ check for each set α ⊆ R of attributes whether the closure α+ contains all attributes
  ▪ the search process can be slightly optimised by starting with the smallest possible subsets
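The exhaustive search over attribute subsets could look as follows in Python. Starting with the smallest subsets and skipping supersets of keys that were already found yields the minimal superkeys (candidate keys). The schema R = (A, B, C, D, E) and its dependencies are a hypothetical example and not taken from the slides.

```python
from itertools import combinations


def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def candidate_keys(schema, fds):
    """Enumerate subsets by increasing size and keep the minimal superkeys."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # superset of a known key: a superkey, but not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


# Hypothetical dependencies: A -> BC, CD -> E, B -> D, E -> A
fds = [
    ({"A"}, {"B", "C"}),
    ({"C", "D"}, {"E"}),
    ({"B"}, {"D"}),
    ({"E"}, {"A"}),
]
keys = candidate_keys("ABCDE", fds)
print(keys)
```

The search is exponential in the number of attributes, so it is only practical for small schemas; every superset of a returned key is a superkey as well.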

Functional Dependency Inference
▪ For a given set F of functional dependencies we can derive new functional dependencies based on a set of axioms to compute the closure F+ of F
  ▪ the closure F+ includes all functional dependencies that are logically implied by F
▪ Three rules (Armstrong's axioms) can be used to compute F+
  ▪ reflexivity
    - for a given set of attributes α and β ⊆ α, α → β holds (see trivial dependency)
  ▪ augmentation
    - for a given set of attributes γ; if α → β then γα → γβ holds
  ▪ transitivity
    - if α → β and β → γ, then α → γ holds

Functional Dependency Inference ...
▪ Armstrong's axioms are sound (produce only elements of F+) and complete (produce all elements in F+)
  ▪ since it may take a lot of time to compute F+ with Armstrong's axioms only, there exist some additional rules
▪ Decomposition
  ▪ if α → βγ, then α → β and α → γ hold
▪ Union
  ▪ if α → β and α → γ, then α → βγ holds
▪ Trivial dependency rules
  ▪ if α → β, then α → α ∪ β holds
  ▪ if α → β, then α → α ∩ β holds
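The additional rules do not add expressive power; each of them follows from Armstrong's axioms. As an illustration, the union rule can be derived as follows, writing α, β and γ for sets of attributes and αβ for the union of α and β:

```latex
\begin{align*}
&\alpha \rightarrow \beta        && \text{(given)} \\
&\alpha \rightarrow \alpha\beta  && \text{(augmentation with } \alpha \text{, since } \alpha\alpha = \alpha \text{)} \\
&\alpha \rightarrow \gamma       && \text{(given)} \\
&\alpha\beta \rightarrow \beta\gamma && \text{(augmentation with } \beta \text{)} \\
&\alpha \rightarrow \beta\gamma  && \text{(transitivity of the second and fourth line)}
\end{align*}
```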

Second Normal Form (2NF)
▪ A relation schema R is in second normal form (2NF) if it is in 1NF and if there exists no non-prime attribute that is functionally dependent on a part of a candidate key
  ▪ every non-prime attribute has to be fully functionally dependent on a candidate key
  ▪ a non-prime attribute is an attribute that is not part of any candidate key
  ▪ the Lecturer relation schema shown in the example is not in 2NF since the office attribute functionally depends on the teacher attribute
Lecturer (teacher, course, office)
lecturer = (Lecturer)
  teacher            course     office
  Beat Signer        Databases  10G731d
  Beat Signer        WIS        10G731d
  Lode Hoste         Databases  10F716
  Lode Hoste         ATIS       10F716
  Sandra Trullemans  WIS        10G731e

Second Normal Form (2NF) ...
▪ 2NF normalisation process
  ▪ remove any partially dependent attributes from the relation and put them in a new relation together with their determinant
▪ The original Lecturer relation can be losslessly decomposed into two relations which are both in 2NF
  ▪ relations with single attribute keys are automatically in 2NF
Lecturer (teacher, office)
lecturer = (Lecturer)
  teacher            office
  Beat Signer        10G731d
  Lode Hoste         10F716
  Sandra Trullemans  10G731e
Course (teacher, course)
course = (Course)
  teacher            course
  Beat Signer        Databases
  Beat Signer        WIS
  Lode Hoste         Databases
  Lode Hoste         ATIS
  Sandra Trullemans  WIS
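That this decomposition loses no information can be verified on the example data by projecting the original relation onto the two new schemas and joining the projections again. The sketch below models relations as Python sets of tuples; this representation is an assumption made for illustration.

```python
# Original Lecturer relation: (teacher, course, office)
lecturer = {
    ("Beat Signer", "Databases", "10G731d"),
    ("Beat Signer", "WIS", "10G731d"),
    ("Lode Hoste", "Databases", "10F716"),
    ("Lode Hoste", "ATIS", "10F716"),
    ("Sandra Trullemans", "WIS", "10G731e"),
}

# Projections onto (teacher, office) and (teacher, course); sets remove duplicates
teacher_office = {(t, o) for t, c, o in lecturer}
teacher_course = {(t, c) for t, c, o in lecturer}

# Natural join of the two projections on the shared teacher attribute
rejoined = {
    (t, c, o)
    for t, c in teacher_course
    for t2, o in teacher_office
    if t == t2
}

print(len(teacher_office))   # the office is now stored once per teacher
print(rejoined == lecturer)  # True means the join reproduces the original relation
```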

Lossless Decomposition
▪ Given a relation schema R and the two decompositions R1 and R2 of R, we say that R1 and R2 form a lossless decomposition if π_R1(r) ⋈ π_R2(r) = r
▪ Let F be a set of functional dependencies on R
  ▪ R1 and R2 form a lossless decomposition of R if either R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+
    - this means that R1 ∩ R2 is a superkey of R1 or R2
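The condition on the shared attributes can be tested with the attribute closure instead of computing F+ explicitly: the common attributes of R1 and R2 have to determine all attributes of R1 or all attributes of R2. A minimal Python sketch, assuming dependencies are given as pairs of attribute sets and using the Lecturer schema with teacher → office as the example:

```python
def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def is_lossless(r1, r2, fds):
    """True if the common attributes of r1 and r2 form a superkey of r1 or of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


fds = [({"teacher"}, {"office"})]

good = is_lossless({"teacher", "office"}, {"teacher", "course"}, fds)
bad = is_lossless({"course", "office"}, {"teacher", "course"}, fds)
```

The first decomposition shares the attribute teacher, which determines office, so the test succeeds. The second one shares only course, which determines nothing else, so the test fails.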

Third Normal Form (3NF)
▪ A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute is transitively dependent on a candidate key, i.e. for all functional dependencies α → β in F+ one of the following has to hold
  ▪ α → β is a trivial functional dependency (i.e. β ⊆ α)
  ▪ α is a superkey of R
  ▪ each attribute Ai in β - α is contained in a candidate key of R
    - note that each Ai can be in different candidate keys
▪ Each non-key attribute "must provide a fact about the key, the whole key, and nothing but the key" [Bill Kent]

Third Normal Form (3NF) ...
▪ The Prize relation example schema is in 2NF
▪ The Prize relation schema is not in 3NF since birthdate is functionally dependent on winner and none of the three conditions holds for this functional dependency
  ▪ birthdate is transitively dependent on the key (award, year)
Prize (award, year, winner, birthdate)
prize = (Prize)
  award              year  winner         birthdate
  ACM Turing Award   1981  Edgar F. Codd  23.08.1923
  Nobel Peace Prize  1979  Mother Teresa  26.08.1910
  ACM Turing Award   1984  Niklaus Wirth  15.02.1934
  Nobel Peace Prize  1984  Desmond Tutu   07.10.1931
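The three 3NF conditions can be checked mechanically for each given dependency. The Python sketch below is an illustration under stated assumptions: it inspects only the dependencies passed in rather than all of F+, the candidate keys are supplied by the caller, and the dependency (award, year) → winner is inferred from (award, year) being the key of Prize.

```python
def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def third_nf_violations(schema, fds, candidate_keys):
    """Return the given dependencies for which none of the three 3NF conditions holds."""
    schema = set(schema)
    prime = set().union(*candidate_keys)  # attributes that are part of some candidate key
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        if closure(lhs, fds) == schema:
            continue  # determinant is a superkey
        if (rhs - lhs) <= prime:
            continue  # all remaining dependent attributes are prime
        violations.append((lhs, rhs))
    return violations


prize = {"award", "year", "winner", "birthdate"}
fds = [
    ({"award", "year"}, {"winner"}),
    ({"winner"}, {"birthdate"}),
]
violations = third_nf_violations(prize, fds, [{"award", "year"}])
print(violations)
```

Only winner → birthdate is reported: it is not trivial, winner is not a superkey of Prize, and birthdate is not part of a candidate key.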

Third Normal Form (3NF) ...
▪ 3NF normalisation process
  ▪ remove any transitively dependent attributes from the relation and place them in a new relation together with their determinant
▪ Decomposition of the Prize relation schema into two 3NF relation schemas
Prize (award, year, winner)
prize = (Prize)
  award              year  winner
  ACM Turing Award   1981  Edgar F. Codd
  Nobel Peace Prize  1979  Mother Teresa
  ACM Turing Award   1984  Niklaus Wirth
  Nobel Peace Prize  1984  Desmond Tutu
Birthdate (winner, birthdate)
bdate = (Birthdate)
  winner         birthdate
  Edgar F. Codd  23.08.1923
  Mother Teresa  26.08.1910
  Niklaus Wirth  15.02.1934
  Desmond Tutu   07.10.1931

Boyce-Codd Normal Form (BCNF)
▪ The Boyce-Codd normal form is a stronger form of 3NF
▪ A relation schema R is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and if every determinant is a candidate key, i.e. for all functional dependencies α → β in F+ one of the following holds
  ▪ α → β is a trivial functional dependency (i.e. β ⊆ α)
  ▪ α is a superkey of R
▪ Any relation that is in BCNF is also in 3NF since the BCNF conditions are equivalent to the first two 3NF conditions

BCNF Decomposition
▪ If a relation R is not in BCNF, then there exists at least one nontrivial functional dependency α → β where α is not a superkey of R
  ▪ the relation R can then be decomposed into the two relation schemas R1 (α ∪ β) and R2 (R - (β - α))
▪ We can for example apply the BCNF decomposition to the previous Prize relation schema example with the functional dependency winner → birthdate
  ▪ α ∪ β = (winner, birthdate)
  ▪ (R - (β - α)) = (award, year, winner)
▪ Further details about the algorithms for BCNF and 3NF decomposition can be found in the course book
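A single decomposition step can be sketched in Python as shown below. The function `bcnf_step` is a hypothetical helper that looks for the first given dependency violating BCNF and returns the two resulting schemas; it is not the complete algorithm from the course book, which would have to repeat the step on the results until no violation remains.

```python
def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def bcnf_step(schema, fds):
    """Return (R1, R2) for the first violating dependency, or None if there is none."""
    schema = set(schema)
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        if closure(lhs, fds) >= schema:
            continue  # determinant is a superkey of the schema
        return lhs | rhs, schema - (rhs - lhs)
    return None


prize = {"award", "year", "winner", "birthdate"}
fds = [
    ({"award", "year"}, {"winner"}),
    ({"winner"}, {"birthdate"}),
]
r1, r2 = bcnf_step(prize, fds)
print(sorted(r1), sorted(r2))
```

For the Prize schema the step returns (winner, birthdate) and (award, year, winner), matching the two schemas given above.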

Multivalued Dependencies
▪ Some relation schemas that are in BCNF may still contain redundant information
▪ The fourth normal form (4NF) deals with some of these problems based on multivalued dependencies
  ▪ for a given relation schema R with α ⊆ R and β ⊆ R the multivalued dependency α ↠ β holds if for all pairs of tuples t1 and t2 in r(R) (with t1[α] = t2[α]) there exist tuples t3 and t4 in r(R) such that
    - t1[α] = t2[α] = t3[α] = t4[α]
    - t3[β] = t1[β]
    - t3[R - β] = t2[R - β]
    - t4[β] = t2[β]
    - t4[R - β] = t1[R - β]
      α          β              R - α - β
  t1  a1 ... ai  ai+1 ... aj    aj+1 ... an
  t2  a1 ... ai  bi+1 ... bj    bj+1 ... bn
  t3  a1 ... ai  ai+1 ... aj    bj+1 ... bn
  t4  a1 ... ai  bi+1 ... bj    aj+1 ... an
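The definition can be turned into a direct check on a relation instance: for every ordered pair (t1, t2) that agrees on the determinant, a tuple t3 combining the dependent values of t1 with the remaining values of t2 has to exist. Iterating over ordered pairs covers t4 by symmetry. The relation over teacher, course and language in this Python sketch is a hypothetical example, not taken from the slides.

```python
def mvd_holds(rows, attributes, alpha, beta):
    """Check the multivalued dependency alpha ->> beta on a list of row dicts."""
    rest = [a for a in attributes if a not in beta]  # the attributes R - beta

    def proj(row, attrs):
        return tuple(row[a] for a in attrs)

    existing = {(proj(r, beta), proj(r, rest)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, alpha) != proj(t2, alpha):
                continue
            # t3 needs the beta values of t1 and the R - beta values of t2
            if (proj(t1, beta), proj(t2, rest)) not in existing:
                return False
    return True


attributes = ["teacher", "course", "language"]
full = [dict(zip(attributes, v)) for v in [
    ("Beat Signer", "Databases", "English"),
    ("Beat Signer", "Databases", "German"),
    ("Beat Signer", "WIS", "English"),
    ("Beat Signer", "WIS", "German"),
]]
incomplete = full[:3]  # the (WIS, German) combination is missing

holds_full = mvd_holds(full, attributes, ["teacher"], ["course"])
holds_incomplete = mvd_holds(incomplete, attributes, ["teacher"], ["course"])
```

In the full relation every course of a teacher is combined with every language, so teacher ↠ course is satisfied; removing a single combination breaks it.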

Multivalued Dependencies ...
▪ Every functional dependency is also a multivalued dependency, e.g. if α → β then α ↠ β

Fourth Normal Form (4NF)
▪ A relation schema R is in fourth normal form (4NF) if it is in BCNF and if any non-trivial multivalued dependency is a dependency on a candidate key, i.e. for all multivalued dependencies α ↠ β in D+ one of the following has to hold
  ▪ α ↠ β is a trivial multivalued dependency (i.e. β ⊆ α or α ∪ β = R)
  ▪ α is a superkey of R
▪ Note that the fourth normal form is very similar to BCNF except that we use multivalued dependencies
▪ 4NF normalisation process
  ▪ remove any attributes that are multivalued dependent from the relation and place them in a new relation together with their determinant

Fifth Normal Form (5NF)
▪ There are some forms of constraints called join dependencies that generalise multivalued dependencies
  ▪ leads to the project-join normal form or fifth normal form (5NF)
  ▪ not discussed in detail in this course

Normalisation Summary
▪ Relations in higher normal forms are less vulnerable to update anomalies
  ▪ generally it is recommended that relations are at least in 3NF
[Figure: normalisation steps from the weakest to the strongest normal form]
  Unnormalised (UN)
    remove repeating groups
  First Normal Form (1NF)
    remove partial dependencies
  Second Normal Form (2NF)
    remove transitive dependencies
  Third Normal Form (3NF)
    every determinant has to be a candidate key
  Boyce-Codd Normal Form (BCNF)
    remove multivalued dependencies
  Fourth Normal Form (4NF)
    remove join dependencies
  Fifth Normal Form (5NF)

Denormalisation
▪ Sometimes a database designer decides to store information in a redundant way to save join operations and improve performance
  ▪ may result in additional work for insert, update and delete operations
▪ An alternative is to keep the normalised schema and introduce additional materialised views

Homework
▪ Study the following chapters of the Database System Concepts book
  ▪ chapter 7
    - sections 7.6 and 7.8.6
    - Reduction to Relation Schemas
  ▪ chapter 8
    - sections 8.1-8.9
    - Relational Database Design

Exercise 4
▪ Relational algebra
▪ Relational database design
  ▪ ER to relational model reduction

References
▪ A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010

Next Lecture
Structured Query Language (SQL)