Relational Database Design - Lecture 4 - Introduction to Databases (1007156ANR)

This lecture forms part of the course Introduction to Databases given at the Vrije Universiteit Brussel.

Beat Signer

March 08, 2019

Transcript

  1. 2 December 2005 Introduction to Databases Relational Database Design Prof.

    Beat Signer Department of Computer Science Vrije Universiteit Brussel beatsigner.com
  2. Beat Signer - Department of Computer Science - [email protected] 2

    March 6, 2019 Relational Database Design

    ▪ There are two major relational database design approaches
    ▪ Top-down design
      ▪ develop a conceptual model (e.g. ER model)
      ▪ reduction (mapping) of the conceptual model to relation schemas
      ▪ use normalisation as a validation technique to check the quality of the resulting relation schemas
        - a relational database schema resulting from the mapping of a good ER model (with the correct entity sets) normally requires no further normalisation
    ▪ Bottom-up design
      ▪ design by decomposition
      ▪ use normalisation to iteratively create (decompose) a set of relations starting with a single relation
  3. Relational Database Design ...

    ▪ A relation schema might contain certain dependencies in which case it should be decomposed (normalised) into multiple smaller relation schemas
      ▪ this normalisation process is based on functional dependencies and multivalued dependencies
    ▪ Sometimes multiple relations resulting from an ER to relation schema reduction might be merged to save some join query operations
      ▪ we have to ensure that the resulting larger relation schema does not introduce new undesirable dependencies
  4. Reduction

    ▪ A conceptual ER model can be reduced to a set of relation schemas (relational database schema)
    ▪ The quality of the resulting set of relation schemas depends on the quality of the original ER design (there is no magic)
    ▪ In the following we discuss the reduction of the different ER model concepts introduced earlier
  5. Strong Entity Sets

    ▪ A strong entity set E with only simple attributes a1, ..., an is mapped to a relation R with attributes a1, ..., an
      ▪ the primary key of the entity set E becomes the primary key of the relation R

    [ER diagram: entity set Employees with the attributes id and name]

    Relation schema: Employee (id, name)

    Relation employee = (Employee):

    id    name
    1234  Beat Signer
    1576  Lode Hoste
    3212  Sandra Trullemans
    ...   ...
  6. Composite Attributes

    ▪ For each component of a composite attribute, we create an attribute ai in the relation R
      ▪ no special attribute is created for the composite attribute itself

    [ER diagram: entity set Employees with the attributes id and name and the composite attribute address (street, city)]

    Employee (id, name, street, city)
  7. Multivalued Attributes

    ▪ Multivalued attributes are treated separately since a relation should only contain attributes with atomic values
      ▪ for each multivalued attribute ai of an entity set E, we create a new relation S containing the attribute ai as well as the primary key attributes of the relation R that is created for the entity set E
        - define a foreign key constraint to the original relation R

    [ER diagram: entity set Employees with the attributes id and name and the multivalued attribute phone]

    Phones (id, phone)

    Relation phones = (Phones):

    id    phone
    1234  032 2 612 1337
    1234  032 2 612 3123
    1576  032 2 623 8765
    ...   ...
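The mapping of a multivalued attribute can be tried out with Python's built-in sqlite3 module. The following is only a minimal sketch: the composite primary key (id, phone) of Phones and the helper name try_insert_phone are assumptions made for illustration and are not given on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per phone number; the key (id, phone) is an assumption.
    CREATE TABLE Phones (
        id    INTEGER NOT NULL REFERENCES Employee(id),
        phone TEXT NOT NULL,
        PRIMARY KEY (id, phone)
    );
""")

conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1234, "Beat Signer"), (1576, "Lode Hoste")],
)
conn.executemany(
    "INSERT INTO Phones VALUES (?, ?)",
    [(1234, "032 2 612 1337"), (1234, "032 2 612 3123")],
)


def try_insert_phone(connection, employee_id, phone):
    """Hypothetical helper: returns False if a constraint rejects the row."""
    try:
        connection.execute(
            "INSERT INTO Phones VALUES (?, ?)", (employee_id, phone)
        )
    except sqlite3.IntegrityError:
        return False
    return True


def phones_of(connection, employee_id):
    """Return all phone numbers stored for one employee, sorted."""
    cursor = connection.execute(
        "SELECT phone FROM Phones WHERE id = ? ORDER BY phone", (employee_id,)
    )
    return [row[0] for row in cursor]
```

A phone number for an unknown employee id is rejected by the foreign key constraint, while an employee can have any number of phone rows.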
  8. Weak Entity Sets

    ▪ A weak entity set E with attributes a1, ..., an is mapped to a relation R with attributes a1, ..., an combined with the primary key attributes b1, ..., bm of the identifying entity set F
      ▪ the primary key of R is defined by the primary key attributes of the identifying entity set F combined with the discriminator of E
      ▪ a foreign key constraint is defined from the attributes b1, ..., bm to the primary key of the relation that is created for the identifying entity set F
  9. Weak Entity Sets ...

    Seat (id, number, colour)

    Relation seat = (Seat):

    id   number  colour
    1    1       red
    1    20      black
    4    1       black
    ...  ...     ...
  10. Relationship Sets

    ▪ A relationship set over the entity sets E1, ..., En with the optional descriptive attributes b1, ..., bm is mapped to a relation R with the primary key attributes of E1, ..., En combined with b1, ..., bm
    ▪ The primary key of relation R is defined as follows
      ▪ binary many-to-many relationship
        - union of all primary key attributes of E1 and E2
      ▪ binary one-to-one relationship
        - choose the primary key of E1 or E2
      ▪ binary one-to-many or many-to-one relationship
        - choose the primary key of the entity set whose entity instances can only participate once (0..1 or 1..1)
  11. Relationship Sets ...

    ▪ The primary key of relation R is defined as follows ...
      ▪ n-ary relationship without cardinality constraints
        - union of all primary key attributes of E1, ..., En
      ▪ n-ary relationship with one 0..1 or 1..1 cardinality constraint over the entity set Ej
        - union of all primary key attributes of E1, ..., En, except the primary key of Ej
        - note that we allow only one such 0..1 or 1..1 cardinality constraint for n-ary relationships
    ▪ A foreign key constraint is defined for each set of primary key attributes (provided by the entity set Ei) to the primary key of the corresponding relation that is defined for Ei
  12. Relationship Sets ...

    [ER diagram: relationship set LocatedAt with the descriptive attribute duration between Employees (id, name) and Offices (name, address, size); both cardinalities are 0..*]

    LocatedAt (id, name, address, duration)

    Relation locatedAt = (LocatedAt):

    id    name    address      duration
    1234  10F721  Pleinlaan 2  1
    1576  10F733  Pleinlaan 2  1
    ...   ...     ...          ...
  13. Relationship Sets ...

    [ER diagram: relationship set LocatedAt with the descriptive attribute duration between Employees (id, name) and Offices (name, address, size); the cardinalities are 1..1 and 0..*]

    LocatedAt (id, name, address, duration)

    Relation locatedAt = (LocatedAt):

    id    name    address      duration
    1234  10F721  Pleinlaan 2  1
    1576  10F733  Pleinlaan 2  1
    ...   ...     ...          ...
  14. Weak Entity Existence Relationship

    ▪ The special relationship set from a weak entity set to its defining entity set is always a many-to-one relationship
      ▪ the special weak entity existence relationship does not have to be mapped to a separate relation since it is already covered by the relation that is created for the weak entity set
        - e.g. potential Offers relation schema already covered by Seat relation schema

    Seat (id, number, colour)
  15. Combination of Schemas

    ▪ Relations resulting from the mapping of a relationship set with a total participation constraint can be integrated with the relation over which the constraint is defined
      ▪ key of the relation with the constraint (1..1) used as primary key

    [ER diagram: relationship set LocatedAt with the descriptive attribute duration between Employees (id, name) and Offices (name, address, size); the cardinalities are 1..1 and 0..*]

    Employee (id, employeeName, duration, name, address)
    Office (name, address, size)
  16. Specialisation and Generalisation

    ▪ Create a new relation R for each entity subset
      ▪ combine the attributes of the entity set with the primary key attributes of the superclass

    [ER diagram: Persons (id, name) with the ISA subsets Students (studentID) and Teachers (teaching hours)]

    Person (id, name)
    Student (id, studentID)
    Teacher (id, teachingHours)
  17. Specialisation and Generalisation ...

    ▪ For a disjoint and total ISA constraint we might omit the separate superclass relation
      ▪ saves some join operations but it is no longer possible to define a foreign key constraint on the id attribute (now at two places)

    [ER diagram: Persons (id, name) with the disjoint ISA subsets Students (studentID) and Teachers (teaching hours)]

    Student (id, name, studentID)
    Teacher (id, name, teachingHours)
  18. Aggregations

    ▪ Like the regular relationship set mapping
      ▪ note that the name attribute is the one from the Companies entity set

    [ER diagram: relationship set WorksFor over Employees (id, name), Companies (name, address) and Durations (from, to), aggregated and related to Managers (mId, name) through the relationship set Manages]

    Manages (id, from, to, name, address, mId)
  19. Relational Database Design

    ▪ The goal of relational database design is to create a set of relation schemas that
      ▪ can be used to store information without unnecessary redundancy
      ▪ allow us to easily retrieve information
    ▪ The quality of the set of schemas resulting from a reduction (top-down design) depends on how good the original ER design was
    ▪ In a design by decomposition approach (bottom-up design) we need a way to reduce any redundancy via a decomposition process
      ▪ split large relation schemas into multiple smaller relation schemas
  20. Update Anomalies

    Order (id, name, street, cdName, price)

    Relation order = (Order):

    id  name             street            cdName              price
    1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
    2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
    53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
    5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50

    ▪ Insertion anomaly
      ▪ redundant information has to be kept consistent
        - e.g. insertion of a new order for an already existing CD
      ▪ information about a CD can only be inserted if there is an order or we have to populate the customer information (i.e. name and street) with null values
  21. Update Anomalies ...

    id  name             street            cdName              price
    1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
    2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
    53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
    5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50

    ▪ Modification anomaly
      ▪ if we want to modify information about a particular CD, we have to ensure that the information is updated in all redundant entries
        - e.g. modification of the price of the CD named "Falling into Place"
    ▪ Deletion anomaly
      ▪ if we delete a customer who is the only buyer of a specific CD, we also lose the information about that specific CD
        - e.g. deletion of the customer "Albert Einstein"
  22. Normalisation

    ▪ Normalisation is a formal method to analyse relation schemas based on their keys, functional dependencies (FD) as well as multivalued dependencies (MVD)
      ▪ remove redundancy
      ▪ prevent certain update anomalies
        - insertion, modification and deletion
    ▪ There exists a set of rules to check if a relation is in a specific normal form

    [Figure: normal forms ordered from weakest to strongest: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), Fifth Normal Form (5NF); 1NF to 3NF are the original normal forms described by Codd]
  23. Normalisation ...

    ▪ A relation that does not conform to a certain degree of normalisation can be decomposed (lossless-join decomposition) into multiple relations that are in the desired normal form
      ▪ can be done automatically
    ▪ Normalisation is often done in a stepwise manner
      ▪ a higher normal form means a more restricted format and fewer problems with update anomalies
      ▪ note that only the first normal form (1NF) is mandatory for the relational model and all the other normal forms are optional
  24. First Normal Form (1NF)

    ▪ As we have seen earlier, the ER model supports complex attributes
      ▪ composite attributes
      ▪ multivalued attributes
    ▪ In the reduction process, we remove this substructure from attributes to create a relational model with atomic attribute values only
    ▪ A relation schema R is in first normal form (1NF) if the domains D1, ..., Dn of all attributes a1, ..., an of R are atomic
      ▪ no composite attributes or attributes with a set of values
      ▪ the intersection of each row and column contains one and only one value
  25. Functional Dependencies

    TeacherDept (teacherID, teacher, salary, department, building, budget)

    ▪ In this example, there are various sets of attributes that uniquely identify a set of other attributes
      ▪ teacherID → teacher
      ▪ teacherID → salary
      ▪ teacherID → {teacher, salary}
      ▪ {teacherID, teacher} → {salary}
      ▪ department → {building, budget}
      ▪ ...
    ▪ We say that there is a functional dependency (→) between these two sets of attributes
      ▪ a functional dependency should always hold on a relation schema and not just on a particular relation instance
  26. Functional Dependencies ...

    ▪ A functional dependency can be used to express constraints (generalisation of keys) over a set of attributes (determinant) that uniquely identify a set of other attributes (dependent attributes)
    ▪ For a relation schema R with α ⊆ R and β ⊆ R the functional dependency α → β holds on R, if for any r(R)
      ▪ ∀ t1, t2 ∈ r(R): t1[α] = t2[α] ⇒ t1[β] = t2[β]
    ▪ Note that any K ⊆ R is a superkey if K → R
      ▪ we can use functional dependencies to check whether K is a superkey
  27. Functional Dependencies ...

    Relation r(R):

    A   B   C   D   E
    a1  b1  c1  d1  e1
    a2  b2  c2  d1  e2
    a2  b2  c3  d1  e3
    a3  b2  c4  d3  e3

    ▪ The relation r(R) satisfies the following set F of functional dependencies
      ▪ A → B
      ▪ C → E
      ▪ ...
    ▪ A functional dependency α → β is trivial if β ⊆ α
      ▪ trivial dependencies are satisfied by all relations
    ▪ A full functional dependency has a minimal determinant
      ▪ if the determinant is not minimal, we talk about a partial functional dependency (e.g. AD → B in the example)
    ▪ For a relation r(R) with α → β and β → γ we say that γ is transitively dependent on α via β
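Whether a functional dependency is satisfied by a particular relation instance can be checked mechanically. The sketch below uses the example relation r(R) from this slide; the function name fd_holds and the representation of tuples as dictionaries are illustrative choices. Note that such a check only tells us something about one instance, whereas a functional dependency is meant to hold on the relation schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is satisfied by the given rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[b] for b in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


COLUMNS = ("A", "B", "C", "D", "E")
r = [
    dict(zip(COLUMNS, values))
    for values in [
        ("a1", "b1", "c1", "d1", "e1"),
        ("a2", "b2", "c2", "d1", "e2"),
        ("a2", "b2", "c3", "d1", "e3"),
        ("a3", "b2", "c4", "d3", "e3"),
    ]
]

print(fd_holds(r, ["A"], ["B"]))       # True
print(fd_holds(r, ["C"], ["E"]))       # True
print(fd_holds(r, ["A", "D"], ["B"]))  # True (partial, since A -> B already holds)
print(fd_holds(r, ["B"], ["A"]))       # False (b2 occurs with a2 and a3)
```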
  28. Closure of Attributes

    ▪ For a given relation schema R, a number of functional dependencies and a set of attributes α ⊆ R, the closure α+ is defined by all attributes Bi such that α → Bi
    ▪ Computing the closure

        Initialise the set s with the attributes of α
        Repeat until the set s does not grow anymore {
          if there is a functional dependency β → γ and β is in s, then add γ to the set s
        }

    ▪ If the closure α+ contains all attributes of the relation schema R, then the attributes α form a superkey of R
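The closure computation described on this slide can be sketched in a few lines of Python. The representation of a functional dependency as a pair of sets and the name closure are assumptions for illustration; the dependencies are the ones listed earlier for the TeacherDept schema.

```python
def closure(attributes, fds):
    """Compute the closure of a set of attributes under the given FDs.

    Each FD is a pair (lhs, rhs) of attribute sets.
    """
    result = set(attributes)
    grew = True
    while grew:  # repeat until the set does not grow anymore
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


SCHEMA = {"teacherID", "teacher", "salary", "department", "building", "budget"}
FDS = [
    ({"teacherID"}, {"teacher", "salary"}),
    ({"department"}, {"building", "budget"}),
]

print(sorted(closure({"teacherID"}, FDS)))
# ['salary', 'teacher', 'teacherID']
print(closure({"teacherID", "department"}, FDS) == SCHEMA)
# True, so {teacherID, department} is a superkey under these FDs
```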
  29. Computation of Superkeys

    ▪ We can test whether α is a superkey for a given relation schema R by checking whether the closure α+ contains all attributes of R
    ▪ We can further use this approach to find all the superkeys for a relation schema R and a given set of functional dependencies
      ▪ check for each set α ⊆ R of attributes whether the closure α+ contains all attributes
      ▪ the search process can be slightly optimised by starting with the smallest possible subsets
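The search over attribute subsets might look as follows. This is a sketch under the assumption that only the two TeacherDept dependencies shown below hold; the names closure, is_superkey and candidate_keys are illustrative. Starting with the smallest subsets means that every superkey found is minimal, i.e. a candidate key, and all of its supersets are superkeys as well.

```python
from itertools import combinations


def closure(attributes, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def is_superkey(attributes, schema, fds):
    """A set of attributes is a superkey if its closure covers the schema."""
    return closure(attributes, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Enumerate subsets by increasing size and keep the minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # superset of a known key: a superkey, but not minimal
            if is_superkey(candidate, schema, fds):
                keys.append(candidate)
    return keys


SCHEMA = {"teacherID", "teacher", "salary", "department", "building", "budget"}
FDS = [
    ({"teacherID"}, {"teacher", "salary"}),
    ({"department"}, {"building", "budget"}),
]

print(candidate_keys(SCHEMA, FDS))  # a single key: teacherID and department
```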
  30. Functional Dependency Inference

    ▪ For a given set F of functional dependencies we can derive new functional dependencies based on a set of axioms to compute the closure F+ of F
      ▪ the closure F+ includes all functional dependencies that are logically implied by F
    ▪ Three rules (Armstrong's axioms) can be used to compute F+
      ▪ reflexivity
        - for a given set of attributes α and β ⊆ α, α → β holds (see trivial dependency)
      ▪ augmentation
        - for a given set of attributes γ, if α → β then γα → γβ holds
      ▪ transitivity
        - if α → β and β → γ, then α → γ holds
  31. Functional Dependency Inference ...

    ▪ Armstrong's axioms are sound (produce only elements of F+) and complete (produce all elements in F+)
      ▪ since it may take a lot of time to compute F+ with Armstrong's axioms only, there exist some additional rules
    ▪ Decomposition
      ▪ if α → βγ, then α → β and α → γ hold
    ▪ Union
      ▪ if α → β and α → γ, then α → βγ holds
    ▪ Trivial dependency rules
      ▪ if α → β, then α → α ∪ β holds
      ▪ if α → β, then α → α ∩ β holds
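In practice we rarely enumerate F+ explicitly. A dependency α → β is in F+ exactly when β is contained in the attribute closure α+, so membership can be tested with the closure algorithm. The sketch below uses a small hypothetical set F = {A → B, B → C} that does not appear on the slides; the function names are illustrative.

```python
def closure(attributes, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs is logically implied by fds (is in F+)."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical example set F, chosen only to exercise the inference rules.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(implies(F, {"A"}, {"C"}))            # True (transitivity)
print(implies(F, {"A", "D"}, {"B", "D"}))  # True (augmentation with D)
print(implies(F, {"A"}, {"B", "C"}))       # True (union)
print(implies(F, {"A", "B"}, {"A"}))       # True (reflexivity)
print(implies(F, {"C"}, {"A"}))            # False
```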
  32. Second Normal Form (2NF)

    ▪ A relation schema R is in second normal form (2NF) if it is in 1NF and if there exists no non-prime attribute that is functionally dependent on a part of a candidate key
      ▪ every non-prime attribute has to be fully functionally dependent on a candidate key
      ▪ a non-prime attribute is an attribute that is not part of any candidate key
      ▪ the Lecturer relation schema shown in the example is not in 2NF since the office attribute functionally depends on the teacher attribute

    Lecturer (teacher, course, office)

    Relation lecturer = (Lecturer):

    teacher            course     office
    Beat Signer        Databases  10G731d
    Beat Signer        WIS        10G731d
    Lode Hoste         Databases  10F716
    Lode Hoste         ATIS       10F716
    Sandra Trullemans  WIS        10G731e
  33. Second Normal Form (2NF) ...

    ▪ 2NF normalisation process
      ▪ remove any partially dependent attributes from the relation and put them in a new relation together with their determinant
    ▪ The original Lecturer relation can be losslessly decomposed into two relations which are both in 2NF
      ▪ relations with single attribute keys are automatically in 2NF

    Lecturer (teacher, office)

    Relation lecturer = (Lecturer):

    teacher            office
    Beat Signer        10G731d
    Lode Hoste         10F716
    Sandra Trullemans  10G731e

    Course (teacher, course)

    Relation course = (Course):

    teacher            course
    Beat Signer        Databases
    Beat Signer        WIS
    Lode Hoste         Databases
    Lode Hoste         ATIS
    Sandra Trullemans  WIS
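That this decomposition loses no information can be verified on the example data by projecting the original Lecturer relation and joining the projections again. The following sketch represents a tuple as a frozenset of (attribute, value) pairs; the helper names relation, project and natural_join are illustrative. For contrast it also tries a different split, over (teacher, course) and (course, office), which is not proposed on the slide and produces spurious tuples.

```python
def relation(columns, rows):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(columns, row)) for row in rows}


def project(rel, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(left, right):
    """Combine all pairs of tuples that agree on their common attributes."""
    result = set()
    for l in left:
        for r in right:
            merged = dict(l)
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                result.add(frozenset(merged.items()))
    return result


lecturer = relation(
    ("teacher", "course", "office"),
    [
        ("Beat Signer", "Databases", "10G731d"),
        ("Beat Signer", "WIS", "10G731d"),
        ("Lode Hoste", "Databases", "10F716"),
        ("Lode Hoste", "ATIS", "10F716"),
        ("Sandra Trullemans", "WIS", "10G731e"),
    ],
)

# Decomposition from the slide: (teacher, office) and (teacher, course).
rejoined = natural_join(
    project(lecturer, {"teacher", "office"}),
    project(lecturer, {"teacher", "course"}),
)
print(rejoined == lecturer)  # True

# A different split, used here only as a counterexample.
lossy = natural_join(
    project(lecturer, {"teacher", "course"}),
    project(lecturer, {"course", "office"}),
)
print(len(lecturer), len(lossy))  # 5 9
```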
  34. Lossless Decomposition

    ▪ Given a relation schema R and the two decompositions R1 and R2 of R, we say that R1 and R2 form a lossless decomposition if πR1(r) ⋈ πR2(r) = r
    ▪ Let F be a set of functional dependencies on R
      ▪ R1 and R2 form a lossless decomposition of R if either R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+
        - this means that R1 ∩ R2 is a superkey of R1 or R2
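The condition based on functional dependencies can be tested with the attribute closure, without looking at any data. The sketch below assumes that teacher → office is the only given dependency of the earlier Lecturer example; the function name satisfies_lossless_condition is illustrative.

```python
def closure(attributes, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def satisfies_lossless_condition(r1, r2, fds):
    """Check whether the common attributes of r1 and r2 determine r1 or r2."""
    determined = closure(r1 & r2, fds)
    return r1 <= determined or r2 <= determined


# Assumed dependency set for the Lecturer example.
F = [({"teacher"}, {"office"})]

print(satisfies_lossless_condition({"teacher", "office"}, {"teacher", "course"}, F))
# True: the common attribute teacher is a superkey of (teacher, office)
print(satisfies_lossless_condition({"teacher", "course"}, {"course", "office"}, F))
# False: course alone determines neither schema
```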
  35. Third Normal Form (3NF)

    ▪ A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute is transitively dependent on a candidate key, i.e. for all functional dependencies α → β in F+ one of the following has to hold
      ▪ α → β is a trivial functional dependency (i.e. β ⊆ α)
      ▪ α is a superkey of R
      ▪ each attribute Ai in β − α is contained in a candidate key of R
        - note that each Ai can be in different candidate keys
    ▪ Each non-key attribute "must provide a fact about the key, the whole key, and nothing but the key" [Bill Kent]
  36. Third Normal Form (3NF) ...

    Prize (award, year, winner, birthdate)

    Relation prize = (Prize):

    award              year  winner         birthdate
    ACM Turing Award   1981  Edgar F. Codd  23.08.1923
    Nobel Peace Prize  1979  Mother Teresa  26.08.1910
    ACM Turing Award   1984  Niklaus Wirth  15.02.1934
    Nobel Peace Prize  1984  Desmond Tutu   07.10.1931

    ▪ The Prize relation example schema is in 2NF
    ▪ The Prize relation schema is not in 3NF since birthdate is functionally dependent on winner and none of the three conditions holds for this functional dependency
      ▪ birthdate is transitively dependent on the key (award, year)
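The three 3NF conditions can be checked mechanically once the candidate keys are known. The sketch below assumes the dependencies (award, year) → winner, which follows from the key stated on the slide, and winner → birthdate. As a simplification it inspects only the given dependencies in F rather than all of F+, and all function names are illustrative.

```python
from itertools import combinations


def closure(attributes, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


def is_3nf(schema, fds):
    """Check the three 3NF conditions for every given dependency."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= schema
        only_prime = (rhs - lhs) <= prime
        if not (trivial or superkey or only_prime):
            return False
    return True


PRIZE = {"award", "year", "winner", "birthdate"}
F = [({"award", "year"}, {"winner"}), ({"winner"}, {"birthdate"})]

print(candidate_keys(PRIZE, F))  # one key: award and year
print(is_3nf(PRIZE, F))          # False, because of winner -> birthdate
```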
  37. Third Normal Form (3NF) ...

    ▪ 3NF normalisation process
      ▪ remove any transitively dependent attributes from the relation and place them in a new relation together with their determinant
    ▪ Decomposition of the Prize relation schema into two 3NF relation schemas

    Prize (award, year, winner)

    Relation prize = (Prize):

    award              year  winner
    ACM Turing Award   1981  Edgar F. Codd
    Nobel Peace Prize  1979  Mother Teresa
    ACM Turing Award   1984  Niklaus Wirth
    Nobel Peace Prize  1984  Desmond Tutu

    Birthdate (winner, birthdate)

    Relation bdate = (Birthdate):

    winner         birthdate
    Edgar F. Codd  23.08.1923
    Mother Teresa  26.08.1910
    Niklaus Wirth  15.02.1934
    Desmond Tutu   07.10.1931
  38. Boyce-Codd Normal Form (BCNF)

    ▪ The Boyce-Codd normal form is a stronger form of 3NF
    ▪ A relation schema R is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and if every determinant is a candidate key, i.e. for all functional dependencies α → β in F+ one of the following holds
      ▪ α → β is a trivial functional dependency (i.e. β ⊆ α)
      ▪ α is a superkey of R
    ▪ Any relation that is in BCNF is also in 3NF since the BCNF conditions are equivalent to the first two 3NF conditions
  39. BCNF Decomposition

    ▪ If a relation R is not in BCNF, then there exists at least one nontrivial functional dependency α → β where α is not a superkey of R
      ▪ the relation R can then be decomposed into the two relation schemas R1(α ∪ β) and R2(R − (β − α))
    ▪ We can for example apply the BCNF decomposition to the previous Prize relation schema example with the functional dependency winner → birthdate
      ▪ α ∪ β = (winner, birthdate)
      ▪ (R − (β − α)) = (award, year, winner)
    ▪ Further details about the algorithms for BCNF and 3NF decomposition can be found in the course book
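A single decomposition step can be sketched as follows. This is not the full algorithm from the course book: it performs one split only and searches for a violation among the given dependencies, assumed here to be (award, year) → winner and winner → birthdate. The function names are illustrative.

```python
def closure(attributes, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attributes)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                grew = True
    return result


def bcnf_violation(schema, fds):
    """Return a nontrivial FD whose determinant is not a superkey, or None."""
    for lhs, rhs in fds:
        if not rhs <= lhs and not closure(lhs, fds) >= schema:
            return lhs, rhs
    return None


def decompose_once(schema, fds):
    """Split the schema into (lhs ∪ rhs) and (schema − (rhs − lhs))."""
    violation = bcnf_violation(schema, fds)
    if violation is None:
        return [set(schema)]
    lhs, rhs = violation
    return [lhs | rhs, schema - (rhs - lhs)]


PRIZE = {"award", "year", "winner", "birthdate"}
F = [({"award", "year"}, {"winner"}), ({"winner"}, {"birthdate"})]

for part in decompose_once(PRIZE, F):
    print(sorted(part))
# ['birthdate', 'winner']
# ['award', 'winner', 'year']
```

A complete decomposition would repeat this step on the resulting schemas, using the dependencies of F+ that apply to each of them.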
  40. Multivalued Dependencies

    ▪ Some relation schemas that are in BCNF may still contain redundant information
    ▪ The fourth normal form (4NF) deals with some of these problems based on multivalued dependencies
      ▪ for a given relation schema R with α ⊆ R and β ⊆ R the multivalued dependency α ↠ β holds if for all pairs of tuples t1 and t2 in r(R) (with t1[α] = t2[α]) there exist tuples t3 and t4 in r(R) such that
        - t1[α] = t2[α] = t3[α] = t4[α]
        - t3[β] = t1[β]
        - t3[R − β] = t2[R − β]
        - t4[β] = t2[β]
        - t4[R − β] = t1[R − β]

        α          β              R − α − β
    t1  a1 ... ai  ai+1 ... aj    aj+1 ... an
    t2  a1 ... ai  bi+1 ... bj    bj+1 ... bn
    t3  a1 ... ai  ai+1 ... aj    bj+1 ... bn
    t4  a1 ... ai  bi+1 ... bj    aj+1 ... an
  41. Multivalued Dependencies ...

    ▪ Every functional dependency is also a multivalued dependency, e.g. if α → β then α ↠ β
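The definition of a multivalued dependency can be turned into a check on a relation instance. The sketch below uses a hypothetical relation over (id, course, phone) that does not appear on the slides: an employee's courses and phone numbers are independent of each other, so id ↠ course should hold only if every combination of the two is stored. The function name mvd_holds is illustrative.

```python
def mvd_holds(rows, alpha, beta):
    """Return True if the multivalued dependency alpha ->> beta holds on rows."""
    tuples = {frozenset(row.items()) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes its beta values from t1 and everything else from t2.
                t3 = dict(t2)
                t3.update({b: t1[b] for b in beta})
                if frozenset(t3.items()) not in tuples:
                    return False
    # t4 needs no separate test: it is the t3 of the swapped pair (t2, t1).
    return True


# Hypothetical instance, made up for illustration only.
rows = [
    {"id": 1234, "course": "Databases", "phone": "032 2 612 1337"},
    {"id": 1234, "course": "Databases", "phone": "032 2 612 3123"},
    {"id": 1234, "course": "WIS", "phone": "032 2 612 1337"},
    {"id": 1234, "course": "WIS", "phone": "032 2 612 3123"},
    {"id": 1576, "course": "ATIS", "phone": "032 2 623 8765"},
]

print(mvd_holds(rows, ["id"], ["course"]))       # True
print(mvd_holds(rows[:-2], ["id"], ["course"]))  # False, one combination is missing
```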
  42. Fourth Normal Form (4NF)

    ▪ A relation schema R is in fourth normal form (4NF) if it is in BCNF and if any non-trivial multivalued dependency is a dependency on a candidate key, i.e. for all multivalued dependencies α ↠ β in D+ one of the following has to hold
      ▪ α ↠ β is a trivial multivalued dependency (i.e. β ⊆ α or α ∪ β = R)
      ▪ α is a superkey of R
    ▪ Note that the fourth normal form is very similar to BCNF except that we use multivalued dependencies
    ▪ 4NF normalisation process
      ▪ remove any multivalued attributes from the relation and place them in a new relation together with their determinant
  43. Fifth Normal Form (5NF)

    ▪ There are some forms of constraints called join dependencies that generalise multivalued dependencies
      ▪ leads to the project-join normal form or fifth normal form (5NF)
      ▪ not discussed in detail in this course
  44. Normalisation Summary

    ▪ Relations in higher normal forms are less vulnerable to update anomalies
      ▪ generally it is recommended that relations are at least in 3NF

    [Figure: normalisation steps from the weakest to the strongest normal form]

    Unnormalised (UN)
      → First Normal Form (1NF): remove repeating groups
      → Second Normal Form (2NF): remove partial dependencies
      → Third Normal Form (3NF): remove transitive dependencies
      → Boyce-Codd Normal Form (BCNF): every determinant has to be a candidate key
      → Fourth Normal Form (4NF): remove multivalued dependencies
      → Fifth Normal Form (5NF): remove join dependencies
  45. Denormalisation

    ▪ Sometimes a database designer decides to store information in a redundant way to save join operations and improve the performance
      ▪ may result in additional work for insert, update and delete operations
    ▪ An alternative is to keep the normalised schema and introduce additional materialised views
  46. Homework

    ▪ Study the following chapters of the Database System Concepts book
      ▪ chapter 7
        - sections 7.6 and 7.8.6
        - Reduction to Relation Schemas
      ▪ chapter 8
        - sections 8.1-8.9
        - Relational Database Design
  47. Exercise 4

    ▪ Relational algebra
    ▪ Relational database design
      ▪ ER to relational model reduction
  48. References

    ▪ A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010