Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Relational Model and Relational Algebra - Lecture 3 - Introduction to Databases (1007156ANR)

Relational Model and Relational Algebra - Lecture 3 - Introduction to Databases (1007156ANR)

This lecture forms part of the course Introduction to Databases given at the Vrije Universiteit Brussel.

Beat Signer

March 01, 2019
Tweet

More Decks by Beat Signer

Other Decks in Education

Transcript

  1. 2 December 2005 Introduction to Databases Relational Model and Relational

    Algebra Prof. Beat Signer Department of Computer Science Vrije Universiteit Brussel beatsigner.com
  2. Beat Signer - Department of Computer Science - [email protected] 2

    February 27, 2019 Relational Model ▪ Theory for data management developed by Edgar F. Codd while working at IBM ▪ Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks, Communications of the ACM 13(6), June 1970 ▪ data independence - between logical and physical level ▪ set-based query language - relational query language ▪ normalisation - avoid redundancy ▪ IBM first did not implement the relational model in order to "protect" their IMS/DB revenues Edgar F. Codd
  3. Beat Signer - Department of Computer Science - [email protected] 3

    February 27, 2019 Relational Model … ▪ IBM's System R (1974) was a DBMS prototype implementing Codd's relational model ▪ first implementation of the Structured English Query Language (SEQUEL) - later renamed to Structured Query Language (SQL) ▪ The System R prototype finally led to the development of different commercial DBMSs including IBM's DB2, Oracle or Microsoft SQL Server
  4. Beat Signer - Department of Computer Science - [email protected] 4

    February 27, 2019 Relational Database ▪ A relational database consists of a number of tables ▪ each table row defines a relationship between a set of values ▪ There is an analogy between the concept of a table (collection of relationships) and the mathematical concept of a relation ▪ in the following we therefore talk about relations instead of tables ▪ → a relational database consists of a collection of relations customerID name street postcode city 1 Max Frisch Bahnhofstrasse 7 8001 Zurich 2 Eddy Merckx Pleinlaan 25 1050 Brussels ... ... ... ... ...
  5. Beat Signer - Department of Computer Science - [email protected] 5

    February 27, 2019 Relational Database ... ▪ Information is normally partitioned into different relations since the storage in a single relation would lead to ▪ a replication of information (redundancy) ▪ a large number of necessary null values ▪ While tables are used at the logical level, different storage structures can be used at the physical level ▪ data independence
  6. Beat Signer - Department of Computer Science - [email protected] 6

    February 27, 2019 Relation ▪ The column headers of a table are called attributes and for each attribute ai there is a set of permitted values called the domain Di of ai ▪ Given the domains D1 , D2 ,..., Dn , a relation r is defined as a subset of the cartesian product D1 D2 ...Dn name street ... city Max Frisch Bahnhofstrasse 7 ... Zurich ... ... ... ... D1 D2 Dn t1 tm a1 a2 an attributes tuples relation r degree cardinality
  7. Beat Signer - Department of Computer Science - [email protected] 7

    February 27, 2019 Relation ... ▪ A relation r is a set of n-ary tuples ti = (a1 , a2 ,..., an ) where each ai Di ▪ the number of attributes is called a relation's degree ▪ the number of tuples is called a relation's cardinality ▪ Since a relation is a set of tuples, the order of the tuples is irrelevant ▪ each tuple is distinctive (no duplicate tuples) ▪ The order of attributes is irrelevant ▪ The domain of each attribute has to be atomic ▪ all members of the domain have to be indivisible units ▪ a relation with only atomic values is normalised (first normal form)
  8. Beat Signer - Department of Computer Science - [email protected] 8

    February 27, 2019 Relation ... ▪ Multiple attributes can have the same domain Di ▪ we will see later how the domain types can be defined (in SQL) ▪ A tuple variable is a variable that stands for a tuple ▪ its domain is defined by the set of all tuples ▪ The special null value is part of any domain Di ▪ used to represent an unknown or non-existing value ▪ null values are not easy to handle (not the same as empty string) - avoid null values whenever possible
  9. Beat Signer - Department of Computer Science - [email protected] 9

    February 27, 2019 Example of a Relation ▪ Let us assume that we have the following attributes with their domains ▪ name = {Merckx, Frisch, Botta, ...} ▪ street = {Bahnhofstrasse, Pleinlaan, Via Nassa, ...} ▪ city = {Zurich, Brussels, Lugano, Paris, ...} ▪ Then ▪ r = {(Merckx, Pleinlaan, Brussels), (Frisch, Bahnhofstrasse, Zurich), (Botta, Via Nassa, Lugano), (Botta, Bahnhofstrasse, Lugano)} forms a relation over namestreetcity
  10. Beat Signer - Department of Computer Science - [email protected] 10

    February 27, 2019 Relational Database Example customerID name street postcode city 1 Max Frisch Bahnhofstrasse 7 8001 Zurich 2 Eddy Merckx Pleinlaan 25 1050 Brussels 5 Claude Debussy 12 Rue Louise 75008 Paris 53 Albert Einstein Bergstrasse 18 8037 Zurich 8 Max Frisch ETH Zentrum 8092 Zurich cdID name duration price year 1 Falling into Place 2007 17.90 2007 2 Carcassonne 3156 15.50 1993 customer cd
  11. Beat Signer - Department of Computer Science - [email protected] 11

    February 27, 2019 Relational Database Example ... orderID customerID cdID date amount status 1 53 2 13.02.2010 2 open 2 2 1 15.02.2010 1 delivered order supplierID name city 5 Max Frisch Zurich 2 Mario Botta Lugano supplier Customer (customerID, name, street, postcode, city) CD (cdID, name, duration, price, year) Order (orderId, customerID, cdID, date, amount, status) Supplier (supplierID, name, city) relational database schema
  12. Beat Signer - Department of Computer Science - [email protected] 12

    February 27, 2019 Database Schema ▪ The logical design of a database is defined by the database schema ▪ A database instance is a snapshot of the data stored in a database at a given time ▪ A relation schema R = (A1 , A2 ,..., An ) is defined by a list of attributes ▪ by convention, the name of a relation schema starts with an uppercase letter (in contrast to relations) - e.g. Customer (customerID, name, street, postcode, city) ▪ a relation can be defined based on the relation schema - e.g. customer = (Customer) ▪ a relation instance contains the relation's actual values (tuples)
  13. Beat Signer - Department of Computer Science - [email protected] 13

    February 27, 2019 Keys ▪ For the keys we use the same terminology as introduced earlier for the ER model ▪ K  R is a superkey of R if the values of K uniquely identify any tuple of possible relation instances r(R) ▪ t1 and t2  r and t1  t2 → t1 [K]  t2 [K] ▪ e.g. {cdID} and {cdID, name} are both superkeys of CD ▪ K is a candidate key if K is minimal ▪ e.g. {cdID} is a candidate key of CD ▪ The DB designer has to choose one of the candidate keys for each relation schema as primary key ▪ if possible, the value of a primary key should not change or only in very rare cases
  14. Beat Signer - Department of Computer Science - [email protected] 14

    February 27, 2019 Foreign Keys ▪ A relation schema may have one or multiple attributes that correspond to the primary key of another relation schema and are called foreign keys ▪ e.g. customerID is a foreign key of the order relation schema ▪ note that a foreign key of the referencing relation can only contain values that occur in the primary key of the referenced relation or it must be null → referential integrity customerID name street postcode city cdID name duration price year orderID customerID cdID date amount status customer cd order schema diagram
  15. Beat Signer - Department of Computer Science - [email protected] 15

    February 27, 2019 ▪ A query language is a language that is used to access (read) information stored in a database as well as to create, update and delete information (CRUD) ▪ There are different types of query languages ▪ procedural - e.g. relational algebra ▪ declarative - e.g. Structured Query Language (SQL) ▪ The relational algebra as well as the tuple and domain relational calculus form the basis for "higher level" query languages (e.g. SQL) Query Language
  16. Beat Signer - Department of Computer Science - [email protected] 16

    February 27, 2019 Relational Algebra ▪ The relational algebra consists of six fundamental operations ▪ unary operations ▪ selection: s ▪ projection: p ▪ rename: r ▪ binary operations ▪ union:  ▪ set difference: - ▪ cartesian product:  ▪ An operator takes one (unary) or two (binary) relations as input and returns a new single relation as output
  17. Beat Signer - Department of Computer Science - [email protected] 17

    February 27, 2019 Relational Algebra ... ▪ Based on the six fundamental operations, we can define additional relational operations (add no additional power) ▪ set intersection:  ▪ natural join: ⋈ ▪ theta join: q ▪ equijoin: q ▪ semijoin: ⋉ ▪ division:  ▪ assignment:  ▪ Extended operations with additional expressiveness ▪ generalised projection ▪ aggregate function ▪ outer join
  18. Beat Signer - Department of Computer Science - [email protected] 18

    February 27, 2019 Selection (s) ▪ Goal: Select specific tuples (rows) from a relation (table) ▪ sp (r) = {t | t  r and p(t)} ▪ p is a selection predicate - consists of terms pi connected by an and (), or () or not () ▪ a term pi has the form attributem operator attributen or constant - the available operators are: =, , >, <, ,  ▪ Examples ▪ "Find all tuples in the customer relation that are in the city of Zurich and have a postcode greater than 8010." ▪ scity = "Zurich"  postcode > 8010 (customer) customerID name street postcode city 53 Albert Einstein Bergstrasse 18 8037 Zurich 8 Max Frisch ETH Zentrum 8092 Zurich
  19. Beat Signer - Department of Computer Science - [email protected] 19

    February 27, 2019 Selection (s) ... ▪ Examples ... ▪ A selection predicate may contain a comparison between two attributes ▪ "Find all tuples in the cd relation with a value for the duration attribute that is equal to the year of release." ▪ sduration = year (cd) ▪ Attention: The selection operation in relational algebra has a different meaning than the SELECT statement used in SQL ▪ SELECT in SQL corresponds to a projection in relational algebra cdID name duration price year 1 Falling into Place 2007 17.90 2007
  20. Beat Signer - Department of Computer Science - [email protected] 20

    February 27, 2019 Projection (p) ▪ Goal: Select specific attributes (columns) from a relation ▪ pA1, A2,..., Am (r) ▪ returns a relation instance that only contains the columns for which an attribute Ai has been listed ▪ duplicate tuples are removed from the resulting relation (set) ▪ Example ▪ "Return the name and city of all tuples in the customer relation." ▪ pname,city (customer) name city Max Frisch Zurich Eddy Merckx Brussels Claude Debussy Paris Albert Einstein Zurich
  21. Beat Signer - Department of Computer Science - [email protected] 21

    February 27, 2019 Composition of Relational Operations ▪ Since the result of a relational operation is a new single relation instance, multiple operations can be combined ▪ Example ▪ "Find the names of all customers who live in Zurich." ▪ pname (scity = "Zurich" (customer)) name Max Frisch Albert Einstein
  22. Beat Signer - Department of Computer Science - [email protected] 22

    February 27, 2019 Union () ▪ Goal: Unify tuples from two relations ▪ r  s = {t | t  r or t  s} ▪ r and s must have the same degree (same number of attributes) ▪ the corresponding attribute domains must have a compatible type ▪ Example ▪ "Find the names of persons who are either customers or suppliers." ▪ pname (customer)  pname (supplier) name Max Frisch Eddy Merckx Claude Debussy Albert Einstein Mario Botta
  23. Beat Signer - Department of Computer Science - [email protected] 23

    February 27, 2019 Set Difference (-) ▪ Goal: Find tuples that are in one relation but not in another relation ▪ r - s = {t | t  r and t  s} ▪ r and s must have the same degree ▪ the corresponding attribute domains must have a compatible type ▪ finds tuples that are in a relation r but not in another relation s ▪ Example ▪ "Find the names of suppliers who are no customers." ▪ pname (supplier) - pname (customer) name Mario Botta
  24. Beat Signer - Department of Computer Science - [email protected] 24

    February 27, 2019 Cartesian Product () ▪ Goal: Combine information from any two relations ▪ r  s = {tu | t  r and u  s} ▪ the attribute names of r(R) and s(S) have to be distinct - attributes with the same name have to be renamed via the rename operator r ▪ Example ▪ pname (customer)  pcity (customer) name city Max Frisch Zurich Max Frisch Brussels Max Frisch Paris Eddy Merckx Zurich Eddy Merckx Brussels ... ...
  25. Beat Signer - Department of Computer Science - [email protected] 25

    February 27, 2019 Cartesian Product () ... ▪ Example ... ▪ "List the names of all customers with at least one order." ▪ pname ( scustomer.customerID = order.customerID (customer  order) ) ▪ Note that we will later see another operator (natural join) for the combination of two tables based on common attributes name Eddy Merckx Albert Einstein
  26. Beat Signer - Department of Computer Science - [email protected] 26

    February 27, 2019 Rename (r) ▪ Goal: Rename a relation or an attribute ▪ rx (E) ▪ renames the result of expression E to x ▪ rx(A1,A2,..., An) (E) ▪ renames the result of expression E to x and renames the attributes to A1 ,A2 ,...,An ▪ Example ▪ rperson(name,location) (pname,city (customer)) name location Max Frisch Zurich ... ... person
  27. Beat Signer - Department of Computer Science - [email protected] 27

    February 27, 2019 Rename (r) ... ▪ Example ▪ "Find the price of the most expensive CD in the cd relation." ▪ Pprice (cd) - Pcd.price (scd.price < d.price (cd  rd (cd))) ▪ We will later see another operator (aggregate function) that can be used to simplify this type of query for a max value price 17.90
  28. Beat Signer - Department of Computer Science - [email protected] 28

    February 27, 2019 Formal Definition ▪ A basic relational algebra expression is either a relation from the database or a constant relation ▪ The set of all relational algebra expressions is defined by ▪ E1  E2 ▪ E1 - E2 ▪ E1  E2 ▪ sp (E1 ) ▪ pA1,A2,..., Am (E1 ) ▪ rx(A1,A2,..., An) (E1 )
  29. Beat Signer - Department of Computer Science - [email protected] 29

    February 27, 2019 Set Intersection ▪ r  s = {t | t  r and t  s} ▪ r and s must have the same degree ▪ the corresponding attribute domains must have a compatible type ▪ Note that the set intersection can be implemented via a pair of set difference operations ▪ r  s = r - (r - s) ▪ Example ▪ "Find the names of people who are customers and suppliers." ▪ pname (customer)  pname (supplier) name Max Frisch
  30. Beat Signer - Department of Computer Science - [email protected] 30

    February 27, 2019 Natural Join (⋈) ▪ r ⋈ s = pR  S (sr.A1= s.A1  r.A2= s.A2 ... r.An= s.An (r  s)) ▪ where R  S = {A1 , A2 ,..., An } ▪ Keep all tuples of the cartesian product r  s that have the same value for the shared attributes of r(R) and s(S) ▪ the natural join is an associative operation: (r ⋈ s) ⋈ t = r ⋈ (s ⋈ t) ▪ Example ▪ "List the name and street of customers whose order is still open." ▪ pname, street (sstatus="open" (order ⋈ customer)) ▪ note that we could do the selection first (more efficient) - we will review this later when discussing query optimisation name street Albert Einstein Bergstrasse 18
  31. Beat Signer - Department of Computer Science - [email protected] 31

    February 27, 2019 Theta Join (q) and Equijoin ▪ r ⋈ p s = sp (r  s) ▪ where the predicate p is of the form r.Ai q s.Aj and q is a comparison operator (=, , >, <, , ) ▪ Example ▪ customer ⋈ postcode  duration (rcd(cdID,cdName,duration,price,year) (cd)) ▪ An equijoin (r ⋈ p s) is a special form of a theta join where only the equality operator (=) may be used ▪ note that a natural join is a equijoin over all common attributes of a relation r and s customerID name ... cdID cdName ... 2 Eddy Merckx ... 1 Falling into Place ... 2 Eddy Merckx ... 2 Carcassonne ...
  32. Beat Signer - Department of Computer Science - [email protected] 32

    February 27, 2019 Semijoin (⋉) ▪ r ⋉ s = pA1, A2,..., An (r ⋈ s) ▪ where A1 , A2 ,..., An are all the attributes of r ▪ Example ▪ "List the customers whose order is still open." ▪ customer ⋉ sstatus="open" (order) customerID Name Street Postcode city 53 Albert Einstein Bergstrasse 18 8037 Zurich
  33. Beat Signer - Department of Computer Science - [email protected] 33

    February 27, 2019 Division () ▪ r  s = {t | t  pR-S (r) and " u  s → tu  r} ▪ for the relations r(R) and s(S) and S  R ▪ suited for queries that include the phrase "for all" ▪ Example ▪ r  s A B C D a 3 a 1 a 1 a 3 a 3 b 3 b 1 a 1 c 4 b 3 b 2 b 2 c 4 a 1 r C D b 3 a 1 A B a 3 c 4 s r  s
  34. Beat Signer - Department of Computer Science - [email protected] 34

    February 27, 2019 Assigment ▪ variable  E ▪ Works like an assignment in programming languages ▪ Assignments must always be made to temporary relation variables ▪ no database modification ▪ Example ▪ temp1  pname, street (sstatus="open" (order ⋈ customer)) ▪ temp2  pname,street (scity = "Brussels" (customer)) ▪ result  temp1  temp2 name street Eddy Merckx Pleinlaan 25 Albert Einstein Bergstrasse 18
  35. Beat Signer - Department of Computer Science - [email protected] 35

    February 27, 2019 Generalised Projection (p) ▪ pF1, F2,..., Fm (E) ▪ generalisation of the project operation that supports the projection to arithmetic expressions F1 , F2 ,..., Fm ▪ Example ▪ "Show a list of the names of all CDs together with a reduced price (80% of the original price)." ▪ pname, price  0.8 as reducedPrice (cd) name reducedPrice Falling into Place 14.32 Carcassonne 12.72
  36. Beat Signer - Department of Computer Science - [email protected] 36

    February 27, 2019 Aggregate Function ▪ G1, G2,..., Gm G F1(A1), F2(A2),..., Fn(An) (E) ▪ takes a collection of values as input and returns a single value based on the following operations - min: minimum value - max: maximum value - sum: sum of values - count: number of value - avg: average value ▪ G 1 , G 2 ,..., G m lists the attributes on which to group - can be empty ▪ F i are the aggregate functions ▪ by default, duplicates are not eliminated before aggregating - can use the keyword distict to eliminate duplicates (e.g. sum-distinct)
  37. Beat Signer - Department of Computer Science - [email protected] 37

    February 27, 2019 Aggregate Function ... ▪ Examples ▪ "List the average amount of items in an order." ▪ G avg(amount) (order) ▪ "List the number of customers for each city." ▪ city G count(customerID) as customerNo (customer) avg(amount) 1.5 city customerNo Zurich 3 Brussels 1 Paris 1
  38. Beat Signer - Department of Computer Science - [email protected] 38

    February 27, 2019 Outer Joins ▪ The left outer join (⟕), right outer join (⟖) and full outer join (⟗) are extensions of the natural join operation ▪ Computes the natural join and then adds the tuples from one relation that do not match the other relation ▪ filled up with null values ▪ Example ▪ customer ⟕ order customerID name ... orderID cdID ... 53 Albert Einstein ... 1 2 ... 2 Eddy Merckx ... 2 1 ... 5 Claude Debussy ... null null ... ... ... ... ... ... ...
  39. Beat Signer - Department of Computer Science - [email protected] 39

    February 27, 2019 Outer Joins ... ▪ left outer join (⟕) ▪ keeps all tuples of the left relation ▪ non-matching parts filled up with null values ▪ right outer join (⟖) ▪ keeps all tuples of the right relation ▪ non-matching parts filled up with null values ▪ full outer join (⟗) ▪ keeps all tuples of both relations ▪ non-matching parts filled up with null values
  40. Beat Signer - Department of Computer Science - [email protected] 40

    February 27, 2019 Null Values ▪ A null value means that the value is unknown or non-existent ▪ Primary key attributes can never be null (entity integrity) ▪ The result of an arithmetic operation that involves a null value is null ▪ For grouping and duplicate elimination null values are treated like other values (two null values are the same) ▪ Null values are ignored in aggregate functions
  41. Beat Signer - Department of Computer Science - [email protected] 41

    February 27, 2019 Database Modifications ▪ A database may be modified by using one of the following operations ▪ insert ▪ update ▪ delete ▪ These operations are defined via the assignment operator
  42. Beat Signer - Department of Computer Science - [email protected] 42

    February 27, 2019 Insert ▪ r  r  E ▪ To insert new data into a relation we can ▪ explicitly specify the tuples to be inserted ▪ write a query and insert the result tuple set ▪ Example ▪ cd  cd  {(3, "Chromatic", 3012, 16.50, 1994)} cdID name duration price year 1 Falling into Place 2007 17.90 2007 2 Carcassonne 3156 15.50 1993 3 Chromatic 3012 16.50 1994
  43. Beat Signer - Department of Computer Science - [email protected] 43

    February 27, 2019 Update ▪ r  pF1, F2,..., Fn (r) ▪ Update specific values in a tuple r ▪ Fi is either the old value of r or a new value if the attribute has to be updated ▪ Example ▪ "Increase the price of all CDs by 10%." ▪ cd  pcdID, name, duration, price  1.1, year (cd) cdID name duration price year 1 Falling into Place 2007 19.69 2007 2 Carcassonne 3156 17.05 1993 3 Chromatic 3012 18.15 1994
  44. Beat Signer - Department of Computer Science - [email protected] 44

    February 27, 2019 Delete ▪ r  r - E ▪ A delete is expressed similar to a query except that the result is removed from the database ▪ Cannot remove single attributes ▪ Example ▪ "Remove the CD with the name 'Carcassonne' from the database." ▪ cd  cd - sname = "Carcassonne" (cd) cdID name duration price year 1 Falling into Place 2007 19.69 2007 3 Chromatic 3012 18.15 1994
  45. Beat Signer - Department of Computer Science - [email protected] 45

    February 27, 2019 Homework ▪ Study the following chapters of the Database System Concepts book ▪ chapter 2 - Relational Model ▪ chapter 6 - section 6.1 - Relational Algebra
  46. Beat Signer - Department of Computer Science - [email protected] 46

    February 27, 2019 Exercise 3 ▪ Relational model ▪ Relational algebra
  47. Beat Signer - Department of Computer Science - [email protected] 47

    February 27, 2019 References ▪ A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010 ▪ E. F. Codd, A Relational Model of Data for Large Shared Data Banks, Communications of the ACM 13(6), June 1970 ▪ https://doi.org/10.1145/362384.362685