February 27, 2019 Relational Model ▪ Theory for data management developed by Edgar F. Codd while working at IBM ▪ Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks, Communications of the ACM 13(6), June 1970 ▪ data independence - between logical and physical level ▪ set-based query language - relational query language ▪ normalisation - avoid redundancy ▪ IBM first did not implement the relational model in order to "protect" their IMS/DB revenues Edgar F. Codd
February 27, 2019 Relational Model … ▪ IBM's System R (1974) was a DBMS prototype implementing Codd's relational model ▪ first implementation of the Structured English Query Language (SEQUEL) - later renamed to Structured Query Language (SQL) ▪ The System R prototype finally led to the development of different commercial DBMSs including IBM's DB2, Oracle or Microsoft SQL Server
February 27, 2019 Relational Database ▪ A relational database consists of a number of tables ▪ each table row defines a relationship between a set of values ▪ There is an analogy between the concept of a table (collection of relationships) and the mathematical concept of a relation ▪ in the following we therefore talk about relations instead of tables ▪ → a relational database consists of a collection of relations customerID name street postcode city 1 Max Frisch Bahnhofstrasse 7 8001 Zurich 2 Eddy Merckx Pleinlaan 25 1050 Brussels ... ... ... ... ...
February 27, 2019 Relational Database ... ▪ Information is normally partitioned into different relations since the storage in a single relation would lead to ▪ a replication of information (redundancy) ▪ a large number of necessary null values ▪ While tables are used at the logical level, different storage structures can be used at the physical level ▪ data independence
February 27, 2019 Relation ▪ The column headers of a table are called attributes and for each attribute ai there is a set of permitted values called the domain Di of ai ▪ Given the domains D1 , D2 ,..., Dn , a relation r is defined as a subset of the cartesian product D1 D2 ...Dn name street ... city Max Frisch Bahnhofstrasse 7 ... Zurich ... ... ... ... D1 D2 Dn t1 tm a1 a2 an attributes tuples relation r degree cardinality
February 27, 2019 Relation ... ▪ A relation r is a set of n-ary tuples ti = (a1 , a2 ,..., an ) where each ai Di ▪ the number of attributes is called a relation's degree ▪ the number of tuples is called a relation's cardinality ▪ Since a relation is a set of tuples, the order of the tuples is irrelevant ▪ each tuple is distinctive (no duplicate tuples) ▪ The order of attributes is irrelevant ▪ The domain of each attribute has to be atomic ▪ all members of the domain have to be indivisible units ▪ a relation with only atomic values is normalised (first normal form)
February 27, 2019 Relation ... ▪ Multiple attributes can have the same domain Di ▪ we will see later how the domain types can be defined (in SQL) ▪ A tuple variable is a variable that stands for a tuple ▪ its domain is defined by the set of all tuples ▪ The special null value is part of any domain Di ▪ used to represent an unknown or non-existing value ▪ null values are not easy to handle (not the same as empty string) - avoid null values whenever possible
February 27, 2019 Example of a Relation ▪ Let us assume that we have the following attributes with their domains ▪ name = {Merckx, Frisch, Botta, ...} ▪ street = {Bahnhofstrasse, Pleinlaan, Via Nassa, ...} ▪ city = {Zurich, Brussels, Lugano, Paris, ...} ▪ Then ▪ r = {(Merckx, Pleinlaan, Brussels), (Frisch, Bahnhofstrasse, Zurich), (Botta, Via Nassa, Lugano), (Botta, Bahnhofstrasse, Lugano)} forms a relation over namestreetcity
February 27, 2019 Relational Database Example customerID name street postcode city 1 Max Frisch Bahnhofstrasse 7 8001 Zurich 2 Eddy Merckx Pleinlaan 25 1050 Brussels 5 Claude Debussy 12 Rue Louise 75008 Paris 53 Albert Einstein Bergstrasse 18 8037 Zurich 8 Max Frisch ETH Zentrum 8092 Zurich cdID name duration price year 1 Falling into Place 2007 17.90 2007 2 Carcassonne 3156 15.50 1993 customer cd
February 27, 2019 Database Schema ▪ The logical design of a database is defined by the database schema ▪ A database instance is a snapshot of the data stored in a database at a given time ▪ A relation schema R = (A1 , A2 ,..., An ) is defined by a list of attributes ▪ by convention, the name of a relation schema starts with an uppercase letter (in contrast to relations) - e.g. Customer (customerID, name, street, postcode, city) ▪ a relation can be defined based on the relation schema - e.g. customer = (Customer) ▪ a relation instance contains the relation's actual values (tuples)
February 27, 2019 Keys ▪ For the keys we use the same terminology as introduced earlier for the ER model ▪ K R is a superkey of R if the values of K uniquely identify any tuple of possible relation instances r(R) ▪ t1 and t2 r and t1 t2 → t1 [K] t2 [K] ▪ e.g. {cdID} and {cdID, name} are both superkeys of CD ▪ K is a candidate key if K is minimal ▪ e.g. {cdID} is a candidate key of CD ▪ The DB designer has to choose one of the candidate keys for each relation schema as primary key ▪ if possible, the value of a primary key should not change or only in very rare cases
February 27, 2019 Foreign Keys ▪ A relation schema may have one or multiple attributes that correspond to the primary key of another relation schema and are called foreign keys ▪ e.g. customerID is a foreign key of the order relation schema ▪ note that a foreign key of the referencing relation can only contain values that occur in the primary key of the referenced relation or it must be null → referential integrity customerID name street postcode city cdID name duration price year orderID customerID cdID date amount status customer cd order schema diagram
February 27, 2019 ▪ A query language is a language that is used to access (read) information stored in a database as well as to create, update and delete information (CRUD) ▪ There are different types of query languages ▪ procedural - e.g. relational algebra ▪ declarative - e.g. Structured Query Language (SQL) ▪ The relational algebra as well as the tuple and domain relational calculus form the basis for "higher level" query languages (e.g. SQL) Query Language
February 27, 2019 Relational Algebra ▪ The relational algebra consists of six fundamental operations ▪ unary operations ▪ selection: s ▪ projection: p ▪ rename: r ▪ binary operations ▪ union: ▪ set difference: - ▪ cartesian product: ▪ An operator takes one (unary) or two (binary) relations as input and returns a new single relation as output
February 27, 2019 Selection (s) ▪ Goal: Select specific tuples (rows) from a relation (table) ▪ sp (r) = {t | t r and p(t)} ▪ p is a selection predicate - consists of terms pi connected by an and (), or () or not () ▪ a term pi has the form attributem operator attributen or constant - the available operators are: =, , >, <, , ▪ Examples ▪ "Find all tuples in the customer relation that are in the city of Zurich and have a postcode greater than 8010." ▪ scity = "Zurich" postcode > 8010 (customer) customerID name street postcode city 53 Albert Einstein Bergstrasse 18 8037 Zurich 8 Max Frisch ETH Zentrum 8092 Zurich
February 27, 2019 Selection (s) ... ▪ Examples ... ▪ A selection predicate may contain a comparison between two attributes ▪ "Find all tuples in the cd relation with a value for the duration attribute that is equal to the year of release." ▪ sduration = year (cd) ▪ Attention: The selection operation in relational algebra has a different meaning than the SELECT statement used in SQL ▪ SELECT in SQL corresponds to a projection in relational algebra cdID name duration price year 1 Falling into Place 2007 17.90 2007
February 27, 2019 Projection (p) ▪ Goal: Select specific attributes (columns) from a relation ▪ pA1, A2,..., Am (r) ▪ returns a relation instance that only contains the columns for which an attribute Ai has been listed ▪ duplicate tuples are removed from the resulting relation (set) ▪ Example ▪ "Return the name and city of all tuples in the customer relation." ▪ pname,city (customer) name city Max Frisch Zurich Eddy Merckx Brussels Claude Debussy Paris Albert Einstein Zurich
February 27, 2019 Composition of Relational Operations ▪ Since the result of a relational operation is a new single relation instance, multiple operations can be combined ▪ Example ▪ "Find the names of all customers who live in Zurich." ▪ pname (scity = "Zurich" (customer)) name Max Frisch Albert Einstein
February 27, 2019 Union () ▪ Goal: Unify tuples from two relations ▪ r s = {t | t r or t s} ▪ r and s must have the same degree (same number of attributes) ▪ the corresponding attribute domains must have a compatible type ▪ Example ▪ "Find the names of persons who are either customers or suppliers." ▪ pname (customer) pname (supplier) name Max Frisch Eddy Merckx Claude Debussy Albert Einstein Mario Botta
February 27, 2019 Set Difference (-) ▪ Goal: Find tuples that are in one relation but not in another relation ▪ r - s = {t | t r and t s} ▪ r and s must have the same degree ▪ the corresponding attribute domains must have a compatible type ▪ finds tuples that are in a relation r but not in another relation s ▪ Example ▪ "Find the names of suppliers who are no customers." ▪ pname (supplier) - pname (customer) name Mario Botta
February 27, 2019 Cartesian Product () ▪ Goal: Combine information from any two relations ▪ r s = {tu | t r and u s} ▪ the attribute names of r(R) and s(S) have to be distinct - attributes with the same name have to be renamed via the rename operator r ▪ Example ▪ pname (customer) pcity (customer) name city Max Frisch Zurich Max Frisch Brussels Max Frisch Paris Eddy Merckx Zurich Eddy Merckx Brussels ... ...
February 27, 2019 Cartesian Product () ... ▪ Example ... ▪ "List the names of all customers with at least one order." ▪ pname ( scustomer.customerID = order.customerID (customer order) ) ▪ Note that we will later see another operator (natural join) for the combination of two tables based on common attributes name Eddy Merckx Albert Einstein
February 27, 2019 Rename (r) ▪ Goal: Rename a relation or an attribute ▪ rx (E) ▪ renames the result of expression E to x ▪ rx(A1,A2,..., An) (E) ▪ renames the result of expression E to x and renames the attributes to A1 ,A2 ,...,An ▪ Example ▪ rperson(name,location) (pname,city (customer)) name location Max Frisch Zurich ... ... person
February 27, 2019 Rename (r) ... ▪ Example ▪ "Find the price of the most expensive CD in the cd relation." ▪ Pprice (cd) - Pcd.price (scd.price < d.price (cd rd (cd))) ▪ We will later see another operator (aggregate function) that can be used to simplify this type of query for a max value price 17.90
February 27, 2019 Formal Definition ▪ A basic relational algebra expression is either a relation from the database or a constant relation ▪ The set of all relational algebra expressions is defined by ▪ E1 E2 ▪ E1 - E2 ▪ E1 E2 ▪ sp (E1 ) ▪ pA1,A2,..., Am (E1 ) ▪ rx(A1,A2,..., An) (E1 )
February 27, 2019 Set Intersection ▪ r s = {t | t r and t s} ▪ r and s must have the same degree ▪ the corresponding attribute domains must have a compatible type ▪ Note that the set intersection can be implemented via a pair of set difference operations ▪ r s = r - (r - s) ▪ Example ▪ "Find the names of people who are customers and suppliers." ▪ pname (customer) pname (supplier) name Max Frisch
February 27, 2019 Natural Join (⋈) ▪ r ⋈ s = pR S (sr.A1= s.A1 r.A2= s.A2 ... r.An= s.An (r s)) ▪ where R S = {A1 , A2 ,..., An } ▪ Keep all tuples of the cartesian product r s that have the same value for the shared attributes of r(R) and s(S) ▪ the natural join is an associative operation: (r ⋈ s) ⋈ t = r ⋈ (s ⋈ t) ▪ Example ▪ "List the name and street of customers whose order is still open." ▪ pname, street (sstatus="open" (order ⋈ customer)) ▪ note that we could do the selection first (more efficient) - we will review this later when discussing query optimisation name street Albert Einstein Bergstrasse 18
February 27, 2019 Theta Join (q) and Equijoin ▪ r ⋈ p s = sp (r s) ▪ where the predicate p is of the form r.Ai q s.Aj and q is a comparison operator (=, , >, <, , ) ▪ Example ▪ customer ⋈ postcode duration (rcd(cdID,cdName,duration,price,year) (cd)) ▪ An equijoin (r ⋈ p s) is a special form of a theta join where only the equality operator (=) may be used ▪ note that a natural join is a equijoin over all common attributes of a relation r and s customerID name ... cdID cdName ... 2 Eddy Merckx ... 1 Falling into Place ... 2 Eddy Merckx ... 2 Carcassonne ...
February 27, 2019 Semijoin (⋉) ▪ r ⋉ s = pA1, A2,..., An (r ⋈ s) ▪ where A1 , A2 ,..., An are all the attributes of r ▪ Example ▪ "List the customers whose order is still open." ▪ customer ⋉ sstatus="open" (order) customerID Name Street Postcode city 53 Albert Einstein Bergstrasse 18 8037 Zurich
February 27, 2019 Division () ▪ r s = {t | t pR-S (r) and " u s → tu r} ▪ for the relations r(R) and s(S) and S R ▪ suited for queries that include the phrase "for all" ▪ Example ▪ r s A B C D a 3 a 1 a 1 a 3 a 3 b 3 b 1 a 1 c 4 b 3 b 2 b 2 c 4 a 1 r C D b 3 a 1 A B a 3 c 4 s r s
February 27, 2019 Assigment ▪ variable E ▪ Works like an assignment in programming languages ▪ Assignments must always be made to temporary relation variables ▪ no database modification ▪ Example ▪ temp1 pname, street (sstatus="open" (order ⋈ customer)) ▪ temp2 pname,street (scity = "Brussels" (customer)) ▪ result temp1 temp2 name street Eddy Merckx Pleinlaan 25 Albert Einstein Bergstrasse 18
February 27, 2019 Generalised Projection (p) ▪ pF1, F2,..., Fm (E) ▪ generalisation of the project operation that supports the projection to arithmetic expressions F1 , F2 ,..., Fm ▪ Example ▪ "Show a list of the names of all CDs together with a reduced price (80% of the original price)." ▪ pname, price 0.8 as reducedPrice (cd) name reducedPrice Falling into Place 14.32 Carcassonne 12.72
February 27, 2019 Aggregate Function ▪ G1, G2,..., Gm G F1(A1), F2(A2),..., Fn(An) (E) ▪ takes a collection of values as input and returns a single value based on the following operations - min: minimum value - max: maximum value - sum: sum of values - count: number of value - avg: average value ▪ G 1 , G 2 ,..., G m lists the attributes on which to group - can be empty ▪ F i are the aggregate functions ▪ by default, duplicates are not eliminated before aggregating - can use the keyword distict to eliminate duplicates (e.g. sum-distinct)
February 27, 2019 Aggregate Function ... ▪ Examples ▪ "List the average amount of items in an order." ▪ G avg(amount) (order) ▪ "List the number of customers for each city." ▪ city G count(customerID) as customerNo (customer) avg(amount) 1.5 city customerNo Zurich 3 Brussels 1 Paris 1
February 27, 2019 Outer Joins ▪ The left outer join (⟕), right outer join (⟖) and full outer join (⟗) are extensions of the natural join operation ▪ Computes the natural join and then adds the tuples from one relation that do not match the other relation ▪ filled up with null values ▪ Example ▪ customer ⟕ order customerID name ... orderID cdID ... 53 Albert Einstein ... 1 2 ... 2 Eddy Merckx ... 2 1 ... 5 Claude Debussy ... null null ... ... ... ... ... ... ...
February 27, 2019 Outer Joins ... ▪ left outer join (⟕) ▪ keeps all tuples of the left relation ▪ non-matching parts filled up with null values ▪ right outer join (⟖) ▪ keeps all tuples of the right relation ▪ non-matching parts filled up with null values ▪ full outer join (⟗) ▪ keeps all tuples of both relations ▪ non-matching parts filled up with null values
February 27, 2019 Null Values ▪ A null value means that the value is unknown or non-existent ▪ Primary key attributes can never be null (entity integrity) ▪ The result of an arithmetic operation that involves a null value is null ▪ For grouping and duplicate elimination null values are treated like other values (two null values are the same) ▪ Null values are ignored in aggregate functions
February 27, 2019 Database Modifications ▪ A database may be modified by using one of the following operations ▪ insert ▪ update ▪ delete ▪ These operations are defined via the assignment operator
February 27, 2019 Insert ▪ r r E ▪ To insert new data into a relation we can ▪ explicitly specify the tuples to be inserted ▪ write a query and insert the result tuple set ▪ Example ▪ cd cd {(3, "Chromatic", 3012, 16.50, 1994)} cdID name duration price year 1 Falling into Place 2007 17.90 2007 2 Carcassonne 3156 15.50 1993 3 Chromatic 3012 16.50 1994
February 27, 2019 Update ▪ r pF1, F2,..., Fn (r) ▪ Update specific values in a tuple r ▪ Fi is either the old value of r or a new value if the attribute has to be updated ▪ Example ▪ "Increase the price of all CDs by 10%." ▪ cd pcdID, name, duration, price 1.1, year (cd) cdID name duration price year 1 Falling into Place 2007 19.69 2007 2 Carcassonne 3156 17.05 1993 3 Chromatic 3012 18.15 1994
February 27, 2019 Delete ▪ r r - E ▪ A delete is expressed similar to a query except that the result is removed from the database ▪ Cannot remove single attributes ▪ Example ▪ "Remove the CD with the name 'Carcassonne' from the database." ▪ cd cd - sname = "Carcassonne" (cd) cdID name duration price year 1 Falling into Place 2007 19.69 2007 3 Chromatic 3012 18.15 1994
February 27, 2019 Homework ▪ Study the following chapters of the Database System Concepts book ▪ chapter 2 - Relational Model ▪ chapter 6 - section 6.1 - Relational Algebra
February 27, 2019 References ▪ A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010 ▪ E. F. Codd, A Relational Model of Data for Large Shared Data Banks, Communications of the ACM 13(6), June 1970 ▪ https://doi.org/10.1145/362384.362685