of Data (i) Asst. Prof. Lipyeow Lim Informa6on & Computer Science Department University of Hawaii at Manoa 1 Lipyeow Lim -‐-‐ University of Hawaii at Manoa
concepts for describing data • Structure of the data. – More of a conceptual model rather than a physical data model. Eg. Arrays, objects in C/C++ • Opera6ons on the data – Queries and modifica4ons only • Constraints on the data – Limita6ons on the data. Eg. Data type etc. Examples: the rela6onal model and the semi-‐ structured model (XML) Lipyeow Lim -‐-‐ University of Hawaii at Manoa 2
rela4ons • A rela4on is made up of 2 parts: – Instance : a table, with rows and columns. #Rows = cardinality, #fields = degree / arity. – Schema : specifies name of rela6on, plus name and domain/type of each column or aXribute. • E.G. Students(sid: string, name: string, login: string, age: integer, gpa: real). • Can think of a rela6on as a set of rows or tuples (i.e., all rows are dis6nct). Lipyeow Lim -‐-‐ University of Hawaii at Manoa 3
powerful querying of data. • Queries are wriXen declara6vely in SQL, and the DBMS finds an efficient execu6on plan. • Query Languages != programming languages! • Two mathema6cal query languages – Rela4onal Algebra: More opera6onal, useful for represen6ng query execu6on plans. – Rela4onal Calculus: More declara6ve Lipyeow Lim -‐-‐ University of Hawaii at Manoa 5
input and outputs a rela6on instance. • Posi6onal vs. named-‐field nota6on: – Named-‐field nota6on more readable. – Both used in SQL – Field names in query results are `inherited’ from input rela6ons • “Sailors” and “Reserves” rela6ons for our examples. sid bid day 22 101 10/10/96 58 103 11/12/96 Lipyeow Lim -‐-‐ University of Hawaii at Manoa 6 sid sname ra.ng age 22 Dus6n 7 45.0 31 Lubber 8 55.5 58 Rusty 10 35.0 R1 S1 S2 sid sname ra.ng age 28 Yuppy 9 35.0 31 Lubber 8 55.5 44 Guppy 5 35.0 58 Rusty 10 35.0
(σ) Selects a subset of rows from rela6on. – Projec4on (π) Deletes unwanted columns from rela6on. – Cross-‐product (×) Allows us to combine two rela6ons. – Set-‐difference (−) Tuples in reln. 1, but not in reln. 2. – Union (U) Tuples in reln. 1 and in reln. 2. • Addi6onal opera6ons: – Intersec6on, join, division, renaming: Not essen6al, but (very!) useful. • Since each opera6on returns a rela6on, opera6ons can be composed! (Algebra is “closed”.) Lipyeow Lim -‐-‐ University of Hawaii at Manoa 7
projec4on list. • Schema of result contains exactly the fields in the projec6on list, with the same names that they had in the (only) input rela6on. • Projec6on operator has to eliminate duplicates! (Why??) • Note: real systems typically don’t do duplicate elimina6on unless the user explicitly asks for it. (Why not?) Lipyeow Lim -‐-‐ University of Hawaii at Manoa 8 π sname, rating (S2) sname ra.ng Yuppy 9 Lubber 8 Guppy 5 Rusty 10 π age (S2) age 35.0 55.5 35.0 35.0
• No duplicates in result! (Why?) • Schema of result iden6cal to schema of (only) input rela6on. • Result rela6on can be the input for another rela6onal algebra opera6on! (Operator composi4on.) Lipyeow Lim -‐-‐ University of Hawaii at Manoa 9 sid sname ra.ng age 28 Yuppy 9 35.0 31 Lubber 8 55.5 44 Guppy 5 35.0 58 Rusty 10 35.0 sid sname ra.ng age 28 Yuppy 9 35.0 31 Lubber 8 55.5 44 Guppy 5 35.0 58 Rusty 10 35.0 σ rating > 8 (S2) π sname, rating (σrating>8 (S2))
two input rela6ons, which must be union-‐compa6ble: – Same number of fields. – `Corresponding’ fields have the same type. • What is the schema of result? Lipyeow Lim -‐-‐ University of Hawaii at Manoa 10 sid sname ra.ng age 22 Dus6n 7 45.0 31 Lubber 8 55.5 58 Rusty 10 35.0 S1 S2 sid sname ra.ng age 28 Yuppy 9 35.0 31 Lubber 8 55.5 44 Guppy 5 35.0 58 Rusty 10 35.0 S1 U S2 sid sname ra.ng age 22 Dus6n 7 45.0 28 Yuppy 9 35.0 31 Lubber 8 55.5 44 Guppy 5 35.0 58 Rusty 10 35.0
R1 • Each row of S1 is paired with each row of R1. • Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. – Conflict: Both S1 and R1 have a field called sid. – Rename to sid1 and sid2 sid sname ra.ng age sid bid day 22 Dus6n 7 45 22 101 10/10/96 22 Dus6n 7 45 58 103 11/12/96 31 Lubber 8 55.5 22 101 10/10/96 31 Lubber 8 55.5 58 103 11/12/96 58 Rusty 10 35.0 22 101 10/10/96 58 Rusty 10 35.0 58 103 11/12/96 S1 × R1 sid bid day 22 101 10/10/96 58 103 11/12/96 sid sname ra.ng age 22 Dus6n 7 45.0 31 Lubber 8 55.5 58 Rusty 10 35.0 R1 S1 12 Lipyeow Lim -‐-‐ University of Hawaii at Manoa
• Result schema same as that of cross-‐product. • Fewer tuples than cross-‐product, might be able to compute more efficiently • Some6mes called a theta-‐join. Lipyeow Lim -‐-‐ University of Hawaii at Manoa 14 sid sname ra.ng age sid bid day 22 Dus6n 7 45 58 103 11/12/96 31 Lubber 8 55.5 58 103 11/12/96
of condi6on join where the condi6on c contains only equali4es. – Result schema similar to cross-‐product, but only one copy of fields for which equality is specified. • Natural Join: Equi-‐join on all common fields. Lipyeow Lim -‐-‐ University of Hawaii at Manoa 15 sid sname ra.ng age bid day 22 Dus6n 7 45 101 10/10/96 58 Rusty 10 35.0 103 11/12/96
answer the query ? – What columns are needed ? – Which tables do they belong to ? • How should the tables be linked together ? – Joins, cross-‐product etc • What columns are needed in the final output? – Projec6on operator • What filtering condi6ons are needed ? – Selec6on operator Lipyeow Lim -‐-‐ University of Hawaii at Manoa 17
• Informa6on about boat color only available in Boats; so need an extra join: Lipyeow Lim -‐-‐ University of Hawaii at Manoa 18 • A more efficient solu6on:
boat • First find all red or green boats, then find sailors who’ve reserved one of these boats: Lipyeow Lim -‐-‐ University of Hawaii at Manoa 19 • Can also define Tempboats using union! (How?) • What happens if ∨ is replaced by ∧ in this query?
boat • Previous approach won’t work! Must iden6fy sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersec6on (note that sid is a key for Sailors): Lipyeow Lim -‐-‐ University of Hawaii at Manoa 20
theore6cal rela6onal query languages: rela6onal algebra & rela6onal calculus • Rela6onal Algebra (RA) operators: selec6on, projec6on, cross-‐product, set difference, union, intersec6on, join, division, renaming • Operators are closed and can be composed • RA is more opera6onal and could be used as internal representa6on for query evalua6on plans. • For the same query, the RA expression is not unique. • Query op6mizer can choose the most efficient version. Lipyeow Lim -‐-‐ University of Hawaii at Manoa 21