Slide 1

Slide 1 text

Type-safe Functional SQL Functional Conf 2019 Anupam Jain

Slide 2

Slide 2 text

Bird’s Eye View • Why SQL doesn’t scale. • Allow scaling by decomposing SQL into Relational Algebra. • Implementing Relational Algebra Ops. Row Types! • Bonus Content!

Slide 3

Slide 3 text

Structured Query Language

Slide 4

Slide 4 text

Relational Data • Codd’s Relational Model • Set of Typed Columns, Set of Rows of Related Data • References to other Relations

Slide 5

Slide 5 text

SQL • Query Language for Relational Database • Declarative • Immutable • Based on Sound Mathematical Theory •

Slide 6

Slide 6 text

People Id Name Father Mother 827 E 321 235 828 F 917 61 829 G 819 937 830 H 421 234 … … … …

Slide 7

Slide 7 text

Fetch All Children SELECT *
 FROM people person WHERE person.father = 827
 AND person.mother = 827


Slide 8

Slide 8 text

Fetch All Grandchildren SELECT *
 FROM people person,
 people father,
 people mother,
 WHERE person.father = father.id
 AND person.mother = mother.id
 AND (father.father = 827 OR mother.father = 827 OR father.mother = 827 OR mother.mother = 827)

Slide 9

Slide 9 text

Hard to Abstract • Repeated ID in the query • Missing Type Information • What columns are in the final output? • Hard to isolate bugs • No Reusable Abstraction

Slide 10

Slide 10 text

Fetch All Great-grandchildren SELECT *
 
 FROM people person,
 people father,
 people mother,
 people gfather1,
 people gfather2,
 people gmother1,
 people gmother2 WHERE person.father = father.id
 AND person.mother = mother.id
 AND father.father = gfather1.id
 AND father.mother = gmother1.id
 AND mother.father = gfather2.id
 AND mother.mother = gmother2.id
 AND (gfather1.father = 827
 OR gmother1.father = 827
 OR gfather2.father = 827
 OR gmother2.father = 827 ….?)

Slide 11

Slide 11 text

Primitives That Compose Poorly • Get the Top 5 Highest Paid Employees • SELECT Name, Salary FROM Employees WHERE rownum <= 5 ORDER BY Salary DESC

Slide 12

Slide 12 text

Primitives That Compose Poorly • Actual Correct Solution • SELECT Name, Salary FROM (SELECT Name, Salary FROM Employees ORDER BY Salary DESC) WHERE rownum <= 5 • (Or use LIMIT 5)

Slide 13

Slide 13 text

Reusable Abstractions • Parameters • Functions

Slide 14

Slide 14 text

What About? • Stored Procedures • SQL Functions • Views •

Slide 15

Slide 15 text

MODULARISATION WITH DB FUNCTIONS • Function to look up IATA codes and return the city name. • CREATE OR REPLACE FUNCTION airport_city(iata_code IN char) RETURN varchar2 IS CITY_NAME varchar2(50); BEGIN SELECT city INTO city_name FROM iata_airport_codes WHERE code = iata_codes; RETURN(city_name); END;

Slide 16

Slide 16 text

Using DB Functions • Find all flights, and their destinations, departing on a certain date. • SELECT flight_number airport_city(arrival) “TO” FROM flights WHERE departure_date = • Returns 100 rows in 0.2 sec

Slide 17

Slide 17 text

Using JOINs • Find all flights, and their destinations, departing on a certain date. • SELECT f.flight_number a.city “TO” FROM flights f iata_airport_codes a WHERE a.code = f.arrival AND departure_date = • Returns 100 Rows in 0.02 seconds • Difference increases with number of rows

Slide 18

Slide 18 text

Query Optimiser Relational Core Non Relational Processing Optimiser Boundary

Slide 19

Slide 19 text

Performant Queries • Relational Core is Optimised Heavily • It’s Efficient to Use Even A Large Number of Joins (Say < 20) • Any communication to/from the DB Engine will cause slowdowns. This includes calls to Functions and Procedures.

Slide 20

Slide 20 text

Relational Algebra To The Rescue

Slide 21

Slide 21 text

Relational Algebra • Based on Sound Mathematical Theory • Allows Reasoning about Output • Allows Sound Transformation from inefficient query to efficient query

Slide 22

Slide 22 text

Relational Algebra • Projection (SELECT) • Selection (WHERE) • Rename • Joins •

Slide 23

Slide 23 text

Implementing Relational Algebra • Combinator Library • With Row Types

Slide 24

Slide 24 text

Combinator Library • New Data Type - Relation R • Set of Ops • :: Relation R -> Relation S • R, S, are rows •

Slide 25

Slide 25 text

Example Ops • Select/Project :: Relation r -> Relation s • Union :: Relation r -> Relation r -> Relation r • Filter :: Relation r -> Relation r • Join :: Relation (r|s) -> Relation (s|t) -> Relation (r|s|t) • Where r and t are disjoint. More on this later. • RenameField :: Relation ((m::t) | r) -> Relation ((n::t) | r)

Slide 26

Slide 26 text

Row Types • type ageFilter :: {age :: Int} -> Boolean • oldPeople :: Array ({age :: Int}) -> Array ({age :: Int}) oldPeople = filter (\person -> person.age > 40) • But after this filtering, we lose access to the rest of the fields! There’s nothing connecting the input fields to the output fields. • Hence we need a way to specify “remaining fields” as a polymorphic variable. • ageFilter :: {age :: Int | rest} -> Boolean • oldPeople :: Array ({age :: Int | rest}) -> Array ({age :: Int | rest})

Slide 27

Slide 27 text

Relational Algebra Composes • getChildren :: Relation Person -> Relation Person
 getChildren p = union
 (people # join (p # rename id father)) (people # join (p # rename id mother)) # project (id, name, father, mother) • getGrandchildren p = getChildren (getChildren p)

Slide 28

Slide 28 text

Row Types in Haskell • Field a = Symbol :-> a • Row. List is always ordered. • newtype Row a = R [Field a] • Notice the TypeLevel list • Empty Row • type Empty = R ‘[ ]

Slide 29

Slide 29 text

Row Types in Haskell • List of Types • Adding/Deleting values from type level lists • Sorting type level lists • Type level functions • i.e. Type Families

Slide 30

Slide 30 text

Type Families • Create Overloaded Data, just like TypeClasses create Overloaded Functions. • Closed Type Family is like a Type level function. • type family Foo where Foo Int = Int Foo String = String Foo a = Int • By default - type family Foo :: * • But could be - type family Foo :: Constraint type family Foo :: * -> * etc.

Slide 31

Slide 31 text

User Defined Type Errors • type family TypeError (a :: ErrorMessage) :: b where … • instance TypeError (Text "Cannot Show functions.") => Show (a -> b) where showsPrec = error “unreachable" • data Succ a data Zero type family Pred where Pred (Succ x) = x Pred Zero = TypeError (Text “No Predecessor for Zero")

Slide 32

Slide 32 text

Merge • type family Merge (l :: [Field k]) (r :: [Field k]) where Merge ‘[ ] r = r Merge l ‘[ ] = l Merge (h :-> al ': tl) (h :-> ar ': tr) = TypeError (TL.Text “Non Matching Label") Merge (hl :-> al ': l) (hr :-> ar ': r) = ???

Slide 33

Slide 33 text

GHC.TypeLits • Type Level IF • type family IfT a b c where IfT ‘True b c = b IfT ‘False b c = c • Type level Bool. (Data Kinds Extension). • Type Level Symbols Comparison • type (<=) x y

Slide 34

Slide 34 text

Merge • type family Merge (l :: [Field k]) (r :: [Field k]) where … Merge (hl :-> al ‘: l) (hr :-> ar ‘: r) = IfT hl <= hr (hl :-> al ': Merge tl (hr :-> ar ': tr)) (hr :-> ar ': Merge (hl :-> al ': tl) tr)

Slide 35

Slide 35 text

Type Level Row Append type family (l :: Row k) .+ (r :: Row k) :: Row k where R l .+ R r = R (Merge l r)

Slide 36

Slide 36 text

Diff type family Diff (l :: [Field k]) (r :: [Field k]) where Diff ‘[ ] r = ‘[ ] Diff l ‘[ ] = l Diff (l :-> al ': tl) (l :-> al ': tr) = Diff tl tr Diff (hl :-> al ': tl) (hr :-> ar ': tr) = Ifte (hl <=.? hr) (hl :-> al ': Diff tl (hr :-> ar ': tr)) (Diff (hl :-> al ': tl) tr)

Slide 37

Slide 37 text

Type Level Row Diff type family (l :: Row k) .\\ (r :: Row k) :: Row k where R l .\\ R r = R (Diff l r)

Slide 38

Slide 38 text

Type Level Disjoint type Disjoint l r = (l .+ r .\\ l ~ r, l .+ r .\\ r ~ l)

Slide 39

Slide 39 text

Relational Ops -- | Select some columns from a relation Project :: Disjoint r rest => Relation (r .+ rest) -> Relation r -- | Remove some columns from a relation Except :: Disjoint r rest => Relation (r .+ rest) -> Relation rest

Slide 40

Slide 40 text

Relational Ops RenameField :: (KnownSymbol old, KnownSymbol new) => Label old -> Label new -> Relation r -> Relation(Rec.Rename old new r)

Slide 41

Slide 41 text

Relational Ops -- | Add a new field to a relation AddField :: (KnownSymbol l, Lacks l r) => Label l -> a -> Relation b r -> Relation 'False (Rec.Extend l a r) -- | Remove a single field from a relation RemoveField :: (KnownSymbol l, HasType l a r) => Label l -> Relation b r -> Relation 'False (r .- l)

Slide 42

Slide 42 text

Relational Ops Filter :: (RecFunction r Bool) -> Relation r -> Relation r Limit :: Int -> Relation r -> Relation r • But What About this Function

Slide 43

Slide 43 text

Passing Functions to SQL • We provide inbuilt ones • And Create a DSL • RecFunction r a • (>=.) :: Symbol l -> Int -> Function () Bool

Slide 44

Slide 44 text

Bonus Content

Slide 45

Slide 45 text

Bonus: Nested Relational Algebra • Attributes can be Relations themselves! • Group :: Relation (r|s) -> Relation (r | (l::Relation s)) • Ungroup :: Relation (r | (l::Relation s)) -> Relation (r|s) • Enables clean definitions of things like aggregations

Slide 46

Slide 46 text

Aggregation With Nested Relations • Aggregation is basically a fold over all the rows • Aggregate :: Relation r -> (s -> r -> s) -> s -> s • In practice, for SQL support, we can’t support arbitrary functions for (s -> r -> s) • We provide inbuilt ones - avg, sum, max, min, count, etc. • Combined with Relation Valued ops, we can do SQL aggregations cleanly

Slide 47

Slide 47 text

Example • Find out sales per car by year

Slide 48

Slide 48 text

Initial Table Year Car Sales 2001 A 100 2001 A 200 2001 B 100 2002 B 100 2002 A 300 2002 C 300

Slide 49

Slide 49 text

Group by ‘Year Year YearData Car Sales 2001 A 100 A 200 B 100 2002 B 100 A 300 C 300

Slide 50

Slide 50 text

Map ‘YearData (Group by ‘Car) Year YearData Car CarData Sales 2001 A 100 200 B 100 2002 B 100 A 300 C 300

Slide 51

Slide 51 text

Map ‘YearData (Map ‘CarData (Sum Sales)) Year YearData Car CarData Sales 2001 A 300 B 100 2002 B 100 A 300 C 300

Slide 52

Slide 52 text

Bonus Content 2: Integration with Reporting • Report r - Includes display data • Sort :: Label s -> Report r -> Report r • Where r includes label s. More on this later. • Pivot - Allows displaying the table pivoted, with the rows transformed into cols and vice versa • Pivot :: Relation r -> Report r

Slide 53

Slide 53 text

Integration With Reporting hierarchy = mem [ {parentId = 0, nodeId = 1, name = "Total"}, {parentId = 1, nodeId = 2, name = "Tech"}, {parentId = 2, nodeId = 3, name = "Google"}, {parentId = 2, nodeId = 4, name = "Microsoft"}, {parentId = 2, nodeId = 5, name = "Apple"}, {parentId = 1, nodeId = 6, name = "Financial"}, {parentId = 6, nodeId = 7, name = "Wells Fargo"}, {parentId = 6, nodeId = 8, name = "Citibank"}, {parentId = 6, nodeId = 9, name = "Bank Of America"} ] valueData = mem [ {name = "Google", value = Some 1.0}, {name = "Microsoft", value = Some 0.5}, {name = "Apple", value = Some 0.75}, {name = "Wells Fargo", value = Some 0.8}, {name = "Citibank", value = Some 0.9}, {name = "Bank Of America", value = Some 0.4} ] metric = valueData |> join hierarchy |> except {name}

Slide 54

Slide 54 text

Advanced sql reportData = let f = aggregate (sum value) value in accumulate parentId nodeId f metric hierarchy # join hierarchy Name nodeId parentId value Total 1 0 4.35 Tech 2 1 2.25 Google 3 2 1 Microsoft 4 2 0.5 Apple 5 2 0.75 Financial 6 1 2.1 Wells Fargo 7 6 0.8 Citibank 8 6 0.9 Bank Of America 9 6 0.4

Slide 55

Slide 55 text

Advanced sql report = reportData # drilldownTable Nothing name parentId nodeId Name value Total 4.35 Tech 2.25 Google 1 Microsoft 0.5 Apple 0.75 Financial 2.1 Wells Fargo 0.8 Citibank 0.9 Bank Of America 0.4

Slide 56

Slide 56 text

Thank You