$30 off During Our Annual Pro Sale. View Details »

Typesafe Functional SQL

Anupam
November 16, 2019

Typesafe Functional SQL

This talk was presented at Functional Conf 2019 [https://functionalconf.com/].

We'll discuss how to use strongly typed functional programming to build a typesafe and highly declarative DSL to generate SQL, that allows better abstractions and minimises runtime errors. We'll focus on type systems, and build an intuition for row types and how it guarantees correctness for SQL. I'll cover some parts of actual SQL generation.

Anupam

November 16, 2019
Tweet

More Decks by Anupam

Other Decks in Technology

Transcript

  1. Type-safe Functional SQL Functional Conf 2019 Anupam Jain

  2. Bird’s Eye View • Why SQL doesn’t scale. • Allow

    scaling by decomposing SQL into Relational Algebra. • Implementing Relational Algebra Ops. Row Types! • Bonus Content!
  3. Structured Query Language

  4. Relational Data • Codd’s Relational Model • Set of Typed

    Columns, Set of Rows of Related Data • References to other Relations
  5. SQL • Query Language for Relational Database • Declarative •

    Immutable • Based on Sound Mathematical Theory •
  6. People Id Name Father Mother 827 E 321 235 828

    F 917 61 829 G 819 937 830 H 421 234 … … … …
  7. Fetch All Children SELECT *
 FROM people person WHERE person.father

    = 827
 AND person.mother = 827

  8. Fetch All Grandchildren SELECT *
 FROM people person,
 people father,


    people mother,
 WHERE person.father = father.id
 AND person.mother = mother.id
 AND (father.father = 827 OR mother.father = 827 OR father.mother = 827 OR mother.mother = 827)
  9. Hard to Abstract • Repeated ID in the query •

    Missing Type Information • What columns are in the final output? • Hard to isolate bugs • No Reusable Abstraction
  10. Fetch All Great-grandchildren SELECT *
 
 FROM people person,
 people

    father,
 people mother,
 people gfather1,
 people gfather2,
 people gmother1,
 people gmother2 WHERE person.father = father.id
 AND person.mother = mother.id
 AND father.father = gfather1.id
 AND father.mother = gmother1.id
 AND mother.father = gfather2.id
 AND mother.mother = gmother2.id
 AND (gfather1.father = 827
 OR gmother1.father = 827
 OR gfather2.father = 827
 OR gmother2.father = 827 ….?)
  11. Primitives That Compose Poorly • Get the Top 5 Highest

    Paid Employees • SELECT Name, Salary FROM Employees WHERE rownum <= 5 ORDER BY Salary DESC
  12. Primitives That Compose Poorly • Actual Correct Solution • SELECT

    Name, Salary FROM (SELECT Name, Salary FROM Employees ORDER BY Salary DESC) WHERE rownum <= 5 • (Or use LIMIT 5)
  13. Reusable Abstractions • Parameters • Functions

  14. What About? • Stored Procedures • SQL Functions • Views

  15. MODULARISATION WITH DB FUNCTIONS • Function to look up IATA

    codes and return the city name. • CREATE OR REPLACE FUNCTION airport_city(iata_code IN char) RETURN varchar2 IS CITY_NAME varchar2(50); BEGIN SELECT city INTO city_name FROM iata_airport_codes WHERE code = iata_codes; RETURN(city_name); END;
  16. Using DB Functions • Find all flights, and their destinations,

    departing on a certain date. • SELECT flight_number airport_city(arrival) “TO” FROM flights WHERE departure_date = <some date> • Returns 100 rows in 0.2 sec
  17. Using JOINs • Find all flights, and their destinations, departing

    on a certain date. • SELECT f.flight_number a.city “TO” FROM flights f iata_airport_codes a WHERE a.code = f.arrival AND departure_date = <some date> • Returns 100 Rows in 0.02 seconds • Difference increases with number of rows
  18. Query Optimiser Relational Core Non Relational Processing Optimiser Boundary

  19. Performant Queries • Relational Core is Optimised Heavily • It’s

    Efficient to Use Even A Large Number of Joins (Say < 20) • Any communication to/from the DB Engine will cause slowdowns. This includes calls to Functions and Procedures.
  20. Relational Algebra To The Rescue

  21. Relational Algebra • Based on Sound Mathematical Theory • Allows

    Reasoning about Output • Allows Sound Transformation from inefficient query to efficient query
  22. Relational Algebra • Projection (SELECT) • Selection (WHERE) • Rename

    • Joins •
  23. Implementing Relational Algebra • Combinator Library • With Row Types

  24. Combinator Library • New Data Type - Relation R •

    Set of Ops • :: Relation R -> Relation S • R, S, are rows •
  25. Example Ops • Select/Project :: Relation r -> Relation s

    • Union :: Relation r -> Relation r -> Relation r • Filter :: Relation r -> Relation r • Join :: Relation (r|s) -> Relation (s|t) -> Relation (r|s|t) • Where r and t are disjoint. More on this later. • RenameField :: Relation ((m::t) | r) -> Relation ((n::t) | r)
  26. Row Types • type ageFilter :: {age :: Int} ->

    Boolean • oldPeople :: Array ({age :: Int}) -> Array ({age :: Int}) oldPeople = filter (\person -> person.age > 40) • But after this filtering, we lose access to the rest of the fields! There’s nothing connecting the input fields to the output fields. • Hence we need a way to specify “remaining fields” as a polymorphic variable. • ageFilter :: {age :: Int | rest} -> Boolean • oldPeople :: Array ({age :: Int | rest}) -> Array ({age :: Int | rest})
  27. Relational Algebra Composes • getChildren :: Relation Person -> Relation

    Person
 getChildren p = union
 (people # join (p # rename id father)) (people # join (p # rename id mother)) # project (id, name, father, mother) • getGrandchildren p = getChildren (getChildren p)
  28. Row Types in Haskell • Field a = Symbol :->

    a • Row. List is always ordered. • newtype Row a = R [Field a] • Notice the TypeLevel list • Empty Row • type Empty = R ‘[ ]
  29. Row Types in Haskell • List of Types • Adding/Deleting

    values from type level lists • Sorting type level lists • Type level functions • i.e. Type Families
  30. Type Families • Create Overloaded Data, just like TypeClasses create

    Overloaded Functions. • Closed Type Family is like a Type level function. • type family Foo where Foo Int = Int Foo String = String Foo a = Int • By default - type family Foo :: * • But could be - type family Foo :: Constraint type family Foo :: * -> * etc.
  31. User Defined Type Errors • type family TypeError (a ::

    ErrorMessage) :: b where … • instance TypeError (Text "Cannot Show functions.") => Show (a -> b) where showsPrec = error “unreachable" • data Succ a data Zero type family Pred where Pred (Succ x) = x Pred Zero = TypeError (Text “No Predecessor for Zero")
  32. Merge • type family Merge (l :: [Field k]) (r

    :: [Field k]) where Merge ‘[ ] r = r Merge l ‘[ ] = l Merge (h :-> al ': tl) (h :-> ar ': tr) = TypeError (TL.Text “Non Matching Label") Merge (hl :-> al ': l) (hr :-> ar ': r) = ???
  33. GHC.TypeLits • Type Level IF • type family IfT a

    b c where IfT ‘True b c = b IfT ‘False b c = c • Type level Bool. (Data Kinds Extension). • Type Level Symbols Comparison • type (<=) x y
  34. Merge • type family Merge (l :: [Field k]) (r

    :: [Field k]) where … Merge (hl :-> al ‘: l) (hr :-> ar ‘: r) = IfT hl <= hr (hl :-> al ': Merge tl (hr :-> ar ': tr)) (hr :-> ar ': Merge (hl :-> al ': tl) tr)
  35. Type Level Row Append type family (l :: Row k)

    .+ (r :: Row k) :: Row k where R l .+ R r = R (Merge l r)
  36. Diff type family Diff (l :: [Field k]) (r ::

    [Field k]) where Diff ‘[ ] r = ‘[ ] Diff l ‘[ ] = l Diff (l :-> al ': tl) (l :-> al ': tr) = Diff tl tr Diff (hl :-> al ': tl) (hr :-> ar ': tr) = Ifte (hl <=.? hr) (hl :-> al ': Diff tl (hr :-> ar ': tr)) (Diff (hl :-> al ': tl) tr)
  37. Type Level Row Diff type family (l :: Row k)

    .\\ (r :: Row k) :: Row k where R l .\\ R r = R (Diff l r)
  38. Type Level Disjoint type Disjoint l r = (l .+

    r .\\ l ~ r, l .+ r .\\ r ~ l)
  39. Relational Ops -- | Select some columns from a relation

    Project :: Disjoint r rest => Relation (r .+ rest) -> Relation r -- | Remove some columns from a relation Except :: Disjoint r rest => Relation (r .+ rest) -> Relation rest
  40. Relational Ops RenameField :: (KnownSymbol old, KnownSymbol new) => Label

    old -> Label new -> Relation r -> Relation(Rec.Rename old new r)
  41. Relational Ops -- | Add a new field to a

    relation AddField :: (KnownSymbol l, Lacks l r) => Label l -> a -> Relation b r -> Relation 'False (Rec.Extend l a r) -- | Remove a single field from a relation RemoveField :: (KnownSymbol l, HasType l a r) => Label l -> Relation b r -> Relation 'False (r .- l)
  42. Relational Ops Filter :: (RecFunction r Bool) -> Relation r

    -> Relation r Limit :: Int -> Relation r -> Relation r • But What About this Function
  43. Passing Functions to SQL • We provide inbuilt ones •

    And Create a DSL • RecFunction r a • (>=.) :: Symbol l -> Int -> Function () Bool
  44. Bonus Content

  45. Bonus: Nested Relational Algebra • Attributes can be Relations themselves!

    • Group :: Relation (r|s) -> Relation (r | (l::Relation s)) • Ungroup :: Relation (r | (l::Relation s)) -> Relation (r|s) • Enables clean definitions of things like aggregations
  46. Aggregation With Nested Relations • Aggregation is basically a fold

    over all the rows • Aggregate :: Relation r -> (s -> r -> s) -> s -> s • In practice, for SQL support, we can’t support arbitrary functions for (s -> r -> s) • We provide inbuilt ones - avg, sum, max, min, count, etc. • Combined with Relation Valued ops, we can do SQL aggregations cleanly
  47. Example • Find out sales per car by year

  48. Initial Table Year Car Sales 2001 A 100 2001 A

    200 2001 B 100 2002 B 100 2002 A 300 2002 C 300
  49. Group by ‘Year Year YearData Car Sales 2001 A 100

    A 200 B 100 2002 B 100 A 300 C 300
  50. Map ‘YearData (Group by ‘Car) Year YearData Car CarData Sales

    2001 A 100 200 B 100 2002 B 100 A 300 C 300
  51. Map ‘YearData (Map ‘CarData (Sum Sales)) Year YearData Car CarData

    Sales 2001 A 300 B 100 2002 B 100 A 300 C 300
  52. Bonus Content 2: Integration with Reporting • Report r -

    Includes display data • Sort :: Label s -> Report r -> Report r • Where r includes label s. More on this later. • Pivot - Allows displaying the table pivoted, with the rows transformed into cols and vice versa • Pivot :: Relation r -> Report r
  53. Integration With Reporting hierarchy = mem [ {parentId = 0,

    nodeId = 1, name = "Total"}, {parentId = 1, nodeId = 2, name = "Tech"}, {parentId = 2, nodeId = 3, name = "Google"}, {parentId = 2, nodeId = 4, name = "Microsoft"}, {parentId = 2, nodeId = 5, name = "Apple"}, {parentId = 1, nodeId = 6, name = "Financial"}, {parentId = 6, nodeId = 7, name = "Wells Fargo"}, {parentId = 6, nodeId = 8, name = "Citibank"}, {parentId = 6, nodeId = 9, name = "Bank Of America"} ] valueData = mem [ {name = "Google", value = Some 1.0}, {name = "Microsoft", value = Some 0.5}, {name = "Apple", value = Some 0.75}, {name = "Wells Fargo", value = Some 0.8}, {name = "Citibank", value = Some 0.9}, {name = "Bank Of America", value = Some 0.4} ] metric = valueData |> join hierarchy |> except {name}
  54. Advanced sql reportData = let f = aggregate (sum value)

    value in accumulate parentId nodeId f metric hierarchy # join hierarchy Name nodeId parentId value Total 1 0 4.35 Tech 2 1 2.25 Google 3 2 1 Microsoft 4 2 0.5 Apple 5 2 0.75 Financial 6 1 2.1 Wells Fargo 7 6 0.8 Citibank 8 6 0.9 Bank Of America 9 6 0.4
  55. Advanced sql report = reportData # drilldownTable Nothing name parentId

    nodeId Name value Total 4.35 Tech 2.25 Google 1 Microsoft 0.5 Apple 0.75 Financial 2.1 Wells Fargo 0.8 Citibank 0.9 Bank Of America 0.4
  56. Thank You