$30 off During Our Annual Pro Sale. View Details »

Fearless Joins

Fearless Joins

Companion slides for the "Fearless Joins" talk: https://hasgeek.com/rootconf/just-enough-sql-to-be-effective/

Warning: May not be very effective on their own.

If you work with SQL, you invariably come across questions like:

- What is a lateral join?
- When do I pick what kind of join?
- Rectangular join? What?
- What happens if you mix inner and outer joins together?

With simple practical examples, and visual representation of the joins, we will cover:

- Why is it called a join and what exactly does it mean?
- The concepts behind “inner/outer” and “left/right” and how DB engines determine which rows will be in ResultSet.
- The technique of “one at a time” in building large, complex joins.

Swanand Pagnis

September 01, 2020
Tweet

More Decks by Swanand Pagnis

Other Decks in Programming

Transcript

  1. Fearless Joins
    Just enough SQL to be effective

    View Slide

  2. Swanand Pagnis
    @_swanand
    info.pagnis.in
    queries.fun

    View Slide

  3. Outgrow ActiveRecord,
    Django.DB, and friends

    View Slide

  4. Joins

    View Slide

  5. - Understand joins
    - Write complex joins
    - When to use which

    View Slide

  6. Inner / Outer
    Left / Right / Full
    Recursive
    Lateral

    View Slide

  7. Have sets,
    will combine

    View Slide

  8. Have sets,
    will join

    View Slide

  9. Joins are about sets.
    Not tables.

    View Slide

  10. Your data could be
    in tables

    View Slide

  11. Your data could be
    in views

    View Slide

  12. Your data could be
    in a file

    View Slide

  13. Your data could be
    in another query

    View Slide

  14. Joins are about sets.

    View Slide

  15. Unlearn the Venn
    diagram. #

    View Slide

  16. Illustration time! $

    View Slide

  17. Refer to the PDF

    View Slide

  18. Join Semantics:

    View Slide

  19. SET L SET R

    View Slide

  20. L set: [(a, 1), (b, 2)]
    R set: [(1, %), (3, &)]

    View Slide

  21. L set: [(a, 1), (b, 2), null]
    R set: [(1, %), (3, &), null]

    View Slide

  22. Working set:

    View Slide

  23. (a, 1) - (1, %)
    (a, 1) - (3, &)
    (a, 1) - null
    (b, 2) - (1, %)
    (b, 2) - (3, &)
    (b, 2) - null
    null - (1, %)
    null - (3, &)
    null - null (Ignored for practicality)

    View Slide

  24. Joins without predicate

    View Slide

  25. (a, 1) - (1, %)
    (a, 1) - (3, &)
    (b, 2) - (1, %)
    (b, 2) - (3, &)

    View Slide

  26. All nulls ignored.

    View Slide

  27. Try this:
    select * FROM
    (VALUES ('one', 1), ('two', 2)) as t1,
    (VALUES (1, 'O'), (2, 'T')) as t2;

    View Slide

  28. Result:
    +-----------+-----------+-----------+-----------+
    | column1 | column2 | column1 | column2 |
    |-----------+-----------+-----------+-----------|
    | one | 1 | 1 | O |
    | one | 1 | 2 | T |
    | two | 2 | 1 | O |
    | two | 2 | 2 | T |
    +-----------+-----------+-----------+-----------+

    View Slide

  29. Joins with predicate

    View Slide

  30. join on L.column2 = R.column1
    (a, 1) - (1, %) ✅ "true" match
    (a, 1) - (3, &) "false" match
    (a, 1) - null because already matched
    (b, 2) - (1, %) "false" match
    (b, 2) - (3, &) "false" match
    (b, 2) - null ✅ "no match"
    null - (1, %) because already matched
    null - (3, &) ✅ "no match"

    View Slide

  31. join on L.column2 = R.column1
    (a, 1) - (1, %) ✅ "true" match
    (b, 2) - null ✅ "no match"
    null - (3, &) ✅ "no match"

    View Slide

  32. join on L.column2 = R.column1
    (a, 1) - (1, %) - matched
    (b, 2) - null - L has no match
    null - (3, &) - R has no match

    View Slide

  33. Back to left/inner/outer

    View Slide

  34. join on L.column2 = R.column1
    inner — skip all "no match"
    left — skip all "R with no match"
    right — skip all "L with no match"
    full - don't skip anything

    View Slide

  35. join on L.column2 = R.column1
    (a, 1) - (1, %) - always included
    (b, 2) - null - left and full
    null - (3, &) - right and full

    View Slide

  36. Joins in action

    View Slide

  37. We will use this schema:

    View Slide

  38. Rumours
    Tidbits
    Sources

    View Slide

  39. Rumour is provided by many Sources
    Source provides many Rumours
    Tidbit is about a Rumour
    Tidbit is provided by many Sources

    Source provides many Tidbits

    View Slide

  40. Rumour: "BigCo is going all remote"
    Sources: PopularMag, BuzzStream
    Tidbits:
    - Employees are given home setup budgets
    - A job was posted on WeWorkRemotely
    - Hired Jane; senior leader, outspoken
    remote work supporter
    - CEO followed "ReMoat Podcast" on Twitter

    View Slide

  41. Labels

    View Slide

  42. Label can sub-labels
    Label can have one parent label
    A classic tree hierarchy.

    View Slide

  43. - Computer Science
    - OS
    - Memory Management
    - Processes and Signals
    - Software Engineering
    - OOP Design
    - Encapsulation
    - Functional Programming

    View Slide

  44. https:"//gist.github.com/swanandp/
    d0252911bb7bb4dbe82a0f9ce8df9d1a

    View Slide

  45. Questions?

    View Slide

  46. Thank you!

    View Slide