
What if your databases never forgot

Rahul De
November 04, 2020

All major mainstream databases update in place and have a notion of NOW rather than a progression of time. They are supposed to be a reflection of real life, but are they really? When you save a Customer and their address changes, does that mean they never lived at the old place? What if your DB behaved like a git repo instead and recorded each change as a series of facts? What if you could time travel in the DB and look back at any point in time, with performance similar to that of the usual DBs? What if CRUD isn't the right way of thinking about DBs? Let's discuss DBs that make this possible and that not only radically change how you think about data and facts but also simplify everything around the DB: the app, the infra and the monitoring tooling! Let's be functional at the disk too, and not just in the app.

Transcript

  1. NEVER FORGET

  2. “HAST DU ETWAS ZEIT FÜR MICH? DANN SINGE ICH EIN LIED FÜR DICH.” (Do you have some time for me? Then I'll sing a song for you.) Nena, 99 Luftballons
  3. HELLO, WORLD! I’M RAHUL DE
      • lispyclouds
      • Hopelessly in love with Functional Programming, Clojure and High Performance, Scalable Infrastructures
      • Make Simple Easy
      • Diversity, Sustainable Living, Anarcho-Communism, Teaching for Learning
      • ThoughtWorks Berlin
      • https://github.com/lispyclouds
      • https://twitter.com/lispyclouds
  4. CONTENT WARNING: THIS CONTAINS SOME HISTORICAL DEPICTIONS OF RIGHT-WING EXTREMIST IMAGERY.
  5. BERLIN

  6. None
  7. None
  8. None
  9. None
  10. None
  11. None
  12. None
  13. FACTS

  14. SOME OF US TEND TO FORGET

  15. None
  16. THEY ARE IMMUTABLE: DATA, INFORMATION AND FACTS
      • Durable log of events
      • Series of changes accreted over time
      • Ledger-like properties
      • Log book, bookkeeping?
      • Most of us don’t forget our history and where we came from
      • History is immutable (in most places)
  17. ARE OUR COMPUTERS LIKE THAT?

  18. # Imperative: build the result by mutating a list in a loop
      numbers = [1, 2, 3]
      doubles = []
      for number in numbers:
          doubles.append(number * 2)
  19. None
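  20. # Functional: describe the transformation; nothing is mutated
      # (in Python 3, map returns a lazy iterator)
      numbers = [1, 2, 3]
      doubles = map(lambda n: n * 2, numbers)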
  21. None
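  22. # db_connect() is a placeholder for any mainstream DB client
      db_conn = db_connect()  # one shared, mutable connection

      def add_city(city_data):
          db_conn.insert(city_data)

      def get_city(city):
          return db_conn.get(city)

      def update_city(city, data):
          db_conn.update(city, data)  # overwrites the previous value

      def delete_city(city):
          db_conn.delete(city)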
  23. MAINSTREAM DATABASES: DO THEY REALLY REPRESENT REAL LIFE?
      • Update in-place
      • CRUD
      • UPDATE and ALTER
      • Ugly and globally mutable variable
      • Central and often elaborate locking and query engines
      • Object Relational Mappers and Relational Databases have a rigid view of data which is highly dynamic
      • Expensive queries affect all the cluster members
      • Inconsistencies due to separate operational and analytical databases
  24. DID WE FORGET SOMETHING?

  25. “HAST DU ETWAS ZEIT FÜR MICH?” (Do you have some time for me?)

  26. AIN’T NOBODY GOT TIME IN THEM MAINSTREAM DATABASES
      • They have the notion of NOW and not of how we got here
      • Update in-place and CRUD directly result in overwriting and forgetting the past
      • UPDATE and ALTER cause in-place structural changes with the same effect
      • Changes are central and affect everyone, regardless of the view others want
      • Because of in-place updates, we often resort to dedicated operational and analytical instances, as their access patterns are very different
      • Remembering the past needs elaborate and often highly complex logging, append-only strategies and timestamps
  27. city_name   city_population   country_name
      Berlin      3.769.000         Germany
      Barcelona   5.575.000         Spain
      London      8.892.000         United Kingdom
  28. city_name   city_population   country_name
      Berlin      3.769.300         Germany
      Barcelona   5.575.200         Spain
      London      8.893.000         United Kingdom
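
The two tables above differ only in their population column, and after an in-place update only the second one exists. The following is a minimal Python sketch of that effect, assuming a plain dict as a stand-in for the table (the `cities` dict and this `update_city` helper are illustrative, not any real database API):

```python
# A plain dict stands in for a table keyed by city_name (illustrative only).
cities = {
    "Berlin": {"city_population": 3_769_000, "country_name": "Germany"},
    "London": {"city_population": 8_892_000, "country_name": "United Kingdom"},
}


def update_city(table, city, data):
    """Update in place: the previous values are overwritten."""
    table[city].update(data)


update_city(cities, "Berlin", {"city_population": 3_769_300})
update_city(cities, "London", {"city_population": 8_893_000})

# Only NOW survives: the earlier populations cannot be read back.
current_berlin = cities["Berlin"]["city_population"]
old_value_recoverable = 3_769_000 in [
    row["city_population"] for row in cities.values()
]
```

Unless the application keeps its own log or timestamps, nothing in the table records that Berlin's population was ever a different number.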
  29. THE MESS WE ARE IN: HOW DID WE GET HERE?
      • Relic of the past: computing resources were really expensive and we needed to overwrite the little disk and RAM we had; resources are dirt cheap now
      • Rewritable memory directly resulted in the imperative paradigm we are in now
      • The notion of imperative instructions (CRUD) rather than declarative queries also results from this
      • Functional Programming and Immutability are actively tackling this issue at the language level
      • Mainstream tooling like Java/C#/Kotlin, though offering functional facilities, is still quite imperative at its core
      • Imperative tooling is a direct impediment to the scale needs of today
      • But regardless of declarative languages, the DB and, with it, disk persistence are very imperative and employ update in-place heavily
  30. “CAN PROGRAMMING BE LIBERATED FROM THE VON NEUMANN STYLE?” (John Backus, ACM Proceedings, 1978)
  31. Are we there yet? - Rich Hickey

  32. WE CAN START WITH THE DATABASE

  33. PRESENTING: TEMPORAL DATABASES

  34. “WIR HABEN VIEL ZEIT FÜR DICH!” (We have plenty of time for you!) Crux and the Temporal Databases
  35. None
  36. None
  37. GIT IS A DISTRIBUTED, TEMPORAL DATABASE!

  38. DATAHIKE

  39. Traditional View   Temporal View
      Create             Assert
      Read               Read
      Update             Accumulate
      Delete             Retract
  40. CRUD -> ARAR

  41. ARAR! ☠ WHAT IS IT?
      • Assertions are granular statements of facts
      • Reads are always performed against an immutable database value at a particular point in time. Time is globally ordered in a database via ACID properties
      • New transactions only Accumulate new data. Existing facts never change
      • Retractions state that an assertion no longer holds at some later point in time. The original remains unchanged
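
The four ARAR operations can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Crux, Datomic or Datahike are implemented: `Fact`, `FactLog` and their methods are hypothetical names, a transaction counter stands in for globally ordered time, and each attribute is treated as holding a single value.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Fact:
    tx: int          # transaction number: gives facts a global order
    entity: str
    attribute: str
    value: Any
    added: bool      # True = assertion, False = retraction


class FactLog:
    """Hypothetical append-only log: facts are never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._tx = 0

    def _append(self, entity: str, attribute: str, value: Any, added: bool) -> int:
        self._tx += 1
        self._facts.append(Fact(self._tx, entity, attribute, value, added))
        return self._tx

    def assert_fact(self, entity: str, attribute: str, value: Any) -> int:
        """Assert: accumulate a new granular fact."""
        return self._append(entity, attribute, value, True)

    def retract_fact(self, entity: str, attribute: str, value: Any) -> int:
        """Retract: record that a fact no longer holds; the original stays."""
        return self._append(entity, attribute, value, False)

    def read(self, entity: str, attribute: str, as_of: Optional[int] = None) -> Any:
        """Read the value that held at transaction `as_of` (default: latest)."""
        limit = self._tx if as_of is None else as_of
        current = None
        for fact in self._facts:
            if fact.tx > limit:
                break
            if fact.entity == entity and fact.attribute == attribute:
                if fact.added:
                    current = fact.value
                elif current == fact.value:
                    current = None
        return current


log = FactLog()
t1 = log.assert_fact("cities/Berlin", "population", 3769000)
t2 = log.assert_fact("cities/Berlin", "population", 3769200)
t3 = log.assert_fact("cities/Berlin", "wall?", True)
t4 = log.retract_fact("cities/Berlin", "wall?", True)
```

Reading `"wall?"` as of `t3` still returns `True`, while reading it at the latest transaction returns `None`: the retraction was appended to the log and the original assertion was left untouched.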
  42. [[:crux.tx/put {:crux.db/id :cities/Berlin :capital? true :population 3769000 :country "Germany"}]]

  43. [[:crux.tx/put {:crux.db/id :cities/Barcelona :population 5575200 :country "Spain"}]]

  44. [[:crux.tx/put {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany"}]] ; population +200

  45. [[:crux.tx/put {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany" :wall? true}]]

  46. [[:crux.tx/put {:crux.db/id :cities/London :capital? true :population 8892000 :country "United Kingdom"}]]

  47. [[:crux.tx/put {:crux.db/id :cities/London :capital? true :population 8893000 :country "United Kingdom"}]] ; population +1000
  48. [[:crux.tx/put {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany"}]]

  49. (crux/entity-history db :cities/Berlin)

  50. [{:crux.db/id :cities/Berlin :capital? true :population 3769000 :country "Germany"}
       {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany"}
       {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany" :wall? true}
       {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany"}]
  51. (crux/q db '{:find [city] :where [[city :capital? true]]})

  52. => #{[:cities/Berlin] [:cities/London]}

  53. [[:crux.tx/put {:crux.db/id :cities/Berlin :population 3769200 :country "Germany"}]]

  54. (crux/q db '{:find [city] :where [[city :capital? true]]})

  55. => #{[:cities/London]}

  56. (crux/q (as-of db older-time) '{:find [city] :where [[city :capital? true]]})

  57. => #{[:cities/Berlin] [:cities/London]}
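
The sequence of puts and queries above can be mimicked in Python to show why the same query gives different answers at different times. This is a rough sketch, assuming a hypothetical in-memory `DocStore` with integer timestamps; it is not the Crux API, and `capitals` is a hand-written filter standing in for the Datalog query.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Doc = Dict[str, Any]


class DocStore:
    """Hypothetical store: every put appends a new version of a document."""

    def __init__(self) -> None:
        self._versions: List[Tuple[int, str, Doc]] = []  # (time, id, document)

    def put(self, time: int, doc_id: str, doc: Doc) -> None:
        # Older versions are kept; nothing is overwritten.
        self._versions.append((time, doc_id, dict(doc)))

    def as_of(self, time: Optional[int] = None) -> Dict[str, Doc]:
        """Snapshot holding the latest version of each document at `time`."""
        snapshot: Dict[str, Doc] = {}
        for t, doc_id, doc in sorted(self._versions, key=lambda v: v[0]):
            if time is None or t <= time:
                snapshot[doc_id] = doc
        return snapshot

    def history(self, doc_id: str) -> List[Doc]:
        """All versions of one document, oldest first."""
        ordered = sorted(self._versions, key=lambda v: v[0])
        return [doc for _, i, doc in ordered if i == doc_id]


def capitals(snapshot: Dict[str, Doc]) -> Set[str]:
    """Stand-in for the query: ids of documents where capital? is true."""
    return {i for i, doc in snapshot.items() if doc.get("capital?") is True}


store = DocStore()
store.put(1, "cities/Berlin",
          {"capital?": True, "population": 3769200, "country": "Germany"})
store.put(2, "cities/London",
          {"capital?": True, "population": 8893000, "country": "United Kingdom"})
# A later put of Berlin without the capital? attribute.
store.put(3, "cities/Berlin", {"population": 3769200, "country": "Germany"})

capitals_now = capitals(store.as_of())
capitals_older = capitals(store.as_of(2))
```

Here `capitals_now` contains only London, while `capitals_older` contains both cities, because the snapshot at time 2 predates the last Berlin put.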

  58. TIME TRAVEL IS HERE!

  59. None
  60. None
  61. WHAT DOES THIS CHANGE FOR US? NOT ONLY OUR CODE BUT THINKING TOO?
      • Time is a first-class citizen: the whole DB can be frozen in time and inspected
      • Database as a value: treat your DB as any data structure and not a connection
      • OLTP and OLAP, or operational and analytics databases, can be fully merged
      • Since queries and storage are cleanly separated, the DB never grinds to a halt under load. Its decomposed nature makes it extremely scalable
      • There is no global locking and all queries are local
      • Not only is the DB design simpler, but our apps are much simpler too
      • Much less incidental complexity in the infra, as the DB is the one and only source of historical truth
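
"Database as a value" can be illustrated with an immutable data structure that functions receive as an ordinary argument. The sketch below is only an analogy under assumptions: `Db`, `transact` and `population` are hypothetical names, and a real temporal database shares structure between versions instead of copying tuples.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FactTuple = Tuple[str, str, object]  # (entity, attribute, value)


@dataclass(frozen=True)
class Db:
    """Hypothetical immutable database value: just a tuple of facts."""
    facts: Tuple[FactTuple, ...] = ()


def transact(db: Db, *new_facts: FactTuple) -> Db:
    """Return a new Db value with the facts added; the input is untouched."""
    return Db(db.facts + tuple(new_facts))


def population(db: Db, city: str) -> Optional[object]:
    """Pure function of a db value: the latest population fact, if any."""
    values = [v for e, a, v in db.facts if e == city and a == "population"]
    return values[-1] if values else None


db_v1 = transact(Db(), ("cities/Berlin", "population", 3769000))
db_v2 = transact(db_v1, ("cities/Berlin", "population", 3769200))
```

Any code still holding `db_v1` keeps seeing the population it saw before, with no locking involved, while new readers can use `db_v2`.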
  62. TIME TRAVELLING CI/CD?

  63. BOB THE BUILDER: BUILDING A PLATFORM
      • CI/CD Platform
      • Unbundled design offering un-opinionated scaling
      • Externally and limitlessly scalable
      • Powered by Crux
      • https://bob-cd.github.io/
  64. CI/CD AS A VALUE: TEMPORAL DEBUGGING. /QUERY = ARBITRARY DATALOG + TIME
  65. MORE RESOURCES: THE ACTUAL SOURCES OF TRUTH
      • Nena’s 99 Luftballons: https://www.youtube.com/watch?v=7aLiT3wXko0
      • https://opencrux.com
      • https://www.datomic.com/
      • https://github.com/replikativ/datahike
      • https://en.wikipedia.org/wiki/Temporal_database
      • https://en.wikipedia.org/wiki/Datalog
      • Database as a Value: https://www.youtube.com/watch?v=EKdV1IgAaFc
      • Designing Crux: https://www.youtube.com/watch?v=YjAVsvYGbuU
      • Datahike: https://www.youtube.com/watch?v=Hjo4TEV81sQ
  66. None
  67. None