
What if your databases never forgot

Rahul De
November 04, 2020

All major mainstream databases update in place and have a notion of NOW rather than a progression of time. They are supposed to be a reflection of real life, but are they really? When you save a Customer and their address changes, does that mean they never lived at the old place? What if your DB behaved like a git repo instead and recorded each change as a series of facts? What if you could time travel in the DB and look back at any point in time, with performance similar to that of the usual DBs? What if CRUD isn't the right way of thinking about DBs? Let's discuss DBs that make this possible and that not only radically change how you think about data and facts but also simplify everything around the DB: the app, the infra and the monitoring tooling! Let's be functional at the disk too and not just in the app.

Transcript

  1. "Hast du etwas Zeit für mich? Dann singe ich ein Lied für dich." ("Do you have some time for me? Then I'll sing a song for you.") Nena, 99 Luftballons
  2. I’M RAHUL DE HELLO, WORLD!
    • lispyclouds
    • Hopelessly in love with Functional Programming, Clojure and High Performance, Scalable Infrastructures
    • Make Simple Easy
    • Diversity, Sustainable Living, Anarcho-Communism, Teaching for Learning
    • ThoughtWorks Berlin
    • https://github.com/lispyclouds
    • https://twitter.com/lispyclouds
  3. THEY ARE IMMUTABLE DATA, INFORMATION AND FACTS
    • Durable log of events
    • Series of changes accreted over time
    • Ledger-like properties
    • Log book, Bookkeeping?
    • Most of us don’t forget our history and where we came from
    • History is immutable (in most places)
  4. numbers = [1, 2, 3]
     doubles = []
     for number in numbers:
         doubles.append(number * 2)
  5. db_conn = db_connect()
     def add_city(city_data):
         db_conn.insert(city_data)
     def get_city(city):
         return db_conn.get(city)
     def update_city(city, data):
         db_conn.update(city, data)
     def delete_city(city):
         db_conn.delete(city)
  6. DO THEY REALLY REPRESENT REAL LIFE? MAINSTREAM DATABASES
    • Update in-place
    • CRUD
    • UPDATE and ALTER
    • Ugly and globally mutable variable
    • Central and often elaborate locking and query engines
    • Object Relational Mappers and Relational Databases take a rigid view of data, although data is highly dynamic
    • Expensive queries affect all the cluster members
    • Inconsistencies due to separate operational and analytical databases
  7. AIN’T NOBODY GOT TIME IN THEM MAINSTREAM DATABASES
    • Have the notion of NOW and not how we got here
    • Update in-place and CRUD directly result in overwriting and forgetting the past
    • UPDATE and ALTER cause in-place structural changes with similar effects
    • Changes are central and affect everyone, regardless of the view others want
    • Because of in-place updates, we often resort to dedicated operational and analytical instances, as their access patterns are very different
    • Need for elaborate and often highly complex logging, append-only strategies and timestamps to remember the past
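A minimal sketch of the forgetting described in the bullets above. A plain Python dict stands in for an update-in-place table; the `cities` dict and the function bodies are assumptions for illustration, and only the function names echo the CRUD slide.

```python
# A plain dict stands in for an update-in-place table (illustrative only).
cities = {}


def add_city(name, data):
    cities[name] = dict(data)


def update_city(name, data):
    # Overwrites the stored fields; the previous values are not kept anywhere.
    cities[name].update(data)


def get_city(name):
    return cities[name]


add_city("Berlin", {"population": 3769000})
update_city("Berlin", {"population": 3769200})

current = get_city("Berlin")
print(current)  # only the latest population is left
```

After the update there is no way to ask the store what the population used to be, unless the application adds its own audit tables or timestamp columns.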
  8. THE MESS WE ARE IN HOW DID WE GET HERE?
    • Relic of the past: computing resources were really expensive, so we had to overwrite the little disk and RAM we had; resources are dirt cheap now
    • Rewritable memory directly resulted in the imperative paradigm we are in now
    • The notion of imperative instructions (CRUD) rather than declarative queries also results from this
    • Functional Programming and Immutability are actively tackling this issue at the language level
    • Mainstream tooling like Java/C#/Kotlin, though offering functional facilities, is still quite imperative at its core
    • Imperative tooling is a direct impediment to the scale needs of today
    • But regardless of declarative languages, the DB and consequently the disk persistence remain very imperative and rely heavily on update in-place
  9. WHAT IS IT? ARAR! ☠
    • Assertions are granular statements of facts
    • Reads are always performed against an immutable database value at a particular point in time. Time is globally ordered in a database via ACID properties
    • New transactions only Accumulate new data. Existing facts never change
    • Retractions state that an assertion no longer holds at some later point in time. The original remains unchanged
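The four ARAR ideas can be sketched as a tiny in-memory, append-only fact log. This is an illustration under stated assumptions, not how Crux or Datomic are implemented: `Fact`, `FactLog`, `transact` and `as_of` are hypothetical names, attributes are treated as single-valued, and a simple counter stands in for globally ordered time.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Fact:
    tx: int         # transaction number; gives a global order
    entity: str
    attribute: str
    value: Any
    asserted: bool  # True = assertion, False = retraction


class FactLog:
    """Append-only log: facts are only ever added, never changed."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._tx = 0

    def transact(self, changes: List[Tuple[str, str, str, Any]]) -> int:
        """Accumulate (op, entity, attribute, value) tuples; op is 'assert' or 'retract'."""
        self._tx += 1
        for op, entity, attribute, value in changes:
            self._facts.append(Fact(self._tx, entity, attribute, value, op == "assert"))
        return self._tx

    def as_of(self, tx: int) -> Dict[str, Dict[str, Any]]:
        """Read: fold all facts up to and including `tx` into a fresh view."""
        view: Dict[str, Dict[str, Any]] = {}
        for fact in self._facts:
            if fact.tx > tx:
                break  # facts are stored in transaction order
            attrs = view.setdefault(fact.entity, {})
            if fact.asserted:
                attrs[fact.attribute] = fact.value
            elif fact.attribute in attrs and attrs[fact.attribute] == fact.value:
                del attrs[fact.attribute]
        return view


log = FactLog()
berlin = "cities/Berlin"
t1 = log.transact([("assert", berlin, "population", 3769000)])
t2 = log.transact([("assert", berlin, "population", 3769200)])
t3 = log.transact([("assert", berlin, "has-wall?", True)])
t4 = log.transact([("retract", berlin, "has-wall?", True)])

print(log.as_of(t1))
print(log.as_of(t3))
print(log.as_of(t4))
```

The retraction at `t4` is itself a new fact appended to the log, so a read as of `t3` still sees the wall while a read as of `t4` does not.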
  10. [{:crux.db/id :cities/Berlin :capital? true :population 3769000 :country "Germany"}
       {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany"}
       {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany" :has-wall? true}
       {:crux.db/id :cities/Berlin :capital? true :population 3769200 :country "Germany"}]
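One way to read the four document versions on this slide is as a history in which each step asserts or retracts something. The sketch below transcribes the documents into Python dicts (keywords become plain strings, which is an assumption of this example) and uses a hypothetical `diff` helper to compute what changed between consecutive versions.

```python
from typing import Any, Dict, Tuple

Doc = Dict[str, Any]

# The four versions from the slide, oldest first.
base = {"crux.db/id": "cities/Berlin", "capital?": True, "country": "Germany"}
history = [
    {**base, "population": 3769000},
    {**base, "population": 3769200},
    {**base, "population": 3769200, "has-wall?": True},
    {**base, "population": 3769200},
]


def diff(old: Doc, new: Doc) -> Tuple[Doc, Doc]:
    """Return (asserted, retracted) key/value pairs between two versions."""
    asserted = {k: v for k, v in new.items() if k not in old or old[k] != v}
    retracted = {k: v for k, v in old.items() if k not in new}
    return asserted, retracted


changes = [diff(old, new) for old, new in zip(history, history[1:])]
for step in changes:
    print(step)
```

Because every version is kept, the change list is derived from the data rather than reconstructed from application logs.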
  11. NOT ONLY OUR CODE BUT THINKING TOO? WHAT DOES THIS CHANGE FOR US?
    • Time is a first-class citizen: the whole DB can be frozen in time and inspected
    • Database as a value: treat your DB as any data structure and not a connection
    • OLTP and OLAP, or operational and analytics databases, can be fully merged
    • Since queries and storage are cleanly separated, the DB never grinds to a halt under load. The decomposed nature makes it extremely scalable
    • There is no global locking and all queries are local
    • Not only is the DB design simpler, but our apps are much simpler too
    • Much less incidental complexity in the infra, as the DB is the one and only source of historical truth
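"Database as a value" can be sketched in a few lines: a query is a plain function over an immutable snapshot, and a write returns a new snapshot instead of mutating a shared connection. The names `transact` and `total_population` are hypothetical, and copying the whole dict on each write is a simplification chosen for brevity.

```python
from types import MappingProxyType
from typing import Any, Mapping

Db = Mapping[str, Mapping[str, Any]]


def transact(db: Db, entity: str, doc: Mapping[str, Any]) -> Db:
    """Return a new database value; `db` itself is left untouched."""
    new = dict(db)
    new[entity] = MappingProxyType(dict(doc))
    return MappingProxyType(new)


def total_population(db: Db) -> int:
    """A query is just a pure function of the database value it is given."""
    return sum(doc["population"] for doc in db.values())


db0: Db = MappingProxyType({})
db1 = transact(db0, "cities/Berlin", {"population": 3769000})
db2 = transact(db1, "cities/Berlin", {"population": 3769200})

# Older values stay valid and can be queried side by side with newer ones.
print(total_population(db0), total_population(db1), total_population(db2))
```

Since each snapshot is read-only, it can be handed to any number of readers without locking, which is the property the bullets above rely on.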
  12. BUILDING A PLATFORM BOB THE BUILDER
    • CI/CD Platform
    • Unbundled design offering unopinionated scaling
    • Externally and limitlessly scalable
    • Powered by Crux
    • https://bob-cd.github.io/
  13. THE ACTUAL SOURCES OF TRUTH MORE RESOURCES
    • Nena’s 99 Luftballons: https://www.youtube.com/watch?v=7aLiT3wXko0
    • https://opencrux.com
    • https://www.datomic.com/
    • https://github.com/replikativ/datahike
    • https://en.wikipedia.org/wiki/Temporal_database
    • https://en.wikipedia.org/wiki/Datalog
    • Database as a Value: https://www.youtube.com/watch?v=EKdV1IgAaFc
    • Designing Crux: https://www.youtube.com/watch?v=YjAVsvYGbuU
    • Datahike: https://www.youtube.com/watch?v=Hjo4TEV81sQ