Rethinking the Database

ACID • Functional • Scalable • Programmable

https://www.datomic.com/

Joe R. Smith

April 03, 2018

Transcript

  1. Rigid Model

    “Person can belong to multiple clubs”
    [Diagram labels: person table, club table, join table, ids, foreign keys, FK constraints]
  2. Complexity

    • Types
      • Inherent: inherent to the problem
      • Incidental: accidental, arising from naive or suboptimal implementation, often self-inflicted
  3. Complexity

    • Major source of incidental complexity is conflating or combining orthogonal/independent concerns
    • Complect*: interleave, entwine, braid
    • Decomplect: To make simple by separating orthogonal concerns
    *Simple Made Easy (Rich Hickey, 2011)
  4. Time and State

    • Axioms
      • Time flows linearly in one direction
      • The world exists as a series of discrete, ordered states
      • The past is immutable
      • You cannot know the state of another process/system instantaneously
  5. Time and State

    • Consequences
      • There is no such thing as sharing current data, only the possibility of sharing consistent data
      • Current only exists on a single machine, on a single core, within a single thread
      • Or, you stop the world to emulate this
  6. Time and State

    • Corollary: There is no such thing as stale data if it is part of a historical state (e.g., versioned/timestamped).
    • If consistency is a requirement for some datasource, then you need:
      • A single Source of Truth
      • Linearized writes
  7. Old Assumptions

    • Memory is expensive
    • Storage is expensive
    • Machines are precious
    • Resources are dedicated
  8. Datomic

    • Functional
    • Local, lazy reads
    • Serialized ACID
    • Time-aware, accumulates changes
    • Elastic read scaling
    • Flexible, universal attribute schema
    • Programmable
  9. Transience vs. Persistence

                      Transience   Persistence
    Sharing           Difficult    Trivial
    Distribution      Difficult    Trivial
    Concurrency       Difficult    Trivial
    Access Pattern    Eager        Eager or Lazy
    Caching           Difficult    Trivial
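The persistence column above can be illustrated with Clojure's built-in immutable collections. This is a minimal sketch, not anything Datomic-specific; the names `v1`, `v2`, and `history` are made up for illustration.

```clojure
;; Persistent (immutable) collections: an "update" returns a new value
;; and leaves the old one untouched, so old versions can be shared freely.
(def v1 {:name "Bob" :age 34})
(def v2 (assoc v1 :age 35))   ; a new map; v1 is unchanged

;; Keeping every version is just a matter of holding references.
(def history [v1 v2])

(println (:age v1))                            ; 34
(println (:age v2))                            ; 35
(println (identical? (:name v1) (:name v2)))   ; true: the unchanged value is shared
```

Because neither map can change, any thread or process can read `v1` or `v2` without locks, which is why sharing, concurrency, and caching become trivial in the table.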
  10. Scalar Types

    symbol               blah
    keyword              :thing
    string               "hello, architects!"
    integer              12345
    float                3.141
    exponent notation    2e5
    regular expression   #"^hello,\s\w+"
    booleans             true false
    undefined            nil
  11. Collections

    list     '(1 2 3 "badger")
    vector   [1 2 3 "squirrel!"]
    map      {:name "Bob" :age 34}
    set      #{"cat" "dog" "snake"}
  12. Expressions

    (filter odd? (map inc [7 8 9 10 11 12]))
    => (9 11 13)

    (->> [7 8 9 10 11 12]
         (map inc)
         (filter odd?))
  13. Special Forms

    (def symbol init?)                           define symbol
    (if test then else?)                         conditional, yields `then` or `else`
    (do exprs*)                                  evaluate expressions, return last
    (let [bindings*] exprs*)                     lexical context
    (quote form)                                 yields an unevaluated form
    (unquote form)                               unquotes a quoted form
    (var symbol)                                 the var object named by a symbol
    (fn name? ([params*] cond-map? exprs*)+)     defines a function
    (loop [bindings*] exprs*)                    like let, but provides a recursion target
    (recur exprs*)                               evaluates expressions and rebinds at recur point
    (throw expr)                                 evaluates expr and throws result
    (try expr* catch-clause* finally-clause?)    try/catch/finally semantics
    . new set!                                   java[script] interop
  14. Functional, Lazy "Peers"

    (require '[datomic.api :as d])

    ;; Connect to a remote database
    (def conn (d/connect "datomic:ddb://us-east-1/my-db"))

    ;; Fetch the current database value
    (def db (d/db conn))

    ;; Evaluate a query
    (d/q <some query> db)

    ;; Join across multiple databases
    (d/q <some query> db1, db2)

    ;; Retrieve an entity
    (d/entity db <identifier>)
  15. Functional, Lazy "Peers"

    ;; Connect to a remote database
    (def conn (d/connect "datomic:ddb://us-east-1/my-db"))

    Supports multiple storage backends: mem, sql, dev, cassandra
  16. Functional, Lazy "Peers"

    ;; Fetch the current database value
    (def db (d/db conn))

    Immutable, lazy database value at time t
  17. Local Query

    ;; Evaluate a query
    (d/q <some query> db)

    Queries a database value, not a transient connection
  18. Local Query

    ;; Join across multiple databases
    (d/q <some query> db1, db2)

    Query multiple, potentially disparate data sources
  19. Entity API

    ;; Direct entity access
    (d/entity db <identifier>)

    <identifier>: entity id or lookup ref
    Returns a lazy map of an entity's attributes
  20. Serialized ACID Transactions

    ;; Submit a transaction
    (def result (d/transact conn <some tx data>))

    Enqueues transaction on transactor
  21. Serialized ACID Transactions

    ;; Submit a transaction
    (def result (d/transact conn <some tx data>))
    (keys @result)
    => (:db-before :db-after :tx-data :tempids)

    Returns the database values before and after the transaction
  22. Time-aware

    ;; View the db, as of a week ago
    (def db-before (d/as-of db <1 week ago>))

    ;; Or, pretend to transact new data
    (def db-later (d/with db <proposed tx data>))

    ;; History of a db, in its entirety
    (d/history db)

    ;; Listen for new changes
    (d/tx-report-queue conn)

    ;; Or, process old ones
    (d/tx-range (d/log conn) <1 week ago> nil)
  23. Time-aware

    ;; View the db, as of a week ago
    (def db-before (d/as-of db <1 week ago>))

    The database value as of a point in time
  24. Time-aware

    ;; Or, pretend to transact new data
    (def db-later (d/with db <proposed tx data>))

    A speculative database value
  25. Time-aware

    ;; History of a db, in its entirety
    (d/history db)

    A database value with all historical events
  26. Time-aware

    ;; Or, process old ones
    (d/tx-range (d/log conn) <1 week ago> nil)

    Look at a range of transactions from the transaction log
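The ideas behind `d/as-of` and `d/with` can be mimicked with plain immutable data. The sketch below is only a toy model under stated assumptions: the tuple layout, the sample data, and the helpers `sketch-as-of` and `sketch-with` are hypothetical and do not call or reproduce Datomic's API.

```clojure
;; Hypothetical model: a database value is an immutable vector of
;; [entity attribute value tx added?] tuples. This is NOT Datomic's representation.
(def db
  [[1 :user/name "Joe" 100 true]
   [1 :user/website "solussd.io" 105 true]])

(defn sketch-as-of
  "Keep only the tuples recorded at or before transaction t."
  [db t]
  (filterv (fn [[_e _a _v tx _added?]] (<= tx t)) db))

(defn sketch-with
  "Return a new value that includes the proposed tuples; db itself is untouched."
  [db proposed]
  (into db proposed))

(def db-before (sketch-as-of db 100))
(def db-later  (sketch-with db [[1 :user/email "joe@example.com" 110 true]]))

(println (count db-before)) ; 1
(println (count db))        ; 2
(println (count db-later))  ; 3
```

Both helpers return new values and never modify `db`, which is the property that makes a past view and a speculative view safe to hold side by side.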
  27. Traditional (SQL) Database

    [Diagram labels: DB Server (Query, Transactions, Indexing, I/O, Storage, Cache); App Process (App); Strings (DDL + DML); Results]
    • Query, indexing, transactions, I/O, and storage all colocated
    • Queries must be performed eagerly, for consistency
      • e.g., MVCC is transient: historical state is isolated only during a query
    • Updates performed in-place
    • Complects reads and writes
  28. Datomic Operational Model

    [Diagram labels: Transactor (Indexing, Transactions) and Standby Transactor (Indexing, Transactions); Storage service (DynamoDB), Segment storage, Data Segments; three Peer App Processes, each with App and Peer Lib (Query, Cache, Live Index, Comm); ElastiCache cluster (optional), Data Segments]
    • Read/Query scales with the number of peers
    • Every peer sees completed transactions as of a particular point in time
    • Peers always see all transactions up to their time basis, in order, with no gaps
    • Peers can, optionally, coordinate on a time basis
  29. Programmable

    ;; A simple function to perform a query
    (defn customers-who-bought [db product-id]
      (d/q '[:find ?customer
             :in $ ?id
             :where [?customer :customer/orders ?order]
                    [?order :order/items ?item]
                    [?item :product/id ?id]]
           db product-id))
  30. Programmable (same code as slide 29)

    Database value as an argument to the function
  31. Programmable (same code as slide 29)

    The query is data, not strings
  32. Datalog

    • Queries are expressed in Datalog
    • Equivalent to Relational Model + Recursion
    • No clause order dependency
    • Guaranteed termination
    • Pattern-matching style easy to learn
  33. A Sample Database

    Entity   Attribute              Value                       Tx
    71       :customer/email        [email protected]           148
    72       :customer/email        [email protected]           150
    72       :customer/orders       93                          180
    72       :customer/orders       108                         185
    93       :order/items           120                         180
    120      :product/id            "PROD123"                   141
    120      :product/name          "Generic Product"           141
    120      :product/description   "good for all the things"   141
    ⋮        ⋮                      ⋮                           ⋮
  34. A Sample Database

    Entity   Attribute              Value                       Tx    Added?
    72       :customer/email        [email protected]           134   true
    71       :customer/email        [email protected]           148   true
    72       :customer/email        [email protected]           150   false
    72       :customer/email        [email protected]           150   true
    72       :customer/orders       93                          180   true
    72       :customer/orders       108                         185   true
    93       :order/items           120                         180   true
    120      :product/id            "PROD123"                   141   true
    120      :product/name          "Generic Product"           141   true
    120      :product/description   "good for all the things"   141   true
    ⋮        ⋮                      ⋮                           ⋮     ⋮
  36. Datoms

    [entity attribute value tx added?]
    • Entity: A unique entity id.
    • Attribute: Names a piece of data associated with an entity, has schema.
    • Value: A scalar or reference (type defined by the attribute's schema).
    • Tx: A transaction id. Refers to a transaction entity!
    • Added?: A boolean declaring whether this Datom is asserting or retracting a fact about its entity.
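To see how assertions and retractions combine into a current view, the datom shape above can be folded in plain Clojure. This is a rough sketch under assumptions: the email strings are placeholders invented for illustration, and `current-facts` is a hypothetical helper, not part of Datomic.

```clojure
;; Hypothetical in-memory stand-in for datoms: [entity attribute value tx added?]
(def datoms
  [[72 :customer/email "old@example.com" 134 true]
   [72 :customer/email "old@example.com" 150 false]  ; retraction
   [72 :customer/email "new@example.com" 150 true]   ; assertion in the same tx
   [72 :customer/orders 93 180 true]
   [72 :customer/orders 108 185 true]])

(defn current-facts
  "Fold datoms, in transaction order, into the set of [e a v] facts that
  still hold: an assertion adds a fact, a retraction removes it."
  [datoms]
  (reduce (fn [facts [e a v _tx added?]]
            (if added?
              (conj facts [e a v])
              (disj facts [e a v])))
          #{}
          (sort-by #(nth % 3) datoms)))   ; sort-by is stable within a tx

(println (current-facts datoms))
```

The retracted email is absent from the result, yet the retraction itself remains in `datoms`, so the earlier state stays recoverable.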
  37. Data Patterns

    [?customer :customer/email ?email]

    Entity   Attribute          Value                Tx
    71       :customer/email    [email protected]    148
    72       :customer/email    [email protected]    150
    72       :customer/orders   93                   180
    72       :customer/orders   108                  185
    ⋮        ⋮                  ⋮                    ⋮
  38. Data Patterns

    [72 :customer/email ?email]

    Entity   Attribute          Value                Tx
    71       :customer/email    [email protected]    148
    72       :customer/email    [email protected]    150
    72       :customer/orders   93                   180
    72       :customer/orders   108                  185
    ⋮        ⋮                  ⋮                    ⋮
  39. Data Patterns

    [72 ?attribute]

    Entity   Attribute          Value                Tx
    71       :customer/email    [email protected]    148
    72       :customer/email    [email protected]    150
    72       :customer/orders   93                   180
    72       :customer/orders   108                  185
    ⋮        ⋮                  ⋮                    ⋮
  40. Data Patterns

    [72 ?attribute ?value]

    Entity   Attribute          Value                Tx
    71       :customer/email    [email protected]    148
    72       :customer/email    [email protected]    150
    72       :customer/orders   93                   180
    72       :customer/orders   108                  185
    ⋮        ⋮                  ⋮                    ⋮
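How a data pattern selects rows can be sketched with a few lines of plain Clojure. The matcher below is a toy, assuming made-up email values and hypothetical helper names (`variable?`, `match-datom`, `match-all`); it does not handle a variable repeated within one pattern and is not how Datomic's query engine is implemented.

```clojure
(defn variable?
  "Symbols starting with ? are treated as pattern variables."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn match-datom
  "Return a map of variable bindings if pattern matches datom, else nil.
  A pattern shorter than the datom ignores the trailing positions."
  [pattern datom]
  (reduce (fn [bindings [p d]]
            (cond
              (variable? p) (assoc bindings p d)
              (= p d)       bindings
              :else         (reduced nil)))
          {}
          (map vector pattern datom)))

(defn match-all [pattern datoms]
  (keep #(match-datom pattern %) datoms))

;; Hypothetical sample rows: [entity attribute value tx]
(def datoms
  [[71 :customer/email "a@example.com" 148]
   [72 :customer/email "b@example.com" 150]
   [72 :customer/orders 93 180]
   [72 :customer/orders 108 185]])

(println (match-all '[72 :customer/email ?email] datoms))
;; => ({?email b@example.com})
(println (match-all '[72 ?attribute] datoms))
```

Constants in the pattern filter rows and variables capture the corresponding column, which is all the four slides above are showing.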
  41. Query API

    (d/q '[:find ?customer
           :where [?customer :customer/orders ?order]
                  [?order :order/items ?item]
                  [?item :product/id "PROD123"]]
         db)
  42. Query API (same code as slide 41)

    Query: the first argument, the quoted [:find … :where …] vector
  43. Query API (same code as slide 41)

    Inputs: the remaining arguments, db …
  44. Parameterized Query

    (d/q '[:find ?customer
           :in $db ?product-id
           :where [$db ?customer :customer/orders ?order]
                  [$db ?order :order/items ?item]
                  [$db ?item :product/id ?product-id]]
         db, "PROD123")
  45. Parameterized Query

    (d/q '[:find ?customer
           :in $ ?product-id
           :where [?customer :customer/orders ?order]
                  [?order :order/items ?item]
                  [?item :product/id ?product-id]]
         db, "PROD123")

    Implicit DB in clauses
  46. Predicates

    (d/q '[:find ?product
           :where [?product :product/price ?price]
                  [(> ?price 100.00)]]
         db)

    "Find all products that cost more than $100.00"
  47. Functions

    (defn shipping
      "Estimate the cost to ship goods to a given zip code."
      [zipcode weight]
      (* (cost-per-lb-by-zip zipcode) weight))

    Just a function
  48. Functions

    (d/q '[:find ?customer ?product
           :where [?customer :customer/address ?addr]
                  [?addr :address/zipcode ?zip]
                  [?product :product/weight ?weight]
                  [?product :product/price ?price]
                  [(shipping ?zip ?weight) ?ship-cost]
                  [(<= ?price ?ship-cost)]]
         db)

    “Find customer/product combos where shipping cost dominates the product price”
  49. Functions (same query as slide 48)

    Navigate from customer to zip code
  50. Functions (same query as slide 48)

    Retrieve product weights and prices
  51. Functions (same query as slide 48)

    Retrieve estimated shipping cost
  52. Functions (same query as slide 48)

    Constrain by price <= shipping cost
  53. Entity API

    (d/entity db <identifier>)

    71       :customer/email        [email protected]           148
    72       :customer/email        [email protected]           150
    72       :customer/orders       93                          180
    72       :customer/orders       108                         185
    93       :order/items           120                         180
    120      :product/id            "PROD123"                   141
    120      :product/name          "Generic Product"           141
    120      :product/description   "good for all the things"   141
    ⋮        ⋮                      ⋮                           ⋮
  54. Entity API

    72       :customer/email        [email protected]           150
    72       :customer/orders       93                          180
    72       :customer/orders       108                         185

    (def result (d/entity db 72))
    => {:db/id 72
        :customer/email "[email protected]"
        :customer/orders [{:db/id 93} {:db/id 108}]}

    Lazy, immutable, map-like view of entity’s attributes & values
  55. Entity API (same code as slide 54)

    Lazy values as of db's time t
  56. Entity API

    (-> result :customer/orders first :order/items)

    [Shown against the sample database rows from slide 53]
  59. Entity API

    (-> result :customer/orders first :order/items)
    => [{:db/id 120
         :product/id "PROD123"
         :product/name "Generic Product"
         :product/description "good for all the things"}]

    120      :product/id            "PROD123"                   141
    120      :product/name          "Generic Product"           141
    120      :product/description   "good for all the things"   141
  60. Entity API

    (def product (d/entity db 120))
    (-> product :order/_items)

    Navigate a relationship backwards
    “Find all orders that included product”
  61. Pull API

    (d/pull db <pull-pattern> <entity-identifier>)

    A declarative way to make hierarchical selections of information about entities
  62. Pull API

    (d/pull db
            [:customer/email
             {:customer/orders [{:order/items [:product/name :product/description]}]}]
            72)
    => {:customer/email "[email protected]"
        :customer/orders [{:order/items [{:product/name "Generic Product"
                                          :product/description "good for all the things"}]}
                          {:order/items [{:product/name "Another Product"
                                          :product/description "does some things"}]}]}
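The shape of a pull pattern (keywords select attributes, maps descend into nested entities) can be imitated over ordinary nested maps. The following is a toy sketch under assumptions: `pull-sketch` and the sample `customer` map are hypothetical, the email is a placeholder, and real `d/pull` supports far more (wildcards, limits, reverse lookups) that is not modeled here.

```clojure
(defn pull-sketch
  "Select from a nested map according to a pull-like pattern:
  a keyword keeps that key, a map {k sub-pattern} recurses into k."
  [pattern entity]
  (reduce
   (fn [out spec]
     (if (map? spec)
       (reduce-kv
        (fn [o k sub]
          (let [v (get entity k)]
            (cond
              (nil? v)        o
              (sequential? v) (assoc o k (mapv #(pull-sketch sub %) v))
              :else           (assoc o k (pull-sketch sub v)))))
        out
        spec)
       (if (contains? entity spec)
         (assoc out spec (get entity spec))
         out)))
   {}
   pattern))

;; Hypothetical nested data standing in for an entity graph.
(def customer
  {:customer/email  "b@example.com"
   :customer/orders [{:order/items [{:product/id   "PROD123"
                                     :product/name "Generic Product"}]}]})

(println
 (pull-sketch [:customer/email {:customer/orders [{:order/items [:product/name]}]}]
              customer))
```

Only the attributes named in the pattern survive, at every level of nesting, so `:product/id` is dropped from the result.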
  63. Built-in Transaction Functions

    [:db.fn/retractEntity john]

    expands to

    [[:db/retract john :user/friends bob]
     [:db/retract john :user/friends stu]
     [:db/retract john :user/email "[email protected]"]
     …]
  64. Custom Transaction Functions

    • Run inside transaction
    • Can access current DB-value
    • Expand into 1+ assertions/retractions
  65. Custom Transaction Functions

    (defn inc [db e attr val]
      (let [entity (d/entity db e)
            prev (get entity attr)]
        [[:db/add e attr (+ prev val)]]))

    Just a function of the current db value and some args
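Why it matters that such a function runs inside the transaction can be shown with an atom standing in for the serialized writer. This is only an analogy under assumptions: `conn`, `increment-by`, `transact-fn!`, and the `:account/balance` attribute are hypothetical names, and the atom is not how a transactor works.

```clojure
;; Hypothetical stand-in for a connection: an atom holding [e a v] tuples.
(def conn (atom []))

(defn increment-by
  "A pure 'transaction function': reads the latest value of attr on e from
  the db value it is given and returns new tuples to add."
  [db e attr amount]
  (let [prev (or (some (fn [[e' a v]] (when (and (= e e') (= a attr)) v))
                       (rseq db))
                 0)]
    [[e attr (+ prev amount)]]))

(defn transact-fn!
  "Apply f to the current db value and append its result, atomically.
  swap! plays the role of the single serialized writer."
  [conn f & args]
  (swap! conn (fn [db] (into db (apply f db args)))))

(transact-fn! conn increment-by 1 :account/balance 10)
(transact-fn! conn increment-by 1 :account/balance 5)

(println @conn)
;; => [[1 :account/balance 10] [1 :account/balance 15]]
```

Because the read of the previous value and the write of the new one happen in one serialized step, two concurrent increments cannot both start from the same stale value.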
  70. Transaction data

    (let [joe "joe-tempid"]
      [[:db/add joe :user/name "Joe Smith"]
       [:db/add joe :user/email "[email protected]"]
       [:db/add joe :user/website "solussd.io"]])
  71. Reified Transactions

    Transactions are, themselves, reified as entities in the database

    {:db/txInstant #inst "2018-02-19T23:09:57.488-00:00"}
  72. Reified Transactions

    Add annotations to transactions!

    [[:db/add "datomic.tx" :data/source "wikipedia"]
     <other tx data>]

    {:db/txInstant #inst "2018-02-19T23:09:57.488-00:00"
     :data/source "wikipedia"}
  73. “Person can belong to multiple clubs”

    [Diagram labels: person table, club table, join table, ids, foreign keys, FK constraints]

    [?person :club ?club]
  74. Modeling News Stories

    Attribute       Type     Cardinality   Component?   Unique?
    :story/title    string   1
    :story/url      string   1
    :story/slug     string   1
    :news/comment   ref      many          ✓

    Schema defines type and other schema attributes at the attribute level
  75. Modeling News Stories (same table as slide 74)

    {:db/ident :story/title
     :db/valueType :db.type/string
     :db/cardinality :db.cardinality/one
     …}

    Just short-form transaction data!
  76. Modeling News Stories (same table as slide 74)

    {:db/ident :news/comment
     :db/valueType :db.type/ref
     :db/cardinality :db.cardinality/many
     :db/isComponent true
     …}

    The referenced entity is a component-of its parent
  77. Schema is just data

    “What are all the attributes in the database?”

    [?e :db/valueType]
  78. Modeling Users

    Attribute        Type     Cardinality   Component?   Unique?
    :user/fullName   string   1
    :user/email      string   1                          Identity
    :user/votes      ref      many

    A user's email is unique and serves as an identity for the entity
  79. Modeling Users (same table as slide 78)

    {:db/ident :user/email
     :db/valueType :db.type/string
     :db/cardinality :db.cardinality/one
     :db/unique :db.unique/identity
     …}

    Could also be :db.unique/value
  80. Modeling Comments

    Attribute         Type     Cardinality   Component?   Unique?
    :comment/body     string   1
    :comment/author   ref      1
    :news/comment     ref      many          ✓

    "ref" type does not dictate attributes
  81. Modeling Comments (same table as slide 80)

    How do you ask for all of a user’s comments?
  82. Relationships are Bi-directional

    ;; Get a comment’s author
    (:comment/author some-comment-entity)

    ;; Get an author’s comments
    (:comment/_author some-author-entity)

    Navigate a relationship backwards
  83. Indexes

    Structure   Name   Included?
    Row         EAVT   All Datoms
    Column      AEVT   All Datoms
    Lookup      AVET   “Indexed” attributes
    Graph       VAET   :db.type/ref attributes
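The index names describe sort orders over the same datoms. A minimal sketch of that idea in plain Clojure follows; the sample rows (including an order for entity 71), the email strings, and the helper `sorted-index` are assumptions made for illustration, and real indexes are persistent trees rather than sorted sequences.

```clojure
;; Hypothetical sample rows: [entity attribute value tx]
(def datoms
  [[72 :customer/orders 93 180]
   [71 :customer/email "a@example.com" 148]
   [72 :customer/email "b@example.com" 150]
   [71 :customer/orders 95 170]])

;; Each index is the same data sorted by a different ordering of positions.
(def orders
  {:eavt [0 1 2 3]
   :aevt [1 0 2 3]
   :avet [1 2 0 3]})

(defn sorted-index
  "Sort datoms by the component order of the named index."
  [index datoms]
  (sort-by (fn [d] (mapv d (orders index))) datoms))

(println (map first (sorted-index :eavt datoms))) ; (71 71 72 72)
(println (map first (sorted-index :aevt datoms))) ; (71 72 71 72)
```

EAVT groups everything about one entity together (row-like access), while AEVT groups all values of one attribute together (column-like access), matching the Structure column in the table.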
  84. Modeling Comments (same table as slide 80)

    How do you find all comments for a story?
  85. Recursive Queries

    (def rules
      '[[(story-comment ?story ?comment)
         [?story :story/title]
         [?story :news/comment ?comment]]
        [(story-comment ?story ?comment)
         [?parent :news/comment ?comment]
         (story-comment ?story ?parent)]])

    (d/q '[:find ?comment
           :in $ % ?story
           :where (story-comment ?story ?comment)]
         db, rules story-id)
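The recursion in the rule (a comment belongs to a story if it hangs off the story directly, or off a comment that already belongs to it) can be mirrored by an ordinary recursive function. This is a sketch on made-up data: the `children` map and `all-comments` are hypothetical, and the function assumes the comment graph has no cycles.

```clojure
;; Hypothetical data: parent id -> ids of its direct :news/comment children.
(def children
  {:story-1   [:comment-1 :comment-2]
   :comment-1 [:comment-3]
   :comment-3 [:comment-4]})

(defn all-comments
  "All comments reachable from parent by following child links any number
  of times. Assumes the links contain no cycles."
  [children parent]
  (mapcat (fn [c] (cons c (all-comments children c)))
          (get children parent)))

(println (all-comments children :story-1))
;; => (:comment-1 :comment-3 :comment-4 :comment-2)
```

The first branch of the rule corresponds to the direct children visited by `mapcat`, and the second branch corresponds to the recursive call on each child.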
  86. Peers

    [Diagram labels: Transactor (Indexing, Transactions) and Standby Transactor (Indexing, Transactions); Storage service (DynamoDB), Segment storage, Data Segments; three Peer App Processes, each with App and Peer Lib (Query, Cache, Live Index, Comm); ElastiCache cluster (optional), Data Segments]
    • Local query
    • High performance
    • Use arbitrary functions in query
    • "Heavy" peers
    • Peers are part of the database
    • Peers must run on the JVM
  87. [Diagram repeated from slide 86, without the bullet list]
  88. Client/Peer Server

    • Clients (not peers)
    • Peer server and clients scale separately
    • Good fit for lightweight / ephemeral clients
    • "Warm" caches on peer server(s)
    • Option for generic web-servers and specialized peer servers (e.g., products vs analytics) to increase cache hits
    • Scale up/down web-servers w/o sacrificing cache hits
    • Clients can be written in/for any language/runtime
  89. A Cloud Native Database

    [Diagram labels: Production CF Stack; Storage CF Stack; Many-DB Storage System (DynamoDB, EFS, S3; Transaction Log, Indexes, Cache); Amazon VPC; client (outside VPC), creds, Bastion, SSH; Route 53; Client (in VPC); ALB; CloudFormation Stack with Tx Group (three Cluster Nodes, SSD Cache); CloudFormation Stacks for Analytics Query CF and Dev Query CF (three Cluster Nodes each); txes]
    • Distributed, Highly Available, Horizontally scalable
    • Managed: provisioning, scaling, and multi-region replication
    • DB performance tuning, resource provisioning, etc. are abstracted away
    • Less operational complexity (learn one DB, not two)
    • Cloud-managed auth
    • Encryption in flight and at rest
  90. [Diagram repeated from slide 89, without the bullet list]
  91. Questions?

    Based on “Datomic for the 96%” by Stuart Halloway and “Datomic for the 96% Redux” by Ryan Neufeld
    @solussd – [email protected]