GLOBAL, SCALABLE, RESILIENT SERVICES
▸ Make life easier for humans: a low-touch, highly automated distributed DB for operators that is still simple for developers to reason about.
▸ Industry-leading consistency, even on massively scaled deployments: distributed transactions without the pain of eventual-consistency issues.
▸ Always-on database that accepts reads and writes on all nodes without generating conflicts.
▸ Flexible deployment in any environment, without tying you to any platform or vendor.
▸ Supports familiar tools for working with relational data (i.e., SQL).
OF INCONSISTENCIES
▸ Emin Gün Sirer, an Associate Professor at Cornell University, wrote a blog post blaming eventually consistent data stores for the lost bitcoins. He names MongoDB, Cassandra and Riak among the NoSQL solutions that are vulnerable to banking-style theft because:
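The kind of anomaly at stake can be pictured as a double spend. The following is a minimal, hypothetical sketch. It assumes two replicas of one account that each validate a withdrawal against their own local balance and later reconcile with last-writer-wins. The `Replica` class, `lww_merge` and `double_spend_demo` are illustrative names and do not model MongoDB, Cassandra, Riak or any other product.

```python
# Hypothetical illustration: two replicas of an account balance in an
# eventually consistent store that reconciles concurrent writes with
# last-writer-wins (LWW). Not modelled on any specific product.
from dataclasses import dataclass


@dataclass
class Replica:
    balance: int
    timestamp: int = 0

    def withdraw(self, amount: int, ts: int) -> bool:
        # Each replica checks only its own, possibly stale, local state.
        if self.balance < amount:
            return False
        self.balance -= amount
        self.timestamp = ts
        return True


def lww_merge(a: Replica, b: Replica) -> Replica:
    # Keep whichever write has the later timestamp; the other write is lost.
    winner = a if a.timestamp >= b.timestamp else b
    return Replica(winner.balance, winner.timestamp)


def double_spend_demo() -> tuple[int, int]:
    r1, r2 = Replica(100), Replica(100)
    withdrawn = 0
    # Two clients reach different replicas before the replicas synchronise.
    if r1.withdraw(100, ts=1):
        withdrawn += 100
    if r2.withdraw(100, ts=2):
        withdrawn += 100
    merged = lww_merge(r1, r2)
    return withdrawn, merged.balance


if __name__ == "__main__":
    withdrawn, final_balance = double_spend_demo()
    print(withdrawn, final_balance)  # 200 0
```

In this sketch, 200 is paid out from a 100 balance, and the merged state still looks valid (balance 0). A system that commits only through a quorum would reject the second withdrawal instead.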
REPLICATION
▸ Raft
▸ Commit once a quorum has written the data (nodes >= 3)
▸ One consensus group per data range
[Figure: layered architecture on Node 1, Node 2 and Node 3: SQL API; distributed, transactional KV; transactions; replication protocol (Raft); repair, rebalance; Gossip / Raft]
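The "nodes >= 3" condition follows from majority-quorum arithmetic. The short sketch below shows that arithmetic. The function names are hypothetical, and the only assumption is the standard strict-majority rule used by Raft.

```python
# Majority-quorum arithmetic for a Raft-style consensus group.

def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("a consensus group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority can still be formed."""
    return replicas - quorum_size(replicas)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

With 2 replicas the quorum is still 2, so losing either node stops writes. Three is the smallest group that keeps committing after one failure, and 5 replicas survive two.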
AVAILABILITY
▸ “CONSISTENCY” - ACID semantics and the CAP theorem. Data should be anomaly-free.
▸ “CLUSTER” - A single logical DB: multiple nodes joined together to form a uniform, consistent cluster.
▸ “RANGE” - All data (tables, indices, etc.) is stored as one giant sorted map of KV pairs, divided into ranges of 64 MB each; ranges are split automatically and balanced across nodes.
▸ “LEASEHOLDER” - Each range is replicated across nodes, and one replica holds the “range lease”; this leaseholder coordinates all reads and writes to that range.
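The "giant sorted map split into ranges" idea can be sketched as a list of sorted range start keys plus per-range sizes. This is a simplified, hypothetical model. `RangeMap` and its methods are illustrative names, the 64 MB threshold comes from the slide, and the halving of sizes on a split is an approximation. It does not show how a real system chooses split keys or rebalances replicas.

```python
# Hypothetical sketch: ranges as contiguous slices of one sorted keyspace,
# split once a range grows past a size threshold (64 MB per the slide).
import bisect

RANGE_MAX_BYTES = 64 * 1024 * 1024  # threshold taken from the slide


class RangeMap:
    def __init__(self) -> None:
        # start_keys[i] is the inclusive start key of range i; the first
        # range starts at the empty key, so it covers the lowest keys.
        self.start_keys: list[str] = [""]
        self.sizes: list[int] = [0]

    def range_index(self, key: str) -> int:
        # The owning range is the last one whose start key is <= key.
        return bisect.bisect_right(self.start_keys, key) - 1

    def record_write(self, key: str, nbytes: int) -> None:
        self.sizes[self.range_index(key)] += nbytes

    def needs_split(self, index: int) -> bool:
        return self.sizes[index] > RANGE_MAX_BYTES

    def split(self, index: int, split_key: str) -> None:
        # The caller supplies the split key; it must fall strictly inside
        # the range. Sizes are halved as a rough approximation.
        upper = self.start_keys[index + 1] if index + 1 < len(self.start_keys) else None
        if not self.start_keys[index] < split_key or (upper is not None and split_key >= upper):
            raise ValueError("split key must fall inside the range")
        half = self.sizes[index] // 2
        self.start_keys.insert(index + 1, split_key)
        self.sizes.insert(index + 1, self.sizes[index] - half)
        self.sizes[index] = half
```

Keeping only the sorted start keys makes "which range owns this key?" a binary search. Splitting a range only inserts one boundary, which is why ranges can be split cheaply and independently.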
AVAILABILITY
▸ “RAFT LEADER” - For each range, one replica is the “leader” for writes. The leader coordinates all writes to the Raft group (the followers), and writes require quorum consensus. The leaseholder is usually also the Raft leader. When a write does not achieve consensus, forward progress halts to keep the cluster consistent. This ensures that ACID semantics hold for operations spanning multiple tables.
▸ “REPLICATION” - Synchronous distribution of copies of the data, which are guaranteed to be consistent.
▸ “MULTI-ACTIVE AVAILABILITY” - Each node in the cluster can handle reads/writes for a subset of the stored data (via the range and quorum consensus concepts). Nodes are symmetrical, so they work nicely with load balancers over the SQL API (PostgreSQL wire protocol).
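Multi-active availability combined with leaseholders can be pictured as routing. A client or a round-robin load balancer may pick any node as the gateway, and that gateway finds the leaseholder of the key's range to coordinate the request. The sketch below is hypothetical. `Cluster`, `leaseholder_for` and `handle` are illustrative names, and the static range table stands in for whatever lookup a real database uses.

```python
# Hypothetical sketch of multi-active routing: any node can be the gateway,
# and the leaseholder of the key's range coordinates the read or write.
import bisect
import itertools
from dataclasses import dataclass


@dataclass
class Cluster:
    range_starts: list[str]   # sorted inclusive start key of each range
    leaseholders: list[str]   # node holding the lease, same order as range_starts
    nodes: list[str]

    def leaseholder_for(self, key: str) -> str:
        idx = bisect.bisect_right(self.range_starts, key) - 1
        return self.leaseholders[idx]

    def handle(self, gateway: str, key: str) -> tuple[str, str]:
        # Symmetric nodes: any member may accept the request.
        if gateway not in self.nodes:
            raise ValueError(f"unknown node {gateway}")
        return gateway, self.leaseholder_for(key)


if __name__ == "__main__":
    cluster = Cluster(range_starts=["", "h", "p"],
                      leaseholders=["node2", "node3", "node1"],
                      nodes=["node1", "node2", "node3"])
    balancer = itertools.cycle(cluster.nodes)  # plain round-robin
    for key in ["apricot", "kiwi", "plum"]:
        print(cluster.handle(next(balancer), key))
```

Because every node can serve as a gateway, the load balancer needs no knowledge of data placement. Placement only matters inside the cluster, at the leaseholder.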
[Figure: Consensus replication. A Put “cherry” reaches the Leader (apricot, banana, blueberry, cherry, grape) and is replicated to the Followers; one follower already holds apricot, banana, blueberry, cherry, grape, the other still holds apricot, banana, blueberry, grape. Write committed when 2 out of 3 nodes have written data.]
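The figure's scenario can be replayed in a few lines. This hypothetical sketch assumes the leader always writes the entry first and that each follower either acknowledges or does not. `propose_write` is an illustrative name, and each replica's data is a sorted key list to mirror the sorted map. It deliberately omits Raft terms, log indices and leader election.

```python
# Hypothetical simulation of the figure: the leader writes Put "cherry",
# replicates it to followers, and the write commits only when a majority
# of the group has written it (2 of 3 here).
import bisect


def propose_write(logs: list[list[str]], follower_acks: list[bool], key: str) -> bool:
    """logs[0] is the leader; follower_acks[i] says whether logs[i + 1] writes.

    Entries written by only a minority stay uncommitted; returning False
    means the write makes no progress rather than risking inconsistency.
    """
    if len(follower_acks) != len(logs) - 1:
        raise ValueError("need one ack flag per follower")
    bisect.insort(logs[0], key)  # the leader writes first
    written = 1
    for log, acked in zip(logs[1:], follower_acks):
        if acked:
            bisect.insort(log, key)  # keep each replica's keys sorted
            written += 1
    return written >= len(logs) // 2 + 1


if __name__ == "__main__":
    base = ["apricot", "banana", "blueberry", "grape"]
    logs = [list(base) for _ in range(3)]
    print(propose_write(logs, [True, False], "cherry"))  # True
    for log in logs:
        print(log)
```

The second follower has not yet written "cherry", but the write is already committed because 2 of 3 replicas hold it. With both followers unreachable, the same call returns False.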
IS NEEDED?
▸ Production code -> hex package: postgrex_cdb (does not yet support correlated subqueries)
▸ Test code -> hex package: ecto_replay_sandbox
▸ No prod code changes!
▸ Only test code changes.