A Guide to the Post Relational Revolution


speakerdeck.com/u/iconara (real time!)

Theo / @iconara

Chief Architect at
Co-organizer of the local Ruby, Scala and JavaScript user groups
More rep on Stack Overflow than both Jeff & Joel

THE WORLD ISN’T FLAT

OUT IS THE NEW UP
when scaling up, you’re constrained by Moore’s Law

DISTRIBUTED SYSTEMS ARE ABOUT TRADEOFFS

WHO NEEDS ACID, ANYWAY?
banks, perhaps

JOINS ARE A CRUTCH
why split up your data if all you’re going to do is assemble it over and over again?

OBJECTS DON’T FIT IN TABLES
can you say “impedance mismatch”?

40 YEARS IS A LONG TIME
you didn’t have 256 gigabytes of RAM in 1970

THE RELATIONAL MODEL ISN’T A GOLDEN HAMMER
the existence of object-relational mappers should be proof enough

WELCOME TO THE POST RELATIONAL REVOLUTION

POST RELATIONAL STORAGE

KEY/VALUE STORES
the simplest possible database, and not exactly a new idea

[diagram: a key pointing to an opaque value]
Riak, Voldemort, LevelDB, Tokyo Cabinet, Berkeley DB
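To make “opaque” concrete, here is a minimal sketch of the key/value model, assuming an in-memory Python dict as storage: the store can put, get and delete by key, but it never looks inside the value, so serialization is entirely the application’s job. `KeyValueStore` is a hypothetical name, not the API of Riak, Voldemort or any other store listed above.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key/value store whose values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


# The application chooses the serialization; the store never parses the value.
store = KeyValueStore()
store.put(b"user:42", json.dumps({"name": "John"}).encode("utf-8"))
raw = store.get(b"user:42")
if raw is not None:
    print(json.loads(raw)["name"])  # prints John
```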

STRUCTURED KEY/VALUE STORES
sometimes you need just a little bit more

the Bigtable model, “column oriented”, “sparse tables”, found in Cassandra and HBase
[diagram: a row key maps to sorted column keys; each column key maps to a value plus a timestamp]
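A rough sketch of a single row in the Bigtable model, assuming an in-memory structure: column keys are kept sorted so a contiguous slice of columns can be read cheaply, each cell stores a value plus a timestamp, and a row only contains the columns actually written (hence “sparse”). The `Row` class and the visit data are hypothetical, and this sketch keeps only the newest version of each cell, whereas Bigtable and HBase can retain several timestamped versions.

```python
import bisect
import time
from typing import Dict, List, Optional, Tuple


class Row:
    """Hypothetical sparse row: sorted column keys, each cell is (value, timestamp)."""

    def __init__(self) -> None:
        self._columns: List[str] = []                     # column keys, kept sorted
        self._cells: Dict[str, Tuple[bytes, float]] = {}  # column key -> (value, timestamp)

    def put(self, column: str, value: bytes, timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        if column not in self._cells:
            bisect.insort(self._columns, column)
        current = self._cells.get(column)
        # Last write wins: an older timestamp never overwrites a newer one.
        if current is None or ts >= current[1]:
            self._cells[column] = (value, ts)

    def slice(self, start: str, end: str) -> List[Tuple[str, bytes]]:
        """Cells whose column key lies in [start, end), in column order."""
        lo = bisect.bisect_left(self._columns, start)
        hi = bisect.bisect_left(self._columns, end)
        return [(c, self._cells[c][0]) for c in self._columns[lo:hi]]


table: Dict[str, Row] = {}  # row key -> row
row = table.setdefault("visitor:1", Row())
row.put("2012-02-28T23:59", b"/index.html", timestamp=1.0)
row.put("2012-02-29T00:01", b"/about.html", timestamp=2.0)
row.put("2012-03-01T12:00", b"/blog.html", timestamp=3.0)
print(row.slice("2012-02-28", "2012-03-01"))
# [('2012-02-28T23:59', b'/index.html'), ('2012-02-29T00:01', b'/about.html')]
```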

“datastructure server”, e.g. Redis
[diagram: keys mapping to plain values, to lists or sets of values, and to sorted sets or hashes whose values are themselves keyed]
with operations such as INCREMENT, APPEND, SLICE and CAS
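The operations on the slide run on the server, next to the data, rather than as read-modify-write cycles in the client. The sketch below imitates that with a lock-protected in-process object; every name in it is hypothetical. Redis itself exposes commands such as INCR, APPEND and LRANGE, and implements check-and-set with WATCH/MULTI/EXEC rather than with a single CAS command.

```python
import threading
from typing import Any, Dict, List


class DataStructureServer:
    """Hypothetical in-process stand-in for a datastructure server."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()  # every operation below runs atomically

    def incr(self, key: str, by: int = 1) -> int:
        with self._lock:
            self._data[key] = self._data.get(key, 0) + by
            return self._data[key]

    def append(self, key: str, suffix: str) -> int:
        with self._lock:
            self._data[key] = self._data.get(key, "") + suffix
            return len(self._data[key])

    def push(self, key: str, *values: Any) -> int:
        with self._lock:
            items = self._data.setdefault(key, [])
            items.extend(values)
            return len(items)

    def slice(self, key: str, start: int, stop: int) -> List[Any]:
        """Inclusive start/stop like LRANGE; negative indexes count from the end."""
        with self._lock:
            items = self._data.get(key, [])
            end = None if stop == -1 else stop + 1
            return list(items[start:end])

    def cas(self, key: str, expected: Any, new: Any) -> bool:
        """Set key to new only if its current value equals expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


server = DataStructureServer()
server.incr("pageviews")
print(server.incr("pageviews"))                # 2
server.push("recent", "/a", "/b", "/c")
print(server.slice("recent", 0, 1))            # ['/a', '/b']
print(server.cas("leader", None, "worker-1"))  # True
print(server.cas("leader", None, "worker-2"))  # False: the value is no longer None
```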

DOCUMENT DATABASES
object databases, but for hipsters

complex objects with lists, numbers and strings; secondary indexes* and partial updates: MongoDB, CouchDB, RavenDB, Lotus Notes
* subject to availability

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "cell", "number": "646 555-4567" }
  ]
}
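A partial update changes one field of a stored document without sending the whole document back. The sketch below shows the idea on the example document, using a hypothetical `set_path` helper with dotted field paths; in MongoDB the equivalent update would be written with the `$set` operator, e.g. `{"$set": {"address.postalCode": "10022"}}`. The new values are made up for illustration.

```python
from typing import Any, Dict


def set_path(document: Dict[str, Any], path: str, value: Any) -> None:
    """Set one field, addressed by a dotted path, leaving the rest untouched."""
    node = document
    *parents, leaf = path.split(".")
    for name in parents:
        node = node.setdefault(name, {})  # create intermediate objects if missing
    node[leaf] = value


john: Dict[str, Any] = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021",
    },
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "cell", "number": "646 555-4567"},
    ],
}

set_path(john, "address.postalCode", "10022")
set_path(john, "age", 26)
print(john["address"]["postalCode"], john["age"])  # prints 10022 26
```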

GRAPH DATABASES
relational, for real

traversal algorithms, extreme data complexity: Neo4j, AllegroGraph, FlockDB
[diagram: nodes connected by named relationships, annotated “NAME + PROPERTIES”]
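Traversals are what graph databases are built for. As a hedged sketch, assuming a tiny hypothetical graph held in plain Python structures, the breadth-first search below follows one named relationship type for at most a given number of hops; a graph database would run this kind of traversal next to the data instead of in application code.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical graph: node name -> properties, plus (from, relationship, to) edges.
nodes: Dict[str, Dict[str, str]] = {
    "alice": {"city": "Gothenburg"},
    "bob": {"city": "Stockholm"},
    "carol": {"city": "Oslo"},
    "dave": {"city": "Gothenburg"},
}
edges: List[Tuple[str, str, str]] = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "dave"),
    ("alice", "BLOCKS", "dave"),
]


def traverse(start: str, relationship: str, max_depth: int) -> Dict[str, int]:
    """Breadth-first traversal along one relationship type; returns node -> hops."""
    adjacency: Dict[str, List[str]] = {}
    for src, rel, dst in edges:
        if rel == relationship:
            adjacency.setdefault(src, []).append(dst)
    depths = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depths[current] == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(current, []):
            if neighbour not in depths:
                depths[neighbour] = depths[current] + 1
                queue.append(neighbour)
    return depths


reachable = traverse("alice", "FOLLOWS", 2)
print(reachable)                                 # {'alice': 0, 'bob': 1, 'carol': 2}
print([nodes[name]["city"] for name in reachable])  # ['Gothenburg', 'Stockholm', 'Oslo']
```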

DIVERSITY
I haven’t even mentioned search and indexing systems like Solr and Elasticsearch, or distributed filesystems

SOMETIMES TABLES ARE GREAT, TOO
but mostly when you rely heavily on GROUP BY, SUM, AVG, etc. and can’t precompute the results

POST RELATIONAL SCALING

CONSISTENCY, AVAILABILITY, PARTITION TOLERANCE
(choose any two)

PARTITION TOLERANCE ISN’T OPTIONAL

CONSISTENCY VS. AVAILABILITY
(but in reality, it’s not even that simple)

CONSISTENCY
you can always read what you just wrote, but keys may become unavailable

AVAILABILITY
you can always read and write, but you may not always get the latest value

NOT EITHER/OR
most of these databases let you choose on a query-by-query basis
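One common way of choosing per query, used by Dynamo-style stores such as Cassandra and Riak, is to say how many of the N replicas must answer a read (R) or acknowledge a write (W). If R + W > N, every read overlaps at least one replica that took the latest acknowledged write. The sketch below only checks that arithmetic; N = 3 and the level labels are illustrative assumptions, and sloppy quorums can weaken the guarantee in some systems.

```python
def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True if every read quorum overlaps every write quorum, i.e. R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


N = 3  # replicas per key (an assumed replication factor)
for label, r, w in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", 2, 2), ("ONE/ALL", 1, 3)]:
    print(label, read_sees_latest_write(N, r, w))
# ONE/ONE False
# QUORUM/QUORUM True
# ONE/ALL True
```

Raising R or W also means more replicas must be reachable for the query to succeed, which is the consistency versus availability tradeoff in miniature.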

SHARDING
scaling writes in a consistent system

divide the keyspace into shards, or regions (and store each one redundantly)
[diagram: the keyspace A to Z divided by data size into three shards, each stored on three replicas]
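With range-based sharding, finding the shard for a key is a binary search over the split points. The sketch below assumes the three-shard, three-replica layout from the diagram, with hypothetical split points "I" and "R" and hypothetical node names.

```python
import bisect
from typing import Dict, List

# Hypothetical layout: [start, "I"), ["I", "R") and ["R", end), three replicas each.
SPLIT_POINTS: List[str] = ["I", "R"]
REPLICAS: Dict[int, List[str]] = {
    0: ["node1", "node2", "node3"],
    1: ["node4", "node5", "node6"],
    2: ["node7", "node8", "node9"],
}


def shard_for(key: str) -> int:
    """Index of the shard whose key range contains key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


for key in ["ALICE", "IVAN", "ZOE"]:
    shard = shard_for(key)
    print(key, "-> shard", shard, REPLICAS[shard])
# ALICE -> shard 0 ['node1', 'node2', 'node3']
# IVAN -> shard 1 ['node4', 'node5', 'node6']
# ZOE -> shard 2 ['node7', 'node8', 'node9']
```

Because each shard is a contiguous range, a range query such as all keys from "IVAN" to "JOHN" only touches one shard.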

split a shard when it grows too big, and move one of the new shards onto a new node
[diagram: one shard of the keyspace A to Z split in two, the new shard placed on its own replicas]
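A sketch of splitting, assuming shards hold their keys in memory and a hypothetical limit of four keys per shard (real systems split on data size): when a shard overflows, its upper half becomes a new shard with its own lower bound, which could then be placed on a new node. `RangeShards` is an illustrative name, not any database’s API.

```python
import bisect
from typing import List

MAX_KEYS_PER_SHARD = 4  # hypothetical limit for the example


class RangeShards:
    """Hypothetical range-sharded keyspace that splits shards when they grow too big."""

    def __init__(self) -> None:
        self.lower_bounds: List[str] = [""]  # shard i holds keys >= lower_bounds[i]
        self.shards: List[List[str]] = [[]]  # sorted keys per shard

    def _index(self, key: str) -> int:
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key: str) -> None:
        i = self._index(key)
        bisect.insort(self.shards[i], key)
        if len(self.shards[i]) > MAX_KEYS_PER_SHARD:
            self._split(i)

    def _split(self, i: int) -> None:
        keys = self.shards[i]
        middle = len(keys) // 2
        # The upper half becomes a new shard; in a real system it could move to a new node.
        self.shards[i:i + 1] = [keys[:middle], keys[middle:]]
        self.lower_bounds.insert(i + 1, keys[middle])


shards = RangeShards()
for key in ["ALICE", "BOB", "CAROL", "DAVE", "ERIN", "FRANK"]:
    shards.insert(key)
print(shards.lower_bounds)  # ['', 'CAROL']
print(shards.shards)        # [['ALICE', 'BOB'], ['CAROL', 'DAVE', 'ERIN', 'FRANK']]
```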

in reality there are chunks, tablets or “virtual shards” that are distributed over physical shards
[diagram: the keyspace A to Z cut into many small chunks, spread over the replicated physical shards]
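Virtual shards decouple splitting from placement: the keyspace is cut into many small chunks, and rebalancing only changes which node holds which chunk. The sketch below assumes twelve hypothetical chunks, a round-robin initial placement and a simple “take from the busiest node” policy; it is not how any specific database’s balancer works.

```python
from collections import Counter
from typing import Dict, List


def assign_round_robin(chunks: List[str], nodes: List[str]) -> Dict[str, str]:
    """Place chunks on nodes round-robin (a deliberately simple policy)."""
    return {chunk: nodes[i % len(nodes)] for i, chunk in enumerate(chunks)}


def add_node(assignment: Dict[str, str], new_node: str) -> Dict[str, str]:
    """Move whole chunks from the busiest nodes to new_node until it has its share."""
    result = dict(assignment)
    share = len(result) // (len(set(result.values())) + 1)
    while Counter(result.values())[new_node] < share:
        load = Counter(node for node in result.values() if node != new_node)
        busiest = max(sorted(load), key=lambda node: load[node])  # ties: alphabetical
        chunk = min(c for c, node in result.items() if node == busiest)
        result[chunk] = new_node  # a real system would now copy this chunk's data
    return result


chunks = [f"chunk-{i:02d}" for i in range(12)]  # each chunk stands for a small key range
before = assign_round_robin(chunks, ["node1", "node2", "node3"])
after = add_node(before, "node4")
print([c for c in chunks if before[c] != after[c]])  # ['chunk-00', 'chunk-01', 'chunk-02']
print(sorted(Counter(after.values()).items()))
# [('node1', 3), ('node2', 3), ('node3', 3), ('node4', 3)]
```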

HBASE, MONGODB
sharding is easy in theory but hard in practice; lots of data needs to be moved when adding nodes

CONSISTENT HASHING
scaling writes in an available system

each node is responsible for a range of the keyspace; keys are hashed and mapped to the first following node, and (optionally) replicated to the subsequent nodes
[diagram: the keyspace from 0 to 2^n drawn as a ring with four nodes; hash(key) lands on the ring, with replication continuing to the next nodes]
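A minimal consistent hashing ring, assuming MD5 truncated to 32 bits as the hash (so the keyspace runs from 0 to 2^32, i.e. n = 32) and hypothetical node names: each node sits at the hash of its name, a key belongs to the first node at or after hash(key), wrapping around at the end, and replicas go to the following nodes clockwise.

```python
import bisect
import hashlib
from typing import List


def ring_position(value: str) -> int:
    """Map a string onto the ring 0 .. 2**32 - 1 using the first 4 bytes of MD5."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Minimal consistent hashing ring (a sketch, not any particular database's)."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def preference_list(self, key: str, replicas: int = 1) -> List[str]:
        """First node at or after hash(key), then the next replicas - 1 nodes clockwise."""
        start = bisect.bisect_left(self._positions, ring_position(key)) % len(self._ring)
        count = min(replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user:42", replicas=3))  # three distinct nodes
```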

when a new node is added, only part of the keyspace needs to be moved
[diagram: the same ring from 0 to 2^n with a new, fifth node inserted]
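To see why only part of the keyspace moves, the sketch below (same assumptions: 32-bit MD5 positions, hypothetical node names and keys) assigns 10,000 keys to four nodes, adds a fifth, and compares owners. On a ring, a key can only change owner if the new node lands between its hash and its old owner, so every moved key moves to the new node. The modulo placement included for contrast moves roughly four keys out of five, since hash % 4 and hash % 5 agree only when the hash modulo 20 is 0 to 3.

```python
import bisect
import hashlib
from typing import Callable, Dict, List


def ring_position(value: str) -> int:
    """Map a string onto the ring 0 .. 2**32 - 1 using the first 4 bytes of MD5."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def owner_lookup(nodes: List[str]) -> Callable[[str], str]:
    """Return a function mapping a key to the first node at or after hash(key)."""
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [position for position, _ in ring]

    def owner(key: str) -> str:
        return ring[bisect.bisect_left(positions, ring_position(key)) % len(ring)][1]

    return owner


keys = [f"key-{i}" for i in range(10_000)]
old_owner = owner_lookup(["node-a", "node-b", "node-c", "node-d"])
new_owner = owner_lookup(["node-a", "node-b", "node-c", "node-d", "node-e"])

before: Dict[str, str] = {key: old_owner(key) for key in keys}
after: Dict[str, str] = {key: new_owner(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]

# Every key that changed owner moved to the new node; nothing else was reshuffled.
print(all(after[key] == "node-e" for key in moved))  # True
print(f"ring: {len(moved) / len(keys):.0%} of keys moved")

# For comparison, placing keys with hash(key) % number_of_nodes moves most of them.
modulo_moved = sum(ring_position(k) % 4 != ring_position(k) % 5 for k in keys)
print(f"modulo: {modulo_moved / len(keys):.0%} of keys moved")
```

With only one position per node, the share of keys that moves depends on where the new node happens to land, which is one motivation for virtual nodes.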

in practice, “virtual nodes” are evenly distributed over the keyspace, and then mapped onto physical nodes
[diagram: the ring from 0 to 2^n divided into evenly spaced virtual nodes, each assigned to a physical node]
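A sketch of evenly spaced virtual nodes, loosely in the style of Riak’s fixed-size partitions: the ring is cut into equal slices once, slices are mapped onto physical nodes, and adding a node hands it whole slices. The partition count of 64, the round-robin claim and the “take from the busiest node” rule are assumptions for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List

RING_SIZE = 2 ** 32  # ring positions 0 .. 2**32 - 1
PARTITIONS = 64      # evenly spaced virtual nodes; an arbitrary choice here


def ring_position(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def partition_of(key: str) -> int:
    """Index of the equal-sized slice of the ring that contains hash(key)."""
    return ring_position(key) * PARTITIONS // RING_SIZE


def claim(nodes: List[str]) -> Dict[int, str]:
    """Map virtual nodes onto physical nodes round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(PARTITIONS)}


def add_node(claims: Dict[int, str], new_node: str) -> Dict[int, str]:
    """Hand whole virtual nodes to new_node, always taking from the busiest node."""
    result = dict(claims)
    share = PARTITIONS // (len(set(result.values())) + 1)
    for _ in range(share):
        load = Counter(node for node in result.values() if node != new_node)
        busiest = max(sorted(load), key=lambda node: load[node])  # ties: alphabetical
        victim = min(p for p, node in result.items() if node == busiest)
        result[victim] = new_node
    return result


before = claim(["node-a", "node-b", "node-c", "node-d"])
after = add_node(before, "node-e")
print(sorted(Counter(after.values()).items()))
# [('node-a', 13), ('node-b', 13), ('node-c', 13), ('node-d', 13), ('node-e', 12)]

keys = [f"key-{i}" for i in range(10_000)]
moved = [k for k in keys if before[partition_of(k)] != after[partition_of(k)]]
print(all(after[partition_of(k)] == "node-e" for k in moved))  # True
print(f"{len(moved) / len(keys):.0%} of keys moved")  # expected to be near 12/64
```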

CASSANDRA, RIAK
perfect balance, in theory, but rings may still need rebalancing

GOSSIP, HINTED HANDOFF, LOG STRUCTURED STORAGE, COMPACTION, VECTOR CLOCKS, READ REPAIR, JOURNALING, QUORUMS, EVENTUAL CONSISTENCY, DYNAMO, MAP/REDUCE, 2PC
a few of the things I haven’t mentioned; look them up

LESSONS LEARNED

EVERYTHING THEY TAUGHT YOU ABOUT DATABASES AT UNIVERSITY IS WRONG
(almost)

THINK ABOUT YOUR QUERIES FIRST
don’t optimize for insertion; denormalize heavily; disk is cheap, and this ain’t 1970
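As an illustration of designing for the query rather than for insertion, the sketch below assumes a hypothetical query, “the latest pages a visitor saw, with site names”, and an in-memory dict standing in for a wide row or document. Each write copies the site name into the row and keeps rows sorted, so the read is a single lookup and a slice with no join.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Each row duplicates the site name instead of referencing a separate sites table.
Row = Tuple[str, str, str]  # (ISO timestamp, url, site name)
visits_by_visitor: DefaultDict[str, List[Row]] = defaultdict(list)


def record_visit(visitor_id: str, timestamp: str, url: str, site_name: str) -> None:
    """Do the work at write time: keep each visitor's rows sorted by timestamp."""
    bisect.insort(visits_by_visitor[visitor_id], (timestamp, url, site_name))


def latest_pages(visitor_id: str, limit: int) -> List[Row]:
    """Answer the query with one lookup and a slice: no join, no sort."""
    rows = visits_by_visitor.get(visitor_id, [])
    return rows[max(len(rows) - limit, 0):][::-1]


record_visit("visitor-1", "2012-02-28T23:59:56Z", "/index.html", "Example Site")
record_visit("visitor-1", "2012-02-29T00:01:10Z", "/about.html", "Example Site")
print(latest_pages("visitor-1", 1))
# [('2012-02-29T00:01:10Z', '/about.html', 'Example Site')]
```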

GIVE A LOT OF THOUGHT TO YOUR PRIMARY KEYS
range queries over cleverly designed primary keys can be very powerful, and good keys are required for efficient sharding

M04L7NOC5NQS M04L7O05MIU2 M04NX42YFUCR M04NYR7VWKJC M04NZA8MJOOA M04NZB88CT14 M04NZPOCE8DM M04NZQ9G2T0S M04NZQE7E5VX M04NZSK4V3JN
M04NZTRG661R M04NZTSUITJ7 M04NZUAILUS5 M04NZUG4DTXN M04NZWB9VV0C M04NZWW52T8N M04NZX2JEVO9 M04NZX7WD77W M04NZXGOLDEX M04NZXKNQWB3 M04NZXLGJ3M6 M04NZY7GO39G M04NZZ2SQF1I M04O013HN9L9 M04O014DASE6 M04O02PE8AD3 M04O02PGJBR1 M04O03UPTRWG M04O04833ZTL M04O04GH21JF M04O04JQ8B57 M04O04UHK3U4 M04O056QBNBH M04O05E8XO8N M04O069O8CDK M04O06MG47WK M04O07BHELVD M04O07F30WYX M04O0B39DGEA

M04NZW | B9VV0C
timestamp 2012-02-28 23:59:56 UTC | random number 681 731 004

B9VV0C | M04NZW
random number 681 731 004 | timestamp 2012-02-28 23:59:56 UTC
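Decoding the example suggests the key format: 12 characters, one half a Unix timestamp and the other a random number, each written as six base-36 digits (M04NZW is 1330473596 seconds, i.e. 2012-02-28 23:59:56 UTC, and B9VV0C is 681731004). The sketch below builds and decodes such keys in both layouts; the function names are hypothetical and the slides do not spell out the exact production scheme.

```python
import secrets
from datetime import datetime, timezone
from typing import Tuple

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # digits sort before letters in ASCII


def to_base36(number: int, width: int) -> str:
    """Encode a non-negative integer as zero-padded upper-case base 36."""
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)).rjust(width, "0")


def make_key(when: datetime, random_part: int, time_first: bool = True) -> str:
    """Build a 12-character key from a 6-digit timestamp and a 6-digit random number."""
    stamp = to_base36(int(when.timestamp()), 6)
    noise = to_base36(random_part, 6)
    return stamp + noise if time_first else noise + stamp


def decode_time_first(key: str) -> Tuple[datetime, int]:
    """Split a time-first key back into its timestamp and random number."""
    seconds = int(key[:6], 36)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), int(key[6:], 36)


print(decode_time_first("M04NZWB9VV0C"))
# (datetime.datetime(2012, 2, 28, 23, 59, 56, tzinfo=datetime.timezone.utc), 681731004)

moment = datetime(2012, 2, 28, 23, 59, 56, tzinfo=timezone.utc)
print(make_key(moment, 681731004))                    # M04NZWB9VV0C
print(make_key(moment, 681731004, time_first=False))  # B9VV0CM04NZW
print(make_key(datetime.now(timezone.utc), secrets.randbelow(36 ** 6)))
```

Time-first keys sort chronologically (fixed width, alphabet in ASCII order), so a range scan between two timestamp prefixes returns a time window; with range-based sharding, however, every new key lands at the end of the keyspace, on the same shard. Random-first keys spread writes across shards at the cost of those time-ordered scans, which is presumably the tradeoff the two layouts illustrate.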

CONSISTENCY IS OVERRATED
when you need it, you need it, but most of the time you don’t

DELETING DATA IS NOT TRIVIAL
sometimes delete operations can be more costly than inserts; design your cleaning process early

REDIS, MONGODB, CASSANDRA
our current toolbox

REDIS
a Swiss Army knife; we use it for “virtual memory”, counters and even messaging

REDIS
not distributed (yet), no automatic failover

MONGODB
a very good replacement for MySQL; replication and automatic failover are fantastic

MONGODB
the global write lock kills performance, it is easily fragmented, and sharding is complex and (has been) very buggy

MONGODB
we use it for precomputing and storing metrics for our reporting app

MONGODB
we’re currently pushing around 5K updates/s over three replica sets, each update incrementing up to 20 numbers
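A hedged sketch of the kind of update described, assuming a hypothetical layout with one metrics document per site and day and one counter per dimension value: a single `$inc` update can bump many counters in the same document at once, which is how one update can increment up to 20 numbers. Only the update document is built here; the commented PyMongo call shows how it might be sent.

```python
from typing import Any, Dict, Tuple


def metrics_update(site: str, day: str, country: str, browser: str,
                   pageviews: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build a (filter, update) pair that increments precomputed counters.

    The document layout is hypothetical. Dimension values containing "." or
    starting with "$" would need escaping before being used in field paths.
    """
    selector = {"_id": f"{site}:{day}"}
    update = {
        "$inc": {
            "visits": 1,
            "pageviews": pageviews,
            f"countries.{country}": 1,
            f"browsers.{browser}": 1,
        }
    }
    return selector, update


selector, update = metrics_update("example.com", "2012-02-28", "SE", "firefox", 3)
print(selector)  # {'_id': 'example.com:2012-02-28'}
print(update)
# With PyMongo, the pair could be sent as an upsert:
#   collection.update_one(selector, update, upsert=True)
```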

CASSANDRA
low-level building blocks, no single point of failure, great horizontal scalability, TTL on values

CASSANDRA
we use it to store data about website visits, indexing it to support complex queries

CASSANDRA
millions of rows, some with millions of columns, adding ~1K new every second

one million writes per second

LEARN SOMETHING NEW TODAY
nosql.mypopescu.com
highscalability.com
nosqltapes.com

KTHXBAI
twitter.com/iconara
speakerdeck.com/u/iconara
architecturalatrocities.com
burtcorp.com


Slides by Theo Hultberg
