A Guide to the Post Relational Revolution

Presentation held at Scandinavian Developer Conference, April 2012

Theo Hultberg

April 17, 2012

Transcript

  1. A GUIDE TO THE POST RELATIONAL REVOLUTION @iconara

  2. speakerdeck.com/u/iconara (real time!)

  3. Theo / @iconara

  4. Chief Architect at [...]. Co-organizer of the local Ruby, Scala and JavaScript user groups. More rep on StackOverflow than both Jeff & Joel.
  5. THE WORLD ISN’T FLAT

  6. OUT IS THE NEW UP when scaling up you’re constrained by Moore’s Law
  7. DISTRIBUTED SYSTEMS ARE ABOUT TRADEOFFS

  8. WHO NEEDS ACID, ANYWAY? banks, perhaps

  9. JOINS ARE A CRUTCH why split up your data, if all you’re going to do is assemble it over and over again?

  10. OBJECTS DON’T FIT IN TABLES can you say “impedance mismatch”?

  11. 40 YEARS IS A LONG TIME you didn’t have 256 gigabytes of RAM in 1970

  12. THE RELATIONAL MODEL ISN’T A GOLDEN HAMMER the existence of object relational mappers should be proof enough
  13. WELCOME TO THE POST RELATIONAL REVOLUTION

  14. POST RELATIONAL STORAGE

  15. KEY/VALUE STORES the simplest possible database, not exactly a new idea
  16. KEY → VALUE (opaque): Riak, Voldemort, LevelDB, Tokyo Cabinet, Berkeley DB

  17. STRUCTURED KEY/VALUE STORES sometimes you need just a little bit more
  18. the Bigtable model, “column oriented”, “sparse tables”, found in Cassandra and HBase [diagram: ROW KEY → sorted COLUMN KEYs → VALUE + TIMESTAMP]
  19. “datastructure server”, e.g. Redis [diagram: KEY → VALUE, VALUE, VALUE (list or set); KEY → KEY/VALUE pairs (sorted set or hash); operations: INCREMENT, APPEND, SLICE, CAS]
  20. DOCUMENT DATABASES object databases, but for hipsters

  21. None
  22. complex objects with lists, numbers, strings; secondary indexes* and partial updates; MongoDB, CouchDB, RavenDB, Lotus Notes (* subject to availability)

      {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
          "streetAddress": "21 2nd Street",
          "city": "New York",
          "state": "NY",
          "postalCode": "10021"
        },
        "phoneNumber": [
          { "type": "home", "number": "212 555-1234" },
          { "type": "cell", "number": "646 555-4567" }
        ]
      }

  23. GRAPH DATABASES relational, for real

  24. traversal algorithms, extreme data complexity, Neo4j, AllegroGraph, FlockDB [diagram: five NODEs, labelled NAME and NAME + PROPERTIES]
  25. DIVERSITY I haven’t even mentioned search & indexing systems like Solr and Elastic Search, or distributed filesystems

  26. SOMETIMES TABLES ARE GREAT, TOO but mostly when you rely heavily on GROUP BY, SUM, AVG, etc. and can’t precompute
  27. POST RELATIONAL SCALING

  28. CAP

  29. CONSISTENCY AVAILABILITY PARTITION TOLERANCE (choose any two)

  30. OK?

  31. PARTITION TOLERANCE ISN’T OPTIONAL

  32. CONSISTENCY VS. AVAILABILITY (but in reality, it’s not even that simple)

  33. CONSISTENCY you can always read what you just wrote, but keys may become unavailable

  34. AVAILABILITY you can always read and write, but you may not always get the latest value

  35. NOT EITHER OR most databases let you choose on a query-by-query basis
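
One common way the per-query choice is exposed is as a consistency level on each read and write. The sketch below is a minimal illustration, assuming a replication factor `n` and level names that mirror Cassandra's ONE, QUORUM and ALL; the function names are hypothetical and not any database's API. It checks the usual overlap rule: a read sees the latest acknowledged write when the read and write replica counts add up to more than `n`.

```python
def replicas_required(level: str, n: int) -> int:
    """Number of replicas that must answer for a given level (names are illustrative)."""
    levels = {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}
    return levels[level]


def reads_see_latest_write(n: int, read_level: str, write_level: str) -> bool:
    """True when every read set overlaps every write set, i.e. R + W > N."""
    r = replicas_required(read_level, n)
    w = replicas_required(write_level, n)
    return r + w > n


# With three replicas: quorum reads plus quorum writes overlap,
# while ONE/ONE trades that guarantee for availability.
print(reads_see_latest_write(3, "QUORUM", "QUORUM"))  # True
print(reads_see_latest_write(3, "ONE", "ONE"))        # False
```

Choosing ONE for both reads and writes keeps the system available when replicas are unreachable, at the cost of possibly stale reads, which is the tradeoff slides 33 and 34 describe.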
  36. SHARDING scaling writes in a consistent system

  37. divide the keyspace into shards, or regions (and store each one redundantly) [diagram: KEYSPACE from A to Z, divided by data size into three SHARDs, each with three REPLICAs]

  38. split a shard when it grows too big, move one of the new shards onto a new node [diagram: the same KEYSPACE with one SHARD split and a new set of three REPLICAs]

  39. in reality there’s chunks, tablets or “virtual shards” that are distributed over physical shards [diagram: KEYSPACE from A to Z with four SHARDs, each with three REPLICAs]
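
The split step in slides 37 and 38 can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: the `RangeShards` class, its `max_keys` threshold and the split-in-half policy are all hypothetical, and replicas and moving data between nodes are left out entirely.

```python
import bisect


class RangeShards:
    """Keyspace divided into contiguous ranges; each shard holds a sorted key list."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys
        self.lower_bounds = [""]  # shard i owns keys >= lower_bounds[i]
        self.shards = [[]]        # parallel to lower_bounds

    def shard_index(self, key: str) -> int:
        """Index of the shard whose range contains `key`."""
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key: str) -> None:
        i = self.shard_index(key)
        if key in self.shards[i]:
            return  # ignore duplicates in this sketch
        bisect.insort(self.shards[i], key)
        if len(self.shards[i]) > self.max_keys:
            self._split(i)

    def _split(self, i: int) -> None:
        """Split an oversized shard in half; the upper half becomes a new shard."""
        keys = self.shards[i]
        mid = len(keys) // 2
        self.shards[i] = keys[:mid]
        self.shards.insert(i + 1, keys[mid:])
        self.lower_bounds.insert(i + 1, keys[mid])


table = RangeShards(max_keys=4)
for key in "abcdefghij":
    table.insert(key)
print(table.lower_bounds)  # ['', 'c', 'e', 'g']
```

Inserting keys in sorted order, as here, always splits the last shard, which hints at why key design matters for sharding.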
  40. HBASE, MONGODB sharding is easy in theory, hard in practice, lots of data needs to be moved when adding nodes
  41. CONSISTENT HASHING scaling writes in an available system

  42. each node is responsible for a range of the keyspace, keys are hashed and mapped to the first following node, (optionally) replicated to subsequent nodes [diagram: ring-shaped KEYSPACE from 0 to 2^n with four NODEs; hash(key) lands on the ring, replication continues to the following nodes]

  43. when a new node is added, only part of the keyspace needs to be moved [diagram: the same KEYSPACE ring, 0 to 2^n, with a NEW NODE added between the existing NODEs]

  44. in practice, “virtual nodes” are evenly distributed over the keyspace, and then mapped onto physical nodes [diagram: KEYSPACE ring, 0 to 2^n, with five NODEs]
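
Slides 42 to 44 can be illustrated with a small hash ring. This is a sketch under assumptions, not how Cassandra or Riak implement it: the `HashRing` class, the use of SHA-256 truncated to 64 bits, and placing virtual nodes by hashing `node#i` labels (rather than spacing them evenly) are all choices made here for brevity.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position in the keyspace 0 .. 2**64 - 1."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, physical node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `vnodes` virtual nodes for one physical node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def nodes_for(self, key: str, replicas: int = 1) -> list:
        """First `replicas` distinct nodes at or after hash(key), wrapping around."""
        start = bisect.bisect_left(self._ring, (ring_position(key), ""))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["a", "b", "c", "d"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.nodes_for(k)[0] for k in keys}
ring.add_node("e")
moved = [k for k in keys if ring.nodes_for(k)[0] != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Only keys whose first following virtual node now belongs to the new node change owner; everything else stays where it was, which is the property slide 43 points at.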
  45. CASSANDRA, RIAK perfect balance, in theory, but rings may still need rebalancing
  46. GOSSIP, HINTED HANDOFF, LOG STRUCTURED STORAGE, COMPACTION, VECTOR CLOCKS, READ REPAIR, JOURNALING, QUORUMS, EVENTUAL CONSISTENCY, DYNAMO, MAP/REDUCE, 2PC a few of the things I haven’t mentioned, look them up
  47. LESSONS LEARNED

  48. EVERYTHING THEY TAUGHT YOU ABOUT DATABASES AT UNIVERSITY IS WRONG almost
  49. None
  50. THINK ABOUT YOUR QUERIES FIRST don’t optimize for insertion, denormalize heavily, disk is cheap, this ain’t 1970

  51. GIVE A LOT OF THOUGHT TO YOUR PRIMARY KEYS range queries over cleverly designed primary keys can be very powerful, good keys required for efficient sharding
  52. M04L7NOC5NQS M04L7O05MIU2 M04NX42YFUCR M04NYR7VWKJC M04NZA8MJOOA M04NZB88CT14 M04NZPOCE8DM M04NZQ9G2T0S M04NZQE7E5VX M04NZSK4V3JN M04NZTRG661R M04NZTSUITJ7 M04NZUAILUS5 M04NZUG4DTXN M04NZWB9VV0C M04NZWW52T8N M04NZX2JEVO9 M04NZX7WD77W M04NZXGOLDEX M04NZXKNQWB3 M04NZXLGJ3M6 M04NZY7GO39G M04NZZ2SQF1I M04O013HN9L9 M04O014DASE6 M04O02PE8AD3 M04O02PGJBR1 M04O03UPTRWG M04O04833ZTL M04O04GH21JF M04O04JQ8B57 M04O04UHK3U4 M04O056QBNBH M04O05E8XO8N M04O069O8CDK M04O06MG47WK M04O07BHELVD M04O07F30WYX M04O0B39DGEA
  53. M04NZW B9VV0C: M04NZW = timestamp 2012-02-28 23:59:56 UTC, B9VV0C = random number 681 731 004

  54. B9VV0C M04NZW: B9VV0C = random number 681 731 004, M04NZW = timestamp 2012-02-28 23:59:56 UTC
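
The two halves of the key on slides 53 and 54 are consistent with base 36 encoding: `M04NZW` decodes to 1330473596 seconds since the Unix epoch (2012-02-28 23:59:56 UTC) and `B9VV0C` to 681731004. The sketch below reproduces that; the function names and the `time_first` flag are hypothetical, and the reading of slide 54 (random part first, presumably to spread writes across the keyspace) is an assumption rather than something the slides state.

```python
import random
import time

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base36(number: int, width: int = 6) -> str:
    """Encode a non-negative integer in base 36, left-padded with zeros."""
    digits = ""
    while number:
        number, remainder = divmod(number, 36)
        digits = ALPHABET[remainder] + digits
    return digits.rjust(width, "0")


def make_key(timestamp=None, rand=None, time_first: bool = True) -> str:
    """Build a 12-character key from a Unix timestamp and a random number."""
    ts = int(time.time()) if timestamp is None else timestamp
    rnd = random.randrange(36 ** 6) if rand is None else rand
    parts = (base36(ts), base36(rnd))
    return "".join(parts if time_first else reversed(parts))


print(make_key(1330473596, 681731004))                    # M04NZWB9VV0C
print(make_key(1330473596, 681731004, time_first=False))  # B9VV0CM04NZW
```

Timestamp-first keys sort chronologically, so a range query between two encoded timestamps returns everything written in that interval; random-first keys give up that ordering in exchange for an even spread.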
  55. CONSISTENCY IS OVERRATED when you need it you need it, but most of the time you don’t

  56. DELETING DATA IS NOT TRIVIAL sometimes delete operations can be more costly than inserts, design your cleaning process early
  57. REDIS MONGODB CASSANDRA our current toolbox

  58. REDIS swiss army knife, we use it for “virtual memory”, counters and even messaging
  59. REDIS not distributed (yet), no automatic failover

  60. MONGODB a very good replacement for MySQL, replication and automatic failover are fantastic

  61. MONGODB global write lock kills performance, easily fragmented, sharding is complex and (has been) very buggy

  62. MONGODB we use it for precomputing and storing metrics for our reporting app

  63. MONGODB we’re currently pushing around 5K updates/s over three replica sets, each update incrementing up to 20 numbers
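
An update that increments several numbers at once maps naturally onto MongoDB's `$inc` update operator. The sketch below only builds the update documents and does not talk to a database; the event fields (`site`, `hour`, `metric`, `count`) and the `site:hour` document id are hypothetical, since the slides do not describe the actual schema.

```python
from collections import Counter, defaultdict


def metric_updates(events) -> dict:
    """Fold raw events into one {"$inc": ...} update per (site, hour) document."""
    totals = defaultdict(Counter)
    for event in events:
        doc_id = f'{event["site"]}:{event["hour"]}'
        totals[doc_id][event["metric"]] += event.get("count", 1)
    return {doc_id: {"$inc": dict(counts)} for doc_id, counts in totals.items()}


events = [
    {"site": "example.com", "hour": "2012-04-17T10", "metric": "visits"},
    {"site": "example.com", "hour": "2012-04-17T10", "metric": "visits"},
    {"site": "example.com", "hour": "2012-04-17T10", "metric": "clicks", "count": 3},
]
for doc_id, update in metric_updates(events).items():
    # With a driver such as pymongo, each pair could be sent as an upsert, e.g.
    # collection.update_one({"_id": doc_id}, update, upsert=True)
    print(doc_id, update)
```

Batching events this way turns many small writes into one update per document, each incrementing several counters in a single operation.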
  64. CASSANDRA low level building blocks, no single point of failure, great horizontal scalability, TTL on values

  65. CASSANDRA we use it to store data about website visits, indexing it to support complex queries

  66. CASSANDRA millions of rows, some with millions of columns, adding ~1K new every second
  67. one million writes per second

  68. LEARN SOMETHING NEW TODAY nosql.mypopescu.com highscalability.com nosqltapes.com

  69. KTHXBAI twitter.com/iconara speakerdeck.com/u/iconara architecturalatrocities.com burtcorp.com