Meet Riak with Justin Pease

Riak is an open source, scalable, fault-tolerant distributed database.

In this fast-paced introduction to Riak, Basho Director of Client Services Justin Pease discusses what Riak offers, some of the concepts it is built on, and why it may be of interest to you.

Many of the concepts covered are broadly applicable to any distributed software.

Basho Technologies

March 24, 2012

Transcript

  1. Dynamo* powers S3 and the Amazon shopping cart, among other things. (*Not to be confused with DynamoDB.)
  2. You associate your data with a meaningful key, and store them together in the database.
  3. Riak does provide extras on top of K/V (MapReduce, Links, Full-text Search, Secondary Indices).
  4. In 2000, Dr. Eric Brewer* suggested that in a distributed system there is a tension between these characteristics: Consistency, Availability, and Partition Tolerance. (*Dr. Eric Brewer is on Basho’s Board of Directors.)
  5. In 2002, Seth Gilbert and Nancy Lynch of MIT published a proof of Brewer’s conjecture: at any given moment, you can only guarantee 2 of the 3 characteristics.
  6. Consistency means a request for a piece of data is guaranteed to always return the last written value.
  7. Does that cache ever serve data that may not be the absolute latest available?
  8. If we trade off a measure of consistency, then in exchange we can increase availability.
  9. Availability is not the same thing as 100% uptime. (If all your nodes are inaccessible, requests will not complete.)
  10. Partition Tolerance: Not on the table for trading. (For more info, google “coda hale partition tolerance”.)
  11. The CAP theorem informs us that trade-offs will be required in any distributed system.
  12. Whatever distributed system you may use, this is relevant, and a question you should have an answer for.
  13. Highly Scalable: What will it cost to grow capacity 2X? (“Capacity” may be write throughput, storage capacity, processing power, etc.)
  14. Highly Scalable: With Riak, you get X% more “capacity” with X% more nodes. (Where “capacity” means write throughput, storage capacity, and processing power.)
  15. Highly Scalable: In this case, we saw a 6 ms variance at the 99th percentile (32 to 38 ms) and a 0.68 s variance at the 100th percentile (0.12 s to 0.8 s).
  16. Fault Tolerant: Riak is designed to transparently survive node failures, network failures, and software failures.
  17. Fault Tolerant: Writes that the failed node was supposed to own are temporarily assigned to a substitute node.
  18. Fault Tolerant: Read requests for data the failed node was responsible for will trigger a replica being sent to the substitute node.
  19. Fault Tolerant: Your success and sanity will depend on your ability to handle failures gracefully.
  20. The CAP theorem informs us that trade-offs will be required in any distributed system.
  21. N = Number of replicas across the cluster. (Ideally across N distinct servers, but not necessarily, e.g. when you have fewer than N servers.)
  22. N = Number of replicas across the cluster. (Rule of thumb: a cluster of at least N+1 nodes.)
  23. W = Number of replicas that must acknowledge a write for it to be successful. (Tunable per request.)
  24. Speaks HTTP by default (play with it via curl). Does HTTP/REST well (see webmachine). Plays well with existing HTTP infrastructure (reverse proxy caches, load balancers, web servers).
  25. But if you need a highly scalable, fault-tolerant, distributed database... you should check it out.
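Slide 2 describes the key/value model. Here is a minimal in-memory Python sketch of that idea. It borrows Riak's notion of a bucket as a namespace for keys, but the `KVStore` class and its method names are hypothetical and are not Riak's client API.

```python
from typing import Dict, Optional, Tuple


class KVStore:
    """Toy, single-process key/value store: (bucket, key) -> bytes (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, value: bytes) -> None:
        # The application picks a meaningful key and stores the value under it.
        self._data[(bucket, key)] = value

    def get(self, bucket: str, key: str) -> Optional[bytes]:
        # Lookups go by key only; the store does not interpret the value.
        return self._data.get((bucket, key))

    def delete(self, bucket: str, key: str) -> None:
        self._data.pop((bucket, key), None)


store = KVStore()
store.put("carts", "user-1001", b'["book", "lamp"]')
print(store.get("carts", "user-1001"))
```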
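Slides 13 and 14 frame scaling as a cost question. Under the idealized linear scaling that slide 14 describes, sizing a cluster becomes simple arithmetic. This Python sketch assumes every node contributes the same fixed capacity (the `per_node` figure is a hypothetical number you would measure yourself). Real clusters will deviate from this.

```python
import math


def nodes_needed(target_capacity: float, per_node: float) -> int:
    """Nodes required if capacity grows linearly with node count (an idealization)."""
    if per_node <= 0:
        raise ValueError("per_node must be positive")
    return math.ceil(target_capacity / per_node)


# Hypothetical figures: each node sustains 2,000 writes/s.
current = nodes_needed(10_000, 2_000)   # 5 nodes
doubled = nodes_needed(20_000, 2_000)   # 10 nodes: 2X capacity from 2X nodes
print(current, doubled)
```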
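The “variance” figures on slide 15 appear to be the spread (max minus min) of a latency percentile: 38 - 32 = 6 ms and 0.8 - 0.12 = 0.68 s. A minimal Python sketch of both calculations follows, using the nearest-rank definition of a percentile. Only the endpoint values come from the slide. The functions are illustrative and do not reproduce the talk's benchmark tooling.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def spread(values: Sequence[float]) -> float:
    """Range of a statistic across observations (max - min)."""
    return max(values) - min(values)


# Endpoints from the slide, in milliseconds.
print(spread([32.0, 38.0]))    # 6.0
print(spread([120.0, 800.0]))  # 680.0
```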
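The substitute-node behavior on slides 17 and 18 corresponds to what the Dynamo paper calls a sloppy quorum with hinted handoff. This Python sketch shows only the selection step. It walks a ring of nodes starting at the key's position and takes the first N nodes as owners. When an owner is down, it falls back to the next healthy node beyond them and records which owner that node stands in for. The hashing scheme, ring layout, and function names are simplified assumptions. Riak's actual ring is divided into many partitions (virtual nodes).

```python
import hashlib
from typing import List, Optional, Set, Tuple


def ring_start(key: str, ring_size: int) -> int:
    # Hash the key to a starting position on a simplified ring of nodes
    # (hypothetical scheme; Riak's real ring is split into many partitions).
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


def preference_list(key: str, nodes: List[str], n: int,
                    down: Set[str]) -> List[Tuple[str, Optional[str]]]:
    """Pick n nodes for a key as (node, hinted_for) pairs.

    hinted_for is None for a healthy owner. For a substitute node, it is the
    name of the failed owner that the substitute stands in for.
    """
    start = ring_start(key, len(nodes))
    ordered = [nodes[(start + i) % len(nodes)] for i in range(len(nodes))]
    owners, spares = ordered[:n], ordered[n:]
    healthy_spares = [node for node in spares if node not in down]
    chosen: List[Tuple[str, Optional[str]]] = []
    for owner in owners:
        if owner not in down:
            chosen.append((owner, None))
        elif healthy_spares:
            # The substitute takes the write and remembers the intended owner,
            # so the data can be handed back once that owner recovers.
            chosen.append((healthy_spares.pop(0), owner))
    return chosen


nodes = ["node1", "node2", "node3", "node4", "node5"]
normal = preference_list("carts/user-1001", nodes, 3, set())
first_owner = normal[0][0]
degraded = preference_list("carts/user-1001", nodes, 3, {first_owner})
print(normal)
print(degraded)
```

The read-side behavior from slide 18 (pushing a replica to the substitute when a read touches it) would build on the same preference list. It is left out here to keep the sketch short.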
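Slides 21 and 22 describe where the N replicas live. This Python sketch shows why the N+1 rule of thumb helps. It uses a deliberately simplified placement model, which is an assumption and not Riak's real partition-based scheme: replicas go to consecutive nodes in a ring and wrap around when there are fewer nodes than replicas. With fewer than N nodes, some node must hold more than one copy. With N+1 nodes, losing any single node still leaves N distinct nodes.

```python
from collections import Counter
from typing import List


def place_replicas(start: int, nodes: List[str], n: int) -> List[str]:
    """Put n replicas on consecutive ring positions, wrapping if nodes run out."""
    return [nodes[(start + i) % len(nodes)] for i in range(n)]


def distinct_hosts_after_failures(node_count: int, n: int, failures: int) -> int:
    """Most distinct live nodes available for n replicas after some nodes fail."""
    return max(0, min(n, node_count - failures))


N = 3
two_nodes = place_replicas(0, ["node1", "node2"], N)
print(two_nodes)                          # ['node1', 'node2', 'node1']
print(Counter(two_nodes).most_common(1))  # [('node1', 2)]

# With N + 1 nodes, one failure still leaves N distinct nodes.
print(distinct_hosts_after_failures(N + 1, N, failures=1))  # 3
print(distinct_hosts_after_failures(N, N, failures=1))      # 2
```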
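The W on slide 23 is usually discussed together with R, Riak's per-request read quorum (R does not appear on these slides). When R + W > N, every set of R replicas overlaps every set of W replicas, so a successful read contacts at least one replica that acknowledged the latest successful write. This Python sketch checks that condition and confirms it by brute force over all replica subsets. It is an idealized model that ignores node failures and substitute nodes, both of which weaken the guarantee in practice.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Closed-form rule: every r-replica read set meets every w-replica write set."""
    return r + w > n


def brute_force_overlap(n: int, r: int, w: int) -> bool:
    """Check every (read set, write set) pair of replica indices explicitly."""
    reads = [set(c) for c in combinations(range(n), r)]
    writes = [set(c) for c in combinations(range(n), w)]
    return all(read & write for read in reads for write in writes)


# N = 3 replicas; compare a few per-request R/W choices.
for r, w in [(2, 2), (1, 3), (1, 1), (1, 2)]:
    print(f"N=3 R={r} W={w} overlap={quorums_overlap(3, r, w)}")
```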
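To make slide 24's HTTP interface concrete, here is a Python sketch that uses only the standard library to build (but not send) a store request and a fetch request. It makes three assumptions. First, a local node is listening on Riak's default HTTP port, 8098. Second, the node uses the `/riak/<bucket>/<key>` URL scheme from Riak releases of this era; later releases moved to a `/buckets/<bucket>/keys/<key>` form, so check the HTTP API documentation for your version. Third, the per-request write quorum from slide 23 is passed as a `w` query parameter.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

RIAK_URL = "http://127.0.0.1:8098"  # assumed local node on the default HTTP port


def object_url(bucket: str, key: str) -> str:
    # Legacy-style object path /riak/<bucket>/<key>; the scheme varies by release.
    return f"{RIAK_URL}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"


def store_request(bucket: str, key: str, value: dict,
                  w: Optional[int] = None) -> Request:
    url = object_url(bucket, key)
    if w is not None:
        # Per-request write quorum, passed as a query parameter.
        url += "?" + urlencode({"w": w})
    body = json.dumps(value).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def fetch_request(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="GET")


put = store_request("carts", "user-1001", {"items": ["book", "lamp"]}, w=2)
get = fetch_request("carts", "user-1001")
print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

With a node running, passing either `Request` to `urllib.request.urlopen` would send it. A rough curl equivalent of the store request is `curl -X PUT -H "Content-Type: application/json" -d '{"items": ["book", "lamp"]}' "http://127.0.0.1:8098/riak/carts/user-1001?w=2"`.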