From Relational to Riak

Introduction to moving from a relational database to Riak, including advantages, user examples and tradeoffs.


Basho Technologies

January 03, 2013


  1. From Relational To Riak

  2. Shanley Kane
    Director of Product Management, @shanley

  3. Overview
    •  A few Riak user examples
    •  Common reasons for moving to Riak
    •  Tradeoffs
  4. What is Riak
    •  Masterless, distributed, open source database
    •  Automatically replicates data (default n = 3 replicas)
    •  Designed for availability and operational ease
    •  Inspired by architectural principles from Amazon and Akamai
    •  Open source (Apache 2 license)
    •  Key-value model
    •  Features for full-text search, querying metadata and MapReduce
  5. enStratus
    •  Cloud infrastructure management
    •  Machine, customer and API data
    •  “Design for failure” architecture
    “enStratus relies on Riak to ensure that our cloud infrastructure management platform scales seamlessly, without interruption and performance bottlenecks, while meeting and exceeding internal requirements for high availability and data durability.”
  6. •  Scaling writes in MySQL became a bottleneck
    •  Master/slave replication made master nodes a single point of failure
    •  Multi-site replication
  7. •  Re-platforming an e-commerce system
    •  Product catalog data
    •  Adding non-relational to the mix
  8. ß •  Challenges: modeling data in a NoSQL world

  9. Top Reasons
    •  High availability
    •  Minimizing the cost of scale
    •  Simple “schema-less” design
  10. High Availability
    •  Availability has a direct impact on revenue and user trust
    •  Read AND write availability is critical
  11. High Availability ß  master ß  slave slave à Relational Architecture

  12. High Availability ß  master ß  slave slave à write

  13. High Availability ß  master ß  slave slave à write

  14. High Availability: Riak’s Masterless Architecture
  15. High Availability: Riak’s Masterless Architecture
    (diagram: reads and writes are sent to any node in the cluster)
  16. High Availability: Hinted Handoff
    (diagram: writes continue to be accepted while a node is down)
  17. High Availability: Hinted Handoff
    (diagram continued)
  18. Cost of Scale
    •  Operational impact of growth
    •  Economies of scale
    •  Commodity machines
    •  Meeting peak loads
  19. Cost of Scale: Sharding in Relational Systems
    (diagram: data split across five shards by key range: A-D, E-K, L-P, Q-T, U-Z)
  20. Cost of Scale
    •  Hot spots
    •  Unevenly spread data and request patterns
    •  Resharding is operationally intensive and often manual
    (diagram: shards A-D, E-K, L-P, Q-T, U-Z)
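The hot-spot problem with range sharding is easy to reproduce. The sketch below uses the five letter ranges from the slide; the `shard_for` and `distribution` helpers and the key workload (1,000 keys sharing a hypothetical `session:` prefix, plus three other keys) are illustrative assumptions, not real data.

```python
from __future__ import annotations

from collections import Counter

# Shard layout from the slide: one shard per range of first letters.
SHARD_RANGES = [("A", "D"), ("E", "K"), ("L", "P"), ("Q", "T"), ("U", "Z")]


def shard_for(key: str) -> str:
    """Return the label of the shard whose letter range covers the key."""
    first = key[0].upper()
    for low, high in SHARD_RANGES:
        if low <= first <= high:
            return f"{low}-{high}"
    raise ValueError(f"no shard covers key {key!r}")


def distribution(keys: list[str]) -> Counter[str]:
    """Count how many keys land on each shard."""
    return Counter(shard_for(key) for key in keys)


# Hypothetical workload: 1,000 keys sharing a "session:" prefix plus a few others.
keys = [f"session:{i}" for i in range(1000)] + ["alice", "bob", "zoe"]
counts = distribution(keys)
for low, high in SHARD_RANGES:
    label = f"{low}-{high}"
    print(label, counts[label])
# prints: A-D 2 / E-K 0 / L-P 0 / Q-T 1000 / U-Z 1 (one per line)
```

Every `session:` key starts with "S", so the Q-T shard absorbs almost all of the load while two shards sit idle; fixing that means choosing new ranges and moving data by hand.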
  21. Cost of Scale: Riak’s Consistent Hashing
    •  Evenly spreads data around the cluster
    •  Automatically rebalances data when machines are added
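Riak hashes each bucket/key pair into a 160-bit SHA-1 space that is divided into equal partitions (64 by default), and stores replicas on the partitions that follow the key's position. The sketch below is a simplified version of that idea: the `Ring` class, the `"bucket/key"` string being hashed, and the round-robin partition claim are assumptions for illustration, and Riak's real hashing and claim algorithm differ in detail.

```python
from __future__ import annotations

import hashlib
from collections import Counter

RING_SIZE = 2 ** 160  # SHA-1 digests are 160-bit integers


class Ring:
    """Simplified consistent-hashing ring: the hash space is split into
    equal partitions, and partitions are claimed by nodes round-robin."""

    def __init__(self, nodes: list[str], num_partitions: int = 64) -> None:
        if num_partitions <= 0 or num_partitions & (num_partitions - 1):
            raise ValueError("num_partitions must be a power of two")
        self.num_partitions = num_partitions
        self.partition_width = RING_SIZE // num_partitions
        self.owners = [nodes[i % len(nodes)] for i in range(num_partitions)]

    def partition_for(self, bucket: str, key: str) -> int:
        digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
        return int.from_bytes(digest, "big") // self.partition_width

    def preference_list(self, bucket: str, key: str, n_val: int = 3) -> list[str]:
        """Owners of the n_val consecutive partitions starting at the key's."""
        first = self.partition_for(bucket, key)
        return [self.owners[(first + i) % self.num_partitions] for i in range(n_val)]


ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.preference_list("users", "alice"))  # three distinct nodes
# Spread 10,000 hypothetical keys and count how many land on each node.
load = Counter(ring.owners[ring.partition_for("users", f"user{i}")] for i in range(10_000))
print(load)  # roughly 2,500 keys per node
```

Because the hash output is close to uniform, load spreads evenly regardless of how keys are named, which is the contrast with the letter-range shards above.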
  27. Ops = Easier!
    •  When a new node is added, it takes over its share of partitions until data distribution is even again
    •  Does not require manual intervention: data is automatically handed off
    •  Simple stage, preview and commit workflow for cluster changes
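The "takes over its share of partitions" step can be sketched as follows. This is a deliberately simple claim rule (the new node repeatedly takes a partition from the currently most-loaded node), not Riak's actual claim algorithm; `initial_claim` and `add_node` are hypothetical helpers. The point it shows is that only the new node's share moves, and the existing nodes do not reshuffle data among themselves.

```python
from __future__ import annotations

from collections import Counter


def initial_claim(nodes: list[str], num_partitions: int = 64) -> list[str]:
    """Hypothetical starting layout: partitions assigned round-robin."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def add_node(owners: list[str], new_node: str) -> list[tuple[int, str]]:
    """Give new_node its share of partitions, taking each one from the
    currently most-loaded node. Mutates owners; returns the handoffs as
    (partition index, previous owner) pairs."""
    counts = Counter(owners)
    share = len(owners) // (len(counts) + 1)
    handoffs: list[tuple[int, str]] = []
    for _ in range(share):
        donor = max(counts, key=lambda node: counts[node])
        partition = owners.index(donor)
        owners[partition] = new_node
        counts[donor] -= 1
        handoffs.append((partition, donor))
    return handoffs


owners = initial_claim(["node1", "node2", "node3", "node4"])
moved = add_node(owners, "node5")
print(len(moved), "of", len(owners), "partitions handed off")  # 12 of 64 partitions handed off
print(Counter(owners))  # node5 owns 12; the four original nodes own 13 each
```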
  28. Multi-Data Center
    •  Global footprint
    •  Disaster recovery
    •  Data

  29. Multi-Data Center: Riak Enterprise
    •  Real-time or full sync
    •  Uni-directional or bi-directional
    •  “Masterless”: a secondary site can take over operations for a failed primary
  30. Data Modeling
    •  Simple key/value design
    •  Riak is “schema-less”
  31. Data Modeling
    •  Simple key/value design
    •  “Schema-less”
  32. Data Modeling
    •  Add new features without updating the schema
    •  Ideal when rapid iterations are required
    •  Simple, straightforward operations
    •  Less complex than a relational model
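Because Riak stores each value as an opaque blob (often JSON), adding a field is an application change rather than a schema migration. In this sketch a plain dict stands in for a bucket, and the `user:` key prefix, the `put_user`/`get_user` helpers and the `plan` field are hypothetical; it only shows old and new object shapes coexisting, with readers supplying defaults.

```python
from __future__ import annotations

import json

store: dict[str, bytes] = {}  # stand-in for a Riak bucket: key -> opaque value


def put_user(user_id: str, attrs: dict) -> None:
    store[f"user:{user_id}"] = json.dumps(attrs).encode("utf-8")


def get_user(user_id: str) -> dict:
    return json.loads(store[f"user:{user_id}"])


# Version 1 of the app stores two fields.
put_user("1001", {"name": "Alice", "email": "alice@example.com"})
# Version 2 adds a field; no ALTER TABLE, and older objects are left untouched.
put_user("1002", {"name": "Bob", "email": "bob@example.com", "plan": "premium"})

for uid in ("1001", "1002"):
    user = get_user(uid)
    # Readers supply a default for fields that older objects lack.
    print(user["name"], user.get("plan", "free"))
# Alice free
# Bob premium
```

The tradeoff is that the "schema" now lives in application code, so every reader must tolerate missing or extra fields.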
  33. Tradeoffs
    •  No sets, counters or transactions
    •  No concept of columns and rows
    •  No SQL or SQL-like language
    •  No join operations
  34. Query Options
    •  Riak Search: distributed full-text search
    •  Secondary Indexing: tag objects with queryable metadata
    •  MapReduce: aggregation tasks; Erlang and JavaScript support
  35. Common Application Patterns
    Data Type   | Key                  | Value
    ------------+----------------------+------------------
    Session     | User/Session ID      | Session Data
    Advertising | Campaign ID          | Ad Content
    Logs        | Date                 | Log File
    Sensor      | Date, Date/Time      | Updates
    User Data   | Login, Email, UUID   | User Attributes
    Content     | Title, Integer, Etc. | Text, JSON, XML
  36. Eventual Consistency
    •  Riak is eventually consistent
    •  Does not support strictly consistent operations
  37. Resolving Conflicts
    •  Data conflicts can arise in a small proportion of requests due to concurrent writes, lagging nodes and certain failure modes
    •  Riak has mechanisms for detecting and resolving data conflicts
  38. Resolving Conflicts
    •  Read repair
    •  “Repair” command
    •  Active anti-entropy (new)
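Read repair fixes stale replicas as a side effect of normal reads: the coordinator compares the copies it gets back and rewrites any that are missing or out of date. This is a minimal sketch assuming a plain integer version per object; Riak itself compares vector clocks, and the `Versioned` class and `read_with_repair` function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    version: int  # simplification: Riak compares vector clocks, not counters


def read_with_repair(replicas: list[dict[str, Versioned]], key: str) -> str | None:
    """Read key from every replica, return the newest value, and write it
    back to any replica that is missing the key or holds an older version."""
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda obj: obj.version)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.version < newest.version:
            replica[key] = Versioned(newest.value, newest.version)
    return newest.value


replica_a = {"cart:7": Versioned("milk,eggs", 2)}
replica_b = {"cart:7": Versioned("milk", 1)}  # stale copy
replica_c = {}                                 # missed the write entirely
print(read_with_repair([replica_a, replica_b, replica_c], "cart:7"))  # milk,eggs
print(replica_b["cart:7"].version, replica_c["cart:7"].version)      # 2 2
```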
  39. Resolving Conflicts
    •  Objects are tagged with vector clocks
    •  Vector clocks show the causal relationships between versions of an object
    •  Deal with data conflicts at the database level (last write wins) or let clients resolve them with use-case-specific logic
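A vector clock records, per actor, how many updates that actor has made; comparing two clocks tells you whether one version descends from the other or whether they are concurrent siblings. The sketch below is a simplified version (real Riak vclock entries also carry timestamps), and `increment`, `descends`, `compare` and `merge_carts` are hypothetical helpers. `merge_carts` illustrates the kind of use-case-specific resolution a client might apply instead of last write wins.

```python
from __future__ import annotations

from typing import Dict

VClock = Dict[str, int]  # actor id -> number of updates made by that actor


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of clock recording one more update by actor."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every update recorded in b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a newer"
    if descends(b, a):
        return "b newer"
    return "concurrent"  # siblings: a genuine conflict


def merge_carts(a: set[str], b: set[str]) -> set[str]:
    """Use-case-specific resolution: union two conflicting shopping carts."""
    return a | b


base = increment({}, "client_x")   # {'client_x': 1}
v1 = increment(base, "client_x")   # client_x updates again
v2 = increment(base, "client_y")   # client_y updates the same base version
print(compare(v1, base))           # a newer
print(compare(v1, v2))             # concurrent
print(sorted(merge_carts({"milk"}, {"eggs"})))  # ['eggs', 'milk']
```

With last write wins, one of the two concurrent carts would simply be discarded; merging keeps both clients' items.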
  40. Other Nice Riak Features
    •  HTTP and Protocol Buffers APIs
    •  Lots of client libraries
    •  Robust, supportive community
    •  Tunable requests
    •  AMIs, Windows Azure, Engine Yard, Joyent and lots of other platforms
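Tunable requests let each read or write choose how many of the N replicas must respond (R for reads, W for writes). A common rule of thumb is that when R + W > N, every read quorum shares at least one replica with every write quorum. The brute-force check below (a hypothetical `quorums_overlap` helper) only demonstrates that counting argument; under failures Riak's fallback nodes can stand in for primaries, so overlap alone does not make operations strictly consistent.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible set of r read replicas share at least
    one replica with every possible set of w write replicas, out of n?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(f"N=3 R={r} W={w}: overlap={quorums_overlap(3, r, w)}, R+W>N={r + w > 3}")
# N=3 R=1 W=1: overlap=False, R+W>N=False
# N=3 R=2 W=2: overlap=True, R+W>N=True
# N=3 R=1 W=3: overlap=True, R+W>N=True
```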
  41. Migration
    •  Migrate in stages
    •  Pick a unit of data
    •  Start with 1:1 relationships
    •  Areas that can be modeled as key/value operations
    •  Use Riak client APIs
    •  …, mailing list or professional services (proserv) team
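A 1:1 relationship is a natural first unit to migrate: the join can be done once during export, producing one self-contained object per key. This sketch uses an in-memory SQLite database as the relational source; the `users` and `profiles` tables, their columns and the `export_users` helper are hypothetical, and it keys objects by login (one of the key choices listed for user data). Writing the resulting pairs into Riak would go through a client library and is not shown.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical relational source: users and a 1:1 profiles table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT, email TEXT);
    CREATE TABLE profiles (user_id INTEGER PRIMARY KEY REFERENCES users(id),
                           display_name TEXT, timezone TEXT);
    INSERT INTO users VALUES (1, 'alice', 'alice@example.com'),
                             (2, 'bob', 'bob@example.com');
    INSERT INTO profiles VALUES (1, 'Alice A.', 'UTC'), (2, 'Bob B.', 'CET');
""")


def export_users(conn: sqlite3.Connection) -> dict[str, str]:
    """Join each user with its single profile row and emit one key/value
    pair per user: key = login, value = JSON document."""
    rows = conn.execute("""
        SELECT u.id, u.login, u.email, p.display_name, p.timezone
        FROM users u JOIN profiles p ON p.user_id = u.id
        ORDER BY u.id
    """)
    objects = {}
    for user_id, login, email, display_name, timezone in rows:
        objects[login] = json.dumps({
            "id": user_id, "email": email,
            "display_name": display_name, "timezone": timezone,
        })
    return objects


for key, value in export_users(conn).items():
    print(key, value)
```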
  42. Next Steps
    •  Additional questions? Join the mailing list or email