From Relational to Riak

Introduction to moving from a relational database to Riak, including advantages, user examples and tradeoffs.

Basho Technologies

January 03, 2013

Transcript

  1. What is Riak
     •  Masterless, distributed, open source database
     •  Automatically replicates data (default, n = 3)
     •  Designed for availability and operational ease
     •  Inspired by architectural principles from Amazon and Akamai
     •  Open source (Apache 2 license)
     •  Key-value model
     •  Features for full-text search, querying metadata and MapReduce
  2. •  Cloud infrastructure management
     •  Machine, customer and API data
     •  “Design for failure” architecture
     “enStratus relies on Riak to ensure that our cloud infrastructure management platform scales seamlessly, without interruption and performance bottlenecks, while meeting and exceeding internal requirements for high availability and data durability.”
  3. •  Scaling writes in MySQL became a bottleneck
     •  Master/slave replication made master nodes a single point of failure
     •  Multi-site replication ← vimeo.com/bashotech
  4. Top Reasons
     •  High availability
     •  Minimizing the cost of scale
     •  Simple “schema-less” design
  5. High Availability
     •  Availability has a direct impact on revenue and user trust
     •  Read AND write availability is critical
  6. Cost of Scale
     •  Operational impact of growth
     •  Economies of scale
     •  Commodity machines
     •  Meeting peak loads
  7. Cost of Scale
     •  Hot spots
     •  Unevenly spread data and request patterns
     •  Resharding is operationally intensive, often manual
     (Diagram: data sharded by key range: A - D, E - K, L - P, Q - T, U - Z)
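The hot-spot problem on this slide can be illustrated with a minimal sketch. It assumes keys are routed to shards by first letter, using the ranges from the slide's diagram; the user names are hypothetical and deliberately skewed toward one letter.

```python
from collections import Counter

# Key ranges taken from the slide's diagram.
RANGES = [("A", "D"), ("E", "K"), ("L", "P"), ("Q", "T"), ("U", "Z")]


def range_shard(key: str) -> str:
    """Return the label of the alphabetical range that holds `key`."""
    first = key[0].upper()
    for low, high in RANGES:
        if low <= first <= high:
            return f"{low}-{high}"
    raise ValueError(f"no shard for key {key!r}")


# Hypothetical user names: most of them start with "s".
USERS = ["smith", "sanchez", "singh", "suzuki", "stewart",
         "scott", "allen", "lee", "young", "evans"]

# Count how many keys land on each shard.
range_counts = Counter(range_shard(user) for user in USERS)
print(range_counts["Q-T"])  # 6 of the 10 keys land on a single shard
```

With this sample the Q-T shard holds six of ten keys while the others hold one each, which is the kind of imbalance that forces a manual reshard.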
  8. Cost of Scale: Riak’s Consistent Hashing
     •  Evenly spreads data around the cluster
     •  Automatically rebalances data when machines are added
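A rough sketch of the idea, not Riak's actual implementation: hash each bucket/key pair onto a SHA-1 ring that is cut into equal partitions, then store replicas on the owners of the next n partitions. The partition count of 64, the `bucket/key` string encoding, the round-robin ownership and all function names are assumptions made for illustration.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed partition count for this sketch
N_VAL = 3                 # replicas per object (n = 3, as on the first slide)


def partition_for(bucket: str, key: str) -> int:
    """Map a bucket/key pair to one of the equally sized ring partitions."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def assign_partitions(nodes):
    """Hand out partitions round-robin so each node owns an equal share."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def preference_list(bucket, key, owners, n=N_VAL):
    """Owners of the n consecutive partitions starting at the key's partition."""
    first = partition_for(bucket, key)
    return [owners[(first + i) % NUM_PARTITIONS] for i in range(n)]


owners = assign_partitions(["node1", "node2", "node3", "node4"])
replicas = preference_list("users", "ada@example.com", owners)
print(replicas)  # three different nodes hold a copy of this object
```

Because the hash output is close to uniform, keys spread across partitions regardless of their spelling, which avoids the key-range skew of manual sharding.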
  9. Ops = Easier!
     •  When a new node is added, the node takes over its share of partitions until data distribution is even again
     •  Does not require manual intervention – data is automatically handed off
     •  Simple stage, preview and commit workflow
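To make the rebalancing step concrete, here is a simplified sketch of a new node claiming its share of partitions. It is not Riak's real claim algorithm: the 64-partition ring, the node names and the "take from the busiest node" rule are all assumptions chosen to keep the example short.

```python
from collections import Counter


def add_node(owners: dict, new_node: str) -> dict:
    """Return a new ownership map in which `new_node` holds an even share."""
    updated = dict(owners)
    node_count = len(set(updated.values())) + 1
    target = len(updated) // node_count   # partitions the new node should own
    for _ in range(target):
        # Pick the existing node that currently owns the most partitions.
        counts = Counter(n for n in updated.values() if n != new_node)
        donor = counts.most_common(1)[0][0]
        # Hand one of the donor's partitions to the new node.
        partition = next(p for p, n in updated.items() if n == donor)
        updated[partition] = new_node
    return updated


# Four nodes owning 16 partitions each, then a fifth node joins.
before = {p: f"node{p % 4 + 1}" for p in range(64)}
after = add_node(before, "node5")
moved = sum(1 for p in before if before[p] != after[p])
print(moved, Counter(after.values()))
```

Only the partitions claimed by the new node change owner (12 of 64 here); every other partition stays where it was, which is why growth does not require a full reshard.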
  10. Multi-Data Center: Riak Enterprise
      •  Real-time or full sync
      •  Uni-directional or bi-directional
      •  “Masterless” – a secondary site can take over operations for a failed primary
  11. Data Modeling
      •  Add new features without updating the schema
      •  Ideal when rapid iterations are required
      •  Simple, straightforward operations
      •  Less complex than a relational model
  12. Tradeoffs
      •  No sets, counters or transactions
      •  No concept of columns and rows
      •  No SQL or SQL-like language
      •  No join operations
  13. Query Options
      •  Riak Search: Distributed full-text search
      •  Secondary Indexing: Tag objects with queryable metadata
      •  MapReduce: Aggregation tasks; Erlang and JavaScript support
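The secondary-index and MapReduce options can be pictured with a toy in-memory model. This is only an illustration of the concepts, not the Riak API: the `TinyStore` class, its methods and the sample data are hypothetical, and the `_bin` suffix merely mirrors Riak's naming convention for string-valued indexes.

```python
from collections import defaultdict
from functools import reduce


class TinyStore:
    """Toy key/value store with exact-match secondary indexes."""

    def __init__(self):
        self.objects = {}               # key -> value
        self.tags = {}                  # key -> {index_name: index_value}
        self.index = defaultdict(set)   # (index_name, index_value) -> keys

    def put(self, key, value, indexes=None):
        # Drop index entries left over from an earlier version of the object.
        for entry in self.tags.get(key, {}).items():
            self.index[entry].discard(key)
        self.objects[key] = value
        self.tags[key] = dict(indexes or {})
        for entry in self.tags[key].items():
            self.index[entry].add(key)

    def query_index(self, name, value):
        """Return the keys tagged with index `name` equal to `value`."""
        return sorted(self.index.get((name, value), set()))

    def map_reduce(self, keys, map_fn, reduce_fn, initial):
        """Apply `map_fn` to each stored value, then fold with `reduce_fn`."""
        mapped = (map_fn(self.objects[k]) for k in keys)
        return reduce(reduce_fn, mapped, initial)


store = TinyStore()
store.put("ada", {"logins": 3}, {"plan_bin": "pro"})
store.put("alan", {"logins": 5}, {"plan_bin": "pro"})
store.put("grace", {"logins": 2}, {"plan_bin": "free"})

pro_keys = store.query_index("plan_bin", "pro")
total = store.map_reduce(pro_keys, lambda u: u["logins"],
                         lambda acc, n: acc + n, 0)
print(pro_keys, total)  # ['ada', 'alan'] 8
```

The index lookup finds keys by metadata rather than by primary key, and the map/reduce pass aggregates over just those keys.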
  14. Common Application Patterns
      Data Type     Key                   Value
      Session       User/Session ID       Session Data
      Advertising   Campaign ID           Ad Content
      Logs          Date                  Log File
      Sensor        Date, Date/Time       Updates
      User Data     Login, Email, UUID    User Attributes
      Content       Title, Integer, etc.  Text, JSON, XML
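A few of these patterns are sketched below, with a plain dictionary standing in for the database. The bucket names, the `put` and `get` helpers and the sample values are assumptions for illustration; a real application would make the same calls through a Riak client library.

```python
import json
from datetime import date

# Stand-in for the database: (bucket, key) -> serialized value.
store = {}


def put(bucket: str, key: str, value) -> None:
    """Serialize `value` as JSON and store it under bucket/key."""
    store[(bucket, key)] = json.dumps(value).encode("utf-8")


def get(bucket: str, key: str):
    """Fetch and deserialize the value stored under bucket/key."""
    return json.loads(store[(bucket, key)])


# Session data keyed by session ID.
put("sessions", "sess-8f3a", {"user": "ada", "cart": ["book"]})

# Log data keyed by date, so one day's log is a single lookup.
put("logs", date(2013, 1, 3).isoformat(), {"lines": ["GET /", "GET /about"]})

# User attributes keyed by email address.
put("users", "ada@example.com", {"name": "Ada", "plan": "pro"})

print(get("logs", "2013-01-03")["lines"][0])  # GET /
```

In each case the key is something the application already knows at read time, so no query is needed to find the object.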
  15. Resolving Conflicts
      •  Data conflicts can arise in a small proportion of requests due to exactly concurrent writes, laggy nodes and certain failure modes
      •  Riak has mechanisms for detecting and resolving data conflicts
  16. Resolving Conflicts
      •  Objects are tagged with vector clocks
      •  Vector clocks show the relationships between versions of an object
      •  Deal with data conflicts at the database level (last write wins) or let clients resolve them with use-case-specific logic
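How vector clocks expose the relationship between two versions can be shown with a small sketch. It models a clock as a dictionary of per-actor counters, which is the general technique rather than Riak's internal encoding; the function and actor names are hypothetical.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of `clock` with `actor`'s counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relationship(a: dict, b: dict) -> str:
    """Classify two versions as equal, ordered, or concurrent (a conflict)."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    return "concurrent"


def merge(a: dict, b: dict) -> dict:
    """Clock for a resolved value: the per-actor maximum of both clocks."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}


base = increment({}, "client_x")
version_a = increment(base, "client_y")   # client_y updates the base version
version_b = increment(base, "client_z")   # client_z updates it independently

print(relationship(version_a, base))       # a descends from b
print(relationship(version_a, version_b))  # concurrent
```

An ordered pair lets the store discard the older version safely. A concurrent pair is a genuine conflict: either the database picks a winner (last write wins) or the client merges the values and writes the result back with the merged clock.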
  17. Other Nice Riak Features
      •  HTTP API and Protocol Buffers
      •  Lots of client libraries
      •  Robust, supportive community
      •  Tunable requests
      •  AMIs, Windows Azure, Engine Yard, Joyent and lots of other platforms
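"Tunable requests" refers to choosing, per request, how many replicas must respond. The sketch below models that tradeoff under a simplified strict-quorum view: n replicas, r responses required for a read and w for a write. The `Quorum` class and its method names are hypothetical, and the defaults of r = 2 and w = 2 are assumed as a majority of n = 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quorum:
    n: int = 3   # replicas per object
    r: int = 2   # replicas that must answer a read
    w: int = 2   # replicas that must acknowledge a write

    def __post_init__(self):
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    def overlaps(self) -> bool:
        """True if every read set shares at least one replica with every write set."""
        return self.r + self.w > self.n

    def read_failures_tolerated(self) -> int:
        """Replicas that can be unreachable while reads still succeed."""
        return self.n - self.r

    def write_failures_tolerated(self) -> int:
        """Replicas that can be unreachable while writes still succeed."""
        return self.n - self.w


balanced = Quorum()            # n=3, r=2, w=2
fast = Quorum(n=3, r=1, w=1)   # favors latency and availability

print(balanced.overlaps(), balanced.read_failures_tolerated())  # True 1
print(fast.overlaps(), fast.read_failures_tolerated())          # False 2
```

Lower r and w values keep requests succeeding through more failures, at the cost of a higher chance of reading a stale value.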
  18. Migration
      •  Migrate in stages
      •  Pick a unit of data
      •  Start with 1:1 relationships
      •  Areas that can be modeled as key/value operations
      •  Use Riak client APIs
      •  Docs.basho.com, mailing list or proserv team
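The "start with 1:1 relationships" step might look like the following sketch, which folds a user row and its one-to-one profile row into a single JSON document per key. The schema, the sample rows and the `export_users` helper are hypothetical, and a dictionary stands in for the writes that would go through a Riak client API.

```python
import json
import sqlite3

# Hypothetical relational source with a 1:1 users/profiles relationship.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        display_name TEXT,
        timezone TEXT
    );
    INSERT INTO users VALUES (1, 'ada@example.com');
    INSERT INTO users VALUES (2, 'alan@example.com');
    INSERT INTO profiles VALUES (1, 'Ada', 'Europe/London');
    INSERT INTO profiles VALUES (2, 'Alan', 'Europe/London');
""")


def export_users(connection):
    """Yield (key, json_document) pairs, one per user with its profile."""
    rows = connection.execute("""
        SELECT u.id AS id, u.email AS email,
               p.display_name AS display_name, p.timezone AS timezone
        FROM users u JOIN profiles p ON p.user_id = u.id
        ORDER BY u.id
    """)
    for row in rows:
        document = {
            "email": row["email"],
            "profile": {
                "display_name": row["display_name"],
                "timezone": row["timezone"],
            },
        }
        yield str(row["id"]), json.dumps(document, sort_keys=True)


# Stand-in for the key/value store that receives the migrated unit of data.
kv_store = dict(export_users(conn))
print(json.loads(kv_store["1"])["profile"]["display_name"])  # Ada
```

The join happens once, at migration time; afterwards a single key lookup returns what previously took a two-table query.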