From Relational to Riak

Shanley Kane, Director of Product Management
@shanley  [email protected]

Overview
•  A few Riak user examples
•  Common reasons for moving to Riak
•  Tradeoffs

What is Riak
•  Masterless, distributed, open source database
•  Automatically replicates data (default, n = 3)
•  Designed for availability and operational ease
•  Inspired by architectural principles from Amazon and Akamai
•  Open source (Apache 2 license)
•  Key-value model
•  Features for full-text search, querying metadata and MapReduce

•  Cloud infrastructure management
•  Machine, customer and API data
•  “Design for failure” architecture
“enStratus relies on Riak to ensure that our cloud infrastructure management platform scales seamlessly, without interruption and performance bottlenecks, while meeting and exceeding internal requirements for high availability and data durability.”

•  Scaling writes in MySQL became a bottleneck
•  Master/slave replication made master nodes a single point of failure
•  Multi-site replication
← vimeo.com/bashotech

•  Re-platform of e-commerce platform
•  Product catalog data
•  Adding non-relational to the mix

← ricon2012.com
•  Challenges: modeling data in a NoSQL world

Top Reasons
•  High availability
•  Minimizing the cost of scale
•  Simple “schema-less” design

High Availability
•  Availability has a direct impact on revenue and user trust
•  Read AND write availability is critical

High Availability
Relational Architecture
[Diagram labels: master, slave, slave, write]

High Availability
Riak’s Masterless Architecture
[Diagram labels: read, write (repeated across the nodes)]

High Availability
Hinted Handoff
[Diagram labels: write, write, write]
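The slide names hinted handoff without spelling out the mechanics. As a rough illustration only, the toy model below assumes that when a node responsible for a write is down, another node accepts the write together with a "hint" naming the intended owner, and forwards it once that owner is back. The `Node`, `write` and `hand_off` names are hypothetical and are not Riak's implementation.

```python
class Node:
    """A toy storage node (hypothetical; not Riak's implementation)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value owned by this node
        self.hints = []   # (intended_owner, key, value) held for down nodes


def write(key, value, primaries, fallbacks):
    """Write to each primary; if one is down, a fallback holds the write with a hint."""
    spare = iter(fallbacks)
    for primary in primaries:
        if primary.up:
            primary.data[key] = value
        else:
            next(spare).hints.append((primary, key, value))


def hand_off(node):
    """Deliver held writes to owners that are back up; keep the rest."""
    still_held = []
    for owner, key, value in node.hints:
        if owner.up:
            owner.data[key] = value
        else:
            still_held.append((owner, key, value))
    node.hints = still_held


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
c.up = False                                  # one replica is unavailable
write("cart:42", "3 items", [a, b, c], [d])   # the write is still accepted
c.up = True                                   # the node recovers
hand_off(d)                                   # d forwards what it held for c
```

After `hand_off(d)`, node `c` holds the value it missed and `d` no longer holds any hints.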

Cost of Scale
•  Operational impact of growth
•  Economies of scale
•  Commodity machines
•  Meeting peak loads

Cost of Scale
Sharding in Relational Systems
[Diagram: shards by key range: A - D, E - K, L - P, Q - T, U - Z]

Cost of Scale
•  Hot spots
•  Unevenly spread data and request patterns
•  Resharding is operationally intensive, often manual
[Diagram: shards by key range: A - D, E - K, L - P, Q - T, U - Z]
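To make the hot-spot problem concrete, here is a small sketch that shards keys by first letter using the five ranges from the diagram. The usernames are made up for illustration; the point is only that range boundaries chosen up front do not follow the real key distribution.

```python
from collections import Counter

# Shard boundaries taken from the slide's diagram.
SHARDS = [("A", "D"), ("E", "K"), ("L", "P"), ("Q", "T"), ("U", "Z")]


def shard_for(key):
    """Return the index of the range shard that owns a key, by first letter."""
    first = key[0].upper()
    for index, (low, high) in enumerate(SHARDS):
        if low <= first <= high:
            return index
    raise ValueError(f"no shard for key {key!r}")


# Hypothetical usernames; real key distributions are rarely uniform either.
users = ["smith", "sanchez", "singh", "suzuki", "stone", "adams", "lee", "young"]
load = Counter(shard_for(name) for name in users)
```

Here five of the eight keys land on the Q - T shard while the E - K shard stays empty. Fixing that means choosing new boundaries and moving rows, which is the resharding work the slide describes.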

Cost of Scale
Riak’s Consistent Hashing
•  Evenly spreads data around the cluster
•  Automatically rebalances data when machines are added
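A minimal sketch of the idea, under stated assumptions: keys are hashed with SHA-1 into a fixed number of equal partitions, and nodes claim partitions. The ring size of 64, the round-robin claim and the `add_node` rebalancing rule are simplifications chosen for this example. Riak's real claim algorithm is more involved, for instance in how it places replicas on distinct nodes.

```python
import hashlib
from collections import Counter

RING_SIZE = 64  # number of fixed partitions (an assumed value for this sketch)


def partition_for(key):
    """Hash a key with SHA-1 and map it to one of RING_SIZE equal partitions."""
    digest = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return digest * RING_SIZE >> 160  # which equal slice of the 160-bit space


def claim(nodes):
    """Assign partitions to nodes round-robin so each owns an even share."""
    return {p: nodes[p % len(nodes)] for p in range(RING_SIZE)}


def add_node(owners, new_node):
    """Move partitions from the most loaded nodes until new_node has its share."""
    owners = dict(owners)
    share = RING_SIZE // (len(set(owners.values())) + 1)
    for _ in range(share):
        load = Counter(owners.values())
        donor = max((n for n in load if n != new_node), key=lambda n: load[n])
        partition = next(p for p, n in owners.items() if n == donor)
        owners[partition] = new_node
    return owners


before = claim(["node1", "node2", "node3", "node4"])
after = add_node(before, "node5")
moved = [p for p in before if before[p] != after[p]]
```

Going from four nodes to five moves only the 12 partitions that the new node takes over; the other 52 stay where they were, and a key's partition never changes because it depends only on the key's hash.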

Ops = Easier!
•  When a new node is added, the node takes over its share of partitions until data distribution is even again
•  Does not require manual intervention – data is automatically handed off
•  Simple stage, preview and commit workflow

Multi-Data Center
•  Global footprint
•  Disaster recovery
•  Data locality

Multi-Data Center
Riak Enterprise
•  Real-time or full sync
•  Uni-directional or bi-directional
•  “Masterless” – a secondary site can take over operations for a failed primary

Data Modeling
•  Simple key / value design
•  Riak is “Schema-less”

Data Modeling
•  Add new features without updating the schema
•  Ideal for when rapid iterations are required
•  Simple, straightforward operations
•  Less complex than a relational model

Tradeoffs
•  No sets, counters or transactions
•  No concept of columns and rows
•  No SQL or SQL-like language
•  No join operations

Query Options
•  Riak Search: Distributed full-text search
•  Secondary Indexing: Tag objects with queryable metadata
•  MapReduce: Aggregation tasks; Erlang and JavaScript support
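The secondary-indexing idea, tagging an opaque value with metadata that can be queried later, can be sketched with an in-memory stand-in. `TaggedStore` and its methods are hypothetical names for this example and are not Riak's API; the sketch only covers exact-match lookups and does not remove old tags when a key is overwritten.

```python
from collections import defaultdict


class TaggedStore:
    """Toy key/value store with exact-match secondary indexes (not Riak's API)."""

    def __init__(self):
        self.objects = {}                                    # key -> opaque value
        self.index = defaultdict(lambda: defaultdict(set))   # field -> tag -> keys

    def put(self, key, value, tags=None):
        """Store a value and tag it with queryable metadata."""
        self.objects[key] = value
        for field, tag in (tags or {}).items():
            self.index[field][tag].add(key)

    def query(self, field, tag):
        """Return the keys whose metadata field equals tag, sorted for stable output."""
        return sorted(self.index[field][tag])


store = TaggedStore()
store.put("user:1", '{"name": "Ana"}', tags={"plan": "pro", "country": "BR"})
store.put("user:2", '{"name": "Bo"}', tags={"plan": "free", "country": "SE"})
store.put("user:3", '{"name": "Cy"}', tags={"plan": "pro", "country": "US"})
pro_users = store.query("plan", "pro")
```

The store never looks inside the values; the query works purely from the tags supplied at write time, which is why the application has to decide up front which metadata it will want to query on.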

Common Application Patterns
Data Type     Key                   Value
Session       User/Session ID       Session Data
Advertising   Campaign ID           Ad Content
Logs          Date                  Log File
Sensor        Date, Date/Time       Updates
User Data     Login, Email, UUID    User Attributes
Content       Title, Integer, Etc.  Text, JSON, XML
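A few of these patterns, sketched against a plain dictionary standing in for the database. The `put` and `get` helpers, the bucket names and the sample values are all assumptions made for illustration; a real application would call a Riak client library instead.

```python
import json
from datetime import date

store = {}  # stand-in for a key/value database; (bucket, key) -> serialized value


def put(bucket, key, value):
    """Store a JSON-serializable value under a bucket/key pair."""
    store[(bucket, key)] = json.dumps(value)


def get(bucket, key):
    """Fetch and decode a value, or return None if the key is absent."""
    raw = store.get((bucket, key))
    return None if raw is None else json.loads(raw)


# Session pattern: key is the session ID, value is the session data.
put("sessions", "sess-8f3a", {"user": "ana", "cart": ["sku-1", "sku-9"]})

# Logs pattern: key is the date, value is that day's log content.
put("logs", date(2012, 10, 10).isoformat(), {"lines": ["GET /", "GET /cart"]})

# User data pattern: key is the login, value is the user's attributes.
put("users", "ana", {"email": "ana@example.com", "plan": "pro"})

session = get("sessions", "sess-8f3a")
```

In each case the key is something the application already knows at read time (a session ID, a date, a login), so every lookup is a single fetch by key.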

Eventual Consistency
•  Riak is eventually consistent
•  Does not support strictly consistent operations

Resolving Conflicts
•  Data conflicts can arise in a small proportion of requests due to exactly concurrent writes, laggy nodes and certain failure modes
•  Riak has mechanisms for detecting and resolving data conflicts

Resolving Conflicts
•  Read repair
•  “Repair” command
•  NEW: active anti-entropy
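Read repair can be sketched as follows: a read consults the replicas, returns the newest copy, and writes that copy back to any replica that was stale or missing it. This is a simplified model, not Riak's code. In particular, a plain integer version stands in for the vector clocks a real system would compare, and `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key):
    """Read a key from every replica, return the newest value, fix stale copies.

    Each replica is a dict mapping key -> (version, value). A plain integer
    version stands in for the vector clocks a real system would compare.
    """
    found = [r[key] for r in replicas if key in r]
    if not found:
        return None
    newest = max(found, key=lambda pair: pair[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest   # repair a missing or outdated copy
    return newest[1]


r1 = {"user:1": (2, "ana@new.example")}
r2 = {"user:1": (1, "ana@old.example")}   # missed the latest write
r3 = {}                                   # missed the object entirely
value = read_with_repair([r1, r2, r3], "user:1")
```

After the read, all three replicas hold version 2. Repair of this kind only happens for keys that are actually read, which is the gap a background process such as active anti-entropy is meant to cover.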

Resolving Conflicts
•  Objects are tagged with vector clocks
•  Vector clocks show relationships between versions of an object
•  Deal with data conflicts at the database level (last write wins) or let clients resolve with use-case specific logic
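A small sketch of how vector clocks expose the relationship between two versions, and of the two resolution strategies on the slide. Clocks are modeled here as dictionaries of actor to counter; the function names, the timestamps and the shopping-list values are assumptions for the example rather than Riak's internal representation.

```python
def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relationship(a, b):
    """Classify two vector clocks (dicts of actor -> counter)."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"  # concurrent writes: neither version saw the other


def resolve(siblings, merge=None):
    """Resolve conflicting (timestamp, value) siblings.

    Without a merge function this is last write wins; otherwise the
    client-supplied function combines the values.
    """
    if merge is None:
        return max(siblings, key=lambda s: s[0])[1]
    return merge([value for _, value in siblings])


v1 = {"client_a": 2, "client_b": 1}
v2 = {"client_a": 1, "client_b": 1}
v3 = {"client_a": 1, "client_b": 2}

siblings = [(100, {"milk"}), (105, {"eggs"})]
lww = resolve(siblings)
merged = resolve(siblings, merge=lambda values: set().union(*values))
```

`relationship(v1, v2)` reports that `v1` is newer, while `relationship(v1, v3)` reports a conflict. Last write wins keeps only `{"eggs"}` and silently drops the other write; the client-side merge keeps both items, which is the kind of use-case specific logic the slide refers to.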

Other Nice Riak Features
•  HTTP API and Protocol Buffers
•  Lots of client libraries
•  Robust, supportive community
•  Tunable requests
•  AMIs, Windows Azure, Engine Yard, Joyent and lots of other platforms
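The slide does not say what "tunable requests" covers. In Dynamo-style systems it usually means choosing, per request, how many of the n replicas must answer a read (r) or acknowledge a write (w). Assuming that meaning, the arithmetic behind the choice can be sketched as below; `quorum_profile` and its result keys are hypothetical names for this example.

```python
def quorum_profile(n, r, w):
    """Describe what a choice of read (r) and write (w) replica counts implies."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {
        # With strict quorums, every read set intersects every write set
        # only when r + w > n.
        "read_and_write_sets_overlap": r + w > n,
        # Replica failures an operation can tolerate and still succeed.
        "read_tolerates_failures": n - r,
        "write_tolerates_failures": n - w,
    }


fast = quorum_profile(n=3, r=1, w=1)
quorum = quorum_profile(n=3, r=2, w=2)
```

With n = 3, the `fast` setting tolerates two unavailable replicas per request but its read and write sets need not overlap, while the `quorum` setting overlaps and tolerates one. Lower values favor availability and latency; higher values favor reading the most recent write.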

Migration
•  Migrate in stages
•  Pick a unit of data
•  Start with 1:1 relationships
•  Areas that can be modeled as key/value operations
•  Use Riak client APIs
•  Docs.basho.com, mailing list or proserv team
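One way to picture the "pick a unit of data, start with 1:1 relationships" steps: take a row and its 1:1 companion row from the relational side and fold them into a single value under one key. The schema, the key format and the dictionary standing in for the key/value store are all assumptions for this sketch; a real migration would write through a Riak client.

```python
import json
import sqlite3

# A tiny relational source with a 1:1 relationship (hypothetical schema).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT, email TEXT);
    CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, bio TEXT);
    INSERT INTO users VALUES (1, 'ana', 'ana@example.com');
    INSERT INTO profiles VALUES (1, 'Likes databases');
""")

kv = {}  # stand-in for the key/value store; a Riak client call would go here

rows = db.execute("""
    SELECT u.id, u.login, u.email, p.bio
    FROM users u JOIN profiles p ON p.user_id = u.id
""")
for user_id, login, email, bio in rows:
    # One user plus its 1:1 profile becomes one self-contained JSON value.
    document = {"login": login, "email": email, "profile": {"bio": bio}}
    kv[f"user:{user_id}"] = json.dumps(document)

migrated = json.loads(kv["user:1"])
```

The join happens once, at migration time. Afterwards the application reads the whole user with a single key lookup, which is why 1:1 relationships are the easy place to start.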

Next Steps
•  Basho.com/resources/white-papers
•  Ricon2012.com
•  Docs.basho.com
Additional questions? Join the mailing list or email [email protected]
