From Relational to Riak

Slide 1

Slide 1 text

From Relational to Riak

Slide 2

Slide 2 text

Shanley Kane, Director of Product Management, @shanley, [email protected]

Slide 3

Slide 3 text

Overview •  A few Riak user examples •  Common reasons for moving to Riak •  Tradeoffs

Slide 4

Slide 4 text

What is Riak •  Masterless, distributed, open source database •  Automatically replicates data (default, n = 3) •  Designed for availability and operational ease •  Inspired by architectural principles from Amazon and Akamai •  Open source (Apache 2 license) •  Key-value model •  Features for full-text search, querying metadata and MapReduce

Slide 5

Slide 5 text

•  Cloud infrastructure management •  Machine, customer and API data •  “Design for failure” architecture “enStratus relies on Riak to ensure that our cloud infrastructure management platform scales seamlessly, without interruption and performance bottlenecks, while meeting and exceeding internal requirements for high availability and data durability.”

Slide 6

Slide 6 text

•  Scaling writes in MySQL became a bottleneck •  Master/slave replication made master nodes a single point of failure •  Multi-site replication ← vimeo.com/bashotech

Slide 7

Slide 7 text

•  Re-platform of e-commerce platform •  Product catalog data •  Adding non-relational to the mix

Slide 8

Slide 8 text

ß ricon2012.com •  Challenges: modeling data in a NoSQL world

Slide 9

Slide 9 text

Top Reasons •  High availability •  Minimizing the cost of scale •  Simple “schema-less” design

Slide 10

Slide 10 text

High Availability •  Availability has a direct impact on revenue and user trust •  Read AND write availability is critical

Slide 11

Slide 11 text

High Availability ß  master ß  slave slave à Relational Architecture

Slide 12

Slide 12 text

High Availability ß  master ß  slave slave à write

Slide 13

Slide 13 text

High Availability ß  master ß  slave slave à write

Slide 14

Slide 14 text

High Availability Riak’s Masterless Architecture

Slide 15

Slide 15 text

High Availability Riak’s Masterless Architecture (diagram labels: five writes and four reads)

Slide 16

Slide 16 text

High Availability Hinted Handoff write write write

Slide 17

Slide 17 text

High Availability Hinted Handoff write write write
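The hinted handoff slides can be illustrated with a minimal in-memory sketch. It is not Riak's implementation: the `Node` class, the `write` and `handoff` functions, and the node names are all hypothetical. It assumes a write is meant for three primary replicas, one of which is down, so a fallback node holds the write on its behalf and hands it back after recovery.

```python
class Node:
    """A toy storage node (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value stored as this node's own replica
        self.hints = {}   # intended owner name -> {key: value} held on its behalf


def write(key, value, primaries, fallbacks):
    """Write to every primary; a write for a down primary goes to a fallback."""
    spare = (n for n in fallbacks if n.up)
    for node in primaries:
        if node.up:
            node.data[key] = value
        else:
            # Raises StopIteration if no fallback is available; a real system
            # would handle that case explicitly.
            fallback = next(spare)
            fallback.hints.setdefault(node.name, {})[key] = value


def handoff(fallback, recovered):
    """Deliver hinted writes to the recovered primary and drop the hints."""
    pending = fallback.hints.pop(recovered.name, {})
    recovered.data.update(pending)


a, b, c, d = (Node(n) for n in "ABCD")
b.up = False                                   # one primary is unavailable
write("cart:42", "3 items", primaries=[a, b, c], fallbacks=[d])
held_by_fallback = dict(d.hints)               # D holds the write intended for B
b.up = True
handoff(d, b)                                  # B recovers and receives the write
print(held_by_fallback, b.data)
```

The write succeeds even though one replica is down, which is the availability property the slides emphasize.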

Slide 18

Slide 18 text

Cost of Scale •  Operational impact of growth •  Economies of scale •  Commodity machines •  Meeting peak loads

Slide 19

Slide 19 text

Cost of Scale Sharding in Relational Systems A - D E - K L - P Q - T U - Z

Slide 20

Slide 20 text

Cost of Scale •  Hot spots •  Unevenly spread data and request patterns •  Resharding is operationally intensive, often manual A - D E - K L - P Q - T U - Z
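A small sketch can make the hot-spot problem concrete. It assumes keys are routed by first letter to the five ranges shown on the slide; the user names are made-up sample data and `shard_for` is a hypothetical helper.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bound of each alphabetical range from the slide.
SHARD_UPPER_BOUNDS = ["D", "K", "P", "T", "Z"]
SHARD_NAMES = ["A-D", "E-K", "L-P", "Q-T", "U-Z"]


def shard_for(key):
    """Return the shard whose letter range contains the key's first letter.

    Assumes the key starts with an ASCII letter.
    """
    first = key[0].upper()
    return SHARD_NAMES[bisect_left(SHARD_UPPER_BOUNDS, first)]


# Made-up sample keys; real data is rarely uniform across the alphabet.
users = ["alice", "adam", "anna", "bob", "carol", "dave",
         "maria", "steve", "sam", "zoe"]
load = Counter(shard_for(u) for u in users)
print(load)
```

With this sample, six of the ten keys land on the A-D shard and none on E-K. Fixing that imbalance means choosing new ranges and moving data, which is the resharding cost the slide refers to.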

Slide 21

Slide 21 text

Cost of Scale Riak’s Consistent Hashing •  Evenly spreads data around the cluster •  Automatically rebalances data when machines are added
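The idea of spreading keys evenly by hash can be sketched as follows. This is a simplified illustration rather than Riak's actual algorithm: the partition count of 64, the way the bucket and key are combined before hashing, and the round-robin assignment of partitions to nodes are all assumptions made for the example.

```python
import hashlib
from collections import Counter

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed partition count for this sketch


def partition_for(bucket, key):
    """Hash the bucket/key pair onto the ring and return its partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")
    return position * NUM_PARTITIONS // RING_SIZE


def owner_for(bucket, key, nodes):
    """Partitions are assigned to nodes round-robin (an assumption)."""
    return nodes[partition_for(bucket, key) % len(nodes)]


nodes = ["node1", "node2", "node3", "node4"]
load = Counter(owner_for("users", f"user{i}", nodes) for i in range(10_000))
print(load)
```

Because the hash scatters keys across the whole ring, each of the four nodes ends up with roughly a quarter of the keys regardless of how the key names are distributed.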

Slide 27

Slide 27 text

Ops = Easier! •  When a new node is added, it takes over its share of partitions until data distribution is even again •  Does not require manual intervention – data is automatically handed off •  Simple stage, preview and commit workflow
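The rebalancing described above can be sketched with a toy ownership list. The `add_node` function and its "take from the most-loaded node" rule are hypothetical and much simpler than Riak's real claim logic; the sketch only shows that a new node can reach its fair share while most partitions stay where they are.

```python
from collections import Counter


def add_node(owners, new_node):
    """Return a new ownership list in which new_node holds its fair share.

    owners[i] is the name of the node that owns partition i. Partitions are
    taken one at a time from whichever existing node currently owns the most.
    """
    owners = list(owners)
    counts = Counter(owners)                        # existing nodes only
    target = len(owners) // (len(counts) + 1)       # fair share for the new node
    for _ in range(target):
        donor = counts.most_common(1)[0][0]
        owners[owners.index(donor)] = new_node      # hand one partition over
        counts[donor] -= 1
    return owners


before = [f"node{i % 4 + 1}" for i in range(64)]    # 4 nodes, 16 partitions each
after = add_node(before, "node5")
moved = sum(1 for old, new in zip(before, after) if old != new)
print(Counter(after), moved)
```

In this example only 12 of the 64 partitions change owner, and every node ends up with 12 or 13 partitions.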

Slide 28

Slide 28 text

Multi-Data Center •  Global footprint •  Disaster recovery •  Data locality

Slide 29

Slide 29 text

Multi-Data Center Riak Enterprise •  Real-time or full sync •  Uni-directional or bi-directional •  “Masterless” – a secondary site can take over operations for a failed primary

Slide 30

Slide 30 text

Data Modeling •  Simple key / value design •  Riak is “Schema-less”

Slide 31

Slide 31 text

Data Modeling •  Simple key / value design •  “Schema-less” Data Modeling

Slide 32

Slide 32 text

Data Modeling •  Add new features without updating the schema •  Ideal for when rapid iterations are required •  Simple, straight-forward operations •  Less complex than a relational model

Slide 33

Slide 33 text

Tradeoffs •  No sets, counters or transactions •  No concept of columns and rows •  No SQL or SQL-like language •  No join operations

Slide 34

Slide 34 text

Query Options •  Riak Search: Distributed full-text search •  Secondary Indexing: Tag objects with queryable metadata •  MapReduce: Aggregation tasks; Erlang and Javascript support
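To illustrate what "tag objects with queryable metadata" means, here is a minimal in-memory sketch of a key/value store with a secondary index. The `IndexedStore` class and its methods are hypothetical and do not reflect Riak's API; the field and tag names are invented for the example.

```python
from collections import defaultdict


class IndexedStore:
    """Toy key/value store with tag-based lookup (illustration only)."""

    def __init__(self):
        self.objects = {}                 # key -> value
        self.index = defaultdict(set)     # (field, tag value) -> set of keys

    def put(self, key, value, tags=None):
        """Store a value and record its tags in the index.

        Re-tagging an existing key is not handled in this sketch.
        """
        self.objects[key] = value
        for field, tag_value in (tags or {}).items():
            self.index[(field, tag_value)].add(key)

    def query(self, field, tag_value):
        """Return the sorted keys of all objects tagged with field == tag_value."""
        return sorted(self.index.get((field, tag_value), set()))


store = IndexedStore()
store.put("user:1", {"name": "Ann"}, tags={"plan": "pro"})
store.put("user:2", {"name": "Bo"}, tags={"plan": "free"})
store.put("user:3", {"name": "Cy"}, tags={"plan": "pro"})
print(store.query("plan", "pro"))
```

The object values stay opaque to the store; only the tags are queryable, which is the tradeoff compared with a relational WHERE clause over arbitrary columns.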

Slide 35

Slide 35 text

Common Application Patterns

Data Type     Key                   Value
Session       User/Session ID       Session Data
Advertising   Campaign ID           Ad Content
Logs          Date                  Log File
Sensor        Date, Date/Time       Updates
User Data     Login, Email, UUID    User Attributes
Content       Title, Integer, Etc.  Text, JSON, XML

Slide 36

Slide 36 text

Eventual Consistency •  Riak is eventually consistent •  Does not support strictly consistent operations

Slide 37

Slide 37 text

Resolving Conflicts •  Data conflicts can arise in a small proportion of requests due to exactly concurrent writes, laggy nodes and certain failure modes •  Riak has mechanisms for detecting and resolving data conflicts

Slide 38

Slide 38 text

Resolving Conflicts •  Read repair •  “Repair” command •  NEW active anti-entropy
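Read repair, the first mechanism listed, can be sketched as follows. This is a simplification under stated assumptions: each replica is a plain dict, and versions are single integers where a higher number means newer. Riak itself compares vector clocks rather than integers, and `read_with_repair` is a hypothetical name.

```python
def read_with_repair(key, replicas):
    """Read a key from all replicas, return the newest value, repair stale copies.

    Each replica is a dict mapping key -> (version, value).
    """
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda item: item[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest      # overwrite the stale or missing copy
    return newest[1]


replicas = [{"k": (2, "new")}, {"k": (1, "old")}, {}]
value = read_with_repair("k", replicas)
print(value, replicas)
```

The repair happens as a side effect of a normal read, so frequently read keys converge quickly while rarely read keys rely on the other mechanisms in the list.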

Slide 39

Slide 39 text

Resolving Conflicts •  Objects are tagged with vector clocks •  Vector clocks show relationships •  Deal with data conflicts at the database level (last write wins) or let clients resolve with use-case-specific logic
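How vector clocks "show relationships" can be sketched in a few lines. The representation here, a dict from actor name to counter, and the function names `increment`, `descends` and `compare` are assumptions for illustration, not Riak's internal format.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[actor] = new_clock.get(actor, 0) + 1
    return new_clock


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"      # concurrent updates: neither version descends from the other


base = increment({}, "x")      # first write, coordinated by actor x
v1 = increment(base, "x")      # actor x updates the object again
v2 = increment(base, "y")      # actor y updates the same base version concurrently
print(compare(v1, base), compare(v1, v2))
```

When the result is "conflict", the choice on the slide applies: either the database picks a winner (last write wins) or both versions are handed to the client to merge.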

Slide 40

Slide 40 text

Other Nice Riak Features •  HTTP API and Protocol Buffers •  Lots of client libraries •  Robust, supportive community •  Tunable requests •  AMIs, Windows Azure, Engine Yard, Joyent and lots of other platforms

Slide 41

Slide 41 text

Migration •  Migrate in stages •  Pick a unit of data •  Start with 1:1 relationships •  Areas that can be modeled as key/value operations •  Use Riak client APIs •  Docs.basho.com, mailing list or proserv team
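For the "start with 1:1 relationships" step, one possible mapping from a relational row to a key/value object is sketched below. The function `row_to_object`, the choice of JSON as the value format, and the table and column names are all assumptions for illustration; the actual write would go through whichever Riak client library the application uses.

```python
import json


def row_to_object(table, primary_key, row):
    """Map one relational row to a (bucket, key, value) triple.

    The table name becomes the bucket, the primary key becomes the object key,
    and the remaining columns are serialized as a JSON document.
    """
    key = str(row[primary_key])
    body = {column: value for column, value in row.items() if column != primary_key}
    return table, key, json.dumps(body, sort_keys=True)


row = {"id": 7, "email": "ann@example.com", "plan": "pro"}
bucket, key, value = row_to_object("users", "id", row)
print(bucket, key, value)
```

Rows that are always fetched by primary key migrate cleanly this way; data that depends on joins needs remodeling first, which is why the slide suggests migrating in stages.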

Slide 42

Slide 42 text

Next Steps •  Basho.com/resources/white-papers •  Ricon2012.com •  Docs.basho.com Additional questions? Join the mailing list or email [email protected]