Building Reliable Cloud Storage Services with Riak

Presented to the Silicon Valley Cloud Computing Group, April 2, 2013 @ Citrix HQ, Santa Clara, CA

Andy Gross

April 02, 2013

Transcript

  1. Riak and Riak CS. Andy Gross <@argv0>, Chief Architect, Basho Technologies. Silicon Valley Cloud Computing Group, April 2, 2013. Tuesday, April 2, 13
  2. Basho: 120+ employees; offices in SF, MA, London, Japan; Founded in 2008, open sourced Riak in 2009; Sponsors of the Riak open source database (Apache 2); Sell Enterprise features (multi-DC replication), support, training; Riak CS (S3-compat storage) released in March 2012
  3. Now Open Source (Apache 2): Cloud storage software backed by Riak; S3 API; Formerly closed-source; Per-tenant reporting; Pluggable authentication; Detailed stats; DTrace support; Multi-datacenter replication (Enterprise); Preliminary integration with CloudStack
  4. what is a cloud service?
  5. what is a cloud service? fault tolerant
  6. what is a cloud service? horizontally scalable; fault tolerant
  7. what is a cloud service? operationally simple; horizontally scalable; fault tolerant
  8. what is a cloud service? operationally simple; horizontally scalable; no SPOFs; fault tolerant
  9. what is a cloud service? operationally simple; horizontally scalable; highly available; no SPOFs; fault tolerant
  10. what is a cloud service? operationally simple; horizontally scalable; globally distributed; highly available; no SPOFs; fault tolerant
  11. you can’t outsource these properties: operationally simple; horizontally scalable; globally distributed; highly available; no SPOFs; fault tolerant
  12. “use pacemaker” = wrong answer
  13. “use mysql best practices for redundancy” = wrong answer
  14. “just plug it into a SAN” = wrong answer
  15. all cloud services need reliable, distributed state storage
  16. storage is the most important and hardest part
  17. Riak CS uses Riak
  18. What is Riak?
  19. Key-Value store (plus extras); Distributed, horizontally scalable; Eventually consistent; Fault-tolerant; Highly-available; Inspired by Amazon’s Dynamo
  20. Key-Value: Simple operations - get, put, delete; Value is mostly opaque (some metadata); Extras: MapReduce, Secondary Indexes, Full-text search (optional)
  21. Distributed & Horizontally Scalable: Default configuration is in a cluster; Load and data are spread evenly via consistent hashing; Scalable: Add more nodes to get more X
  22. Fault-Tolerant: Symmetry: All nodes participate equally; Decentralized: no central control, no SPOF; All data is replicated 3x by default; Cluster transparently survives... node failure, network partitions
  23. Highly-Available: Any node can serve client requests; Fallbacks (sloppy quorums) are used when nodes are down; Always accepts write requests; Accepts read requests as long as R of the N replica nodes are alive; Per-request quorums
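The quorum rule on this slide can be sketched in a few lines of Python. This is an illustration only, not Riak's code: the function names and the `n`, `r`, `w` parameters are assumptions made for the example, with `n=3` matching the default replication factor mentioned on slide 22. The overlap check describes strict quorums; with sloppy quorums (fallback nodes) the overlap is not guaranteed.

```python
def read_can_succeed(alive_replicas: int, r: int) -> bool:
    """A read needs replies from at least r of the key's replicas."""
    return alive_replicas >= r


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares a replica with every write quorum."""
    return r + w > n


# Per-request quorums: each request may pick its own r (hypothetical values).
print(read_can_succeed(alive_replicas=2, r=2))  # two of three replicas alive
print(quorums_overlap(n=3, r=2, w=2))
```

Lowering `r` for a single request trades consistency for availability on that request only, which is the point of making quorums per-request.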
  24. Inspired by Amazon’s Dynamo: Masterless, peer-coordinated replication; Consistent hashing; Eventually consistent; Quorum reads and writes; Anti-entropy: read repair, hinted handoff
  25. [Diagram: five Riak nodes and five Riak CS instances, each exposing the S3 API and Reporting API; a Large Object]
  26. [Same diagram] 1. user uploads an object
  27. [Same diagram]
  28. [Same diagram, object shown as 1 MB blocks] 2. Riak CS breaks object into 1 MB chunks
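The chunking step on this slide can be sketched as follows. This is a minimal illustration, not Riak CS's implementation: the function names and the `blocks/<uuid>/<sequence>` key layout are assumptions made for the example (the slides only show keys of the form `blocks/<uuid>`).

```python
import uuid

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size named on the slide


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return the object as an ordered list of chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def plan_upload(data: bytes):
    """Pair each chunk with a hypothetical storage key (assumed layout)."""
    object_id = str(uuid.uuid4()).upper()
    return [(f"blocks/{object_id}/{seq}", chunk)
            for seq, chunk in enumerate(split_into_chunks(data))]


if __name__ == "__main__":
    payload = b"x" * (2 * CHUNK_SIZE + 512 * 1024)  # a 2.5 MB object
    for key, chunk in plan_upload(payload):
        print(key, len(chunk))
```

Each (key, chunk) pair can then be written as an ordinary key-value put, so the large object inherits the replication and fault tolerance of the underlying store.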
  29. [Diagram: five Riak nodes and five Riak CS instances, each exposing the S3 API and Reporting API; the Large Object shown as 1 MB blocks]
  30. [Same diagram] 3. Riak CS streams chunks to Riak nodes
  31. [Diagram: Riak nodes, Riak CS instances, Large Object]
  32. [Same diagram] 4. Riak replicates and stores chunks
  33. Principles: Always-writable; Incrementally scalable; Symmetrical; Decentralized; Focus on SLAs, tail latency
  34. Techniques: Consistent Hashing; Vector Clocks; Read Repair; Anti-Entropy; Hinted Handoff; Gossip Protocol
  35. Consistent Hashing: Invented by Danny Lewin and others @ MIT/Akamai; Minimizes remapping of keys when number of hash slots changes; Originally applied to CDNs, used in Dynamo for replica placement; Enables incremental scalability, even spread; Minimizes hot spots
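To make the "minimizes remapping" property concrete, here is a small sketch of a generic consistent-hash ring. It is an illustration under stated assumptions, not Riak's scheme: the `HashRing` class, the 64 points per node, and the `node#i` point naming are all hypothetical choices for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (SHA-1 as a 160-bit integer)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes=(), points_per_node=64):
        self.points_per_node = points_per_node
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several pseudo-random positions to even out load.
        for i in range(self.points_per_node):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        if not self._points:
            return []
        start = bisect.bisect_right(self._points, (_hash(key), ""))
        result = []
        for i in range(len(self._points)):
            node = self._points[(start + i) % len(self._points)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3", "node4"])
    print(ring.preference_list("blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA"))
```

When a node is added, a key's first owner either stays the same or becomes the new node; keys never shuffle between the existing nodes, which is what keeps scaling incremental.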
  37. Vector Clocks: Introduced by Mattern et al. in 1988; Extends Lamport’s timestamps (1978); Each value in Dynamo tagged with vector clock; Allows detection of stale values, logical siblings
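The stale-versus-sibling distinction can be sketched with a plain dictionary per clock. This is a minimal illustration, not Riak's vector clock module: the function names and the returned labels (`"newer"`, `"older"`, `"equal"`, `"concurrent"`) are assumptions made for the example.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of clock with actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if a has seen every update that b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "newer"       # b is a stale version of a
    if b_ge_a:
        return "older"       # a is a stale version of b
    return "concurrent"      # neither descends the other: siblings


if __name__ == "__main__":
    base = increment({}, "client_a")
    left = increment(base, "client_a")   # update made through one actor
    right = increment(base, "client_b")  # independent update through another
    print(compare(left, base), compare(left, right))
```

A stale value can be discarded safely, while concurrent values have to be kept as siblings until something resolves them.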
  38. Read Repair: Update stale versions opportunistically on reads (instead of writes); Pushes system toward consistency, after returning value to client; Reflects focus on a cheap, always-available write path
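A simplified sketch of read repair follows. It makes several assumptions for brevity: replicas are plain dictionaries, versions are integers (like the v1/v2 labels later in the deck) rather than vector clocks, and the repair runs inline, whereas the slide describes it happening after the value has been returned to the client. The function name is hypothetical.

```python
def read_with_repair(replicas, key):
    """Read key from every replica, return the newest value, and fix stale copies.

    replicas: list of dicts mapping key -> (version, value).
    Returns (value, number_of_replicas_repaired).
    """
    records = [rep[key] for rep in replicas if key in rep]
    if not records:
        return None, 0
    newest = max(records, key=lambda rec: rec[0])
    repaired = 0
    for rep in replicas:
        # A missing record is treated as older than any stored version.
        if rep.get(key, (-1, None))[0] < newest[0]:
            rep[key] = newest
            repaired += 1
    return newest[1], repaired


if __name__ == "__main__":
    replicas = [{"k": (1, "old")}, {"k": (2, "new")}, {"k": (2, "new")}]
    print(read_with_repair(replicas, "k"))
    print(replicas)
```

Because the fix-up rides on reads, the write path never has to wait for every replica to be current.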
  39. Hinted Handoff: Any node can accept writes for other nodes if they’re down; All messages include a destination; Data accepted by node other than destination is handed off when node recovers; As long as a single node is alive the cluster can accept a write
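The bullets above can be sketched as a toy in-memory model. Everything here is hypothetical and chosen for illustration (the `Node` class, the `hints` dictionary keyed by destination name, the `write` and `handoff` functions); it is not how Riak stores or transfers hinted data.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # values this node owns
        self.hints = {}   # destination name -> {key: value} held on its behalf


def write(key, value, primaries, fallbacks):
    """Write to each primary; if one is down, a live fallback keeps a hinted copy."""
    spare = [node for node in fallbacks if node.up]
    for primary in primaries:
        if primary.up:
            primary.data[key] = value
        elif spare:
            holder = spare.pop(0)
            # The hint records the intended destination alongside the data.
            holder.hints.setdefault(primary.name, {})[key] = value
        else:
            raise RuntimeError("no live node available to accept the write")


def handoff(holder, recovered):
    """Deliver hinted data to its destination once that node is back up."""
    if not recovered.up:
        return 0
    pending = holder.hints.pop(recovered.name, {})
    recovered.data.update(pending)
    return len(pending)


if __name__ == "__main__":
    a, b, c, d = (Node(n) for n in "abcd")
    b.up = False
    write("blocks/1", b"chunk", primaries=[a, b, c], fallbacks=[d])
    b.up = True
    print(handoff(d, b), b.data)
```

The write succeeds while `b` is down, and `b` catches up on recovery without the client being involved.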
  40. Anti-Entropy: Replicas maintain a Merkle Tree of keys and their versions/hashes; Trees periodically exchanged with peer vnodes; Merkle tree enables cheap comparison; Only values with different hashes are exchanged; Pushes system toward consistency
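The comparison step can be sketched with a deliberately shallow hash tree: one root over a fixed number of segment hashes. This is a simplification of a Merkle tree for illustration only; the 16-segment layout, the use of SHA-256, and the function names are assumptions, not Riak's anti-entropy implementation.

```python
import hashlib

SEGMENTS = 16  # hypothetical number of leaf segments


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(store):
    """store: dict of str key -> bytes value. Returns (root, leaf_hashes, segments)."""
    segments = [dict() for _ in range(SEGMENTS)]
    for key, value in store.items():
        index = _h(key.encode("utf-8"))[0] % SEGMENTS
        segments[index][key] = _h(value)
    leaves = [
        _h(b"".join(k.encode("utf-8") + v for k, v in sorted(seg.items())))
        for seg in segments
    ]
    return _h(b"".join(leaves)), leaves, segments


def differing_keys(store_a, store_b):
    """Return the keys whose value hashes differ between two replicas."""
    root_a, leaves_a, segs_a = build_tree(store_a)
    root_b, leaves_b, segs_b = build_tree(store_b)
    if root_a == root_b:
        return set()  # a single comparison shows the replicas agree
    diff = set()
    for i in range(SEGMENTS):
        if leaves_a[i] != leaves_b[i]:  # only descend into mismatched segments
            for key in set(segs_a[i]) | set(segs_b[i]):
                if segs_a[i].get(key) != segs_b[i].get(key):
                    diff.add(key)
    return diff


if __name__ == "__main__":
    a = {"k1": b"x", "k2": b"y", "k3": b"z"}
    b = dict(a, k2=b"stale")
    print(differing_keys(a, b))
```

Matching roots let two replicas confirm agreement by exchanging one hash, and a mismatch narrows the exchange to the affected segments.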
  41. Gossip Protocol: Decentralized approach to managing global state; Trades off atomicity of state changes for a decentralized approach; Volume of gossip can overwhelm networks without care
  42. Hinted Handoff
  43. Hinted Handoff • Node fails [ring diagram with X marks]
  44. Hinted Handoff • Node fails • Requests go to fallback; hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [ring diagram with X marks]
  45. Hinted Handoff • Node fails • Requests go to fallback • Node comes back; hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
  46. Hinted Handoff • Node fails • Requests go to fallback • Node comes back • “Handoff” - data returns to recovered node; hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
  47. Hinted Handoff • Node fails • Requests go to fallback • Node comes back • “Handoff” - data returns to recovered node • Normal operations resume; hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
  48. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
  49. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [diagram: client, Riak]
  50. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [diagram: client, Riak, Get Handler (FSM)]
  51. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [diagram: client, Riak, Get Handler (FSM)] hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12
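The step where a key hashes to three consecutive ring positions can be sketched as below. This assumes a ring of 64 equal partitions over the SHA-1 hash space and hashes the raw key string; both are assumptions for the example, so the indices it prints will not match the 10, 11, 12 on the slide or what a real cluster computes.

```python
import hashlib

RING_SIZE = 64          # assumed number of equal-sized partitions
HASH_SPACE = 2 ** 160   # SHA-1 produces 160-bit values


def partition_of(key: str) -> int:
    """Index of the partition whose slice of the hash space contains the key."""
    position = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    return position // (HASH_SPACE // RING_SIZE)


def preference_list(key: str, n: int = 3):
    """The key's partition plus the next n-1 partitions, wrapping around the ring."""
    first = partition_of(key)
    return [(first + i) % RING_SIZE for i in range(n)]


if __name__ == "__main__":
    print(preference_list("blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA"))
```

Because the mapping is a pure function of the key and the ring layout, any node can compute it locally and act as the coordinating node for the request.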
  52. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [diagram: client, Riak, Get Handler (FSM), Coordinating node, Cluster, The Ring with positions 6 to 16] hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12
  53. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [same diagram] get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
  54. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [same diagram] R=2
  55. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [same diagram] R=2; v1
  56. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [diagram: client, Riak, Get Handler (FSM)] R=2; v1 v2
  57. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [diagram: client, Riak, Get Handler (FSM)] R=2; v2 v2
  58. Anatomy of a Request: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) v2
  59. Read Repair: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [diagram: client, Riak, Get Handler (FSM), Coordinating node, Cluster ring with positions 6 to 16] R=2; v1 v2 v2 v2 v1
  60. Read Repair: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [same diagram] R=2; v2 v2 v2 v1
  61. Read Repair: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [same diagram] R=2; v2 v2 v2 v1 v1
  62. Read Repair: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) [same diagram] R=2; v2 v2 v2 v2 v2 v2 v2
  63. Riak Architecture: Erlang/OTP Runtime
  64. Riak Architecture: Erlang/OTP Runtime; Riak KV
  65. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs
  66. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP)
  67. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers)
  68. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client)
  69. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination
  70. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce)
  71. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core
  72. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (consistent hashing)
  73. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing)
  74. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff)
  75. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff, node-liveness)
  76. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff, node-liveness, gossip)
  77. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff, node-liveness, gossip, buckets)
  78. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff, node-liveness, gossip, buckets); vnode master
  79. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff, node-liveness, gossip, buckets); vnode master; vnodes
  80. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff, node-liveness, gossip, buckets); vnode master; vnodes; storage backend
  81. Riak Architecture: Erlang/OTP Runtime; Riak KV; Client APIs (HTTP, Protocol Buffers, Erlang local client); Request Coordination (get, put, delete, map-reduce); Riak Core (membership, consistent hashing, handoff, node-liveness, gossip, buckets); vnode master; vnodes; storage backend; JS Runtime
  82. riak is a solid foundation for building cloud services
  83. Coming Soon: Riak CS 1.4 (Q2): Swift API; Keystone Integration; S3 Features: COPY Object, Object Versioning. Riak CS 1.5 (Q3): Server side encryption; More S3 features; Enhanced CloudStack and OpenStack integration
  84. Coming Later (2014): Erasure coding; Reduced redundancy storage; Native indexing/search
  85. RICON East - May 13-14, NYC: A distributed systems conference for developers; Speakers from Comcast, State Farm, UC Berkeley, Harvard, and many more; Use discount code SVCloud20 for 20% off tickets; http://ricon.io/east.html
  86. thanks!/questions? download riakcs: http://docs.basho.com/riakcs/latest/riakcs-downloads/; hack riakcs: http://github.com/basho/riak_cs; work at basho: http://bashojobs.theresumator.com; follow basho on twitter: http://twitter.com/basho