Deploying Riak on Amazon Web Services

A brief overview of Riak [0], followed by details on how to deploy Riak on Amazon Web Services (AWS).

Toward the end of the deck, there is a section on tuning. The gist [1] below contains a Bash script to tune Linux.

[0] http://basho.com/riak/
[1] https://gist.github.com/hectcastro/5651350

Hector Castro

June 01, 2013

Transcript

  2. // Who are we?
    * Founded in 2008 by ex-Akamai, Mitre, Apple
    * 120+ employees; > 50% eng; distributed company
    * Sponsors of Riak, the Apache 2.0-licensed project
    * Basho sells Riak add-ons: Riak EDS and Riak CS
  3. // Who am I?
    * Hector Castro
    * E-mail: hector@basho.com
    * Twitter: @hectcastro
  9. "Distributed, masterless, highly-available key/value store."
    * Deployed as cluster of nodes (>= 5), incrementally scalable
    * Any node can coordinate requests, data replicated to 3 nodes by default, no SPOF
    * Eventually consistent with automatic failover
    * Key/value model with additional query methods
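The "replicated to 3 nodes by default" point can be illustrated with a simplified sketch of how a bucket/key pair might map onto a ring of partitions. This is not Riak's actual implementation: the ring size of 64, the `bucket/key` string hashing, and the round-robin partition ownership are assumptions made for illustration, and the function names are hypothetical.

```python
import hashlib

RING_SIZE = 64        # partitions in the ring (assumed for this sketch)
N_VAL = 3             # replicas per object, as stated on the slide
KEYSPACE = 2 ** 160   # size of the SHA-1 output space


def partition_index(bucket, key, ring_size=RING_SIZE):
    """Map a bucket/key pair to a partition number in [0, ring_size)."""
    # Assumption: hash the plain "bucket/key" string; Riak hashes its own
    # internal encoding of the pair.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") // (KEYSPACE // ring_size)


def preference_list(bucket, key, nodes, n_val=N_VAL, ring_size=RING_SIZE):
    """Return n_val (partition, node) pairs that would hold replicas."""
    first = partition_index(bucket, key, ring_size)
    partitions = [(first + i) % ring_size for i in range(n_val)]
    # Assumption: partition p is owned by nodes[p % len(nodes)].
    return [(p, nodes[p % len(nodes)]) for p in partitions]


if __name__ == "__main__":
    cluster = [f"riak@10.0.1.{i}" for i in range(11, 16)]  # five example nodes
    for partition, node in preference_list("users", "hector", cluster):
        print(partition, node)
```

With five nodes and round-robin ownership, three consecutive partitions always land on three different nodes in this toy model, which is the property that lets any single instance fail without losing access to the object.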
  15. // When to use Riak?
    * Data is critical and always needs to be available
    * Database must always accept writes
    * Scale horizontally
    * Focus on SLAs, tail latency
  16. // Data model

  21. // Riak Object
    * Key/value + metadata
    * Fundamental unit of replication
    * Soft limit of ~4MB on object size [0]

    [0] Hardware dependent
  25. // Buckets
    * Virtual namespace
    * Bucket and key produce object's address
    * No relationships between buckets
  26. // Data access

  29. // Interfaces
    * RESTful HTTP API
    * Protocol Buffers API
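As a rough illustration of how a bucket and key form an object's address over the HTTP interface, the sketch below builds the URL and a PUT request without sending anything. It assumes the `/buckets/<bucket>/keys/<key>` path layout of Riak's HTTP API and the HTTP port 8098 mentioned later in the deck; the host, bucket, key, and helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(host, bucket, key, port=8098):
    """Build the HTTP address of an object from its bucket and key."""
    return (
        f"http://{host}:{port}"
        f"/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    )


def store_request(host, bucket, key, body, content_type="application/json"):
    """Prepare (but do not send) a PUT request that would store an object."""
    return Request(
        object_url(host, bucket, key),
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


if __name__ == "__main__":
    req = store_request("10.0.1.11", "users", "hector", b'{"team": "basho"}')
    print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the write against a reachable node; any node in the cluster can accept it.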

  34. // MapReduce
    * Distributed processing system using Riak Pipe [0]
    * Efficient for targeted queries over known key range
    * Write jobs in Erlang or JavaScript

    [0] https://github.com/basho/riak_pipe
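A MapReduce job over a known set of keys might be described as a JSON document like the one built below, which would be POSTed to a node's `/mapred` HTTP endpoint. This is a sketch: the bucket and keys are made up, and it assumes the built-in JavaScript functions `Riak.mapValuesJson` and `Riak.reduceSum` are suitable for the stored values (here, JSON numbers).

```python
import json


def mapreduce_job(bucket, keys,
                  map_fn="Riak.mapValuesJson",
                  reduce_fn="Riak.reduceSum"):
    """Describe a job that maps over explicit bucket/key inputs, then reduces."""
    return {
        # Listing explicit keys keeps the query targeted, as the slide advises.
        "inputs": [[bucket, key] for key in keys],
        "query": [
            {"map": {"language": "javascript", "name": map_fn}},
            {"reduce": {"language": "javascript", "name": reduce_fn}},
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and keys; send with Content-Type: application/json.
    print(json.dumps(mapreduce_job("order_totals", ["1001", "1002"]), indent=2))
```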
  38. // Secondary indexes (2i)
    * Riak Objects tagged with custom metadata
    * Exact match and range queries
    * Pagination support coming in 1.4
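The tagging and the two query styles can be sketched over the HTTP interface as follows. The sketch assumes the `x-riak-index-<field>_bin` / `x-riak-index-<field>_int` header convention and the `/buckets/<bucket>/index/...` query paths of Riak's HTTP API; the bucket, field names, and helper functions are hypothetical.

```python
from urllib.parse import quote


def index_header(field, value):
    """Header that tags an object with an index entry when it is stored."""
    # Integer values use the _int suffix, everything else the _bin suffix.
    suffix = "_int" if isinstance(value, int) else "_bin"
    return (f"x-riak-index-{field}{suffix}", str(value))


def exact_match_path(bucket, index, value):
    """Query path for objects whose index entry equals value."""
    return (
        f"/buckets/{quote(bucket, safe='')}/index/{index}"
        f"/{quote(str(value), safe='')}"
    )


def range_path(bucket, index, start, end):
    """Query path for objects whose index entry lies between start and end."""
    return (
        f"/buckets/{quote(bucket, safe='')}/index/{index}"
        f"/{quote(str(start), safe='')}/{quote(str(end), safe='')}"
    )


if __name__ == "__main__":
    print(index_header("email", "hector@basho.com"))
    print(exact_match_path("users", "email_bin", "hector@basho.com"))
    print(range_path("users", "age_int", 20, 30))
```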
  43. // Riak Search (soon to be Yokozuna)
    * Store and index JSON, XML, TXT documents
    * Supports subset of Solr API
    * Yokozuna will support Distributed Solr API [0]

    [0] http://wiki.apache.org/solr/DistributedSearch
  48. // Future work
    * Dynamic ring sizing
    * Secondary index (2i) improvements
    * Convergent replicated data types (CRDTs)
    * Yokozuna
  53. // Why is Riak an attractive option for AWS?
    * Masterless design allows data to remain available during instance failure
    * Spin up another instance to add more cluster resources
    * Replicate clusters between Availability Zones (AZs) or Regions
  54. // Getting started

  56. // Operating system
    * Debian
    * Ubuntu
    * Fedora
    * Red Hat
    * FreeBSD
    * SmartOS
    * OmniOS
    * Solaris
  59. // EC2 environment
    * EC2-Classic
    * EC2-Virtual Private Cloud (VPC)

  64. // Virtual Private Cloud (VPC)
    * Use private IPs and hostnames for Riak node names
    * Nodes on a private subnet require a NAT instance
    * Elastic Load Balancers (ELB) must be on a public subnet in order to accept traffic from outside the VPC
    * Create network ACLs and security groups within the VPC
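The first point, naming nodes after their private addresses, could be checked with a small helper like the one below before writing the `-name` line in a node's `vm.args`. It is a sketch only: the `riak` prefix and the helper names are assumptions, and the address is an example.

```python
import ipaddress


def riak_node_name(private_ip, prefix="riak"):
    """Build a node name such as riak@10.0.1.15, rejecting public addresses."""
    address = ipaddress.ip_address(private_ip)
    if not address.is_private:
        raise ValueError(f"{private_ip} is not a private address")
    return f"{prefix}@{address}"


def vm_args_line(private_ip):
    """The -name line as it would appear in vm.args."""
    return f"-name {riak_node_name(private_ip)}"


if __name__ == "__main__":
    print(vm_args_line("10.0.1.15"))
```

Using the private address keeps inter-node traffic inside the VPC and keeps the node name stable if a public address is reassigned.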
  68. // Security groups
    * Firewall that lives above instances
    * Allow all traffic between nodes in the cluster
    * Poke holes for ports 8098 (HTTP) and 8087 (PBC) to networks that contain Riak clients
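These rules can be written down as plain data before handing them to whatever tool creates the security group. The sketch below emits dictionaries in the shape of the EC2 API's `IpPermissions` structure (as accepted, for example, by boto3's `authorize_security_group_ingress`); the group ID and client CIDR are placeholders, and the function name is hypothetical.

```python
import ipaddress

RIAK_HTTP_PORT = 8098  # HTTP interface, per the slide
RIAK_PBC_PORT = 8087   # Protocol Buffers interface, per the slide


def riak_ingress_rules(cluster_group_id, client_cidrs):
    """Ingress rules: open intra-cluster traffic plus client access."""
    for cidr in client_cidrs:
        ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR

    # Rule 1: all protocols and ports between members of the cluster's group.
    rules = [{
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": cluster_group_id}],
    }]
    # Rules 2 and 3: the two client-facing ports, only from client networks.
    for port in (RIAK_HTTP_PORT, RIAK_PBC_PORT):
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr} for cidr in client_cidrs],
        })
    return rules


if __name__ == "__main__":
    # Placeholder group ID and client subnet.
    for rule in riak_ingress_rules("sg-00000000", ["10.0.2.0/24"]):
        print(rule)
```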
  72. // Instance type
    * m1.large and up
    * Cluster compute instances provide 10 GigE between nodes
    * 10 GigE and SSDs: hi1.4xlarge
  76. // Storage type
    * Ephemeral or instance store
    * Elastic Block Store (EBS) with provisioned IOPS (PIOPS) and EBS-optimized instances
    * SSD options are currently all ephemeral
  79. // Load balancing
    * Elastic Load Balancer (ELB)
    * Software: HAProxy, nginx, Apache
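For the software option, a backend stanza for HAProxy could be generated from the list of node addresses, roughly as sketched below. The stanza uses Riak's HTTP `/ping` resource as the health check; the backend name, server names, and addresses are made up, and a real configuration would also need `global`, `defaults`, and `frontend` sections.

```python
def haproxy_backend(node_ips, port=8098, name="riak_http"):
    """Render an HAProxy backend that round-robins across Riak's HTTP port."""
    lines = [
        f"backend {name}",
        "    mode http",
        "    balance roundrobin",
        "    option httpchk GET /ping",  # health check against each node
    ]
    for number, ip in enumerate(node_ips, start=1):
        lines.append(f"    server riak{number} {ip}:{port} check")
    return "\n".join(lines)


if __name__ == "__main__":
    print(haproxy_backend(["10.0.1.11", "10.0.1.12", "10.0.1.13"]))
```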
  85. // Protips
    * Use a configuration management framework
    * Evaluate CloudFormation
    * Use our AMIs in the AWS Marketplace
    * Monitor with CloudWatch, but also make use of the statistics Riak emits
    * Assess whether you need to span Availability Zones
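As one way to use the statistics Riak emits, the sketch below pulls the tail-latency figures out of the JSON document served by a node's HTTP `/stats` resource. It assumes the `node_get_fsm_time_*` and `node_put_fsm_time_*` statistics, which are reported in microseconds; the sample payload is invented for illustration and is not real output.

```python
import json

# Percentile statistics for GET and PUT request latency.
TAIL_STATS = (
    "node_get_fsm_time_95",
    "node_get_fsm_time_99",
    "node_put_fsm_time_95",
    "node_put_fsm_time_99",
)


def tail_latencies_ms(stats_json):
    """Extract tail latencies from a /stats payload, converted to milliseconds."""
    stats = json.loads(stats_json)
    return {name: stats[name] / 1000.0 for name in TAIL_STATS if name in stats}


if __name__ == "__main__":
    # Invented sample; a real payload contains many more fields.
    sample = json.dumps({
        "node_get_fsm_time_95": 2500,
        "node_put_fsm_time_99": 12000,
        "vnode_gets": 7,
    })
    print(tail_latencies_ms(sample))
```

The resulting numbers could be pushed to CloudWatch as custom metrics or to any other monitoring system, which ties back to the deck's emphasis on SLAs and tail latency.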
  86. // Tuning

  87. // Thank you
    * E-mail: hector@basho.com
    * Twitter: @hectcastro