Deploying Riak on Amazon Web Services

A brief overview of Riak [0], followed by details on how to deploy Riak on Amazon Web Services (AWS).

Toward the end of the deck, there is a section on tuning. The gist [1] below contains a Bash script to tune Linux.

[0] http://basho.com/riak/
[1] https://gist.github.com/hectcastro/5651350

Hector Castro

June 01, 2013

Transcript

  1. 1.
  2. 2.

    // Who are we?

    * Founded in 2008 by ex-Akamai, Mitre, Apple
    * 120+ employees; > 50% eng; distributed company
    * Sponsors of Riak, the Apache 2.0-licensed project
    * Basho sells Riak add-ons: Riak EDS and Riak CS
  3. 4.
  6. 9.

    "Distributed, masterless, highly-available key/value store."

    * Deployed as cluster of nodes (>= 5), incrementally scalable
    * Any node can coordinate requests, data replicated to 3 nodes by default, no SPOF
    * Eventually consistent with automatic failover
    * Key/value model with additional query methods
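
The replication bullet above can be illustrated with a small sketch. It is a simplified model rather than Riak's actual implementation: it assumes a SHA-1 hash of bucket and key, a ring of 64 equal partitions, round-robin assignment of partitions to nodes, and replicas on three consecutive partitions. The function names and node names are hypothetical.

```python
import hashlib

RING_SIZE = 64        # assumed number of partitions on the ring
N_VAL = 3             # replicas per object, per the slide
KEYSPACE = 2 ** 160   # size of the SHA-1 output space


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Map a bucket/key pair to a partition index on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (KEYSPACE // ring_size)


def preference_list(bucket, key, nodes, n_val=N_VAL, ring_size=RING_SIZE):
    """Return the nodes holding the replicas for bucket/key.

    Assumption: partition p is owned by nodes[p % len(nodes)].
    """
    first = partition_for(bucket, key, ring_size)
    partitions = [(first + i) % ring_size for i in range(n_val)]
    return [nodes[p % len(nodes)] for p in partitions]


if __name__ == "__main__":
    cluster = [f"riak@10.0.0.{i}" for i in range(1, 6)]  # five hypothetical nodes
    print(preference_list("users", "alice", cluster))
```

Under these assumptions, a five-node cluster places the three replicas on three distinct nodes, and any of those nodes can serve the object if another fails.
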
  7. 10.
  11. 15.

    // When to use Riak?

    * Data is critical and always needs to be available
    * Database must always accept writes
    * Scale horizontally
    * Focus on SLAs, tail latency
  13. 21.

    // Riak Object

    * Key/value + metadata
    * Fundamental unit of replication
    * Soft limit of ~4MB on object size [0]

    [0] Hardware dependent
  14. 25.

    // Buckets

    * Virtual namespace
    * Bucket and key produce object's address
    * No relationships between buckets
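
As a sketch of how bucket and key combine into an address, the helper below builds the URL of an object on Riak's HTTP interface. It assumes the `/buckets/<bucket>/keys/<key>` path layout and the default HTTP port 8098; the function name and host are hypothetical.

```python
from urllib.parse import quote


def object_url(host, bucket, key, port=8098):
    """Build the HTTP address of a Riak object from its bucket and key."""
    # Percent-encode both parts so that "/" inside a key stays part of the key.
    bucket_part = quote(bucket, safe="")
    key_part = quote(key, safe="")
    return f"http://{host}:{port}/buckets/{bucket_part}/keys/{key_part}"


if __name__ == "__main__":
    print(object_url("10.0.1.5", "users", "alice"))
```
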
  17. 34.

    // MapReduce

    * Distributed processing system using Riak Pipe [0]
    * Efficient for targeted queries over known key range
    * Write jobs in Erlang or JavaScript

    [0] https://github.com/basho/riak_pipe
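
A targeted job over known keys might be described as in the sketch below. It assumes the JSON job format accepted by Riak's HTTP `/mapred` endpoint and the built-in JavaScript functions `Riak.mapValuesJson` and `Riak.reduceSum`; it also assumes each object stores a JSON number. The bucket, keys, and function name are hypothetical.

```python
import json


def mapreduce_job(bucket, keys):
    """Describe a MapReduce job that sums the values of the given keys."""
    return {
        # Listing explicit bucket/key pairs keeps the job targeted
        # instead of scanning a whole bucket.
        "inputs": [[bucket, key] for key in keys],
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}},
            {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
        ],
    }


if __name__ == "__main__":
    # The printed JSON would be sent as the body of a POST to /mapred.
    print(json.dumps(mapreduce_job("order_totals", ["1001", "1002"]), indent=2))
```
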
  19. 38.

    // Secondary indexes (2i)

    * Riak Objects tagged with custom metadata
    * Exact match and range queries
    * Pagination support coming in 1.4
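
The two query styles can be sketched as URL builders. This assumes the HTTP form `/buckets/<bucket>/index/<index>/<value>` for exact matches and `/buckets/<bucket>/index/<index>/<start>/<end>` for ranges, with index names ending in `_bin` or `_int`; the helper name, bucket, and index names are hypothetical.

```python
from urllib.parse import quote


def index_query_url(host, bucket, index, start, end=None, port=8098):
    """Build a 2i query URL: exact match if end is None, otherwise a range."""
    parts = [bucket, "index", index, str(start)]
    if end is not None:
        parts.append(str(end))
    path = "/".join(quote(part, safe="") for part in parts)
    return f"http://{host}:{port}/buckets/{path}"


if __name__ == "__main__":
    # Exact match on a binary index, then a range over an integer index.
    print(index_query_url("10.0.1.5", "users", "email_bin", "a@example.com"))
    print(index_query_url("10.0.1.5", "users", "age_int", 20, 30))
```
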
  23. 43.

    // Riak Search (soon to be Yokozuna)

    * Store and index JSON, XML, TXT documents
    * Supports subset of Solr API
    * Yokozuna will support Distributed Solr API [0]

    [0] http://wiki.apache.org/solr/DistributedSearch
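
To give a feel for the Solr-style interface, the sketch below builds a query URL. It assumes Riak Search exposes a `/solr/<index>/select` endpoint on the HTTP port and accepts the `q` and `rows` parameters; the helper name, index name, and query are hypothetical.

```python
from urllib.parse import quote, urlencode


def search_url(host, index, query, rows=10, port=8098):
    """Build a Solr-style select URL for a Riak Search index."""
    params = urlencode({"q": query, "rows": rows})
    return f"http://{host}:{port}/solr/{quote(index, safe='')}/select?{params}"


if __name__ == "__main__":
    print(search_url("10.0.1.5", "books", "title:riak"))
```
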
  25. 48.

    // Future work

    * Dynamic ring sizing
    * Secondary index (2i) improvements
    * Convergent replicated data types (CRDTs)
    * Yokozuna
  26. 49.
  29. 53.

    // Why is Riak an attractive option for AWS?

    * Masterless design allows data to remain available during instance failure
    * Spin up another instance to add more cluster resources
    * Replicate clusters between Availability Zones (AZs) or Regions
  30. 56.

    // Operating system

    * Debian
    * Ubuntu
    * Fedora
    * Red Hat
    * FreeBSD
    * SmartOS
    * OmniOS
    * Solaris
  31. 61.
  34. 64.

    // Virtual Private Cloud (VPC)

    * Use private IPs and hostnames for Riak node names
    * Nodes on a private subnet require a NAT instance
    * Elastic Load Balancers (ELB) must be on a public subnet in order to accept traffic from outside the VPC
    * Create network ACLs and security groups within the VPC
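
The node-naming advice can be sketched as a small helper that refuses non-private addresses. It assumes node names of the form `riak@<ip>`, as set by the `-name` flag in `vm.args`; the function name and addresses are hypothetical.

```python
import ipaddress


def riak_node_name(private_ip, prefix="riak"):
    """Return a Riak node name for an instance's private IP address."""
    addr = ipaddress.ip_address(private_ip)
    if not addr.is_private:
        # A public address here would tie the node name to an IP
        # that may not be reachable from inside the VPC.
        raise ValueError(f"{private_ip} is not a private address")
    return f"{prefix}@{addr}"


if __name__ == "__main__":
    print(riak_node_name("10.0.1.5"))
```
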
  36. 68.

    // Security groups

    * Firewall that lives above instances
    * Allow all traffic between nodes in the cluster
    * Poke holes for ports 8098 (HTTP) and 8087 (PBC) to networks that contain Riak clients
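
The two rules above might be expressed as data, as in this sketch. The dictionaries are shaped like the `IpPermissions` parameter of EC2's AuthorizeSecurityGroupIngress call (for example boto3's `authorize_security_group_ingress`); no AWS call is made here, and the group ID, CIDR block, and function name are hypothetical.

```python
import ipaddress

RIAK_HTTP_PORT = 8098  # HTTP interface, per the slide
RIAK_PBC_PORT = 8087   # Protocol Buffers interface, per the slide


def riak_ingress_rules(cluster_group_id, client_cidr):
    """Build ingress rules for a Riak cluster security group."""
    # Normalizes the CIDR and raises ValueError if it is malformed.
    network = str(ipaddress.ip_network(client_cidr))
    rules = [
        # All traffic between members of the cluster's own security group.
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": cluster_group_id}]},
    ]
    for port in (RIAK_HTTP_PORT, RIAK_PBC_PORT):
        # One narrow hole per client-facing port, limited to the client network.
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": network}],
        })
    return rules


if __name__ == "__main__":
    for rule in riak_ingress_rules("sg-00000000", "10.0.2.0/24"):
        print(rule)
```
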
  38. 72.

    // Instance type

    * m1.large and up
    * Cluster compute instances provide 10 GigE between nodes
    * 10 GigE and SSDs: hi1.4xlarge
  40. 76.

    // Storage type

    * Ephemeral or instance store
    * Elastic Block Store (EBS) with provisioned IOPS (PIOPS) and EBS-optimized instances
    * SSD options are currently all ephemeral
  43. 85.

    // Protips

    * Use a configuration management framework
    * Evaluate CloudFormation
    * Use our AMIs in the AWS Marketplace
    * Monitor with CloudWatch, but also make use of the statistics Riak emits
    * Assess whether you need to span Availability Zones
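
As one way to use the statistics Riak emits, the sketch below pulls tail-latency figures out of a stats document. It assumes the JSON returned by Riak's HTTP `/stats` endpoint, with `node_get_fsm_time_*` and `node_put_fsm_time_*` values reported in microseconds; the function name is hypothetical and the sample values are made up for illustration.

```python
import json


def tail_latencies_ms(stats_json):
    """Extract GET/PUT tail latencies from a stats document, in milliseconds."""
    stats = json.loads(stats_json)
    result = {}
    for op in ("get", "put"):
        for percentile in ("95", "99", "100"):
            name = f"node_{op}_fsm_time_{percentile}"
            if name in stats:
                result[name] = stats[name] / 1000.0  # microseconds to ms
    return result


if __name__ == "__main__":
    # Illustrative sample only, not real measurements.
    sample = '{"node_get_fsm_time_95": 1500, "node_put_fsm_time_99": 12000}'
    print(tail_latencies_ms(sample))
```

Figures like these could be forwarded to CloudWatch as custom metrics alongside the instance-level metrics it already collects.
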
  44. 86.