
Deploying Riak on Amazon Web Services

A brief overview of Riak [0], followed by details on how to deploy Riak on Amazon Web Services (AWS).

Toward the end of the deck, there is a section on tuning. The gist [1] below contains a Bash script to tune Linux.

[0] http://basho.com/riak/
[1] https://gist.github.com/hectcastro/5651350

Hector Castro

June 01, 2013

Transcript

  1. // Who are we?
     * Founded in 2008 by ex-Akamai, Mitre, Apple
     * 120+ employees; > 50% eng; distributed company
     * Sponsors of Riak, the Apache 2.0-licensed project
     * Basho sells Riak add-ons: Riak EDS and Riak CS
  4. "Distributed, masterless, highly-available key/value store."
     * Deployed as a cluster of nodes (>= 5), incrementally scalable
     * Any node can coordinate requests, data replicated to 3 nodes by default, no SPOF
     * Eventually consistent with automatic failover
     * Key/value model with additional query methods
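The "replicated to 3 nodes by default" point can be illustrated with a small sketch of ring placement. This is a simplified model, not Riak's actual code: it assumes a ring of 64 partitions, hashes a plain `bucket/key` string with SHA-1, and picks three consecutive partitions. The function name `preference_list` and the constants are illustrative only. In a cluster of five or more nodes, adjacent partitions are normally owned by different physical nodes, which is what turns three partitions into three machines.

```python
import hashlib

RING_SIZE = 64  # assumed number of partitions in the ring
N_VAL = 3       # replicas per object, the default cited on the slide


def preference_list(bucket, key, ring_size=RING_SIZE, n_val=N_VAL):
    """Return the partition indexes that would hold replicas of bucket/key.

    Simplified: real Riak hashes an encoded {bucket, key} term and maps
    partitions to nodes through its ring claim algorithm.
    """
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")  # a point on the 2**160 ring
    partition_span = 2**160 // ring_size      # keyspace covered per partition
    first = position // partition_span
    # Replicas go to n_val consecutive partitions, wrapping around the ring.
    return [(first + i) % ring_size for i in range(n_val)]
```

Any node can compute the same list for a given bucket and key, which is why no master is needed to route a request.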
  8. // When to use Riak?
     * Data is critical and always needs to be available
     * Database must always accept writes
     * Scale horizontally
     * Focus on SLAs, tail latency
  10. // Riak Object
      * Key/value + metadata
      * Fundamental unit of replication
      * Soft limit of ~4MB on object size [0]
      [0] Hardware dependent
  11. // Buckets
      * Virtual namespace
      * Bucket and key produce object's address
      * No relationships between buckets
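To make "bucket and key produce object's address" concrete, here is a minimal sketch that builds the HTTP URL for an object. It assumes Riak's `/buckets/<bucket>/keys/<key>` HTTP path and the HTTP port 8098 mentioned later in the deck. The helper name `object_url` and the host are hypothetical.

```python
from urllib.parse import quote


def object_url(host, bucket, key, port=8098):
    """Build the HTTP address of a Riak object from its bucket and key."""
    # Percent-encode both parts so a "/" inside a key cannot change the path.
    return (
        f"http://{host}:{port}"
        f"/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    )
```

For example, `object_url("10.0.1.5", "users", "alice")` yields `http://10.0.1.5:8098/buckets/users/keys/alice`.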
  14. // MapReduce
      * Distributed processing system using Riak Pipe [0]
      * Efficient for targeted queries over known key range
      * Write jobs in Erlang or JavaScript
      [0] https://github.com/basho/riak_pipe
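A targeted MapReduce job over known keys might be described as in the sketch below. It assumes the JSON job format accepted by Riak's HTTP `/mapred` endpoint and the built-in JavaScript function `Riak.mapValuesJson`. The helper name `build_mapreduce_job` is hypothetical, and the sketch only builds the job body without sending it.

```python
import json


def build_mapreduce_job(bucket, keys):
    """Describe a map-only job over an explicit list of keys in one bucket."""
    return {
        # Listing bucket/key pairs keeps the job targeted instead of
        # scanning a whole bucket.
        "inputs": [[bucket, key] for key in keys],
        "query": [
            {
                "map": {
                    "language": "javascript",
                    "name": "Riak.mapValuesJson",  # built-in: parse values as JSON
                    "keep": True,
                }
            }
        ],
    }


body = json.dumps(build_mapreduce_job("logs", ["a", "b"]))
```

The resulting `body` would be sent in a POST request with a `Content-Type: application/json` header.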
  16. // Secondary indexes (2i)
      * Riak Objects tagged with custom metadata
      * Exact match and range queries
      * Pagination support coming in 1.4
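The two 2i query styles can be sketched as URL builders. This assumes Riak's HTTP index path `/buckets/<bucket>/index/<index>/<value>` for exact matches and `/buckets/<bucket>/index/<index>/<start>/<end>` for ranges. The helper name `index_query_url` and the index names `status_bin` and `age_int` are hypothetical.

```python
from urllib.parse import quote


def index_query_url(host, bucket, index, start, end=None, port=8098):
    """Build a 2i query URL: exact match if end is None, otherwise a range."""
    parts = ["buckets", bucket, "index", index, str(start)]
    if end is not None:
        parts.append(str(end))
    path = "/".join(quote(part, safe="") for part in parts)
    return f"http://{host}:{port}/{path}"


exact = index_query_url("10.0.1.5", "users", "status_bin", "active")
in_range = index_query_url("10.0.1.5", "users", "age_int", 21, 30)
```

Both queries return matching keys, so fetching the objects themselves takes a second step.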
  20. // Riak Search (soon to be Yokozuna)
      * Store and index JSON, XML, TXT documents
      * Supports subset of Solr API
      * Yokozuna will support Distributed Solr API [0]
      [0] http://wiki.apache.org/solr/DistributedSearch
  22. // Future work
      * Dynamic ring sizing
      * Secondary index (2i) improvements
      * Convergent replicated data types (CRDTs)
      * Yokozuna
  25. // Why is Riak an attractive option for AWS?
      * Masterless design allows data to remain available during instance failure
      * Spin up another instance to add more cluster resources
      * Replicate clusters between Availability Zones (AZs) or Regions
  26. // Operating system
      * Debian
      * Ubuntu
      * Fedora
      * Red Hat
      * FreeBSD
      * SmartOS
      * OmniOS
      * Solaris
  29. // Virtual Private Cloud (VPC)
      * Use private IPs and hostnames for Riak node names
      * Nodes on a private subnet require a NAT instance
      * Elastic Load Balancers (ELB) must be on a public subnet in order to accept traffic from outside the VPC
      * Create network ACLs and security groups within the VPC
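The advice to use private IPs for node names could be enforced with a check like the one below. It is a sketch that assumes the common `riak@<address>` node name form used with the `-name` setting in `vm.args`. The helper name `riak_node_name` is hypothetical.

```python
import ipaddress


def riak_node_name(private_ip, prefix="riak"):
    """Return a node name such as riak@10.0.1.5, rejecting public addresses."""
    address = ipaddress.ip_address(private_ip)
    if not address.is_private:
        # A public address here would tie the node name to an IP that
        # is not the instance's stable address inside the VPC.
        raise ValueError(f"{private_ip} is not a private address")
    return f"{prefix}@{address}"
```

Calling `riak_node_name("10.0.1.5")` returns `riak@10.0.1.5`, while a public address raises `ValueError`.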
  31. // Security groups
      * Firewall that lives above instances
      * Allow all traffic between nodes in the cluster
      * Poke holes for ports 8098 (HTTP) and 8087 (PBC) to networks that contain Riak clients
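The security group rules described above can be written down as plain data. The sketch below only builds rule dictionaries and makes no AWS calls. Their shape is modeled on the `IpPermissions` structure of the EC2 API, and the helper name `riak_ingress_rules`, the group ID, and the client CIDR are hypothetical.

```python
RIAK_CLIENT_PORTS = {"HTTP": 8098, "PBC": 8087}  # ports named on the slide


def riak_ingress_rules(group_id, client_cidr):
    """Build ingress rules: open traffic within the group, two client ports."""
    rules = [
        # Self-referencing rule: all protocols between members of the group,
        # which covers node-to-node traffic inside the cluster.
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": group_id}]},
    ]
    for port in RIAK_CLIENT_PORTS.values():
        rules.append(
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [{"CidrIp": client_cidr}],  # network with Riak clients
            }
        )
    return rules


rules = riak_ingress_rules("sg-12345678", "10.0.2.0/24")
```

Keeping the rules as data makes them easy to review before handing them to whichever provisioning tool is in use.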
  33. // Instance type
      * m1.large and up
      * Cluster compute instances provide 10 GigE between nodes
      * 10 GigE and SSDs: hi1.4xlarge
  35. // Storage type
      * Ephemeral or instance store
      * Elastic Block Store (EBS) with provisioned IOPS (PIOPS) and EBS-optimized instances
      * SSD options are currently all ephemeral
  38. // Protips
      * Use a configuration management framework
      * Evaluate CloudFormation
      * Use our AMIs in the AWS Marketplace
      * Monitor with CloudWatch, but also make use of the statistics Riak emits
      * Assess whether you need to span Availability Zones
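As one way to use the statistics Riak emits, the sketch below pulls tail-latency figures out of a stats dictionary such as the JSON returned by the HTTP `/stats` endpoint. It assumes stat names of the form `node_get_fsm_time_95` reported in microseconds. The helper name `tail_latencies_ms` and the sample values are made up for illustration.

```python
def tail_latencies_ms(stats, percentiles=("95", "99", "100")):
    """Extract GET/PUT tail latencies from Riak stats, converted to ms."""
    result = {}
    for operation in ("get", "put"):
        for percentile in percentiles:
            name = f"node_{operation}_fsm_time_{percentile}"
            if name in stats:
                result[name] = stats[name] / 1000.0  # microseconds to ms
    return result


# Illustrative sample only; real values come from a running node.
sample = {"node_get_fsm_time_95": 2500, "node_put_fsm_time_99": 12000, "vnode_gets": 10}
latencies = tail_latencies_ms(sample)
```

Figures like these could then be pushed to CloudWatch as custom metrics next to the instance-level metrics it already collects.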