DevOps @ Basho

Slides from my talk at the NYC DevOps meetup on Aug 23, discussing Riak and the nature of the DevOps team at Basho.

Tom Santero

August 23, 2012

Transcript

  1. $

  2. $ whoami • Name: Tom Santero • Title: Technical Evangelist

    • Company: Basho Technologies • Twitter: @tsantero • $ ./presentation
  3. Tonight’s Agenda • Overview of Riak • Nature of DevOps @ Basho

    • What Basho has learned from having a DevOps team
  4. need to know: • Written in Erlang/OTP • distributed, key/value

    store + extras • advanced query features • pre/post commit hooks • pluggable backend storage engines • open source (Apache v2.0)
  5. key

  6. Consistent Hashing (slides 6–9) • 160-bit integer keyspace • divided into a fixed

    number of evenly sized partitions • partitions are claimed by nodes in the cluster • replicas go to the N partitions following the key • [diagram: ring of 32 partitions marked 0, 2^160/4 and 2^160/2, claimed by node 0 through node 3; hash(“meetups/nycdevops”) with N=3]
  10. Disaster Scenario (slides 10–13) • node fails • requests go to fallback

    • node comes back • “Handoff”: data returns to recovered node • normal operations resume • [diagram: ring with the failed node’s eight partitions marked X; hash(“meetups/nycdevops”)]
  14. DevOps Team • Brief History: • unofficially created in Oct

    2011 • officially recognized in June 2012 • pilot program (to be discussed)
  15. DevOps - Internal • mix of programming and utilities •

    automate all the things! • instrumental in improving Riak Test Suite • 1,000s of separate tests • various configs / platforms
  16. guttersnipe • middleware in Vagrant • allows us to: •

    deploy + run entire test suite • no need to rebuild entire clusters • 500% increase in efficiency
  17. Chef Cookbooks • open source: • $ git clone [email protected]:basho/riak-chef-cookbook.git

    • over 30 commits in 2012 • deploy riak • future: • simplify use • rolling upgrades, autoconfig, iptables
  18. Erlang Template Helper • also open source: • $ git

    clone [email protected]:basho/erlang_template_helper.git • written by Dan Reverri • specify Erlang config and args files in JSON
  19. ETH Example

    $ irb
    >> require 'erlang_template_helper'
    => true
    >> args = Eth::Args.new({"-name" => "[email protected]", "-env" => {"ERL_MAX_PORTS" => 4096}})
    => -name [email protected] -env ERL_MAX_PORTS 4096
    >> puts args.pp
    -name [email protected] -env ERL_MAX_PORTS 4096
    => nil
  20. ETH Example

    $ .bin/config_to_json multi_backend.config -p
    {
      "riak_kv": {
        "storage_backend": "riak_kv_multi_backend",
        "multi_backend_default": "first_backend",
        "multi_backend": [
          ["__tuple", "first_backend", "riak_kv_bitcask_backend", {"data_root": "__string_/var/lib/riak/bitcask"}],
          ["__tuple", "second_backend", "riak_kv_leveldb_backend", {"data_root": "__string_/var/lib/riak/leveldb"}]
        ]
      }
    }
  21. Datacenter Specs • 300 CPU cores • 500 TB storage

    • ~1 TB RAM • gigabit internet connection
  22. DevOps - Product • Basho Engineers all have test boxes

    • Now DevOps delegates DC resources • Examples of Product Enhancements:
  23. Testing repl at Scale • full-sync repl • bottleneck in

    how keys were being read • lexicographical vs disk-ordered • used the Boston datacenter to prototype a fix • currently in review to be merged into Riak EE
  24. DevOps - CliServ • replicate customer issues w/ datacenter •

    secondary datacenter: • potential customers ship us hardware • spec out + provide best implementations
  25. Realtime repl Throughput • provisioned 5 machines • constrained bandwidth

    and increased latency between clusters • but not among nodes in the same cluster (used the ‘management’ network to approximate a WAN) • discovered + eliminated several bottlenecks • customer achieved 10x jump in performance
  26. “In both cases I was unable to replicate these issues

    on the hardware available to me locally, both through limited machines (2) and major performance differences between nodes (first gen mbp vs i7 dell tower)” --Andrew Thompson, Basho Engineer
  27. RiakCS • released March 27, 2012 • shipped RiakCS v1.1

    on Tuesday • S3-compatible cloud storage • built on Riak • multi-tenancy, multi billing, etc...
  28. Riak CS Large Object (slides 28–32)

    [diagram: Riak CS instances, each exposing an S3 API and a Reporting API, layered over five Riak nodes; slides 30–32 add a large object split into four blocks labeled “1mb” across the Riak nodes]
  33. shameless plugs: • Basho is Hiring DevOps Engineers: • work

    remotely • hack on cool shit • help make Basho and Riak better • send CV to Sean Carey [email protected]
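The consistent-hashing slides (6–9) describe how Riak places data: a 160-bit keyspace split into a fixed number of equal partitions, partitions claimed by nodes, and replicas on the N partitions following the key. The Ruby sketch below models that under stated assumptions: SHA-1 of the string "bucket/key" as the 160-bit hash, 32 partitions and four nodes as in the diagram, and a simple round-robin claim. The names `key_hash`, `owner` and `preference_list` are hypothetical; this is an illustration, not Riak's own (Erlang) implementation.

```ruby
require 'digest/sha1'

# Assumed parameters, mirroring the slide diagram: 32 partitions, 4 nodes, N=3.
RING_SIZE      = 2**160                      # SHA-1 digests are 160-bit integers
NUM_PARTITIONS = 32
PARTITION_SIZE = RING_SIZE / NUM_PARTITIONS  # evenly sized partitions
NODES          = %w[node0 node1 node2 node3]

# Position of a bucket/key pair on the 160-bit ring.
# Assumption: the hashed input is the string "bucket/key".
def key_hash(bucket, key)
  Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
end

# Hypothetical claim strategy: nodes take partitions round-robin.
def owner(partition)
  NODES[partition % NODES.size]
end

# Replicas go to the N partitions that follow the key's position on the ring.
def preference_list(bucket, key, n = 3)
  first = (key_hash(bucket, key) / PARTITION_SIZE + 1) % NUM_PARTITIONS
  (0...n).map do |i|
    partition = (first + i) % NUM_PARTITIONS
    [partition, owner(partition)]
  end
end
```

With four nodes claiming partitions round-robin, any three consecutive partitions belong to three different nodes, so `preference_list("meetups", "nycdevops")` places the N=3 replicas on separate machines in this toy ring.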
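Slides 10–13 walk through a node failure: requests go to a fallback, and when the node returns, handoff moves the data back. The in-memory sketch below models that flow. Every detail is an assumption for illustration: partitions are claimed round-robin, a fallback is the next live node further along the ring that does not already hold a replica, and each fallback copy carries a "home node" hint so it knows where to hand the data back. The caller passes the key's first partition directly to keep it short. `SketchRing` and its methods are hypothetical names, not Riak APIs.

```ruby
# Toy model of fallback writes and hinted handoff (hypothetical names).
class SketchRing
  attr_reader :store

  def initialize(nodes, num_partitions)
    @nodes = nodes
    @num_partitions = num_partitions
    @down = []
    # store[node] maps key => [value, home_node]; home_node is the handoff hint.
    @store = Hash.new { |hash, node| hash[node] = {} }
  end

  # Assumed round-robin claim of partitions by nodes.
  def owner(partition)
    @nodes[partition % @nodes.size]
  end

  def fail!(node)
    @down << node unless @down.include?(node)
  end

  # Returns [holder, home] pairs. A live primary holds its own replica; a down
  # primary is covered by the next live node further along the ring that is
  # neither a primary nor already holding a replica of this key.
  def targets(first_partition, n)
    walk = (0...@num_partitions).map { |i| owner((first_partition + i) % @num_partitions) }
    primaries = walk.first(n)
    chosen = []
    primaries.each do |home|
      if @down.include?(home)
        fallback = walk.drop(n).find { |cand|
          !@down.include?(cand) && !primaries.include?(cand) &&
            chosen.none? { |holder, _| holder == cand }
        }
        chosen << [fallback, home] if fallback
      else
        chosen << [home, home]
      end
    end
    chosen
  end

  def put(first_partition, n, key, value)
    targets(first_partition, n).each { |holder, home| @store[holder][key] = [value, home] }
  end

  # Node comes back: fallbacks hand their hinted replicas to the home node.
  def recover!(node)
    @down.delete(node)
    home_store = @store[node] # create the entry before iterating @store
    @store.each do |holder, data|
      next if holder == node
      data.select { |_key, (_value, home)| home == node }.each do |key, (value, _home)|
        home_store[key] = [value, node]
        data.delete(key) # the fallback drops its copy after handoff
      end
    end
  end
end
```

For example, with `SketchRing.new(%w[node0 node1 node2 node3], 8)` and node1 failed, `targets(0, 3)` returns `[["node0", "node0"], ["node3", "node1"], ["node2", "node2"]]`: node3 stands in for node1, and `recover!("node1")` later moves that copy back to node1.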
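Slide 19 shows `Eth::Args` turning a Ruby hash into Erlang VM arguments, with the nested hash under `-env` expanding to `-env NAME VALUE`. A hypothetical stand-in for that mapping, not erlang_template_helper's own code, might look like this. It assumes one flag per line in the style of a `vm.args` file, because the transcript does not preserve line breaks.

```ruby
# Hypothetical stand-in for the mapping on slide 19; not erlang_template_helper's code.
# A plain value yields "flag value"; a nested hash (as under -env) yields one
# "flag name value" line per entry. One flag per line, vm.args style.
def format_vm_args(args)
  args.flat_map do |flag, value|
    if value.is_a?(Hash)
      value.map { |name, val| "#{flag} #{name} #{val}" }
    else
      ["#{flag} #{value}"]
    end
  end.join("\n")
end
```

With a made-up node name, `format_vm_args({"-name" => "riak@127.0.0.1", "-env" => {"ERL_MAX_PORTS" => 4096}})` returns `"-name riak@127.0.0.1\n-env ERL_MAX_PORTS 4096"`.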
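Slide 20 shows `config_to_json` encoding an Erlang `app.config` as JSON. In that encoding, `"__tuple"` marks a tuple and a `"__string_"` prefix marks an Erlang string. A renderer for the reverse direction could look like the sketch below. Its rules are inferred from that single example and are assumptions: JSON objects become proplists of `{key, Value}` tuples, arrays headed by `"__tuple"` become tuples, `"__string_"` values become double-quoted strings, and any other string becomes an atom. `to_erlang` is a hypothetical name, not part of erlang_template_helper.

```ruby
require 'json'

# Hypothetical JSON -> Erlang term renderer; rules inferred from slide 20.
def to_erlang(term)
  case term
  when Hash
    # JSON object -> proplist of {key, Value} tuples (keys assumed to be plain atoms)
    "[" + term.map { |k, v| "{#{k}, #{to_erlang(v)}}" }.join(", ") + "]"
  when Array
    tuple = term.first == "__tuple"
    body = (tuple ? term.drop(1) : term).map { |t| to_erlang(t) }.join(", ")
    tuple ? "{#{body}}" : "[#{body}]"
  when String
    if term.start_with?("__string_")
      # Erlang string literal: escape backslashes and double quotes
      '"' + term.delete_prefix("__string_").gsub(/["\\]/) { |c| "\\" + c } + '"'
    else
      term # bare string -> atom (atoms needing single quotes are not handled)
    end
  else
    term.to_s # integers, floats and true/false print the same in Erlang
  end
end

example = JSON.parse('{"riak_kv": {"storage_backend": "riak_kv_multi_backend"}}')
puts to_erlang(example) + "." # prints [{riak_kv, [{storage_backend, riak_kv_multi_backend}]}].
```

Applied to the slide's JSON, the bitcask entry renders as `{first_backend, riak_kv_bitcask_backend, [{data_root, "/var/lib/riak/bitcask"}]}`.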
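Slides 28–32 show Riak CS storing a large object as 1 MB blocks spread across Riak nodes. A minimal sketch of the chunking step follows. It assumes a manifest that records a per-upload UUID and a block count, with each block keyed by `[uuid, sequence_number]`. The manifest fields and the function names `chunk_object` and `reassemble` are hypothetical and do not describe Riak CS's actual storage format.

```ruby
require 'securerandom'

BLOCK_SIZE = 1024 * 1024 # 1 MB blocks, per the "1mb" labels on the slides

# Split an object into fixed-size blocks keyed by [uuid, seq], plus a manifest
# describing how to put them back together (hypothetical structure).
def chunk_object(bucket, key, data)
  uuid = SecureRandom.uuid
  blocks = {}
  seq = 0
  (0...data.bytesize).step(BLOCK_SIZE) do |offset|
    blocks[[uuid, seq]] = data.byteslice(offset, BLOCK_SIZE)
    seq += 1
  end
  manifest = { bucket: bucket, key: key, uuid: uuid,
               content_length: data.bytesize, block_count: seq }
  [manifest, blocks]
end

# Read the blocks back in sequence order and concatenate them.
def reassemble(manifest, blocks)
  (0...manifest[:block_count]).map { |seq| blocks.fetch([manifest[:uuid], seq]) }.join
end
```

If each block were stored as an ordinary Riak object under its own key, consistent hashing would place the blocks on different partitions, which is how a single large object would end up spread over the cluster.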