Tools, Testing and Disasters at Basho

Tom Santero
October 18, 2012

Slides from my talk at the DevOpsATL meetup on 10/18/2012 (http://www.meetup.com/DevOpsATL/events/82032472/), discussing Riak, DevOps at Basho, and what happens when things go wrong.

Transcript

  1. Tools, Testing and Disasters at Basho: Tom Santero, @tsantero, October 18, 2012 - ATL DevOps Meetup. Friday, October 19, 12

  2. $ whoami: Tom Santero; @tsantero on Twitter; tsantero@basho.com; I <3 open source software, beer and kitty cats; github.com/tsantero

  3. http://ricon2012.com (videos will be posted soon!)

  4. Riak Design Goals: high-availability, low-latency, horizontal scalability, fault tolerance, ops friendliness, predictability

  12. KEY CONCEPTS

  13. key/value store: key/value pairs mapped to buckets; values are opaque; objects stored as binaries on disk (diagram: key/value pairs inside a bucket)

  14. masterless: deployed as a cluster of nodes (a, b, c, d, e); any node can serve any request

  15. masterless: deployed as a cluster of nodes (a, b, c, d, e); any node can serve any request (diagram label: request)

  17. masterless: deployed as a cluster of nodes (a, b, c, d, e); any node can serve any request (diagram label: payload)

  18. UNDER THE HOOD

  19. REPLICATION and ADMINISTRATION: consistent hashing, replicas, virtual nodes (vnodes), handoff, gossip protocols, anti-entropy

  20. THE RING: 160-BIT INT KEYSPACE DIVIDED EVENLY INTO PARTITIONS

  21. VNODES / PARTITIONS: node 0, node 1, node 2; CLAIM

  22. VNODES / PARTITIONS: node 0, node 1, node 2

  23. REBALANCE: VNODES / PARTITIONS; node 0, node 1, node 2, + node 3

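The ring, claim and rebalance slides can be sketched roughly as follows. This is a minimal illustration and not Riak's actual claim algorithm: the ring size of 64, the round-robin claim, and the names `partition_starts`, `claim` and `rebalance` are all assumptions made for the example.

```python
from collections import Counter

KEYSPACE = 2 ** 160   # the 160-bit integer keyspace named on the slide
RING_SIZE = 64        # assumed partition count, chosen for illustration


def partition_starts(ring_size=RING_SIZE):
    """Return the first hash value covered by each equally sized partition."""
    width = KEYSPACE // ring_size
    return [index * width for index in range(ring_size)]


def claim(nodes, ring_size=RING_SIZE):
    """Give each partition (one vnode) to a node in round-robin order."""
    return {p: nodes[p % len(nodes)] for p in range(ring_size)}


def rebalance(owners, new_node, node_count):
    """Hand every node_count-th partition to the node that just joined."""
    return {
        p: new_node if p % node_count == node_count - 1 else owner
        for p, owner in owners.items()
    }


before = claim(["node 0", "node 1", "node 2"])
after = rebalance(before, "node 3", node_count=4)
moved = [p for p in before if before[p] != after[p]]

print(Counter(before.values()))
print(Counter(after.values()))
print(len(moved), "of", RING_SIZE, "partitions changed owner")
```

In this toy version only the partitions handed to the new node change owner; everything else stays where it was, which is the property a rebalance is after.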
  24. DEMO (can someone sacrifice a few bits to the demo gods?)

  25. Quorum requests: N, R, W, PR/PW, DR/DW

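As a rough illustration of the quorum parameters, taking N as the number of replicas, R as the replies needed for a read and W as the replies needed for a write: the sketch below covers only N, R and W, leaves out the PR/PW and DR/DW variants, and uses hypothetical helper names. The overlap check assumes reads and writes are served by the same N replicas.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def read_sees_latest_write(n, r, w):
    """True when every set of r replicas overlaps every set of w replicas."""
    return r + w > n


def request_succeeds(replies, required):
    """True when at least `required` replicas answered successfully."""
    return sum(1 for ok in replies if ok) >= required


N = 3
R = W = majority(N)
print(R, read_sees_latest_write(N, R, W))
print(request_succeeds([True, True, False], R))
```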
  26. consistent hashing: node 0, node 1, node 2, node 3; hash(“meetups/DevOpsATL”), N = 3

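One way to picture the slide's hash(“meetups/DevOpsATL”) with N = 3 is the sketch below. It is a simplification under stated assumptions: the bucket and key are hashed as a plain "bucket/key" string with SHA-1, the ring has 64 partitions owned round-robin by four nodes, and `partition_for` and `preference_list` are hypothetical names. Riak's real encoding of the bucket/key pair and its exact choice of starting partition may differ.

```python
import hashlib

RING_SIZE = 64         # assumed number of partitions
KEYSPACE = 2 ** 160    # SHA-1 produces 160-bit values
NODES = ["node 0", "node 1", "node 2", "node 3"]


def partition_for(bucket, key):
    """Hash bucket/key with SHA-1 and return the partition that covers it."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (KEYSPACE // RING_SIZE)


def preference_list(bucket, key, n=3):
    """Return n consecutive (partition, owner) pairs that hold the replicas."""
    first = partition_for(bucket, key)
    partitions = [(first + step) % RING_SIZE for step in range(n)]
    return [(p, NODES[p % len(NODES)]) for p in partitions]


for partition, node in preference_list("meetups", "DevOpsATL"):
    print(partition, node)
```

Because neighbouring partitions belong to different nodes in this layout, the three replicas land on three distinct nodes.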
  28. disaster scenario: node 0, node 1, node 3, node 2

  30. disaster scenario: node 0, node 1, node 3, node 2; requests go to fallback

  32. disaster scenario: node 0, node 1, node 3, node 2; node comes back online

  34. disaster scenario: node 0, node 1, node 3, node 2; normal operations resume

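The disaster scenario (a node fails, requests go to a fallback, the node returns, normal operations resume) can be modelled with a toy in-memory cluster. Everything here is an assumption for illustration: the four-node ring order, the `replica_targets`, `write` and `handoff` helpers, and the use of plain dictionaries as node storage. It is not how Riak implements handoff internally.

```python
RING = ["node 0", "node 1", "node 2", "node 3"]
stores = {node: {} for node in RING}   # per-node storage
hints = []                             # (fallback, intended owner, key)


def replica_targets(ring, start, n, down):
    """Pick n targets; a live spare stands in for each primary that is down."""
    primaries = [ring[(start + i) % len(ring)] for i in range(n)]
    spares = [ring[(start + i) % len(ring)] for i in range(n, len(ring))]
    spares = [node for node in spares if node not in down]
    targets = []
    for node in primaries:
        if node not in down:
            targets.append((node, None))
        elif spares:
            targets.append((spares.pop(0), node))
    return targets


def write(key, value, down):
    """Store the value on each target and remember who it was meant for."""
    for node, owner in replica_targets(RING, 0, 3, down):
        stores[node][key] = value
        if owner is not None:
            hints.append((node, owner, key))


def handoff(recovered):
    """Move data held by fallbacks back to the node that came back online."""
    for hint in list(hints):
        fallback, owner, key = hint
        if owner == recovered:
            stores[owner][key] = stores[fallback].pop(key)
            hints.remove(hint)


write("meetups/DevOpsATL", "slides", down={"node 2"})
print(sorted(node for node in stores if stores[node]))
handoff("node 2")
print(sorted(node for node in stores if stores[node]))
```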
  35. vector clocks establish temporality

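A minimal sketch of how vector clocks establish ordering between versions, assuming a clock is a plain mapping from actor name to counter. The function names and the actors "x", "y" and "z" are hypothetical and not taken from the talk.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify a relative to b: equal, after, before, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"


base = increment({}, "x")
left = increment(base, "y")    # one client updates the value
right = increment(base, "z")   # another client updates the same base value

print(compare(left, base))
print(compare(left, right))
```

Two versions that compare as "concurrent" are the ones that cannot be ordered automatically and have to be reconciled.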

  37. access your DATA programmatically with Erlang, Ruby, Python, Java, Haskell, Perl, Go, Clojure, OCaml, Node.JS, Dart ...your language of choice

  38. GET, PUT, DELETE: CRUD [repeated across the slide]

  39. map/reduce

  40. Riak Search: store and index documents (JSON / XML, plain text, English prose); query against VALUES

  41. 2i Secondary Indexes: three Riak Objects, each with X-Riak-Index-email_bin; “tsantero@basho.com”, “mark@basho.com”, “mark@basho.com”, “tsantero@basho.com”

  42. 2i Secondary Indexes: tag objects; three Riak Objects, each with X-Riak-Index-email_bin; “tsantero@basho.com”, “mark@basho.com”, “mark@basho.com”, “tsantero@basho.com”

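The idea of tagging objects with an index entry and then querying by that tag can be modelled in memory as below. This is only a conceptual sketch: `IndexedBucket` and its methods are hypothetical and do not talk to Riak; the index name `email_bin` and the addresses come from the slide, while the keys and values are made up.

```python
from collections import defaultdict


class IndexedBucket:
    """Toy bucket that keeps a secondary index next to the stored values."""

    def __init__(self):
        self.objects = {}                                     # key -> value
        self.tags = {}                                        # key -> {index: term}
        self.indexes = defaultdict(lambda: defaultdict(set))  # index -> term -> keys

    def put(self, key, value, indexes=None):
        """Store a value and replace any index entries the key had before."""
        for name, term in self.tags.get(key, {}).items():
            self.indexes[name][term].discard(key)
        self.objects[key] = value
        self.tags[key] = dict(indexes or {})
        for name, term in self.tags[key].items():
            self.indexes[name][term].add(key)

    def query(self, name, term):
        """Return the keys tagged with the exact term, in sorted order."""
        return sorted(self.indexes[name][term])


users = IndexedBucket()
users.put("user1", b"opaque value 1", {"email_bin": "tsantero@basho.com"})
users.put("user2", b"opaque value 2", {"email_bin": "mark@basho.com"})
users.put("user3", b"opaque value 3", {"email_bin": "mark@basho.com"})
print(users.query("email_bin", "mark@basho.com"))
```

The query never inspects the stored values, which stay opaque; it only consults the tags supplied at write time.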
  43. pluggable backends

  44. Bitcask: write ahead log, in-memory LUT, append-only files

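The combination of an append-only file and an in-memory lookup table can be sketched in a few lines. This is a toy, not Bitcask's on-disk format: `TinyLogStore` is a hypothetical name, records carry no checksums or timestamps, and nothing ever merges or compacts the file.

```python
import os
import tempfile


class TinyLogStore:
    """Toy append-only value log with an in-memory lookup table."""

    def __init__(self, path):
        self.path = path
        self.index = {}              # key -> (offset, length) of the latest value
        open(path, "ab").close()     # create the data file if it is missing

    def put(self, key, value):
        """Append the bytes to the file and point the key at the new record."""
        with open(self.path, "ab") as log:
            log.seek(0, os.SEEK_END)
            offset = log.tell()
            log.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key):
        """Read the latest value with one seek, using the lookup table."""
        offset, length = self.index[key]
        with open(self.path, "rb") as log:
            log.seek(offset)
            return log.read(length)


with tempfile.TemporaryDirectory() as folder:
    store = TinyLogStore(os.path.join(folder, "data.log"))
    store.put("meetup", b"DevOpsATL")
    store.put("meetup", b"DevOpsATL 10/18")   # the older bytes stay in the file
    print(store.get("meetup"))
    print(os.path.getsize(store.path), "bytes on disk")
```

Writes only ever append, so an overwrite leaves the old bytes behind; the lookup table is what makes the newest value reachable with a single read.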
  45. LevelDB: persistent ordered map, generational sstables, append-only files, snappy, unbounded

  46. Memory

  47. Multi-backend: use all the backends!

  48. Ask 10 engineers what they think DevOps is and you’ll get 12.79234 different answers

  49. DevOps at Basho: Internal, Customers, Product

  50. Internal: programming and utilities; automation; improve test suite by 500%

  51. guttersnipe: middleware in Vagrant; deploy and run test suite; no need to rebuild clusters; 500% increase in efficiency

  52. CHEF COOKBOOKS (https://github.com/basho/riak-chef-cookbook): use to deploy riak; open source; over 40 commits in 2012; rolling upgrades, iptables, etc...; future development

  53. Erlang Template Helper (https://github.com/basho/erlang_template_helper): config_to_json, json_to_config; specify Erlang config files; very useful with Chef

  54. PUPPET MODULES (https://github.com/basho/puppet-riak): use to deploy riak; just open sourced. I don’t know puppet, so I’m not going to bullshit you

  56. RIAK TEST (https://github.com/basho/riak_test): utility for testing builds; run tests in a sandbox; reset to clean state after tests; easy to bootstrap envs; also open source!

  57. GiddyUp (https://github.com/basho/giddyup): visual scorecard for Riak Test; seed Riak Test with list of tests to be run on each platform; receive test results and logs via REST interface

  59. Investing in Technology

  60. DataCenter: 300 cpu cores, 1 TB RAM, 500 TB storage, gigabit internet

  62. Product: reduce the overall cost of engineering; enhance dat dev cycle

  63. Example #1: pushing the limits with socket performance; 4gbps over 8 erl sockets

  64. Example #2: General Performance Testing and performing full-sync DC replication at scale

  65. testing repl at SCALE

  66. Customers: replicate issues; support; spec + test hardware

  67. Example #3: replicating customer issues with replication

  68. “In both cases I was unable to replicate these issues on the hardware available to me locally, both through limited machines (2) and major performance differences between nodes (first gen mbp vs i7 dell tower)” --Andrew Thompson, Basho Sr. Engineer

  69. test harness

  70. fin hugs! Any and all questions can be sent to /dev/null