
Life After EC2

A journey from slow recovery to realized potential.

Elasticsearch Inc

October 10, 2013

Transcript

  1. EC2: 40 (data) nodes, 1 index, 500 shards, 12.5T (primaries),
     1 replica, 1.6B docs (Jul 2013)
  2. Carpathia: 8 (data) nodes, 1 index, 128 shards, 1 replica,
     14 x 600G SSD, 32 cores, 64G RAM
  3. "We are upgrading our new search cluster from 0.90.1 to 0.90.3. The
     shard sizes are ~100GB on average, and it is taking an obscenely long
     time to recover shards on the nodes we have restarted. The restart
     took place roughly 45 minutes ago, and not a single shard has fully
     recovered yet. The load on the machines is minimal, as is disk IO and
     network IO. We've bumped the node_concurrent_recoveries to 6. But how
     long should this take?" #1004, Tim Pease, 8 Aug 2013
  4. "Jeez! It has been five hours now and only 5 of the 128 shards have
     recovered. At this rate it will take a full week to get the cluster
     into a green state. ..."
  5. First things first. Any anomalies in the dashboards? GitHub has
     *excellent* monitoring...
  6. dd if=/dev/zero of=/tmp/file... scp /tmp/file host2:/tmp
     Check the network... Hm, no way 10gigE is that slow. No rush, let’s
     sleep on it.
  7. dd if=/dev/zero of=/tmp/file... scp /tmp/file host2:/tmp  ...66M/s
     Check the network... Hm, no way 10gigE is that slow. No rush, let’s
     sleep on it.
  8. curl -s http://git.io/KlTPxw | sh
     OK, I think I have enough evidence here...
  9. curl -s http://git.io/KlTPxw | sh
     --- /tmp/1 2013-08-08 21:34:59.352499371 -0700
     +++ /tmp/2 2013-08-08 21:35:29.404911659 -0700
     @@ -66,13 +66,13 @@
     -code-search-1 46 r 216782024539 172.16.12.13 codesearch-storage7
     +code-search-1 46 r 217412218715 172.16.12.13 codesearch-storage7
     OK, I think I have enough evidence here...
  10. curl -s http://git.io/KlTPxw | sh
      --- /tmp/1 2013-08-08 21:34:59.352499371 -0700
      +++ /tmp/2 2013-08-08 21:35:29.404911659 -0700
      @@ -66,13 +66,13 @@
      -code-search-1 46 r 216782024539 172.16.12.13 codesearch-storage7
      +code-search-1 46 r 217412218715 172.16.12.13 codesearch-storage7
      ...20M/s
      OK, I think I have enough evidence here...
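      That diff is where the rate falls out: the replica on
      codesearch-storage7 grew by 217412218715 - 216782024539 = 630194176
      bytes over the ~30 seconds between snapshots (21:34:59 to 21:35:29),
      i.e. roughly 21 MB/s:

      echo $(( (217412218715 - 216782024539) / 30 ))   # ~21000000 bytes/s, the ~20M/s above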
  11. Per node! Why didn’t this help? Probably not blocked on deciding
      where shards go.
  12. P P
      Per node! Why didn’t this help? Probably not blocked on deciding
      where shards go.
  13. P P R R
      Per node! Why didn’t this help? Probably not blocked on deciding
      where shards go.
  14. P P R R  cluster.routing.allocation.concurrent_recoveries
      Per node! Why didn’t this help? Probably not blocked on deciding
      where shards go.
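      For reference, the knob the GitHub issue says was bumped to 6 is
      settable through the cluster settings API; a minimal sketch, reusing
      the deck’s host and port (the value is theirs, not a recommendation):

      curl -XPUT localhost:9202/_cluster/settings -d '
      {
        "transient": {
          "cluster.routing.allocation.node_concurrent_recoveries": 6
        }
      }'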
  15. P R
      Chunks (default 512k) read & write by max_bytes * ns. Setting which
      controls that... Anyone know the default? Incidentally...
  16. P R
      Chunks (default 512k) read & write by max_bytes * ns. Setting which
      controls that... Anyone know the default? Incidentally...
  17. P R  indices.recovery.max_bytes_per_sec
      Chunks (default 512k) read & write by max_bytes * ns. Setting which
      controls that... Anyone know the default? Incidentally...
  18. P R  indices.recovery.max_bytes_per_sec  20M/s
      Chunks (default 512k) read & write by max_bytes * ns. Setting which
      controls that... Anyone know the default? Incidentally...
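      That default is most of the story here: at 20 MB/s per node, a single
      ~100GB shard copy needs about 100 * 1024 / 20 ≈ 5120 seconds, roughly
      85 minutes, which matches the "not a single shard recovered after 45
      minutes" report above.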
  19. curl -XPUT localhost:9202/_cluster/settings -d '
      {
        "transient": {
          "indices.recovery.concurrent_streams": 12,
          "indices.recovery.max_bytes_per_sec": "500mb"
        }
      }'
      Let’s see if we can move the needle. Also bump up concurrent_streams
      to handle interleaving.
  20. Only one thread active, writes very erratic. “Nodes basically bored.”
      Nothing else throttled in ES; what’s it doing?
  21. Where did we see that before? The file copy from our lame network
      test! We weren’t testing just the network!
  22. 66M/s
      Where did we see that before? The file copy from our lame network
      test! We weren’t testing just the network!
  23. [Diagram: two nodes, n1 and n2, each as Disk -> Kernel -> eth0,
      joined by the Network]
  24. [Diagram, continued: an scp arrow spanning the full path, disks and
      kernels included]
  25. [Diagram, continued: an iperf arrow spanning only the network path,
      alongside the scp arrow]
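      That is the point of the iperf arrow: scp drags the disks and kernels
      on both ends into the measurement, while iperf pushes bytes straight
      between sockets. A minimal sketch, with placeholder hostnames:

      host1$ iperf -s            # receiver
      host2$ iperf -c host1      # sender; reports raw TCP throughput, no disk in the path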
  26. C F Q
      Reorders access by sector ID. Designed to most efficiently use
      rotational media and for multi-user systems, unlike a db server.
      * Why is this useless here? (SSD (plus RAID!))
  27. Completely F Q
      Reorders access by sector ID. Designed to most efficiently use
      rotational media and for multi-user systems, unlike a db server.
      * Why is this useless here? (SSD (plus RAID!))
  28. Completely Fair Q
      Reorders access by sector ID. Designed to most efficiently use
      rotational media and for multi-user systems, unlike a db server.
      * Why is this useless here? (SSD (plus RAID!))
  29. Completely Fair Queuing
      Reorders access by sector ID. Designed to most efficiently use
      rotational media and for multi-user systems, unlike a db server.
      * Why is this useless here? (SSD (plus RAID!))
  30. N
      Removes all reordering, gets the kernel out of the IO game. Also
      deadline, which reorders based on time, didn’t make a difference.
  31. Noop
      Removes all reordering, gets the kernel out of the IO game. Also
      deadline, which reorders based on time, didn’t make a difference.
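      Switching schedulers is per block device via sysfs; a minimal sketch,
      assuming the data disk shows up as /dev/sda (run as root, repeat per
      device):

      cat /sys/block/sda/queue/scheduler           # e.g. "noop deadline [cfq]"; brackets mark the active one
      echo noop > /sys/block/sda/queue/scheduler   # switch this device to noop at runtime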
  32. Defaults
      ES has awesome defaults, but they’re tuned for EC2. Improving this
      with more extensive documentation ...a big part of having a company
      behind ES.
  33. With RAID or SSD: noop; otherwise, experiment. indices.* <- still
      node-level here!
  34. scheduler
      With RAID or SSD: noop; otherwise, experiment. indices.* <- still
      node-level here!
  35. indices.recovery.max_bytes_per_sec, scheduler
      With RAID or SSD: noop; otherwise, experiment. indices.* <- still
      node-level here!
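      Since these indices.* recovery settings are node-level here, they can
      also be pinned statically in each node’s elasticsearch.yml; a sketch
      reusing the values applied dynamically earlier in the deck (tune for
      your own hardware):

      indices.recovery.max_bytes_per_sec: 500mb
      indices.recovery.concurrent_streams: 12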
  36. Monitoring
      Doesn’t have to be perfect. Do it tonight. You cannot make
      engineering decisions without it. Translates “hrm, this is taking
      forever” to *action*. We’re working on helping you here.