
Elasticsearch and Resiliency

Elastic Co
February 18, 2016


Resiliency is a primary development focus in Elasticsearch. Many improvements shipped in ES 2.0, and even more are coming in future releases. From faster recovery times to more durable writes, attend this session to learn everything you ever wanted to know about resiliency in Elasticsearch.


Transcript

  1. re·sil·ience, noun: 1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity. 2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.
  2. Disk-based data structures • Doc values (Lucene's columnar store), on by default from 2.0 (#10209) • Norms stay on disk in Lucene 5.3.0 (LUCENE-6504), released in ES 2.1.0
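    A minimal sketch of what this looks like in a 2.x mapping (index, type, and field names here are illustrative; from 2.0, not_analyzed fields use doc values by default, so the explicit flag below is redundant):

    curl -XPUT 'localhost:9200/myindex' -d '
    {
      "mappings": {
        "mytype": {
          "properties": {
            "category": {
              "type": "string",
              "index": "not_analyzed",
              "doc_values": true
            }
          }
        }
      }
    }'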
  3. Smarter algorithms • Automatic query caching: #10897 (ES 2.0), LUCENE-6077 • Breadth-first aggregation trees: added in #6128 (ES 1.3); working on automatic application in #9825
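    Breadth-first collection can also be requested explicitly via the terms aggregation's collect_mode parameter (added by #6128). A sketch with illustrative index and field names, useful when an expensive sub-aggregation would otherwise run under every candidate bucket:

    curl -XGET 'localhost:9200/myindex/_search' -d '
    {
      "aggs": {
        "actors": {
          "terms": {
            "field": "actors",
            "size": 10,
            "collect_mode": "breadth_first"
          },
          "aggs": {
            "costars": {
              "terms": { "field": "actors", "size": 5 }
            }
          }
        }
      }
    }'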
  4. Prevent abuse #11511 • Limit the total number of hits #9311 (2.1.0) • Circuit breakers have been around since 1.0 • Recent additions #16011: limit the size of a single request; limit the total size of in-flight concurrent requests • Hard upper bounds on thread pools #15582 • Settings validation #15278
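    The circuit breaker limits are dynamic cluster settings; a sketch with illustrative values:

    curl -XPUT 'localhost:9200/_cluster/settings' -d '
    {
      "persistent": {
        "indices.breaker.fielddata.limit": "40%",
        "indices.breaker.request.limit": "30%",
        "indices.breaker.total.limit": "70%"
      }
    }'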
  5. Temporary Node Leave - Promote Primary & Add Replicas [diagram: after node4 leaves, shards a0, a1, b0, b1, b2 are re-replicated across node1-node3] needed but potentially expensive
  6. Temporary Node Leave - A Grace Period #11712 (1.7) [diagram: the cluster waits before reallocating node4's shards] index.unassigned.node_left.delayed_timeout: 1m
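    The grace period is a dynamic index setting; a sketch of applying the slide's 1m value to every index (_all can be replaced by a specific index name):

    curl -XPUT 'localhost:9200/_all/_settings' -d '
    {
      "settings": {
        "index.unassigned.node_left.delayed_timeout": "1m"
      }
    }'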
  7. Still Requires A Recovery When The Node Re-Joins [diagram: node4's returning shard copies must still be recovered from the primaries]
  8. Resync Files With Primary [diagram: primary holds segments 1+2, 3, 4+5; replica holds segments 2+3, 1, 4, 5] reuse existing segments?
  9. Re-Sync Files With Primary & Synced Flush #10032 (1.6) [diagram: primary and replica both carry sync_id: 0XYB321] automatically use inactivity periods to add a sync id marker, guaranteeing doc level equality & instant recovery
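    The marker can also be requested manually, e.g. before planned maintenance; a sketch against an illustrative index name (a synced flush only succeeds on shards with no ongoing indexing, and any later write invalidates the sync_id):

    curl -XPOST 'localhost:9200/myindex/_flush/synced'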
  10. Cancel Ongoing Recoveries If A Perfect Match Is Found #12421 (2.0) [diagram: a file-based recovery onto another node is cancelled when node4 re-joins with a matching sync_id]
  11. Durability [diagram: over time, each indexed doc goes into an in-memory buffer; a Lucene flush turns the buffer into a segment]
  12. Durability [diagram: each indexed doc is buffered and its op is appended to the translog; an Elasticsearch flush performs a Lucene commit, writing segments]
  13. Durability - fsync translog every 5s (1.x) [diagram: primary and replica each buffer the doc and append the op to their own translog] redundancy doesn't help if all nodes lose power
  14. Durability - Translog fsync on every request #11011 (2.0) • For low volume indexing, fsync matters less • For high volume indexing, we can amortize the costs and fsync once per bulk • Concurrent requests can share an fsync
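    In 2.0 this behavior is controlled per index by index.translog.durability, which defaults to request (fsync before acknowledging). A sketch of reverting an illustrative index to the 1.x-style periodic fsync:

    curl -XPUT 'localhost:9200/myindex/_settings' -d '
    {
      "index.translog.durability": "async",
      "index.translog.sync_interval": "5s"
    }'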
  15. Durability - Recovery after Translog Crash #11143 (2.0) • fsync every 5s means some ops can be partially written • 1.x had to ignore EOF exceptions, making it non-resilient to translog truncation • fsync on every request still suffers from this • Write a checkpoint file on every fsync #12341 (2.0): the .ckp file records { offset: 3034, ops: 1302 }
  16. Dynamic Mappings

    curl -XPUT 'localhost:9200/twitter/tweet/1' -d '
    {
      "id": "467495741683150848",
      "text": "the problem with distributed systems jokes is that you'\''re never sure if everyone gets it",
      "created_on": "Sat May 17 02:44:05 +0000 2014"
    }'

    The resulting dynamically generated mapping:

    {
      "id":         { "type": "string" },
      "text":       { "type": "string" },
      "created_on": { "type": "date" }
    }
  17. Dynamic Mappings Optimize for Speed 1.x Assumption: there is a "true" schema, we just need to learn it [diagram: two data nodes each see {"i": 1} and {"i": 10} and independently map "i" as an int]
  18. Dynamic Mappings Optimize for Speed 1.x If you make assumptions, you're going to have a bad time [diagram: one node sees {"f": 1} and maps "f" as an int, another sees {"f": "text"} and maps it as a string; the mappings conflict]
  19. Dynamic Mappings 2.x PR #10634 Validate dynamic mapping updates on the master node [diagram: both nodes forward their proposed mapping updates for "f" to the master]
  20. Dynamic Mappings 2.x PR #10634 Validate dynamic mapping updates on the master node [diagram: the master validates the proposed updates before they are applied (✔ ✔)]
  21. Allocate primary shards using allocation IDs #14739 Persist allocation IDs as metadata [diagram: the copies of shard a0 on node1 and node2 carry allocation IDs 9154ff and 5aa72e; the master records a0: 9154ff, 5aa72e]
  22. Allocate primary shards using allocation IDs #14739 Index document [diagram: after indexing, node2's copy of a0 carries the new allocation ID 6f91cc, and the master's list for a0 is updated to 6f91cc, 9154ff]
  23. Isolation during indexing issue #7572 Primary fails to notify master and acknowledges the write [diagram: the isolated primary a0 on node1 cannot reach the master but still acknowledges writes]
  24. Isolation during indexing issue #7572 The bad replica is promoted to primary and acknowledged writes are lost [diagram: the master promotes node2's stale copy of a0 to primary]
  25. Waiting while failing a shard issue #14252 The old primary is failed, and the request is no longer acknowledged if the shard is not failed: • Wait for failure publication: the master submits a cluster state update task, waits for the update to be processed, and responds with success or failure • Master left: enter a retry loop waiting for a new cluster state with a master-changed event; the same retry loop is used for any master channel failure • No longer the primary: retry the request on the new primary, which now manages the indexing request
  26. Master Election 2.x PR #12161 Elected master waits for joins [diagram: master2 and master3 send join requests to master1, which waits until enough nodes have joined]
  27. Master Election 2.x PR #12161 Enough nodes join, master is elected [diagram: master1 publishes the cluster state (CS) to master2 and master3]
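    Waiting for joins works together with the quorum requirement: a node only becomes master once a majority of master-eligible nodes have joined. A sketch for a cluster with three master-eligible nodes:

    # elasticsearch.yml
    discovery.zen.minimum_master_nodes: 2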
  28. Resiliency tests / service disruption tests: DiscoveryWithServiceDisruptionsIT • testFailWithMinimumMasterNodesConfigured • testNodesFDAfterMasterReelection • testVerifyApiBlocksDuringPartition • testIsolateMasterAndVerifyClusterStateConsensus • testMasterNodeGCs • testStaleMasterNotHijackingMajority • testRejoinDocumentExistsInAllShardCopies • testUnicastSinglePingResponseContainsMaster • testIsolatedUnicastNodes • testClusterJoinDespiteOfPublishingIssues • testSendingShardFailure • testClusterFormingWithASlowNode • testNodeNotReachableFromMaster • testSearchWithRelocationAndSlowClusterStateProcessing • testIndexImportedFromDataOnlyNodesIfMasterLostDataFolder • testIndicesDeleted
  29. Resiliency updates - Where to go 1. Reporting: https://github.com/elastic/elasticsearch/issues 2. Discourse: https://discuss.elastic.co/c/elasticsearch 3. Tracking: https://github.com/elastic/elasticsearch/issues?q=label%3Aresiliency 4. Status: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html 5. 2015 talk: https://www.elastic.co/elasticon/2015/sf/resiliency-in-elasticsearch-and-lucene