Resiliency in Elasticsearch and Lucene

As Elasticsearch clusters grow larger, their resilience to hardware and network failures becomes increasingly important. At Elasticsearch, we invest a LOT in making both Elasticsearch and Apache Lucene detect and cope with increasingly complex failures.

In this talk we will go through recent highlights and some of the future directions we plan to follow. We will touch on all aspects of Elasticsearch, ranging from the lowest level of a single file, through the network connection of a single node, all the way up to distributed failures at the cluster level. Even though the talk is about possible failures and various coping strategies, you will also learn about the inner workings of Elasticsearch: an interesting peek under the hood.

Talk given at Elastic{ON} 2015, by Igor Motov & Boaz Leskes

Elasticsearch Inc

March 11, 2015

Transcript

  1. { } CC-BY-ND 4.0 Resiliency (re·sil·ience), noun
    1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
    2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.
  2. Average Elasticsearch cluster growth
    (figure: a large cluster circa 2011 compared with a large cluster circa 2015)
  3. (figure: failure types placed on a slow-to-fast axis and a node-to-cluster axis)
    kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts
  4. (figure: failure types placed on a slow-to-fast axis and a node-to-cluster axis)
    kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts, SOFTWARE BUGS
  6. Replicas and Transaction Log
    (diagram: a primary shard and a replica shard, each with a Lucene buffer, a Lucene index, and a transaction log)
  10. Transaction Log
    • Transaction Log
      – stores every operation (create/update/delete)
      – fsync-ed every 5 sec (configurable)
    • Lucene Segments
      – fsync-ed when the transaction log is full (every 30 min or 200mb)
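
The durability trade-off on slide 10 can be illustrated with a toy append-only log. This is a minimal sketch, not Elasticsearch's translog: the class `ToyTranslog`, its methods, and the injected `now` clock are hypothetical; only the 5-second sync interval and the 30 min / 200mb flush triggers come from the slide. Operations written since the last fsync are the ones that are not yet guaranteed to be on disk.

```python
import os
import tempfile


class ToyTranslog:
    """Hypothetical append-only operation log; not Elasticsearch's translog code."""

    SYNC_INTERVAL_S = 5                   # slide: fsync every 5 sec (configurable)
    FLUSH_PERIOD_S = 30 * 60              # slide: flush every 30 min ...
    FLUSH_SIZE_BYTES = 200 * 1024 * 1024  # ... or once the log reaches 200mb

    def __init__(self, path, now=0.0):
        self._f = open(path, "ab")
        self.last_sync = now
        self.last_flush = now
        self.size = 0
        self.unsynced_ops = 0  # written since the last fsync, not yet guaranteed on disk
        self.flushes = 0

    def append(self, op, now):
        """Record one create/update/delete operation, then run the periodic checks."""
        data = (op + "\n").encode("utf-8")
        self._f.write(data)
        self.size += len(data)
        self.unsynced_ops += 1
        self._maybe_sync(now)
        self._maybe_flush(now)

    def _maybe_sync(self, now):
        if now - self.last_sync >= self.SYNC_INTERVAL_S:
            self._f.flush()               # Python buffer -> OS page cache
            os.fsync(self._f.fileno())    # OS page cache -> disk
            self.unsynced_ops = 0
            self.last_sync = now

    def _maybe_flush(self, now):
        if (now - self.last_flush >= self.FLUSH_PERIOD_S
                or self.size >= self.FLUSH_SIZE_BYTES):
            # Stand-in for a Lucene commit: once the segments are durable,
            # the logged operations are no longer needed, so the log is trimmed.
            self._f.truncate(0)
            os.fsync(self._f.fileno())
            self.size = 0
            self.unsynced_ops = 0
            self.last_flush = now
            self.flushes += 1

    def close(self):
        self._f.close()


with tempfile.TemporaryDirectory() as tmp:
    log = ToyTranslog(os.path.join(tmp, "translog"), now=0.0)
    log.append("index doc 1", now=1.0)        # inside the 5 s window: not fsync-ed yet
    pending_after_1s = log.unsynced_ops
    log.append("delete doc 1", now=6.0)       # 5 s elapsed: fsync
    pending_after_6s = log.unsynced_ops
    log.append("index doc 2", now=31 * 60.0)  # 30 min elapsed: flush
    flushes = log.flushes
    log.close()

print(pending_after_1s, pending_after_6s, flushes)
```

Injecting `now` instead of reading a real clock keeps the sketch deterministic; the real system runs these checks on background timers.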
  11. Hard Drive Failures
    • Complete failure
    • Running out of disk space
    • Data corruption
  12. Multi data paths
    (diagram: shard 1 to shard 4 stored across Disk 1 to Disk 4)
  15. Complete Disk Failures (WIP)
    • The current multi-path setup stripes shard data across multiple disks
    • Losing one disk impacts all shards on the node
    • Reduce failure impact by using one disk per shard (v2.0, #9498)
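
The difference between striping and one disk per shard can be made concrete with a toy model. Assumptions: in the striped layout every shard has files on every disk, and in the one-disk-per-shard layout shards are assigned round-robin (a hypothetical placement rule chosen for illustration, not necessarily what v2.0 does).

```python
def striped_layout(shards, disks):
    """Every shard has some of its files on every disk."""
    return {s: set(disks) for s in shards}


def one_disk_per_shard_layout(shards, disks):
    """Each shard lives entirely on one disk (round-robin, for illustration)."""
    return {s: {disks[i % len(disks)]} for i, s in enumerate(shards)}


def shards_hit_by(layout, failed_disk):
    """Shards that lose at least one file when `failed_disk` dies."""
    return sorted(s for s, ds in layout.items() if failed_disk in ds)


shards = ["shard 1", "shard 2", "shard 3", "shard 4"]
disks = ["Disk 1", "Disk 2", "Disk 3", "Disk 4"]
print(shards_hit_by(striped_layout(shards, disks), "Disk 2"))
print(shards_hit_by(one_disk_per_shard_layout(shards, disks), "Disk 2"))
```

With striping, one dead disk damages all four shards; with one disk per shard it damages only the shard on that disk, and the others keep serving.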
  16. Running out of Disk Space
    • Can lead to truncated files and thus corruption
    • Easy to anticipate by monitoring
    • Disk-space-aware allocation decider
      – added in 0.90.4
      – enabled by default since 1.3.5
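
A disk-space-aware allocation decider boils down to comparing disk usage with thresholds. The sketch below is a hypothetical simplification; the two thresholds are assumed to match the 1.x defaults of `cluster.routing.allocation.disk.watermark.low` (85%) and `.high` (90%), and the function name and return values are invented for illustration.

```python
import shutil

# Assumed thresholds (see lead-in); the real decider also accepts absolute values.
LOW_WATERMARK = 0.85   # above this usage: allocate no new shards to the node
HIGH_WATERMARK = 0.90  # above this usage: try to move shards off the node


def disk_allocation_decision(used_bytes, total_bytes):
    """Hypothetical simplification of a disk-space-aware allocation decider."""
    usage = used_bytes / total_bytes
    if usage > HIGH_WATERMARK:
        return "RELOCATE_AWAY"
    if usage > LOW_WATERMARK:
        return "NO_NEW_SHARDS"
    return "ALLOW"


# Monitoring the local disk with the standard library:
usage = shutil.disk_usage(".")
print(disk_allocation_decision(usage.used, usage.total))
```

Acting at the low watermark leaves headroom, so a node stops receiving shards well before a full disk can truncate files.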
  17. Data Corruption
    • Elasticsearch automatically checks checksums of
      – Small files on index open (since v1.3.0)
      – All files during replication, relocation, snapshot, and restore (since v1.3.3 - v1.4.0)
      – Transaction log (since v1.4.0)
      – Use checksums to identify entire segments, reducing the chance of hash collisions (since v1.4.0)
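
Checksum verification works by storing a checksum next to the data and recomputing it on read. Lucene (since 4.8) ends index files with a footer containing a CRC32; the layout below is a simplified stand-in (the real footer also carries a magic number and an algorithm id), and `add_footer`/`verify_footer` are hypothetical names.

```python
import struct
import zlib


def add_footer(payload: bytes) -> bytes:
    """Append an 8-byte big-endian CRC32 of the payload (simplified footer)."""
    return payload + struct.pack(">Q", zlib.crc32(payload))


def verify_footer(blob: bytes) -> bool:
    """Recompute the CRC32 over everything before the footer and compare."""
    if len(blob) < 8:
        return False  # too short to even hold a footer: treat as corrupt
    payload, footer = blob[:-8], blob[-8:]
    (stored,) = struct.unpack(">Q", footer)
    return zlib.crc32(payload) == stored


blob = add_footer(b"...segment bytes...")
damaged = bytearray(blob)
damaged[3] ^= 0x01  # flip a single bit, as a failing disk might
print(verify_footer(blob), verify_footer(bytes(damaged)))
```

Checking before replicating or restoring a file keeps a corrupted copy from silently spreading to other nodes.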
  18. Data Corruption (WIP)
    • Checksums of metadata files (coming in v1.5.0)
    • Support validation of checksums on all files when a node starts (v2.0.0, #9183)
    • Make validation during merge operations more efficient (v2.0.0, LUCENE-5894)
    • Add per-segment/per-commit ids (v2.0.0, LUCENE-5895)
    • Prevent use of known-bad Java versions (v1.5.0, #7580)
  19. Why do nodes leave the cluster?
    • Complete node failure
    • Unresponsive nodes
    • Network failures
  20. Biggest Memory User: Field Data
    • sorting
    • aggregations
    • doc["foo"] in scripts
    • Parent-child id cache
      – has_child/has_parent queries
  21. Circuit Breakers
    • Estimate the size of the field data for each query and fail the query if it tries to load too much data
      – field data (since v1.0.0)
      – parent-child (since v1.1.0)
      – some aggregation structures (since v1.4.0)
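
The circuit-breaker idea is to check an estimate before loading, so an oversized request fails cleanly instead of exhausting the heap. This is a minimal sketch under stated assumptions: `ToyCircuitBreaker`, the crude per-document size estimate, and the 100 MB limit are all hypothetical.

```python
class CircuitBreakingError(Exception):
    """Raised instead of loading data that would exceed the breaker's limit."""


class ToyCircuitBreaker:
    """Hypothetical breaker: tracks estimated bytes and trips before loading."""

    def __init__(self, name, limit_bytes):
        self.name = name
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, nbytes, label):
        # Check *before* loading anything, so a failing request allocates nothing.
        if self.used_bytes + nbytes > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{self.name}] {label} would need {self.used_bytes + nbytes} bytes, "
                f"limit is {self.limit_bytes}")
        self.used_bytes += nbytes

    def release(self, nbytes):
        self.used_bytes = max(0, self.used_bytes - nbytes)


def estimate_field_data_bytes(num_docs, bytes_per_value):
    """Hypothetical, deliberately crude estimate: one value per document."""
    return num_docs * bytes_per_value


breaker = ToyCircuitBreaker("fielddata", limit_bytes=100 * 1024 * 1024)
breaker.add_estimate(estimate_field_data_bytes(1_000_000, 16), "field [foo]")
try:
    breaker.add_estimate(estimate_field_data_bytes(10_000_000, 16), "field [bar]")
    tripped = False
except CircuitBreakingError as e:
    tripped = True
    print(e)
print(tripped, breaker.used_bytes)
```

Only the query that would cross the limit fails; data already accounted for (field `foo` here) stays loaded and the node stays up.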
  22. Doc values
    • On-disk, low-memory alternative to field data
    • Significant performance improvements in v1.4.0
  23. OOM Resiliency (WIP)
    • Add hard limit on from/size (v1.5.0, #9311)
    • Add hit size circuit breaker (TBD, #9310)
    • Prevent combinatorial explosion in aggregations (TBD, #8081)
    • Smarter filter caching (LUCENE-6303, ES 2.0)
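
Why a hard limit on from/size helps: in a distributed search, each shard returns its top `from + size` hits and the coordinating node must hold and merge all of them, so deep pages cost memory proportional to shards × (from + size). The cap value below is a hypothetical placeholder, since the slide gives the issue number but not the limit.

```python
# Hypothetical cap; the slide names the change (#9311) but not its value.
MAX_FROM_PLUS_SIZE = 10_000


def coordinator_hits(num_shards, from_, size):
    """Each shard returns its top (from + size) hits; the coordinating node
    has to hold and merge all of them before it can drop the first `from_`."""
    return num_shards * (from_ + size)


def check_paging(from_, size):
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    if from_ + size > MAX_FROM_PLUS_SIZE:
        raise ValueError(
            f"from + size = {from_ + size} exceeds the limit of {MAX_FROM_PLUS_SIZE}")


print(coordinator_hits(5, 0, 10))        # first page on 5 shards
print(coordinator_hits(5, 100_000, 10))  # a very deep page on the same 5 shards
```

Both requests return 10 hits to the client, but the deep page makes the coordinating node merge 500,050 entries instead of 50, which the limit rejects up front.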
  24. Dedicated Master Nodes
    (diagram: many data nodes and three dedicated master nodes)
    • data nodes: node.master: false, node.data: true
    • master 1, master 2, master 3: node.master: true, node.data: false
  25. Remember to set minimum_master_nodes
    • Set discovery.zen.minimum_master_nodes: (N/2 + 1) in elasticsearch.yml, where N is the number of master-eligible nodes and N/2 is rounded down
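
The N/2 + 1 rule is a majority quorum: two disjoint groups of nodes cannot both contain a majority, so a network partition cannot elect two masters. The short check below (function names are hypothetical) computes the quorum and tries every two-way split.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for N master-eligible nodes: floor(N / 2) + 1."""
    return master_eligible // 2 + 1


def both_sides_can_elect(side_a: int, side_b: int, quorum: int) -> bool:
    """Would each side of a network partition reach quorum on its own?"""
    return side_a >= quorum and side_b >= quorum


for n in (1, 2, 3, 4, 5):
    q = minimum_master_nodes(n)
    # Try every way of splitting n master-eligible nodes into two groups.
    split_brain = any(both_sides_can_elect(a, n - a, q) for a in range(n + 1))
    print(n, q, split_brain)

# A misconfigured value of 2 on four master-eligible nodes allows a 2/2 split brain:
print(both_sides_can_elect(2, 2, 2))
```

With three dedicated masters the quorum is 2, so the cluster survives the loss of any one master while still ruling out two simultaneous masters.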
  26. Improved Zen Discovery
    • Significant improvements in v1.4.0
      – gossip on master loss
      – bigger ping outreach
      – resiliency to stale gossip
      – better two-masters resolution
      – faster failure detection
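
Failure detection in Zen discovery is ping based, and 1.x exposes settings such as `discovery.zen.fd.ping_interval`, `ping_timeout`, and `ping_retries`. The sketch below is a hypothetical simplification, not the actual implementation: a node is declared failed after `max_failures` consecutive failed pings, and any successful ping resets the count.

```python
class ToyFaultDetector:
    """Hypothetical ping-based detector: declares a node failed after
    `max_failures` consecutive failed pings; any successful ping resets it."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.consecutive_failures = 0
        self.failed = False

    def on_ping(self, ok: bool) -> bool:
        if self.failed:
            return True  # once failed, stay failed until the node rejoins
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            self.failed = self.consecutive_failures >= self.max_failures
        return self.failed


detector = ToyFaultDetector(max_failures=3)
results = [detector.on_ping(ok) for ok in (False, True, False, False, False)]
print(results)
```

The threshold is the trade-off: a lower one detects dead nodes faster but is more likely to eject a node that is merely stalled, for example by a long GC pause.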
  27. Improving Zen Discovery (WIP)
    • Prevent setting incorrect minimum_master_nodes (v1.5.0, #8321, #9051)
    • Refuse revived master (v1.5.0)
  28. External pressure
    • bounded queues and thread pools (<0.20)
    • time out long-running queries (WIP, PR #9156)
    • index throttling (v1.2.0, #6066)
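
Bounded queues turn overload into explicit rejections instead of unbounded memory growth. This is a minimal sketch: `ToyBoundedPool` and `RejectedExecution` are hypothetical names, and the example uses one worker and a queue of two so the rejection is deterministic.

```python
import queue
import threading


class RejectedExecution(Exception):
    """Raised when the bounded queue has no room for another task."""


class ToyBoundedPool:
    """Hypothetical fixed-size worker pool with a bounded task queue."""

    def __init__(self, workers, queue_size):
        self._tasks = queue.Queue(maxsize=queue_size)
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, fn):
        try:
            self._tasks.put_nowait(fn)  # never block the caller, never grow unbounded
        except queue.Full:
            raise RejectedExecution("queue full, rejecting request") from None

    def join(self):
        self._tasks.join()  # wait until every accepted task has finished

    def _run(self):
        while True:
            fn = self._tasks.get()
            try:
                fn()
            finally:
                self._tasks.task_done()


started, release = threading.Event(), threading.Event()
pool = ToyBoundedPool(workers=1, queue_size=2)
pool.submit(lambda: (started.set(), release.wait()))  # occupies the only worker
started.wait()
pool.submit(lambda: None)  # queued (1 of 2)
pool.submit(lambda: None)  # queued (2 of 2)
try:
    pool.submit(lambda: None)
    rejected = False
except RejectedExecution:
    rejected = True  # pushed back to the caller instead of exhausting memory
release.set()
pool.join()
print("third queued task rejected:", rejected)
```

A rejected request can be retried by the client later, whereas an ever-growing queue would eventually take the whole node down.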
  29. Known Unknowns
    • Simulate disruptions (v1.4.0, #7492)
    • Simulate corruption (v1.3.0, #5924)
    • Reproducible evil
    • User info is critical
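
One way to read "reproducible evil" is seeded fault injection: damage is derived from a random seed, so any failing run can be replayed exactly. This sketch (the helper `flip_random_bit` is hypothetical) corrupts one bit per seed and confirms that the same seed reproduces the same damage and that a CRC32 check catches it.

```python
import random
import zlib


def flip_random_bit(data: bytes, seed: int) -> bytes:
    """Corrupt one bit; the seed makes the damage identical on every run."""
    rng = random.Random(seed)
    pos = rng.randrange(len(data))
    out = bytearray(data)
    out[pos] ^= 1 << rng.randrange(8)
    return bytes(out)


original = b"segment bytes " * 64
expected_crc = zlib.crc32(original)

for seed in range(200):
    damaged = flip_random_bit(original, seed)
    # Re-running with the same seed reproduces exactly the same corruption ...
    assert damaged == flip_random_bit(original, seed)
    # ... and a CRC32 check detects every single-bit flip.
    assert zlib.crc32(damaged) != expected_crc
print("all 200 seeded corruptions reproduced and detected")
```

Logging the seed alongside a test failure is what makes a rare corruption bug debuggable.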
  30. It's an ongoing effort
    • Check the progress on our resiliency status page
      – http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/
    • Or search for issues labeled "resiliency" on GitHub
      – https://github.com/elasticsearch/elasticsearch/
  31. This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.