Resiliency in Elasticsearch & Lucene

Boaz Leskes
September 03, 2015

As one of the most popular search engines based on Apache Lucene, Elasticsearch recognizes the crucial importance of being resilient to hardware and network failure. This is why Elastic invests heavily in enabling Elasticsearch and Apache Lucene to detect and cope with increasingly complex failures. Elasticsearch’s lead developer, Boaz Leskes, will cover the recent highlights and future plans of the company’s resiliency strategy. He will cover every level of Elasticsearch, from the lowest level of a single file, through the network connection of a single node, all the way up to distributed failures at the cluster level. Even though the talk is about possible failures and various coping strategies, participants will also get an interesting peek under the hood and learn about the inner workings of Elasticsearch.

Talk was given at a Textkernel Talks event:
http://www.meetup.com/textkernel-talks/events/224786010/

Transcript

  1. Resiliency (re·sil·ience), noun
    1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
    2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.
    CC-BY-ND 4.0
  2. Average Elasticsearch cluster growth
    [Figure] Large cluster circa 2011; large cluster circa 2015
  3. [Diagram] Axes: Slow / Fast, Node / Cluster
    Failures shown: kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts
  4. [Diagram] Axes: Slow / Fast, Node / Cluster
    Failures shown: kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts, SOFTWARE BUGS
  5. Replicas and Transaction Log
    [Diagram] Primary Shard: Lucene Index, Transaction Log, Lucene Buffer. Replica Shard: Lucene Index, Transaction Log, Lucene Buffer.
  9. Transaction Log
    • Transaction Log
      – stores every operation (create/update/delete)
      – fsync-ed every 5 sec (configurable)
        • every request (default - #11011, coming v2.0)
    • Lucene Segments
      – fsync-ed when transaction log is full (every 30 min or 512 MB)
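The trade-off between fsync-ing on a timer and fsync-ing on every request can be sketched with a toy append-only log. This is only an illustration under stated assumptions, not Elasticsearch's translog: the class `ToyTranslog`, the `sync_interval` parameter, the JSON-lines file layout, and the `replay` and `demo` helpers are all hypothetical names made up for this sketch.

```python
import json
import os
import tempfile
import time


class ToyTranslog:
    """Toy append-only operation log (hypothetical, not Elasticsearch code)."""

    def __init__(self, path, sync_interval=5.0):
        # sync_interval=None means "fsync on every request";
        # a number means "fsync at most once per that many seconds".
        self.path = path
        self.sync_interval = sync_interval
        self._file = open(path, "a", encoding="utf-8")
        self._last_sync = time.monotonic()
        self.fsync_count = 0

    def append(self, op_type, doc_id, source=None):
        # Every create/update/delete is recorded as one JSON line.
        record = {"op": op_type, "id": doc_id, "source": source}
        self._file.write(json.dumps(record) + "\n")
        due = (
            self.sync_interval is None
            or time.monotonic() - self._last_sync >= self.sync_interval
        )
        if due:
            self.sync()

    def sync(self):
        # flush() hands the data to the OS; fsync() asks the OS to persist it.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._last_sync = time.monotonic()
        self.fsync_count += 1

    def close(self):
        self.sync()
        self._file.close()


def replay(path):
    """Read back all logged operations, for example after a crash."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def demo():
    """Log three operations with per-request fsync, then replay them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "translog.jsonl")
        log = ToyTranslog(path, sync_interval=None)
        log.append("create", "1", {"title": "hello"})
        log.append("update", "1", {"title": "hello world"})
        log.append("delete", "1")
        syncs_before_close = log.fsync_count
        log.close()
        return syncs_before_close, [r["op"] for r in replay(path)]
```

With `sync_interval=None` every acknowledged operation has been fsync-ed, at the cost of one fsync per request; with a timer, operations written since the last fsync can be lost if the machine crashes.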
  10. Hard Drive Failures
    • Complete failure
    • Running out of disk space
    • Data corruption
  11. Multi data paths
    [Diagram] Disk 1, Disk 2, Disk 3, Disk 4; shard 1, shard 2, shard 3, shard 4
  14. Complete Disk Failures (WIP)
    • Current multi-path setup stripes shard data across multiple disks
    • Disk loss impacts all shards on node
    • Reduce failure impact by using one disk per shard (v2.0, #9498)
  15. Running out of Disk Space
    • Can lead to truncated files and thus corruption
    • Easy to anticipate by monitoring
    • Disk-space aware allocation decider
      – added in 0.90.4
      – enabled by default since 1.3.5
    • Check-pointed transaction log (v2.0, #11143)
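The idea behind a disk-space aware allocation decider can be sketched as a pair of threshold checks. This is a minimal sketch, not the Elasticsearch implementation: `DiskUsage`, `can_allocate`, `should_relocate`, and the two watermark parameters with their 0.85 and 0.90 values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiskUsage:
    """Disk statistics for one node (hypothetical structure)."""

    total_bytes: int
    used_bytes: int

    @property
    def used_fraction(self):
        return self.used_bytes / self.total_bytes


def can_allocate(usage, shard_bytes, low_watermark=0.85, high_watermark=0.90):
    """Return True if a shard of shard_bytes may be placed on this node."""
    # Refuse nodes that are already at or past the low watermark.
    if usage.used_fraction >= low_watermark:
        return False
    # Refuse if the new shard itself would push the node to the high watermark.
    projected = (usage.used_bytes + shard_bytes) / usage.total_bytes
    return projected < high_watermark


def should_relocate(usage, high_watermark=0.90):
    """Return True if shards should be moved away from this node."""
    return usage.used_fraction >= high_watermark
```

For example, `can_allocate(DiskUsage(100, 50), 10)` accepts the shard, while a node that is already 86% full is skipped before it can run out of space.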
  16. Data Corruption - Checksums
    • Elasticsearch automatically checks checksums of
      – Small files on index open (since v1.3.0)
      – All files during replication, relocation, snapshot, and restore (since v1.3.3 - v1.4.0)
      – Transaction log (since v1.4.0)
      – Use checksum to identify entire segments to reduce chance of hash collisions (since v1.4.0)
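Checksum validation of a file can be sketched as a footer that is written with the data and re-computed on read. The slide does not say which checksum or file layout is used, so the CRC32 footer below is an assumption made for this sketch, and `with_checksum` and `verify` are hypothetical helper names.

```python
import struct
import zlib

# Toy layout (an assumption): payload followed by a 4-byte big-endian CRC32.
FOOTER = struct.Struct(">I")


def with_checksum(payload: bytes) -> bytes:
    """Append a CRC32 footer to the payload."""
    return payload + FOOTER.pack(zlib.crc32(payload) & 0xFFFFFFFF)


def verify(blob: bytes) -> bytes:
    """Return the payload if the footer matches, else raise ValueError."""
    if len(blob) < FOOTER.size:
        raise ValueError("file too short to contain a checksum footer")
    payload, footer = blob[:-FOOTER.size], blob[-FOOTER.size:]
    (expected,) = FOOTER.unpack(footer)
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    if actual != expected:
        raise ValueError(
            f"checksum mismatch: expected {expected:08x}, got {actual:08x}"
        )
    return payload
```

Running `verify` when a file is opened, copied to a replica, or restored from a snapshot turns silent corruption into an explicit error at the point where it can still be handled.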
  17. Data Corruption (WIP)
    • Checksums of Metadata files (coming in v1.5.0, #8010)
    • Support validation of checksum on all files when node starts (v2.0.0, #9183)
    • Make validation during merge operation more efficient (v2.0.0, LUCENE-5894)
    • Add per-segment/per-commit ids (v2.0.0, LUCENE-5895)
    • Prevent use of known-bad java versions (coming in v1.5.0, #7580)
  18. Why do nodes leave the cluster?
    • Complete node failure
    • Unresponsive nodes
    • Network Failures
  19. Biggest Memory User: Field Data (v1.x)
    • sorting
    • aggregations
    • doc["foo"] in scripts
    • Parent-child id cache
      – has_child/has_parent queries
  20. Circuit Breakers
    • Estimate size of the field data for each query and fail the query if it tries to load too much data
      – field data (since v1.0.0)
      – parent-child (since v1.1.0)
      – some aggregation structures (since v1.4.0)
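The estimate-then-fail behaviour described on this slide can be sketched as a small memory accountant. This is a sketch under assumptions, not Elasticsearch's breaker: `MemoryCircuitBreaker`, `CircuitBreakingError`, and the `reserve` and `release` methods are hypothetical names, and the byte estimates are supplied by the caller.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would exceed the memory budget."""


class MemoryCircuitBreaker:
    """Tracks estimated memory use and rejects requests over the limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, estimated_bytes, label):
        # Check the estimate before any data is loaded, so the query
        # fails early instead of exhausting the heap.
        new_total = self.used_bytes + estimated_bytes
        if new_total > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {new_total} bytes, "
                f"limit is {self.limit_bytes} bytes"
            )
        self.used_bytes = new_total

    def release(self, reserved_bytes):
        # Give the reservation back once the data is no longer held.
        self.used_bytes = max(0, self.used_bytes - reserved_bytes)
```

A rejected reservation leaves the accounting unchanged, so one oversized query fails while the node keeps serving the others.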
  21. Doc values
    • On-disk, low-memory alternative to field data
    • Significant performance improvements in v1.4.0
    • Enabled by default for all numeric and non-analyzed fields (#10209, v2.0)
  22. OOM Resiliency (WIP)
    • Add hard limit on from/size (coming in v1.5.0, #9311)
    • Add hit size circuit breaker (coming in v2.0, #9310)
    • Prevent combinatorial explosion in aggregations (TBD, #8081, #9825)
    • Smarter filter query caching (LUCENE-6303, v2.0, #10897)
  23. Dedicate Master Nodes
    [Diagram] Data nodes: node.master: false, node.data: true. Master nodes (master 1, master 2, master 3): node.master: true, node.data: false.
  24. Remember to set minimum_master_nodes
    • Set discovery.zen.minimum_master_nodes: (N/2 + 1) in elasticsearch.yml
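The N/2 + 1 rule is a majority quorum, which a few lines of arithmetic make concrete. In this sketch N is taken to mean the number of master-eligible nodes and the division is integer division; the function names `minimum_master_nodes` and `can_elect_master` are hypothetical helpers, not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Majority quorum: integer N/2 + 1 for N master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes: int, master_eligible_nodes: int) -> bool:
    """True if one side of a network partition sees enough nodes."""
    return visible_nodes >= minimum_master_nodes(master_eligible_nodes)
```

With three dedicated masters the quorum is 2, so when the network splits 2/1 only the larger side can elect a master, and the two sides can never both form a cluster.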
  25. Improved Zen Discovery
    • Significant improvements in v1.4.0
      – Gossip on master loss
      – bigger ping outreach
      – resiliency to stale gossip
      – better two-masters resolution
      – faster failure detection
  26. Improving Zen Discovery (WIP)
    • Prevent setting incorrect minimum_master_nodes (coming in v1.5.0, #8321, #9051)
    • Refuse revived master (coming in v1.5.0, #9632)
    • Diff based ClusterState publishing (v2.0, #10212)
  27. External pressure
    - bounded queues and thread pools (<0.20)
    - time out long running queries (WIP, PR #9156)
    - index throttling (v1.2.0, #6066)
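Bounded queues protect a node by rejecting work it cannot absorb instead of buffering it without limit. The sketch below shows the general technique with Python's standard `queue` module; `BoundedWorkQueue` and its `submit` and `drain` methods are hypothetical names, and the code is not modelled on Elasticsearch's thread pools.

```python
import queue


class BoundedWorkQueue:
    """Fixed-capacity task queue that rejects work when it is full."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def submit(self, task):
        # Reject immediately rather than blocking or growing without bound.
        try:
            self._queue.put_nowait(task)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self):
        # Run queued tasks in FIFO order and collect their results.
        results = []
        while True:
            try:
                task = self._queue.get_nowait()
            except queue.Empty:
                return results
            results.append(task())
```

A caller that gets `False` back knows right away that the node is saturated and can retry later or shed load, which keeps memory use bounded under external pressure.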
  28. Known Unknowns
    - Simulate disruptions (v1.4.0, #7492)
    - Simulate corruption (v1.3.0, #5924)
    - Reproducible evil
    - Users info is critical
  29. It’s an ongoing effort
    • Check the progress on our resiliency status page
      – www.elastic.co/guide/en/elasticsearch/resiliency/current/
    • Or search for issues labeled “resiliency” on GitHub
      – https://github.com/elastic/elasticsearch/
  30. This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit: http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.