Resiliency in Elasticsearch & Lucene

Boaz Leskes
September 03, 2015

As one of the most popular search engines built on Apache Lucene, Elasticsearch has to be resilient to hardware and network failures. This is why Elastic invests heavily in enabling Elasticsearch and Apache Lucene to detect and cope with increasingly complex failures. Elasticsearch’s lead developer, Boaz Leskes, will cover the recent highlights and future plans of the company’s resiliency strategy. He will walk through every level of Elasticsearch, from the lowest level of a single file, through the network connection of a single node, all the way up to distributed failures at the cluster level. Even though the talk is about possible failures and various coping strategies, participants will also get an interesting peek under the hood and learn about the inner workings of Elasticsearch.

Talk was given at a Textkernel Talks event:
http://www.meetup.com/textkernel-talks/events/224786010/

Transcript

  1. Resiliency in Elasticsearch and Lucene, six months later. Boaz Leskes (& Igor Motov)
  2. Resiliency (re·sil·ience), noun: 1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity. 2. ability to recover readily from illness, depression, adversity, or the like; buoyancy. (Slide footer: CC-BY-ND 4.0)
  3. Failures happen
  4. Why is it (even more) important now?
  5. Average Elasticsearch cluster growth [diagram: large cluster circa 2011]
  6. Average Elasticsearch cluster growth [diagram: large cluster circa 2011 next to large cluster circa 2015]
  7. Failure rates × number of nodes
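
The point of slide 7, that exposure to failure grows with cluster size, can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not in the talk: node failures are independent and each node fails with the same probability over some period. The 1% rate and the node counts are made-up illustration values.

```python
def p_any_failure(p_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes fails."""
    return 1.0 - (1.0 - p_node) ** nodes


# Hypothetical per-node failure probability of 1% over some period.
for n in (5, 50, 500):
    print(f"{n:>3} nodes -> {p_any_failure(0.01, n):.3f}")
```

Even with a small per-node rate, the chance that something in the cluster is failing at any given moment approaches certainty as the node count grows.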
  8. [diagram: failure types placed on two axes, slow to fast and node to cluster: kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts]
  9. [same diagram, with SOFTWARE BUGS added]
  10. Work in progress
  11. Pulling the plug
  12-15. Replicas and Transaction Log [diagram, built up over four slides: a primary shard and a replica shard, each with a Lucene index, a Lucene buffer, and a transaction log]
  16. Transaction Log
    • Transaction Log
      – stores every operation (create/update/delete)
      – fsync-ed every 5 sec (configurable)
        • every request (default - #11011, coming v2.0)
    • Lucene Segments
      – fsync-ed when transaction log is full (every 30 min or 512mb)
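
The durability trade-off on slide 16 (fsync on a timer versus fsync on every request) can be sketched with a toy append-only log. This is a minimal illustration, not Elasticsearch's translog: the `TransactionLog` class, the JSON-lines format, and the `sync_every_request` flag are all hypothetical names chosen for the example.

```python
import json
import os
import tempfile


class TransactionLog:
    """Toy append-only operation log with a configurable fsync policy."""

    def __init__(self, path: str, sync_every_request: bool = True):
        self._file = open(path, "ab")
        self._sync_every_request = sync_every_request

    def append(self, op: dict) -> None:
        # One JSON document per line; a real log would use a binary format.
        self._file.write(json.dumps(op).encode("utf-8") + b"\n")
        if self._sync_every_request:
            self.sync()

    def sync(self) -> None:
        # flush() hands the data to the OS; fsync() asks the OS to persist it.
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self) -> None:
        self.sync()
        self._file.close()


def replay(path: str) -> list:
    """Read back every logged operation, as one would after a crash."""
    with open(path, "rb") as f:
        return [json.loads(line) for line in f]


log_path = os.path.join(tempfile.mkdtemp(), "translog.jsonl")
log = TransactionLog(log_path, sync_every_request=True)
log.append({"op": "create", "id": "1", "doc": {"title": "hello"}})
log.append({"op": "delete", "id": "1"})
log.close()
print(replay(log_path))
```

With `sync_every_request=False` a caller would have to invoke `sync()` periodically, and operations written since the last sync could be lost if the plug is pulled.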
  17. Hard Disk Failures
  18. Hard Drive Failures
    • Complete failure
    • Running out of disk space
    • Data corruption
  19. Complete Disk Failures
    • Automatic shard failover
    • Replicas
  20-22. Multi data paths [diagram, built up over three slides: Disk 1, Disk 2, Disk 3, Disk 4 and shard 1, shard 2, shard 3, shard 4]
  23. Complete Disk Failures (WIP)
    • Current multi-path setup stripes shard data across multiple disks
    • Disk loss impacts all shards on node
    • Reduce failure impact by using one disk per shard (v2.0, #9498)
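
The difference between striping a shard over several disks and keeping each shard on one disk can be sketched as follows. The layout functions and the disk and shard names are hypothetical; they only model which shards lose a file when a single disk dies, not how Elasticsearch actually places files.

```python
def striped_layout(shards, disks, files_per_shard=4):
    """Spread each shard's files round-robin over all disks."""
    return {
        shard: {disks[i % len(disks)] for i in range(files_per_shard)}
        for shard in shards
    }


def one_disk_per_shard_layout(shards, disks):
    """Keep all files of a shard on a single disk."""
    return {shard: {disks[i % len(disks)]} for i, shard in enumerate(shards)}


def shards_hit_by(layout, failed_disk):
    """Shards that lose at least one file when `failed_disk` dies."""
    return sorted(s for s, used in layout.items() if failed_disk in used)


disks = ["disk1", "disk2", "disk3", "disk4"]
shards = ["shard1", "shard2", "shard3", "shard4"]

print("striped:", shards_hit_by(striped_layout(shards, disks), "disk2"))
print("one disk per shard:",
      shards_hit_by(one_disk_per_shard_layout(shards, disks), "disk2"))
```

In the striped layout every shard touches the failed disk, while in the one-disk-per-shard layout only the shard living on that disk is affected.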
  24. Running out of Disk Space
    • Can lead to truncated files and thus corruption
    • Easy to anticipate by monitoring
    • Disk-space aware allocation decider
      – added in 0.90.4
      – enabled by default since 1.3.5
    • Check-pointed transaction log (v2.0, #11143)
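
The idea behind a disk-space aware allocation decision can be sketched in a few lines. This is not the Elasticsearch decider itself: the function names are hypothetical, and the 85% threshold is an illustrative assumption rather than a value taken from the slides.

```python
import shutil


def fits_under_watermark(total: int, used: int, shard_size: int,
                         watermark: float = 0.85) -> bool:
    """True if the disk stays at or below the watermark after adding the shard."""
    return (used + shard_size) / total <= watermark


def can_allocate_shard(data_path: str, shard_size: int,
                       watermark: float = 0.85) -> bool:
    """Check the real disk behind `data_path` before placing a shard on it."""
    usage = shutil.disk_usage(data_path)  # named tuple: total, used, free
    return fits_under_watermark(usage.total, usage.used, shard_size, watermark)


# Would a hypothetical 10 GiB shard fit on the disk holding the current directory?
print(can_allocate_shard(".", 10 * 1024**3))
```

Refusing the allocation up front is cheaper than recovering from files truncated by a full disk.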
  25. Data Corruption - the bit in the haystack [diagram: file content (10GB)]
  26. Data Corruption - Checksums [diagram: file content (10GB) followed by footer + checksum]
  27. Data Corruption - Checksums
    • Elasticsearch automatically checks checksums of
      – Small files on index open (since v1.3.0)
      – All files during replication, relocation, snapshot, and restore (since v1.3.3 - v1.4.0)
      – Transaction log (since v1.4.0)
      – Use checksum to identify entire segments to reduce chance of hash collisions (since v1.4.0)
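
The "footer + checksum" idea from slides 26 and 27 can be sketched with a CRC32 appended to the end of a file. The 4-byte footer layout and the function names below are assumptions made for this example; this is not Lucene's actual file format.

```python
import os
import struct
import tempfile
import zlib

FOOTER = struct.Struct(">I")  # 4-byte big-endian CRC32 (assumed layout)


def write_with_checksum(path: str, payload: bytes) -> None:
    """Write the payload followed by a footer holding its CRC32."""
    with open(path, "wb") as f:
        f.write(payload)
        f.write(FOOTER.pack(zlib.crc32(payload) & 0xFFFFFFFF))


def verify_checksum(path: str) -> bool:
    """Recompute the CRC32 of the content and compare it to the footer."""
    with open(path, "rb") as f:
        data = f.read()  # fine for a demo; a 10GB file would be read in chunks
    if len(data) < FOOTER.size:
        return False
    payload, footer = data[:-FOOTER.size], data[-FOOTER.size:]
    (stored,) = FOOTER.unpack(footer)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored


path = os.path.join(tempfile.mkdtemp(), "segment.bin")
write_with_checksum(path, b"some segment data" * 1000)
intact = verify_checksum(path)

# Flip a single bit somewhere in the content: the bit in the haystack.
with open(path, "r+b") as f:
    f.seek(1234)
    byte = f.read(1)
    f.seek(1234)
    f.write(bytes([byte[0] ^ 0x01]))
corrupted_detected = not verify_checksum(path)

print(intact, corrupted_detected)
```

The check only tells you that a file is damaged, which is why it pairs with replicas: a corrupted copy can be discarded and rebuilt from a healthy one.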
  28. Data Corruption (WIP)
    • Checksums of Metadata files (coming in v1.5.0, #8010)
    • Support validation of checksum on all files when node starts (v2.0.0, #9183)
    • Make validation during merge operation more efficient (v2.0.0, LUCENE-5894)
    • Add per-segment/per-commit ids (v2.0.0, LUCENE-5895)
    • Prevent use of known-bad java versions (coming in v1.5.0, #7580)
  29. Cluster Issues
  30. Why do nodes leave the cluster?
    • Complete node failure
    • Unresponsive nodes
    • Network Failures
  31. Unresponsive Nodes
  32. Biggest Memory User - Field Data (v1.x)
    • sorting
    • aggregations
    • doc[“foo”] in scripts
    • Parent-child id cache
      – has_child/has_parent queries
  33. Circuit Breakers
    • Estimate size of the field data for each query and fail the query if it tries to load too much data
      – field data (since v1.0.0)
      – parent-child (since v1.1.0)
      – some aggregation structures (since v1.4.0)
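
The circuit breaker described on slide 33 boils down to: estimate before loading, and fail the request instead of the node. A minimal sketch follows; the class and exception names and the byte figures are hypothetical, and Elasticsearch's real breakers track memory in more detail.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would push memory use past the limit."""


class CircuitBreaker:
    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, estimated_bytes: int, label: str) -> None:
        """Account for an estimated allocation, or refuse it up front."""
        if self.used_bytes + estimated_bytes > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {self.used_bytes + estimated_bytes} bytes, "
                f"limit is {self.limit_bytes}"
            )
        self.used_bytes += estimated_bytes

    def release(self, freed_bytes: int) -> None:
        """Give back memory once the data is no longer held."""
        self.used_bytes = max(0, self.used_bytes - freed_bytes)


breaker = CircuitBreaker(limit_bytes=100 * 1024**2)       # 100 MiB budget
breaker.reserve(60 * 1024**2, "field data for 'price'")    # fits
try:
    breaker.reserve(50 * 1024**2, "field data for 'tags'")  # would exceed the budget
except CircuitBreakingError as err:
    print("query rejected:", err)
```

The rejected query fails fast while the node stays responsive, which is the trade the slide describes.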
  34. Doc values
    • On-disk low memory alternative to field data
    • Significant performance improvements in v1.4.0
    • Enabled by default for all numeric and non-analyzed fields (#10209, v2.0)
  35. OOM Resiliency (WIP)
    • Add hard limit on from/size (coming in v1.5.0, #9311)
    • Add hit size circuit breaker (coming in v2.0, #9310)
    • Prevent combinatorial explosion in aggregations (TBD, #8081, #9825)
    • Smarter filter query caching (LUCENE-6303, v2.0, #10897)
  36. Dedicate Master Nodes [diagram: data nodes configured with node.master: false and node.data: true; three master nodes (master 1, master 2, master 3) configured with node.master: true and node.data: false]
  37. Network Issues
  38. Partitions & partial knowledge [diagram: cluster nodes, one of them marked M (master)]
  39. Remember to set minimum_master_nodes
    • Set discovery.zen.minimum_master_nodes: (N/2 + 1) in elasticsearch.yml
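
The N/2 + 1 rule from slide 39 is a majority quorum, with N read here as the number of master-eligible nodes and the division taken as integer division (both are the usual reading, stated as assumptions). A short sketch, using a hypothetical helper name, shows the values and checks that no two sides of a partition can both reach the quorum:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: integer division of N by 2, plus 1."""
    return master_eligible // 2 + 1


for n in (1, 2, 3, 4, 5):
    print(f"N={n} -> minimum_master_nodes={minimum_master_nodes(n)}")


def split_brain_possible(master_eligible: int) -> bool:
    """Can a two-way partition leave a quorum on both sides?"""
    quorum = minimum_master_nodes(master_eligible)
    return any(
        side_a >= quorum and (master_eligible - side_a) >= quorum
        for side_a in range(master_eligible + 1)
    )


print(any(split_brain_possible(n) for n in range(1, 50)))
```

Because two disjoint groups cannot both hold more than half of the master-eligible nodes, at most one side of a network partition can elect a master.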
  40. Partitions & partial knowledge [diagram: cluster nodes, one of them marked M (master)]
  41. Improved Zen Discovery
    • Significant improvements in v1.4.0
      – Gossip on master loss
      – bigger ping outreach
      – resiliency to stale gossip
      – better two-masters resolution
      – faster failure detection
  42. Improving Zen Discovery (WIP)
    • Prevent setting incorrect minimum_master_nodes (coming in v1.5.0, #8321, #9051)
    • Refuse revived master (coming in v1.5.0, #9632)
    • Diff based ClusterState publishing (v2.0, #10212)
  43. External pressure
  44. External pressure
    - bounded queues and thread pools (<0.20)
    - time out long running queries (WIP, PR #9156)
    - index throttling (v1.2.0, #6066)
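
The first item on slide 44, bounded queues, can be sketched with Python's standard `queue` module. The `BoundedWorkQueue` wrapper is a hypothetical name for this example; the point is only that a full queue rejects new work instead of letting the backlog grow until the node runs out of memory.

```python
import queue


class BoundedWorkQueue:
    """Accept work up to a fixed capacity; reject the rest immediately."""

    def __init__(self, capacity: int):
        self._queue = queue.Queue(maxsize=capacity)

    def submit(self, task) -> bool:
        try:
            self._queue.put_nowait(task)
            return True
        except queue.Full:
            # Push back on the caller rather than buffering without limit.
            return False

    def pending(self) -> int:
        return self._queue.qsize()


work = BoundedWorkQueue(capacity=2)
results = [work.submit(f"search-{i}") for i in range(3)]
print(results, work.pending())
```

A rejected request is visible to the client, which can retry or back off, whereas an unbounded queue hides the overload until the node itself becomes unresponsive.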
  45. Known Unknowns
  46. Known Unknowns
    - Simulate disruptions (v1.4.0, #7492)
    - Simulate corruption (v1.3.0, #5924)
    - Reproducible evil
    - Users info is critical
  47. It’s an ongoing effort
    • Check the progress on our resiliency status page
      – www.elastic.co/guide/en/elasticsearch/resiliency/current/
    • Or search for issues labeled “resiliency” on github
      – https://github.com/elastic/elasticsearch/
  48. Thank you! boaz@elastic.co, igor@elastic.co; @bleskes, @imotov
  49. This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit: http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.