Resiliency in Elasticsearch and Lucene

Boaz Leskes and Igor Motov

Resiliency (re·sil·ience), noun
1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.

Failures happen

Why is it so important now?

Average Elasticsearch cluster growth
[Figure: a large cluster circa 2011 compared with a large cluster circa 2015]

Failure rates and number of nodes

[Figure: failure types arranged by speed (slow to fast) and scope (node to cluster): kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts, software bugs]

Work in progress

Pulling the plug

Replicas and Transaction Log
[Diagram: a primary shard and a replica shard, each with its own Lucene index, transaction log, and Lucene buffer]

Transaction Log
• Transaction Log
  – stores every operation (create/update/delete)
  – fsync-ed every 5 sec (configurable)
• Lucene Segments
  – fsync-ed when transaction log is full (every 30 min or 200mb)
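
The interplay between the transaction log and Lucene segments can be sketched as a toy write-ahead log. This is a minimal illustration under stated assumptions, not Elasticsearch's implementation: the `ToyTranslog` class, its method names, and the JSON line format are hypothetical, and the "log is full" threshold is counted in operations rather than in minutes or megabytes.

```python
import json
import os
import tempfile


class ToyTranslog:
    """Hypothetical write-ahead log: append every operation, fsync on demand."""

    def __init__(self, path, flush_threshold_ops=3):
        self.path = path
        self.flush_threshold_ops = flush_threshold_ops
        self.pending_ops = 0  # operations logged since the last flush
        self.flushes = 0      # number of simulated segment commits
        self._file = open(path, "a", encoding="utf-8")

    def append(self, op_type, doc_id, source=None):
        # Every create/update/delete is written to the log.
        record = {"op": op_type, "id": doc_id, "source": source}
        self._file.write(json.dumps(record) + "\n")
        self.pending_ops += 1
        if self.pending_ops >= self.flush_threshold_ops:
            self.flush()

    def sync(self):
        # Stand-in for the periodic fsync of the transaction log.
        self._file.flush()
        os.fsync(self._file.fileno())

    def flush(self):
        # Stand-in for fsync-ing Lucene segments: once segments are durable,
        # the logged operations are no longer needed, so the log is emptied.
        self.sync()
        self._file.truncate(0)
        self._file.seek(0)
        self.pending_ops = 0
        self.flushes += 1

    def replay(self):
        # After a crash, operations still in the log would be re-applied.
        self.sync()
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


# Usage: two operations stay in the log; the third one triggers a flush.
log_path = os.path.join(tempfile.mkdtemp(), "translog.log")
translog = ToyTranslog(log_path, flush_threshold_ops=3)
translog.append("create", "1", {"title": "first"})
translog.append("update", "1", {"title": "second"})
ops_before_flush = translog.replay()
translog.append("delete", "1")
ops_after_flush = translog.replay()
```

The point of the split is that appending to a log is cheap, while making segments durable is expensive, so the log covers the window between segment fsyncs.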

Hard Disk Failures

Hard Drive Failures
• Complete failure
• Running out of disk space
• Data corruption

Complete Disk Failures
• Automatic shard failover
• Replicas

Multi data paths
[Diagram: four disks (Disk 1 to Disk 4) holding four shards (shard 1 to shard 4)]

Complete Disk Failures (WIP)
• Current multi-path setup stripes shard data across multiple disks
• Disk loss impacts all shards on node
• Reduce failure impact by using one disk per shard (v2.0, #9498)
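
The difference in failure impact between the two layouts can be made concrete with a small model. This is only a sketch: both function names are hypothetical, and the round-robin assignment of shards to disks is an assumption made for illustration, not a description of how Elasticsearch places shards.

```python
def shards_lost_striped(num_shards, num_disks, failed_disk):
    """Striped layout: every shard keeps some files on every disk,
    so losing any one disk damages all shards on the node."""
    if not 0 <= failed_disk < num_disks:
        raise ValueError("failed_disk out of range")
    return set(range(num_shards))


def shards_lost_one_disk_per_shard(num_shards, num_disks, failed_disk):
    """One disk per shard (assigned round-robin here, by assumption):
    only the shards living on the failed disk are lost."""
    if not 0 <= failed_disk < num_disks:
        raise ValueError("failed_disk out of range")
    return {s for s in range(num_shards) if s % num_disks == failed_disk}
```

With four shards on four disks, losing one disk costs all four shards in the striped model and a single shard in the one-disk-per-shard model.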

Running out of Disk Space
• Can lead to truncated files and thus corruption
• Easy to anticipate by monitoring
• Disk-space aware allocation decider
  – added in 0.90.4
  – enabled by default since 1.3.5
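
Monitoring disk usage and refusing new shards above a threshold can be sketched as follows. The function names are hypothetical, and the 85% threshold is an assumed example value; this is not the allocation decider's actual logic, only the general idea of checking usage before placing more data on a disk.

```python
import shutil


def disk_used_fraction(path):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def allocation_allowed(used_fraction, low_watermark=0.85):
    """Return True if a new shard may still be placed on this disk.
    The 0.85 default is an assumption chosen for illustration."""
    return used_fraction < low_watermark


# Usage: check the current working directory's filesystem.
current_fraction = disk_used_fraction(".")
can_allocate_here = allocation_allowed(current_fraction)
```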

Data Corruption - the bit in the haystack
[Diagram: file content (10GB)]

Data Corruption - Checksums
[Diagram: file content (10GB) followed by a footer + checksum]
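
A checksum footer of this kind can be sketched in a few lines. The layout below (content followed by a 4-byte big-endian CRC32) is a simplified assumption for illustration and is not Lucene's actual footer format; `add_footer` and `verify` are hypothetical helper names.

```python
import struct
import zlib

# Simplified footer: a single 4-byte big-endian CRC32 of the content.
FOOTER = struct.Struct(">I")


def add_footer(content: bytes) -> bytes:
    """Append a CRC32 checksum of the content as a footer."""
    return content + FOOTER.pack(zlib.crc32(content))


def verify(data: bytes) -> bool:
    """Recompute the checksum over the content and compare it to the footer."""
    if len(data) < FOOTER.size:
        return False
    content, footer = data[:-FOOTER.size], data[-FOOTER.size:]
    (expected,) = FOOTER.unpack(footer)
    return zlib.crc32(content) == expected


# Usage: a single flipped bit anywhere in the content is detected.
stored = add_footer(b"segment data" * 1000)
corrupted = bytearray(stored)
corrupted[100] ^= 0x01
```

Verification has to read the whole file, which is why it matters when the check runs (on open, during replication, and so on).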

Data Corruption
• Elasticsearch automatically checks checksums of
  – Small files on index open (since v1.3.0)
  – All files during replication, relocation, snapshot, and restore (since v1.3.3 - v1.4.0)
  – Transaction log (since v1.4.0)
• Checksums are also used to identify entire segments, to reduce the chance of hash collisions (since v1.4.0)

Data Corruption (WIP)
• Checksums of metadata files (coming in v1.5.0)
• Support validation of checksum on all files when node starts (v2.0.0, #9183)
• Make validation during merge operation more efficient (v2.0.0, LUCENE-5894)
• Add per-segment/per-commit ids (v2.0.0, LUCENE-5895)
• Prevent use of known-bad Java versions (v1.5.0, #7580)

Cluster Issues

Why do nodes leave the cluster?
• Complete node failure
• Unresponsive nodes
• Network failures

Unresponsive Nodes

Biggest Memory User - Field Data
• sorting
• aggregations
• doc["foo"] in scripts
• Parent-child id cache
  – has_child/has_parent queries

Circuit Breakers
• Estimate size of the field data for each query and fail the query if it tries to load too much data
  – field data (since v1.0.0)
  – parent-child (since v1.1.0)
  – some aggregation structures (since v1.4.0)
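
The estimate-then-fail behaviour can be sketched with a small accounting object. `ToyCircuitBreaker` and `CircuitBreakingError` are hypothetical names, and the byte counts are made up; the sketch only shows the idea of rejecting a request before it loads data, not how Elasticsearch computes its estimates.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would push memory use past the limit."""


class ToyCircuitBreaker:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, estimated_bytes, label):
        # Check the estimate BEFORE loading anything, so the query fails
        # fast instead of the node running out of memory.
        new_total = self.used_bytes + estimated_bytes
        if new_total > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {new_total} bytes, "
                f"limit is {self.limit_bytes}"
            )
        self.used_bytes = new_total

    def release(self, released_bytes):
        # Called when the loaded data is evicted or the request finishes.
        self.used_bytes = max(0, self.used_bytes - released_bytes)


# Usage: the second request is rejected and the accounting is unchanged.
breaker = ToyCircuitBreaker(limit_bytes=100)
breaker.add_estimate(60, "sort on field a")
try:
    breaker.add_estimate(50, "aggregation on field b")
    tripped = False
except CircuitBreakingError:
    tripped = True
```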

Doc values
• On-disk, low-memory alternative to field data
• Significant performance improvements in v1.4.0

OOM Resiliency (WIP)
• Add hard limit on from/size (v1.5.0, #9311)
• Add hit size circuit breaker (v1.5.0, #9310)
• Prevent combinatorial explosion in aggregations (v2.0.0, #8081)
• Smarter filter caching (LUCENE-6303, ES 2.0)

Dedicated Master Nodes
[Diagram: data nodes configured with node.master: false and node.data: true; three dedicated master nodes (master 1, master 2, master 3) configured with node.master: true and node.data: false]

Network Issues

Partitions & partial knowledge
[Diagram: a master node (M) and several other nodes]

Remember to set minimum_master_nodes
• Set discovery.zen.minimum_master_nodes: (N/2 + 1) in elasticsearch.yml, where N is the number of master-eligible nodes and the division rounds down
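
The quorum formula and the reason it prevents two masters during a partition can be checked with a short sketch. The function names are hypothetical; the only thing taken from the slide is the N/2 + 1 rule, applied with integer division.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def sides_that_can_elect(partition_sizes, master_eligible_nodes):
    """Given how many master-eligible nodes ended up on each side of a
    network partition, return the sides large enough to elect a master."""
    quorum = minimum_master_nodes(master_eligible_nodes)
    return [size for size in partition_sizes if size >= quorum]
```

With three master-eligible nodes the quorum is 2, so a 2/1 split leaves exactly one side able to elect a master. With four nodes the quorum is 3, so an even 2/2 split leaves neither side able to elect one, which is preferable to both sides electing their own.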

Partitions & partial knowledge
[Diagram: a master node (M) and several other nodes]

Improved Zen Discovery
• Significant improvements in v1.4.0
  – Gossip on master loss
  – bigger ping outreach
  – resiliency to stale gossip
  – better two-masters resolution
  – faster failure detection

Improving Zen Discovery (WIP)
• Prevent setting incorrect minimum_master_nodes (v1.5.0, #8321, #9051)
• Refuse revived master (v1.5.0)

External pressure
- bounded queues and thread pools (<0.20)
- time out long-running queries (WIP, PR #9156)
- index throttling (v1.2.0, #6066)
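
The idea behind a bounded queue is that work is rejected once the backlog is full, instead of piling up until memory runs out. A minimal sketch using Python's standard `queue` module follows; `BoundedWorkQueue` and `RejectedError` are hypothetical names and do not mirror Elasticsearch's thread pool classes.

```python
import queue


class RejectedError(Exception):
    """Raised when the backlog is full and new work is turned away."""


class BoundedWorkQueue:
    def __init__(self, max_pending):
        self._queue = queue.Queue(maxsize=max_pending)

    def submit(self, task):
        # put_nowait raises queue.Full instead of blocking the caller.
        try:
            self._queue.put_nowait(task)
        except queue.Full:
            raise RejectedError("queue is full, rejecting task") from None

    def take(self):
        # A worker thread would call this to pick up the next task.
        return self._queue.get_nowait()

    def pending(self):
        return self._queue.qsize()


# Usage: the third submission is rejected because only two may wait.
work = BoundedWorkQueue(max_pending=2)
work.submit("search-1")
work.submit("search-2")
try:
    work.submit("search-3")
    rejected = False
except RejectedError:
    rejected = True
```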

Known Unknowns
- Simulate disruptions (v1.4.0, #7492)
- Simulate corruption (v1.3.0, #5924)
- Reproducible evil
- User info is critical

It’s an ongoing effort
• Check the progress on our resiliency status page
  – http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/
• Or search for issues labeled “resiliency” on GitHub
  – https://github.com/elasticsearch/elasticsearch/

Thank you!
[email protected], [email protected]
@bleskes, @imotov

This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License (CC-BY-ND 4.0). To view a copy of this license, visit: http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
