Slide 1

Slide 1 text

Resiliency in Elasticsearch and Lucene
Boaz Leskes and Igor Motov

Slide 2

Slide 2 text

Resiliency (re·sil·ience), noun
1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.
CC-BY-ND 4.0

Slide 3

Slide 3 text

Failures happen

Slide 4

Slide 4 text

Why is it so important now?

Slide 5

Slide 5 text

Average Elasticsearch cluster growth
[Figure: large cluster circa 2011]

Slide 6

Slide 6 text

Average Elasticsearch cluster growth
[Figure: large cluster circa 2011 next to large cluster circa 2015]

Slide 7

Slide 7 text

Failure rates × number of nodes

Slide 8

Slide 8 text

[Diagram: failure types arranged by speed (slow to fast) and scope (node to cluster): kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts]

Slide 9

Slide 9 text

[Diagram: failure types arranged by speed (slow to fast) and scope (node to cluster): kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts; overlaid with SOFTWARE BUGS]

Slide 10

Slide 10 text

[Diagram: failure types arranged by speed (slow to fast) and scope (node to cluster): kill -9, dead disk, corruption, long GC, master gone, network disconnects, timeouts]

Slide 11

Slide 11 text

Work in progress

Slide 12

Slide 12 text

Pulling the plug

Slide 13

Slide 13 text

Replicas and Transaction Log
[Diagram: a primary shard and a replica shard, each with a Lucene index, a transaction log, and a Lucene buffer]

Slide 14

Slide 14 text

Replicas and Transaction Log
[Diagram: a primary shard and a replica shard, each with a Lucene index, a transaction log, and a Lucene buffer]

Slide 15

Slide 15 text

Replicas and Transaction Log
[Diagram: a primary shard and a replica shard, each with a Lucene index, a transaction log, and a Lucene buffer]

Slide 16

Slide 16 text

Replicas and Transaction Log
[Diagram: a primary shard and a replica shard, each with a Lucene index, a transaction log, and a Lucene buffer]

Slide 17

Slide 17 text

Transaction Log
• Transaction Log
  – stores every operation (create/update/delete)
  – fsync-ed every 5 sec (configurable)
• Lucene Segments
  – fsync-ed when transaction log is full (every 30 min or 200 MB)
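The idea of logging every operation and fsync-ing on an interval can be sketched as follows. This is a toy illustration, not Elasticsearch's translog: the class name `TransactionLog`, the tab-separated record layout, and the `replay` helper are assumptions made for the example.

```python
import os
import tempfile
import time


class TransactionLog:
    """Toy append-only log that fsyncs at most once per sync_interval seconds."""

    def __init__(self, path, sync_interval=5.0):
        self._file = open(path, "ab")
        self._sync_interval = sync_interval
        self._last_sync = time.monotonic()
        self.unsynced_ops = 0  # operations written but not yet fsync-ed

    def append(self, op, doc_id, source=""):
        # Record the operation (create/update/delete) in the log.
        line = f"{op}\t{doc_id}\t{source}\n".encode("utf-8")
        self._file.write(line)
        self.unsynced_ops += 1
        if time.monotonic() - self._last_sync >= self._sync_interval:
            self.sync()

    def sync(self):
        # Flush Python's buffer, then ask the OS to persist the data to disk.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._last_sync = time.monotonic()
        self.unsynced_ops = 0

    def close(self):
        self.sync()
        self._file.close()


def replay(path):
    """Read the logged operations back, e.g. after a crash."""
    with open(path, "rb") as f:
        return [tuple(line.decode("utf-8").rstrip("\n").split("\t")) for line in f]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "translog")
    log = TransactionLog(path, sync_interval=3600)
    log.append("create", "1", "{}")
    log.append("delete", "1")
    log.close()
    return replay(path)
```

Operations appended between two syncs are the ones at risk if the plug is pulled, which is why the interval is a trade-off between durability and write throughput.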

Slide 18

Slide 18 text

Hard Disk Failures

Slide 19

Slide 19 text

Hard Drive Failures
• Complete failure
• Running out of disk space
• Data corruption

Slide 20

Slide 20 text

Complete Disk Failures
• Automatic shard failover
• Replicas

Slide 21

Slide 21 text

Multi data paths
[Diagram: Disk 1, Disk 2, Disk 3, Disk 4; shard 1, shard 2, shard 3, shard 4]

Slide 22

Slide 22 text

Multi data paths
[Diagram: Disk 1, Disk 2, Disk 3, Disk 4; shard 1, shard 2, shard 3, shard 4]

Slide 23

Slide 23 text

Multi data paths
[Diagram: Disk 1, Disk 2, Disk 3, Disk 4; shard 1, shard 2, shard 3, shard 4]

Slide 24

Slide 24 text

Complete Disk Failures (WIP)
• Current multi-path setup stripes shard data across multiple disks
• Disk loss impacts all shards on node
• Reduce failure impact by using one disk per shard (v2.0, #9498)
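The difference in failure impact between the two layouts can be illustrated with a small sketch. The function names and the round-robin assignment are assumptions made for the example; they do not describe how Elasticsearch actually places shard files.

```python
def striped_layout(shards, disks):
    # Striping: every shard has files on every disk.
    return {shard: set(disks) for shard in shards}


def one_disk_per_shard(shards, disks):
    # Each shard lives on a single disk (round-robin is an assumption here).
    return {shard: {disks[i % len(disks)]} for i, shard in enumerate(shards)}


def shards_lost(layout, failed_disk):
    # A shard is lost if any of its files were on the failed disk.
    return sorted(s for s, used in layout.items() if failed_disk in used)


shards = ["shard 1", "shard 2", "shard 3", "shard 4"]
disks = ["disk 1", "disk 2", "disk 3", "disk 4"]

lost_striped = shards_lost(striped_layout(shards, disks), "disk 2")
lost_single = shards_lost(one_disk_per_shard(shards, disks), "disk 2")
```

With striping, losing `disk 2` takes out all four shards; with one disk per shard, only the shard placed on that disk is affected.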

Slide 25

Slide 25 text

Running out of Disk Space
• Can lead to truncated files and thus corruption
• Easy to anticipate by monitoring
• Disk-space aware allocation decider
  – added in 0.90.4
  – enabled by default since 1.3.5
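A disk-space aware allocation decision can be sketched roughly as below. This is a simplification under stated assumptions: the function name `can_allocate`, the use of projected usage, and the 0.85 threshold are illustrative and are not taken from the slides.

```python
def can_allocate(used_bytes, total_bytes, shard_bytes, watermark=0.85):
    """Return True if placing the shard keeps disk usage at or below the watermark.

    `watermark` is a fraction of total disk capacity (hypothetical default).
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    projected_usage = (used_bytes + shard_bytes) / total_bytes
    return projected_usage <= watermark
```

For example, a node at 70% usage can take a shard worth 10% of its disk under this rule, while a node at 80% usage cannot.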

Slide 26

Slide 26 text

Data Corruption - the bit in the haystack
[Diagram: file content (10GB)]

Slide 27

Slide 27 text

Data Corruption - Checksums
[Diagram: file content (10GB) followed by footer + checksum]

Slide 28

Slide 28 text

Data Corruption
• Elasticsearch automatically checks checksums of
  – Small files on index open (since v1.3.0)
  – All files during replication, relocation, snapshot, and restore (since v1.3.3 - v1.4.0)
  – Transaction log (since v1.4.0)
• Checksums are also used to identify entire segments, reducing the chance of hash collisions (since v1.4.0)
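The "footer + checksum" idea can be sketched with a CRC32 appended to the file content. The 4-byte big-endian footer used here is an assumption made for brevity; the real Lucene footer layout differs.

```python
import struct
import zlib

# Simplified footer: a 4-byte big-endian CRC32 (layout is an assumption).
FOOTER = struct.Struct(">I")


def with_footer(content: bytes) -> bytes:
    """Append a checksum footer to the content."""
    return content + FOOTER.pack(zlib.crc32(content))


def verify(data: bytes) -> bool:
    """Recompute the checksum over the content and compare it to the footer."""
    if len(data) < FOOTER.size:
        return False
    content, footer = data[:-FOOTER.size], data[-FOOTER.size:]
    (expected,) = FOOTER.unpack(footer)
    return zlib.crc32(content) == expected


data = with_footer(b"segment bytes")
# Flip a single bit in the first byte to simulate on-disk corruption.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
```

Verifying a file means reading all of it, which is presumably why small files can be checked on index open while large files are checked when they are being copied anyway.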

Slide 29

Slide 29 text

Data Corruption (WIP)
• Checksums of metadata files (coming in v1.5.0)
• Support validation of checksum on all files when node starts (v2.0.0, #9183)
• Make validation during merge operation more efficient (v2.0.0, LUCENE-5894)
• Add per-segment/per-commit ids (v2.0.0, LUCENE-5895)
• Prevent use of known-bad Java versions (v1.5.0, #7580)

Slide 30

Slide 30 text

Cluster Issues

Slide 31

Slide 31 text

Why do nodes leave the cluster?
• Complete node failure
• Unresponsive nodes
• Network failures

Slide 32

Slide 32 text

Unresponsive Nodes

Slide 33

Slide 33 text

Biggest Memory User - Field Data
• sorting
• aggregations
• doc["foo"] in scripts
• Parent-child id cache
  – has_child/has_parent queries

Slide 34

Slide 34 text

Circuit Breakers
• Estimate size of the field data for each query and fail the query if it tries to load too much data
  – field data (since v1.0.0)
  – parent-child (since v1.1.0)
  – some aggregation structures (since v1.4.0)
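The estimate-then-fail behaviour can be sketched as a simple memory accountant. The class and method names below are hypothetical and chosen for the example; they are not Elasticsearch's API.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would push estimated memory use over the limit."""


class CircuitBreaker:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, estimated_bytes, label):
        # Check the estimate BEFORE loading anything, so the query fails
        # instead of the node running out of memory.
        new_total = self.used_bytes + estimated_bytes
        if new_total > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {new_total} bytes, limit is {self.limit_bytes}"
            )
        self.used_bytes = new_total

    def release(self, nbytes):
        # Called when the data is evicted or the request finishes.
        self.used_bytes = max(0, self.used_bytes - nbytes)
```

The key design point is that a single expensive query is rejected with an error, which is recoverable, rather than letting it take the whole node down.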

Slide 35

Slide 35 text

Doc values
• On-disk, low-memory alternative to field data
• Significant performance improvements in v1.4.0

Slide 36

Slide 36 text

OOM Resiliency (WIP)
• Add hard limit on from/size (v1.5.0, #9311)
• Add hit size circuit breaker (TBD, #9310)
• Prevent combinatorial explosion in aggregations (TBD, #8081)
• Smarter filter caching (LUCENE-6303, ES 2.0)

Slide 37

Slide 37 text

Dedicated Master Nodes
• Data nodes: node.master: false, node.data: true
• Master nodes (master 1, master 2, master 3): node.master: true, node.data: false

Slide 38

Slide 38 text

Network Issues

Slide 39

Slide 39 text

Partitions & partial knowledge
[Diagram: cluster nodes, one of them marked M (master)]

Slide 40

Slide 40 text

Remember to set minimum_master_nodes
• Set discovery.zen.minimum_master_nodes: (N/2 + 1) in elasticsearch.yml
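The N/2 + 1 rule is a majority quorum, with N taken here to be the number of master-eligible nodes and the division rounded down. A minimal sketch of the arithmetic (the function names are made up for the example):

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum: floor(N / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(visible, master_eligible):
    """A partition may elect a master only if it sees a majority."""
    return visible >= minimum_master_nodes(master_eligible)
```

With 5 master-eligible nodes split 3/2 by a network partition, only the side with 3 nodes reaches the quorum, so the cluster cannot end up with two masters.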

Slide 41

Slide 41 text

Partitions & partial knowledge
[Diagram: cluster nodes, one of them marked M (master)]

Slide 42

Slide 42 text

Improved Zen Discovery
• Significant improvements in v1.4.0
  – gossip on master loss
  – bigger ping outreach
  – resiliency to stale gossip
  – better two-masters resolution
  – faster failure detection

Slide 43

Slide 43 text

Improving Zen Discovery (WIP)
• Prevent setting incorrect minimum_master_nodes (v1.5.0, #8321, #9051)
• Refuse revived master (v1.5.0)

Slide 44

Slide 44 text

External pressure

Slide 45

Slide 45 text

External pressure
- bounded queues and thread pools (<0.20)
- time out long running queries (WIP, PR #9156)
- index throttling (v1.2.0, #6066)
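The effect of a bounded queue under external pressure can be sketched with Python's standard `queue` module. The wrapper class `BoundedWorkQueue` is hypothetical and only illustrates the reject-when-full behaviour.

```python
import queue


class BoundedWorkQueue:
    """Accepts work up to a fixed capacity and rejects the rest."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def submit(self, task):
        # put_nowait raises queue.Full instead of blocking or growing unbounded.
        try:
            self._queue.put_nowait(task)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self):
        return self._queue.get_nowait()
```

Rejecting excess requests pushes back on clients early, instead of letting queued work accumulate until the node runs out of memory.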

Slide 46

Slide 46 text

Known Unknowns

Slide 47

Slide 47 text

Known Unknowns
- Simulate disruptions (v1.4.0, #7492)
- Simulate corruption (v1.3.0, #5924)
- Reproducible evil
- User information is critical

Slide 48

Slide 48 text

It’s an ongoing effort
• Check the progress on our resiliency status page
  – http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/
• Or search for issues labeled “resiliency” on GitHub
  – https://github.com/elasticsearch/elasticsearch/

Slide 49

Slide 49 text

Thank you!
boaz@elastic.co, igor@elastic.co
@bleskes, @imotov

Slide 50

Slide 50 text

This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.