Scaling Elasticsearch: Designing for Performance and Availability

OpenWest 2015
Tyler Langlois

www.elastic.co Copyright Elastic 2015 Copying, publishing and/or distributing without written permission is strictly prohibited

Speaker Bio
• Infrastructure Engineering @ Elastic
  ◦ Previous: Qualtrics, Sandia National Laboratories, Blue Coat Systems, BYU
• Background in systems, security, *nix, and a smattering of coding experience (scripting, web dev, devops)
• Happy as long as I’m automating things in a terminal
• Permanent mental bindings for vim and zsh
leothrix tylerjl tjll.net

What We’ll Cover - Overview
• When & Why Scale?
• Elasticsearch Scalability Primer
• Basic Mechanisms
  ◦ Node Types
  ◦ Sharding
  ◦ Adding Nodes
• Going Further
  ◦ Per-host resource tuning
  ◦ Optimizing Use
• Q&A

When?
• Different for each use case
  ◦ Measure and tune for the best balance between horizontal scale and performance
• Places to look for bottlenecks:
  ◦ Memory - resident dataset occupancy in the JVM heap
  ◦ Load - look at hot threads for query load, indexing load, etc.
  ◦ I/O - high CPU iowait indicates that I/O needs to be scaled out
• TBH, the most common sign is Java OOM errors in the Elasticsearch logs
• How to find these?
  ◦ Memory pressure: curl '[elasticsearch]:9200/_nodes/stats/jvm?human&format=yaml'
  ◦ Dashboards: Marvel, bigdesk, kopf, paramedic, & more
  ◦ $ tail -F /var/log/elasticsearch/$clustername.log
• tl;dr: [before] you have performance or stability issues
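
A minimal sketch of the memory-pressure check above, assuming the response of `_nodes/stats/jvm` has been fetched as JSON (the API's default when `format=yaml` is omitted) and parsed into a dict. The function name, the sample data, and the 75% threshold are illustrative assumptions, not recommendations from the talk.

```python
"""Flag nodes under heap pressure from a parsed _nodes/stats/jvm response."""


def nodes_under_heap_pressure(stats, threshold_percent=75):
    """Return (node_name, heap_used_percent) pairs at or above the threshold.

    `stats` is the JSON body of GET /_nodes/stats/jvm parsed into a dict.
    The 75% default is an illustrative assumption, not an official limit.
    """
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            flagged.append((node.get("name", node_id), used))
    # Worst offenders first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hand-written sample in the shape of the API response (values are made up).
sample_stats = {
    "nodes": {
        "a1": {"name": "es-data-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "b2": {"name": "es-data-2", "jvm": {"mem": {"heap_used_percent": 42}}},
        "c3": {"name": "es-data-3", "jvm": {"mem": {"heap_used_percent": 78}}},
    }
}

if __name__ == "__main__":
    for name, used in nodes_under_heap_pressure(sample_stats):
        print(f"{name}: heap {used}% used")
```

Running something like this on a schedule is one way to act on the "[before]" advice: nodes that sit near the top of the heap for long stretches are candidates for scaling out before OOM errors show up in the logs.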

Why?
• Meant to scale horizontally by design
• Native horizontal scaling means:
  ◦ Fewer cluster state headaches (master election, distributed queries, etc.)
  ◦ First-class sharding and replicas
• Cheaper clusters (scale out, not up)
• High availability (master election, data redundancy)
• Eas(y|ier) performance scaling
• tl;dr: when you want more performance or fault tolerance
Don’t stress! We’ll make horizontal scaling easy.

Before Diving In
There’s some really basic stuff to mention first that applies globally:
• Increase the JVM heap to 50% of available RAM, capping at 30GB or so
  ◦ [Compressed pointers]: staying below roughly 32GB lets the JVM keep using compressed object pointers
• Increase file descriptor limits (set both hard and soft to 65535)
• bootstrap.mlockall: true (avoid swapping the ES JVM)
• Understand your environment and configure appropriately
  ◦ Appropriate I/O scheduler, network links, cores > MHz
  ◦ On AWS?
    ▪ discovery.zen.ping.multicast.enabled: false
    ▪ Mount a volume with good IOPS for the ES data volume (point path.data at a RAID)
• Choose a unique cluster.name
• Google for “elasticsearch pre-flight checklist”
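
The heap rule of thumb above is simple arithmetic; a small sketch, assuming RAM is measured in GB and using the slide's "30GB or so" as the default cap. The function name is hypothetical.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=30):
    """Half of the machine's RAM, capped at `cap_gb`.

    The other half is left to the OS, whose filesystem cache Lucene
    relies on heavily. The 30GB default mirrors the slide's guidance.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, cap_gb)


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        print(f"{ram}GB RAM -> {recommended_heap_gb(ram):g}GB heap")
```

On Elasticsearch releases of this era the resulting value was typically applied through the `ES_HEAP_SIZE` environment variable.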

Sharding
• Each index in Elasticsearch is composed of 1 to n shards
• Each shard can be backed by n replicas
• Documents are assigned to shards according to: shard = hash(routing) % number_of_primary_shards, where routing usually equals the document’s _id (customizable)
• Replicas are simply copies of primaries
• Documents are written to primaries and then replicated to replicas. Reads can occur on either.
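
The routing formula above can be sketched in a few lines. This is an illustration only: `zlib.crc32` is a stand-in hash chosen because it is deterministic and in the standard library, not the hash function Elasticsearch actually uses, and the helper names and document ids are made up.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards.

    crc32 is a stand-in hash for illustration; Elasticsearch uses its own.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def moved_documents(doc_ids, old_shards, new_shards):
    """Ids whose target shard would differ if the primary shard count changed."""
    return [
        doc_id
        for doc_id in doc_ids
        if pick_shard(doc_id, old_shards) != pick_shard(doc_id, new_shards)
    ]


# Hypothetical document ids; routing defaults to the document's _id.
doc_ids = [f"doc-{i}" for i in range(100)]

if __name__ == "__main__":
    moved = moved_documents(doc_ids, old_shards=3, new_shards=4)
    print(f"{len(moved)} of {len(doc_ids)} documents would land on a different shard")
```

Because the modulus is the number of primary shards, changing that number would send existing documents to different shards, which is why the shard count is fixed once an index is created.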

Indexes & Shards - Primaries
• Indexes are broken into settings.number_of_shards shards (default 5)
• Example with n = 3
• P = Primary shard
• Primary shards can serve both reads and writes
• Elasticsearch handles all of this under the covers

Indexes & Shards - Replicas
• Each primary shard is replicated into settings.number_of_replicas copies (default 1)
• Example with three shards and one replica
• R = Replica shard
• Replicas can serve reads; they’re not just idle hot standbys
• number_of_replicas is mutable at any time

Cluster State
• Increase replica count for better redundancy and read balancing
• Example with three shards, two replicas, and three nodes
Notes:
• Elasticsearch knows to place a replica on a different node from the primary and the other replicas of the same shard
• Moving replicas and primaries is handled automatically (set shards and replicas, add the needed number of nodes, and forget [kinda])
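
A rough sketch of the arithmetic behind the example above. It relies on one rule only: two copies of the same shard are never placed on the same node. It deliberately ignores everything else that affects allocation (disk thresholds, awareness attributes, node failures), so treat the health value as a first approximation. The function names are hypothetical.

```python
def total_shard_copies(primaries, replicas):
    """Every primary plus `replicas` copies of each primary."""
    return primaries * (1 + replicas)


def expected_health(replicas, data_nodes):
    """Simplified cluster health based on node count alone.

    Each shard needs (1 + replicas) distinct nodes to place all of its
    copies. With fewer nodes some replicas stay unassigned (yellow);
    with no data nodes even primaries cannot be assigned (red).
    """
    if data_nodes < 1:
        return "red"
    if data_nodes >= replicas + 1:
        return "green"
    return "yellow"


if __name__ == "__main__":
    # The slide's example: three shards, two replicas, three nodes.
    print(total_shard_copies(3, 2), expected_health(replicas=2, data_nodes=3))
```

With three nodes and two replicas every node ends up holding one copy of every shard, so any single node can be lost without losing data.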

Minor Caveats
This process is magical and wonderful; however, there are things you should be aware of:
• After settings.number_of_shards is set, it’s permanent for that index
  ◦ This seems like a biggie at first, but either re-indexing or naturally rotating indices (daily logs, etc.) mitigates this
  ◦ number_of_replicas can be changed at any time (factor in the time needed to copy shard data over)
• Need to take some minor steps to avoid split brain, which is not a good time
  ◦ In our example 3-node cluster, setting discovery.zen.minimum_master_nodes: 2 works
  ◦ Varies depending on cluster architecture (more on that later)
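
The value of 2 for the three-node example follows from the usual quorum rule, a majority of master-eligible nodes: (n / 2) + 1 with integer division. A minimal sketch, with a hypothetical function name:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (n / 2) + 1, using integer division.

    Requiring a majority means two halves of a partitioned cluster
    cannot both elect a master, which is what prevents split brain.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible node(s) -> minimum_master_nodes = {minimum_master_nodes(n)}")
```

Only master-eligible nodes count toward n, which is why the right value depends on how the cluster is laid out.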

Sharding Demo
Pray to the demo gods, for the terminal is dark and full of terrors

Sharding
Provides easy:
• Scaling
  ◦ Overallocate shards to begin with (nodes can hold multiple primaries); as more nodes are added, ES rebalances and distributes load across shards
• High Availability
  ◦ Replicas are promoted to primaries as soon as a primary is lost
  ◦ Change # of replicas at any time
  ◦ If a node doesn’t return, the missing (unallocated) replicas are re-created by copying shard data to other nodes, restoring a good cluster state
Sharding is the basic mechanism for horizontal scaling; use it.

Sharding - cont’d
Strategies
• number_of_shards
  ◦ The temptation is to create 1,000 shards and never worry about the number of shards again
  ◦ Don’t do it!
  ◦ Shards have overhead; you need to find a balance
  ◦ Test rates with a single node; extrapolate
• Things to remember:
  ◦ Reads/searches can be served by primaries or replicas, so leverage replicas for easier search scaling
  ◦ ES will allocate shards for optimal distribution; a yellow cluster state means some replicas are unassigned, and a red state means at least one primary is unassigned

Sharding - Allocation
You can trust Elasticsearch to make smart decisions when allocating shards. Things to be aware of:
• Arbitrary node tags can be defined and considered when allocating shards:
  ◦ node.aisle: east
  ◦ cluster.routing.allocation.awareness.attributes: aisle
• Disk space magic
  ◦ By default: cluster.routing.allocation.disk.threshold_enabled: true
  ◦ Note that filling disks will start making Elasticsearch nervous for you

Why and How Would One Scale?
• Our three-node example cluster has some failure cases:
  ◦ Load on data nodes - if we start heavily querying data nodes and lose the heartbeat from a master, things get confused
  ◦ If we put too much data into the cluster, we can exhaust RAM and get OOM errors
  ◦ Data at rest - again, too much data may either 1) exhaust filesystem space or 2) demand too many I/O operations from hardware
We’ll address all these!

Node Types
Cluster Design… made easy!
• node.data: true
  ◦ This node allows shards to be allocated to it, thus holding data
• node.data: false
  ◦ Remains a member of the cluster, but does not accept primaries or replicas
• node.master: true
  ◦ Node is master-eligible (the cluster has only one active master at a time, but can have multiple eligible nodes)
• node.master: false
  ◦ Does not enter the pool of master-eligible nodes, and thus can never become a master

Master Nodes
Data nodes are self-explanatory; what do master nodes do?
• Coordinate cluster operations
• Index creation, shard allocation, etc.
• Cluster metadata (fields, etc.)

Node Types
This means we can design each node for a specific purpose with node.data and node.master:
• node.master: true, node.data: true
  ◦ Multi-purpose node: can act as a master and hold data as well. An example use case would be our simple three-node cluster.
• node.master: false, node.data: true
  ◦ Dedicated data node. Will never serve as a master; it only holds shard data.
• node.master: true, node.data: false
  ◦ Dedicated master node. Serves only as a master or master-eligible node and does not allow shards to be allocated to itself.
• node.master: false, node.data: false
  ◦ Client node. Holds no data and has no master responsibilities, but can query the cluster, send data (serve as an indexer), etc.
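
The four combinations above amount to a two-flag lookup table. A small sketch, where the function name and the role labels are just the slide's wording, not identifiers from Elasticsearch:

```python
def node_role(master, data):
    """Map the node.master / node.data settings to the role described above."""
    roles = {
        (True, True): "multi-purpose",
        (False, True): "dedicated data",
        (True, False): "dedicated master",
        (False, False): "client",
    }
    return roles[(bool(master), bool(data))]


if __name__ == "__main__":
    for master in (True, False):
        for data in (True, False):
            print(f"node.master: {str(master).lower()}, "
                  f"node.data: {str(data).lower()} -> {node_role(master, data)}")
```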

Node Types - cont’d
Why split node responsibilities? Separation of concerns for Elasticsearch processes.
• Work-heavy nodes (data nodes, for example) can drop out of the cluster due to stop-the-world garbage collections. Dedicated masters keep strenuous load to a minimum to ensure availability.
• Dedicated client nodes can be used for misc. purposes (searching, indexing, etc.) without impacting data processing or master functions
• Spend money on beefy data nodes and let more, smaller nodes worry about cluster operations

Adding Nodes
Multicast vs. Unicast
• Multicast (zen)
  ◦ If you’re self-hosting and in a protected subnet, it’s up to you
  ◦ Easy cluster formation, re-discovery, etc.
  ◦ Not always best in production (random joining, broadcasts, etc.)
• Unicast
  ◦ Needs >=1 discovery IP
  ◦ After the cluster is discovered, gossip takes care of the rest
  ◦ Needs config, but less prone to surprises
  ◦ Only option for some hosting types (DO, Linode, etc.); EC2 has a plugin to manage discovery if that’s your jive
Easy!
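
A sketch of the "needs config" part of unicast discovery: a helper that renders the relevant `elasticsearch.yml` lines, using the zen discovery setting names from Elasticsearch releases of this era. The helper name, cluster name, and IP addresses are made up for illustration.

```python
import json


def render_unicast_config(cluster_name, unicast_hosts):
    """Render elasticsearch.yml lines for unicast zen discovery.

    A JSON array is also a valid YAML flow sequence, so json.dumps is
    used to format the host list.
    """
    hosts = list(unicast_hosts)
    if not hosts:
        raise ValueError("unicast discovery needs at least one host")
    lines = [
        f"cluster.name: {cluster_name}",
        "discovery.zen.ping.multicast.enabled: false",
        f"discovery.zen.ping.unicast.hosts: {json.dumps(hosts)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical cluster name and seed addresses.
    print(render_unicast_config("logs-prod", ["10.0.0.1", "10.0.0.2"]), end="")
```

The host list only needs to contain enough seed nodes for a new node to find the cluster; it does not have to list every member.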

Resource Tuning
Some notes:
• The defaults are defaults for a reason
• Don’t go turning knobs until you’ve covered the basics that handle the majority of cases
• Stuff like JVM tuning is probably overkill (use a supported JVM and run with it)
• ES trusts the OS with lots of tasks, so take the time to set proper descriptor limits, heap size, etc.
• Because of the responsibilities ES gives to the OS, tuning there can have a big effect

Resource Tuning - I/O
Under the covers, Lucene is talking to disk a lot (immutable structures). How to improve in cases of I/O thrashing?
• RAID
  ◦ Hardware-level: RAID up EC2 disks and present the array to path.data
  ◦ Software-level: give ES “path.data: /dev/sdb,/dev/sdc” and let it stripe (path.data takes directories, so in practice list the mount points of those disks)
    ▪ Be careful of striping (newer versions of ES stripe even more safely)
• I/O Scheduling
  ◦ SSD - EC2: noop, everything else: deadline
  ◦ Rotating media - leave at cfq
• When bulk indexing in a batch, set refresh_interval to -1 (revert later)
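
A sketch of the bulk-indexing tip above, assuming the index settings API (`PUT /<index>/_settings`) and a restore value of `1s`, which is the default refresh interval. The base URL, index name, and helper names are placeholders, and the function that talks to the cluster is defined but not called here.

```python
import json
import urllib.request


def refresh_settings_body(interval):
    """JSON body for the index settings API that sets refresh_interval."""
    return json.dumps({"index": {"refresh_interval": interval}})


DISABLE_REFRESH = refresh_settings_body("-1")  # before the bulk load
RESTORE_REFRESH = refresh_settings_body("1s")  # afterwards (the default)


def put_index_settings(base_url, index, body):
    """Send the body to PUT <base_url>/<index>/_settings.

    `base_url` (for example "http://localhost:9200") is an assumption
    about where the cluster is reachable.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print("before bulk load:", DISABLE_REFRESH)
    print("after bulk load: ", RESTORE_REFRESH)
```

Wrapping the bulk load in a try/finally that always sends `RESTORE_REFRESH` is a reasonable way to make sure the "revert later" step is not forgotten.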

Optimizing Use
Sometimes it’s a better idea to tune usage patterns rather than ramp up horsepower (we’ll go into each).
• Query smarter
  ◦ Filters: bitsets, excluding docs, etc.
• Managing the Data Lifecycle
  ◦ Time-series indices with automatic data expiration
  ◦ optimize, close, snapshot, delete
• Doc Values
  ◦ Perfect for things that don’t need analysis (log levels, IPs, anything .raw)
  ◦ Trade disk space for RAM

Optimizing Use - Query Smarter
If you know how to reduce your in-memory dataset, you can get much more mileage out of Elasticsearch.
• Filters
  ◦ Define the most specific filters first to drastically reduce eligible results
  ◦ Stuff like Kibana makes this easy (point and click)
• Bitsets
  ◦ ES intelligently caches filters for later re-use
  ◦ Dead simple data structure that is easily re-queried (if docs 1, 2, and 4 match a filter, [1, 1, 0, 1] is stored in memory for re-use)
• With well-designed doc schemas, some use cases don’t need full-text search at all (queries for logs from specific hosts/processes, etc.)
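
A toy version of the bitset idea above, using plain Python lists and the slide's 1-based document numbering. It only illustrates why cached filters are cheap to combine; it is not how Lucene stores them, and all names are hypothetical.

```python
def to_bitset(matching_doc_ids, num_docs):
    """One bit per document (1-based ids, as on the slide): 1 if it matches."""
    return [1 if doc_id in matching_doc_ids else 0 for doc_id in range(1, num_docs + 1)]


def intersect(bitset_a, bitset_b):
    """Combine two cached filters with a bitwise AND, with no re-evaluation."""
    return [a & b for a, b in zip(bitset_a, bitset_b)]


def matching_docs(bitset):
    """Turn a bitset back into the list of matching document ids."""
    return [doc_id for doc_id, bit in enumerate(bitset, start=1) if bit]


if __name__ == "__main__":
    # The slide's example: docs 1, 2, and 4 match the first filter.
    first_filter = to_bitset({1, 2, 4}, num_docs=4)
    second_filter = to_bitset({2, 3, 4}, num_docs=4)
    print(first_filter, second_filter, matching_docs(intersect(first_filter, second_filter)))
```

Once a filter's bitset is cached, answering "which documents match both filters" is a pass over two small arrays rather than a second search.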

Optimizing Use - Managing the Data Lifecycle
You don’t need to keep time-sensitive data forever: retire it!
• (daily|weekly|monthly) indices, optimize, close, snapshot, delete
  ◦ Logstash/fluentd can write to time-specific indices
  ◦ Once these no longer receive writes, optimize them for maximum efficiency
  ◦ Close them once they’re outside the query window
  ◦ Snapshot them to S3 eventually for long-term storage; delete if disk space is needed
  ◦ Rotate bucket objects into Glacier if you’re a ninja
• Curator is perfect for these types of tasks
• Alternatively, use expiring docs [with caution] (more expensive to detect/use)
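
A sketch of the lifecycle decision for daily indices, assuming Logstash's default `logstash-YYYY.MM.DD` naming. The age thresholds (close after 30 days, delete after 90) and the function names are invented for illustration; tools like Curator implement this kind of policy for real.

```python
from datetime import date, datetime


def index_date(index_name, prefix="logstash-"):
    """Parse the day out of a daily index name such as logstash-2015.05.08."""
    if not index_name.startswith(prefix):
        raise ValueError(f"{index_name!r} does not start with {prefix!r}")
    return datetime.strptime(index_name[len(prefix):], "%Y.%m.%d").date()


def lifecycle_action(index_name, today, close_after_days=30, delete_after_days=90):
    """Pick the lifecycle step for a daily index based on its age.

    The thresholds are illustrative assumptions; choose them from your
    own query window and disk budget.
    """
    age_days = (today - index_date(index_name)).days
    if age_days >= delete_after_days:
        return "snapshot, then delete"
    if age_days >= close_after_days:
        return "close"
    if age_days >= 1:
        return "optimize"  # no longer receiving writes
    return "keep writing"


if __name__ == "__main__":
    today = date(2015, 5, 9)
    for name in ("logstash-2015.05.09", "logstash-2015.05.08",
                 "logstash-2015.04.01", "logstash-2015.01.01"):
        print(name, "->", lifecycle_action(name, today))
```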

Optimizing Use - doc_values
Specific to some use cases, but extremely useful
• Normally, terms are analyzed (tokenized) and added to Lucene segments in that analyzed format
• This means a facet/aggregation (terms panel in K3 terminology) has to bring every possible value for a field into consideration when determining common values
• High cardinality for an analyzed field = OOM the JVM
• doc_values can alleviate memory pressure
  ◦ Write out discrete values for non-analyzed strings, numeric types, and geo points to disk at index time
  ◦ When performing aggregations, just read the values from disk instead of pulling all possible values into memory and calculating (I/O vs. memory)
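
A sketch of what enabling doc_values looks like in a mapping, using the 1.x-era syntax for a non-analyzed string field (later Elasticsearch versions changed the field types and enable doc_values by default). The type name `logs`, the field names, and the helper names are hypothetical.

```python
import json


def doc_values_string_field():
    """Mapping for a non-analyzed string field stored as doc_values (1.x syntax)."""
    return {"type": "string", "index": "not_analyzed", "doc_values": True}


def build_mapping(type_name, field_names):
    """Mapping body that enables doc_values on each of the named fields."""
    return {
        type_name: {
            "properties": {name: doc_values_string_field() for name in field_names}
        }
    }


if __name__ == "__main__":
    # Hypothetical type and field names for a logging use case.
    print(json.dumps(build_mapping("logs", ["level", "host"]), indent=2))
```

Fields like log level or hostname are good candidates: they are aggregated on often and never need full-text analysis.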

Questions?

Information
• Elasticsearch documentation
  ◦ www.elastic.co/guide
  ◦ Elasticsearch - The Definitive Guide - for in-depth learning
  ◦ Official documentation, API docs, etc.
  ◦ Client library docs (javascript, ruby, python, java, php)
• Get involved in the ES community
  ◦ www.elastic.co/community/meetups
  ◦ SLC Meetup!
• Give feedback at: https://joind.in/talk/view/13968
