Slide 1

Slide 1 text

Maintaining performance in distributed systems
Alexander Reelsen, @spinscale

Slide 2

Slide 2 text

Agenda
- Distributed Systems
- Elasticsearch
- Performance aspects
- Hardware & Operating System
- JVM & GC
- Libraries & Application

Slide 3

Slide 3 text

About...
Me
- Software Engineer at Elasticsearch
- Interested in all things search & scale
- Search Meetup Munich: http://www.meetup.com/Search-Meetup-Munich/events/218856224/
Elasticsearch
- Founded in 2012
- Products: Elasticsearch, Logstash, Kibana, Elasticsearch for Apache Hadoop, Marvel, Shield
- Professional services: support subscriptions, training

Slide 4

Slide 4 text

Distributed systems

Slide 5

Slide 5 text

Distributed systems: Fallacies of distributed computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
by Peter Deutsch
https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

Slide 6

Slide 6 text

Distributed systems expectations
- Redundancy
- Resiliency
- Recovery
- Scalability
- Availability
- ...

Slide 8

Slide 8 text

Example: Outages
Cope with node outage: maintenance, network split, power loss, garbage collection
- Still operational
- No data loss
- CRUD works

Slide 9

Slide 9 text

Example: Recovery
Nodes come back: maintenance, network failure ends...
- Self healing
- Shift data back
- Higher load

Slide 12

Slide 12 text

Example: Data distribution
Scalability: writes vs. reads

Slide 18

Slide 18 text

Example: Data distribution
Read & write scalability

Slide 20

Slide 20 text

But... performance?
...is affected by this:
- More coordination, over greater distances
- Different boundaries compared to single-process applications
- Hard to predict and test
- On many different layers: Application & Libraries, Runtime environment (JVM), OS, Hardware, Network

Slide 21

Slide 21 text

Elasticsearch Introduction

Slide 28

Slide 28 text

What is Elasticsearch?
- HTTP & JSON
- Schema-less
- Distributed
- Document-oriented
- Near-realtime
- Search
- Analytics

Slide 29

Slide 29 text

Master
[Diagram: Node 1]
- Cluster always has one master
- Reelection on node failure

Slide 30

Slide 30 text

Node joins
[Diagram: Node 1, Node 2, Node 3]
- Node 2 & Node 3 ping around

Slide 31

Slide 31 text

Node joins
[Diagram: Node 1, Node 2, Node 3]
- Node 2 & Node 3 join cluster

Slide 32

Slide 32 text

Data distribution
[Diagram: Node 1 holding index a]
- Index: Collection of documents

Slide 33

Slide 33 text

Data distribution
[Diagram: Node 1 holding shards a0, a1, a2]
- Shards: Units of scale

Slide 34

Slide 34 text

Data distribution
[Diagram: Node 1, Node 2, Node 3; shards a0, a1, a2]
- Shards: Units of scale

Slide 35

Slide 35 text

Data distribution
[Diagram: Node 1, Node 2, Node 3; shards a0, a1, a2, each shown twice]
- Primary shards
- Replica shards

Slide 36

Slide 36 text

Data distribution
[Diagram: Node 1, Node 2, Node 3; shards a0, a1, a2, each shown twice; shard b0 shown three times]
- Different scaling strategies per index
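The shard and replica layout from the last few slides can be sketched in a few lines. This is a minimal illustration, not Elasticsearch's actual implementation: the function names are hypothetical, zlib.crc32 is only a stand-in for whatever hash the real router uses, and the round-robin placement is an assumption. The idea is that a document id is hashed and taken modulo the number of primary shards, and that a replica never sits on the same node as its primary.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Map a document id to a primary shard number (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_shards(num_shards, nodes):
    """Spread primaries round-robin and put each replica on the following node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate primary and replica")
    layout = {node: [] for node in nodes}
    for shard in range(num_shards):
        primary_node = nodes[shard % len(nodes)]
        replica_node = nodes[(shard + 1) % len(nodes)]
        layout[primary_node].append(f"a{shard} (primary)")
        layout[replica_node].append(f"a{shard} (replica)")
    return layout
```

In this scheme the target shard depends on the number of primary shards, so changing that number would change where existing ids map.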

Slide 37

Slide 37 text

Elasticsearch can easily max out...
- CPU: indexing, searching, highlighting
- I/O: indexing, searching, merging
- Memory: aggregations, indices
- Network: relocation, snapshot & restore

Slide 38

Slide 38 text

But... performance?
- Competing resources
- Resizing is out of our control
- Requires thorough testing & configuration

Slide 39

Slide 39 text

Hardware & operating system

Slide 40

Slide 40 text

Quiz
- What is locked memory?
- What is the best scheduler for SSDs?
- Is TRIM supported on every FS?
- What is mechanical sympathy?

Slide 41

Slide 41 text

Hardware
Bigger is better? It depends...
- CPU: # cores, more parallel threads
- Main memory: No limit
- Disk: SAN vs. local, SSD vs. spindle
- Bare metal vs. virtualization: https://speakerdeck.com/elasticsearch/life-after-ec2

Slide 42

Slide 42 text

SSDs are awesome
- TRIM
- Write amplification
- Garbage collection
- Coding for SSDs: http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/

Slide 43

Slide 43 text

Operating systems
- File system descriptors, file system cache
- Memlocked memory: bootstrap.mlockall: true
- NUMA
  http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases
  http://queue.acm.org/detail.cfm?id=2513149
- Don't swap out if you need performance!
- OOM killer: Just don't...

Slide 44

Slide 44 text

JVM

Slide 45

Slide 45 text

Quiz
- When does the JIT compiler kick in?
- Are client/server JVMs different?
- What’s the default thread stack size?
- Is there a memory based thread limit?

Slide 46

Slide 46 text

JVM tricks
- Use less than 32 GB of heap, so the JVM can use compressed pointers
- Serialize everything yourself (JVM versions tend to be incompatible)
- Use the server VM, allocate all memory on startup
- Reduce thread stack size: http://rdiyewar-tech.blogspot.de/2013/02/outofmemoryerror-because-of-default.html
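The 32 GB figure and the thread stack advice both come down to simple arithmetic. The sketch below assumes 32-bit compressed pointers that count 8-byte aligned objects rather than single bytes (the usual HotSpot default). The thread count and stack sizes are illustrative values only, not recommendations, and the function names are made up for this example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_compressed_heap_bytes(pointer_bits=32, object_alignment_bytes=8):
    """Largest heap addressable when a pointer counts aligned units, not bytes."""
    return (2 ** pointer_bits) * object_alignment_bytes


def stack_memory_bytes(thread_count, stack_size_bytes):
    """Memory reserved for thread stacks alone, outside the Java heap."""
    return thread_count * stack_size_bytes


print(max_compressed_heap_bytes() // GIB)        # 32
print(stack_memory_bytes(5000, 1 * MIB) // MIB)  # 5000
```

With these assumed numbers, 5000 threads at 1 MB of stack each reserve about 5 GB before a single object is allocated, which is why both the thread count and the stack size are worth checking.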

Slide 47

Slide 47 text

JVM Threads
- The JVM is good at managing threads, but not several thousands of them
- A single thread pool does not fit all
- Solution: dedicated thread pools, sized by the number of available CPUs and the complexity of their tasks
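The dedicated-pool idea is not JVM specific and can be sketched with Python's standard library. The pool names and the sizing multipliers below are hypothetical, chosen only to show that each kind of work gets its own bounded pool derived from the CPU count; they are not Elasticsearch's actual settings.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

CPUS = os.cpu_count() or 1

# Hypothetical pools: one per kind of work, each with its own upper bound.
POOLS = {
    "search": ThreadPoolExecutor(max_workers=3 * CPUS, thread_name_prefix="search"),
    "index": ThreadPoolExecutor(max_workers=CPUS, thread_name_prefix="index"),
    "management": ThreadPoolExecutor(max_workers=1, thread_name_prefix="management"),
}


def submit(pool_name, fn, *args):
    """Run fn on the pool dedicated to this kind of work and return a Future."""
    return POOLS[pool_name].submit(fn, *args)


def current_thread_name():
    """Helper to show which pool a task actually ran on."""
    return threading.current_thread().name
```

A burst of slow indexing tasks can then only exhaust the "index" pool, while searches and management calls keep their own threads.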

Slide 48

Slide 48 text

JVM Garbage collection
[Diagram: memory areas direct, young, old, perm]

Slide 53

Slide 53 text

JVM Garbage collection
[Diagram: memory areas direct, young, old, perm; stop the world]

Slide 55

Slide 55 text

Improving garbage collection
- Create fewer objects, reuse structures
- Stream data in to avoid object creation
- Both reduce young gen promotion pressure
- -XX:CMSInitiatingOccupancyFraction=75
- Elasticsearch: long GCs can result in nodes dropping out of the cluster, followed by master reelections and data shifting (often happens due to GC pressure)

Slide 56

Slide 56 text

Garbage collectors
- Serial, Parallel, ParallelOld
- CMS: Concurrent mark-and-sweep
- G1
- Pauseless GC (Shenandoah, Azul)
- Going off-heap: use sun.misc.Unsafe and handle memory allocation yourself

Slide 57

Slide 57 text

GC spiral of death
[Plot: % heap over time]

Slide 58

Slide 58 text

Libraries

Slide 59

Slide 59 text

Guice
- Dependency injection container
- Allows creating infrastructure for plugins
- Singletons can be created eagerly (on startup)

Slide 60

Slide 60 text

HPPC
- First, Guava is awesome
- But it can create a lot of objects due to its immutability concept
- Meet High Performance Primitive Collections: http://labs.carrotsearch.com/hppc.html
- Mapping updates (used ImmutableOpenMap) now use ObjectObjectOpenHashMap
  - 1000 properties: 0.2 seconds (was 5.1)
  - 2000 properties: 1.2 seconds (was 25)
  - 5000 properties: 4.2 seconds (was 231)
  - 10000 properties: 83.8 seconds (never finished before)
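One plausible reading of these numbers, offered as an assumption rather than a description of the actual Elasticsearch change: if every update to an immutable map copies the whole map, adding n properties costs on the order of n² element copies, while a mutable map updated in place grows roughly linearly. The sketch uses plain Python dicts and hypothetical function names to show the two styles side by side.

```python
import time


def build_by_copying(n):
    """Immutable style: every update creates a fresh copy of the whole map."""
    mapping = {}
    for i in range(n):
        mapping = {**mapping, f"field{i}": "string"}
    return mapping


def build_in_place(n):
    """Mutable style: every update touches a single entry."""
    mapping = {}
    for i in range(n):
        mapping[f"field{i}"] = "string"
    return mapping


def timed(fn, n):
    """Wall-clock seconds for one call of fn(n)."""
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start
```

Comparing timed(build_by_copying, n) with timed(build_in_place, n) for growing n shows how the gap widens. The absolute numbers depend on the machine, so they should be measured locally.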

Slide 61

Slide 61 text

Lucene
- Writes are append-only (segments are immutable)
- Allows the file system cache to kick in for huge segments
- Lock-free read access
- Rate limiting on write: saves IO and CPU
- Packed* classes, ordinals

Slide 62

Slide 62 text

Lucene
- Piggyback on the Lucene segment lifecycle
  - Filter caching per segment
  - Field data caching per segment
- FSTs
  - Blazing fast in-memory structures, allow thousands of qps
  - Allow for complex searches like prefix/fuzzy searches or intersections
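Per-segment caching works because a segment never changes once it is written. The sketch below illustrates that idea only: the class and method names are hypothetical, and plain Python lists stand in for Lucene segments. Cached filter results are keyed by segment, so dropping one segment (for example after a merge) leaves the cached entries of all other segments valid.

```python
class SegmentFilterCache:
    """Caches filter results per immutable segment."""

    def __init__(self):
        self._cache = {}        # (segment_id, filter_name) -> matching doc positions
        self.computations = 0   # how often a filter really had to be evaluated

    def matches(self, segment_id, docs, filter_name, predicate):
        key = (segment_id, filter_name)
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = [i for i, doc in enumerate(docs) if predicate(doc)]
        return self._cache[key]

    def drop_segment(self, segment_id):
        """Forget everything cached for a segment that no longer exists."""
        for key in [k for k in self._cache if k[0] == segment_id]:
            del self._cache[key]
```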

Slide 63

Slide 63 text

Sigar
- Awesome monitoring API
- Great helper library for getting all kinds of stats
- Output can vary between operating systems

Slide 64

Slide 64 text

Jackson
- Stable and fast streaming JSON parser
- Supports YAML, SMILE, CBOR
- Other implementations: https://github.com/RichardHightower/boon/wiki

Slide 65

Slide 65 text

Elasticsearch

Slide 66

Slide 66 text

Going async
- Enforces event-driven architecture
- Support for non-blocking model
- Enforces loose coupling
- Prefers push over pull
- Callback-based concurrency
- Helps to avoid contention on resources / threads

Slide 67

Slide 67 text

Reduce memory footprint
- Page-based cache recycling (old gen!)
- Reusing netty buffers
- Fielddata
- Probabilistic data structures: Bloom filters, T-Digest, HyperLogLog++
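To show why probabilistic structures save memory, here is a minimal Bloom filter sketch. It is a textbook version, not the implementation Elasticsearch or Lucene uses. The bit count, the number of hashes and the use of SHA-256 with a seed prefix are all assumptions made for the example. The filter answers "definitely not present" or "possibly present" using a fixed number of bits, regardless of how large the items are.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```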

Slide 68

Slide 68 text

Node & network communication
- Maintaining different channels with different priorities
  - IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID
- Binary protocol
- TCP connections are held open
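How priority levels like these can order pending work is sketched below. The level names are taken from the slide, but the numeric ordering, the queue class and its methods are assumptions for illustration and do not mirror the actual transport code. Tasks with the same priority keep their arrival order.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Smaller value = handled earlier (assumed ordering).
    IMMEDIATE = 0
    URGENT = 1
    HIGH = 2
    NORMAL = 3
    LOW = 4
    LANGUID = 5


class PrioritizedQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within one priority

    def put(self, priority, task):
        heapq.heappush(self._heap, (int(priority), next(self._counter), task))

    def get(self):
        _, _, task = heapq.heappop(self._heap)
        return task

    def __len__(self):
        return len(self._heap)
```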

Slide 69

Slide 69 text

Prevent OOM
- Fielddata is the number one OOM reason
- Circuit breaker per request & fielddata
- Doc-value based field data
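The circuit breaker idea is to estimate memory before allocating it and to fail one request instead of the whole node. The sketch below is a simplified model with hypothetical names and a fixed byte limit. How Elasticsearch estimates sizes and configures its limits is not shown here.

```python
class CircuitBreakingError(Exception):
    """Raised instead of letting the process run out of memory."""


class MemoryCircuitBreaker:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, estimated_bytes, label):
        """Account for an upcoming allocation, or reject it if it would exceed the limit."""
        new_total = self.used_bytes + estimated_bytes
        if new_total > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {new_total} bytes, limit is {self.limit_bytes}"
            )
        self.used_bytes = new_total

    def release(self, estimated_bytes):
        """Give back a reservation once the data is no longer held."""
        self.used_bytes = max(0, self.used_bytes - estimated_bytes)
```

A rejected reservation leaves the accounting untouched, so later, smaller requests can still succeed.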

Slide 70

Slide 70 text

Conducting performance tests

Slide 71

Slide 71 text

Performance test requirements
- Good data & queries: real-life data
- Similar environment: virtualization, bare metal, AWS, number of nodes
- Long-running tests
- Avoid hitting the wrong caches and missing the right ones
- Rate limit the right things/things right
- Create your own benchmark numbers
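A minimal sketch of creating your own numbers: run a warm-up phase first, then record every single latency and report percentiles next to the mean, since a mean alone hides the slow tail. The function names, the warm-up and run counts, and the nearest-rank percentile method are choices made for this example, not a prescribed methodology.

```python
import math
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def measure(fn, warmup=100, runs=1000):
    """Call fn repeatedly and summarize the observed latencies in seconds."""
    for _ in range(warmup):  # let caches and code paths warm up first
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "mean": sum(latencies) / len(latencies),
        "p50": percentile(latencies, 50),
        "p99": percentile(latencies, 99),
        "max": latencies[-1],
    }
```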

Slide 72

Slide 72 text

Summary

Slide 73

Slide 73 text

Summary
- Know your full stack, it is invaluable: hardware, OS, environment, language, protocols, libraries
- Monitor all the things
- Prevent educated guesses
- Do not trust other people’s numbers! Fake your own!

Slide 74

Slide 74 text

Resources

Slide 75

Slide 75 text

Resources
- http://www.elasticsearch.org/blog/white-paper-testing-automation-for-distributed-applications/
- http://www.elasticsearch.org/blog/elasticsearch-testing-qa-increasing-coverage-randomizing-test-runs/
- http://www.elasticsearch.org/blog/performance-considerations-elasticsearch-indexing/
- http://www.elasticsearch.org/blog/resiliency-elasticsearch/
- http://www.elasticsearch.org/blog/averages-can-dangerous-use-percentile/
- http://www.elasticsearch.org/blog/count-elasticsearch/
- http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/
- https://www.youtube.com/watch?v=U1C5m8b0qg0 (Akka Cluster)

Slide 76

Slide 76 text

Resources
- http://jprante.github.io/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html
- https://plumbr.eu/blog/what-garbage-collector-are-you-using
- https://plumbr.eu/blog/g1-vs-cms-vs-parallel-gc
- http://www.slideshare.net/aragozin/garbage-collection-in-jvm
- https://github.com/aragozin/jvm-tools
- https://github.com/brettwooldridge/HikariCP/wiki/Down-the-Rabbit-Hole
- http://www.artima.com/underthehood/flowP.html
- http://en.wikipedia.org/wiki/Switch_case#Compilation
- http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/
- http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf

Slide 77

Slide 77 text

Resources
- https://www.youtube.com/watch?v=0b3sR32m0nU (How not to measure latency by Gil Tene)
- http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
- http://www.ibm.com/developerworks/library/j-benchmark1/index.html
- http://www.ibm.com/developerworks/library/j-benchmark2/index.html
- https://www.youtube.com/watch?v=XmImGiVuJno (Benchmarking - You’re doing it wrong by Aysylu Greenberg)

Slide 78

Slide 78 text

Resources
- Java Performance by Charlie Hunt: http://www.amazon.de/Java-Performance-Charlie-Hunt-ebook/dp/B005R4NELQ
- Netty in Action by Norman Maurer: http://www.amazon.de/Netty-Action-Norman-Maurer/dp/1617291471

Slide 79

Slide 79 text

Resources
- Systems Performance - Enterprise and the cloud by Brendan Gregg: http://www.amazon.de/Systems-Performance-Enterprise-Brendan-Gregg-ebook/dp/B00FLYU9T2

Slide 80

Slide 80 text

Resources
- Elasticsearch - The Definitive Guide by Clinton Gormley & Zachary Tong: http://www.oreilly.de/catalog/9781449358549/

Slide 81

Slide 81 text

Thanks for listening!
Alexander Reelsen, @spinscale
- We’re hiring! http://elasticsearch.com/jobs
- We’re helping! http://elasticsearch.com/support http://elasticsearch.com/training