
Quantitative Cluster Sizing

Elastic Co
February 19, 2016


How many shards should I have? How many nodes should I have? What about replicas? Do these questions sound familiar? The answer is often ‘it depends’. This talk will outline the factors that affect sizing and walk you through a quantitative approach to estimating the configuration and size of your cluster.


Transcript

  1. Agenda
     1. Understanding why "it depends"
     2. Sizing methodology
     3. Scenario and experiment results
     4. Interpreting results and expanding to other scenarios
  2. Elasticsearch Factors
     • Size of shards
     • Number of shards on each node
     • Size of each document
     • Mapping configuration (a sketch follows this list)
       ‒ which fields are searchable
       ‒ automatic multi-fields
       ‒ whether message and _all are enabled
     • Backing server capacity (SSD vs. HDD, CPU, etc.)
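To make the mapping factors concrete, here is a hedged illustration (the field names and document type are hypothetical; the syntax is the Elasticsearch 2.x of the talk's era). A dynamic Logstash-style mapping analyzes every string and adds a not_analyzed `.raw` multi-field, while a custom mapping can disable `_all` and keep purely structured fields not_analyzed:

```python
# Hypothetical custom mapping for an ES 2.x index (the era of this talk).
# Disabling _all and avoiding automatic multi-fields shrinks the index;
# only text that is actually searched stays analyzed.
custom_index_body = {
    "mappings": {
        "event": {                          # hypothetical document type
            "_all": {"enabled": False},     # skip the catch-all _all field
            "properties": {
                "status":  {"type": "string", "index": "not_analyzed"},
                "bytes":   {"type": "long"},
                "message": {"type": "string"},  # analyzed free text
            },
        }
    }
}
```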
  3. Your Organization's Requirements / SLAs
     • Retention period of data
     • Ratio and quantity of indexing vs. search
     • Nature of the use case
     • Continuous vs. bulk indexing
     • Kinds of queries being executed
     • Desired response time for queries run frequently vs. occasionally
     • Required sustained vs. peak indexing rate
     • Budget & failure tolerance
  4. Let's try to determine
     • How much disk storage will N documents require?
     • When is a single shard too big for my requirements?
     • How many active shards saturate my particular hardware?
     • How many shards/nodes will I need to sustain X index rate and Y search response time?
  5. Agenda (now at 2: Sizing methodology)
  6. Methodology of Experiments
     Each experiment tries to accomplish a discrete goal and builds upon the previous one.
     1. Determine disk utilization for various configurations
     2. Determine the breaking point of a shard
     3. Determine the saturation point of a node
     4. Test the configuration on a small cluster
  7. Experiment One: determine disk utilization for various configurations
     • Use a single-node cluster with one index (1 primary, 0 replicas)
     • Index a decent amount of data (1 GB, or about 10 million docs)
     • Calculate storage on disk both as-is and after a _forcemerge
     • Repeat the above calculations with different mapping configurations
       ‒ _all both enabled and disabled
       ‒ settings for each field
     (A scripted sketch of this experiment follows.)
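A minimal sketch of Experiment One, assuming a local single-node cluster on http://localhost:9200, a newline-delimited JSON file of sample events (`events.jsonl`), and the ES 2.x-era API with document types; the index and file names are made up for the example:

```python
import requests

ES = "http://localhost:9200"
INDEX = "sizing-test"  # hypothetical index name

# One primary, no replicas; toggle _all here and rerun to compare mappings.
body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {"doc": {"_all": {"enabled": False}}},
}
requests.delete(f"{ES}/{INDEX}")  # clean slate between runs
requests.put(f"{ES}/{INDEX}", json=body).raise_for_status()

def bulk(lines):
    # The bulk API takes alternating action and source lines.
    requests.post(f"{ES}/{INDEX}/doc/_bulk",
                  data="\n".join(lines) + "\n",
                  headers={"Content-Type": "application/x-ndjson"}).raise_for_status()

with open("events.jsonl") as f:
    batch = []
    for line in f:
        batch += ['{"index": {}}', line.strip()]
        if len(batch) >= 10000:  # flush every 5,000 documents
            bulk(batch)
            batch = []
    if batch:
        bulk(batch)

def store_size():
    # _cat/indices reports the on-disk size of the index in bytes.
    r = requests.get(f"{ES}/_cat/indices/{INDEX}?h=store.size&bytes=b")
    return int(r.text.strip())

requests.post(f"{ES}/{INDEX}/_refresh")
print("size as-is:", store_size())
requests.post(f"{ES}/{INDEX}/_forcemerge?max_num_segments=1")
print("size after _forcemerge:", store_size())
```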
  8. Experiment Two: determine the breaking point of a shard
     • Use a single-node cluster with one index (1 primary, 0 replicas)
     • Index realistic data and use realistic queries
     • Plot index speed and query response time
     • Determine where the point of diminishing returns is for your requirements
     (A sketch of the measurement loop follows.)
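A minimal sketch of that measurement loop: keep indexing while periodically timing a representative query, so both can be plotted against shard size. The query and the `send_batch()` helper (which would bulk-index the next chunk of realistic data, as in the Experiment One script) are hypothetical stand-ins:

```python
import time
import requests

ES = "http://localhost:9200"
INDEX = "sizing-test"
query = {"query": {"match": {"message": "error"}}}  # stand-in realistic query

results = []
for batch_no in range(1000):
    t0 = time.time()
    send_batch(INDEX, batch_no)   # hypothetical: bulk-index the next ~10k docs
    index_secs = time.time() - t0

    r = requests.post(f"{ES}/{INDEX}/_search", json=query)
    took_ms = r.json()["took"]    # server-side query time in milliseconds

    size_bytes = int(requests.get(
        f"{ES}/_cat/indices/{INDEX}?h=store.size&bytes=b").text.strip())
    results.append((size_bytes, index_secs, took_ms))

# Plot `results`: the knee in the latency curve marks the point of
# diminishing returns for this shard on this hardware.
```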
  9. Experiment Three: determine the saturation point of a node
     • Use a single-node cluster with one index (2 primaries, 0 replicas)
     • Repeat Experiment Two to see how performance varies
     • Keep adding more shards to see when the point of diminishing returns occurs
     (A sketch of the shard-count sweep follows.)
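A minimal sketch of the shard-count sweep, reusing the Experiment Two loop against one index per candidate shard count (the counts tested here are an assumption for the example):

```python
import requests

ES = "http://localhost:9200"

for shards in (1, 2, 3, 4, 6, 8):  # candidate primary-shard counts
    index = f"sizing-test-{shards}"
    requests.delete(f"{ES}/{index}")
    requests.put(f"{ES}/{index}", json={
        "settings": {"number_of_shards": shards, "number_of_replicas": 0}
    }).raise_for_status()
    # ...rerun the Experiment Two indexing/query loop against `index`,
    # then compare throughput and latency across shard counts.
```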
  10. Experiment Four: test the desired configuration on a small cluster
      • Configure a small representative cluster
      • Add a representative data volume
      • Run realistic benchmarks:
        ‒ Max indexing rate
        ‒ Querying across varying data volumes
        ‒ Concurrent querying and indexing at various levels
      • Measure resource usage, overall docs, disk usage, etc.
      (A sketch of the concurrent step follows.)
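A minimal sketch of the concurrent querying-and-indexing step, driving both from separate threads so the achieved indexing rate and query latency can be recorded together (again reusing the hypothetical `send_batch()` helper):

```python
import threading
import time
import requests

ES = "http://localhost:9200"  # point at the small representative cluster
INDEX = "sizing-test"
stop = threading.Event()

def indexer():
    batch_no = 0
    while not stop.is_set():
        send_batch(INDEX, batch_no)  # hypothetical bulk helper from above
        batch_no += 1

def querier(latencies):
    q = {"query": {"match": {"message": "error"}}}
    while not stop.is_set():
        r = requests.post(f"{ES}/{INDEX}/_search", json=q)
        latencies.append(r.json()["took"])
        time.sleep(1)                # roughly one query per second

latencies = []
threads = [threading.Thread(target=indexer),
           threading.Thread(target=querier, args=(latencies,))]
for t in threads:
    t.start()
time.sleep(600)                      # let the benchmark run for ten minutes
stop.set()
for t in threads:
    t.join()
```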
  11. Agenda (now at 3: Scenario and experiment results)
  12. Sizing Scenario
      Data:
      • Structured logging: events in JSON
      • Average size: 1.5 kB
      • 40% structured, 60% analyzed text
      • 15 days retention
      Use case:
      • Kibana dashboard for error analysis (interactive)
      • Complex Kibana dashboards for trends
      • Small number of users
      Platform:
      • Evaluating Elastic Cloud
      • 1:16 RAM/disk ratio
      • 64 GB RAM / node
      • 1 TB SSD storage / node
  13. Benchmarking Setup
      [Diagram: an Elasticsearch benchmark driver on AWS EC2 drives an
      Elastic Cloud cluster of 2 x 64 GB instances across 2 availability
      zones, with a master node and snapshots to S3.]
  14. Sizing Methodology (step 1: Disk Utilization)
      1. Disk Utilization
      2. Shard Sizing
      3. Single Node Benchmarking
      4. Multi-Node Benchmarking
  15. Disk Utilization
      From raw events to indexed and replicated data on disk:
      Raw Data → JSON → Indexed → Indexed & Replicated
  16. Disk Utilization
      From JSON to indexed size on disk (ratio of index size to raw JSON size):

                                            Default Logstash Mapping   Custom Mapping
      100% structured                       0.585                      0.401 (-31.4%)
      40% structured, 60% analyzed text     1.055                      0.761 (-27.8%)
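As a worked illustration of what those ratios mean (input volume assumed for the example): with the custom mapping's 0.761 ratio, 100 GB of raw JSON logs per day would need roughly 100 × 0.761 ≈ 76 GB of index on disk per day before replication, while the default Logstash mapping would need about 105 GB for the same data.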
  17. Sizing Methodology (step 2: Shard Sizing)
  18. Sizing Methodology (step 3: Single Node Benchmarking)
  19. Sizing Methodology (step 4: Multi-Node Benchmarking)
  20. Scaling for Benchmarking
      Creating a small representative benchmarking cluster for log analytics:
      a full cluster of N data nodes receiving X queries and Y index requests
      is modeled by 2 data nodes receiving the same X queries but only
      Y * 2/N index requests.
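A worked example with assumed numbers: to model a 10-node cluster that must ingest 7,000 events/s, the 2-node benchmark cluster is driven at 7,000 × 2/10 = 1,400 events/s. The query load stays at the full X because a query typically fans out to every node holding data, while index requests spread evenly across the data nodes.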
  21. Maximum Indexing Rate
      [Charts: maximum indexing rate as events per second and as throughput
      in MB/s, for structured vs. semistructured data.]
  22. Maximum Indexing Rate
      [Charts: the same two metrics plotted against increasing event size.]
  23. Maximum Indexing Rate
      [Charts: the same two metrics, annotated from the smallest to the
      largest events.]
  24. Concurrent Indexing and Querying
      Indexing rate vs. dashboard query latency vs. data volume queried.
      [Chart: achieved indexing rate against the target indexing rate, with
      dashboard latency and the data volume queried.]
  25. Concurrent Indexing and Querying
      [Chart: as the data volume queried grows, query latency increases and
      the achieved indexing rate falls below the target rate.]
  26. Agenda (now at 4: Interpreting results and expanding to other scenarios)
  27. Interpreting Results
      What can we learn from this simple benchmark?
      1. 1 TB storage, 15 days retention => ~68 GB index size/day
      2. 68 GB index size/day => 700 events/s, 89 GB raw JSON logs/day
      3. More ingest or retention => scale out
      4. Evaluate more settings => optimize further
      5. Simplified example => your results WILL be different
      (A back-of-the-envelope check of points 1 and 2 follows.)
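A back-of-the-envelope check of those numbers, assuming the scenario's 1.5 kB average event and the 0.761 JSON-to-index ratio from the disk utilization step:

```python
# Storage budget: 1 TB per node, 15 days of retention.
index_per_day_gb = 1024 / 15
print(index_per_day_gb)        # ~68 GB of index per day

# Ingest implied by that budget: 700 events/s of 1.5 kB events.
raw_json_gb = 700 * 86400 * 1500 / 1e9
print(raw_json_gb)             # ~90 GB of raw JSON per day

# Applying the 0.761 custom-mapping ratio lands near the daily budget.
print(raw_json_gb * 0.761)     # ~69 GB of index per day
```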
  32. Other Factors to Consider
      • Hot vs. cold nodes and architecture
      • Risk and fault tolerance (is one replica not enough?)
      • Mixed-use clusters
  33. Recommendations / Tips
      • The more realistic the data and queries, the better the results
      • Be systematic; follow the standard scientific method
      • Record your results
      • Script your tests
      • Rerun your tests
        ‒ when you need to upgrade hardware
        ‒ when your requirements change greatly
      • Monitor your cluster and usage
  34. Summary
      We now know...
      • Why "it depends"
      • A methodology to apply
      • An example to use for reference
      • How to apply it to each unique situation
      • Elastic is here to help
        ‒ Community resources
        ‒ Subscription support
        ‒ Professional services
  35. Please attribute Elastic with a link to elastic.co. Except where
      otherwise noted, this work is licensed under
      http://creativecommons.org/licenses/by-nd/4.0/. Creative Commons and
      the double C in a circle are registered trademarks of Creative Commons
      in the United States and other countries. Third-party marks and brands
      are the property of their respective holders.