
Stories from Support: Top Problems and Solutions

Elastic Co
February 18, 2016

You’re not alone when it comes to certain challenges with your Elasticsearch deployment. Come learn from a few of our support engineers about solutions to some of the most common issues our users face. You’ll be sure to leave with learnings you can apply to your own clusters when you get home!

Transcript

  1. Stories from Support: Top Problems and Solutions
     Chris Earle - Support Engineer @pickypg
     Mark Walkom - Support Engineer @warkolm
  2. Agenda
     An Old Gem
     • This One Weird Phrase Will Endear You To Any Elastic Engineer
     The Big Issues
     • Fielddata • Cluster State • Recovery • Node Sizing • Sharding
     Of Course, There’s More!
     • Configuration Options • Queries • Aggregations • Indexing
  4. Fielddata
     • 60% of total heap can be allocated to fielddata in 1.X
     • 100% of data nodes will be impacted
     • 2.X (mostly) solves this, thanks to doc_values being on by default (settings sketch below)
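
     The 60% figure is the default limit of the fielddata circuit breaker in 1.X. A minimal elasticsearch.yml sketch of the two knobs that keep fielddata in check (values are illustrative, not recommendations):

        # Trip requests that would push fielddata past this share of the heap (the 1.X default)
        indices.breaker.fielddata.limit: 60%
        # Optionally cap the fielddata cache itself; it is unbounded by default
        indices.fielddata.cache.size: 40%
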
  5. Doc Values!
     • Columnar store of values
     • Written to disk at index time
     • Leverages the operating system’s filesystem cache
     • Will require a reindex for existing data (mapping sketch below)
     • GET /_cat/fielddata?v is your friend
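
     A minimal mapping sketch with hypothetical index and field names: in 1.X you opt in to doc_values per field; in 2.X this is already the default for not_analyzed fields.

        PUT /my_index
        {
          "mappings": {
            "my_type": {
              "properties": {
                "status": {
                  "type": "string",
                  "index": "not_analyzed",
                  "doc_values": true
                }
              }
            }
          }
        }

     After reindexing, GET /_cat/fielddata?v shows what is still being loaded onto the heap rather than read from disk.
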
  6. Doc Value Caveats
     • Analyzed strings do not currently support doc_values, which means that you must avoid using such fields for sorting, aggregating, and scripting (workaround sketch below)
     • Analyzed strings are generally tokenized into multiple terms, which means that there is an array of values
     • With few exceptions (e.g., significant terms), aggregating against analyzed strings is not doing what you want
     • Unless you want the individual tokens, scripting is largely not useful
     • Big improvement coming in ES 2.3 (“keyword” field)
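
     The usual workaround, sketched with hypothetical names: keep the analyzed string for full-text search and add a not_analyzed sub-field, which does get doc_values, for sorting and aggregations.

        PUT /my_index
        {
          "mappings": {
            "my_type": {
              "properties": {
                "title": {
                  "type": "string",
                  "fields": {
                    "raw": { "type": "string", "index": "not_analyzed" }
                  }
                }
              }
            }
          }
        }

     Search against title; sort and aggregate against title.raw.
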
  7. Cluster State
     • Every cluster state change is sent to every node
     • Requires a lot of short-lived, potentially large network messages
     • Gets worse with more nodes or indices
     • Mappings tend to be the largest portion
     • GET /_cluster/state?pretty (filtered example below)
     • Not stored in memory as JSON, so this is just to give the idea (it’s likely 5% of it, at best)
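
     The state API can also be filtered by metric, which is handy when the mappings are the part you want to inspect (the index name here is hypothetical):

        GET /_cluster/state/metadata?pretty
        GET /_cluster/state/metadata/my_index?pretty
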
  8. State Of The Union
     • ES 2.0 introduces Cluster State Diffs between nodes
     • Changes become far more manageable and a large cluster state is no longer as problematic
     • Reducing your mapping size helps too
     • Do not allow dynamic mappings in production
     • Do not use _types to separate data; create a “type” field to do this for you (sketch below)
     • Prefer changes in bulk rather than one-by-one (allow changes to be batched)
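
     A sketch of both mapping recommendations together, with hypothetical index and field names: dynamic mapping locked down, and an ordinary field standing in for _type.

        PUT /logs
        {
          "mappings": {
            "event": {
              "dynamic": "strict",
              "properties": {
                "event_type": { "type": "string", "index": "not_analyzed" },
                "message":    { "type": "string" }
              }
            }
          }
        }

     Filter on event_type at query time instead of fanning documents out across many _types.
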
  9. The Big Issues
     1. Fielddata
     2. Cluster State
     3. Recovery
     4. Node Sizing
     5. Sharding
  10. Recovery
     • Restarting a node or otherwise needing to replicate shards
     • Terribly slow process
     • Segment by segment
     • Minor risk of corruption pre-ES 1.5 with sketchy networks
  11. Fully Recovered!
     • 1.5 - Used temporary file names
     • 1.6 - Asynchronous allocation and synced flushing
     • 1.7 - Delayed allocation and prioritized allocation (settings sketch below)
     • 2.0 - Cancelled allocation
     • 2.1 - Prioritized allocation for replicas
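
     Two of these are directly tunable. A sketch assuming 1.7+ / 2.X; the timeout value is illustrative:

        # Delay re-allocation when a node drops out, so a quick restart does not trigger a full shard copy
        PUT /_all/_settings
        {
          "index.unassigned.node_left.delayed_timeout": "5m"
        }

        # A synced flush before a planned restart makes recovery of idle shards near-instant
        POST /_flush/synced
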
  12. Node Sizing
     • Elasticsearch is a parallel processing machine
     • Java can be a slow garbage collecting calculator
     • Slow disks. The problem for every data store?
     • A few huge boxes or a ton of tiny boxes?
  13. And How Long Is A Piece Of String?
     Memory
     • 50% of system RAM to heap (sketch below)
     • Up to 30500M - no more, or your heap loses optimizations (compressed object pointers)!
     CPU
     • Indexing tends to be CPU bound
     • At least 2 cores per instance
     IO
     • Disks get hammered for other reasons, including ones that impact writes
     • The translog in 2.0 fsyncs for every index operation
     • SSDs or Flash are always welcome
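
     A sketch of the memory advice for the 1.X/2.X startup scripts, which honor the ES_HEAP_SIZE environment variable (the exact value depends on your RAM; the point is to stay at or below roughly 30.5 GB):

        # Half of system RAM, capped below the compressed object pointer threshold
        export ES_HEAP_SIZE=30500m
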
  14. Do You Know Where Your Shards Are At Night?
     • Elasticsearch 1.X and 2.0 both default to 5 primaries, 1 replica
     • Increase primaries for higher write throughput and to spread load (sketch below)
     • 50GB is the rule-of-thumb maximum size for a primary shard, more for recovery than performance
     • Replicas are not backups; you rarely see a benefit with more than 1
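
     Primaries are fixed at index creation, so the shard count has to be set up front. A sketch with a hypothetical index name:

        PUT /logs-2016.02.18
        {
          "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 1
          }
        }

     number_of_replicas can be changed later through the index settings API; number_of_shards cannot be changed without reindexing.
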
  15. Queries
     • Deep pagination
       • ES 2.0 has a soft limit of 10K hits per request; linearly more expensive per shard
       • Use the scan and/or scroll API (sketch below)
     • Leading wildcards
       • Equivalent to a full table scan (bad)
     • Scripting
       • Without parameters
       • Dynamically (inline)
     • Unnecessary filter caching (e.g., exact date ranges down to the millisecond)
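
     A minimal scroll sketch (2.X request form; the index name and page size are hypothetical) for walking a large result set without deep from/size pagination:

        GET /logs/_search?scroll=1m
        {
          "size": 1000,
          "query": { "match_all": {} }
        }

        # Keep paging with the _scroll_id returned by each response
        GET /_search/scroll
        {
          "scroll": "1m",
          "scroll_id": "<_scroll_id from the previous response>"
        }
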
  16. Aggregations
     • Cardinality
       • Setting the threshold to 40K (or higher) is memory intensive and generally unnecessary
     • Using aggregations in place of search
       • Searching will be faster
     • Enormous sizes
       • Requesting large shard sizes (relative to actual size)
       • Linearly more expensive per shard touched
       • Generally unnecessary
     • Returning hits when you don’t want them (sketch below)
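
     A sketch combining two of those points, with hypothetical field names: "size": 0 suppresses the hits you do not want, and the cardinality aggregation stays near its default precision_threshold instead of 40K:

        GET /logs/_search
        {
          "size": 0,
          "aggs": {
            "status_counts": { "terms": { "field": "status" } },
            "unique_users":  { "cardinality": { "field": "user_id", "precision_threshold": 3000 } }
          }
        }
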
  17. Indexing
     • Too many shards
       • If your shards are small (define small as < 5 GB) and they outnumber your nodes, then you have too many
     • Refreshing too fast
       • This controls “near real time” search
     • Merge throttling
       • Disable it on SSDs
       • Make it single threaded on HDDs (see node sizing link)
     • Not using bulk processing (sketch below)
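
     A sketch of two of those fixes with hypothetical names: slow the refresh interval on a write-heavy index, and send documents through the bulk API instead of one request per document.

        PUT /logs/_settings
        {
          "index": { "refresh_interval": "30s" }
        }

        POST /_bulk
        { "index": { "_index": "logs", "_type": "event" } }
        { "message": "first event" }
        { "index": { "_index": "logs", "_type": "event" } }
        { "message": "second event" }
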