Scaling Elasticsearch Successfully

Presentation held at the data2day conference in Karlsruhe on November 27, 2014.


Patrick Peschlow

November 27, 2014


  1. codecentric AG Patrick Peschlow Scaling Elasticsearch Successfully

  2. Cluster Basics
     [Diagram: a cluster of nodes consisting of a master node, data nodes, and client nodes]
  3. Split Brain
     − Set minimum_master_nodes to a quorum (a majority of the master-eligible nodes)
     − Prevents split brains caused by "full" partitioning
     − But: split brains may still occur when single links fail, e.g., due to overload
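The quorum rule on this slide can be sketched as a small calculation. The function name is hypothetical; the result is the value one would put into minimum_master_nodes, assuming a quorum means a strict majority of the master-eligible nodes.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the smallest strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for nodes in (1, 2, 3, 5):
        print(nodes, "master-eligible nodes ->", minimum_master_nodes(nodes))
```

Note that with two master-eligible nodes the quorum is also two, so losing either node leaves the cluster without a master; this is one reason an odd number of master-eligible nodes is commonly preferred.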
  4. Sharding
     − Enable larger indexes
     − Parallelize/scale direct operations on individual documents
       − Index, update, delete, get
     [Diagram: Shard 1, Shard 2, and Shard 3 distributed across Node 1 and Node 2]
  5. Routing
     − By default, the document _id field is used for shard key calculation
     − Can be overridden via explicit "routing"
       − For example, select the shard depending on a user ID or some document field
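A minimal sketch of the routing idea: the shard is derived from a hash of the routing value, which defaults to the document ID. The function names are hypothetical, and zlib.crc32 is only a stand-in for the hash function Elasticsearch uses internally, so the shard numbers will not match a real cluster.

```python
import zlib
from typing import Optional


def shard_for(routing_value: str, number_of_shards: int) -> int:
    """Map a routing value to a shard number (crc32 is a stand-in hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def route(doc_id: str, number_of_shards: int, routing: Optional[str] = None) -> int:
    """Use the explicit routing value if given, otherwise the document ID."""
    key = doc_id if routing is None else routing
    return shard_for(key, number_of_shards)


if __name__ == "__main__":
    # Two different documents of the same user end up on the same shard.
    print(route("doc-1", 3, routing="user-42"))
    print(route("doc-2", 3, routing="user-42"))
```

Because the shard depends on the routing value modulo the number of shards, changing the number of shards later would move documents, which is why the shard count is fixed at index creation.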
  6. Distributed Search
     − Sharding (by default) implies distributed search
       − Tends to make each individual search request (much) slower
     − A single search may involve several round trips to various nodes
       − 1. Gather global information for more accurate scoring
       − 2. Perform the actual search and compute scores
       − 3. Retrieve the final set of documents from the relevant shards
       − In between: coordination/reduction by the node that initially received the request
     − The desired behavior may be specified on a per-request basis ("search type")
       − By default, step 1 is omitted
       − Steps 2 and 3 may be combined into one (but that's risky with pagination)
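The default flow (steps 2 and 3 with a reduction in between) can be illustrated with an in-memory simulation. This is only a sketch: the shards are plain dictionaries with precomputed scores, and all names are hypothetical rather than part of any Elasticsearch API.

```python
import heapq
from typing import Dict, List, Tuple

# Each hypothetical shard maps a document ID to (score, source).
Shard = Dict[str, Tuple[float, dict]]


def query_phase(shard: Shard, size: int) -> List[Tuple[float, str]]:
    """Step 2: a shard returns only the scores and IDs of its local top hits."""
    scored = ((score, doc_id) for doc_id, (score, _source) in shard.items())
    return heapq.nlargest(size, scored)


def search(shards: List[Shard], size: int) -> List[dict]:
    """Coordinate a search across all shards and return the top sources."""
    # Round trip 1: ask every shard for its local top `size` hits.
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in query_phase(shard, size):
            candidates.append((score, doc_id, shard_no))
    # Reduction on the coordinating node: keep the global top `size` hits.
    top = heapq.nlargest(size, candidates)
    # Round trip 2 (step 3): fetch the sources from the relevant shards only.
    return [shards[shard_no][doc_id][1] for _score, doc_id, shard_no in top]


if __name__ == "__main__":
    shards = [
        {"a": (1.0, {"title": "a"}), "b": (3.0, {"title": "b"})},
        {"c": (2.0, {"title": "c"})},
    ]
    print(search(shards, size=2))
```

Every shard has to return `size` candidates even though most are discarded during the reduction, which hints at why a distributed search costs more than a search on a single shard.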
  7. Sharding Gotcha
     − The number of shards needs to be chosen on index creation
       − No shard splitting later
     − General recommendation for determining the number of shards
       − Define metrics for a shard "capacity limit"
       − Test the capacity limit of a single shard
         − Use realistic data and workloads
       − Depending on the expected total amount of data, calculate the required number of shards
       − Overallocate a little
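The recommendation above amounts to a short calculation. In this sketch the capacity limit is expressed in gigabytes per shard and the overallocation as a fraction; both the unit and the default of 20 % are assumptions for illustration, not values from the talk.

```python
import math


def required_shards(expected_total_gb: float,
                    shard_capacity_gb: float,
                    overallocation: float = 0.2) -> int:
    """Estimate the shard count from a measured single-shard capacity limit."""
    if shard_capacity_gb <= 0:
        raise ValueError("shard capacity must be positive")
    if overallocation < 0:
        raise ValueError("overallocation must not be negative")
    # Shards needed for the expected data, plus some headroom, rounded up.
    return math.ceil(expected_total_gb / shard_capacity_gb * (1 + overallocation))


if __name__ == "__main__":
    # Hypothetical numbers: 250 GB expected, 30 GB per shard, 25 % headroom.
    print(required_shards(250, 30, overallocation=0.25))
```

The single-shard capacity has to come from a test with realistic data and workloads; the formula only turns that measurement into a shard count.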
  8. Replication
     − Enable HA
     − Parallelize/scale read operations
       − Get, search
     [Diagram: Primary 1, Primary 2, Primary 3 and Replica 1, Replica 2, Replica 3 distributed across Node 1 and Node 2]
  9. Consistency
     − "consistency"
       − How many shards need to be available to permit a write operation
       − all, quorum (default), one
     − "replication"
       − Complete a write request as soon as the primary is done
       − Or only when all replicas have acknowledged the write (translog)
       − sync (default), async
     − "preference"
       − On which shards to execute a search
       − round robin (default), local, primary, only some shards or nodes, arbitrary string
       − Helps to avoid an inconsistent user experience when scoring differs between replicas
         − May happen because documents marked for deletion still affect scoring
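As an illustration of the "consistency" setting, the following sketch computes how many shard copies (primary plus replicas) must be active before a write is permitted. The function name is hypothetical, and the special case for fewer than three copies reflects my understanding of the Elasticsearch 1.x behavior; treat it as an assumption and verify it against the version in use.

```python
def required_active_copies(number_of_replicas: int, consistency: str = "quorum") -> int:
    """Number of active shard copies needed to permit a write operation."""
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas must not be negative")
    copies = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return copies
    if consistency == "quorum":
        # Assumption (1.x behavior): a majority is only enforced with three
        # or more copies; with one or two copies the primary alone suffices.
        return copies // 2 + 1 if copies > 2 else 1
    raise ValueError("unknown consistency level: " + consistency)


if __name__ == "__main__":
    for replicas in (0, 1, 2, 3):
        print(replicas, "replicas ->", required_active_copies(replicas))
```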
  10. Index Aliases
      − A logical name for one or more Elasticsearch indexes
      − Decouples client view from physical storage
        − Create views on an index, e.g., for different users
        − Search across multiple indexes
      − Enables changes without clients noticing
        − Point an alias to something new, e.g., switch to another index
      − Limitation: writes are only permitted for aliases that point to a single index
      − Recommendation: use aliases right from the start
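Switching an alias to another index can be expressed as a single request body for the _aliases endpoint, so that clients never see a moment without the alias. The sketch below only builds that body; the helper name and the index and alias names are made up for illustration, and sending the request is left to whatever client is in use.

```python
import json


def switch_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build an _aliases request body that moves an alias in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names; the body would be sent as POST /_aliases.
    body = switch_alias_actions("products", "products_v1", "products_v2")
    print(json.dumps(body, indent=2))
```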
  11. Designing for Scalability
      − Why should we think about scaling right from the start?
        − Fixed number of shards per index
        − Each new index involves some basic costs
        − Distributed searches are expensive
      − Consider possible patterns in your data
        − Time-based data
        − User-based data
        − Or maybe none at all
      − Recommended reading
        − http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scale.html
  12. Time-based Data
      − Assumptions
        − Documents arrive with (close-to-real-time) timestamps
        − (Almost) no updates of existing documents
      − Examples
        − Log files
        − Tweets
  13. One Index per Time Frame
      [Diagram: indexes some-old-index, ..., 2014-11-25, 2014-11-26, 2014-11-27; "current" (used for indexing); search for "last 2 days"]
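With one index per day, a search for the "last 2 days" only needs to address the indexes of those days. The sketch below derives the index names from dates, assuming daily indexes named by ISO date as in the diagram; the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def index_name(day: date) -> str:
    """Name of the daily index, e.g. 2014-11-27 (naming scheme assumed)."""
    return day.isoformat()


def indexes_for_last_days(today: date, days: int) -> List[str]:
    """Names of the indexes covering today and the preceding days."""
    return [index_name(today - timedelta(days=offset)) for offset in range(days)]


if __name__ == "__main__":
    names = indexes_for_last_days(date(2014, 11, 27), days=2)
    # The names can be joined into the index part of a search URL.
    print(",".join(names))
```

The same result can be achieved on the server side by maintaining an alias that always points to the most recent indexes, which keeps the date logic out of the clients.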
  14. Observations
      − Relatively simple to implement
        − Thanks to index templates and aliases
      − The cost of error is small
        − Frequent index creation facilitates quick improvements
      − But more complicated when updates/deletes of individual documents are needed
  15. User-based Data
      − Assumption
        − Documents form disjoint partitions with respect to visibility
      − Examples
        − Unrelated users on the same platform
        − Unrelated tenants with multiple users each
  16. One Index per User
      [Diagram: User 1, User 2, ..., User N each use their own index: Index 1, Index 2, ..., Index N]
      − Disadvantage
        − Each index consumes resources; does not scale to large numbers of users
  17. Single Index
      [Diagram: a search by user 1 is sent to Shard 1, Shard 2, ..., Shard M and filtered by user 1]
      − Disadvantage
        − Distributed search even for users with little data
  18. Single Index with Routing
      [Diagram: users 1 to N are spread over Shard 1, Shard 2, ..., Shard M; a search by user 1 is filtered by user 1]
      − Disadvantage
        − Some shards may become much bigger than others
  19. Observations
      − Clients do not need to know the approach chosen
        − Aliases can be associated with filter and routing information
        − In all cases, the client may address separate "user" indexes (aliases)
      − It is possible to combine the approaches behind the scenes
        − For example, start with "single index with routing"
        − Depending on the need, migrate big users to dedicated indexes
      − Regardless of the approach chosen, we may always hit capacity limits
        − An index or a shard (and thus, with it, the index) may become too large
        − Then we basically have to deal with a "one big index" scenario
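An alias that carries filter and routing information could look like the request body built below. The helper name, the shared index name "users", the field name "user_id", and the alias naming scheme are assumptions for illustration; the "filter" and "routing" keys are the ones the alias API accepts.

```python
import json


def user_alias_action(user_id: int, index: str = "users", field: str = "user_id") -> dict:
    """Build an alias action that makes a shared index look like a per-user index."""
    return {
        "add": {
            "index": index,
            "alias": "user_{}".format(user_id),
            # Only documents of this user are visible through the alias.
            "filter": {"term": {field: user_id}},
            # Reads and writes through the alias go to this user's shard.
            "routing": str(user_id),
        }
    }


if __name__ == "__main__":
    body = {"actions": [user_alias_action(1)]}
    print(json.dumps(body, indent=2))
```

If a big user is later migrated to a dedicated index, the alias can be repointed without the client noticing, since the client only ever addresses the alias name.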
  20. One Big Index
      − What to do when an index has reached its capacity?
        − Let's say we even overallocated a bit, but growth is larger than expected
      − Option 1: Extend the index by a second one
      − Option 2: Migrate to a new index with more shards
      − Note: Searching multiple indexes is the same as searching a sharded index
        − 1 index with 50 shards ≈ 50 indexes with 1 shard each
        − In both cases, 50 Lucene indexes are searched
  21. Extending an Index
      − Create a second index for new documents
        − Define an alias so that search considers both indexes
      − Challenge: Which index to address for updates, deletes, everything "by ID"?
        − Boils down to some kind of "sharding" in the application
        − Documents need to carry something that can be used as a "shard key"
      [Diagram: the client with "???" between the old and the new index]
  22. Possible Approaches
      − Use information from the main DB for mapping documents to indexes
        − For example, everything beyond a certain "creation date" is directed to the new index
        − Need to add client-side logic for mapping dates to index names
        − Alternatively, store the index name directly in the main DB
        − Only applicable if there actually is a main DB
      − Encode the index name into the document ID
        − For example, UUID followed by index name
        − Does not require a main DB
        − Need to add logic during document ID generation
        − Clients need to know how to extract the index name from the document ID
      − A bit fragile overall, as it depends on non-search parts of the application
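The second approach, a UUID followed by the index name, might be sketched as follows. The "@" separator and the function names are assumptions chosen for illustration; any separator that cannot occur in a UUID would work.

```python
import uuid

SEPARATOR = "@"  # hypothetical separator; never occurs inside a UUID


def new_document_id(index_name: str) -> str:
    """Generate a document ID that carries the name of its index."""
    return "{}{}{}".format(uuid.uuid4(), SEPARATOR, index_name)


def index_from_document_id(doc_id: str) -> str:
    """Extract the index name that was encoded into a document ID."""
    _uuid_part, separator, index_name = doc_id.partition(SEPARATOR)
    if not separator or not index_name:
        raise ValueError("document ID does not carry an index name: " + doc_id)
    return index_name


if __name__ == "__main__":
    doc_id = new_document_id("documents_v2")
    print(doc_id, "->", index_from_document_id(doc_id))
```

Updates, deletes, and gets "by ID" can then be sent to the right index without consulting a main DB, at the price of every client knowing this ID format.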
  23. Migrating to a New Index
      [Diagram: migrator, old index, new index]
  24. Migrating to a New Index
      [Diagram: client, old index, new index]
  25. Migrating to a New Index
      [Diagram: client, old index, new index]
      − Can do that easily with downtime, but usually we want zero downtime
  26. Migrator
      − Helper application that reads from the old index and writes to the new index
      − Read via scan+scroll API
        − Iterate in batches over a snapshot of the data
      − Write via bulk API
        − Send batches of documents in single requests
        − Bulk size needs to be determined empirically
      − Notes
        − Requires _source, but keeping _source enabled is a best practice anyway
        − Consider having the migrator read and write in parallel
        − Consider (partially) disabling replication during migration
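The write side of such a migrator can be sketched without a running cluster: hits as returned by a scroll request are grouped into batches and turned into the newline-delimited body of a bulk request. The function names are hypothetical, the hits are assumed to carry _id, _type, and _source, and the reading and sending parts are left out.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batches(hits: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group an arbitrarily long stream of hits into lists of at most `size`."""
    iterator = iter(hits)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_body(batch: List[dict], new_index: str) -> str:
    """Build a bulk request body: one action line plus one source line per hit."""
    lines = []
    for hit in batch:
        # "create" instead of "index" so that existing documents are not overwritten.
        action = {"create": {"_index": new_index, "_type": hit["_type"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


if __name__ == "__main__":
    hits = [{"_id": str(i), "_type": "doc", "_source": {"n": i}} for i in range(3)]
    for batch in batches(hits, size=2):
        print(bulk_body(batch, "new_index"), end="")
```

The batch size passed to `batches` is the "bulk size" from the slide and has to be tuned by measurement.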
  27. Zero Downtime Migration
      [Diagram: client, old index, new index]
  28. Zero Downtime Migration
      [Diagram: client writes and reads; old index, new index]
  29. Zero Downtime Migration
      [Diagram: migrator; client writes; old index, new index]
  30. Zero Downtime Migration
      [Diagram: client writes and reads; old index, new index]
  31. Zero Downtime Migration
      [Diagram: client, old index, new index]
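  32. Caveats
      [Diagram: migrator; client writes and reads; old index, new index; annotation truncated after "Create"]
  33. Caveats
      [Diagram: migrator; client writes and reads; old index, new index; annotation truncated after "Make"]
  34. Caveats
      [Diagram: migrator; client writes and reads; old index, new index; annotation: "Sync start with writing to new index"]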
  35. Summary of Steps
      − For read operations use an alias pointing to the old index
      − Create a new index
        − Set index.gc_deletes to a large enough value (makes deletes irreversible)
      − Direct writes to both indexes
        − Wait until all old-index-only writes have been refreshed (check via search)
      − Run the migrator using op_type=create (prevents lost updates)
      − When the migrator is done, stop indexing into the old index
      − Switch the read alias to the new index
      − Delete the old index
      − Note: Having a global "indexing queue" eases the implementation of some steps
        − Single point where we need to make changes or monitor things
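Why a "create" write prevents lost updates can be shown with a small in-memory simulation. The class and function names are hypothetical and nothing here talks to Elasticsearch; the new index is modeled as a dictionary, and deletes (the reason for index.gc_deletes) are not modeled.

```python
from typing import Dict


class DocumentAlreadyExists(Exception):
    """Raised when a create-only write hits an existing document."""


class FakeIndex:
    """Minimal stand-in for an index: document ID -> source."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def index(self, doc_id: str, source: dict) -> None:
        # Regular client write: creates or overwrites the document.
        self.docs[doc_id] = source

    def create(self, doc_id: str, source: dict) -> None:
        # Create-only write: refuses to overwrite an existing document.
        if doc_id in self.docs:
            raise DocumentAlreadyExists(doc_id)
        self.docs[doc_id] = source


def migrate(snapshot: Dict[str, dict], new: FakeIndex) -> int:
    """Copy a snapshot of the old index; return how many documents were skipped."""
    skipped = 0
    for doc_id, source in snapshot.items():
        try:
            new.create(doc_id, source)
        except DocumentAlreadyExists:
            skipped += 1  # the new index already holds a more recent version
    return skipped


if __name__ == "__main__":
    old, new = FakeIndex(), FakeIndex()
    old.index("1", {"title": "v1"})
    old.index("2", {"title": "other"})
    snapshot = dict(old.docs)  # the migrator iterates over this snapshot
    # While the migrator runs, the client writes an update to both indexes.
    for target in (old, new):
        target.index("1", {"title": "v2"})
    print(migrate(snapshot, new), new.docs)
```

Had the migrator used a regular index operation, the stale "v1" from its snapshot would have overwritten the client's "v2" in the new index.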
  36. Support for the Update API
      − Things are more complex when the application uses the Update API
        − Updates to the new index require an existing document
      − Possible solutions
        − Buffer writes to the new index, only run them when the migrator is done
          − But need to prevent duplicate updates
          − For example, by explicit versioning or a synced start of buffering and migrator
        − Another idea is to turn each update into a full re-indexing during migration
          − Start using the Update API again only after the migration is done
      − Once again, having an "indexing queue" is highly beneficial
      − Recipes for different scenarios will be detailed in the codecentric blog :-)
  37. Questions?
      Dr. rer. nat. Patrick Peschlow
      codecentric AG
      Merscheider Straße 1
      42699 Solingen
      tel +49 (0) 212.23 36 28 54
      fax +49 (0) 212.23 36 28 79