Slide 1

Scaling Elasticsearch Successfully
Patrick Peschlow, codecentric AG

Slide 2

Cluster Basics
[Diagram: a cluster of nodes consisting of one elected master node, several data nodes, and client nodes]

Slide 3

Split Brain
[Diagram: a network partition splits the cluster into two halves]
− Set minimum_master_nodes to quorum (see the sketch below)
− Prevents split brains caused by „full" partitioning
− But: split brains may still occur when single links fail, e.g., due to overload
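
A minimal sketch of applying the quorum setting at runtime via the cluster settings API, assuming an ES 1.x cluster on localhost with three master-eligible nodes; statically, the same setting goes into elasticsearch.yml as discovery.zen.minimum_master_nodes.

```python
import requests

# Quorum for 3 master-eligible nodes: (3 / 2) + 1 = 2.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"discovery.zen.minimum_master_nodes": 2}},
)
print(resp.json())
```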

Slide 4

Sharding
− Enable larger indexes
− Parallelize/scale direct operations on individual documents
  − Index, update, delete, get
[Diagram: shards 1, 2, and 3 of an index distributed across node 1 and node 2]
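
A minimal sketch of creating an index with a fixed number of primary shards; the index name is a placeholder.

```python
import requests

# Create an index with 3 primary shards; the shard count cannot be
# changed later (see the sharding gotcha below).
resp = requests.put(
    "http://localhost:9200/my_index",
    json={"settings": {"number_of_shards": 3}},
)
print(resp.json())
```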

Slide 5

Routing
− By default, the document _id field is used for shard key calculation
− Can be overridden via explicit „routing"
− For example, select the shard depending on a user ID or some document field
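
A minimal sketch of explicit routing by user ID, with hypothetical index, type, and field names; the same routing value must be supplied at search time to hit only that user's shard.

```python
import requests

# Index a document, routing it by user ID instead of by _id.
requests.put(
    "http://localhost:9200/my_index/message/1",
    params={"routing": "user42"},
    json={"user_id": "user42", "text": "hello"},
)

# A search with the same routing value only touches the shard for user42.
resp = requests.post(
    "http://localhost:9200/my_index/message/_search",
    params={"routing": "user42"},
    json={"query": {"match": {"text": "hello"}}},
)
print(resp.json())
```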

Slide 6

Distributed Search
− Sharding (by default) implies distributed search
− Tends to make each individual search request (much) slower
− A single search may involve several round trips to various nodes
  − 1. Gather global information for more accurate scoring
  − 2. Perform the actual search and compute scores
  − 3. Retrieve the final set of documents from the relevant shards
  − In between, coordination/reduction by the node that initially received the request
− The desired behavior may be specified on a per-request basis („search type")
  − By default, step 1 is omitted
  − Steps 2 and 3 may be combined into one (but that's risky with pagination)
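
A minimal sketch of choosing the search type per request (ES 1.x names): the default query_then_fetch skips step 1, while dfs_query_then_fetch adds the global scoring phase.

```python
import requests

query = {"query": {"match": {"text": "elasticsearch"}}}

# Default search type: query_then_fetch (steps 2 and 3 only).
requests.post("http://localhost:9200/my_index/_search", json=query)

# Gather global term statistics first (step 1) for more accurate scoring.
resp = requests.post(
    "http://localhost:9200/my_index/_search",
    params={"search_type": "dfs_query_then_fetch"},
    json=query,
)
print(resp.json())
```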

Slide 7

Sharding Gotcha
− The number of shards needs to be chosen at index creation
  − No shard splitting later
− General recommendation for determining the number of shards
  − Define metrics for a shard „capacity limit"
  − Test the capacity limit of a single shard
    − Use realistic data and workloads
  − Based on the expected total amount of data, calculate the required number of shards
    − For example, if one shard handles 50 GB well and 500 GB are expected, 10 shards are needed
  − Overallocate a little

Slide 8

Replication
[Diagram: primaries 1-3 and their replicas 1-3 distributed across node 1 and node 2]
− Enable HA
− Parallelize/scale read operations
  − Get, search
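
Unlike the shard count, the replica count can be changed on a live index. A minimal sketch, with the index name again a placeholder:

```python
import requests

# Add one replica per primary shard; this can be adjusted at any time.
resp = requests.put(
    "http://localhost:9200/my_index/_settings",
    json={"index": {"number_of_replicas": 1}},
)
print(resp.json())
```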

Slide 9

Consistency
− „consistency"
  − How many shards need to be available to permit a write operation
  − all, quorum (default), one
− „replication"
  − Complete a write request already when the primary is done
  − Or only when all replicas have acknowledged the write (translog)
  − sync (default), async
− „preference"
  − On which shards to execute a search
  − Round robin (default), local, primary, only some shards or nodes, arbitrary string
  − Helps to avoid an inconsistent user experience when scoring differs between replicas
    − May happen because documents marked for deletion still affect scoring
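
A minimal sketch of the three knobs as ES 1.x request parameters; index and field names are placeholders, and the preference string is arbitrary.

```python
import requests

# Write: require a quorum of shard copies and wait until replicas acknowledge.
requests.put(
    "http://localhost:9200/my_index/message/1",
    params={"consistency": "quorum", "replication": "sync"},
    json={"text": "hello"},
)

# Search: an arbitrary preference string pins a user's searches to the same
# shard copies, avoiding score jitter between replicas.
resp = requests.post(
    "http://localhost:9200/my_index/_search",
    params={"preference": "user42"},
    json={"query": {"match": {"text": "hello"}}},
)
print(resp.json())
```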

Slide 10

Index Aliases
− A logical name for one or more Elasticsearch indexes
− Decouples the client view from physical storage
  − Create views on an index, e.g., for different users
  − Search across multiple indexes
− Enables changes without clients noticing
  − Point an alias to something new, e.g., switch to another index
− Limitation: writes are only permitted for aliases that point to a single index
− Recommendation: use aliases right from the start
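
A minimal sketch of switching an alias atomically to a new index, with hypothetical index and alias names; clients keep addressing the alias and notice nothing.

```python
import requests

resp = requests.post(
    "http://localhost:9200/_aliases",
    json={
        "actions": [
            {"remove": {"index": "old_index", "alias": "my_alias"}},
            {"add": {"index": "new_index", "alias": "my_alias"}},
        ]
    },
)
print(resp.json())
```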

Slide 11

Designing for Scalability
− Why should we think about scaling right from the start?
  − Fixed number of shards per index
  − Each new index involves some basic costs
  − Distributed searches are expensive
− Consider possible patterns in your data
  − Time-based data
  − User-based data
  − Or maybe none at all
− Recommended reading
  − http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scale.html

Slide 12

Time-based Data
− Assumptions
  − Documents arrive with (close-to-real-time) timestamps
  − (Almost) no updates of existing documents
− Examples
  − Log files
  − Tweets

Slide 13

One Index per Time Frame
[Diagram: daily indexes some-old-index, ..., 2014-11-25, 2014-11-26, 2014-11-27; a „current" alias (used for indexing) points at 2014-11-27, and a search for „last 2 days" addresses 2014-11-26 and 2014-11-27]
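
A minimal sketch of the moving parts, assuming daily indexes named logs-YYYY-MM-DD and ES 1.x index templates; the template is applied automatically to every matching new index, and the „current" write alias is moved at each rollover.

```python
import requests

# Template applied to every new daily index matching the pattern.
requests.put(
    "http://localhost:9200/_template/daily_logs",
    json={"template": "logs-*", "settings": {"number_of_shards": 3}},
)

# Rollover: move the "current" write alias to today's index.
requests.post(
    "http://localhost:9200/_aliases",
    json={
        "actions": [
            {"remove": {"index": "logs-2014-11-26", "alias": "current"}},
            {"add": {"index": "logs-2014-11-27", "alias": "current"}},
        ]
    },
)

# "Last 2 days": search the two newest daily indexes together.
resp = requests.post(
    "http://localhost:9200/logs-2014-11-26,logs-2014-11-27/_search",
    json={"query": {"match_all": {}}},
)
print(resp.json())
```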

Slide 14

Observations
− Relatively simple to implement
  − Thanks to index templates and aliases
− The cost of error is small
  − Frequent index creation facilitates quick improvements
− But more complicated when updates/deletes of individual documents are needed

Slide 15

User-based Data
− Assumption
  − Documents form disjoint partitions with respect to visibility
− Examples
  − Unrelated users on the same platform
  − Unrelated tenants with multiple users each

Slide 16

One Index per User
[Diagram: users 1 through N, each with a dedicated index 1 through N]
− Disadvantage
  − Each index consumes resources; does not scale to large numbers of users

Slide 17

Single Index
[Diagram: one index with shards 1 through M; a search by user 1 runs on all shards, filtered by user 1]
− Disadvantage
  − Distributed search even for users with little data

Slide 18

Single Index with Routing
[Diagram: one index with shards 1 through M, each shard holding the documents of several users; a search by user 1 is routed to user 1's shard and filtered by user 1]
− Disadvantage
  − Some shards may become much bigger than others

Slide 19

Observations
− Clients do not need to know the approach chosen
  − Aliases can be associated with filter and routing information (see the sketch below)
  − In all cases, the client may address separate „user" indexes (aliases)
− It is possible to combine the approaches behind the scenes
  − For example, start with „single index with routing"
  − Depending on the need, migrate big users to dedicated indexes
− Regardless of the approach chosen, we may always hit capacity limits
  − An index or a shard (and with it, the index) may become too large
  − Then we basically have to deal with a „one big index" scenario
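
A minimal sketch of a per-user alias on a shared index, with hypothetical names; the alias carries both the routing value and a filter, so clients simply address „user42" as if it were an index.

```python
import requests

resp = requests.post(
    "http://localhost:9200/_aliases",
    json={
        "actions": [
            {
                "add": {
                    "index": "shared_index",
                    "alias": "user42",
                    "routing": "user42",
                    "filter": {"term": {"user_id": "user42"}},
                }
            }
        ]
    },
)
print(resp.json())
```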

Slide 20

One Big Index
− What to do when an index has reached its capacity?
  − Let's say we even overallocated a bit, but growth is larger than expected
− Option 1: Extend the index by a second one
− Option 2: Migrate to a new index with more shards
− Note: Searching multiple indexes is the same as searching a sharded index
  − 1 index with 50 shards =~ 50 indexes with 1 shard each
  − In both cases, 50 Lucene indexes are searched

Slide 21

Extending an Index
[Diagram: the client must decide whether an operation addressed by ID targets the old or the new index]
− Create a second index for new documents
− Define an alias so that search considers both indexes
− Challenge: which index to address for updates, deletes, everything „by ID"?
  − Boils down to some kind of „sharding" in the application
  − Documents need to carry something that can be used as a „shard key"

Slide 22

Possible Approaches
− Use information from the main DB for mapping documents to indexes
  − For example, everything beyond a certain „creation date" is directed to the new index
  − Need to add client-side logic for mapping dates to index names
  − Alternatively, store the index name directly in the main DB
  − Only applicable if there actually is a main DB
− Encode the index name into the document ID (see the sketch below)
  − For example, a UUID followed by the index name
  − Does not require a main DB
  − Need to add logic during document ID generation
  − Clients need to know how to extract the index name from the document ID
− A bit fragile overall, as it depends on non-search parts of the application
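
A minimal sketch of the ID-encoding approach, with a hypothetical separator and index naming scheme:

```python
import uuid

def make_doc_id(index_name):
    # A random UUID followed by the target index name (hypothetical format).
    return "%s@%s" % (uuid.uuid4(), index_name)

def index_for(doc_id):
    # Recover the index to address for updates, deletes, and gets by ID.
    return doc_id.rsplit("@", 1)[1]

doc_id = make_doc_id("big_index_2")
assert index_for(doc_id) == "big_index_2"
```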

Slide 23

Migrating to a new Index
[Diagram: a migrator copies documents from the old index to the new index]

Slide 24

Migrating to a new Index
[Diagram: the migrator is done; the client is pointed at the new index]

Slide 25

Migrating to a new Index
[Diagram: the old index is deleted; the client uses only the new index]
− Can do that easily with downtime, but usually we want zero downtime

Slide 26

Migrator
− Helper application that reads from the old index and writes to the new index (see the sketch below)
− Read via the scan+scroll API
  − Iterate in batches over a snapshot of the data
− Write via the bulk API
  − Send batches of documents in single requests
  − Bulk size needs to be determined empirically
− Notes
  − Requires _source, but that's really a best practice anyway
  − Consider having the migrator read and write in parallel
  − Consider (partially) disabling replication during migration
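
A minimal single-threaded sketch of such a migrator using the elasticsearch-py client (1.x era), with hypothetical index names; helpers.scan wraps scan+scroll, helpers.bulk wraps the bulk API, and the create op type matches the caveats discussed later. A real migrator would parallelize reading and writing and tune chunk_size empirically.

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

es = Elasticsearch(["localhost:9200"])

def actions():
    # scan+scroll iterates in batches over a snapshot of the old index.
    for hit in scan(es, index="old_index", query={"query": {"match_all": {}}}):
        yield {
            "_op_type": "create",  # never overwrite newer client writes
            "_index": "new_index",
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],  # requires _source to be enabled
        }

# The bulk helper sends the documents in batches of chunk_size.
success, errors = bulk(es, actions(), chunk_size=500, raise_on_error=False)
print(success, errors)
```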

Slide 27

Zero Downtime Migration
[Diagram: starting point: old index, new index, and the client]

Slide 28

Zero Downtime Migration
[Diagram: reads go to the old index; writes are directed to both the old and the new index]

Slide 29

Zero Downtime Migration
[Diagram: while writes go to both indexes, the migrator copies the existing documents from the old index to the new index]

Slide 30

Zero Downtime Migration
[Diagram: the migrator is done; reads are switched to the new index]

Slide 31

Zero Downtime Migration
[Diagram: the old index is deleted; the client reads from and writes to the new index only]

Slide 32

Caveats
[Diagram: the migrator must write with „create only" semantics, so that it never overwrites newer writes made by clients]

Slide 33

Caveats
[Diagram: deletes must be made irreversible, so that the migrator cannot resurrect documents that clients deleted during migration]

Slide 34

Caveats
[Diagram: the migrator start must be synced with the start of writing to the new index, otherwise writes may be missed]

Slide 35

Summary of Steps
− 1. For read operations, use an alias pointing to the old index
− 2. Create a new index
− 3. Set index.gc_deletes to a large enough value (makes deletes irreversible; see below)
− 4. Direct writes to both indexes
− 5. Wait until all old-index-only writes have been refreshed (check via search)
− 6. Run the migrator using optype=create (prevents lost updates; see below)
− 7. When the migrator is done, stop indexing into the old index
− 8. Switch the read alias to the new index
− 9. Delete the old index
− Note: Having a global „indexing queue" eases the implementation of some steps
  − Single point where we need to make changes or monitor things
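
A minimal sketch of the two Elasticsearch-side steps from the list, assuming the new index is called new_index:

```python
import requests

# Step 3: keep delete tombstones for the whole migration window, so a
# document deleted by a client cannot be resurrected by the migrator.
requests.put(
    "http://localhost:9200/new_index/_settings",
    json={"index": {"gc_deletes": "1h"}},
)

# Step 6: the migrator writes with op_type=create; if a client already
# wrote the document to the new index, the create fails (409 Conflict)
# instead of overwriting the newer version.
resp = requests.put(
    "http://localhost:9200/new_index/message/1",
    params={"op_type": "create"},
    json={"text": "migrated document"},
)
print(resp.status_code)
```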

Slide 36

Support for the Update API
− Things are more complex when the application uses the Update API
  − Updates to the new index require an existing document
− Possible solutions
  − Buffer writes to the new index and only run them when the migrator is done
    − But need to prevent duplicate updates
    − For example, by explicit versioning (see the sketch below) or a synced start of buffering and the migrator
  − Another idea is to turn each update into a full re-indexing during migration
    − Start using the Update API again only after the migration is done
− Once again, having an „indexing queue" is highly beneficial
− Recipes for different scenarios will be detailed in the codecentric blog :-)
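
One way to prevent duplicate updates is explicit versioning. A minimal sketch using Elasticsearch's external version type, with a hypothetical version number maintained by the application (e.g., in the main DB or on the indexing queue):

```python
import requests

# With version_type=external, the write is only applied if the supplied
# version is higher than the stored one; a replayed or duplicate write
# (here: an update turned into a full re-index) is rejected with 409.
resp = requests.put(
    "http://localhost:9200/new_index/message/1",
    params={"version": "7", "version_type": "external"},
    json={"text": "full document, re-indexed instead of updated"},
)
print(resp.status_code)
```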

Slide 37

Questions?

Dr. rer. nat. Patrick Peschlow
codecentric AG
Merscheider Straße 1
42699 Solingen

tel +49 (0) 212.23 36 28 54
fax +49 (0) 212.23 36 28 79
[email protected]

www.codecentric.de