
Elastic Silicon Valley Meetup - Zero-Downtime Re-Indexing of Elasticsearch at SignalFx.

Elastic Co
August 24, 2016

Zero-Downtime Re-Indexing of Elasticsearch at SignalFx.

Certain changes in Elasticsearch, such as changing the number of primary shards or updating mappings, require re-indexing. This can be challenging depending on your data flows and uptime requirements. At SignalFx, we have designed our indexing pipeline to allow a full re-indexing without impacting our service or its users. In this talk, Mahdi will present SignalFx's polyglot metadata search infrastructure and walk through the zero-downtime re-indexing process. He will also cover lessons learned running this in production, as well as how we plan to use it for a zero-downtime upgrade from Elasticsearch 1.7 to 2.0.

Mahdi is a software engineer with a decade of experience writing software. Previously, he spent 7 years at VMware building key components of its cloud management stack. At SignalFx, Mahdi enjoys the challenges of concurrency and distributed systems while working on the search and metadata persistence layers.


Transcript

  1. Mahdi Ben Hamida (@mahdouch, mahdi@signalfx.com). Zero-Downtime Re-Indexing of Elasticsearch. 08/23/2016
  2. Elasticsearch at SignalFx
     • We do real-time monitoring and intelligent alerting for modern infrastructure
     • ES powers our metadata search layer: ad-hoc queries and full-text search
     • 4 clusters in production on AWS across 3 AZs; ~one billion documents and tens of terabytes of data in our biggest cluster. We monitor it using SignalFx!
     • The source of truth is Cassandra
  3. Agenda
     • Why re-index?
     • Re-indexing strategies: using aliases; using aliases & generation numbers
     • SignalFx polyglot persistence: metabase overview
     • Re-indexing walk-through
     • Learnings & best practices
  4. Why re-index?
     • Increase (or decrease) the number of primary shards
     • Update the index mapping
     • Update existing fields: field type, analyzer, tokenizer, enable doc values, make a field stored, include it in _all, etc.
     • Remove unneeded fields
     => Multiple strategies exist depending on the use case; we will cover two.
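Most of these changes are rejected if you try to apply them in place. A minimal sketch of the failure mode, assuming a 1.x/2.x-era elasticsearch-py client and a hypothetical index myindex_v1 whose name field was originally mapped as a string:

    from elasticsearch import Elasticsearch
    from elasticsearch.exceptions import RequestError

    es = Elasticsearch(["localhost:9200"])  # hypothetical cluster address

    # Changing an existing field's type in place is rejected by Elasticsearch;
    # the only way forward is a new index plus a re-index.
    try:
        es.indices.put_mapping(
            index="myindex_v1",
            doc_type="doc",  # 1.x/2.x-era clients still take a mapping type
            body={"properties": {"name": {"type": "integer"}}},  # was a string
        )
    except RequestError as err:
        print("mapping conflict, re-indexing required:", err)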
  5. Strategy A (re-index into a new index behind an alias, then swap the alias): Limitations
     • Works nicely for read-only indices (time-based, for example)
     • Doesn't work when you concurrently update/delete documents
     • In that case, some options are available:
       • during a downtime window, stop all writes, re-index, switch the alias, and resume writes
       • use an "indexing log" (Kafka, for example) and replay the log from before the re-indexing started
     => Neither works for us. That's why we came up with strategy B. (A sketch of Strategy A follows below.)
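For reference, a sketch of Strategy A using elasticsearch-py (index and alias names are hypothetical): create the new index, copy everything over, then swap the alias atomically so readers never see a gap.

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])  # hypothetical cluster address

    # Create the replacement index with the updated mapping (example mapping).
    es.indices.create(index="myindex_v2", body={
        "mappings": {"doc": {"properties": {"name": {"type": "string"}}}},
    })

    # Copy every document over. Fine for read-only data, racy under live
    # writes, which is exactly the limitation this slide describes.
    helpers.reindex(es, source_index="myindex_v1", target_index="myindex_v2")

    # One atomic aliases update: readers querying "myindex" never notice.
    es.indices.update_aliases(body={"actions": [
        {"remove": {"index": "myindex_v1", "alias": "myindex"}},
        {"add":    {"index": "myindex_v2", "alias": "myindex"}},
    ]})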
  6. Metabase: SignalFx's metadata store (simplified). (diagram: service-A talks to metabase-client; a write-topic and an index-topic feed metabase-1; flow: (1) enqueue write, (2) dequeue write, (3) write to C*, (4) enqueue index, (5) dequeue index, (6) read from C*, (7) index document)
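A rough sketch of the indexer's consume loop implied by the diagram. The queue and Cassandra accessors are hypothetical stand-ins (a list and a dict), since the metabase components themselves are not public:

    from collections import namedtuple
    from elasticsearch import Elasticsearch

    IndexMsg = namedtuple("IndexMsg", "doc_id")

    # Stand-ins: a list plays the index-topic, a dict plays Cassandra.
    index_topic = [IndexMsg("doc-1"), IndexMsg("doc-2")]
    cassandra = {"doc-1": {"name": "host-1"}, "doc-2": {"name": "host-2"}}
    state = {"generation": 42, "current": "myindex_v1"}

    es = Elasticsearch(["localhost:9200"])

    for msg in index_topic:                       # (5) dequeue an index request
        doc = dict(cassandra[msg.doc_id])         # (6) read source of truth from C*
        doc["_generation"] = state["generation"]  # stamp the current generation
        es.index(index=state["current"],          # (7) index the document
                 doc_type="doc", id=msg.doc_id, body=doc)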
  7. Phase 1: create a new index (myindex_v2) with the updated mappings. (diagram: indexer writes to myindex_v1; state: generation 42, extra <null>, current myindex_v1)
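In elasticsearch-py terms, phase 1 is a single index-creation call; the mapping and shard count here are hypothetical examples:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Phase 1: the new index carries the updated mappings (and, if desired,
    # a different primary shard count).
    es.indices.create(index="myindex_v2", body={
        "settings": {"number_of_shards": 12},
        "mappings": {"doc": {"properties": {
            "name": {"type": "string", "analyzer": "keyword"},
            "_generation": {"type": "long"},
            "_partition_id": {"type": "integer"},
        }}},
    })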
  8. Phase 2: increment the generation, then start bulk re-indexing of older generations (_generation <= 42) from myindex_v1 into myindex_v2. (diagram state: generation 43, extra <null>, current myindex_v1)
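A sketch of the bulk copy using the scan/bulk helpers (the generation counter lives outside ES in this design; values follow the slide):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])

    GENERATION = 43  # just bumped; everything at <= 42 is "old"

    # Scroll over the old generations and bulk-index them into the new index.
    old_docs = helpers.scan(es, index="myindex_v1", query={
        "query": {"range": {"_generation": {"lte": GENERATION - 1}}},
    })
    helpers.bulk(es, (
        {"_index": "myindex_v2", "_type": hit["_type"],
         "_id": hit["_id"], "_source": hit["_source"]}
        for hit in old_docs
    ))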
  9. During this step, documents may get added/updated (or deleted*); those land in myindex_v1 at generation 43. (diagram: created and updated documents stamped 43 alongside the older ones; state: generation 43, extra <null>, current myindex_v1)
  10. Index state at the end of the bulk indexing: myindex_v2 holds all the older-generation documents, while writes made during the copy sit only in myindex_v1 at generation 43. (diagram state: generation 43, extra <null>, current myindex_v1)
  11. Phase 3: enable double writing and bump the generation; new writes now go to both indices at generation 44. (diagram state: generation 44, extra myindex_v2, current myindex_v1)
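Once extra is set, every write fans out to both indices at the new generation. A minimal sketch (state handling is a hypothetical stand-in):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])
    state = {"generation": 44, "current": "myindex_v1", "extra": "myindex_v2"}

    def index_document(doc_id, doc):
        doc = dict(doc, _generation=state["generation"])
        es.index(index=state["current"], doc_type="doc", id=doc_id, body=doc)
        if state["extra"]:  # double writing enabled
            es.index(index=state["extra"], doc_type="doc", id=doc_id, body=doc)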
  12. Phase 3, step 3: re-index the documents still at generation 43 (those written during the bulk copy) into myindex_v2, while double writing keeps adding generation-44 documents to both indices. (diagram state: generation 44, extra myindex_v2, current myindex_v1)
  13.-15. (animation frames of the same slide: the generation-43 documents are progressively copied into myindex_v2 as generation-44 writes accumulate in both indices)
  16. Phase 3 done: both indices are in perfect sync. (diagram state: generation 44, extra myindex_v2, current myindex_v1)
  17. Phase 4: A/B testing of the new index; readers query through the myindex read alias, which can point at either index. (diagram: readers query the myindex alias; state: generation 44, extra myindex_v2, current myindex_v1)
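One way to A/B test before the swap is to mirror a query to both indices and compare; a hypothetical spot check:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    query = {"query": {"match": {"name": "host-1"}}}
    old = es.search(index="myindex_v1", body=query)
    new = es.search(index="myindex_v2", body=query)

    # With double writing in steady state, both indices should agree.
    assert old["hits"]["total"] == new["hits"]["total"], "indices diverged"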
  18. Phase 4: swap the read alias to myindex_v2 (or swap back!). (diagram state: generation 44, extra myindex_v2, current myindex_v1)
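Because readers go through the alias, the swap, and any rollback, is one atomic aliases update (names as in the slides):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    def point_read_alias(target, other):
        es.indices.update_aliases(body={"actions": [
            {"remove": {"index": other, "alias": "myindex"}},
            {"add":    {"index": target, "alias": "myindex"}},
        ]})

    point_read_alias("myindex_v2", "myindex_v1")    # promote the new index
    # point_read_alias("myindex_v1", "myindex_v2")  # ...or swap back any time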
  19. Phase 5: switch the write index, bump the generation, and stop double writing; new writes go only to myindex_v2 at generation 45, and myindex_v1 can be retired. (diagram state: generation 45, extra <null>, current myindex_v2)
  20. Handling failures
     • Bulk re-indexing can fail (and it does); you don't want to restart from scratch
     • Each document in our index has a "_partition_id" that partitions the object-id space into buckets
     • We migrate ranges of partitions; on failure, we only restart from the last failed range (see the sketch below)
     • Failures during the double-writing phase shouldn't impact main indexing
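A sketch of range-based migration with a restartable checkpoint (the partition count, range size, and checkpoint mechanism are hypothetical):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])

    NUM_PARTITIONS = 1024
    RANGE_SIZE = 64
    checkpoint = 0  # persisted elsewhere in practice; a restart resumes here

    for start in range(checkpoint, NUM_PARTITIONS, RANGE_SIZE):
        hits = helpers.scan(es, index="myindex_v1", query={
            "query": {"bool": {"must": [
                {"range": {"_partition_id": {"gte": start,
                                             "lt": start + RANGE_SIZE}}},
                {"range": {"_generation": {"lte": 42}}},
            ]}},
        })
        helpers.bulk(es, (
            {"_index": "myindex_v2", "_type": h["_type"],
             "_id": h["_id"], "_source": h["_source"]} for h in hits
        ))
        checkpoint = start + RANGE_SIZE  # only this range is redone on failure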
  21. What about deletions?
     • We cheat today by adding a deletion marker (_deleted_on_ms) to documents before deleting them; queries filter out documents with that marker
     • After the re-indexing is done, we delete the documents with the marker on the target index
     • We also run a sanity check between the data in ES and the data in Cassandra (the source of truth)
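A sketch of the soft-delete pattern (the field name comes from the slide; the query shape assumes a 2.x-era cluster where exists is usable inside a bool query):

    import time
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # "Delete" by stamping a marker instead of removing the document.
    es.update(index="myindex_v1", doc_type="doc", id="doc-1",
              body={"doc": {"_deleted_on_ms": int(time.time() * 1000)}})

    # All queries filter the marker out.
    es.search(index="myindex", body={"query": {"bool": {
        "must": {"match": {"name": "host-1"}},
        "must_not": {"exists": {"field": "_deleted_on_ms"}},
    }}})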
  22. Performance considerations
     • Migrate using partition ranges to avoid holding segments open for a long time
     • Add temporary nodes to handle the load
     • Disable refreshes on the target index (so worth it!)
     • Start with no replicas (or one, just in case)
     • Avoid "hot" shards by sorting on a field (a timestamp, for example)
     • Have throttling controls to manage indexing load
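The refresh and replica settings are plain index-settings updates; a sketch of the before/after toggles:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Before the bulk copy: no refreshes, no replicas on the target index.
    es.indices.put_settings(index="myindex_v2", body={
        "index": {"refresh_interval": "-1", "number_of_replicas": 0},
    })

    # ... run the bulk re-index ...

    # Afterwards: restore refreshes and build replicas from the finished
    # primaries instead of paying for them during the copy.
    es.indices.put_settings(index="myindex_v2", body={
        "index": {"refresh_interval": "1s", "number_of_replicas": 1},
    })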
  23. Zero-downtime upgrade from 1.7 to 2.3
     • Same re-indexing procedure; the only difference is that it runs across clusters
     • Lots of work to deal with the Java library not being backward compatible (shaded jars, lots of boilerplate abstractions, etc.)
     • Can switch back & forth between 1.7 and 2.3 live while double writing (useful in case we run into issues with 2.3)
     • Already done in our staging environments
     (diagram state: generation 44, extra myindex_v2, current myindex_v1, current_cluster mb-es, extra_cluster mb-es2)
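The cross-cluster variant just gives the indexer one client per cluster. The deck's stack is Java; this Python sketch with hypothetical node addresses only shows the shape:

    from elasticsearch import Elasticsearch

    # One client per cluster; "extra" now lives on the 2.3 cluster.
    clusters = {
        "mb-es":  Elasticsearch(["es17-node:9200"]),  # 1.7 cluster
        "mb-es2": Elasticsearch(["es23-node:9200"]),  # 2.3 cluster
    }
    state = {"generation": 44,
             "current": "myindex_v1", "current_cluster": "mb-es",
             "extra": "myindex_v2", "extra_cluster": "mb-es2"}

    def index_document(doc_id, doc):
        doc = dict(doc, _generation=state["generation"])
        clusters[state["current_cluster"]].index(
            index=state["current"], doc_type="doc", id=doc_id, body=doc)
        if state["extra"]:  # double writing across clusters
            clusters[state["extra_cluster"]].index(
                index=state["extra"], doc_type="doc", id=doc_id, body=doc)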
  24. Sign up for a trial at signalfx.com. Questions? The most advanced monitoring solution for modern infrastructure and applications.