
Elastic Silicon Valley Meetup - Zero-Downtime Re-Indexing of Elasticsearch at SignalFx.

Elastic Co
August 24, 2016

Zero-Downtime Re-Indexing of Elasticsearch at SignalFx.

Certain changes in Elasticsearch, such as changing the number of primary shards or updating mappings, require re-indexing. This can be challenging depending on your data flows and uptime requirements. At SignalFx, we have designed our indexing pipeline to allow a full re-indexing without impacting our service or its users. In this talk, Mahdi will present SignalFx's polyglot metadata search infrastructure and walk through the zero-downtime re-indexing process. He will also cover lessons learned running this in production, as well as how we plan to use it for a zero-downtime upgrade from Elasticsearch 1.7 to 2.0.

Mahdi is a software engineer with a decade of experience writing software. Previously, he spent 7 years at VMware building key components of its cloud management stack. At SignalFx, Mahdi enjoys the challenges of concurrency and distributed systems while working on the search and metadata persistence layers.


Transcript

  1. Mahdi Ben Hamida (@mahdouch, mahdi@signalfx.com). Zero-Downtime Re-Indexing of Elasticsearch. 08/23/2016
  2. Elasticsearch at SignalFx
     • We do real-time monitoring and intelligent alerting for modern infrastructure
     • ES powers our metadata search layer: ad-hoc queries and full-text search
     • 4 clusters in production on AWS across 3 AZs; ~one billion documents and tens of terabytes of data in our biggest cluster. We monitor it using SignalFx!
     • The source of truth is Cassandra
  3. Agenda
     • Why re-index?
     • Re-indexing strategies: using aliases; using aliases & generation numbers
     • SignalFx polyglot persistence: metabase overview
     • Re-indexing walk-through
     • Learnings & best practices
  4. Why re-index?
     • Increase (or decrease) the number of primary shards
     • Update the index mapping
     • Update existing fields: field type, analyzer, tokenizer, enable doc values, make a field stored, include it in _all, etc.
     • Remove unneeded fields
     => Multiple strategies exist depending on the use case; we will cover two.
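Most of these changes are rejected if you try to apply them in place. A minimal sketch of the failure mode, assuming a 1.x/2.x-era elasticsearch-py client and a hypothetical index myindex_v1 whose name field was originally mapped as a string:

    from elasticsearch import Elasticsearch
    from elasticsearch.exceptions import RequestError

    es = Elasticsearch(["localhost:9200"])  # hypothetical cluster address

    # Changing an existing field's type in place is rejected by Elasticsearch;
    # the only way forward is a new index plus a re-index.
    try:
        es.indices.put_mapping(
            index="myindex_v1",
            doc_type="doc",  # 1.x/2.x-era clients still take a mapping type
            body={"properties": {"name": {"type": "integer"}}},  # was a string
        )
    except RequestError as err:
        print("mapping conflict, re-indexing required:", err)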
  5. Strategy A (re-index into a new index behind an alias, then swap the alias): Limitations
     • Works nicely for read-only indices (time-based, for example)
     • Doesn't work when you concurrently update/delete documents
     • In that case, some options are available:
       • during a downtime window, stop all writes, re-index, switch the alias, and resume writes
       • use an "indexing log" (Kafka, for example) and replay the log from before the re-indexing started
     => Neither works for us. That's why we came up with strategy B. (A sketch of Strategy A follows below.)
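For reference, a sketch of Strategy A using elasticsearch-py (index and alias names are hypothetical): create the new index, copy everything over, then swap the alias atomically so readers never see a gap.

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])  # hypothetical cluster address

    # Create the replacement index with the updated mapping (example mapping).
    es.indices.create(index="myindex_v2", body={
        "mappings": {"doc": {"properties": {"name": {"type": "string"}}}},
    })

    # Copy every document over. Fine for read-only data, racy under live
    # writes, which is exactly the limitation this slide describes.
    helpers.reindex(es, source_index="myindex_v1", target_index="myindex_v2")

    # One atomic aliases update: readers querying "myindex" never notice.
    es.indices.update_aliases(body={"actions": [
        {"remove": {"index": "myindex_v1", "alias": "myindex"}},
        {"add":    {"index": "myindex_v2", "alias": "myindex"}},
    ]})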
  6. Metabase: SignalFx's metadata store (simplified). (diagram: service-A talks to metabase-client; a write-topic and an index-topic feed metabase-1; flow: (1) enqueue write, (2) dequeue write, (3) write to C*, (4) enqueue index, (5) dequeue index, (6) read from C*, (7) index document)
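A rough sketch of the indexer's consume loop implied by the diagram. The queue and Cassandra accessors are hypothetical stand-ins (a list and a dict), since the metabase components themselves are not public:

    from collections import namedtuple
    from elasticsearch import Elasticsearch

    IndexMsg = namedtuple("IndexMsg", "doc_id")

    # Stand-ins: a list plays the index-topic, a dict plays Cassandra.
    index_topic = [IndexMsg("doc-1"), IndexMsg("doc-2")]
    cassandra = {"doc-1": {"name": "host-1"}, "doc-2": {"name": "host-2"}}
    state = {"generation": 42, "current": "myindex_v1"}

    es = Elasticsearch(["localhost:9200"])

    for msg in index_topic:                       # (5) dequeue an index request
        doc = dict(cassandra[msg.doc_id])         # (6) read source of truth from C*
        doc["_generation"] = state["generation"]  # stamp the current generation
        es.index(index=state["current"],          # (7) index the document
                 doc_type="doc", id=msg.doc_id, body=doc)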
  7. Phase 1: create a new index (myindex_v2) with the updated mappings. (diagram: indexer writes to myindex_v1; state: generation 42, extra <null>, current myindex_v1)
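In elasticsearch-py terms, phase 1 is a single index-creation call; the mapping and shard count here are hypothetical examples:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Phase 1: the new index carries the updated mappings (and, if desired,
    # a different primary shard count).
    es.indices.create(index="myindex_v2", body={
        "settings": {"number_of_shards": 12},
        "mappings": {"doc": {"properties": {
            "name": {"type": "string", "analyzer": "keyword"},
            "_generation": {"type": "long"},
            "_partition_id": {"type": "integer"},
        }}},
    })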
  8. Phase 2: increment the generation, then start bulk re-indexing of older generations (_generation <= 42) from myindex_v1 into myindex_v2. (diagram state: generation 43, extra <null>, current myindex_v1)
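A sketch of the bulk copy using the scan/bulk helpers (the generation counter lives outside ES in this design; values follow the slide):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])

    GENERATION = 43  # just bumped; everything at <= 42 is "old"

    # Scroll over the old generations and bulk-index them into the new index.
    old_docs = helpers.scan(es, index="myindex_v1", query={
        "query": {"range": {"_generation": {"lte": GENERATION - 1}}},
    })
    helpers.bulk(es, (
        {"_index": "myindex_v2", "_type": hit["_type"],
         "_id": hit["_id"], "_source": hit["_source"]}
        for hit in old_docs
    ))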
  9. During this step, documents may get added/updated (or deleted*); those land in myindex_v1 at generation 43. (diagram: created and updated documents stamped 43 alongside the older ones; state: generation 43, extra <null>, current myindex_v1)
  10. Index state at the end of the bulk indexing: myindex_v2 holds all the older-generation documents, while writes made during the copy sit only in myindex_v1 at generation 43. (diagram state: generation 43, extra <null>, current myindex_v1)
  11. Phase 3: enable double writing and bump the generation; new writes now go to both indices at generation 44. (diagram state: generation 44, extra myindex_v2, current myindex_v1)
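Once extra is set, every write fans out to both indices at the new generation. A minimal sketch (state handling is a hypothetical stand-in):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])
    state = {"generation": 44, "current": "myindex_v1", "extra": "myindex_v2"}

    def index_document(doc_id, doc):
        doc = dict(doc, _generation=state["generation"])
        es.index(index=state["current"], doc_type="doc", id=doc_id, body=doc)
        if state["extra"]:  # double writing enabled
            es.index(index=state["extra"], doc_type="doc", id=doc_id, body=doc)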
  12. Phase 3, step 3: re-index the documents still at generation 43 (those written during the bulk copy) into myindex_v2, while double writing keeps adding generation-44 documents to both indices. (diagram state: generation 44, extra myindex_v2, current myindex_v1)
  13.-15. (animation frames of the same slide: the generation-43 documents are progressively copied into myindex_v2 as generation-44 writes accumulate in both indices)
  16. Phase 3 done: both indices are in perfect sync. (diagram state: generation 44, extra myindex_v2, current myindex_v1)
  17. Phase 4: A/B testing of the new index; readers query through the myindex read alias, which can point at either index. (diagram: readers query the myindex alias; state: generation 44, extra myindex_v2, current myindex_v1)
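One way to A/B test before the swap is to mirror a query to both indices and compare; a hypothetical spot check:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    query = {"query": {"match": {"name": "host-1"}}}
    old = es.search(index="myindex_v1", body=query)
    new = es.search(index="myindex_v2", body=query)

    # With double writing in steady state, both indices should agree.
    assert old["hits"]["total"] == new["hits"]["total"], "indices diverged"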
  18. Phase 4: swap the read alias to myindex_v2 (or swap back!). (diagram state: generation 44, extra myindex_v2, current myindex_v1)
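Because readers go through the alias, the swap, and any rollback, is one atomic aliases update (names as in the slides):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    def point_read_alias(target, other):
        es.indices.update_aliases(body={"actions": [
            {"remove": {"index": other, "alias": "myindex"}},
            {"add":    {"index": target, "alias": "myindex"}},
        ]})

    point_read_alias("myindex_v2", "myindex_v1")    # promote the new index
    # point_read_alias("myindex_v1", "myindex_v2")  # ...or swap back any time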
  19. Phase 5: switch the write index, bump the generation, and stop double writing; new writes go only to myindex_v2 at generation 45, and myindex_v1 can be retired. (diagram state: generation 45, extra <null>, current myindex_v2)
  20. Handling failures
     • Bulk re-indexing can fail (and it does); you don't want to restart from scratch
     • Each document in our index has a "_partition_id" that partitions the object-id space into buckets
     • We migrate ranges of partitions; on failure, we only restart from the last failed range (see the sketch below)
     • Failures during the double-writing phase shouldn't impact main indexing
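A sketch of range-based migration with a restartable checkpoint (the partition count, range size, and checkpoint mechanism are hypothetical):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])

    NUM_PARTITIONS = 1024
    RANGE_SIZE = 64
    checkpoint = 0  # persisted elsewhere in practice; a restart resumes here

    for start in range(checkpoint, NUM_PARTITIONS, RANGE_SIZE):
        hits = helpers.scan(es, index="myindex_v1", query={
            "query": {"bool": {"must": [
                {"range": {"_partition_id": {"gte": start,
                                             "lt": start + RANGE_SIZE}}},
                {"range": {"_generation": {"lte": 42}}},
            ]}},
        })
        helpers.bulk(es, (
            {"_index": "myindex_v2", "_type": h["_type"],
             "_id": h["_id"], "_source": h["_source"]} for h in hits
        ))
        checkpoint = start + RANGE_SIZE  # only this range is redone on failure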
  21. What about deletions?
     • We cheat today by adding a deletion marker (_deleted_on_ms) to documents before deleting them; queries filter out documents with that marker
     • After the re-indexing is done, we delete the documents with the marker on the target index
     • We also run a sanity check between the data in ES and the data in Cassandra (the source of truth)
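A sketch of the soft-delete pattern (the field name comes from the slide; the query shape assumes a 2.x-era cluster where exists is usable inside a bool query):

    import time
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # "Delete" by stamping a marker instead of removing the document.
    es.update(index="myindex_v1", doc_type="doc", id="doc-1",
              body={"doc": {"_deleted_on_ms": int(time.time() * 1000)}})

    # All queries filter the marker out.
    es.search(index="myindex", body={"query": {"bool": {
        "must": {"match": {"name": "host-1"}},
        "must_not": {"exists": {"field": "_deleted_on_ms"}},
    }}})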
  22. Performance considerations
     • Migrate using partition ranges to avoid holding segments open for a long time
     • Add temporary nodes to handle the load
     • Disable refreshes on the target index (so worth it!)
     • Start with no replicas (or one, just in case)
     • Avoid "hot" shards by sorting on a field (a timestamp, for example)
     • Have throttling controls to manage indexing load
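The refresh and replica settings are plain index-settings updates; a sketch of the before/after toggles:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Before the bulk copy: no refreshes, no replicas on the target index.
    es.indices.put_settings(index="myindex_v2", body={
        "index": {"refresh_interval": "-1", "number_of_replicas": 0},
    })

    # ... run the bulk re-index ...

    # Afterwards: restore refreshes and build replicas from the finished
    # primaries instead of paying for them during the copy.
    es.indices.put_settings(index="myindex_v2", body={
        "index": {"refresh_interval": "1s", "number_of_replicas": 1},
    })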
  23. Zero-downtime upgrade from 1.7 to 2.3
     • Same re-indexing procedure; the only difference is that it runs across clusters
     • Lots of work to deal with the Java library not being backward compatible (shaded jars, lots of boilerplate abstractions, etc.)
     • Can switch back & forth between 1.7 and 2.3 live while double writing (useful in case we run into issues with 2.3)
     • Already done in our staging environments
     (diagram state: generation 44, extra myindex_v2, current myindex_v1, current_cluster mb-es, extra_cluster mb-es2)
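The cross-cluster variant just gives the indexer one client per cluster. The deck's stack is Java; this Python sketch with hypothetical node addresses only shows the shape:

    from elasticsearch import Elasticsearch

    # One client per cluster; "extra" now lives on the 2.3 cluster.
    clusters = {
        "mb-es":  Elasticsearch(["es17-node:9200"]),  # 1.7 cluster
        "mb-es2": Elasticsearch(["es23-node:9200"]),  # 2.3 cluster
    }
    state = {"generation": 44,
             "current": "myindex_v1", "current_cluster": "mb-es",
             "extra": "myindex_v2", "extra_cluster": "mb-es2"}

    def index_document(doc_id, doc):
        doc = dict(doc, _generation=state["generation"])
        clusters[state["current_cluster"]].index(
            index=state["current"], doc_type="doc", id=doc_id, body=doc)
        if state["extra"]:  # double writing across clusters
            clusters[state["extra_cluster"]].index(
                index=state["extra"], doc_type="doc", id=doc_id, body=doc)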
  24. Sign up for a trial at signalfx.com. Questions? The most advanced monitoring solution for modern infrastructure and applications.