
Elastic Silicon Valley Meetup - Zero-Downtime Re-Indexing of Elasticsearch at SignalFx.

Elastic Co
August 24, 2016


Zero-Downtime Re-Indexing of Elasticsearch at SignalFx.

Certain changes in Elasticsearch, such as number of primary shards and mapping changes, require re-indexing. This can be challenging to do depending upon your data flows and uptime requirements. At SignalFx, we have designed our indexing pipeline to allow for a full re-indexing without impacting our service or its users. In this talk, Mahdi will present SignalFx's polyglot metadata search infrastructure and go through the zero-downtime re-indexing process. He will also talk about the lessons learned running this in production as well as how we plan to use this to do a zero-downtime upgrade from Elasticsearch 1.7 to 2.0.

Mahdi is a software engineer with a decade of experience writing software. Previously, he spent 7 years at VMware building key components of its cloud management stack. At SignalFx, Mahdi enjoys the challenges of concurrency and distributed systems while working on the search and metadata persistence layers.



Transcript

  1. Mahdi Ben Hamida (@mahdouch, mahdi@signalfx.com) Zero-Downtime Re-Indexing of Elasticsearch, 08/23/2016
  2. Elasticsearch at SignalFx
     • ES is used for ad-hoc queries and full-text search
     • 4 clusters in production on AWS across 3 AZs; ~one billion documents and tens of terabytes of data in our biggest cluster. We monitor it using SignalFx!
     • We do real-time monitoring and intelligent alerting for modern infrastructure
     • ES powers our metadata search layer
     • The source of truth is Cassandra
  3. Agenda
     • Why re-index?
     • Re-indexing strategies
       • using aliases
       • using aliases & generation numbers
     • SignalFx polyglot persistence: metabase overview
     • Re-indexing walk-through
     • Learnings & best practices
  4. Why re-index?
     • Increase the number of shards (scale up, or down)
     • Update the index mapping
     • Update existing fields: field type, analyzer, tokenizer, enable doc values, make a field stored, include in _all, etc.
     • Remove unneeded fields
     => Multiple strategies exist depending on the use case; we will cover two.
  5. Strategy A: use aliases https://www.elastic.co/blog/changing-mapping-with-zero-downtime

  6. Prerequisite: always use an alias (myindex) for querying and indexing the concrete index (myindex_v1)
  7. Create the new index (myindex_v2) with the updated mappings
  8. Bulk import from the current index (myindex_v1) into myindex_v2
  9. Bulk import from the current index (continued)
  10. Switch the alias (myindex) to myindex_v2 when the bulk import is finished
  11. Switch the alias when the bulk import is finished (continued)
  12. Remove the old index (myindex_v1)
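The alias cut-over above is a single atomic call to Elasticsearch's `_aliases` endpoint; both actions are applied together, so readers never see a moment without a valid alias. A sketch using the index names from the slides:

```json
POST /_aliases
{
  "actions": [
    { "remove": { "index": "myindex_v1", "alias": "myindex" } },
    { "add":    { "index": "myindex_v2", "alias": "myindex" } }
  ]
}
```

Because clients only ever talk to `myindex`, no application change is needed when the underlying index is swapped.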

  13. Strategy A: Limitations
     • Works nicely for read-only indices (time-based, for example)
     • Doesn’t work when you concurrently update/delete documents
     • In that case, some options are available:
       • during a downtime window, stop all writes, re-index, switch the alias, and resume writes
       • use an “indexing log” (using Kafka, for example), and replay the log from before you started the re-indexing
     => Neither works for us. That’s why we came up with strategy B.
  14. Strategy B: Generation numbers https://signalfx.com/scaling-elasticsearch-sharding-availability-hundreds-millions-documents/

  15. Metabase: SignalFx’s metadata store (simplified)
     (1) service-A (via metabase-client) enqueues a write on the write-topic
     (2) metabase dequeues the write
     (3) writes to C*
     (4) enqueues an index request on the index-topic
     (5) the indexer dequeues the index request
     (6) reads from C*
     (7) indexes the document
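The write path above can be sketched as a toy simulation; in-memory queues and dicts stand in for Kafka, Cassandra, and ES, and all names are illustrative:

```python
from queue import Queue

# In-memory stand-ins for the real components (illustrative only).
write_topic, index_topic = Queue(), Queue()   # Kafka topics
cassandra, elasticsearch = {}, {}             # C* (source of truth) and ES

def service_write(doc_id, doc):
    """(1) service-A enqueues a write via metabase-client."""
    write_topic.put((doc_id, doc))

def metabase_step():
    """(2) dequeue the write, (3) write to C*, (4) enqueue an index request."""
    doc_id, doc = write_topic.get()
    cassandra[doc_id] = doc
    index_topic.put(doc_id)

def indexer_step():
    """(5) dequeue the index request, (6) read from C*, (7) index the document."""
    doc_id = index_topic.get()
    elasticsearch[doc_id] = cassandra[doc_id]  # ES content derives from C*

service_write("doc-1", {"name": "host-42"})
metabase_step()
indexer_step()
```

The key property for re-indexing is that the indexer always reads the authoritative copy from Cassandra, so any index can be rebuilt from the source of truth at any time.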
  16. Prerequisite 1: readers query through an alias (myindex, pointing at myindex_v1)
  17. Prerequisite 2: indexing state accessible by indexers (generation: 42, extra: <null>, current: myindex_v1)
  18. Phase 1: create the new index (myindex_v2) with the updated mappings (generation: 42, extra: <null>, current: myindex_v1)
  19. Phase 2: increment the generation, then start bulk re-indexing of older generations (_generation <= 42) from myindex_v1 into myindex_v2 (generation: 43, extra: <null>, current: myindex_v1)
  20. During this step, documents may get added/updated (or deleted*): created and updated documents land at generation 43 in myindex_v1 (generation: 43, extra: <null>, current: myindex_v1)
  21. Index state at the end of the bulk indexing: documents written during the bulk phase sit at generation 43 in myindex_v1 (generation: 43, extra: <null>, current: myindex_v1)
  22. Phase 3: enable double writing & bump the generation (generation: 44, extra: myindex_v2, current: myindex_v1)
  23. Phase 3, step 3: re-index documents at generation 43 into myindex_v2 (generation: 44, extra: myindex_v2, current: myindex_v1)
  24. Phase 3, step 3: re-indexing continues
  25. Phase 3, step 3: re-indexing continues
  26. Phase 3, step 3: re-indexing continues
  27. Phase 3 done: perfect sync of both indices (generation: 44, extra: myindex_v2, current: myindex_v1)
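The generation-number protocol of phases 2 and 3 can be condensed into a small simulation. Dicts stand in for the two indices, mapping document id to the generation it was last written at; the generation bookkeeping mirrors the slides, everything else (document names, helper functions) is illustrative:

```python
# Toy model of the generation-number re-index (phases 2 and 3).
v1, v2 = {}, {}          # myindex_v1 (current) and myindex_v2 (new)
state = {"generation": 42, "extra": None, "current": "v1"}

def write(doc_id, double_write=False):
    """Index a document at the current generation."""
    v1[doc_id] = state["generation"]
    if double_write:
        v2[doc_id] = state["generation"]

# Pre-existing documents, all at generation 42.
for i in range(3):
    write(f"old-{i}")

# Phase 2: bump the generation, then bulk-copy generations <= 42 into v2.
state["generation"] = 43
write("live-1")                      # concurrent write lands at 43, v1 only
for doc_id, gen in list(v1.items()):
    if gen <= 42:
        v2[doc_id] = gen

# Phase 3: enable double writing and bump the generation again...
state["generation"] = 44
state["extra"] = "v2"
write("live-2", double_write=True)   # lands at 44 in both indices
# ...then re-index the stragglers left at generation 43.
for doc_id, gen in list(v1.items()):
    if gen == 43:
        v2[doc_id] = gen

assert set(v1) == set(v2)            # perfect sync, as on slide 27
```

The generation bump before each copy pass is what bounds the work: every pass only has to pick up documents older than the current generation, and anything newer is already covered by double writing.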
  28. Phase 4: A/B testing of the new index; readers query through the myindex alias (generation: 44, current: myindex_v1)
  29. Phase 4: swap the read alias to the new index (or swap back!) (generation: 44, current: myindex_v1)
  30. Phase 5: switch the write index, bump the generation, stop double writing (generation: 45, extra: <null>, current: myindex_v2)
  31. Phase 6: relax, then clean up!

  32. Dealing with failures

  33. Handling failures
     • Bulk re-indexing can fail (and it does); you don’t want to restart from scratch
     • Each document in our index has a “_partition_id” that allows us to partition the object id space into buckets
     • We migrate ranges of partitions. If we fail, we only restart from the last failed range
     • Failures during the double-writing phase shouldn’t impact main indexing
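The range-based restart above can be sketched as follows; the migration function, partition ranges, and failure injection are all made up for illustration:

```python
def migrate_ranges(ranges, migrate_one):
    """Migrate partition ranges in order; on failure, report how far we got
    so a retry can resume from the failed range instead of from scratch."""
    done = []
    for r in ranges:
        try:
            migrate_one(r)
        except Exception:
            return done, ranges[ranges.index(r):]   # (completed, remaining)
        done.append(r)
    return done, []

# Hypothetical setup: 4 ranges of _partition_id buckets; range (2, 3) fails once.
failures = {(2, 3): 1}
def migrate_one(r):
    if failures.get(r, 0) > 0:
        failures[r] -= 1
        raise RuntimeError(f"bulk indexing of range {r} failed")

ranges = [(0, 1), (2, 3), (4, 5), (6, 7)]
done, remaining = migrate_ranges(ranges, migrate_one)
# The first attempt stops at the failed range; the retry resumes there,
# not at (0, 1).
done2, remaining2 = migrate_ranges(remaining, migrate_one)
```

Slicing the bulk load this way also keeps each scroll/bulk pass short, which ties into the segment-retention point on the performance slide.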
  34. What about deletions?
     • We cheat today by adding a deletion marker (_deleted_on_ms) to documents before deleting them. Queries filter out documents with that marker
     • After the re-indexing is done, we will delete documents with the marker on the target index
     • We also run a sanity check between the data in ES and the data in Cassandra (the source of truth)
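The marker-based filtering is a query-time exclusion; in the 2.x query DSL it can be expressed as a bool query with a must_not exists clause (the field name comes from the slide, the rest is a sketch):

```json
{
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "_deleted_on_ms" } }
    }
  }
}
```

Soft-deleting this way means a document "deleted" mid-migration still exists in both indices and can be cleaned up deterministically once the re-index finishes.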
  35. What about performance ?

  36. Performance considerations
     • Migrate using partition ranges to avoid holding segments for a long time
     • Add temporary nodes to handle the load
     • Disable refreshes on the target index (so worth it!)
     • Start with no replicas (or one, just in case)
     • Avoid “hot” shards by sorting on a field (a timestamp, for example)
     • Have throttling controls to control the indexing load
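Two of the bullets above (disable refreshes, start with no replicas) are dynamic index settings that can be applied to the target index before the bulk load; a sketch of the settings call:

```json
PUT /myindex_v2/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
```

Once the bulk import finishes, restore `refresh_interval` (for example, back to `1s`) and raise `number_of_replicas` again before putting the index in front of readers.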
  37. Zero-downtime upgrade from 1.7 to 2.3
     • Same re-indexing procedure; the only difference is that it runs across clusters
     • Lots of work to deal with the Java library not being backward compatible (shaded jars, lots of boilerplate abstractions, etc.)
     • Can switch back & forth between 1.7 and 2.3 live while double writing (useful in case we run into issues with 2.3)
     • Already done in our staging environments
     Indexer state: generation: 44, extra: myindex_v2, current: myindex_v1, current_cluster: mb-es, extra_cluster: mb-es2
  38. SIGN UP FOR A TRIAL AT: signalfx.com. QUESTIONS? The most advanced monitoring solution for modern infrastructure and applications.