
How to reindex 1B documents in 1 hour?

Qaiser Abbasi
December 13, 2018

Transcript

  1. Rene Treffer, Qaiser Abbasi: How to reindex 1B documents in 1 hour?

  2. Search @ SoundCloud

  3. Powered by Elasticsearch

  4. Typical search document

  5. Clusters of 30 nodes

  6. Clusters of 30 nodes: data size * replication = 120% * total memory

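
The sizing rule on slide 6 can be turned into a small calculation. This is only a sketch of one reading of the slide: it assumes "replication" means the number of copies of each shard (primary plus replicas) and that "total memory" is the summed RAM of all nodes. The 64 GB per node and the 2 copies are hypothetical values, not figures from the talk.

```python
def max_primary_data_gb(nodes, memory_per_node_gb, copies, fill_percent=120):
    """Largest primary data size (GB) with data * copies = fill_percent% of total memory."""
    if nodes < 1 or copies < 1:
        raise ValueError("nodes and copies must be positive")
    total_memory_gb = nodes * memory_per_node_gb
    # Integer arithmetic avoids floating-point noise from multiplying by 1.2.
    return total_memory_gb * fill_percent // (100 * copies)


# 30 nodes as on the slide; memory per node and copy count are hypothetical.
print(max_primary_data_gb(nodes=30, memory_per_node_gb=64, copies=2))  # 1152
```
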
  7. Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3

  8. Problems?

  9. Problems? Lead time of features and bugfixes

  10. Indexing

  11. Indexing: 1. Extract

  12. Indexing: 1. Extract 2. Build ES documents

  13. Indexing: 1. Extract 2. Build ES documents 3. Load into ES

  14. Indexing: 1. Extract 2. Build ES documents 3. Load into ES 0. Live updates

  15. Indexing: 1. Extract 2. Build ES documents 3. Load into ES 0. Live updates Kafka Kafka +

  16. [Diagram: Kafka (historic, current, compaction); indexer; shipper 1, shipper 2; Cluster 1, Cluster 2]

  17. Kafka for ES documents: 1. Enable compaction 2. Use fast compression 3. Use enough partitions 4. Use SSDs + 10GBit

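
To illustrate why compaction matters for a topic of ES documents, here is a minimal Python sketch of what Kafka log compaction eventually guarantees: only the newest record per key survives, so replaying the topic from the start yields the current version of every document. The topic-level settings `cleanup.policy` and `compression.type` are real Kafka configs, but the choice of `lz4`, the key format, and the sample records are assumptions, not details from the talk.

```python
# Topic-level configs matching "enable compaction" and "use fast compression".
# lz4 is an assumed choice of fast codec; the slides do not name one.
TOPIC_CONFIG = {
    "cleanup.policy": "compact",
    "compression.type": "lz4",
}


def compact(log):
    """Simulate compaction: keep the newest record per key, in offset order.

    `log` is a list of (key, value) pairs; a value of None is a tombstone
    that removes the key entirely.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    survivors = [(off, key, value)
                 for key, (off, value) in latest.items()
                 if value is not None]
    survivors.sort(key=lambda record: record[0])
    return [(key, value) for _, key, value in survivors]


# Hypothetical records keyed by document ID.
log = [
    ("track:1", {"title": "first version"}),
    ("track:2", {"title": "other track"}),
    ("track:1", {"title": "second version"}),
    ("track:2", None),  # document deleted
]
print(compact(log))
```

Keying each record by document ID is what makes this work: a reindex then reads one up-to-date record per document instead of the full update history.
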
  18. ES cluster lifecycle: Reindex, Live, Maintenance

  19. Reindex settings: 1. Shards 2. Replication settings 3. Async Translog 4. Refresh Interval

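
The four reindex settings map onto real Elasticsearch index settings. The sketch below builds a request body that could be sent when creating the index for a bulk load. The setting names exist in Elasticsearch; the concrete values (no replicas, refresh disabled) and the shard count are assumptions about a typical bulk-load setup, since the slides only name the knobs.

```python
import json


def bulk_load_index_body(shards):
    """Build an index-creation body tuned for bulk loading (assumed values)."""
    if shards < 1:
        raise ValueError("shards must be positive")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,          # 1. Shards (fixed at creation)
                "number_of_replicas": 0,             # 2. Replication (assumed: none while loading)
                "translog": {"durability": "async"}, # 3. Async translog
                "refresh_interval": "-1",            # 4. Refresh interval (assumed: disabled)
            }
        }
    }


# 30 shards is a hypothetical value, one per node of a 30-node cluster.
print(json.dumps(bulk_load_index_body(shards=30), indent=2))
```
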
  20. Finish reindexing: 1. Merge into one segment*** 2. Set # replicas

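
The two finishing steps correspond to the Elasticsearch force merge API and the update index settings API. The sketch below only builds the requests and does not send them. The index name `tracks-v2` and the replica count are hypothetical.

```python
import json


def finish_reindex_requests(index, replicas):
    """Return (method, path, body) tuples for the two finishing steps."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Set the number of replicas for serving traffic.
        ("PUT", f"/{index}/_settings",
         {"index": {"number_of_replicas": replicas}}),
    ]


# Hypothetical index name and replica count.
for method, path, body in finish_reindex_requests("tracks-v2", replicas=1):
    print(method, path, json.dumps(body) if body is not None else "")
```

Merging before adding replicas means the replicas copy already merged segments rather than repeating the merge work themselves.
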
  21. Throughput ≈ 600K OP/s ≈ 30 Mins
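
The "≈ 30 Mins" figure follows from the throughput if each of the 1B documents is assumed to cost one indexing operation (an assumption; the slide does not define "OP"):

```python
def reindex_minutes(documents, ops_per_second):
    """Minutes needed to index `documents` at a steady `ops_per_second`."""
    return documents / ops_per_second / 60


# 1B documents at roughly 600K operations per second.
print(round(reindex_minutes(1_000_000_000, 600_000), 1))  # 27.8
```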

  22. 4X faster for 95% ≈ 40ms for 50%

  23. 4X Reindexing in 1 Sprint

  24. Summary • Solved initial problem • Enablement in daily life

  25. Future work

  26. Q & A

  27. Sounds interesting? Come and talk to us!

  28. THANK YOU rene.treffer@soundcloud.com qaiser.abbasi@soundcloud.com