Elastic{ON} Tour 2017 New York : Elastic @ Vimeo: Elasticsearch for... SEARCH?

Elastic Co
November 09, 2017

Elastic{ON} Tour New York - November 9, 2017

Vimeo engineers discuss the unique considerations required to build a scalable search product using Elasticsearch. Many modern Elasticsearch deployments are based around the increasingly common time series/log aggregation use case. These projects require different scaling considerations than those of a generic search application. This talk will explore some specific considerations Vimeo has encountered while designing their new, soon-to-launch search service, as well as themes relating to Elasticsearch's heritage as a scalable, distributed search engine using Apache Lucene.

Christopher Simpson | Software Engineer | Vimeo

Transcript

  1. 3 YEARS of video uploaded per month 75 PETABYTES served per month 50+ search req per second via elasticsearch >1K
  2. 6 Time-Based Data logs-2017-11-08 logs-2017-11-07 logs-2017-11-06 logs-2017-11-05 logs-2017-11-04 logs-2017-11-03 logs-2017-11-02 logs-2017-11-01 logs-2017-10-31 logs-2017-10-30 logs-2017-10-29 logs-2017-10-28 logs-2017-10-27 logs-2017-10-26 logs-2017-10-25
  3. 7 Time-Based Data logs-2017-11-08 logs-2017-11-07 logs-2017-11-06 logs-2017-11-05 logs-2017-11-04 logs-2017-11-03 logs-2017-11-02 logs-2017-11-01 logs-2017-10-31 logs-2017-10-30 logs-2017-10-29 logs-2017-10-28 logs-2017-10-27 logs-2017-10-26 logs-2017-10-25
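The index names on the two slides above follow a daily `logs-YYYY-MM-DD` pattern. As a minimal sketch of how such names could be generated for a date range (the function names and the `prefix` parameter are illustrative assumptions, not taken from the talk):

```python
from datetime import date, timedelta
from typing import List


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Build a per-day index name such as 'logs-2017-11-08'."""
    return f"{prefix}-{day.isoformat()}"


def indices_for_range(start: date, end: date, prefix: str = "logs") -> List[str]:
    """List the daily index names covering start..end inclusive, newest first."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    span = (end - start).days
    return [daily_index_name(end - timedelta(days=i), prefix) for i in range(span + 1)]


if __name__ == "__main__":
    # Hypothetical usage: the same date range that appears on the slides.
    for name in indices_for_range(date(2017, 10, 25), date(2017, 11, 8)):
        print(name)
```

With names like these, a query limited to recent days only needs to target the handful of indices that cover that window, and old data can be retired by dropping whole indices.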
  4. 10 Index Design
    • Even if your data isn’t primarily time-based, it doesn’t mean that it can’t be partitioned based on another property.
    • Take the time to understand how your users will query the data and select a suitable index pattern.
    • Not all datasets grow as quickly as ours, or require updates to the documents as regularly. Not everything needs to be ‘manually’ partitioned.
    • (Remember that your index is split into shards)
    • We doubled the size of the data we have stored in ES, without any noticeable impact, by using a decent indexing pattern.
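The slide does not say which property Vimeo partitions on. Purely as an illustration of partitioning by a non-time property, the sketch below routes each document to an index derived from one of its fields. The `category` field, the `content` prefix, and both function names are hypothetical.

```python
import re
from typing import Dict, List


def index_for(doc: dict, field: str = "category", prefix: str = "content") -> str:
    """Derive an index name from one property of the document.

    The value is lowercased and reduced to letters, digits and hyphens so the
    result is a conservative, lowercase index name.
    """
    value = str(doc.get(field, "unknown")).lower()
    slug = re.sub(r"[^a-z0-9]+", "-", value).strip("-") or "unknown"
    return f"{prefix}-{slug}"


def group_by_index(docs: List[dict]) -> Dict[str, List[dict]]:
    """Group documents by the index they would be written to."""
    groups: Dict[str, List[dict]] = {}
    for doc in docs:
        groups.setdefault(index_for(doc), []).append(doc)
    return groups
```

A scheme like this only pays off if most queries filter on the chosen property, which is why the slide stresses understanding how users will query the data first.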
  5. 14 Querying
    • Templates are a natural fit for a JSON DSL.
    • Helps isolate logic, and makes unit testing easy.
    • Easy to introduce A/B tests.
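The talk does not show its templating layer, so the following is only a sketch of the idea, using plain Python dictionaries rather than any particular templating feature. It renders a `multi_match` query body from a base template and picks the field list by A/B variant. The field names, boosts, and variant labels are assumptions made for illustration.

```python
from copy import deepcopy

# Base query body; the placeholders are filled in by render_query().
BASE_TEMPLATE = {
    "query": {"multi_match": {"query": None, "fields": []}},
    "size": 10,
}

# Hypothetical A/B variants that differ only in searched fields and boosts.
VARIANT_FIELDS = {
    "A": ["title^2", "description"],
    "B": ["title^3", "description", "tags"],
}


def render_query(text: str, variant: str = "A", size: int = 10) -> dict:
    """Return a search request body for the given text and variant."""
    if variant not in VARIANT_FIELDS:
        raise ValueError(f"unknown variant: {variant!r}")
    body = deepcopy(BASE_TEMPLATE)  # never mutate the shared template
    body["query"]["multi_match"]["query"] = text
    body["query"]["multi_match"]["fields"] = list(VARIANT_FIELDS[variant])
    body["size"] = size
    return body
```

Because rendering is a pure function from parameters to a JSON-serializable dictionary, it can be unit tested without a running cluster, and adding a variant is a one-line change.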
  6. 16 Summary
    • Updates are expensive, but often necessary.
    • Understand the resources of your cluster.
    • Ensure that it is easy to slow or pause updates.
    • If bulk indexing into a new index, follow best practices:
      ‒ Set the number of replicas to zero.
      ‒ Temporarily disable index refresh.
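The two bulk-indexing practices map onto the index settings `number_of_replicas` and `refresh_interval`, where a `refresh_interval` of `"-1"` disables refresh. The sketch below builds the settings bodies that could be sent to an index's `_settings` endpoint and adds a small gate for slowing or pausing updates. The helper names and the `UpdateGate` class are assumptions, not the talk's implementation, and the restore values shown are only the usual defaults, so substitute whatever the index used before.

```python
import threading
import time
from typing import Optional


def bulk_load_settings() -> dict:
    """Settings body to apply before bulk indexing into a new index."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def restore_settings(replicas: int = 1, refresh_interval: str = "1s") -> dict:
    """Settings body to apply once the bulk load has finished."""
    return {"index": {"number_of_replicas": replicas,
                      "refresh_interval": refresh_interval}}


class UpdateGate:
    """Lets an operator slow down or pause a stream of indexing requests."""

    def __init__(self, min_interval: float = 0.0) -> None:
        self.min_interval = min_interval  # seconds to wait between requests
        self._running = threading.Event()
        self._running.set()

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def wait_turn(self, timeout: Optional[float] = None) -> bool:
        """Block while paused; return False if still paused after timeout."""
        if not self._running.wait(timeout):
            return False
        if self.min_interval > 0:
            time.sleep(self.min_interval)
        return True
```

An indexing worker would call `wait_turn()` before each request, so raising `min_interval` slows updates and `pause()` stops them without redeploying anything.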
  7. 18

  8. 19

  9. 20

  10. 22 Summary
    • Elasticsearch is a powerful search engine. It does other things very well too.
    • Index design is important and can help you scale.
    • A flexible querying layer will help you iterate faster.
    • Understanding the resources of your cluster will make it easier to scale.
    • Updates are expensive, so ensure you can control how aggressively you are sending indexing requests.
    • If you design your service well, major version upgrades don’t have to hurt!
    “You know, for Search”