Handling Exponential Growth With Elasticsearch

PHP and MySQL are found throughout the stacks of tech companies. They are particularly favoured by start-ups due to their ease of use and low barriers to entry. But what happens when a start-up becomes successful and strays in a direction that the existing stack was not built to deal with? This session takes a look at the example of Voxpopme, who found their data ingestion needs doubling year on year, with a sudden need to support datasets hundreds of times larger than before. We will explore the benefits of an inverted index DBMS such as Elasticsearch for storing and querying this kind of data, whilst looking at methods of integrating this new technology into a legacy PHP/MySQL application in a scalable, backwards-compatible way.

David Maidment

October 25, 2018

Transcript

  1. Basics of an inverted index (slide 4)

    • Get all words & phrases across entire dataset
    • Each word links to the documents that contain it
    • No DB scanning: just access word in hash table
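
The idea on this slide can be illustrated with a minimal sketch in plain Python. This is not how Elasticsearch is implemented internally; it only shows the lookup pattern the slide describes. The sample documents and the function names `build_inverted_index` and `search` are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, word):
    """Look the word up directly in the hash table; no document is scanned."""
    return index.get(word.lower(), set())


# Hypothetical sample data, standing in for video transcripts.
documents = {
    1: "the product is easy to use",
    2: "shipping was slow",
    3: "easy setup but slow delivery",
}
index = build_inverted_index(documents)
easy_docs = search(index, "easy")
```

The cost of a search depends on the hash lookup rather than on the number of documents, which is why this structure suits free-text queries over large datasets.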
  2. Who are we? (slide 5)

    • World’s #1 video insight platform
    • Founded in 2013
    • UK offices: Birmingham, London
    • US office: Salt Lake City
  3. Exponential growth (slide 7)

    • 1.5 MILLION video responses since 2013
    • 500 THOUSAND video responses in the last 12 months
    • 1 MILLION minutes of video processed
  4. Legacy search mechanism (slide 10)

    Components: Voxpopme API, MySQL, MongoDB, Elasticsearch

    • Relational data stored in MySQL: ID lookups (responses, users, etc.)
    • Free-form data stored in MongoDB: additional data queries (user demographics, etc.)
    • Transcripts stored in Elasticsearch: free-text transcript searches

    Search steps:
    ‒ Search MySQL, MongoDB and Elasticsearch
    ‒ Calculate intersection of record IDs
    ‒ Look up final records by ID
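
The three-step legacy search on this slide (query each store, intersect the record IDs, then look up the final records) can be sketched as follows. The ID lists and the `fetch_by_id` callback are hypothetical stand-ins for the real MySQL, MongoDB and Elasticsearch queries, which the slides do not show.

```python
def legacy_search(mysql_ids, mongo_ids, es_ids, fetch_by_id):
    """Intersect the IDs returned by each store, then load the final records."""
    matching_ids = set(mysql_ids) & set(mongo_ids) & set(es_ids)
    # Sort so the result order is deterministic.
    return [fetch_by_id(record_id) for record_id in sorted(matching_ids)]


# Hypothetical per-store results for a single user search.
mysql_ids = [1, 2, 3, 5, 8]   # e.g. responses belonging to a project
mongo_ids = [2, 3, 5, 13]     # e.g. respondents matching a demographic filter
es_ids = [3, 5, 21]           # e.g. transcripts matching a free-text query

# Hypothetical record store used for the final lookup by ID.
records = {i: {"id": i} for i in range(30)}
results = legacy_search(mysql_ids, mongo_ids, es_ids, records.__getitem__)
```

Every store has to return its full list of matching IDs before the intersection can be computed, so the work grows with the size of each intermediate result, not with the size of the final answer.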
  5. Exploring the options: MySQL and MongoDB (slide 12)

    Pros:
    • Most of our data already stored in MySQL
    • Easier to search freeform and text fields
    • Indexed data very quick to search
    • Easier to set up and scale than MySQL
    • Text fields fast to search with small dataset

    Cons:
    • Most data currently not stored in MongoDB
    • No easy way to store unstructured data
    • Huge API rewrite to make MongoDB main DB
    • Text fields very slow to search at scale
    • Not our area of expertise
  6. (slide 13: no text in transcript)

  7. New search mechanism (slide 14)

    Components: Voxpopme API, MySQL, MongoDB, Elasticsearch, Messaging Queue, Reindex Microservice

    • Data still stored in MySQL and MongoDB
    • User searches only hit Elasticsearch
    • MongoDB can eventually be removed

    Indexing flow:
    ‒ Store data as before
    ‒ ORM hook
    ‒ Messaging Queue
    ‒ Reindex Microservice
    ‒ Fetch and combine records into JSON document
    ‒ Index document

    Search flow:
    ‒ User search
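
A rough sketch of the indexing flow on this slide, read as: an ORM hook queues the ID of a saved record, and a reindex worker fetches the related records, combines them into one JSON document and indexes it. Everything here is an in-memory stand-in: the dictionaries `mysql_responses`, `mongo_extras` and `search_index`, the field names, and the functions `on_save`, `build_document` and `reindex_worker` are hypothetical, and the ordering of the steps is an interpretation of the diagram labels.

```python
import json
from queue import Queue

# Hypothetical stand-ins for the two source-of-truth stores.
mysql_responses = {42: {"response_id": 42, "user_id": 7}}
mongo_extras = {42: {"demographics": {"age_band": "25-34"}}}

# Stand-in for the search index: document ID -> JSON document.
search_index = {}

# Stand-in for the messaging queue between the API and the reindex service.
reindex_queue = Queue()


def on_save(response_id):
    """ORM hook: after a record is stored as before, queue it for reindexing."""
    reindex_queue.put(response_id)


def build_document(response_id):
    """Fetch the related records and combine them into one flat document."""
    document = dict(mysql_responses[response_id])
    document.update(mongo_extras.get(response_id, {}))
    return document


def reindex_worker():
    """Drain the queue, writing one combined JSON document per record."""
    while not reindex_queue.empty():
        response_id = reindex_queue.get()
        search_index[response_id] = json.dumps(
            build_document(response_id), sort_keys=True
        )


on_save(42)
reindex_worker()
```

Because the search index is populated only from MySQL and MongoDB, it can be rebuilt at any time by queueing every record ID again.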
  8. (slide 16: no text in transcript)

  9. (slide 17: no text in transcript)

  10. Lessons learned: what we wish we knew when we started (slide 20)

    • Elasticsearch solves a lot of problems, but is not a silver bullet
    • Your first mapping will be wrong
    • Your second mapping will probably also be wrong
    • Iterating over designs while A/B testing will take months, but will be worth it
    • The cluster is disposable, and that’s a good thing
      ‒ Upgrades are easier
      ‒ Unexpected data loss is only a temporary setback