AppsFlyer
September 22, 2016

Journey to the Real-Time Analytics in Extreme Growth

At AppsFlyer we provide a real-time analytics dashboard for marketers, which they use to invest their $$$ budgets wisely. We aggregate some 8 billion events a day in real time, and our previous solution could not handle the load: the dashboard took forever to load, and Kafka lag was our daily and nightly headache. Product constantly demanded new features and, guess what, we just couldn't deliver them! Moreover, we faced dangerous failures and the risk of losing critical data, something we obviously couldn't afford.
We started looking for a new infrastructure. We tried different databases and technologies, including Cassandra, Mongo, Redis and Druid, and none of them provided the solution we needed.
Join me on our journey and I will show you the current solution, which implements real-time aggregation over MemSQL integrated with batch processing over Apache Spark. The new architecture not only solved our pains but also allowed us to aggregate 10x the amount of data with much faster response times and keep up with product demands, all at a lower production cost.

Transcript

  1. Requirements • Real-time • More events (more data) • More dimensions (MUCH MORE DATA !!!) • Stability • Faster
  2. Dashboard - DB abstraction level (diagram) • Kafka • Toku writers • Toku master • Toku slaves • Dashboard Middleware (Vishnu)
  3. What did we gain? • Flexible middleware • Batch daily process - first step to recovery • Developers' paradise
  4. Recovery (diagram) • Kafka (24h) • MemSQL writers • Master MemSQL cluster • Dashboard Middleware (Vishnu) • Yesterday snapshot • Recovery MemSQL cluster • MemSQL writers - only current day
  5. MemSQL - Quick Win • Fast • Recoverable • Possibility to return to 0 point • Ability to add new features • More data (X30)
  6. Show me the numbers • Data - 100 GB x 2 clusters • Query latency - 1-3 seconds • Machines x 2 clusters: 2 aggregators (m4.4xlarge), 4 leaves (r3.4xlarge) • Cost reduction - $20K less than Toku monthly
  7. Current - Architecture (diagram) • Kafka • Writers - only new data • MemSQL rowstore cluster (1-2 weeks) • Dashboard Middleware (Vishnu) • Daily batch process • S3 files • MemSQL columnstore history cluster • Daily
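
The last slide places recent data (1-2 weeks) in a MemSQL rowstore cluster and older data in a columnstore history cluster, with the dashboard middleware in front of both. The deck does not show how the middleware chooses a cluster, so the following Python is only a rough sketch of one way such routing could work. The function name `route_query`, the cluster labels, and the 14-day retention window are all assumptions made for illustration, not AppsFlyer's implementation.

```python
from datetime import date, timedelta

# Hypothetical cluster labels; the real middleware's naming is not shown in the deck.
ROWSTORE = "rowstore"
COLUMNSTORE = "columnstore"


def route_query(start, end, today, retention_days=14):
    """Split an inclusive [start, end] date range between the two clusters.

    Assumes the rowstore cluster holds the most recent `retention_days` days
    (including today) and the columnstore cluster holds everything older.
    Returns a list of (cluster, sub_start, sub_end) tuples in date order.
    """
    if start > end:
        raise ValueError("start must not be after end")

    # First day that is still served by the rowstore cluster.
    cutoff = today - timedelta(days=retention_days - 1)

    parts = []
    if start < cutoff:
        # Historical portion: everything strictly before the cutoff.
        parts.append((COLUMNSTORE, start, min(end, cutoff - timedelta(days=1))))
    if end >= cutoff:
        # Recent portion: from the cutoff (or later start) up to the end.
        parts.append((ROWSTORE, max(start, cutoff), end))
    return parts


if __name__ == "__main__":
    today = date(2016, 9, 22)
    for cluster, s, e in route_query(date(2016, 9, 1), today, today):
        print(cluster, s, e)
```

Under these assumptions, a dashboard request that spans both ranges becomes two sub-queries whose results the middleware would merge, while a request for the last few days touches only the rowstore cluster.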