Journey to the Real-Time Analytics in Extreme Growth

AppsFlyer
September 22, 2016

At AppsFlyer we provide a real-time analytics dashboard for marketers, which helps them invest their budgets wisely. We aggregate some 8 billion events a day in real time, and our previous solution could not handle the load: the dashboard took forever to load, and Kafka lag was a headache day and night. Product constantly demanded new features, and we simply couldn't deliver them. Worse, we faced dangerous failures and the risk of losing critical data, something we obviously couldn't afford.
We started looking for a new infrastructure. We tried Cassandra, Mongo, Redis and Druid, and none of them provided the solution we needed.
Join me on our journey and I will show you the current solution, which combines real-time aggregation in MemSQL with batch processing on Apache Spark. The new architecture not only solved our pains but also lets us aggregate 10x the amount of data with much faster response times and keep up with product demands, at a lower production cost.

Transcript

  1. Journey to the Real-Time Analytics in Extreme Growth

  2. Real Time Dashboard
    • User acquisition
    • 8B events daily
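
For a sense of scale, 8B events a day is an average of roughly 92,600 events per second. A back-of-the-envelope sketch, assuming traffic is spread evenly over the day (an assumption not made in the deck; real traffic presumably peaks higher):

```python
EVENTS_PER_DAY = 8_000_000_000  # "8B events daily" from the slide
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day: int) -> float:
    """Average ingest rate, assuming a perfectly even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


print(f"{average_events_per_second(EVENTS_PER_DAY):,.0f} events/sec on average")
```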

  3. Data is Mutable

  4. Previous solution - Toku (Mongo)
    KAFKA
    Toku writers
    Toku master
    Toku slaves
    Dashboard

  5. Toku Problems
    • Failures on a weekly basis
    • Bad modeling
    • No recovery

  6. Requirements
    • Real time
    • More events (more data)
    • More dimensions (MUCH MORE DATA !!!)
    • Stability
    • Faster

  7. Dashboard - DB abstraction level
    KAFKA
    Toku writers
    Toku master
    Toku slaves
    Dashboard Middleware (Vishnu)
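
The idea of a DB abstraction level can be sketched as follows. This is a minimal illustration, not Vishnu's actual design: the names `Query`, `Backend`, `InMemoryBackend` and `Middleware`, and the `installs` metric, are all hypothetical. The point is only that the dashboard depends on the middleware, so the database behind it can be replaced without touching the dashboard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class Query:
    """A dashboard request (hypothetical shape)."""
    app_id: str
    start: date
    end: date


class Backend(Protocol):
    """Anything that can answer a dashboard query."""
    def fetch(self, query: Query) -> list[dict]: ...


class InMemoryBackend:
    """Stand-in backend for the sketch; a real one would talk to a database."""
    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def fetch(self, query: Query) -> list[dict]:
        return [
            r for r in self._rows
            if r["app_id"] == query.app_id and query.start <= r["day"] <= query.end
        ]


class Middleware:
    """The dashboard talks only to this class, never to a database directly."""
    def __init__(self, backend: Backend) -> None:
        self._backend = backend

    def swap_backend(self, backend: Backend) -> None:
        # Replacing the database is invisible to the dashboard.
        self._backend = backend

    def installs(self, query: Query) -> int:
        return sum(r["installs"] for r in self._backend.fetch(query))
```

With this shape, trying out a different database means writing one more class with a `fetch` method and calling `swap_backend`.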

  8. https://www.meetup.com/Druid-Israel/events/232075974/

  9. What did we gain?
    • Flexible middleware
    • Batch daily process - first step to recovery
    • Developers' paradise

  10. Down to Earth

  11. MemSQL
    In-memory scalable DB

  12. Current Solution - MemSQL

  13. MemSQL Architecture
    KAFKA
    MemSQL writers
    MemSQL cluster
    Dashboard Middleware (Vishnu)
    MemSQL writers
    MemSQL cluster (Slave)

  14. Recovery
    KAFKA (24h)
    MemSQL writers
    Master MemSQL cluster
    Dashboard Middleware (Vishnu)
    Yesterday snapshot
    Recovery MemSQL cluster
    MemSQL writers - only current day
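
The recovery idea on this slide, restoring yesterday's snapshot and then replaying only the current day from Kafka's 24h retention, might look roughly like the sketch below. The event shape, the per-app daily count used as the aggregate, and the `recover` function are assumptions made for illustration; the deck does not show the real implementation.

```python
from collections import Counter
from datetime import date, datetime


def recover(snapshot: dict, retained_events: list, today: date) -> Counter:
    """Rebuild aggregates from yesterday's snapshot plus today's retained events.

    `snapshot` maps (app_id, day) -> event count and is assumed to cover
    everything up to the end of yesterday.
    """
    state = Counter(snapshot)
    for event in retained_events:
        day = event["ts"].date()
        if day == today:  # older events are already in the snapshot
            state[(event["app_id"], day)] += 1
    return state


snapshot = {("app1", date(2016, 9, 21)): 10}
events = [
    {"app_id": "app1", "ts": datetime(2016, 9, 21, 23, 59)},  # already counted
    {"app_id": "app1", "ts": datetime(2016, 9, 22, 0, 1)},
    {"app_id": "app2", "ts": datetime(2016, 9, 22, 9, 0)},
]
state = recover(snapshot, events, today=date(2016, 9, 22))
```

Filtering on the current day is what keeps the replay from double counting events that the snapshot already contains.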

  15. MemSQL - Quick Win
    • Fast
    • Recoverable
    • Possibility to return to the zero point
    • Ability to add new features
    • More data (30x)

  16. Show me the numbers
    • Data - 100 GB x 2 clusters
    • Query latency - 1-3 seconds
    • Machines x 2 clusters
      – 2 aggregators - m4.4xlarge
      – 4 leaves - r3.4xlarge
    • Cost reduction - $20K less than Toku monthly

  17. Good Enough Approach
    • More data - more money
    • Less money - less data

  18. Current - Architecture
    KAFKA
    writers - only new data
    MemSQL Rowstore cluster, 1-2 weeks
    Dashboard Middleware (Vishnu)
    Daily batch process
    S3 files
    MemSQL Columnstore history cluster
    Daily
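
With recent data in the rowstore cluster and older data in the columnstore history cluster, a dashboard query over a date range has to be split between the two. The sketch below shows one way to do that split. It assumes the middleware does the routing and that the rowstore keeps the last 14 days; both are assumptions (the slide says only "1-2 weeks"), and `split_range` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

DateRange = Tuple[date, date]


def split_range(
    start: date, end: date, today: date, hot_days: int = 14
) -> Tuple[Optional[DateRange], Optional[DateRange]]:
    """Split an inclusive date range into (history part, recent part).

    The history part would go to the columnstore cluster and the recent
    part to the rowstore cluster; either part may be None.
    """
    if start > end:
        raise ValueError("start must not be after end")
    hot_start = today - timedelta(days=hot_days - 1)  # oldest day in the rowstore
    history = None
    recent = None
    if start < hot_start:
        history = (start, min(end, hot_start - timedelta(days=1)))
    if end >= hot_start:
        recent = (max(start, hot_start), end)
    return history, recent


history, recent = split_range(date(2016, 9, 1), date(2016, 9, 22), today=date(2016, 9, 22))
```

The two partial results would then be merged before being returned to the dashboard.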

  19. “Premature optimization is the root of all evil”
    Donald Knuth

  20. appsflyer.com/jobs

  21. http://www.shutterstock.com/pic.mhtml?utm_campaign=ClipartLogo&irgwc=1&tpl=46764-50655&id=154723511&language=en&utm_medium=Affiliate&utm_source=46764
    http://www.samatters.com/wp-content/uploads/2015/07/round-peg.jpg
    http://marsmedia.info/en/blog/cassandra.png
    http://www.zdnet.de/wp-content/uploads/2013/10/mongodb-logo.jpg
    https://chris.lu/upload/images/redis.png
    https://upload.wikimedia.org/wikipedia/en/b/ba/Druid_MasterLogo_Full_Color_Small.png
    https://www.leftronic.com/wp-content/uploads/2015/04/Amazonredshift_220x110.png
