Sometimes, Druid is not the best solution for a business use case

The presentation covers a few use cases in which you should use Druid and a few in which you should not.

AppsFlyer

July 27, 2016

Transcript

  1. Sometimes, Druid is not the best solution for a business use case
    Yulia Trakhtenberg

  2. Real Time Dashboard
    • Optimizing campaigns based on user performance analytics
    • 8 billion events daily
    • 12-15 dimensions
    • Data starting 2011
    • Extendable

  3. Time based data is mutable! - Life Time Value
    Install date
    Session started - date1
    purchase - date2
    uninstall - date3
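
A minimal sketch of why lifetime value (LTV) makes time-based data mutable: every later event of a user is attributed back to that user's install date, so the total for a past date keeps changing. The record layout, dates, and the `ltv_by_install_date` helper are assumptions made for illustration, not taken from the talk.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records: (user_id, install_date, event_date, event_type, revenue)
events = [
    ("u1", date(2016, 7, 1), date(2016, 7, 1), "install", 0.0),
    ("u1", date(2016, 7, 1), date(2016, 7, 3), "session", 0.0),
    ("u1", date(2016, 7, 1), date(2016, 7, 10), "purchase", 4.99),
    ("u1", date(2016, 7, 1), date(2016, 7, 20), "uninstall", 0.0),
]

def ltv_by_install_date(events, as_of):
    """Sum revenue per install date, using only events seen up to `as_of`."""
    totals = defaultdict(float)
    for _user, install_date, event_date, _event_type, revenue in events:
        if event_date <= as_of:
            # The revenue lands on the install date, not on the event date.
            totals[install_date] += revenue
    return dict(totals)

# The value stored for 2016-07-01 changes once the later purchase arrives.
before = ltv_by_install_date(events, as_of=date(2016, 7, 5))
after = ltv_by_install_date(events, as_of=date(2016, 7, 15))
print(before, after)
```

In this toy data the row keyed by the install date is rewritten days after it was first produced, which is the property that append-only, event-time storage handles poorly.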

  4. Previous Solution - Toku (Mongo)
    KAFKA
    Toku writers
    Toku master
    Toku slaves
    Dashboard

  5. Toku Problems
    • Failures on a weekly basis
    – master lost
    – no ability to select a new master
    • Bad modeling - slow writes
    • Data loss was possible
    • No ability to recover

  6. Dashboard - DB abstraction level
    KAFKA
    Toku writers
    Toku master
    Toku slaves
    Dashboard Middleware (Vishnu)

  7. We tried...
    1. Cassandra
    2. Mongo 3
    3. Proprietary DB
    4. Redis Labs + indices DB
    5. Pinot
    6. Redshift
    7. Druid

  8. Druid DB
    • Storage optimized for analytics
    • Lambda architecture inside
    • JSON-based query language
    • Developed by an analytics SaaS company
    • Free and open source
    • Scalable to petabytes...
    • Optimized for time based data

  9. Druid - take 1 - Batch and Realtime combination
    KAFKA
    Druid sink
    Dashboard Middleware (Vishnu)
    Daily Batch process
    In Memory
    Historical node
    Daily backup
    S3 files
    Druid

  10. Druid - take 1 - Going to production
    • Biggest customers could not open a dashboard
    • Busy weekend to give Druid a shot
    • On Sunday morning - dashboard opens and works well!!!
    • Customer was happy!!!
    • We moved more and more customers to Druid

  11. Druid - problems
    • Not time based data - LTV
    • Solution:
    – Timeseries on event time
    – Secondary index on event install date
    • Realtime data - in memory, no secondary indices - too slow
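
The trade-off on this slide can be sketched with a toy model: data is bucketed by event time, historical buckets carry a secondary index on install date, and the realtime bucket has none, so a lookup by install date has to examine every row in it. The `Partition` class and the sample rows are hypothetical stand-ins and do not reproduce Druid's actual segment format.

```python
from collections import defaultdict

class Partition:
    """One event-time bucket (hypothetical stand-in for a time-partitioned segment)."""

    def __init__(self, indexed):
        self.rows = []                       # (install_date, revenue) tuples
        self.indexed = indexed
        self.by_install = defaultdict(list)  # secondary index: install_date -> rows

    def add(self, install_date, revenue):
        row = (install_date, revenue)
        self.rows.append(row)
        if self.indexed:
            self.by_install[install_date].append(row)

    def revenue_for_install(self, install_date):
        """Return (revenue, rows_examined) for one install-date cohort."""
        if self.indexed:
            rows = self.by_install.get(install_date, [])
            return sum(r for _, r in rows), len(rows)
        # No secondary index: every row in the bucket must be checked.
        matched = [r for d, r in self.rows if d == install_date]
        return sum(matched), len(self.rows)

historical = Partition(indexed=True)
realtime = Partition(indexed=False)
for partition, first_revenue in ((historical, 5.0), (realtime, 2.0)):
    partition.add("2016-07-01", first_revenue)
    partition.add("2016-07-02", 1.0)
    partition.add("2016-07-03", 1.0)

print(historical.revenue_for_install("2016-07-01"))  # one row examined
print(realtime.revenue_for_install("2016-07-01"))    # all three rows examined
```

With three rows the difference is invisible, but the rows-examined count grows with the size of the unindexed bucket, which matches the "too slow" observation for realtime data.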

  12. Current solution - MemSQL
    • In Memory DB
    • Rowstore and Columnstore
    • Aggregators and Leaves

  13. MemSQL Architecture
    KAFKA
    MemSQL writers
    MemSQL Cluster
    Dashboard Middleware (Vishnu)
    MemSQL writers
    MemSQL Cluster (Slave)

  14. MemSQL - why is it a solution
    • Fast
    • Extendable
    • Better modeling
    • Recoverable solution
    • Possibility to return to point zero

  15. MemSQL - why is it a solution
    • Data - 450 GB x 2 clusters
    • Query Latency - 1-3 seconds
    • Machines x 2 clusters
    – 2 aggregators - m4.4xlarge
    – 4 leaves - r3.4xlarge
    • Cost reduction - $20K monthly

  16. Recovery
    KAFKA (24h)
    MemSQL writers
    Master MemSQL Cluster
    Dashboard Middleware (Vishnu)
    Yesterday snapshot
    Recovery MemSQL Cluster
    MemSQL writers - only current day
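
The recovery flow in this diagram can be sketched as: load yesterday's snapshot into the recovery cluster, then replay only the current day's events from the 24 hours that Kafka retains. The function name, the event fields, and the use of per-install-date counters are assumptions chosen for illustration; the talk does not describe the data model at this level.

```python
def recover(snapshot, retained_events, snapshot_day):
    """Rebuild state from a daily snapshot plus events newer than the snapshot.

    snapshot: {install_date: installs} as of the end of `snapshot_day`
    retained_events: events still available in the log (hypothetical dict records)
    """
    state = dict(snapshot)  # start from yesterday's snapshot, leave the input untouched
    for event in retained_events:
        # Events up to and including the snapshot day are already in the snapshot.
        if event["day"] > snapshot_day:
            key = event["install_date"]
            state[key] = state.get(key, 0) + event["installs"]
    return state

snapshot = {"2016-07-01": 100}
retained = [
    {"day": "2016-07-26", "install_date": "2016-07-01", "installs": 7},  # already snapshotted
    {"day": "2016-07-27", "install_date": "2016-07-01", "installs": 3},
    {"day": "2016-07-27", "install_date": "2016-07-27", "installs": 2},
]
recovered = recover(snapshot, retained, snapshot_day="2016-07-26")
print(recovered)
```

Days are ISO-formatted strings here so that plain string comparison orders them correctly; filtering on the snapshot day is what keeps the replay from double-counting events the snapshot already contains.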

  17. Daily Incremental Batch - Yesterday Snapshot
    KAFKA
    Daily Batch Incremental process
    Past S3 files
    Yesterday S3 files
    Daily backup S3 files

  18. Next steps - Architecture
    KAFKA
    writers - only new data
    MemSQL Rowstore Cluster 1-2 weeks
    Dashboard Middleware (Vishnu)
    Daily Batch process
    S3 files
    MemSQL Columnstore History Cluster
    Daily
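
One way the middleware could serve a dashboard query under this architecture is to split the requested date range between the two clusters: recent days go to the rowstore cluster, older days to the columnstore history cluster. The sketch below assumes a 14-day hot window (the slide says 1-2 weeks); the function name and the exact boundary handling are illustrative assumptions, not the team's implementation.

```python
from datetime import date, timedelta

def split_query_range(start, end, today, hot_days=14):
    """Split the inclusive range [start, end] into (history_range, recent_range).

    Either part is None when the query does not touch that cluster.
    `hot_days` is an assumed size for the rowstore window.
    """
    hot_start = today - timedelta(days=hot_days)
    history = None
    recent = None
    if start < hot_start:
        # Older days are served by the columnstore history cluster.
        history = (start, min(end, hot_start - timedelta(days=1)))
    if end >= hot_start:
        # Recent days are served by the rowstore cluster.
        recent = (max(start, hot_start), end)
    return history, recent

today = date(2016, 7, 27)
history, recent = split_query_range(date(2016, 7, 1), date(2016, 7, 27), today)
print(history, recent)
```

The middleware would then run each sub-range against its cluster and merge the results, which keeps the dashboard unaware of where the boundary between the two stores sits.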

  19. Thank you
    Join the Data Team!
    [email protected]
