1, 2, 3, 4 Add Another Data Store (And Other Rhymes)

simplereach
September 05, 2012

Eric Lubow's presentation from the 2012 Cassandra Summit.

Transcript

  1. 1,2,3,4 Add Another Data Store (And Other Rhymes)
    Eric Lubow
    @elubow
    [email protected]
    #cassandra12

  2. Overview
    • SimpleReach
    • Definitions and Data Stores
    • Evolution to Polyglottany
    • Tie It Together
    • Questions

  3. Socially Intelligent

  4. Size
    • 100M events recorded per day and growing
    • 500M pageviews per month and growing

  5. Polyglot Persistence
    "Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand."
    http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence

  6. Right Tool For The Job

  7. Why?
    • Read-heavy vs. write-heavy loads
    • Data relationships may be less important
    • Different aspects of a system have different requirements
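
One way to make the read-heavy vs. write-heavy distinction concrete is to measure it from an operation log. The following is a minimal sketch, not something shown in the talk: the function names, the log format (a list of "read"/"write" strings), and the 0.7 threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def read_write_profile(ops):
    """Return (reads, writes, read_fraction) for an iterable of 'read'/'write' strings."""
    counts = Counter(ops)
    reads, writes = counts["read"], counts["write"]
    total = reads + writes
    read_fraction = reads / total if total else 0.0
    return reads, writes, read_fraction


def classify(read_fraction, threshold=0.7):
    """Label a workload; the threshold is an arbitrary illustrative cut-off."""
    if read_fraction >= threshold:
        return "read-heavy"
    if read_fraction <= 1 - threshold:
        return "write-heavy"
    return "mixed"


# Hypothetical workloads: an event-ingestion path and a dashboard path.
event_log = ["write"] * 9 + ["read"]
dashboard_log = ["read"] * 8 + ["write"] * 2

for name, log in (("events", event_log), ("dashboard", dashboard_log)):
    reads, writes, fraction = read_write_profile(log)
    print(name, reads, writes, classify(fraction))
```

Two parts of the same system can land in different categories, which is the argument the slide makes for picking a store per requirement.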

  8. No One Size Fits All

  9. Tools

  10. Free vs. Cost

  11. Languages

  12. Pre-Scale

  13. Scale

  14. SimpleReach Pre-Scale

  15. SimpleReach

  16. Cassandra
    • Large data volume ingestion
    • Really fast writes to many locations (eventual consistency)
    • Query by column groups within rows
    • Range queries in Hive (partial CF scans)
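
The "query by column groups within rows" point can be pictured as a slice over a row whose columns are kept in sorted order. The sketch below is a pure-Python model of that idea, not Cassandra client code: the `WideRow` class and the hourly column names are hypothetical and exist only to illustrate the access pattern.

```python
import bisect


class WideRow:
    """Toy model of one wide row: columns are kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted so slices can use binary search.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return [(name, value)] for start <= name <= end, in column order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical row: pageview counts for one URL, one column per hour.
row = WideRow()
row.insert("2012-09-06T00", 7)
row.insert("2012-09-05T10", 120)
row.insert("2012-09-05T09", 95)
row.insert("2012-09-05T11", 143)

# Fetch only the group of columns for 5 September.
print(row.slice("2012-09-05T00", "2012-09-05T23"))
```

Because the column names sort chronologically, a contiguous group of columns can be read without touching the rest of the row.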

  17. mongoDB
    • Fast atomic increments (Node.js is native JSON)
    • Sharding for faster distributed increments
    • Solid ORM for Rails (Mongoid)
    • Fast access for pub/sub of durable/persisted documents
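
The idea behind sharding for distributed increments is that each counter lives on exactly one shard, so increments for different keys spread across machines. The following is a rough in-memory sketch of that routing, not MongoDB code: `ShardedCounters`, the CRC32-based routing, and the key names are assumptions made for illustration. In MongoDB itself the increment would typically be expressed with the `$inc` update operator.

```python
import zlib
from collections import defaultdict


class ShardedCounters:
    """Toy model: route each key to one of several in-memory 'shards'."""

    def __init__(self, num_shards=4):
        self.shards = [defaultdict(int) for _ in range(num_shards)]

    def _shard_for(self, key):
        # Deterministic hash so the same key always lands on the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def incr(self, key, field, amount=1):
        """Increment one counter and return its new value."""
        shard = self._shard_for(key)
        shard[(key, field)] += amount
        return shard[(key, field)]

    def get(self, key, field):
        """Read a counter without creating it if it is missing."""
        return self._shard_for(key).get((key, field), 0)


counters = ShardedCounters(num_shards=4)
for _ in range(3):
    counters.incr("example.com/article-1", "pageviews")
counters.incr("example.com/article-2", "pageviews", 10)

print(counters.get("example.com/article-1", "pageviews"))
print(counters.get("example.com/article-2", "pageviews"))
```

Each increment touches a single shard, which is what lets write throughput grow as shards are added.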

  18. Redis
    • Supports hundreds of thousands of transactions per second
    • Great caching engine
    • Supports useful variable types like sorted set
    • Pay SerDe price on each access
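
The "SerDe price" bullet refers to serializing and deserializing structured values every time they cross into or out of a cache that stores strings. Below is a small sketch of where that cost shows up, using an in-memory dict as a stand-in for the cache. `StringCache` and its `serde_calls` counter are hypothetical names for illustration, and JSON is assumed as the wire format.

```python
import json


class StringCache:
    """Toy cache that, like a string-valued store, only holds serialized text."""

    def __init__(self):
        self._data = {}
        self.serde_calls = 0  # counts every serialize or deserialize step

    def set(self, key, obj):
        self._data[key] = json.dumps(obj)  # serialize on write
        self.serde_calls += 1

    def get(self, key):
        raw = self._data.get(key)
        if raw is None:
            return None
        self.serde_calls += 1
        return json.loads(raw)  # deserialize on every read


cache = StringCache()
cache.set("article:1", {"title": "Hello", "shares": [3, 5, 8]})

first = cache.get("article:1")
second = cache.get("article:1")
print(first == second, cache.serde_calls)
```

Every read pays a full decode even when the stored value has not changed, which is the cost the slide flags.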

  19. InfiniDB and Infobright
    • Column Stores for ad-hoc analytics queries in SQL
    • Databases built for business intelligence
    • Heavy compression of data
    • Pre-aggregated data (Extents/Knowledge Grid)
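
The pre-aggregation bullet can be illustrated with per-block metadata: if each block of a column records its minimum, maximum, and sum, many range queries can skip blocks or answer from the metadata alone. The sketch below is a simplified model of that idea and does not reproduce the actual internals of InfiniDB or Infobright. The function names and the block size are assumptions.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and precompute summary statistics for each."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append({
            "min": min(chunk),
            "max": max(chunk),
            "sum": sum(chunk),
            "rows": chunk,
        })
    return blocks


def sum_where_between(blocks, lo, hi):
    """Return (sum of values with lo <= v <= hi, number of blocks actually scanned)."""
    total = 0
    scanned = 0
    for block in blocks:
        if block["max"] < lo or block["min"] > hi:
            continue  # block cannot contain matches: skip it entirely
        if lo <= block["min"] and block["max"] <= hi:
            total += block["sum"]  # block fully matches: use the stored sum
            continue
        scanned += 1  # only partially matching blocks are read row by row
        total += sum(v for v in block["rows"] if lo <= v <= hi)
    return total, scanned


column = list(range(1, 13))          # hypothetical column values 1..12
blocks = build_blocks(column, block_size=4)
print(sum_where_between(blocks, 5, 10))
```

In this example only one of the three blocks has to be scanned; the other two are resolved from their metadata.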

  20. Ruby, Node.js, Python
    • Polyglottany doesn’t only apply to data stores
    • Each language has its own benefit to each data storage layer
    • Each language has its own individual benefits
    • JSON, APIs, Performance

  21. Choice

  22. Cons
    • Redis - Can only utilize a single core
    • MySQL Column Store - DELETE/UPDATEs are VERY expensive
    • Cassandra - No B-tree indexes
    • Mongo - Queries slow down when shard count increases. Indexes must fit in memory
    • Python - Whitespace. Community
    • Ruby - Not high performance enough for our standards
    • JavaScript (Node.js) - Bad for CPU or IO intensive workloads

  23. Tying It Together
    • Built in the cloud
    • Service Oriented Architecture (Internal API)
    • Built Helenus (Cassandra Node.js driver)
    • Data accuracy checks: visual and programmatic
    • Built framework for testing out storage engines
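
A programmatic accuracy check across stores could be as simple as comparing the same counters read from two places. The sketch below assumes the counts have already been fetched into plain dicts. `compare_counts`, the tolerance parameter, and the sample data are hypothetical and are not taken from the talk.

```python
def compare_counts(primary, secondary, tolerance=0.0):
    """Return {key: (primary_value, secondary_value)} for keys whose counts disagree.

    tolerance is a relative allowance (0.05 means 5%), useful when one store
    is only eventually consistent with the other.
    """
    mismatches = {}
    for key in sorted(set(primary) | set(secondary)):
        a = primary.get(key, 0)
        b = secondary.get(key, 0)
        allowed = tolerance * max(abs(a), abs(b))
        if abs(a - b) > allowed:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical pageview counts read from two different stores.
from_store_a = {"article-1": 100, "article-2": 50, "article-3": 7}
from_store_b = {"article-1": 100, "article-2": 49}

print(compare_counts(from_store_a, from_store_b))
print(compare_counts(from_store_a, from_store_b, tolerance=0.05))
```

A strict comparison flags every difference, while a small tolerance surfaces only the gaps too large to be explained by replication lag.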

  24. Service Architecture
    Internal API
    Analytics
    Real-time

  25. Helenus
    • Built Node.js driver for Cassandra
    • https://github.com/simplereach/helenus
    • CQL 2/3, Composite Column, Thrift Interface
    • More about Node.js and Cassandra

  26. Points To Consider
    • Data consistency - Same in all data stores
    • How important is data durability?
    • Managing many servers (Chef, AWS, CSSH)
    • Managing and learning many different applications and tuning for them

  27. Summary
    • Polyglottany is not a sin
    • Know your data read/write patterns
    • Know the tools available to you
    • Know your compromises

  28. We’re Hiring

  29. Questions are guaranteed in life.
    Answers aren’t.
    Eric Lubow
    @elubow
    [email protected]
    #cassandra12
    Thank you.