1, 2, 3, 4 Add Another Data Store (And Other Rhymes)

simplereach
September 05, 2012

Eric Lubow's presentation from the 2012 Cassandra Summit.

Transcript

  1. 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow

    @elubow [email protected] #cassandra12
  2. Overview

    • SimpleReach
    • Definitions and Data Stores
    • Evolution to Polyglottany
    • Tie It Together
    • Questions
  3. Socially Intelligent
  4. Size

    • 100M events recorded per day and growing
    • 500M pageviews per month and growing
  5. Polyglot Persistence

    Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand.
    http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
  6. Right Tool For The Job
  7. Why?

    • Heavier read loads vs. heavier write loads
    • Data relationships may be less important
    • Different aspects of a system have different requirements
  8. No One Size Fits All
  9. Tools
  10. Free vs. Cost
  11. Languages
  12. Pre-Scale
  13. Scale
  14. SimpleReach Pre-Scale
  15. SimpleReach
  16. Cassandra

    • Large data volume ingestion
    • Really fast writes to many locations (eventual consistency)
    • Query by column groups within rows
    • Range queries in Hive (partial column family scans)
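The "query by column groups within rows" point can be illustrated with a toy model of a wide row whose column names are kept sorted, so that a contiguous group of columns can be read with one slice. This is only an in-memory sketch: the `WideRow` class, its methods, and the hourly column names are hypothetical and are not the Cassandra API.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Toy model of one wide row with column names kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted so a range of columns is contiguous.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in column order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical row: one column per hour, value = pageviews in that hour.
row = WideRow()
for hour, views in [
    ("2012-09-05T01", 120),
    ("2012-09-04T23", 80),
    ("2012-09-05T00", 95),
    ("2012-09-06T00", 40),
]:
    row.insert(hour, views)

# Read only the columns for one day instead of the whole row.
sept5 = row.slice("2012-09-05T00", "2012-09-05T23")
```

Because the columns are ordered, the slice touches only the requested group, which is the property that makes time-bucketed column names a common fit for this kind of store.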
  17. MongoDB

    • Fast atomic increments (Node.js is native JSON)
    • Sharding for faster distributed increments
    • Solid ORM for Rails (Mongoid)
    • Fast access for pub/sub of durable/persisted documents
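As a rough illustration of "sharding for faster distributed increments", the sketch below routes each counter key to one of several shards by hash, so increments for different keys land on different shards. It is an in-memory model only: `ShardedCounters` and the key names are hypothetical, it does not use a MongoDB driver, and it does not model the atomicity the store itself provides.

```python
import zlib
from collections import Counter


class ShardedCounters:
    """Toy model: each key lives on exactly one shard, chosen by a stable hash."""

    def __init__(self, shard_count):
        self.shards = [Counter() for _ in range(shard_count)]

    def _shard_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def increment(self, key, amount=1):
        shard = self._shard_for(key)
        shard[key] += amount
        return shard[key]

    def get(self, key):
        # Reading a missing key from a Counter returns 0 without storing it.
        return self._shard_for(key)[key]


counters = ShardedCounters(shard_count=4)
for _ in range(3):
    counters.increment("article:42:pageviews")
counters.increment("article:7:pageviews", 10)
```

Since a key always maps to the same shard, a read for that key needs only one shard; the write load as a whole is spread across all of them.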
  18. Redis

    • Supports hundreds of thousands of transactions per second
    • Great caching engine
    • Supports useful variable types like sorted set
    • Pay SerDe price on each access
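The "SerDe price" bullet can be made concrete with a small sketch. It assumes structured objects are stored as JSON strings, so every read deserializes and every write serializes the whole object. The `StringStore` class is a hypothetical dict-backed stand-in, not the Redis client API.

```python
import json


class StringStore:
    """Dict stand-in for a store that only holds strings (hypothetical)."""

    def __init__(self):
        self._data = {}
        self.serde_calls = 0  # counts serialize + deserialize operations

    def set_object(self, key, obj):
        self._data[key] = json.dumps(obj)
        self.serde_calls += 1

    def get_object(self, key):
        raw = self._data.get(key)
        if raw is None:
            return None
        self.serde_calls += 1
        return json.loads(raw)


store = StringStore()
store.set_object("article:42", {"title": "Polyglot Persistence", "shares": 10})

# Changing one field costs a full deserialize, a mutation, and a full serialize.
article = store.get_object("article:42")
article["shares"] += 1
store.set_object("article:42", article)
```

In this model a single-field update costs two conversions on top of the initial write, which is the kind of overhead the slide is pointing at.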
  19. InfiniDB and Infobright

    • Column Stores for ad-hoc analytics queries in SQL
    • Databases built for business intelligence
    • Heavy compression of data
    • Pre-aggregated data (Extents/Knowledge Grid)
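One way to picture the "pre-aggregated data" bullet is a column split into blocks, each carrying a small summary (min, max, sum) that lets a query skip or shortcut whole blocks. The sketch below is a simplified model under that assumption; `Block` and `sum_between` are hypothetical names and do not reflect the actual internals of InfiniDB or Infobright.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of one column plus summary metadata computed at load time."""

    values: list

    def __post_init__(self):
        self.min = min(self.values)
        self.max = max(self.values)
        self.sum = sum(self.values)


def sum_between(blocks, lo, hi):
    """SUM(v) WHERE lo <= v <= hi, using block summaries where possible.

    Returns (total, number of blocks that had to be scanned value by value).
    """
    total, scanned = 0, 0
    for block in blocks:
        if block.max < lo or block.min > hi:
            continue  # no value can match: skip the block entirely
        if lo <= block.min and block.max <= hi:
            total += block.sum  # every value matches: use the stored sum
            continue
        scanned += 1  # partial overlap: fall back to reading the values
        total += sum(v for v in block.values if lo <= v <= hi)
    return total, scanned


blocks = [Block([1, 2, 3]), Block([10, 11, 12]), Block([18, 25, 30]), Block([100, 120])]
total, scanned = sum_between(blocks, 10, 20)
```

Here only one of the four blocks is read value by value; the others are answered from their summaries alone.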
  20. Ruby, Node.js, Python

    • Polyglottany doesn’t only apply to data stores
    • Each language has its own benefit to each data storage layer
    • Each language has its own individual benefits
    • JSON, APIs, Performance
  21. Choice
  22. Cons

    • Redis - Can only utilize a single core
    • MySQL Column Store - DELETE/UPDATEs are VERY expensive
    • Cassandra - No B-tree indexes
    • Mongo - Queries slow down when shard count increases. Indexes must fit in memory
    • Python - Whitespace. Community
    • Ruby - Not high performance enough for our standards
    • JavaScript (Node.js) - Bad for CPU or IO intensive workloads
  23. Tying It Together

    • Built in the cloud
    • Service Oriented Architecture (Internal API)
    • Built Helenus (Cassandra Node.js driver)
    • Data accuracy checks: visual and programmatic
    • Built framework for testing out storage engines
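A programmatic data accuracy check across stores could look roughly like the sketch below, which compares per-key counts from two stores and reports keys that differ by more than a tolerance. The talk does not describe how SimpleReach implemented its checks; the function name, the tolerance parameter, and the sample counts are all assumptions made for illustration.

```python
def find_mismatches(primary, secondary, tolerance=0):
    """Return {key: (primary_count, secondary_count)} for keys that disagree.

    A key missing from one store is treated as a count of 0 there.
    """
    mismatches = {}
    for key in sorted(set(primary) | set(secondary)):
        a = primary.get(key, 0)
        b = secondary.get(key, 0)
        if abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical counts pulled from two different stores for the same keys.
cassandra_counts = {"article:1": 100, "article:2": 250, "article:3": 7}
mongo_counts = {"article:1": 100, "article:2": 248, "article:4": 3}

strict = find_mismatches(cassandra_counts, mongo_counts)
lenient = find_mismatches(cassandra_counts, mongo_counts, tolerance=5)
```

A non-zero tolerance can be useful when one store is eventually consistent and small, short-lived differences are expected.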
  24. Service Architecture

    [Diagram: Internal API, Analytics, Real-time]
  25. Helenus

    • Built Node.js driver for Cassandra
    • https://github.com/simplereach/helenus
    • CQL 2/3, Composite Column, Thrift Interface
    • More about Node.js and Cassandra
  26. Points To Consider

    • Data consistency - Same in all data stores
    • How important is data durability?
    • Managing many servers (Chef, AWS, CSSH)
    • Managing and learning many different applications and tuning for them
  27. Summary

    • Polyglottany is not a sin
    • Know your data read/write patterns
    • Know the tools available to you
    • Know your compromises
  28. We’re Hiring
  29. Questions are guaranteed in life. Answers aren’t. Eric Lubow @elubow

    [email protected] #cassandra12 Thank you.