1, 2, 3, 4 Add Another Data Store (And Other Rhymes)

simplereach
September 05, 2012

Eric Lubow's presentation from the 2012 Cassandra Summit.

Transcript

  1. 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow

    @elubow

    Overview
    • SimpleReach
    • Definitions and Data Stores
    • Evolution to Polyglottany
    • Tie It Together
    • Questions
  2. Size
    • 100M events recorded per day and growing
    • 500M pageviews per month and growing
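To give a sense of scale, the slide's volume figures can be converted to average per-second rates. This sketch assumes traffic is spread evenly over each day and uses a 30-day month for pageviews. Real traffic is bursty, so peak rates will be well above these averages.

```python
# Back-of-envelope throughput from the slide's volume figures.
# Assumptions: evenly distributed traffic (no peaks) and a 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 100_000_000        # "100M events recorded per day"
pageviews_per_month = 500_000_000   # "500M pageviews per month"

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
avg_pageviews_per_sec = pageviews_per_month / (30 * SECONDS_PER_DAY)

print(f"~{avg_events_per_sec:,.0f} events/s on average")
print(f"~{avg_pageviews_per_sec:,.0f} pageviews/s on average")
```

This prints roughly 1,157 events/s and 193 pageviews/s. Those are sustained averages, so capacity planning would still need headroom for spikes.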
  3. Polyglot Persistence
    "Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand."
    http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
  4. Why?
    • Heavier read loads vs. heavier write loads
    • Data relationships may be less important
    • Different aspects of a system have different requirements
  5. Cassandra
    • Large data volume ingestion
    • Really fast writes to many locations (eventual consistency)
    • Query by column groups within rows
    • Range queries in Hive (partial column family scans)
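In Cassandra's Thrift-era data model, a row holds many columns kept sorted by column name, and composite column names sort component by component. That ordering is what makes querying "by column groups within rows" cheap: a group is one contiguous slice of the row. The plain-Python sketch below models a single wide row to show the idea. The WideRow class, the (date, URL) column layout and the sample values are hypothetical, and none of this is the Thrift or CQL API.

```python
import bisect


class WideRow:
    """Toy model of one Cassandra wide row: columns kept sorted by name.

    Column names are tuples, mimicking composite columns, so they sort
    component by component (here: by date first, then by URL).
    """

    def __init__(self):
        self._names = []   # column names, always kept in sorted order
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical row: one site's pageview counts, columns named (date, url).
row = WideRow()
row.insert(("2012-09-04", "/a"), 10)
row.insert(("2012-09-05", "/b"), 7)
row.insert(("2012-09-05", "/a"), 3)
row.insert(("2012-09-06", "/a"), 5)

# All columns for one date form a contiguous slice of the sorted row.
# "\uffff" sorts after ordinary URL strings, so it closes the range.
print(row.slice(("2012-09-05", ""), ("2012-09-05", "\uffff")))
```

The slice for 2012-09-05 returns the "/a" and "/b" columns in name order without touching the neighbouring dates. That is the same property a partial column family scan relies on.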
  6. MongoDB
    • Fast atomic increments (and Node.js speaks JSON natively)
    • Sharding for faster distributed increments
    • Solid ORM for Rails (Mongoid)
    • Fast access for pub/sub of durable/persisted documents
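MongoDB's `$inc` update operator increments a field atomically within a single document. Sharding spreads documents across servers by shard key, so increments to counters on different shards do not contend with each other. The pure-Python sketch below models that idea with one lock per shard and a hash of the counter key choosing the shard. The class name, shard count and key format are illustrative assumptions, not MongoDB internals.

```python
import threading
import zlib
from collections import defaultdict


class ShardedCounters:
    """Toy model of hash-sharded counters: one lock and one dict per shard.

    Hypothetical illustration only; a real deployment would issue
    {"$inc": {...}} updates against a sharded MongoDB collection.
    """

    def __init__(self, num_shards=4):
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._shards = [defaultdict(int) for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash of the key picks the shard (like a hashed shard key).
        return zlib.crc32(key.encode("utf-8")) % len(self._shards)

    def inc(self, key, amount=1):
        # Only this key's shard is locked, so increments to keys on
        # other shards can proceed independently.
        i = self._shard_for(key)
        with self._locks[i]:
            self._shards[i][key] += amount

    def get(self, key):
        i = self._shard_for(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)


counters = ShardedCounters()


def record(key, n):
    for _ in range(n):
        counters.inc(key)


# Six writer threads, two per hypothetical article key.
threads = [threading.Thread(target=record, args=(f"article:{k % 3}", 1000))
           for k in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counters.get("article:0"))  # 2000: two threads x 1000 increments each
```

Because every increment happens under its shard's lock, no updates are lost even with concurrent writers. Spreading keys across more shards reduces how often writers wait on the same lock.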
  7. Redis
    • Supports hundreds of thousands of transactions per second
    • Great caching engine
    • Supports useful data types such as sorted sets
    • Pays a SerDe (serialization/deserialization) price on each access
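The sorted set and SerDe points can be illustrated without a Redis server. The sketch below is an in-memory stand-in that mimics the semantics of the real ZINCRBY and ZREVRANGE ... WITHSCORES commands. It also shows that storing a structured object in a Redis string means serializing on write and parsing again on every read. The SortedSet class, the `cache` dict and the sample keys are hypothetical, and this is not the redis-py client API.

```python
import json


class SortedSet:
    """Toy in-memory stand-in for a Redis sorted set (member -> score).

    Real Redis keeps members ordered (using a skip list); this sketch
    simply sorts on read, which is enough to show the semantics.
    """

    def __init__(self):
        self._scores = {}

    def zincrby(self, member, amount):
        # Like ZINCRBY: a missing member starts at 0, then the amount is added.
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrange(self, start, stop):
        # Like ZREVRANGE ... WITHSCORES: highest score first, ties in reverse
        # lexicographic order; stop is inclusive and may be negative (-1 = last).
        ranked = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]),
                        reverse=True)
        if stop < 0:
            stop += len(ranked)
        return ranked[start:stop + 1]


# Redis values are strings, so structured data is serialized on write and
# deserialized on every read: the "SerDe price".
cache = {}  # hypothetical stand-in for Redis string keys (SET/GET)


def cache_set(key, obj):
    cache[key] = json.dumps(obj)


def cache_get(key):
    raw = cache.get(key)
    return None if raw is None else json.loads(raw)  # parsed on every access


top = SortedSet()
top.zincrby("/story-a", 5)
top.zincrby("/story-b", 9)
top.zincrby("/story-a", 6)
print(top.zrevrange(0, 1))  # [('/story-a', 11), ('/story-b', 9)]

cache_set("story:/story-a", {"title": "A", "shares": 11})
print(cache_get("story:/story-a")["shares"])  # 11
```

A sorted set gives "top N by score" directly. Caching whole documents as strings, by contrast, costs a `json.loads` (or equivalent) on every hit.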
  8. InfiniDB and Infobright
    • Column stores for ad hoc analytics queries in SQL
    • Databases built for business intelligence
    • Heavy compression of data
    • Pre-aggregated data (Extents/Knowledge Grid)
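The "Extents/Knowledge Grid" point refers to per-block metadata that these engines keep alongside compressed column data. InfiniDB tracks min/max values per extent, and Infobright's Knowledge Grid keeps statistics such as min, max and sum per data pack. The sketch below is a simplified model of how that metadata lets a query skip blocks or answer from statistics alone. The tiny block size, the ColumnChunks class and the sample values are assumptions for illustration, not either product's implementation.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Per-block metadata, similar in spirit to a zone map or data pack node."""
    lo: int
    hi: int
    total: int
    count: int


class ColumnChunks:
    """One integer column stored in fixed-size blocks with precomputed stats.

    Hypothetical model: real engines use far larger blocks and compress them.
    """

    def __init__(self, values, block_size=4):
        self.blocks = [values[i:i + block_size]
                       for i in range(0, len(values), block_size)]
        self.stats = [BlockStats(min(b), max(b), sum(b), len(b))
                      for b in self.blocks]

    def sum_between(self, lo, hi):
        """SUM(col) WHERE lo <= col <= hi; also returns how many blocks were read."""
        total, scanned = 0, 0
        for block, st in zip(self.blocks, self.stats):
            if st.hi < lo or st.lo > hi:
                continue               # irrelevant block: no value can match
            if lo <= st.lo and st.hi <= hi:
                total += st.total      # fully relevant: answer from metadata
                continue
            scanned += 1               # partially relevant: read the raw values
            total += sum(v for v in block if lo <= v <= hi)
        return total, scanned


col = ColumnChunks([1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23])
print(col.sum_between(10, 21))  # (87, 1)
```

Only one of the three blocks has to be decompressed and scanned. The first block is skipped by its min/max, and the second is answered from its precomputed sum.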
  9. Ruby, Node.js, Python
    • Polyglottany doesn't apply only to data stores
    • Each language has its own benefits for each data storage layer
    • Each language has its own individual benefits
    • JSON, APIs, performance
  10. Cons
    • Redis: can only utilize a single core
    • MySQL column stores: DELETEs/UPDATEs are VERY expensive
    • Cassandra: no B-tree indexes
    • Mongo: queries slow down as the shard count increases; indexes must fit in memory
    • Python: whitespace; community
    • Ruby: not high-performance enough for our standards
    • JavaScript (Node.js): bad for CPU- or IO-intensive workloads
  11. Tying It Together
    • Built in the cloud
    • Service-oriented architecture (internal API)
    • Built Helenus (a Node.js driver for Cassandra)
    • Data accuracy checks: visual and programmatic
    • Built a framework for testing out storage engines
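A programmatic data accuracy check in a polyglot setup can be as simple as reading the same metric from two stores and flagging disagreements. The sketch below compares per-key counts and reports keys missing from either side, plus keys whose relative difference exceeds a tolerance. The function name, the 1% default tolerance and the sample counts are hypothetical.

```python
def accuracy_report(primary, secondary, tolerance=0.01):
    """Compare per-key counts read from two data stores (hypothetical check).

    Returns keys present in only one store, and keys whose relative difference
    (measured against the primary store's value) exceeds `tolerance`.
    """
    missing = sorted(set(primary) ^ set(secondary))
    mismatched = {}
    for key in sorted(set(primary) & set(secondary)):
        a, b = primary[key], secondary[key]
        denom = max(abs(a), 1)  # avoid dividing by zero for empty counters
        if abs(a - b) / denom > tolerance:
            mismatched[key] = (a, b)
    return missing, mismatched


# Hypothetical daily pageview totals read from two stores for the same sites.
cassandra_counts = {"site:1": 10_000, "site:2": 5_000, "site:3": 42}
mongo_counts = {"site:1": 10_040, "site:2": 4_700, "site:4": 7}

missing, mismatched = accuracy_report(cassandra_counts, mongo_counts)
print(missing)     # ['site:3', 'site:4']
print(mismatched)  # {'site:2': (5000, 4700)}
```

A small tolerance allows for eventually consistent writes that have not landed yet. A key that keeps failing the check on later runs is a stronger signal of a real discrepancy.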
  12. Service Architecture
    [Diagram: Internal API, Analytics, Real-time]
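The slide's diagram (Internal API, Analytics, Real-time) points to a single internal API in front of several stores. The sketch below shows one way such a facade could route requests by query kind. The class names, the "realtime"/"analytics" route keys and the choice of store behind each route are assumptions for illustration. In-memory dictionaries stand in for real database clients.

```python
from typing import Dict, Protocol


class Backend(Protocol):
    def fetch(self, metric: str, key: str) -> int: ...


class InMemoryBackend:
    """Hypothetical stand-in for a real store client, holding canned data."""

    def __init__(self, name: str, data: Dict[tuple, int]):
        self.name = name
        self._data = data

    def fetch(self, metric: str, key: str) -> int:
        return self._data.get((metric, key), 0)


class InternalAPI:
    """Single entry point that hides which data store answers which query."""

    def __init__(self, routes: Dict[str, Backend]):
        self._routes = routes

    def get(self, kind: str, metric: str, key: str) -> int:
        try:
            backend = self._routes[kind]
        except KeyError:
            raise ValueError(f"no backend registered for {kind!r}") from None
        return backend.fetch(metric, key)


# Hypothetical routing: real-time reads from a cache-like store,
# analytics reads from a column-store-like backend.
api = InternalAPI({
    "realtime": InMemoryBackend("redis-like", {("pageviews", "site:1"): 12}),
    "analytics": InMemoryBackend("column-store-like",
                                 {("pageviews", "site:1"): 98_765}),
})
print(api.get("realtime", "pageviews", "site:1"))   # 12
print(api.get("analytics", "pageviews", "site:1"))  # 98765
```

Callers depend only on `InternalAPI.get`, so a store can be swapped behind a route without changing client code. That is one practical benefit of putting an internal API in front of a polyglot persistence layer.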
  13. Helenus
    • Built a Node.js driver for Cassandra
    • https://github.com/simplereach/helenus
    • CQL 2/3, composite columns, Thrift interface
    • More about Node.js and Cassandra
  14. Points To Consider
    • Data consistency across all data stores
    • How important is data durability?
    • Managing many servers (Chef, AWS, CSSH)
    • Managing and learning many different applications, and tuning each of them
  15. Summary
    • Polyglottany is not a sin
    • Know your data's read/write patterns
    • Know the tools available to you
    • Know your compromises