Leveraging MongoDB: An introductory case study ...

mongodb
November 01, 2011

Leveraging MongoDB: An introductory case study - Sean Laurent, StudyBlue

Transcript

  1. StudyBlue, Inc. Overview
     • Who am I?
     • Who is StudyBlue?
     • Why MongoDB?
     • How did we leverage MongoDB?
     • What lessons did we learn?
     • Q&A
  2. Who am I?
     • Sean Laurent
     • [email protected]
     • Director of Operations at StudyBlue, Inc.
  3. About StudyBlue
     • Bottom-up attempt to improve student outcomes
     • Online service for storing, studying, sharing and ultimately mastering course material
     • Digital backpack for students
     • Freemium business model
  4. Flashcard Scoring
     • Track flashcard scoring
       • Every single card
       • Every single user
       • Forever
     • Provide aggregate statistics
       • Flashcard deck
       • Folder
       • Overall
     • Focus on content mastery
  5. The Problem
     • Existing PostgreSQL database
       • Reasonably large number of cards
       • Large number of users
     • User base increasing rapidly
     • Shift in usage: increasing faster than users
       • Time on site
       • Decks per user
       • Average deck size
       • Study sessions per user
  6. Additional Requirements
     • Support sustained rapid growth
     • Highly available
     • Minimize maintenance costs
     • Active community
     • Done yesterday
  7. Alternatives
     • Amazon SimpleDB
       • Far too simple
     • Cassandra
       • Difficult to add nodes and rebalance
       • Column families cannot be modified without a restart
     • CouchDB
       • Difficult to add nodes and rebalance
     • Redis
       • No native support for sharding/partitioning
       • Master/slave only, no automatic failover
  8. MongoDB for the Win
     • Highly available
       • Replica sets
       • Automatic failover
     • Shards
       • Works across replica sets
       • Easy to add additional shards
     • Node addition
       • Read performance degradation when adding nodes
       • “hidden” flag
       • No downtime
  9. More winning
     • Atomic insert & replace
     • Read balancing across slaves
     • BSON/JSON document model
     • It just works. Seriously.
  10. DevOps
     • Amazon EC2
     • Separate dev, test and production environments
     • Operations testing
       • Replication
       • Failover
     • Scripting & automation
       • Creation
       • Cloning
  11. Development
     • 100% Java
     • Existing PostgreSQL database
       • System of record
       • Synchronization issues
  12. SQL Integration & Synchronization
     • PostgreSQL considered system of record
     • Asynchronous, event-driven
       • Web servers queue change events
       • Scoring server processes events
         • Query PostgreSQL
         • Update MongoDB
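
The flow on this slide can be sketched in a few lines. This is a minimal illustration, not StudyBlue's implementation: the two dicts are hypothetical stand-ins for PostgreSQL and a MongoDB collection, the names `publish_change` and `process_pending` are invented for the example, and collapsing duplicate events before processing is an assumption motivated by the "events not consolidated" item in the Performance Issues slide.

```python
import queue

# Hypothetical stand-ins: one dict plays PostgreSQL (the system of record),
# the other plays a MongoDB collection keyed by _id.
postgres_scores = {(1, 10): 0.5, (1, 11): 0.8}  # (user_id, deck_id) -> score
mongo_deck_scores = {}

events = queue.Queue()


def publish_change(user_id, deck_id):
    """Web-server side: queue a change event instead of writing MongoDB directly."""
    events.put((user_id, deck_id))


def process_pending():
    """Scoring-server side: drain the queue, re-read the source, update the copy."""
    pending = set()  # a set collapses duplicate events for the same user and deck
    while True:
        try:
            pending.add(events.get_nowait())
        except queue.Empty:
            break
    for user_id, deck_id in sorted(pending):
        score = postgres_scores[(user_id, deck_id)]  # "query PostgreSQL"
        mongo_deck_scores[f"{user_id}:{deck_id}"] = {  # "update MongoDB"
            "user_id": user_id,
            "deck_id": deck_id,
            "score": score,
        }
    return len(pending)
```

Because the worker always re-reads the system of record, replaying or dropping a duplicate event leaves the derived copy in the same state.
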
  13. MongoDB Schema
     • Many shallow collections vs monolithic deep collection
       • Leverage existing SQL knowledge
       • Simplify SQL integration
  14. Schema Design
     • Two collections used together to map relationships
       • Folder containing Deck
       • Decks in a Folder
       • Decks containing a Card
       • Cards in a Deck
     • Folders arranged in tree structure
       • One row per folder that points to its parent
       • Multiple queries required to build tree
     • Postgres primary keys are used instead of ObjectIds
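
To make the parent-pointer layout concrete, here is a small sketch of how a folder tree could be rebuilt from such documents. The field names (`parent_id`, `name`), the sample folders, and the helper functions are all hypothetical; an in-memory list stands in for the collection, and `find_children` marks where a real query would go.

```python
# Hypothetical documents: _id reuses the PostgreSQL primary key, and each
# folder stores only a pointer to its parent (None for a root folder).
folders = [
    {"_id": 1, "name": "Biology", "parent_id": None},
    {"_id": 2, "name": "Cells", "parent_id": 1},
    {"_id": 3, "name": "Genetics", "parent_id": 1},
    {"_id": 4, "name": "Mitosis", "parent_id": 2},
]


def find_children(parent_id):
    """Stand-in for a query such as find({"parent_id": parent_id})."""
    return [f for f in folders if f["parent_id"] == parent_id]


def build_tree(parent_id=None):
    """Recursively assemble the tree, issuing one lookup per folder visited."""
    return [
        {"_id": f["_id"], "name": f["name"], "children": build_tree(f["_id"])}
        for f in find_children(parent_id)
    ]
```

The recursion shows why the slide notes that multiple queries are required: every folder in the tree triggers its own lookup for children.
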
  15. Slave Reads
     • slaveOk set to true for most data retrieval
     • Scoring calculations use primary to ensure correctness
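
The split between relaxed reads and primary-only reads can be expressed as configuration. The sketch below is an assumption-laden illustration: `slaveOk` was the flag of the 2011-era drivers, while current drivers express the same choice through the `readPreference` connection-string option, which is what is shown here. The host names, the replica set name `rs0`, and the helper `mongo_uri` are hypothetical.

```python
from urllib.parse import urlencode


def mongo_uri(seeds, replica_set, read_preference):
    """Build a MongoDB connection string from a seed list and two options."""
    hosts = ",".join(seeds)
    options = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return f"mongodb://{hosts}/?{options}"


# Hypothetical seed nodes for the replica set.
SEEDS = ["db1.example.com:27017", "db2.example.com:27017"]

# Most data retrieval may be served by a secondary.
READ_URI = mongo_uri(SEEDS, "rs0", "secondaryPreferred")

# Scoring calculations read from the primary only.
SCORING_URI = mongo_uri(SEEDS, "rs0", "primary")
```

Keeping two separately configured clients makes the consistency requirement of each code path explicit rather than leaving it to a per-query flag.
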
  16. Data migration
     • One-time process
     • Postgres to MongoDB
     • Ruby scripts
     • Separate server
  17. Summary
     • Amazon EC2/EBS
     • Java API
     • MapReduce
     • Replication
     • Partitioning / Shards
     • Performance
  18. Amazon EC2 & EBS
     • Plan for failure
       • “When” not “if”
     • EBS performance
       • Inconsistent
       • Limited by bandwidth
       • 60 GB minimum
       • RAID-0
  19. Java API
     • Not perfect
       • Verbose
       • Type safety
     • Failover requires retry
       • Up to 1 minute delay
     • Read-only requests
       • “slaveOk” works
     • Burden on developer
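
The "failover requires retry" burden usually ends up in a small wrapper around each write. The following is a generic sketch of that pattern in Python rather than the Java code the talk refers to; `TransientFailure` and `with_retry` are hypothetical names, and the exception is a stand-in for whatever error a driver raises while no primary is available (in PyMongo that would be `pymongo.errors.AutoReconnect`).

```python
import time


class TransientFailure(Exception):
    """Hypothetical stand-in for a driver's 'primary unavailable' error."""


def with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on TransientFailure.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientFailure:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults the helper waits at most 7.5 seconds in total (0.5 + 1 + 2 + 4), so covering a failover of up to a minute would need more attempts or a longer base delay. The `sleep` parameter exists only so the delay can be replaced in tests.
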
  20. MapReduce
     • Perfect for aggregation
     • Not used by StudyBlue
       • Not needed (yet)
       • Difficult with multiple collections
       • Reduce limited to masters
       • Keep scalability simple
     • Under consideration
  21. Replication
     • Automated failover
     • Read scaling
     • Maintenance
     • Easy setup & configuration
     • “Seed” node(s) for clients
  22. Partitioning in the Cloud
     • Operations perspective
     • Dynamic changes in machines
       • Config servers track machines
       • Each node in replica set knows other nodes
       • Avoids restarting applications when Mongo servers change
     • Easy scaling
     • Local shard servers
     • Config servers store redundant copies
       • Two-phase commit
  23. Useful EC2 Instance Types
     • Config servers
       • t1.micro or m1.small
     • Mongo replica nodes
       • Depends on memory needs
       • m2.xlarge, m2.2xlarge, m2.4xlarge or cc1.4xlarge

     Name          Memory    CU                      I/O
     m2.xlarge     17.1 GB   6.5 (2 cores x 3.25)    medium
     m2.2xlarge    34.2 GB   13 (4 cores x 3.25)     high
     m2.4xlarge    68.4 GB   26 (8 cores x 3.25)     high
     cc1.4xlarge   23 GB     33.5 (2 x Xeon X5570)   very high
  24. Performance Issues
     • Missing indexes
       • Performance terrible without indexes
       • Index on the fly
     • Store array sizes in collection
     • OR vs IN
     • Redundant updates
       • Events not consolidated
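
The "OR vs IN" item refers to two ways of writing the same filter. The slide does not say why one performed better for StudyBlue, so the sketch below only shows the mechanical rewrite, on plain query documents and without a driver. The function name `or_to_in` and the `deck_id` field are hypothetical; `$or` and `$in` are the standard MongoDB query operators.

```python
def or_to_in(query):
    """Rewrite {"$or": [{f: v1}, {f: v2}]} as {f: {"$in": [v1, v2]}}.

    The query is returned unchanged unless it consists solely of an $or
    whose clauses are all simple equality tests on the same field.
    """
    clauses = query.get("$or")
    if len(query) != 1 or not clauses:
        return query
    fields = set()
    values = []
    for clause in clauses:
        if len(clause) != 1:
            return query
        ((field, value),) = clause.items()
        # Leave nested operators and document/array comparisons alone.
        if field.startswith("$") or isinstance(value, (dict, list)):
            return query
        fields.add(field)
        values.append(value)
    if len(fields) != 1:
        return query
    return {fields.pop(): {"$in": values}}
```

For example, `{"$or": [{"deck_id": 1}, {"deck_id": 2}]}` becomes `{"deck_id": {"$in": [1, 2]}}`, while an `$or` that spans different fields is left as it is.
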
  25. Key Lessons
     • Amazon great, but plan for failure
     • Leverage test platforms
     • Use replica sets & partitions early
     • Indexes critical
     • Use IN instead of OR
     • Java API cumbersome, but solid
     • Design schema carefully