
Leveraging MongoDB: An introductory case study - Sean Laurent, StudyBlue

mongodb
November 01, 2011

Transcript

  1. StudyBlue, Inc.
    StudyBlue
    October 18, 2011
    StudyBlue and MongoDB: Implementation 101

  2. StudyBlue, Inc.
    Overview
    • Who am I?
    • Who is StudyBlue?
    • Why MongoDB?
    • How did we leverage MongoDB?
    • What lessons did we learn?
    • Q&A

  3. StudyBlue, Inc.
    Who am I?
    • Sean Laurent
    [email protected]
    • Director of Operations at StudyBlue, Inc.

  4. StudyBlue, Inc.
    studyblue.com

  5. StudyBlue, Inc.
    About StudyBlue
    • Bottom-up attempt to improve student outcomes
    • Online service for storing, studying, sharing and ultimately mastering course material
    • Digital backpack for students
    • Freemium business model

  6. StudyBlue, Inc.
    StudyBlue Usage
    • Many simultaneous users
    • Rapid growth
    • Cyclical usage

  7. StudyBlue, Inc.
    The Challenge

  8. StudyBlue, Inc.
    Flashcard Scoring
    • Track flashcard scoring
      • Every single card
      • Every single user
      • Forever
    • Provide aggregate statistics
      • Flashcard deck
      • Folder
      • Overall
    • Focus on content mastery
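
The aggregate statistics listed on this slide (per deck, per folder, overall) can be illustrated with a small in-memory sketch. The event tuple layout, the sample data, and the percent-correct metric are assumptions made for illustration; the deck does not show StudyBlue's actual scoring formula.

```python
from collections import defaultdict

# Hypothetical score events: (user_id, folder_id, deck_id, card_id, correct)
EVENTS = [
    (1, 10, 100, 1000, True),
    (1, 10, 100, 1001, False),
    (1, 10, 101, 1002, True),
    (1, 11, 102, 1003, True),
]

def aggregate(events, user_id):
    """Return percent-correct per deck, per folder, and overall for one user."""
    tallies = defaultdict(lambda: [0, 0])  # key -> [correct answers, attempts]
    for uid, folder_id, deck_id, _card_id, correct in events:
        if uid != user_id:
            continue
        # Every event counts toward three levels of aggregation.
        for key in (("deck", deck_id), ("folder", folder_id), ("overall", None)):
            tallies[key][0] += int(correct)
            tallies[key][1] += 1
    return {key: 100.0 * c / n for key, (c, n) in tallies.items()}

stats = aggregate(EVENTS, user_id=1)
```

Keeping one tally per level means a single pass over the events yields all three statistics.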

  9. StudyBlue, Inc.
    Scoring Results

  10. StudyBlue, Inc.
    The Problem
    • Existing PostgreSQL database
    • Reasonably large number of cards
    • Large number of users
    • User base growing rapidly
    • Shift in usage: growing faster than the user base
      • Time on site
      • Decks per user
      • Average deck size
      • Study sessions per user

  11. StudyBlue, Inc.
    Additional Requirements
    • Support sustained rapid growth
    • Highly available
    • Minimize maintenance costs
    • Active community
    • Done yesterday

  12. StudyBlue, Inc.
    Why Mongo?

  13. StudyBlue, Inc.
    Alternatives
    • Amazon SimpleDB
      • Far too simple
    • Cassandra
      • Difficult to add nodes and rebalance
      • Column families cannot be modified without a restart
    • CouchDB
      • Difficult to add nodes and rebalance
    • Redis
      • No native support for sharding/partitioning
      • Master/slave only; no automatic failover

  14. StudyBlue, Inc.
    MongoDB for the Win
    • Highly available
      • Replica sets
      • Automatic failover
    • Shards
      • Works across replica sets
      • Easy to add additional shards
    • Node addition
      • Read performance degradation when adding nodes
      • “hidden” flag
      • No downtime
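
One reading of the "Node addition" bullets is that a new member joins the replica set as hidden, so clients do not read from it while it syncs, and is unhidden later. The sketch below builds such a configuration document as a plain Python dict. The field names (`members`, `host`, `priority`, `hidden`) follow MongoDB's replica set configuration; the host names and helper functions are hypothetical, and applying the result to a live set (for example with `rs.reconfig` in the shell) is not shown.

```python
import copy

def add_hidden_member(config, host):
    """Return a copy of `config` with `host` added as a hidden member.

    Hidden members must have priority 0, so they can never become primary.
    """
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    new_config["members"].append(
        {"_id": next_id, "host": host, "priority": 0, "hidden": True}
    )
    return new_config

def unhide_member(config, host, priority=1):
    """Return a copy of `config` with `host` visible to clients again."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["hidden"] = False
            member["priority"] = priority
    return new_config

# Hypothetical starting configuration with a single member.
base = {"_id": "rs0", "members": [{"_id": 0, "host": "db0.example.net:27017"}]}
syncing = add_hidden_member(base, "db1.example.net:27017")
live = unhide_member(syncing, "db1.example.net:27017")
```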

  15. StudyBlue, Inc.
    More winning
    • Atomic insert & replace
    • Read balancing across slaves
    • BSON/JSON document model
    • It just works. Seriously.

  16. StudyBlue, Inc.
    Implementation

  17. StudyBlue, Inc.
    DevOps
    • Amazon EC2
    • Separate dev, test and production environments
    • Operations testing
      • Replication
      • Failover
    • Scripting & automation
      • Creation
      • Cloning

  18. StudyBlue, Inc.
    Development
    • 100% Java
    • Existing PostgreSQL database
      • System of record
      • Synchronization issues

  19. StudyBlue, Inc.
    SQL Integration & Synchronization
    • PostgreSQL considered system of record
    • Asynchronous, event-driven
      • Web servers queue change events
      • Scoring server processes events
        • Query PostgreSQL
        • Update MongoDB
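
The flow on this slide (web servers queue change events, a scoring server reads PostgreSQL and then updates MongoDB) can be sketched with in-memory stand-ins. The two dicts below stand in for PostgreSQL and MongoDB, and the event fields are hypothetical; no real database driver is involved.

```python
import queue

def process_events(events, postgres_rows, mongo_scores):
    """Drain the queue: read the system of record, then write the score."""
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        card = postgres_rows[event["card_id"]]      # step 1: query PostgreSQL (stand-in)
        key = (event["user_id"], event["card_id"])  # Postgres keys reused as identifiers
        mongo_scores[key] = {                       # step 2: update MongoDB (stand-in)
            "deck_id": card["deck_id"],
            "correct": event["correct"],
        }
        processed += 1

events = queue.Queue()
events.put({"user_id": 1, "card_id": 1000, "correct": True})   # queued by a web server
events.put({"user_id": 1, "card_id": 1001, "correct": False})
postgres_rows = {1000: {"deck_id": 100}, 1001: {"deck_id": 100}}
mongo_scores = {}
processed = process_events(events, postgres_rows, mongo_scores)
```

Because the web servers only enqueue, a slow or unavailable scoring store delays score updates without blocking page requests.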

  20. StudyBlue, Inc.
    Architecture

  21. StudyBlue, Inc.
    MongoDB Schema
    • Many shallow collections vs monolithic deep collection
    • Leverage existing SQL knowledge
    • Simplify SQL integration

  22. StudyBlue, Inc.
    Schema Design
    • Two collections used together to map relationships
      • Folder containing Deck
      • Decks in a Folder
      • Decks containing a Card
      • Cards in a Deck
    • Folders arranged in tree structure
      • One row per folder that points to its parent
      • Multiple queries required to build tree
    • Postgres primary keys are used instead of object ids
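
Why a parent-pointer layout needs multiple queries can be seen in a small sketch. The list below stands in for a folder collection whose `_id` values are Postgres primary keys; the `parent_id` and `name` field names and the sample folders are assumptions. `find_children` plays the role of one query per tree level, roughly `{"parent_id": {"$in": [...]}}`.

```python
# Hypothetical folder documents; _id values are Postgres primary keys.
FOLDERS = [
    {"_id": 1, "parent_id": None, "name": "Biology"},
    {"_id": 2, "parent_id": 1, "name": "Midterm"},
    {"_id": 3, "parent_id": 1, "name": "Final"},
    {"_id": 4, "parent_id": 2, "name": "Chapter 1"},
]

def find_children(folders, parent_ids):
    """Stand-in for one query that matches any of the given parent ids."""
    wanted = set(parent_ids)
    return [f for f in folders if f["parent_id"] in wanted]

def build_tree(folders, root_id):
    """Build the tree level by level; return it with the number of queries used."""
    root = next(f for f in folders if f["_id"] == root_id)  # first query: the root
    tree = {"_id": root["_id"], "name": root["name"], "children": []}
    nodes = {root_id: tree}
    frontier, queries = [root_id], 1
    while frontier:
        rows = find_children(folders, frontier)              # one query per level
        queries += 1
        frontier = []
        for row in rows:
            node = {"_id": row["_id"], "name": row["name"], "children": []}
            nodes[row["parent_id"]]["children"].append(node)
            nodes[row["_id"]] = node
            frontier.append(row["_id"])
    return tree, queries

tree, queries = build_tree(FOLDERS, root_id=1)
```

The query count grows with the depth of the tree, not with the number of folders, because each level is fetched in one batch.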

  23. StudyBlue, Inc.

  24. StudyBlue, Inc.
    Document Scores Example

  25. StudyBlue, Inc.
    Slave Reads
    • slaveOk set to true for most data retrieval
    • Scoring calculations read from the primary to ensure correctness
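
The reason scoring calculations avoid slave reads is replication lag: a secondary may not yet have the latest write. The toy simulation below illustrates that effect only. The class and its methods are hypothetical and do not model the MongoDB driver; `slave_ok` merely mirrors the name of the option on the slide.

```python
class ReplicaSetSim:
    """Toy model: writes land on the primary and reach the secondary later."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def write(self, key, value):
        self.primary[key] = value            # secondary stays stale until replicate()

    def replicate(self):
        self.secondary = dict(self.primary)  # the secondary catches up

    def read(self, key, slave_ok):
        source = self.secondary if slave_ok else self.primary
        return source.get(key)

rs = ReplicaSetSim()
rs.write("score:user1:deck100", 80)
stale = rs.read("score:user1:deck100", slave_ok=True)    # may miss the new write
fresh = rs.read("score:user1:deck100", slave_ok=False)   # always sees the new write
rs.replicate()
caught_up = rs.read("score:user1:deck100", slave_ok=True)
```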

  26. StudyBlue, Inc.
    Data migration
    • One-time process
    • Postgres to MongoDB
    • Ruby scripts
    • Separate server

  27. StudyBlue, Inc.
    Key Issues

  28. StudyBlue, Inc.
    Summary
    • Amazon EC2/EBS
    • Java API
    • MapReduce
    • Replication
    • Partitioning / Shards
    • Performance

  29. StudyBlue, Inc.
    Amazon EC2 & EBS
    • Plan for failure
      • “When” not “if”
    • EBS performance
      • Inconsistent
      • Limited by bandwidth
      • 60GB minimum
      • RAID-0

  30. StudyBlue, Inc.
    Java API
    • Not perfect
    • Verbose
    • Type safety
    • Failover requires retry
    • Up to 1 minute delay
    • Read-only requests
    • “slaveOk” works
    • Burden on developer
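
"Failover requires retry" puts the retry loop in application code. A minimal sketch of such a loop with exponential backoff follows. The exception class is a hypothetical stand-in for whatever error the driver raises while no primary is available, and the delay values are illustrative; the only figure taken from the slide is the delay of up to one minute.

```python
import time

class NotPrimaryError(Exception):
    """Hypothetical stand-in for the driver error seen during a failover."""

def with_retry(operation, attempts=8, base_delay=0.5, max_delay=60.0, sleep=time.sleep):
    """Call `operation`, retrying with exponential backoff while failover completes."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except NotPrimaryError:
            if attempt == attempts:
                raise                      # give up and surface the error
            sleep(min(delay, max_delay))   # wait for a new primary to be elected
            delay *= 2
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Retrying a write is only safe when the operation is idempotent, which is part of the burden on the developer.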

  31. StudyBlue, Inc.
    Map Reduce
    • Perfect for aggregation
    • Not used by StudyBlue
    • Not needed (yet)
    • Difficult with multiple collections
    • Reduce limited to masters
    • Keep scalability simple
    • Under consideration

  32. StudyBlue, Inc.
    Replication
    • Automated failover
    • Read scaling
    • Maintenance
    • Easy setup & configuration
    • “Seed” node(s) for clients
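
Seed nodes let a client discover the rest of the replica set from one or more known members. A small sketch of building a seed-list connection string in the standard `mongodb://` URI form follows; the host names, set name, database name, and helper function are all hypothetical.

```python
def replica_set_uri(seeds, replica_set, database=""):
    """Build a mongodb:// URI that lists seed hosts and names the replica set."""
    if not seeds:
        raise ValueError("at least one seed host is required")
    return "mongodb://{}/{}?replicaSet={}".format(",".join(seeds), database, replica_set)

uri = replica_set_uri(
    ["db0.example.net:27017", "db1.example.net:27017"],
    replica_set="rs0",
    database="scores",
)
```

Listing more than one seed means the client can still connect when a single seed host is down.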

  33. StudyBlue, Inc.
    Partitioning in the Cloud
    • Operations perspective
    • Dynamic changes in machines
    • Config servers track machines
    • Each node in replica set knows other nodes
    • Avoids restarting applications when Mongo servers change
    • Easy scaling
    • Local shard servers
    • Config servers store redundant copies
    • Two-phase commit

  34. StudyBlue, Inc.
    Useful EC2 Instance Types
    • Config servers
      • t1.micro or m1.small
    • Mongo replica nodes
      • Depends on memory needs
      • m2.xlarge, m2.2xlarge, m2.4xlarge or cc1.4xlarge

    Name         Memory    Compute units (CU)         I/O
    m2.xlarge    17.1 GB   6.5 (2 cores x 3.25)       medium
    m2.2xlarge   34.2 GB   13 (4 cores x 3.25)        high
    m2.4xlarge   68.4 GB   26 (8 cores x 3.25)        high
    cc1.4xlarge  23 GB     33.5 (2 x Xeon X5570)      very high

  35. StudyBlue, Inc.
    Performance Issues
    • Missing indexes
      • Performance terrible without indexes
      • Index on the fly
    • Store array sizes in collection
    • OR vs IN
    • Redundant updates
      • Events not consolidated
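
Three of these points can be shown as plain query and update documents, written here as Python dicts. The operators (`$or`, `$in`, `$push`, `$inc`) are MongoDB's; the field names (`deck_id`, `card_ids`, `card_count`) and the consolidation rule (keep only the latest event per user and card) are assumptions for illustration.

```python
def or_filter(field, values):
    """One clause per value: the shape the slides advise against."""
    return {"$or": [{field: value} for value in values]}

def in_filter(field, values):
    """A single clause matching any of the values."""
    return {field: {"$in": list(values)}}

def push_card_update(card_id):
    """Append to an array and bump a stored size counter in the same update."""
    return {"$push": {"card_ids": card_id}, "$inc": {"card_count": 1}}

def consolidate(events):
    """Drop redundant updates: keep only the latest event per (user, card)."""
    latest = {}
    for event in events:
        latest[(event["user_id"], event["card_id"])] = event
    return list(latest.values())

pending = [
    {"user_id": 1, "card_id": 1000, "correct": False},
    {"user_id": 1, "card_id": 1001, "correct": True},
    {"user_id": 1, "card_id": 1000, "correct": True},   # supersedes the first event
]
to_apply = consolidate(pending)
```

Storing `card_count` alongside the array avoids measuring the array on every read, at the cost of keeping the counter in step with each push.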

  36. StudyBlue, Inc.
    Lessons Learned

  37. StudyBlue, Inc.
    Key Lessons
    • Amazon great, but plan for failure
    • Leverage test platforms
    • Use replica sets & partitions early
    • Indexes critical
    • Use IN instead of OR
    • Java API cumbersome, but solid
    • Design schema carefully

  38. StudyBlue, Inc.
    Q & A

  39. StudyBlue, Inc.
    Contact us
    Web: http://www.studyblue.com
    Twitter: @StudyBlue
    Email: [email protected]