MongoDB Versatility: Scaling the MapMyFitness Platform - Chris Merz, Data Storage Engineer, MapMyFitness

mongodb
February 02, 2012

MongoDB Boulder 2012

The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploring NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, application access patterns, monitoring, and more.

Transcript

  1. MapMyRUN.com | MapMyRIDE.com | MapMyWALK.com | MapMyTRI.com | MapMyHIKE.com | MapMyMOUNTAIN.com | MapMyFITNESS.com

    MongoDB Versatility: Scaling the MapMyFITNESS platform
    Feb 1, 2012
  2. Introduction
    • MapMyFITNESS founded in 2007
    • Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
    • Over 6 million registered users
    • 30 million geo-data routes (runs, rides, walks, hikes, etc.)
    • Core sites, mobile apps, API, white-label (MapMyRUN, MapMyRIDE, MapMyWALK, MapMyTRI, MapMyHIKE, MapMyFITNESS, MapMyRACE)
  3. Platform Overview and Background
    • Origins in the LAMP stack (Linux-Apache-MySQL-PHP)
    • Scaled well to ~2 million users
    • Redesigned in Python/Django
    • MySQL backend not sufficient
    “How to scale from 2.5 to 6 million users?”
  4. Functional Scaling
    • Identify high-growth / large-data collections
    • Must be able to live outside the existing relational schema
    • Integrate via remote resource mapping tables in the RDBMS
    • Functional Scaling can facilitate movement towards a Service Architecture
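
One way to picture the "remote resource mapping table" idea is the sketch below. It is only an illustration under stated assumptions: `sqlite3` stands in for the relational database, a plain dict stands in for the external document store, and the table name `route_resource_map`, its columns, and the helper `load_route` are all hypothetical, not taken from the MMF schema.

```python
import sqlite3

# Stand-in for the external document store (MongoDB in the talk).
# Keys are resource ids; values are the large route documents.
document_store = {
    "e4da3b7fbbce2345d7772b0674a318d5": {
        "route_name": "balboa park",
        "points": [{"lat": 32.7199629309, "lng": -117.159318924, "type": 1}],
    }
}

# Stand-in for the relational database; table and column names are hypothetical.
rdbms = sqlite3.connect(":memory:")
rdbms.execute(
    "CREATE TABLE route_resource_map ("
    " route_id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " store TEXT NOT NULL,"        # which external store holds the payload
    " resource_id TEXT NOT NULL)"  # key of the payload inside that store
)
rdbms.execute(
    "INSERT INTO route_resource_map VALUES (?, ?, ?, ?)",
    (1, 4, "mongodb", "e4da3b7fbbce2345d7772b0674a318d5"),
)


def load_route(route_id):
    """Resolve a relational route_id to its document in the external store."""
    row = rdbms.execute(
        "SELECT store, resource_id FROM route_resource_map WHERE route_id = ?",
        (route_id,),
    ).fetchone()
    if row is None:
        return None
    _store, resource_id = row
    return document_store.get(resource_id)


route = load_route(1)
```

The relational side keeps only a small pointer row, so the large payload can move between stores without touching the rest of the schema.
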
  5. Use Case 1: Route Data Store
    • Geo-location data stored in JSON blocks
    • MySQL → S3 → File Server → MongoDB
    • Initial size of ~500GB, ~18 million objects
    • 3-member replica set
    • Dedicated iron servers with 24GB RAM
  6. Route Data Example

    var mmr_route_data = {
      id: "e4da3b7fbbce2345d7772b0674a318d5",
      updated_date: "2005-07-23 15:47:31",
      city: "San Diego",
      user_id: "4",
      created_date: "2005-07-23 15:47:31",
      route_name: "balboa park",
      state: "CA",
      total_distance: "3.09",
      points: [
        {lat: 32.7199629309, lng: -117.159318924, type: 1},
        {lat: 32.7313715848, lng: -117.159404755, type: 1},
        {lat: 32.7314437868, lng: -117.158031464, type: 1},
        {lat: 32.7329600157, lng: -117.158074379, type: 1},
        {lat: 32.7337903206, lng: -117.158589363, type: 1},
        {lat: 32.7370392655, lng: -117.158589363, type: 1},
        {lat: 32.7388802817, lng: -117.158074379, type: 1},
        {lat: 32.7203239866, lng: -117.159147263, type: 1},
        ...
      ]
    };
  7. Solution Summary
    Migration Pattern:
    • RESTful API modified to use Mongo PHP driver
    • Implemented a 'pass thru' migration function
    • Batch 'backfill' migrations via pass-thru
    • Data transform handled in PHP code
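
The pass-thru and backfill steps above can be sketched roughly as follows. The talk's implementation was in PHP against the Mongo PHP driver; this is a Python approximation with in-memory dicts standing in for both stores, and the names `legacy_store`, `new_store`, `transform`, `get_route`, and `backfill`, as well as the sample records, are hypothetical.

```python
# In-memory stand-ins; in the talk these were the legacy store and MongoDB.
# The records are made-up samples shaped like the route example.
legacy_store = {
    "r1": {"route_name": "balboa park", "total_distance": "3.09", "user_id": "4"},
    "r2": {"route_name": "sample route", "total_distance": "5.50", "user_id": "9"},
}
new_store = {}


def transform(legacy_doc):
    """Convert legacy string fields to typed fields (hypothetical transform)."""
    doc = dict(legacy_doc)
    doc["total_distance"] = float(doc["total_distance"])
    doc["user_id"] = int(doc["user_id"])
    return doc


def get_route(route_id):
    """Read path: serve from the new store, migrating on a miss."""
    doc = new_store.get(route_id)
    if doc is not None:
        return doc
    legacy_doc = legacy_store.get(route_id)
    if legacy_doc is None:
        return None
    doc = transform(legacy_doc)
    new_store[route_id] = doc  # pass-thru write: migrated on first access
    return doc


def backfill(route_ids):
    """Batch migration: push cold records through the same pass-thru path."""
    migrated = 0
    for route_id in route_ids:
        if route_id not in new_store and get_route(route_id) is not None:
            migrated += 1
    return migrated


first = get_route("r1")                # hot record migrates on demand
count = backfill(legacy_store.keys())  # remaining records migrate in batch
```

Because the backfill reuses the read path, there is a single transform to maintain, and records touched by live traffic are never migrated twice.
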
  8. SAN storage and MongoDB
    • Needed to quickly expand available disk
    • Implemented high-end SAN subsystem
    • Impressive I/O performance with MongoDB
    • Migration to SAN painless thanks to OpLog
    • Easily expandable due to the use of XFS
    • ~30 million routes, ~2.5TB of data
  9. “Gotchas” a.k.a. Lessons Learned
    • Pay attention to potential document size. (Utilize GridFS for larger objects)
    • Allocate enough RAM for indexes! (Especially important for large data collections)
    • File dump backups may not scale for TB+ size datasets. (Utilize a delayed, 'hidden' member for DR)
    • Evaluate filesystem choice carefully (hint: XFS)
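
As a rough illustration of the delayed, hidden member mentioned above, the helper below builds a replica set member document. The helper name, host name, and delay value are hypothetical. The field names `priority`, `hidden`, and `slaveDelay` are the replica set configuration fields of MongoDB releases from that period; later releases renamed `slaveDelay` to `secondaryDelaySecs`.

```python
def delayed_hidden_member(member_id, host, delay_seconds):
    """Build a replica set member document for a delayed, hidden DR copy.

    A hidden or delayed member must not be electable, hence priority 0.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,                # never becomes primary
        "hidden": True,               # invisible to application reads
        "slaveDelay": delay_seconds,  # applies the oplog this many seconds late
    }


# Hypothetical host name and a one-hour delay, purely for illustration.
member = delayed_hidden_member(3, "dr-host.example:27017", 3600)
```

The delay gives a window in which an accidental drop or bad write on the primary has not yet reached the DR copy.
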
  10. Use Case 2: Django Session Store
    • Django sessions not scaling in MySQL
    • Modified core methods to use MongoDB
    • Cutover of new data (Test for Mongo data, fallback to MySQL)
    • Migration of data via export/import (Simple Python transform script using pymongo)
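
The export/import transform might look something like the sketch below. It assumes rows shaped like Django's `django_session` table (`session_key`, `session_data`, `expire_date`); the sample rows are made up, and the document layout, including the use of the session key as `_id`, is an assumption rather than the layout MMF used.

```python
from datetime import datetime

# Rows as they might come out of a django_session export:
# (session_key, session_data, expire_date). Values are made-up samples.
exported_rows = [
    ("abc123", "ZXhhbXBsZQ==", "2012-02-15 10:30:00"),
    ("def456", "c2Vzc2lvbg==", "2012-02-16 08:00:00"),
]


def row_to_document(row):
    """Turn one exported session row into a document keyed by session_key."""
    key, data, expires = row
    return {
        "_id": key,            # session key doubles as the primary key
        "session_data": data,  # encoded payload is carried over unchanged
        "expire_date": datetime.strptime(expires, "%Y-%m-%d %H:%M:%S"),
    }


documents = [row_to_document(row) for row in exported_rows]
```

With a live connection, the resulting list could then be loaded through pymongo, for example with `insert_many` on the target collection in current driver versions.
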
  11. Capped Collections
    • Used for retaining a fixed amount of data (based on data size, not number of rows)
    • Utilizes FIFO method for pruning collection (Especially useful for data that devalues with age)
    Gotcha! Explicitly create the capped collection before any data is put into the system; otherwise the first insert auto-creates an ordinary, uncapped collection.
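
The size-based FIFO behaviour can be modelled in a few lines of plain Python. This is a toy model, not how MongoDB implements capped collections: the class `CappedLog` is hypothetical, and JSON length is used only as a rough proxy for stored document size. In pymongo, the explicit creation the gotcha refers to is done with `create_collection(name, capped=True, size=...)`.

```python
import json
from collections import deque


class CappedLog:
    """Toy model of a capped collection: bounded by bytes, pruned oldest-first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._docs = deque()  # (size, doc) pairs in insertion order

    def insert(self, doc):
        size = len(json.dumps(doc).encode("utf-8"))  # rough size proxy
        if size > self.max_bytes:
            raise ValueError("document larger than the whole collection")
        # Evict the oldest documents until the new one fits.
        while self.used_bytes + size > self.max_bytes:
            old_size, _ = self._docs.popleft()
            self.used_bytes -= old_size
        self._docs.append((size, doc))
        self.used_bytes += size

    def find(self):
        """Return the surviving documents in insertion order."""
        return [doc for _, doc in self._docs]


log = CappedLog(max_bytes=60)
for n in range(5):
    log.insert({"seq": n, "msg": "ping"})
surviving = [doc["seq"] for doc in log.find()]
```

Each sample document serializes to 25 bytes, so a 60-byte cap holds two of them and only the newest two survive the five inserts.
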
  12. Use Case 3: Athletic Live Tracking
    • Beta feature utilized TT + MySQL (did not scale for large events)
    • Required to be “burstable” for Live Events (deployable in 'The Cloud')
    • Data size relatively small (compared to Routes DB)
    • “Live” data, no archiving required
  13. Use Case 3: Athletic Live Tracking
    • RS Cloud, 3+n MongoDB replica set
    • Quickly scalable via MongoDB replication
    • Highly optimized, indexes for every query
    • Low administration overhead (vs MySQL)
    “Gotchas”
    – Know your application (tune indexes and 'find()' ops accordingly)
    – Know your driver (the Python driver's pooling defaults are way too low)
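  14. As a DBA: Ease of Administration
    • Replication made elegant (as compared with MySQL)
    • Ridiculously simple to add add'l members
    • Be sure to run InitialSync from a secondary

    rs.add({ "host" : "livetrack_db09", "initialSync" : { "state" : 2 } })

    (When a member document is passed instead of a host string, some shell versions may also expect an explicit "_id" field.)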
  15. Use Case 4: Micro-Messaging Framework
    • Initial use case providing 'micro-goals' (user-defined stats aggregation)
    • MongoDB for persistence of aggregates
    • Python server + RabbitMQ (AMQP)
    • Implemented between Django and MySQL (service subscribes to 'interesting' stats)
    • Horizontally scalable “cloud” architecture
  16. Indexing Patterns or “Know Your App”
    • Proper indexing critical to performance at scale
    • MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself)
    • Avoid un-indexed queries at all costs (no. really. quickest way to crater your app)
    • Onus on DevOps to match application to indexes (know your query profile, never assume)
    • Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
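
To make the "covered query" idea concrete, the hypothetical helper below checks whether every field a query filters on and returns is present in a given index. It is a simplified model for reasoning about a query profile, not a MongoDB API: real servers apply further restrictions (multikey indexes, for instance), and the authoritative check remains the query's explain output.

```python
def is_covered(index_keys, filter_fields, projection):
    """Return True if the index alone could, in principle, answer the query.

    index_keys    -- field names in the index, e.g. ["user_id", "created_date"]
    filter_fields -- field names used in the query filter
    projection    -- dict of field -> 1/0 as passed to find()
    """
    indexed = set(index_keys)
    # Without at least one included field, whole documents are returned.
    if not any(projection.values()):
        return False
    returned = {field for field, include in projection.items() if include}
    if projection.get("_id", 1):
        returned.add("_id")  # _id comes back unless explicitly excluded
    return set(filter_fields) <= indexed and returned <= indexed


index = ["user_id", "created_date"]
covered = is_covered(index, ["user_id"], {"created_date": 1, "_id": 0})
not_covered = is_covered(index, ["user_id"], {"created_date": 1})
```

The second call is not covered only because `_id` is returned by default and is not part of the index, which is a common way to miss a covered query.
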
  17. Use Case 5: API Logging DB
    • MongoDB is great for logging (especially if you log in JSON format!)
    • Good application for capped collections
    • Running with 'safe mode' off for speed (fire-n-forget logging can reduce latency)
    • Cloud servers are perfect for logging apps
  18. Monitoring MongoDB at MMF
    • Monitor for real-time events (Faster response time = less impact)
    • Track historical performance data trends (Useful for predictive failure analysis and scaling need projections)
    • Zabbix open source monitoring
    • Mikoomi plugins for MongoDB (Query latency, total Ops, replica set health, heap memory utilization, etc.)
    • mongostat – realtime troubleshooting godsend
  19. Conclusion
    • MongoDB is extremely versatile, and can help your application scale, even if you don't design your app with MongoDB from the start.
    • MongoDB fits well into both dedicated and virtual architecture environments.
    • Low maintenance overhead compared to traditional RDBMS.
    • Provides the horizontal scaling path required for Internet-sized applications.
  20. MapMyRUN.com | MapMyRIDE.com | MapMyWALK.com | MapMyTRI.com | MapMyHIKE.com | MapMyMOUNTAIN.com | MapMyFITNESS.com

    We're Hiring! http://www.mapmyfitness.com/careers