MongoDB Versatility: Scaling the MapMyFitness Platform - Chris Merz, Data Storage Engineer, MapMyFitness

Introduction
• MapMyFITNESS founded in 2007
• Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
• Over 6 million registered users
• 30 million geo-data routes (runs, rides, walks, hikes, etc.)
• Core sites, mobile apps, API, white-label (MapMyRUN, MapMyRIDE, MapMyWALK, MapMyTRI, MapMyHIKE, MapMyFITNESS, MapMyRACE)

Platform Overview and Background
• Origins in the LAMP stack (Linux-Apache-MySQL-PHP)
• Scaled well to ~2 million users
• Redesigned in Python/Django
• MySQL backend not sufficient
“How to scale from 2.5 to 6 million users?”

Functional Scaling
• Identify high-growth / large-data collections
• Must be able to live outside the existing relational schema
• Integrate via remote resource mapping tables in the RDBMS
• Functional Scaling can facilitate movement towards a Service Architecture
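
The "remote resource mapping table" idea can be sketched as a relational table that stores only a pointer to where the large object now lives. This is a minimal illustration, not the schema used at MapMyFitness: it uses Python's built-in sqlite3 as a stand-in for the RDBMS, and the table name `route_resource`, its columns, and the `store` label are all hypothetical.

```python
import sqlite3


def create_mapping_table(conn):
    # Hypothetical mapping table: the relational side keeps only a pointer.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS route_resource ("
        " route_id INTEGER PRIMARY KEY,"
        " store TEXT NOT NULL,"
        " remote_key TEXT NOT NULL)"
    )


def map_route(conn, route_id, store, remote_key):
    # Record (or update) where the large object for this route is stored.
    conn.execute(
        "INSERT OR REPLACE INTO route_resource (route_id, store, remote_key)"
        " VALUES (?, ?, ?)",
        (route_id, store, remote_key),
    )


def locate_route(conn, route_id):
    # Return (store, remote_key), or None when the route has no mapping.
    return conn.execute(
        "SELECT store, remote_key FROM route_resource WHERE route_id = ?",
        (route_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
create_mapping_table(conn)
map_route(conn, 5, "mongodb", "e4da3b7fbbce2345d7772b0674a318d5")
print(locate_route(conn, 5))
```

The application resolves the pointer first and then fetches the payload from whichever store it names, so the large collection can move without changing the relational schema again.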

Use Case 1: Route Data Store
• Geo-location data stored in JSON blocks
• MySQL → S3 → File Server → MongoDB
• Initial size of ~500GB, ~18 million objects
• 3-member replica set
• Dedicated iron servers with 24GB RAM

Route Data Example

    var mmr_route_data = {
      id: "e4da3b7fbbce2345d7772b0674a318d5",
      updated_date: "2005-07-23 15:47:31",
      city: "San Diego",
      user_id: "4",
      created_date: "2005-07-23 15:47:31",
      route_name: "balboa park",
      state: "CA",
      total_distance: "3.09",
      points: [
        {lat: 32.7199629309, lng: -117.159318924, type: 1},
        {lat: 32.7313715848, lng: -117.159404755, type: 1},
        {lat: 32.7314437868, lng: -117.158031464, type: 1},
        {lat: 32.7329600157, lng: -117.158074379, type: 1},
        {lat: 32.7337903206, lng: -117.158589363, type: 1},
        {lat: 32.7370392655, lng: -117.158589363, type: 1},
        {lat: 32.7388802817, lng: -117.158074379, type: 1},
        {lat: 32.7203239866, lng: -117.159147263, type: 1},
        ...
      ]
    };

Solution Summary
Migration Pattern:
• RESTful API modified to use Mongo PHP driver
• Implemented a 'pass thru' migration function
• Batch 'backfill' migrations via pass-thru
• Data transform handled in PHP code
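
The pass-thru pattern above might look roughly like the sketch below. The talk's implementation was in PHP; this is a Python illustration under stated assumptions. `mongo_routes` is assumed to behave like a pymongo collection (only `find_one` and `replace_one` are used), while `fetch_legacy` and `transform` are hypothetical callables standing in for the old storage read and the data transform.

```python
def read_route(route_id, mongo_routes, fetch_legacy, transform):
    """Serve a route from MongoDB, migrating it on first access."""
    doc = mongo_routes.find_one({"_id": route_id})
    if doc is not None:
        return doc  # already migrated

    legacy = fetch_legacy(route_id)  # hypothetical read from the old store
    if legacy is None:
        return None  # unknown route in both stores

    doc = dict(transform(legacy))  # hypothetical legacy-to-document transform
    doc["_id"] = route_id
    # Upsert so that repeated or concurrent migrations stay idempotent.
    mongo_routes.replace_one({"_id": route_id}, doc, upsert=True)
    return doc


def backfill(route_ids, mongo_routes, fetch_legacy, transform):
    """Batch migration that simply drives the pass-thru read path."""
    present = 0
    for route_id in route_ids:
        if read_route(route_id, mongo_routes, fetch_legacy, transform) is not None:
            present += 1
    return present  # number of routes now available in MongoDB
```

Because the backfill reuses the same read path as live traffic, there is only one transform to maintain, and routes that users touch during the migration are simply skipped by the batch job.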

SAN storage and MongoDB
• Needed to quickly expand available disk
• Implemented high-end SAN subsystem
• Impressive I/O performance with MongoDB
• Migration to SAN painless thanks to the oplog
• Easily expandable due to the use of XFS
• ~30 million routes, ~2.5TB of data

“Gotchas” a.k.a. Lessons Learned
• Pay attention to potential document size. (Utilize GridFS for larger objects)
• Allocate enough RAM for indexes! (Especially important for large data collections)
• File dump backups may not scale for TB+ size datasets. (Utilize a delayed, 'hidden' replica set member for DR)
• Evaluate filesystem choice carefully (hint: XFS)

Use Case 2: Django Session Store
• Django sessions not scaling in MySQL
• Modified core methods to use MongoDB
• Cutover of new data (Test for Mongo data, fallback to MySQL)
• Migration of data via export/import (Simple python transform script using pymongo)
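
The cutover and the export/import transform could be sketched as below. The column names `session_key`, `session_data`, and `expire_date` are those of Django's stock `django_session` table; everything else is an assumption for illustration: the document layout (session key as `_id`), the `fetch_from_mysql` callable, and `mongo_sessions`, which is assumed to behave like a pymongo collection.

```python
from datetime import datetime


def session_row_to_doc(row):
    """Transform one exported django_session row into a MongoDB document."""
    session_key, session_data, expire_date = row
    if isinstance(expire_date, str):
        # Exports often carry dates as text; store a real datetime instead.
        expire_date = datetime.strptime(expire_date, "%Y-%m-%d %H:%M:%S")
    return {
        "_id": session_key,  # assumed layout: session key doubles as _id
        "session_data": session_data,
        "expire_date": expire_date,
    }


def load_session(session_key, mongo_sessions, fetch_from_mysql):
    """Test for Mongo data first, then fall back to MySQL."""
    doc = mongo_sessions.find_one({"_id": session_key})
    if doc is not None:
        return doc
    row = fetch_from_mysql(session_key)  # hypothetical legacy lookup
    return session_row_to_doc(row) if row is not None else None
```

During the cutover window new sessions are written only to MongoDB, so the fallback path sees less traffic over time and can be removed once the bulk import has finished.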

Capped Collections
• Used for retaining a fixed amount of data (based on data size, not number of rows)
• Utilizes FIFO method for pruning collection (Especially useful for data that devalues with age)
Gotcha! Explicitly create the capped collection before any data is put into the system; otherwise the first insert auto-creates a regular, uncapped collection.
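
One way to guard against the auto-creation gotcha is to create the capped collection explicitly at deploy time. The helper below is a sketch, assuming `db` behaves like a pymongo `Database` (it relies only on `list_collection_names` and `create_collection`); the helper name and the collection name in the usage note are hypothetical.

```python
def ensure_capped(db, name, size_bytes):
    """Create a capped collection up front; return True if it was created.

    Note: an existing collection is left untouched, even if it is not
    capped. This sketch only prevents accidental auto-creation.
    """
    if name in db.list_collection_names():
        return False
    # capped=True with size=<bytes> caps by data size, not document count.
    db.create_collection(name, capped=True, size=size_bytes)
    return True
```

A call such as `ensure_capped(db, "sessions", 512 * 1024 * 1024)` would run once before the application starts writing, so the first insert can never create an uncapped collection by accident.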

Use Case 3: Athletic Live Tracking
• Beta feature utilized TT + MySQL (did not scale for large events)
• Required to be “burstable” for Live Events (deployable in 'The Cloud')
• Data size relatively small (compared to Routes DB)
• “Live” data, no archiving required

Use Case 3: Athletic Live Tracking
• RS Cloud, 3+n MongoDB replica set
• Quickly scalable via MongoDB replication
• Highly optimized, indexes for every query
• Low administration overhead (vs MySQL)
“Gotchas”
– Know your application (tune indexes and 'find()' ops accordingly)
– Know your driver (the Python driver's connection pooling defaults are far too low)

As a DBA: Ease of Administration
• Replication made elegant (as compared with MySQL)
• Ridiculously simple to add add'l members
• Be sure to run the initial sync from a secondary:

    rs.add({ "host" : "livetrack_db09", "initialSync" : { "state" : 2 } })

Use Case 4: Micro-Messaging Framework
• Initial use case providing 'micro-goals' (user-defined stats aggregation)
• MongoDB for persistence of aggregates
• Python server + RabbitMQ (AMQP)
• Implemented between Django and MySQL (service subscribes to 'interesting' stats)
• Horizontally scalable “cloud” architecture
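
Persisting running aggregates maps naturally onto MongoDB's `$inc` operator combined with an upsert. The sketch below shows one possible shape for the consumer side; it is not the framework described in the talk. The message fields (`user_id`, `stat`, `value`) and the aggregate document layout are hypothetical, and `aggregates` is assumed to behave like a pymongo collection.

```python
def stat_update(message):
    """Build the (filter, update) pair for one incoming stat message."""
    query = {"user_id": message["user_id"], "stat": message["stat"]}
    # $inc creates the counters on first use and adds to them afterwards.
    update = {"$inc": {"total": message["value"], "count": 1}}
    return query, update


def record_stat(aggregates, message):
    """Apply one message to the aggregates collection."""
    query, update = stat_update(message)
    # upsert=True: the first message for a user/stat pair creates the document.
    aggregates.update_one(query, update, upsert=True)
```

Each message becomes a single atomic write, so several consumers can pull from the queue in parallel without reading the aggregate first.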

Indexing Patterns or “Know Your App”
• Proper indexing critical to performance at scale
• MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself)
• Avoid un-indexed queries at all costs (no, really: it is the quickest way to crater your app)
• Onus on DevOps to match application to indexes (know your query profile, never assume)
• Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
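
To make the 'covered query' condition concrete, the check below tests whether a query could be answered from a single index. It is a deliberately simplified model written for illustration: it treats fields as flat names and ignores the additional restrictions MongoDB applies (for example around array fields), so the query planner's explain output remains the authority. The function name and arguments are hypothetical.

```python
def is_covered(index_keys, filter_fields, projection):
    """Rough test: can this query be answered from the index alone?"""
    indexed = set(index_keys)
    included = {field for field, flag in projection.items() if flag}
    if not included:
        # No inclusion projection means whole documents are returned.
        return False
    if projection.get("_id", 1):
        # _id is returned unless the projection explicitly excludes it.
        included.add("_id")
    return set(filter_fields) <= indexed and included <= indexed


index = ["user_id", "created_date"]
print(is_covered(index, ["user_id"], {"created_date": 1, "_id": 0}))
print(is_covered(index, ["user_id"], {"created_date": 1}))
```

The second call differs only by leaving `_id` in the result, which is a common reason a query that looks covered still has to fetch documents.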

Use Case 5: API Logging DB
• MongoDB is great for logging (especially if you log in JSON format!)
• Good application for capped collections
• Running with 'safe mode' off for speed (fire-n-forget logging can reduce latency)
• Cloud servers are perfect for logging apps

Monitoring MongoDB at MMF
• Monitor for real-time events (Faster response time = less impact)
• Track historical performance data trends (Useful for predictive failure analysis and scaling need projections)
• Zabbix open source monitoring
• Mikoomi plugins for MongoDB (Query latency, total Ops, replica set health, heap memory utilization, etc.)
• mongostat: a godsend for realtime troubleshooting

Conclusion
• MongoDB is extremely versatile, and can help your application scale, even if you don't design your app with MongoDB from the start.
• MongoDB fits well into both dedicated and virtual architecture environments.
• Low maintenance overhead compared to traditional RDBMS.
• Provides the horizontal scaling path required for Internet-sized applications.

MapMyRUN.com | MapMyRIDE.com | MapMyWALK.com | MapMyTRI.com | MapMyHIKE.com |
