MongoDB at Foursquare: From the Cloud to Bare Metal

Jon Hoffman
December 04, 2012

MongoSV 2012 talk about how and why we moved our Mongo servers from EC2 to bare-metal hardware in a hybrid environment.



  1. Mongo at Foursquare From the cloud to bare metal MongoSV

    December 4, 2012 Jon Hoffman, Server Engineer, @hoffrocket
  2. Agenda •  A little bit about Foursquare •  A little

    bit of Foursquare tech history •  Why we moved mongo from Amazon EC2 to our own hardware •  Where we are now
  3. Where do my friends go? Where should I go? Social

    and Algorithmic Discovery
  4. Growth 25,000,000+ people 3,000,000,000+ check-ins 40,000,000+ places

  5. Data and Volume •  Thousands of requests/s •  Tens of

    thousands of mongo queries/s •  A single check-in request performs 50 back-end queries before returning a result
  6. Early Infrastructure History •  Winter 2008 – Fall 2009 (prototype)

    •  PHP + MySQL on Slicehost VPS •  Fall 2009 – Winter 2009 (harryh joins) •  Rewrite into Scala + PostgreSQL on EC2 •  Winter 2010 – Spring 2010 (mongo) •  Started to transition some tables from postgres to mongo. Write only. •  Flipped the switch on venue data serving from mongo in April
  7. The mongo era •  Spring 2010 – Fall 2011 (migration)

    •  Slowly rewrite DB code one table at a time •  Double write, throttle over reads •  Auto balancing in Fall 2010 •  Replica sets in May 2011 •  Fall 2011 (Migration done) •  Finally moved the most tangled table (users) •  Summer 2012 (EC2 to bare metal)
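
The "double write, throttle over reads" step on slide 7 can be sketched as below. This is an illustration only, not Foursquare's code: the `DualStore` class, the `read_percent_new` knob, and the use of plain dicts as stand-ins for the two databases are all assumptions. Every write goes to both stores, and a stable hash of the key decides which store serves a read, so the read share can be raised gradually from 0 to 100.

```python
import zlib


class DualStore:
    """Hypothetical wrapper used while migrating one table between two stores."""

    def __init__(self, old, new, read_percent_new=0):
        self.old = old                            # store being migrated away from
        self.new = new                            # store being migrated to
        self.read_percent_new = read_percent_new  # 0..100, share of reads sent to `new`

    def put(self, key, value):
        # Double write: both stores see every write during the migration.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        # Stable bucket per key, so a given key is always read from the same store
        # for a given throttle setting.
        bucket = zlib.crc32(key.encode("utf-8")) % 100
        store = self.new if bucket < self.read_percent_new else self.old
        return store[key]
```

Raising `read_percent_new` in small steps while watching error rates and latency is one way to "throttle over" reads; the slides do not say which mechanism was actually used.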
  8. Why move mongo from EC2? • Reliability • Cost • Timing

  9. AWS Server environment •  7 sharded clusters, 3 small non-sharded

    •  Replica set of 3 or 4 nodes per shard •  m2.4xlarge 68GB nodes •  Data + indexes limited to 60GB •  60 replica sets and over 200 replicas
  10. Reliability •  IO Performance required EBS •  RAID0 across 4

    volumes •  EBS is a network service •  Degrades to blocking all IO system calls •  Many times per day with 100s of servers!! •  User space programs are not written with the possibility of very long blocking IO in mind •  Mongo replica set failover doesn’t see that as a failure mode
  11. Reliability Hacks •  Created test code to simulate IO halts

    in userspace (FUSE) •  Disk monitor on every mongod node •  Periodically writes to a few sections of disk •  Touches a kill-file on timeout •  Modified mongo codebase to watch a kill-file •  Secondary removes itself from slaveOK rotation •  Primary steps itself down
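
The disk monitor on slide 11 could look roughly like the sketch below. It is not the code Foursquare ran: the function names (`probe_disk`, `check_once`), the 4 KB probe size, and the timeout handling are assumptions. Each probe write runs in a daemon thread so the monitor keeps running even if the write blocks inside a system call; when a probe does not finish in time, the monitor touches the kill-file. The kill-file is assumed to live on a device that is not affected by the stall (for example a tmpfs), otherwise touching it would block too.

```python
import os
import threading
from pathlib import Path


def probe_disk(probe_path: Path, done: threading.Event) -> None:
    """Write a small block and fsync it; `done` is set only if the write returns."""
    with open(probe_path, "wb") as f:
        f.write(os.urandom(4096))
        f.flush()
        os.fsync(f.fileno())
    done.set()


def check_once(probe_paths, kill_file: Path, timeout_s: float, probe=probe_disk) -> bool:
    """Probe each path in turn; touch kill_file and return False on the first timeout."""
    for path in probe_paths:
        done = threading.Event()
        # Daemon thread: a probe stuck in a blocked syscall must not hang the monitor.
        worker = threading.Thread(target=probe, args=(Path(path), done), daemon=True)
        worker.start()
        if not done.wait(timeout_s):
            kill_file.touch()  # signal for mongod (or a wrapper) to take itself out
            return False
    return True
```

A real monitor would call `check_once` on a timer with probe files spread across the data volume. A probe that raises (for example on a read-only filesystem) never sets `done`, so it is treated the same as a timeout.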
  12. Costs – RAM is expensive •  EBS IO relatively slow,

    so page faults are very expensive (a few milliseconds) •  Required that most data be in RAM •  Tested hardware with SSDs, which allowed us to fault to disk safely with only a fraction of data covered by RAM
  13. Costs – Flexibility comes at a premium

  14. Timing •  We’re able to predict usage and commit to

    a large capital expense •  Buying a rack at a time gets us good deals •  Amazon recently started direct connect option •  Dedicated network links at 1Gbps increments •  Low/Consistent latency to Equinix DC
  15. Migration process 1.  Hardware configuration testing 2.  Server build and

    installation 3.  VPC migration 4.  Internal Tools 5.  Replica Set migration
  16. Hardware testing •  Questions •  What’s the most cost effective

    configuration? •  Multiple mongod per machine? •  RAM/disk ratio •  Tried 4 server types in a small cabinet •  142GB, 24 cores 2.4GHz Westmere •  8 SSDs •  4 SSDs, LSI WarpDrive •  4 SSDs, Fusion-io
  17. No Benchmarks •  Real world query load •  On a

    few replica set combinations •  Watched existing performance graphs to assess impact
  18. Winning configuration •  192GB, 24 core, 4 180GB SSDs • 

    RAID 10 on the drives, 360 GB •  4 replicas per server •  Each shard limited to 60GB for easier maintenance operations (and disk limits) •  Resyncs complete faster •  Backups done in parallel for all shards
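
The disk numbers on slide 18 can be checked with a few lines of arithmetic. The helper name `raid10_usable_gb` is made up for this sketch; the only inputs are the figures from the slide (4 drives of 180GB, 4 replicas per server, 60GB per shard).

```python
def raid10_usable_gb(drives: int, drive_gb: int) -> int:
    """RAID 10 mirrors pairs of drives, so usable space is half the raw total."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drives * drive_gb // 2


usable_gb = raid10_usable_gb(drives=4, drive_gb=180)  # 360, matching the slide
data_gb = 4 * 60                                      # 4 replicas x 60GB shard limit = 240
headroom_gb = usable_gb - data_gb                     # 120 left on the array
print(usable_gb, data_gb, headroom_gb)
```

So four full 60GB replicas take up 240GB of the 360GB array, which is consistent with the slide's note that the 60GB shard limit is partly a disk limit.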
  19. Migration Setup •  Purchased enough capacity to handle 1.5x growth

    •  Moved most of EC2 fleet into Virtual Private Cloud
  20. Replica Set Sanity •  Software to control placement of replicas

    •  No more than one replica per shard per server •  Noticed that primary replicas eat up more resources than secondaries •  Limit of one Primary per server
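
The two placement rules on slide 20 are easy to express as a check over a proposed layout. The sketch below is an assumption about how such a check might look, not the placement software described in the talk; the `Replica` record and `placement_violations` function are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple


class Replica(NamedTuple):
    shard: str        # replica set / shard name
    server: str       # physical host the mongod runs on
    is_primary: bool


def placement_violations(replicas):
    """Return human-readable violations of the two rules from the slide."""
    violations = []
    # Rule 1: no more than one replica per shard per server.
    per_shard_server = Counter((r.shard, r.server) for r in replicas)
    for (shard, server), n in sorted(per_shard_server.items()):
        if n > 1:
            violations.append(f"{server}: {n} replicas of shard {shard}")
    # Rule 2: at most one primary per server.
    primaries = Counter(r.server for r in replicas if r.is_primary)
    for server, n in sorted(primaries.items()):
        if n > 1:
            violations.append(f"{server}: {n} primaries")
    return violations
```

An empty list means the layout satisfies both rules. A placement tool could run this check before adding a member or after a failover changes which node is primary.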
  21. Replica Set Migration 1.  Added new SECONDARIES to replica set

    in DC 2.  Stepdown PRIMARY from EC2 to DC 3.  Shutdown SECONDARIES in EC2
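
The three steps on slide 21 can be pictured as edits to the replica set configuration document. The sketch below works on a plain dict shaped like that document (`_id`, `version`, `members` with `_id` and `host`). The host names are hypothetical, and removing the EC2 members from the config after shutting them down is an assumption that the slide does not state. Applying a config (`replSetReconfig`) and stepping down the primary (`replSetStepDown`) are real MongoDB commands but are deliberately left out, so nothing here talks to a server.

```python
import copy


def add_members(config: dict, hosts: list) -> dict:
    """Step 1: return a new config with `hosts` appended and the version bumped."""
    new = copy.deepcopy(config)
    next_id = max(m["_id"] for m in new["members"]) + 1
    for offset, host in enumerate(hosts):
        new["members"].append({"_id": next_id + offset, "host": host})
    new["version"] += 1
    return new


def remove_members(config: dict, hosts: list) -> dict:
    """Step 3 (assumed follow-up): drop `hosts` from the config, version bumped."""
    new = copy.deepcopy(config)
    new["members"] = [m for m in new["members"] if m["host"] not in hosts]
    new["version"] += 1
    return new


# Hypothetical replica set with two EC2 members.
ec2_only = {
    "_id": "shard01",
    "version": 3,
    "members": [
        {"_id": 0, "host": "ec2-a:27017"},
        {"_id": 1, "host": "ec2-b:27017"},
    ],
}
with_dc = add_members(ec2_only, ["dc-a:27017", "dc-b:27017"])
# Step 2 (not modeled): step down the EC2 primary once a DC member has caught up.
dc_only = remove_members(with_dc, ["ec2-a:27017", "ec2-b:27017"])
```

Both helpers return a copy, so the previous config stays available if a step needs to be rolled back.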
  22. Problems that have gone away •  On slow disk we

    had to “warm” up a replica before sending it query traffic by paging all the data into memory •  No need to worry about IO halting •  Lower failure rate on machines •  EC2 machines sometimes suffered from degraded network connectivity
  23. Future of Hardware •  Reconsidering RAID 10 decision •  All

    eggs in one basket •  A lot of automation work still pending around re-imaging machines •  Current plan is to address problems in batch
  24. Future Work •  Mongo 2.2.x upgrade in process •  Understanding

    capacity •  Primary stepDown resiliency •  Hot Chunks
  26. Questions? jon@foursquare.com foursquare.com/jobs