
MongoDB at Foursquare: From the Cloud to Bare Metal

Jon Hoffman
December 04, 2012


MongoSV 2012 talk about how and why we moved our Mongo servers from EC2 to bare metal hardware in a hybrid environment.


Transcript

1. Mongo at Foursquare: From the cloud to bare metal
   MongoSV, December 4, 2012
   Jon Hoffman, Server Engineer, @hoffrocket
2. Agenda
   • A little bit about Foursquare
   • A little bit of Foursquare tech history
   • Why we moved mongo from Amazon EC2 to our own hardware
   • Where we are now
3. Data and Volume
   • Thousands of requests/s
   • Tens of thousands of mongo queries/s
   • A single check-in request performs 50 back-end queries before returning a result
4. Early Infrastructure History
   • Winter 2008 – Fall 2009 (prototype)
     • PHP + MySQL on Slicehost VPS
   • Fall 2009 – Winter 2009 (harryh joins)
     • Rewrite into Scala + PostgreSQL on EC2
   • Winter 2010 – Spring 2010 (mongo)
     • Started to transition some tables from postgres to mongo. Write only.
     • Flipped the switch on venue data serving from mongo in April
5. The mongo era
   • Spring 2010 – Fall 2011 (migration)
     • Slowly rewrite DB code one table at a time
     • Double write, throttle over reads
     • Auto balancing in Fall 2010
     • Replica sets in May 2011
   • Fall 2011 (Migration done)
     • Finally moved the most tangled table (users)
   • Summer 2012 (EC2 to bare metal)
6. AWS Server environment
   • 7 sharded clusters, 3 small non-sharded
   • Replica set of 3 or 4 nodes per shard
   • m2.4xlarge 68GB nodes
   • Data + indexes limited to 60GB
   • 60 replica sets and over 200 replicas
7. Reliability
   • IO performance required EBS
   • RAID0 across 4 volumes
   • EBS is a network service
   • Degrades to blocking all IO system calls
   • Many times per day with 100s of servers!!
   • User space programs are written with the possibility of very long blocking IO
   • Mongo replica set failover doesn't see that as a failure mode
8. Reliability Hacks
   • Created test code to simulate IO halts in userspace (FUSE)
   • Disk monitor on every mongod node (see the sketch after this list)
     • Periodically writes to a few sections of disk
     • Touches a kill-file on timeout
   • Modified mongo codebase to watch a kill-file
     • Secondary removes itself from slaveOK rotation
     • Primary steps itself down
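
The monitor itself isn't shown in the deck; below is a minimal Scala sketch of what a disk monitor like the one described could look like. The file paths, 5-second timeout, and 10-second probe interval are assumptions for illustration, not Foursquare's actual values.

```scala
import java.io.{File, FileOutputStream}
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

object DiskMonitor {
  // Hypothetical paths; the talk doesn't say where the probe or kill-file live.
  val probeFile = new File("/data/mongo/.disk-probe")
  val killFile  = new File("/data/mongo/kill-file")
  val executor  = Executors.newSingleThreadExecutor()

  // Write and fsync a small block so the call only returns once the disk responds.
  def probeWrite(): Unit = {
    val out = new FileOutputStream(probeFile)
    try {
      out.write(new Array[Byte](4096))
      out.getFD.sync()
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    while (true) {
      val probe = executor.submit(new Callable[Unit] { def call(): Unit = probeWrite() })
      try {
        // Treat a write that blocks for more than a few seconds as a hung volume.
        probe.get(5, TimeUnit.SECONDS)
      } catch {
        case _: TimeoutException =>
          // Touch the kill-file; the patched mongod watches for it and reacts
          // (secondary leaves slaveOK rotation, primary steps itself down).
          // In this sketch the hung write simply stays queued on the executor.
          killFile.createNewFile()
      }
      Thread.sleep(10000) // probe every 10 seconds
    }
  }
}
```

The key point is that the probe write fsyncs, so a hung EBS volume makes the probe block; the timeout converts that into a kill-file that the modified mongod can act on.
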
9. Costs – RAM is expensive
   • EBS IO is relatively slow, so page faults are very expensive (a few milliseconds)
   • Required that most data be in RAM
   • Tested hardware with SSDs that allowed us to fault to disk safely with only a fraction of data covered by RAM
10. Timing
   • We're able to predict usage and commit to a large capital expense
   • Buying a rack at a time gets us good deals
   • Amazon recently started direct connect option
     • Dedicated network links at 1Gbps increments
     • Low/consistent latency to Equinix DC
11. Migration process
   1. Hardware configuration testing
   2. Server build and installation
   3. VPC migration
   4. Internal tools
   5. Replica set migration
12. Hardware testing
   • Questions
     • What's the most cost-effective configuration?
     • Multiple mongod per machine?
     • RAM/disk ratio
   • Tried 4 server types in a small cabinet
     • 142GB, 24 cores, 2.4GHz Westmere
     • 8 SSDs
     • 4 SSDs, LSI WarpDrive
     • 4 SSDs, Fusion-io
13. No Benchmarks
   • Real world query load
   • On a few replica set combinations
   • Watched existing performance graphs to assess impact
14. Winning configuration
   • 192GB RAM, 24 cores, 4 × 180GB SSDs
   • RAID 10 on the drives, 360GB usable
   • 4 replicas per server
   • Each shard limited to 60GB for easier maintenance operations (and disk limits)
     • Resyncs complete faster
     • Backups done in parallel for all shards
15. Migration Setup
   • Purchased enough capacity to handle 1.5x growth
   • Moved most of EC2 fleet into Virtual Private Cloud
16. Replica Set Sanity
   • Software to control placement of replicas (a sketch of the constraints follows below)
   • No more than one replica per shard per server
   • Noticed that primary replicas eat up more resources than secondaries
   • Limit of one primary per server
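
The placement software isn't included in the talk; as a rough sketch, the two constraints on this slide can be expressed as a simple check over a hypothetical in-memory model (the Replica case class and validPlacement function are illustrative names, not Foursquare's code).

```scala
case class Replica(shard: String, server: String, isPrimary: Boolean)

object ReplicaPlacement {
  // A placement is acceptable when no server holds two replicas of the same
  // shard and no server hosts more than one primary.
  def validPlacement(replicas: Seq[Replica]): Boolean = {
    val onePerShardPerServer =
      replicas.groupBy(r => (r.server, r.shard)).values.forall(_.size <= 1)
    val onePrimaryPerServer =
      replicas.filter(_.isPrimary).groupBy(_.server).values.forall(_.size <= 1)
    onePerShardPerServer && onePrimaryPerServer
  }
}
```

A proposed layout that puts two replicas of one shard, or two primaries, on the same server fails the check.
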
17. Replica Set Migration
   1. Added new SECONDARIES to replica set in DC
   2. Stepdown PRIMARY from EC2 to DC (see the sketch below)
   3. Shutdown SECONDARIES in EC2
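
As an illustration of step 2 only, here is a hedged Scala sketch that issues the replSetStepDown command through the 2012-era MongoDB Java driver; the host name is a placeholder, and the deck doesn't say how Foursquare actually drove the stepdown (the mongo shell's rs.stepDown() does the same thing).

```scala
import com.mongodb.{BasicDBObject, Mongo}

object StepDownEc2Primary {
  def main(args: Array[String]): Unit = {
    // Placeholder host: the current primary, still running in EC2.
    val conn  = new Mongo("ec2-primary.example.com", 27017)
    val admin = conn.getDB("admin")
    // Ask the primary to step down for 60 seconds; with the new DC
    // secondaries already caught up, one of them should win the election.
    // Note: the primary drops client connections while stepping down, so this
    // call can report a network error even when the stepdown succeeds.
    val result = admin.command(new BasicDBObject("replSetStepDown", Integer.valueOf(60)))
    println(result)
    conn.close()
  }
}
```
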
18. Problems that have gone away
   • On slow disk we had to “warm” up a replica before sending it query traffic by paging all the data into memory
   • No need to worry about IO halting
   • Lower failure rate on machines
     • EC2 machines sometimes suffered from degraded network connectivity
19. Future of Hardware
   • Reconsidering RAID 10 decision
     • All eggs in one basket
   • A lot of automation work still pending around re-imaging machines
   • Current plan is to address problems in batch
20. Future Work
   • Mongo 2.2.x upgrade in process
   • Understanding capacity
   • Primary stepDown resiliency
   • Hot Chunks