MongoDB at Foursquare: From the Cloud to Bare Metal

Mongo at Foursquare
From the cloud to bare metal
MongoSV, December 4, 2012
Jon Hoffman, Server Engineer, @hoffrocket

Agenda
•  A little bit about Foursquare
•  A little bit of Foursquare tech history
•  Why we moved mongo from Amazon EC2 to our own hardware
•  Where we are now

Where do my friends go? Where should I go?
Social and Algorithmic Discovery

Growth
•  25,000,000+ people
•  3,000,000,000+ check-ins
•  40,000,000+ places

Data and Volume
•  Thousands of requests/s
•  Tens of thousands of mongo queries/s
•  A single check-in request performs 50 back-end queries before returning a result

Early Infrastructure History
•  Winter 2008 – Fall 2009 (prototype)
    •  PHP + MySQL on Slicehost VPS
•  Fall 2009 – Winter 2009 (harryh joins)
    •  Rewrite into Scala + PostgreSQL on EC2
•  Winter 2010 – Spring 2010 (mongo)
    •  Started to transition some tables from postgres to mongo. Write only.
    •  Flipped the switch on venue data serving from mongo in April

The mongo era
•  Spring 2010 – Fall 2011 (migration)
    •  Slowly rewrite DB code one table at a time
    •  Double write, throttle over reads
    •  Auto balancing in Fall 2010
    •  Replica sets in May 2011
•  Fall 2011 (Migration done)
    •  Finally moved the most tangled table (users)
•  Summer 2012 (EC2 to bare metal)
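The "double write, throttle over reads" step can be pictured with a small sketch. This is not Foursquare's code: the `DualStore` class, its dict-backed stores, and the `read_fraction` knob are hypothetical names used only to illustrate the pattern of writing to both databases while shifting a growing share of reads to the new one.

```python
import random


class DualStore:
    """Hypothetical wrapper around an old and a new store (plain dicts here).

    Every write goes to both stores. Reads are served from the new store
    for roughly `read_fraction` of calls and from the old store otherwise.
    """

    def __init__(self, old, new, read_fraction=0.0, rng=random.random):
        self.old = old
        self.new = new
        self.read_fraction = read_fraction  # 0.0 = all reads old, 1.0 = all reads new
        self.rng = rng                      # injectable so the choice can be tested

    def write(self, key, value):
        # Double write: keep both stores in sync during the migration.
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        # rng() is in [0, 1), so 0.0 never picks the new store and 1.0 always does.
        if self.rng() < self.read_fraction:
            return self.new.get(key)
        return self.old.get(key)
```

Raising `read_fraction` step by step, one table at a time, moves read traffic over gradually; once it reaches 1.0 and stays healthy, writes to the old store can stop.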

Why move mongo from EC2?
•  Reliability
•  Cost
•  Timing

AWS Server environment
•  7 sharded clusters, 3 small non-sharded
•  Replica set of 3 or 4 nodes per shard
•  m2.4xlarge 68GB nodes
•  Data + indexes limited to 60GB
•  60 replica sets and over 200 replicas

Reliability
•  IO Performance required EBS
    •  RAID0 across 4 volumes
•  EBS is a network service
    •  Degrades to blocking all IO system calls
    •  Many times per day with 100s of servers!!
•  User space programs are written with the possibility of very long blocking IO
•  Mongo replica set failover doesn’t see that as a failure mode

Reliability Hacks
•  Created test code to simulate IO halts in userspace (FUSE)
•  Disk monitor on every mongod node
    •  Periodically writes to a few sections of disk
    •  Touches a kill-file on timeout
•  Modified mongo codebase to watch a kill-file
    •  Secondary removes itself from slaveOK rotation
    •  Primary steps itself down
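A minimal sketch of the disk-monitor idea, assuming a Unix host. It is not the monitor Foursquare ran: the function names, the probe offsets, and the file paths are all hypothetical. The probe writes a small payload at a few offsets and fsyncs; because a halted disk would block the writer, the probe runs in a worker thread and the monitor only waits for it up to a timeout before touching the kill-file.

```python
import os
import threading
from pathlib import Path


def probe_disk(probe_path, offsets=(0, 4096, 8192), payload=b"x" * 512):
    """Write a small payload at a few offsets of probe_path and force it to disk."""
    fd = os.open(probe_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for offset in offsets:
            os.lseek(fd, offset, os.SEEK_SET)
            os.write(fd, payload)
        os.fsync(fd)  # this is the call that hangs when IO is halted
    finally:
        os.close(fd)


def check_once(probe, kill_file, timeout_s):
    """Run probe() in a daemon thread and wait at most timeout_s seconds.

    Touches kill_file and returns False if the probe is still running
    after the timeout; returns True otherwise. A probe that raises is
    not treated as a failure in this sketch.
    """
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        Path(kill_file).touch()
        return False
    return True
```

A monitor process would call `check_once` in a loop with a sleep between rounds. The kill-file should live on a filesystem that is not backed by the volume being probed, otherwise touching it could block as well.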

Costs – RAM is expensive
•  EBS IO relatively slow, so page faults are very expensive (a few milliseconds)
•  Required that most data be in RAM
•  Tested hardware with SSDs that allowed us to fault to disk safely with only a fraction of data covered by RAM

Costs – Flexibility comes at a premium

Timing
•  We’re able to predict usage and commit to a large capital expense
•  Buying a rack at a time gets us good deals
•  Amazon recently started a Direct Connect option
    •  Dedicated network links at 1Gbps increments
    •  Low/Consistent latency to Equinix DC

Migration process
1.  Hardware configuration testing
2.  Server build and installation
3.  VPC migration
4.  Internal Tools
5.  Replica Set migration

Hardware testing
•  Questions
    •  What’s the most cost effective configuration?
    •  Multiple mongod per machine?
    •  RAM/Disk ratio
•  Tried 4 server types in a small cabinet
    •  142GB, 24 cores 2.4GHz Westmere
    •  8 SSDs
    •  4 SSDs, LSI Warpdrive
    •  4 SSDs, FUSIONIO

No Benchmarks
•  Real world query load
•  On a few replica set combinations
•  Watched existing performance graphs to assess impact

Winning configuration
•  192GB, 24 core, 4 180GB SSDs
•  RAID 10 on the drives, 360 GB
•  4 replicas per server
•  Each shard limited to 60GB for easier maintenance operations (and disk limits)
    •  Resyncs complete faster
    •  Backups done in parallel for all shards
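The numbers on this slide fit together with simple arithmetic, sketched below. The helper name is hypothetical, and the sketch assumes the 60GB cap is the only disk consumer per replica (oplog, journal, and filesystem overhead are ignored).

```python
def raid10_usable_gb(drive_count, drive_gb):
    """RAID 10 mirrors pairs of drives, so usable space is half the raw total."""
    return drive_count * drive_gb // 2


usable_gb = raid10_usable_gb(4, 180)      # 4 x 180GB SSDs -> 360GB usable
max_data_gb = 4 * 60                      # 4 replicas, each capped at 60GB -> 240GB
disk_headroom_gb = usable_gb - max_data_gb
ram_coverage = 192 / max_data_gb          # share of full shards that 192GB RAM could hold

print(usable_gb, max_data_gb, disk_headroom_gb, ram_coverage)
```

Under those assumptions, four full shards occupy 240GB of the 360GB array, and 192GB of RAM covers at most 80% of that data, which is where the ability to fault to SSD comes in.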

Migration Setup
•  Purchased enough capacity to handle 1.5x growth
•  Moved most of EC2 fleet into Virtual Private Cloud

Replica Set Sanity
•  Software to control placement of replicas
    •  No more than one replica per shard per server
•  Noticed that primary replicas eat up more resources than secondaries
    •  Limit of one Primary per server
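The two placement rules above are easy to state as a check. The sketch below is an illustration only, not the placement software from the talk; the `(server, shard, role)` tuple layout and the function name are assumptions.

```python
from collections import Counter


def placement_violations(replicas):
    """Return human-readable violations of the two placement rules.

    replicas: iterable of (server, shard, role) tuples, where role is
    "PRIMARY" or "SECONDARY" (a hypothetical layout for this sketch).
    """
    replicas = list(replicas)  # allow generators; we iterate twice
    per_server_shard = Counter((server, shard) for server, shard, _ in replicas)
    primaries = Counter(server for server, _, role in replicas if role == "PRIMARY")

    violations = []
    # Rule 1: no more than one replica per shard per server.
    for (server, shard), count in sorted(per_server_shard.items()):
        if count > 1:
            violations.append(f"{server}: {count} replicas of shard {shard}")
    # Rule 2: at most one primary per server.
    for server, count in sorted(primaries.items()):
        if count > 1:
            violations.append(f"{server}: {count} primaries")
    return violations
```

For example, a server holding the primaries of two different shards would be reported by rule 2, while two members of the same shard on one server would be reported by rule 1.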

Replica Set Migration
1.  Added new SECONDARIES to replica set in DC
2.  Stepdown PRIMARY from EC2 to DC
3.  Shutdown SECONDARIES in EC2
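In terms of the replica set configuration document, steps 1 and 3 amount to adding and later removing members and bumping the config `version`. The sketch below only builds the config dicts; the helper names and host names are hypothetical, and actually applying a config (the `replSetReconfig` admin command) or stepping down the primary (`replSetStepDown`) is left out.

```python
import copy


def add_secondaries(config, hosts):
    """Return a new config with `hosts` appended as members and version bumped."""
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    for offset, host in enumerate(hosts):
        new_config["members"].append({"_id": next_id + offset, "host": host})
    new_config["version"] += 1
    return new_config


def remove_members(config, hosts):
    """Return a new config without the given hosts and with version bumped."""
    new_config = copy.deepcopy(config)
    drop = set(hosts)
    new_config["members"] = [m for m in new_config["members"] if m["host"] not in drop]
    new_config["version"] += 1
    return new_config


# Hypothetical starting point: a 3-node set living entirely in EC2.
ec2_hosts = ["ec2-a:27017", "ec2-b:27017", "ec2-c:27017"]
dc_hosts = ["dc-a:27017", "dc-b:27017", "dc-c:27017"]
start = {
    "_id": "users-0",
    "version": 1,
    "members": [{"_id": i, "host": h} for i, h in enumerate(ec2_hosts)],
}

step1 = add_secondaries(start, dc_hosts)   # 1. new SECONDARIES in the DC
# 2. step down the EC2 PRIMARY so a DC member is elected (not modeled here)
step3 = remove_members(step1, ec2_hosts)   # 3. EC2 members leave the set
```

Between steps 1 and 2 the new members need time to finish their initial sync, so that a DC secondary is caught up enough to take over as primary.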

Problems that have gone away
•  On slow disk we had to “warm” up a replica before sending it query traffic by paging all the data into memory
•  No need to worry about IO halting
•  Lower failure rate on machines
•  EC2 machines sometimes suffered from degraded network connectivity

Future of Hardware
•  Reconsidering RAID 10 decision
    •  All eggs in one basket
•  A lot of automation work still pending around re-imaging machines
    •  Current plan is to address problems in batch

Future Work
•  Mongo 2.2.x upgrade in process
•  Understanding capacity
•  Primary stepDown resiliency
•  Hot Chunks

Questions?
[email protected]
foursquare.com/jobs
