Shortcuts Around the Mistakes I've Made Scaling MongoDB

Theo Hultberg, Chief Architect at
Wednesday, 21 September 2011

What we do
We want to revolutionize the digital advertising industry by showing that there is more to ad analytics than click-through rates.

Ads

Data

Assembling sessions
exposure, ping, ping, ping, ping, ping, event, event, ping ➔ session

Crunching
session, session, session (and so on) ➔ 42

Reports

What we do
Track ads, make pretty reports.

That doesn’t sound so hard
We don’t know when sessions end
There’s a lot of data
It’s all done in (close to) real time

Numbers
40 GB data
50 million documents
per day
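A quick back-of-the-envelope reading of these numbers may help put them in perspective. This is only a sketch: it assumes that "per day" applies to both figures and that 1 GB is 10^9 bytes, neither of which the slide states.

```javascript
// Assumptions (not from the slides): both figures are per day, 1 GB = 1e9 bytes.
const bytesPerDay = 40e9;
const documentsPerDay = 50e6;
const secondsPerDay = 24 * 60 * 60;

// Average size of one document and average sustained rates over a full day.
const averageDocumentBytes = bytesPerDay / documentsPerDay;
const documentsPerSecond = documentsPerDay / secondsPerDay;
const bytesPerSecond = bytesPerDay / secondsPerDay;

console.log(Math.round(averageDocumentBytes)); // 800
console.log(Math.round(documentsPerSecond));   // 579
```

Under those assumptions the average document is about 800 bytes and the average write rate is close to 600 documents per second. Real traffic is rarely flat over a day, so peaks would be higher.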

How we use MongoDB
“Virtual memory” to offload data while we wait for sessions to finish
Short-term storage (<48 hours) for batch jobs
Metrics storage

Why we use MongoDB
Schemalessness makes things so much easier: the data we collect changes as we come up with new ideas
Sharding makes it possible to scale writes
Secondary indexes and rich query language are great features (for the metrics store)
It’s just… nice

Btw.
We use JRuby, it’s awesome

A story in 7 iterations

1st iteration: secondary indexes and updates
One document per session, update as new data comes along
Outcome: 1000% write lock

#1 Everything is about working around the GLOBAL WRITE LOCK

MongoDB 2.0.0
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

MongoDB 1.8.1
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

2nd iteration: using scans for two-step assembling
Instead of updating, save each fragment, then scan over _id to assemble sessions
Outcome: not as much lock, but still not great performance. We also realised we couldn’t remove data fast enough
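The two-step idea can be sketched roughly as follows. The key layout ("<sessionId>:<sequence>") and the field names are hypothetical, chosen only to show why a scan in _id order makes assembly a single pass; the slides do not say what the real keys looked like. An in-memory array stands in for the collection.

```javascript
// Hypothetical key layout: _id = "<sessionId>:<zero-padded sequence number>".
// With this layout, a scan in _id order returns all fragments of one session
// next to each other, so sessions can be assembled in a single pass.
function assembleSessions(fragmentsInIdOrder) {
  const sessions = [];
  let current = null;
  for (const fragment of fragmentsInIdOrder) {
    const sessionId = fragment._id.split(":")[0];
    if (current === null || current.sessionId !== sessionId) {
      current = { sessionId, fragments: [] };
      sessions.push(current);
    }
    current.fragments.push(fragment.type);
  }
  return sessions;
}

// Fragments arrive in arbitrary order and are saved as separate documents.
const stored = [
  { _id: "b:0001", type: "exposure" },
  { _id: "a:0002", type: "ping" },
  { _id: "a:0001", type: "exposure" },
  { _id: "a:0003", type: "event" },
];

// Sorting the array stands in for scanning the collection in _id order.
const scanned = [...stored].sort((x, y) =>
  x._id < y._id ? -1 : x._id > y._id ? 1 : 0
);
const sessions = assembleSessions(scanned);
console.log(JSON.stringify(sessions));
```

Each write is a plain insert of a small document, which is the point of the approach: no existing document has to be found and modified.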

#3 Give a lot of thought to your PRIMARY KEY

3rd iteration: partitioning
We came up with the idea of partitioning the data by writing to a new collection every hour
Outcome: lots of complicated code, lots of bugs, but we didn’t have to care about removing data
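A minimal sketch of hourly partitioning, assuming a naming scheme of "<prefix>_YYYYMMDDHH" in UTC. The scheme, the prefix and the helper names are all hypothetical; the slides only say that a new collection was used every hour.

```javascript
// Hypothetical naming scheme: one collection per UTC hour,
// for example "fragments_2011092113".
function pad2(n) {
  return String(n).padStart(2, "0");
}

function hourlyCollectionName(prefix, date) {
  return (
    prefix + "_" +
    date.getUTCFullYear() +
    pad2(date.getUTCMonth() + 1) +
    pad2(date.getUTCDate()) +
    pad2(date.getUTCHours())
  );
}

// Names are fixed width, so plain string comparison orders them by time.
// Everything strictly older than the cutoff hour can be dropped whole.
function expiredCollections(names, prefix, now, retentionHours) {
  const cutoffDate = new Date(now.getTime() - retentionHours * 3600 * 1000);
  const cutoff = hourlyCollectionName(prefix, cutoffDate);
  return names.filter(
    (name) => name.startsWith(prefix + "_") && name < cutoff
  );
}

const now = new Date(Date.UTC(2011, 8, 21, 13, 30));
const existing = [
  "fragments_2011091912",
  "fragments_2011091913",
  "fragments_2011092113",
  "other_2011010100",
];
console.log(hourlyCollectionName("fragments", now));
console.log(expiredCollections(existing, "fragments", now, 48));
```

Removing old data then becomes a matter of dropping each expired collection (in the shell, something like db.getCollection(name).drop()) instead of deleting documents one by one. The 48-hour retention mirrors the "<48 hours" storage window mentioned earlier.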

#4 Make sure you can REMOVE OLD DATA

4th iteration: sharding
To get around the global write lock and get higher write performance we moved to a sharded cluster.
Outcome: higher write performance, lots of problems, lots of ops time spent debugging

#6 SHARDING IS NOT A SILVER BULLET and it’s buggy; if you can, avoid it

#7 IT WILL FAIL, design for it

5th iteration: moving things to separate clusters
We saw very different loads on the shards and realised we had databases with very different usage patterns, some of which made autosharding not work. We moved these off the cluster.
Outcome: a more balanced and stable cluster

#9 ONE DATABASE with one usage pattern PER CLUSTER

#10 MONITOR EVERYTHING, look at your health graphs daily

6th iteration: monster machines
We got new problems removing data and needed some room to breathe and think
Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).

#11 Don’t try to scale up, SCALE OUT

#12 When you’re out of ideas, CALL THE EXPERTS

7th iteration: partitioning (again) and pre-chunking
We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot.
Outcome: no more problems removing data.
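One way to picture daily databases with pre-created chunks is to generate the split and move commands ahead of time. The sketch below only builds the command documents for the split and moveChunk admin commands; it does not run them. It assumes the shard key is _id, that _id values start with an evenly distributed lowercase hex digit, and that the database, collection and shard names are made up for illustration. None of this is stated on the slides.

```javascript
// Assumptions (not from the slides): shard key is _id, _id values start with
// an evenly distributed lowercase hex digit, and all names are hypothetical.
function dailyDatabaseName(prefix, date) {
  // toISOString() is always UTC, e.g. "2011-09-21T00:00:00.000Z".
  return prefix + "_" + date.toISOString().slice(0, 10).replace(/-/g, "");
}

function preChunkCommands(namespace, shardNames) {
  const commands = [];
  const splitPoints = "123456789abcdef"; // 15 split points give 16 chunks
  [...splitPoints].forEach((digit, index) => {
    // Split the chunk that contains this key at the key itself...
    commands.push({ split: namespace, middle: { _id: digit } });
    // ...then place the chunk starting at that key on a shard, round-robin.
    commands.push({
      moveChunk: namespace,
      find: { _id: digit },
      to: shardNames[index % shardNames.length],
    });
  });
  return commands;
}

const dbName = dailyDatabaseName("sessions", new Date(Date.UTC(2011, 8, 21)));
const commands = preChunkCommands(dbName + ".fragments", ["shardA", "shardB"]);
console.log(dbName, commands.length);
```

Each generated document could then be passed to db.adminCommand(...) through a mongos, after sharding has been enabled for the new database and collection. Creating the chunks before any data arrives means the balancer has nothing to migrate while the writes are happening, and dropping a whole day is a single database drop.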

#13 Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED
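The slides do not say how the documents were made smaller. One common approach, shown here purely as an assumption, is to shorten field names, since every document stores its own field names. The field names below are invented, and JSON byte length is used as a rough stand-in for the real BSON size.

```javascript
// Rough proxy only: JSON byte length, not the exact BSON size.
function approximateBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical documents carrying the same values under long and short names.
const verbose = {
  sessionIdentifier: "abc123",
  exposureTimestamp: 1316613000,
  visibleDurationMillis: 5400,
};
const compact = { s: "abc123", t: 1316613000, v: 5400 };

const saved = approximateBytes(verbose) - approximateBytes(compact);
console.log(approximateBytes(verbose), approximateBytes(compact), saved);
```

Multiplied over tens of millions of documents per day, a saving of a few dozen bytes per document adds up to a noticeably smaller working set.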

#14 Give a lot of thought to your PRIMARY KEY

KTHXBAI
@iconara
architecturalatrocities.com
burtcorp.com

Since we got time…

Tips: safe mode
Run every Nth insert in safe mode
This will give you warnings when bad things happen, like failovers
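A sketch of how sampling safe mode could be wired up. The insertFn(doc, options) callback and the { safe: ... } option shape are placeholders standing in for whatever insert call and acknowledgement option the driver in use provides; they are not taken from the talk.

```javascript
// Returns a function that reports true on every nth call.
function everyNth(n) {
  let count = 0;
  return function shouldUseSafeMode() {
    count += 1;
    if (count >= n) {
      count = 0;
      return true;
    }
    return false;
  };
}

// Hypothetical wrapper: insertFn(doc, options) stands in for the driver's
// insert call; only every nth insert asks the server for an acknowledgement.
function makeSampledInsert(insertFn, n) {
  const shouldUseSafeMode = everyNth(n);
  return (doc) => insertFn(doc, { safe: shouldUseSafeMode() });
}

// Demo with a fake insert that only records which calls were "safe".
const safeFlags = [];
const insert = makeSampledInsert((doc, options) => safeFlags.push(options.safe), 3);
for (let i = 0; i < 7; i += 1) {
  insert({ _id: i });
}
console.log(safeFlags);
```

Most writes stay fire-and-forget and cheap, while the sampled ones surface errors soon after something goes wrong.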

Tips: avoid bulk inserts
Very dangerous if there’s a possibility of duplicate key errors
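A toy model of the hazard. The slide does not spell out the failure mode, so this sketch assumes a bulk insert that stops at the first duplicate key error and leaves the remaining documents unwritten. A Map stands in for a collection with a unique _id, and all function names are invented for the example.

```javascript
// A Map keyed by _id stands in for a collection with a unique index on _id.
function insertOne(collection, doc) {
  if (collection.has(doc._id)) {
    throw new Error("duplicate key: " + doc._id);
  }
  collection.set(doc._id, doc);
}

// Assumed bulk behaviour: stop at the first error, so every document after
// the duplicate is never written.
function bulkInsert(collection, docs) {
  for (const doc of docs) {
    insertOne(collection, doc);
  }
}

// Alternative: insert one at a time and collect the ids that failed.
function insertIndividually(collection, docs) {
  const failed = [];
  for (const doc of docs) {
    try {
      insertOne(collection, doc);
    } catch (error) {
      failed.push(doc._id);
    }
  }
  return failed;
}

const batch = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];

const bulkTarget = new Map([["b", { _id: "b" }]]);
let bulkError = null;
try {
  bulkInsert(bulkTarget, batch);
} catch (error) {
  bulkError = error.message;
}

const singleTarget = new Map([["b", { _id: "b" }]]);
const failedIds = insertIndividually(singleTarget, batch);
console.log(bulkError, bulkTarget.has("c"), failedIds, singleTarget.has("c"));
```

In the bulk case document "c" is lost because of an unrelated duplicate earlier in the batch; inserting individually keeps "c" and reports exactly which document collided.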

Tips: EC2
You have three copies of your data, do you really need EBS?
Instance store disks are included in the price and they have predictable performance.
m1.xlarge comes with 1.7 TB of storage.

