MongoNYC 2012: Building a MongoDB Power Chat Server
Eliot Horowitz, Jared Rosoff, & Edouard Servan-Schreiber, 10gen. We will build an IRC server based on MongoDB.
• Do not need to store messages
• Each message can be resent to all the clients
• New clients cannot access the chat history
irc1.10gen.cc / 6667 / #fun
[Diagram: Client and IRC Service. Client sends a message; message pushed onto bus; server reads message from bus; message forwarded to clients in room.]
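The flow in the diagram can be sketched in plain JavaScript. This is a minimal in-memory stand-in, not the implementation from the talk: the `bus` array plays the role of the message bus, and `rooms`, `join`, `sendMessage`, and `drainBus` are hypothetical names introduced for illustration.

```javascript
// In-memory stand-in for the message bus (assumption: a plain array).
const bus = [];

// Room name -> list of connected clients. Each client only collects what it receives.
const rooms = new Map();

function join(room, client) {
  if (!rooms.has(room)) rooms.set(room, []);
  rooms.get(room).push(client);
}

// Step 1: a client sends a message, which is pushed onto the bus.
function sendMessage(room, user, msg) {
  bus.push({ room, user, msg, time: new Date() });
}

// Step 2: the server reads messages from the bus and forwards each one to
// every client in the target room (including the sender, for simplicity).
// Nothing is kept after delivery, so late joiners see no history.
function drainBus() {
  while (bus.length > 0) {
    const m = bus.shift();
    for (const client of rooms.get(m.room) || []) {
      client.inbox.push(`<${m.user}> ${m.msg}`);
    }
  }
}

const alice = { name: "alice", inbox: [] };
const bob = { name: "bob", inbox: [] };
join("#fun", alice);
join("#fun", bob);
sendMessage("#fun", "alice", "hello");
drainBus();
console.log(bob.inbox);
```

A client that joins after `drainBus()` has run starts with an empty inbox, which matches the "new clients cannot access the chat history" limitation.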
set
• Setting up MMS
• mongostat
• Delayed and hidden replication
• Breakdowns
• Secondary down
• Primary down
• Oplog runs out
A delayed slave gives you time to recover from a human error (“drop collection…”). Otherwise all errors are replicated on the spot.
rs.add( { _id: <id>, host: "irc10:27017", priority: 0, slaveDelay: 900 } )
Center Hidden Slave
A hidden replica adds DR protection in case your data center crashes.
rs.add( { _id: <id>, host: "irc11:27017", priority: 0, hidden: true } )
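The delayed and hidden members differ only in a few fields of the replica set member document. As a rough sketch in plain JavaScript (no MongoDB connection), the helper below builds such documents. `memberConfig` is a hypothetical helper, the `_id` values 10 and 11 are placeholders chosen for illustration, and `priority: 0` is set because delayed and hidden members must not be eligible to become primary.

```javascript
// Build a replica set member document (hypothetical helper for illustration).
function memberConfig(id, host, options = {}) {
  const member = { _id: id, host };
  if (options.slaveDelay !== undefined) {
    member.slaveDelay = options.slaveDelay; // replication delay in seconds
    member.priority = 0;                    // a delayed member must never become primary
  }
  if (options.hidden) {
    member.hidden = true;
    member.priority = 0;                    // a hidden member must never become primary
  }
  return member;
}

const delayed = memberConfig(10, "irc10:27017", { slaveDelay: 900 });
const hidden = memberConfig(11, "irc11:27017", { hidden: true });

// How long after a mistake the delayed member still holds the old data.
const recoveryWindowMinutes = delayed.slaveDelay / 60;
console.log(delayed, hidden, recoveryWindowMinutes);
```

In the mongo shell, a document of this shape is what would be passed to `rs.add(...)`. With `slaveDelay: 900`, an accidental drop has to be noticed within 15 minutes.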
• Capped collection
• Size determines how many operations fit
• Recovery is faster when the secondary's last operation is more recent than the tail of the primary’s oplog
[Figure: the oplog drawn as a window with Head and Tail markers over operations t1 … t16. Successive frames show the window holding t8 … t16, then t9 … t16, t18, then t10 … t16, t18, t19 as new operations overwrite the oldest ones.]
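The sliding window in these frames can be modeled in a few lines. This is a simplified sketch, not how MongoDB implements the oplog: the `Oplog` class, its capacity of 9 entries, and the integer timestamps are assumptions chosen to mirror the frames above.

```javascript
// A fixed-size oplog modeled as an array that drops its oldest entry when full.
class Oplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = []; // operation timestamps in ascending order
  }

  append(ts) {
    this.entries.push(ts);
    if (this.entries.length > this.capacity) this.entries.shift(); // overwrite the oldest
  }

  // Oldest operation still held in the window.
  oldest() {
    return this.entries[0];
  }

  // A secondary can replay from the oplog only if the last operation it
  // applied is still inside the window. Otherwise it needs a full resync.
  canCatchUp(lastApplied) {
    return this.entries.length > 0 && lastApplied >= this.oldest();
  }
}

const oplog = new Oplog(9);
for (let t = 1; t <= 16; t++) oplog.append(t); // window now holds t8 .. t16

const okBefore = oplog.canCatchUp(8);  // secondary stopped at t8
oplog.append(18);                      // window slides to t9 .. t16, t18
const okAfter = oplog.canCatchUp(8);   // t8 has been overwritten
console.log(oplog.entries, okBefore, okAfter);
```

A larger capacity keeps older operations around longer, which gives a lagging or offline secondary more time before it falls off the end of the window.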
One Doc per Message
{
  _id : ObjectId(),
  room : <roomid>,
  user : <userid>,
  time : <timestamp>,
  msg : "I believe third normal form is God…"
}
• Pros
  • Elegant
  • Documents do not grow
  • Inserts are simple
• Cons
  • Querying per room or per user performs poorly
Room
{
  _id : ObjectId(),
  room : <roomid>,
  chat_history : [
    { user : "no3nf", time : "5/4/2012 9:01", msg : "Down with 3NF!" },
    { user : "yes3nf", time : "5/4/2012 9:00", msg : "Long live 3NF!" },
    ….
  ]
}
• Pros
  • Conceptually simple
  • Fast
  • Room traffic easy to isolate
• Cons
  • Documents can grow with no bounds
  • 16MB hard limit
  • Appending to a long list takes a long time
  • Querying per user is non-trivial
  • Is that bad? Not necessarily
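To get a feel for the 16MB limit, the arithmetic can be sketched as below (Node.js). This is only a back-of-the-envelope estimate: it uses the JSON length of one entry as a stand-in for its BSON size, which is an assumption, and `approxEntryBytes` and `approxMessagesUntilLimit` are hypothetical helper names.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // the 16MB document limit from the slide

// Rough size of one chat_history entry. JSON length is used as a proxy for
// BSON size (assumption), so the real figure will differ somewhat.
function approxEntryBytes(entry) {
  return Buffer.byteLength(JSON.stringify(entry), "utf8");
}

// Roughly how many entries of this size fit before the room document hits the limit.
function approxMessagesUntilLimit(sampleEntry) {
  return Math.floor(MAX_DOC_BYTES / approxEntryBytes(sampleEntry));
}

const sample = { user: "no3nf", time: "5/4/2012 9:01", msg: "Down with 3NF!" };
console.log(approxEntryBytes(sample), approxMessagesUntilLimit(sample));
```

A busy room with short messages can still reach the limit eventually, and the cost of appending to a very long array shows up well before that.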
by size
• One doc per room/bucket of 100 msgs
{
  _id : <room_id>@@@<bucket-id>,
  logs : [
    { ts : "5/4/2012 9:01", room : <room_id>, msg : "<userid>@<room_id> Down with 3NF!" },
    { ts : "5/4/2012 9:02", room : <room_id>, msg : "<userid>@<room_id> Long live 3NF!" },
    { ts : "5/4/2012 9:03", room : <room_id>, msg : "<userid>@<room_id> What am I doing here…" },
    ….
  ]
}
• Message count maintained in the room object
• Lookup the most recent bucket with <msg_count> div 10
• Bucket of 10 Messages
• A bucket can span several days
• Recent room traffic easy to isolate and access
• Documents have a max size modulo the variation in message length
• Traffic spikes and lulls have no impact
• All room traffic easy to assemble
• Cons
  • Requires preallocation to avoid impact of growing documents
  • Inserting new messages is a little harder
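The bucket lookup can be sketched in plain JavaScript with in-memory stand-ins for the room object and the bucket collection. The slides mention buckets of both 100 and 10 messages; 10 is assumed here. `bucketId`, `latestBucketId`, and `appendMessage` are hypothetical names, and the `Map` replaces a real collection.

```javascript
const BUCKET_SIZE = 10; // assumption: the "bucket of 10 messages" variant

// _id of the bucket that message number `n` (0-based) belongs to.
function bucketId(roomId, n, bucketSize = BUCKET_SIZE) {
  return `${roomId}@@@${Math.floor(n / bucketSize)}`; // integer "div"
}

// Most recent bucket that already holds a message. Using (msg_count - 1)
// avoids pointing at a not-yet-created bucket when the last one is exactly full.
function latestBucketId(room) {
  return bucketId(room._id, Math.max(room.msg_count - 1, 0));
}

// In-memory stand-ins for the room object and the bucket collection.
const room = { _id: "fun", msg_count: 0 };
const buckets = new Map();

function appendMessage(room, userId, text, ts) {
  const id = bucketId(room._id, room.msg_count); // bucket for the next message
  if (!buckets.has(id)) buckets.set(id, { _id: id, logs: [] });
  buckets.get(id).logs.push({ ts, room: room._id, msg: `${userId}@${room._id} ${text}` });
  room.msg_count += 1; // message count maintained in the room object
  return id;
}

for (let i = 0; i < 25; i++) appendMessage(room, "no3nf", `message ${i}`, new Date());
console.log([...buckets.keys()], latestBucketId(room));
```

Because the bucket `_id` is derived from the room and the message count, recent traffic for a room is a single lookup by `_id`, and each bucket holds at most `BUCKET_SIZE` entries.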
• Must rebuild indices
• Copy data files from snapshot
• Data and indices – faster time to recovery
• Priming a replica from a backup to help catch up
Operation Counters
• How many requests is your database serving?
Memory Utilization
• How big is your dataset?
• How much of it is resident in memory?
• How much free memory do you have?
Page Faults
• How often are page faults occurring?
• Does your data set fit in memory?
Index Miss
• How often are index scans hitting pages on disk?
• Do your indexes fit in memory?
Queue Depths
• How often are clients waiting to run?
• Is there lock contention?
your servers. Discovers your mongo server instances & collects stats.
Data reported over secure HTTPS connection to MMS.
Service operated by 10gen. Nothing for you to set up other than agent installation.
Secure login. Supports multiple users / deployments.
• User, System, IOWait, Steal
• Load average & what it actually is
Disk
• iops
• io latency
• What io %util really means
Memory
• free -m
for Message Bus
• Single Node to Replica Set
• Scalable Message Log
• Replica Set to Sharded Cluster
• Backup methods
• Monitoring