
MoPub's migration from App Engine to MongoDB - Nafis Jamal, MoPub

mongodb
January 03, 2012

MongoSV 2011

This talk introduces MoPub, our data needs, and the challenges we faced as we grew from 50M requests/day to 1 billion requests/day. We now use MongoDB for realtime stats (lots of counters!), our budgeting system, and our user store.

Transcript

  1. Migrating from Google App Engine to AWS + MongoDB
     Nafis Jamal (Co-Founder, VP Eng), MongoSV 2011
  2. A little about ...
     • Mobile monetization platform
       ‣ iPhone and Android application developers use our service as a one-stop shop to manage their entire ads strategy
     • We've built for scale since day one, about a year ago (and made lots of mistakes along the way :) )
       ‣ Currently ~3K QPS
       ‣ Planning for 30K+ QPS (>1 billion ads per day)
     • 10-person engineering team
  3. Challenges we face
     • Web serving scale (horizontal scaling)
     • Low latency, high availability
     • Realtime statistics
       ‣ We can't just serve ads; we have to record many events along the way (lots of counters!)
       ‣ Many writes, few reads
     • Mobile user database
  4. Challenges we face
     • Web serving scale (horizontal scaling)
     • Low latency, high availability
     • Realtime statistics
       ‣ We can't just serve ads; we have to record many events along the way (lots of counters!)
       ‣ Many writes, few reads
     • Mobile user database
  5. When it all began...
     • 3 founders hoping to bring Angry Birds onto the platform
     • We needed something that would scale as we quickly brought on new customers (stair-step growth)
     • Zero ability to spend time on ops or sysadmin
     • We chose Google App Engine
       ‣ Python! (Guido works on this team!!)
       ‣ Anonymous "instances" (really single-threaded processes) with very little memory
       ‣ Autoscaling (pay per CPU-hour)
       ‣ Uber-scalable, managed Datastore (built on BigTable)
  6. What stats do we collect?
     [Diagram: Account at the top; publisher side: App > AdUnit; advertiser side: Campaign > Creative; "Ad here" marks where an AdUnit meets a Creative]
     The lowest-level interaction is adunit-creative.
  7. What stats do we collect?
     [Same diagram: Account, with Publisher side (App > AdUnit) and Advertiser side (Campaign > Creative); "Ad here"]
     We need easy access to all "cross sections".
  8. GAE Stats Model

     class StatsModel(db.Model):
         pub = db.ReferenceProperty()   # App, AdUnit or *
         adv = db.ReferenceProperty()   # Campaign, Creative or *
         acct = db.ReferenceProperty()  # Account
         date = db.DateProperty()
         count = db.IntegerProperty()

         @classmethod
         def get_primary_key(cls, date, pub_id, adv_id, acct_id):
             """Returns the `key_name` of the model based on pub_id, adv_id, acct_id and date."""
             return "%s:%s:%s:%s" % (date, acct_id, pub_id, adv_id)
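     For contrast with the buffering approach on the following slides, a hypothetical direct datastore write against this model would look roughly like this (helper name invented; reference properties omitted for brevity):

     from google.appengine.ext import db

     def increment_stat(date, pub_id, adv_id, acct_id, delta=1):
         key_name = StatsModel.get_primary_key(date, pub_id, adv_id, acct_id)

         def txn():
             row = StatsModel.get_by_key_name(key_name)
             if row is None:
                 row = StatsModel(key_name=key_name, date=date, count=0)
             row.count += delta
             row.put()

         # Read-modify-write in a transaction: correct, but too slow and too
         # expensive at ad-serving volume, hence the memcache buffering below.
         db.run_in_transaction(txn)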
  9. Realtime Stats in GAE
     • Google App Engine gave us a good number of building blocks to build such a system
       ‣ Datastore
         - Expensive writes
         - Slow writes (~1 write / second / "entity group")
         - Fast reads (~3ms)
       ‣ Memcache
       ‣ TaskQueues
     • Cannot sum over objects; no native map/reduce
       ‣ All higher-level "rollups" must be pre-computed
     • Solution: buffer into memcache, then periodically flush to the datastore
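     A rough illustration of the buffer-then-flush idea (not MoPub's code; the flush handler URL and the 60-second delay are assumptions):

     from google.appengine.api import memcache, taskqueue

     def count_event(acct_id, time_bucket, stat_key):
         # Buffer the increment in memcache instead of writing to the datastore.
         memcache.incr("%s:%s:%s" % (acct_id, time_bucket, stat_key),
                       initial_value=0)

     def schedule_flush(acct_id, time_bucket):
         # Once per time bucket: a worker later sums the buffered counters and
         # writes the pre-computed rollups into StatsModel rows.
         taskqueue.add(url="/tasks/flush", countdown=60,
                       params={"acct": acct_id, "tb": time_bucket})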
  10. Realtime Stats in GAE
      [Architecture diagram: API instances write raw events into memcache; a taskqueue dispatches (account, time_bucket) work to TQ workers #1..#n, which read the raw events, perform aggregation and rollups, and write to the Datastore.]
  11. Example
      [Diagram: Account: Rovio (id=account1). Apps: Angry Birds (id=app1), with AdUnits Welcome (id=adunit1) and GameOver (id=adunit2). Campaigns: Coke (id=camp1) with Creative CokeZero (id=crtv1), and RedBull (id=camp2) with Creative Wings (id=crtv2).]
  12. Example
      [Same diagram, with two served ads marked "Ad here" and "Ad there" at two adunit-creative intersections.]
  13. Pseudo-Files in Memcache
      • We create pseudo-files in memcache using a specific key-naming template
      • Each account-time_bucket (aka time window) has a dedicated pseudo-file
      • We append one "line" at a time within each pseudo-file
      • We flush the entire pseudo-file when syncing
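      A minimal sketch of the append step, assuming the key template shown in the examples that follow (the helper name append_line is hypothetical):

      from google.appengine.api import memcache

      def append_line(acct_id, time_bucket, line):
          counter_key = "%s:%s" % (acct_id, time_bucket)      # e.g. "acct1:tb0"
          # Atomically bump the line counter; initial_value creates it on first use.
          n = memcache.incr(counter_key, initial_value=0)
          # Store the event itself as line n of the pseudo-file, e.g. "acct1:tb0:3".
          memcache.set("%s:%s" % (counter_key, n), line)
          return n

      # e.g. append_line("acct1", "tb0", ("adunit1", "crtv2"))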
  14. Example: time = 1
      [Memcache: counter acct1:tb0 = 1; line acct1:tb0:1 = (adunit1, crtv1). Key format: <account>:<time_bucket>]
  15. Example: time = 2
      [Memcache: counter acct1:tb0 = 2; new line acct1:tb0:2 = (adunit1, crtv2)]
  16. Example: time = 3
      [Memcache: counter acct1:tb0 = 3; new line acct1:tb0:3 = (adunit1, crtv1)]
  17. Example: time = 4
      [Memcache: counter acct1:tb0 = 4; new line acct1:tb0:4 = (adunit2, crtv1)]
  18. Example: time = 5
      [Memcache: counter acct1:tb0 = 5; new line acct1:tb0:5 = (adunit2, crtv2)]
  19. Example: time = n
      [Memcache: counter acct1:tb0 = n; lines acct1:tb0:1 .. acct1:tb0:n, one per recorded event]
  20. Flush to DB
      [Diagram: a taskqueue task for (acct, tb0) reads the "page" of n lines (acct1:tb0:1 .. acct1:tb0:n) from memcache, aggregates and rolls them up, then writes the results to the Datastore in a transaction:
        StatsModel(account=acct, pub=adunit1, adv=crtv1, count=5)
        StatsModel(account=acct, pub=adunit2, adv=crtv1, count=4)
        StatsModel(account=acct, pub=adunit1, adv=crtv2, count=2)
        StatsModel(account=acct, pub=adunit2, adv=crtv2, count=5)
        StatsModel(account=acct, pub=app1, adv=crtv1, count=9)
        StatsModel(account=acct, pub=app1, adv=crtv2, count=7)
        StatsModel(account=acct, pub=*, adv=*, count=16)
        ... ]
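      A rough sketch of the flush step (not the deck's code); key names follow the pseudo-file examples above, and the app-level rollups, which need the adunit-to-app mapping, are elided:

      from collections import Counter
      from google.appengine.api import memcache

      def flush_pseudo_file(acct_id, time_bucket):
          prefix = "%s:%s" % (acct_id, time_bucket)
          n = int(memcache.get(prefix) or 0)                  # number of lines
          keys = ["%s:%d" % (prefix, i) for i in range(1, n + 1)]
          lines = memcache.get_multi(keys).values()           # [(adunit, creative), ...]

          counts = Counter()
          for adunit, creative in lines:
              counts[(adunit, creative)] += 1                 # lowest level
              counts[(adunit, "*")] += 1                      # adunit across all creatives
              counts[("*", creative)] += 1                    # creative across all adunits
              counts[("*", "*")] += 1                         # account-wide total
          return counts  # then written as StatsModel rows in a datastore transaction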
  21. GAE Shortcomings
      • Transient downtime for writes
      • Read before write (not fire and forget)
      • Memcache is fleeting => data loss
      • Objects have become too large for transactions
      • Writing to the datastore, even if buffered, is expensive!
        ‣ Pay for datastore CPU and application CPU
        ‣ The realtime stats system accounts for 60% of our hefty GAE bill
      • The duct tape is starting to become threadbare
  22. AWS + Mongo
      • We have control of each service and the machines on which it runs
      • Access to RAM! (simple pleasures)
      • Fire-and-forget increments!
      • Our entire machinery for recording multiple stats is a series of increment instructions sent to MongoDB (which handles all the buffering and flushing)
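      A minimal sketch of a fire-and-forget increment in modern pymongo (not MoPub's code; collection and field names are illustrative, and w=0 requests unacknowledged writes, i.e. the old fire-and-forget behaviour):

      from pymongo import MongoClient, WriteConcern

      client = MongoClient("mongodb://localhost:27017")
      stats = client.mopub.get_collection(
          "realtime_stats", write_concern=WriteConcern(w=0))

      def record_event(doc_id, counter_key):
          stats.update_one(
              {"_id": doc_id},
              {"$inc": {"counts.%s" % counter_key: 1}},  # increment one counter
              upsert=True,                               # create the doc if missing
          )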
  23. First Attempt: Data Model
      One document per pub-adv per month. Each document stores data for all days and hours within the month:

      class StatsDocument(mdb.Document):
          _id = mdb.StringField(primary_key=True)
          pub_id = mdb.StringField()
          adv_id = mdb.StringField()
          acct_id = mdb.StringField()
          date = mdb.YearMonthField(require=True)
          day_counts = mdb.MapField(field=int)  # key is day of month

          @classmethod
          def get_primary_key(cls, date, pub_id, adv_id, acct_id):
              """Returns the `key_name` of the model based on pub_id, adv_id, acct_id and date."""
              return "%s:%s:%s:%s" % (date, acct_id, pub_id, adv_id)
  24. Example
      [Same account diagram as before: Rovio (account1) with App Angry Birds (app1), AdUnits Welcome (adunit1) and GameOver (adunit2), Campaigns Coke (camp1) and RedBull (camp2), Creatives CokeZero (crtv1) and Wings (crtv2); ads marked "Ad here" and "Ad there".]
  25. First Attempt: Updates
      One logical increment, update((adunit1, crtv2), $inc(count)), gets translated into 9 db increments:
        Document:(adunit1, crtv2)   Document:(app1, crtv2)   Document:(*, crtv2)
        Document:(adunit1, camp2)   Document:(app1, camp2)   Document:(*, camp2)
        Document:(adunit1, *)       Document:(app1, *)       Document:(*, *)
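      A rough sketch of that 3x3 fan-out (not MoPub's code); `stats` is a pymongo collection, and the app and campaign are assumed to be looked up from the adunit and creative beforehand:

      def increment_first_model(stats, date, acct, adunit, app, crtv, camp, day):
          for pub in (adunit, app, "*"):
              for adv in (crtv, camp, "*"):
                  doc_id = "%s:%s:%s:%s" % (date, acct, pub, adv)
                  stats.update_one(
                      {"_id": doc_id},
                      {"$inc": {"day_counts.%d" % day: 1}},
                      upsert=True,
                  )  # 9 scattered writes for a single ad event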
  26. Initial Results
      • Pros:
        ‣ Scales with the # of business objects; don't have to think about accounts getting too large
        ‣ Queries for higher-level cross products (e.g. App-Campaign) are precomputed and are a simple db fetch
        ‣ Queries for stats over a given date range read only as many documents as there are months in the range
      • Cons:
        ‣ Lots of little documents
        ‣ Each increment is really 9 increments to 9 documents
        ‣ Random write access, resulting in lots of cache misses and intense load on the cluster
  27. Solution: New Data Model
      One document per account per day. Each document stores all the lowest-level stats:

      class StatsDocument(mdb.Document):
          _id = mdb.StringField(primary_key=True)
          acct_id = mdb.StringField()
          date = mdb.DateField(require=True)
          counts = mdb.MapField(field=int)  # key is "<adunit>:<creative>"

          @classmethod
          def get_primary_key(cls, date, acct_id):
              """Returns the `_id` of the model based on acct_id and date."""
              return "%s:%s" % (date, acct_id)
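      For concreteness, a stored document might look roughly like this (a sketch: the date is illustrative, and the counts are borrowed from the update example two slides on):

      {
          "_id": "2011-12-04:account1",      # "<date>:<acct_id>"
          "acct_id": "account1",
          "date": "2011-12-04",
          "counts": {
              "adunit1:crtv1": 11,
              "adunit1:crtv2": 10,
              "adunit2:crtv2": 1,
          },
      }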
  28. Solution: New Data Model
      We created an IndexDocument to store the reverse indices used for the higher-level cross sections:

      class StatsIndex(mdb.Document):
          _id = mdb.StringField(primary_key=True)
          # e.g. { 'app1:camp1': ['adunit1:crtv1', 'adunit2:crtv1'], ...,
          #        'app1:camp2': ['adunit1:crtv2', 'adunit2:crtv2'], ... }
          indx = mdb.MapField(field=mdb.ListField())

          @classmethod
          def get_primary_key(cls, acct_id):
              """Returns the `_id` of the model based on acct_id."""
              return acct_id
  29. Solution: Updates
      One logical update, update((acct1, adunit1, crtv2), $inc(count)), touches two documents:

      StatsDocument:(account, date)
        (adunit1, crtv1) => 11
        (adunit1, crtv2) => 10
        (adunit2, crtv2) => 1
        ...

      StatsIndexDocument:(account)  (update index)
        "app1:camp2" => $push(adunit1:crtv2)
        "*:camp2"    => $push(adunit1:crtv2)
        "app1:*"     => $push(adunit1:crtv2)
        "*:*"        => $push(adunit1:crtv2)
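      Roughly, in pymongo terms (a sketch, not the deck's code; $addToSet is used here so repeated updates don't duplicate index entries, where the slide shows $push):

      def increment_new_model(stats, index, date, acct, adunit, app, crtv, camp):
          cross = "%s:%s" % (adunit, crtv)                 # lowest-level key
          stats.update_one(
              {"_id": "%s:%s" % (date, acct)},             # one doc per account per day
              {"$inc": {"counts.%s" % cross: 1}},
              upsert=True,
          )
          index.update_one(
              {"_id": acct},
              {"$addToSet": {
                  "indx.%s:%s" % (app, camp): cross,
                  "indx.*:%s" % camp: cross,
                  "indx.%s:*" % app: cross,
                  "indx.*:*": cross,
              }},
              upsert=True,
          )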
  30. Solution: Reads

      def get_stats(acct_id, app_id, campaign_id, date):
          # get and read the index model
          indx_key = "%s:%s" % (app_id, campaign_id)
          index_obj = StatsIndex.objects(_id=acct_id).\
              only("indx.%s" % indx_key).first() or StatsIndex()

          # lowest-level keys in the StatsModel
          count_keys = index_obj.indx.get(indx_key, [])

          # fetch only the relevant parts of the stats model
          stats_model_key = StatsModel.get_primary_key(date, acct_id)
          stats_model = StatsModel.objects(_id=stats_model_key).\
              only(*["counts.%s" % k for k in count_keys]).first() or StatsModel()

          return sum(stats_model.counts.values())
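      A hypothetical call against the example account (the date value is illustrative):

      # Total count for Angry Birds x RedBull on one day.
      total = get_stats("account1", "app1", "camp2", "2011-12-04")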
  31. Results
      • Two machines handle all the writes (one could do it, but two for redundancy)
      • Mongo load is now minuscule
      • Each update is really two updates (StatsDocument and StatsIndex)
      • Each get is really two gets (StatsIndex, then StatsDocument)
      • One document per account per day
        ‣ # of documents scales with the number of accounts
        ‣ We do the simple aggregation in Python
        ‣ Size of document scales with the # of business objects within the account
        ‣ We've got a lot of headroom: our largest account has 60K+ possible stat "cross-sections" and is still only a ~0.1MB object (<< 64 MB)
  32. Mongo Lessons Learned
      • It's better to have fewer, larger documents when there is a logical hierarchy than many smaller documents arranged flatly and disparately
      • Log everything, because you will make mistakes!
        ‣ We require audit trails for billing, etc., so we are accustomed to having this fail-safe
        ‣ This has saved us on numerous occasions during development, and we have just cut the cord to spare the rest of the system
        ‣ It allows us to test and build new features pretty aggressively
      • Buy more memory!
        ‣ It's cheap!
        ‣ We've learned a few lessons the hard way, but a lot of challenges can be solved with $$$ by just throwing memory at the problem (we're in the process of doing this now)
  33. Future Work
      • Mobile user store
        ‣ Record every interaction of all users within our network
        ‣ ML: Map/Reduce jobs to glean data about our users and segment them
        ‣ Per-user targeting with minimum latency
        ‣ Relevancy => better user experience => app developers make more money