Slide 1

Crowd Sourcing with MongoDB
How BillGuard Does Mongo!

Slide 2

Us
Aviv Ben-Yosef, @avivby
David Brailovsky, @davidbrai

Slide 3

BillGuard

Slide 4

People-Powered Antivirus for Bills

Slide 5

A lot of data

Slide 6

Crowdsource knowledge across users

Slide 7

How we started

Slide 8

Everything in MySQL

Slide 9

1,000s of Transactions Per User

Slide 10

Crowd Sourcing for n00bs

Slide 11

No content

Slide 12

Recalculate in memory when the server boots
Works as long as you have one server and little data
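
Here is a minimal sketch of what "recalculate in memory at boot" could look like. It assumes that transactions are loaded as (user_id, merchant_name) pairs, and both that data shape and the function name `recompute_merchant_counts` are hypothetical. The function rebuilds a crowd-sourced aggregate (how many distinct users transacted with each merchant) from scratch:

```python
from collections import defaultdict


def recompute_merchant_counts(transactions):
    """Rebuild the per-merchant distinct-user count from scratch.

    `transactions` is an iterable of (user_id, merchant_name) pairs
    (a hypothetical shape). Every boot rescans all transactions, so
    startup time grows with the data, and the result lives only in
    this one process's memory.
    """
    users_per_merchant = defaultdict(set)
    for user_id, merchant in transactions:
        users_per_merchant[merchant].add(user_id)
    return {merchant: len(users) for merchant, users in users_per_merchant.items()}


if __name__ == "__main__":
    txns = [(1, "yamit 2000"), (1, "yamit 2000"), (2, "yamit 2000"), (2, "acme")]
    print(recompute_merchant_counts(txns))  # {'yamit 2000': 2, 'acme': 1}
```

Both weaknesses are visible in the sketch. The scan cost grows with the data. The state is also private to one process, so a second server would compute its own copy.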

Slide 13

Time to upgrade
• Boot time became problematic as data grew
• Needed to sync across multiple servers

Slide 14

Y U NOSQL?!

Slide 15

Cluster data requires different sharding from user data

Slide 16

Our models are dynamic and sparse

Slide 17

Why Mongo?

Slide 18

Document-oriented = easiest paradigm shift + felt natural

Slide 19

Sounds cool!

Slide 20

Our CEO loooves saying “Mongo DB”

Slide 21

How we made the move

Slide 22

What we use
• Java: Morphia
• Python: pymongo

Slide 23

MySQL -> Mongo Foreign Key

Slide 24

We were cautious
You have to gain our trust

Slide 25

Aggregated, recomputable data
Not mission critical

Slide 26

2 hours later Mongo

Slide 27

1 month later
Time for the big guns
We made it “Highly Available”

Slide 28

Replica Set
• 2 regular instances
• 1 arbiter

Slide 29

A breeze to configure
config = {
    "_id" : "bg",
    "members" : [
        { "_id" : 0, "host" : "$MONGO_SERVER_A:27017" },
        { "_id" : 1, "host" : "$MONGO_SERVER_B:27017" },
        { "_id" : 2, "host" : "$MONGO_ARBITER:27017", "arbiterOnly": true }
    ]
}
rs.initiate(config);
Now configure a master-master in MySQL...
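
For comparison, here is a sketch that builds the same replica set configuration (N data-bearing members plus one arbiter) from Python. The host names are placeholders, as on the slide, and the helper names are hypothetical. The `initiate` helper assumes pymongo 3.11+ (for `directConnection`) and that it connects to a member that has not been initiated yet. It illustrates the approach and is not BillGuard's actual setup script.

```python
def build_replica_set_config(set_name, data_hosts, arbiter_host):
    """Build a replica set config document: data-bearing members + 1 arbiter."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    # The arbiter votes in elections but holds no data.
    members.append({"_id": len(members), "host": arbiter_host, "arbiterOnly": True})
    return {"_id": set_name, "members": members}


def initiate(config, seed_host, port=27017):
    """Send replSetInitiate to one member (assumes pymongo is installed)."""
    # Imported lazily so building the config does not require pymongo.
    from pymongo import MongoClient

    # directConnection=True talks to this single member even though
    # no replica set exists yet.
    client = MongoClient(seed_host, port, directConnection=True)
    return client.admin.command("replSetInitiate", config)


if __name__ == "__main__":
    print(build_replica_set_config(
        "bg", ["mongo-a:27017", "mongo-b:27017"], "mongo-arbiter:27017"))
```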

Slide 30

Automatic promotions upon failure = Better sleep at night!

Slide 31

More big guns
• Backing up the EBS volumes it sits on
• Mongo Monitoring System (MMS)

Slide 32

Mongo Monitoring System
nohup python agent.py > /LOG_DIRECTORY/agent.log 2>&1 &

Slide 33

Why we <3 Mongo

Slide 34

No Migrations
• No downtime
• Never worry about backwards compatibility
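
One common way to live without migrations, sketched here as an assumption rather than BillGuard's documented approach, is to read newer fields with defaults. Documents written by older code versions then keep working as-is. The field names (`users`, `enrichment.website`) and the function name are hypothetical.

```python
def read_merchant(doc):
    """Normalize a merchant document regardless of which app version wrote it."""
    return {
        "name": doc["name"],
        # Added in a later version; documents from older versions lack it.
        "users": doc.get("users", 0),
        # Optional sub-document; absent on merchants never enriched.
        "website": doc.get("enrichment", {}).get("website"),
    }
```

The trade-off is that the schema moves into application code. Every reader has to tolerate every document shape that was ever written.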

Slide 35

What about transactions?

Slide 36

Upserts and atomic updates
• No need to lock
• No need to roll back
• Just write atomically!

Slide 37

Atomic use case: clusters
• 2 users register with transactions from the same merchant
• No problem!

Slide 38

> db.merchants.update(
      {name: "yamit 2000"},
      {$inc: {users: 1}},
      true)    // 3rd argument = upsert
> db.merchants.find()
{ "name" : "yamit 2000", "users" : 1 }
> db.merchants.update(
      {name: "yamit 2000"},
      {$inc: {users: 1}},
      true)
> db.merchants.find()
{ "name" : "yamit 2000", "users" : 2 }
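
Here is the same upsert from Python, as a minimal sketch assuming pymongo 3+. In that version, `update_one(..., upsert=True)` plays the role of the shell's positional `true`. `merchants` is assumed to be a pymongo `Collection`, and the helper names are hypothetical. The server applies `$inc` atomically, so two users registering at once are both counted. One caveat: without a unique index on `name`, two concurrent upserts for a merchant that does not exist yet can each insert a document.

```python
def merchant_upsert_spec(name):
    """Filter and update documents equivalent to the shell example."""
    return {"name": name}, {"$inc": {"users": 1}}


def register_user_for_merchant(merchants, name):
    """Atomically bump a merchant's user count, creating the merchant if missing.

    `merchants` is assumed to be a pymongo Collection.
    """
    query, update = merchant_upsert_spec(name)
    return merchants.update_one(query, update, upsert=True)
```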

Slide 39

Easily Indexable
class MyEntity {
    @Id private ObjectId id;
    @Indexed private String widget;
    // ...
}

class MyEntityDao {
    public MyEntityDao() {
        // ...
        datastore.ensureIndexes();
    }
}

Slide 40

Optimal for sparse structures
Example: enrichment of merchants
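
This sketch shows why sparse documents suit merchant enrichment. Each source fills in only what it knows, and a `$set` on dotted paths touches only those fields, leaving the rest of the merchant document alone. The `enrichment` sub-document, the field names and the function name are hypothetical.

```python
def build_enrichment_update(found):
    """Turn partial enrichment results into a $set on dotted paths.

    Fields whose value is None are skipped, so a source that knows
    nothing about a field never overwrites what another source filled in.
    Returns {} when there is nothing to write.
    """
    fields = {f"enrichment.{key}": value
              for key, value in found.items() if value is not None}
    return {"$set": fields} if fields else {}
```

With pymongo, a non-empty result could be passed as the update argument of `merchants.update_one({"name": name}, update)`.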

Slide 41

...and lotsa stuff
• Sweet queries: regex, awesome operators ($in, $inc, …)
• Map-reduce (but no group by)
• IDs carry timestamps and are unique across servers
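
This sketch illustrates the last point. The first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, so a document's creation time can be read back without storing it separately. The remaining bytes (a machine/process-specific value plus a counter) are what keep IDs from colliding across servers. The sketch is stdlib-only and parses the 24-character hex form. pymongo's `ObjectId.generation_time` exposes the same timestamp as a `datetime`.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex):
    """Return the creation time encoded in an ObjectId's hex string."""
    if len(oid_hex) != 24:
        raise ValueError("ObjectId hex strings are 24 characters long")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes = seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(objectid_created_at("507f1f77bcf86cd799439011"))
```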

Slide 42

Pitfalls & Downsides

Slide 43

Write Concerns
“Read what you write”
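
This sketch expresses the relevant knobs as a MongoDB connection string; the host names and the helper name are placeholders.

- `w=1` asks the server to acknowledge each write. Older drivers defaulted to fire-and-forget.
- `readPreference=primary` keeps reads on the member that accepted the write, so a read right after a write normally sees it.
- `replicaSet` lets the driver find whichever member is currently primary after an automatic promotion.

The option names themselves are standard connection-string options.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, replica_set, **options):
    """Build a replica-set connection string with acknowledged writes
    and primary reads; keyword options override the defaults."""
    params = {"replicaSet": replica_set, "w": 1, "readPreference": "primary"}
    params.update(options)  # e.g. w="majority" for stronger durability
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(params)
```

pymongo's `MongoClient` accepts such a string directly, e.g. `MongoClient(build_mongo_uri(["mongo-a:27017", "mongo-b:27017"], "bg"))`.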

Slide 44

Data Analysts LOVE SQL

Slide 45

No good clients

Slide 46

User management is a pain

Slide 47

Questions?