Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Mongo Boulder 2011: Data Modeling for Performance

Michael Dwan
January 09, 2011
41

Mongo Boulder 2011: Data Modeling for Performance

Michael Dwan

January 09, 2011
Tweet

Transcript

  1. application spec • find business details (web + api) •

    search by category/keyword + geo (web + api) • update (api)
  2. why is this interesting? 15,000,000 businesses 30,000 partners 100,000 geo

    areas 2,300 categories 2,000,000 requests daily 24,000,000 urls in sitemap 100,000,000 tags
  3. a business... { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass

    Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", }
  4. a business... has many phone numbers {
 "_id" : ObjectId("4ce838ef4a882579960001b9"),


    "name" : "Acme Glass Co",
 "tagline" : "Your trusty glass hole",
 "description" : "Glass repair...",
 "hours" : "Mon Fri 8 5",
 "url" : "http://acmeglasshole.biz",
 }

  5. a business... has many phone numbers {
 "_id" : ObjectId("4ce838ef4a882579960001b9"),


    "name" : "Acme Glass Co",
 "tagline" : "Your trusty glass hole",
 "description" : "Glass repair...",
 "hours" : "Mon Fri 8 5",
 "url" : "http://acmeglasshole.biz",
 "phone_numbers" : [
 "5035550091",
 "8005555456"
 ]
 }

  6. a business... has coordinates "_id" : ObjectId("4ce838ef4a882579960001b9"),
 "name" : "Acme

    Glass Co",
 "tagline" : "Your trusty glass hole",
 "description" : "Glass repair...",
 "hours" : "Mon Fri 8 5",
 "url" : "http://acmeglasshole.biz",
 "phone_numbers" : [
 "5035550091",
 "8005555456"
 ]
 }

  7. a business... has coordinates "_id" : ObjectId("4ce838ef4a882579960001b9"),
 "name" : "Acme

    Glass Co",
 "tagline" : "Your trusty glass hole",
 "description" : "Glass repair...",
 "hours" : "Mon Fri 8 5",
 "url" : "http://acmeglasshole.biz",
 "phone_numbers" : [
 "5035550091",
 "8005555456"
 ],
 "coordinates" : [
 45.559294,
 -122.644053
 ]
 }

  8. a business... has many tags "url" : "http://acmeglasshole.biz",
 "phone_numbers" :

    [
 "5035550091",
 "8005555456"
 ],
 "coordinates" : [
 45.559294,
 -122.644053
 ]
 }

  9. a business... has many tags "url" : "http://acmeglasshole.biz",
 "phone_numbers" :

    [
 "5035550091",
 "8005555456"
 ],
 "coordinates" : [
 45.559294,
 -122.644053
 ],
 "tags" : [
 "glass",
 "mirrors",
 "flat glass"
 ]
 }

  10. a business... has an address "coordinates" : [
 45.559294,
 -122.644053


    ],
 "tags" : [
 "glass",
 "mirrors",
 "flat glass"
 ]
 }

  11. a business... has an address "coordinates" : [
 45.559294,
 -122.644053


    ],
 "tags" : [
 "glass",
 "mirrors",
 "flat glass"
 ],
 "location" : {
 "street_address" : "2035 NE Alberta St"
 }
 }
 
 

  12. a business... belongs to a state "coordinates" : [
 45.559294,


    -122.644053
 ],
 "tags" : [
 "glass",
 "mirrors",
 "flat glass"
 ],
 "location" : {
 "street_address" : "2035 NE Alberta St"
 }
 }
 
 

  13. a business... belongs to a state "glass",
 "mirrors",
 "flat glass"


    ],
 "location" : {
 "street_address" : "2035 NE Alberta St"
 }
 }
 
 

  14. a business... belongs to a state "glass",
 "mirrors",
 "flat glass"


    ],
 "location" : {
 "street_address" : "2035 NE Alberta St",
 "state" : {
 "_id" : ObjectId("4ce829379615522479000026"),
 "meta" : {
 "slug" : "or"
 },
 "display_name" : "Oregon"
 }
 }
 }

  15. a business... belongs to a city "state" : {
 "_id"

    : ObjectId("4ce829379615522479000026"),
 "meta" : {
 "slug" : "or"
 },
 "display_name" : "Oregon"
 }
 }
 }

  16. a business... belongs to a city "state" : {
 "_id"

    : ObjectId("4ce829379615522479000026"),
 "meta" : {
 "slug" : "or"
 },
 "display_name" : "Oregon"
 },
 "city" : {
 "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
 "meta" : {
 "slug" : "portland",
 },
 "display_name" : "Portland, OR"
 },
 }
 }

  17. a business... belongs to a zip code },
 "display_name" :

    "Oregon"
 },
 "city" : {
 "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
 "meta" : {
 "slug" : "portland",
 },
 "display_name" : "Portland, OR"
 },
 }
 }

  18. a business... belongs to a zip code },
 "display_name" :

    "Oregon"
 },
 "city" : {
 "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
 "meta" : {
 "slug" : "portland",
 },
 "display_name" : "Portland, OR"
 },
 "zip" : {
 "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
 "display_name" : "97211"
 }
 }
 }

  19. a category {
 "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
 "name" : "Auto Glass",


    "slug" : "3063-auto-glass",
 "tags" : [
 "windshields"
 ],
 ...
 }

  20. a business... belongs to a zip code "meta" : {


    "slug" : "or"
 },
 "display_name" : "Oregon"
 },
 "city" : {
 "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
 "meta" : {
 "slug" : "portland",
 },
 "display_name" : "Portland, OR"
 },
 "zip" : {
 "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
 "display_name" : "97211"
 }
 }
 }

  21. a business... belongs to many categories }
 },
 "categories" :

    [
 {
 "_id" : ObjectId("4ce82e50d3dfaa16360004f2"),
 "meta" : {
 "slug" : "282-glass",
 "tags" : [ "windows" ],
 },
 "display_name" : "Glass"
 },
 {
 "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
 "meta" : {
 "slug" : "3063-auto-glass",
 "tags" : [ "windshields" ],
 },
 "display_name" : "Auto Glass"
 }
 ]
 }

  22. find businesses by state/city/zip // find all within state
 db.businesses.find({


    "location.state._id": ObjectId("4ce82937961552247900000f")
 })
 ! ! ! ! ! ! ! ! 

  23. find businesses by state/city/zip // find all within state
 db.businesses.find({


    "location.state._id": ObjectId("4ce82937961552247900000f")
 })
 
 // find all within city
 db.businesses.find({
 "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
 })
 ! ! ! 

  24. find businesses by state/city/zip // find all within state
 db.businesses.find({


    "location.state._id": ObjectId("4ce82937961552247900000f")
 })
 
 // find all within city
 db.businesses.find({
 "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
 })
 
 // find all within zip
 db.businesses.find({
 "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0")
 })

  25. businesses by category // find by category id
 db.businesses.find({
 "categories._id":

    ObjectId("4ce82e50d3dfaa16360004f2")
 })
 // the index
 db.businesses.ensureIndex({
 "categories._id":1
 })
  26. #4 - find by category + location Businesses in the

    Plumbing category in Chicago, IL
  27. businesses by category + city // find by city id

    and category id
 db.businesses.find({
 "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
 "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
 })
  28. which index should we use? // city id
 {"location.city._id":1}
 //

    category id
 {"categories._id":1} ~ or ~ we need a compound index answer: both suck
  29. which order? db.businesses.ensureIndex({
 "location.city._id" : 1, "categories._id" : 1
 })


    db.businesses.ensureIndex({
 "categories._id" : 1, "location.city._id" : 1
 }) ~ or ~
  30. which order? db.businesses.ensureIndex({
 "location.city._id" : 1, "categories._id" : 1
 })


    db.businesses.ensureIndex({
 "categories._id" : 1, "location.city._id" : 1
 }) ~ or ~ 35,000 cities & 2,500 categories
  31. which order? db.businesses.ensureIndex({
 "location.city._id" : 1, "categories._id" : 1
 })


    db.businesses.ensureIndex({
 "categories._id" : 1, "location.city._id" : 1
 }) ~ or ~ answer: cities → categories 35,000 cities & 2,500 categories create one for zip codes and categories too!
  32. don’t we have 2 indexes on city id? {"location.city._id" :

    1} {"location.city._id" : 1, "categories._id" : 1}
  33. don’t we have 2 indexes on city id? answer: yes

    {"location.city._id" : 1} {"location.city._id" : 1, "categories._id" : 1}
  34. don’t we have 2 indexes on city id? answer: yes

    {"location.city._id" : 1} {"location.city._id" : 1, "categories._id" : 1} db.businesses.dropIndex("location.city._id_1")
  35. find businesses in city by keyword {
 "_id" : ObjectId("4ce838ef4a882579960001b9"),


    "name" : "Acme Glass Co",
 "keywords" : [
 "glass",
 "repair",
 "acme",
 ...
 ]
 }

  36. find businesses in city by keyword {
 "_id" : ObjectId("4ce838ef4a882579960001b9"),


    "name" : "Acme Glass Co",
 "keywords" : [
 "glass",
 "repair",
 "acme",
 ...
 ]
 }
 db.businesses.ensureIndex({
 "location.city._id":1,
 "keywords":1
 })

  37. find businesses in city by keyword {
 "_id" : ObjectId("4ce838ef4a882579960001b9"),


    "name" : "Acme Glass Co",
 "keywords" : [
 "glass",
 "repair",
 "acme",
 ...
 ]
 }
 db.businesses.ensureIndex({
 "location.city._id":1,
 "keywords":1
 })
 db.businesses.find({
 "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"),
 "keywords":/glass/i
 })

  38. chat with Kyle Banker me: we’re switching from postgres+solr to

    mongo kyle: oh wow, you can replace solr with mongo?
  39. chat with Kyle Banker me: we’re switching from postgres+solr to

    mongo kyle: oh wow, you can replace solr with mongo? me: with some creativity
  40. chat with Kyle Banker me: we’re switching from postgres+solr to

    mongo kyle: oh wow, you can replace solr with mongo? me: with some creativity kyle: seems like it’d still be hard to get just right
  41. chat with Kyle Banker me: we’re switching from postgres+solr to

    mongo kyle: oh wow, you can replace solr with mongo? me: with some creativity kyle: seems like it’d still be hard to get just right me: it works well
  42. chat with Kyle Banker me: we’re switching from postgres+solr to

    mongo kyle: oh wow, you can replace solr with mongo? me: with some creativity kyle: seems like it’d still be hard to get just right me: it works well kyle: gotcha
  43. I

  44. sitemaps • xml files containing each unique url ~ 24M

    • 50,000 urls per file, about 500 files • urls are generated from live data • http://companyx.com/sitemaps/1.xml
  45. partition by consistent hash >> "hello!".hash % 6 #=> 5

    >> "/ny/new-york/c/apartments".hash % 6 #=> 5 returns an integer between 0 and the number specified
  46. map/reduce 1.map each url in the site to a partition

    2.reduce all partitions to a single document containing all urls in that partition 3.save to a permanent collection
  47. map /il/chicago/c/pizza 4 /ny/new-york/c/apartments 1 nd/rugby/c/apartments 6 /14076500-bayside-marina 2 /13401000-comtrak-logistics-inc

    3 /12347500-allstate-auto-insurance 1 il/downers-grove/c/computer-web-design 6 /1009500-heidelberg-lodges 5 mn/redwood-falls/c/food-service 4 /14077000-bank-of-america 5 mn/savage/c/audio-visual-equipment 1 ... 1 2 3 4 5 6
  48. reduce {
 "total" : 2,
 "urls" : [
 "/12347500-allstate-auto-insurance",
 "/ny/new-york/c/apartments"


    ]
 }
 {
 "total" : 1,
 "urls" : [
 "/mn/savage/c/audio-visual-equipment"
 ]
 }
 {
 "_id" : 1,
 "value" : {
 "total" : 2,
 "urls" : [
 "/12347500-allstate-auto-insurance",
 "/mn/savage/c/audio-visual-equipment",
 "/ny/new-york/c/apartments"
 ]
 }
 }