Online Conference: Deep Dive with MongoDB

mongodb
July 11, 2012

This four-hour online conference will introduce you to some MongoDB basics and get you up to speed on why and how to choose MongoDB for your next project.

Transcript

  1. 1
    1
    Online Conference:
    Deep Dive with MongoDB

  2. 2
    2
    Building your first App
    with MongoDB

  3. 3
    3
    •  Quick introduction to MongoDB
    •  Data modeling in MongoDB: queries,
    geospatial, updates, and map/reduce
    •  Using a location-based app as an
    example
    •  Example works in the MongoDB JS shell

  4. 4
    4

  5. 5
    5
    MongoDB is a scalable, high-performance, open
    source, document-oriented database.
    •  Fast Querying
    •  In-place updates
    •  Full Index Support
    •  Replication / High Availability
    •  Auto-Sharding
    •  Aggregation; Map/Reduce
    •  GridFS

  6. 6
    6
    MongoDB is implemented in C++
    •  Windows, Linux, Mac OS X, Solaris
    Drivers are available in many languages
    •  10gen supported: C, C# (.NET), C++, Erlang, Haskell, Java,
    JavaScript, Perl, PHP, Python, Ruby, Scala, Node.js
    •  Multiple community-supported drivers

  7. 7
    7
    RDBMS       MongoDB
    Table       Collection
    Row(s)      JSON Document
    Index       Index
    Partition   Shard
    Join        Embedding/Linking
    Schema      (implied schema)

  8. 8
    8
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "Asya",
    date : ISODate("2012-02-02T11:52:27.442Z"),
    text : "About MongoDB...",
    tags : [ "tech", "databases" ],
    comments : [{
    author : "Fred",
    date : ISODate("2012-02-03T17:22:21.124Z"),
    text : "Best Post Ever!"
    }],
    comment_count : 1
    }

  9. 9
    9
    • JSON has a powerful but limited set of data types
    –  MongoDB extends these with Date, integer types, ObjectId, …
    • MongoDB stores data in BSON
    • BSON is a binary representation of JSON
    –  Optimized for performance and navigational abilities
    –  Also compression
    See: bsonspec.org

  10. 10
    10
    •  Intrinsic support for fast, iterative development
    •  Super low latency access to your data
    •  Very little CPU overhead
    •  No additional caching layer required
    •  Built in replication and horizontal scaling support

  11. 11
    11
    • Want to build an app where users can check in to
    a location
    • Leave notes or comments about that location

  12. 12
    12
    "As a user I want to be able to find other
    locations nearby"
    •  Need to store locations (Offices,
    Restaurants, etc)
    –  name, address, tags
    –  coordinates
    –  User generated content e.g. tips / notes

  13. 13
    13
    "As a user I want to be able to 'checkin' to a
    location"
    Checkins
    – User should be able to 'check in' to a location
    – Want to be able to generate statistics:
    •  Recent checkins
    •  Popular locations

  14. 14
    14
    Collections:
    users: user1, user2
    locations: loc1, loc2, loc3
    checkins: checkin1, checkin2

  15. 15
    15
    > location_1 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012
    }

  16. 16
    16
    > location_1 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012
    }
    > db.locations.find({name: "Lotus Flower"})

  17. 17
    17
    > location_1 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012
    }
    > db.locations.ensureIndex({name: 1})
    > db.locations.find({name: "Lotus Flower"})

  18. 18
    18
    > location_2 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"]
    }

  19. 19
    19
    > location_2 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"]
    }
    > db.locations.ensureIndex({tags: 1})

  20. 20
    20
    > location_2 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"]
    }
    > db.locations.ensureIndex({tags: 1})
    > db.locations.find({tags: "dumplings"})

  21. 21
    21
    > location_3 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"],
    lat_long: [52.5184, 13.387]
    }

  22. 22
    22
    > location_3 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"],
    lat_long: [52.5184, 13.387]
    }
    > db.locations.ensureIndex({lat_long: "2d"})

  23. 23
    23
    > location_3 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"],
    lat_long: [52.5184, 13.387]
    }
    > db.locations.ensureIndex({lat_long: "2d"})
    > db.locations.find({lat_long: {$near:[52.53, 13.4]}})

  24. 24
    24
    // creating your indexes:
    > db.locations.ensureIndex({tags: 1})
    > db.locations.ensureIndex({name: 1})
    > db.locations.ensureIndex({lat_long: "2d"})
    // finding places:
    > db.locations.find({lat_long: {$near:[52.53, 13.4]}})
    // with regular expressions:
    > db.locations.find({name: /^Din/})
    // by tag:
    > db.locations.find({tags: "dumplings"})

  25. 25
    25
    Atomic operators:
    $set, $unset, $inc, $push, $pushAll, $pull,
    $pullAll, $bit

  26. 26
    26
    // initial data load:
    > db.locations.insert(location_3)
    // adding a tip with update:
    > db.locations.update(
    {name: "Lotus Flower"},
    {$push: {
    tips: {
    user: "Asya",
    date: "28/03/2012",
    tip: "The hairy crab dumplings are awesome!"}
    }})

  27. 27
    27
    > db.locations.findOne()
    { name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"],
    lat_long: [52.5184, 13.387],
    tips:[{
    user: "Asya",
    date: "28/03/2012",
    tip: "The hairy crab dumplings are awesome!"
    }]
    }

  28. 28
    28
    "As a user I want to be able to 'checkin' to a
    location"
    Checkins
    – User should be able to 'check in' to a location
    – Want to be able to generate statistics:
    •  Recent checkins
    •  Popular locations

  29. 29
    29
    > user_1 = {
    _id: "[email protected]",
    name: "Asya",
    twitter: "asya999",
    checkins: [
    {location: "Lotus Flower", ts: "28/03/2012"},
    {location: "Meridian Hotel", ts: "27/03/2012"}
    ]
    }
    > db.users.ensureIndex({"checkins.location": 1})
    > db.users.find({"checkins.location": "Lotus Flower"})

  30. 30
    30
    // find all users who've checked in here:
    > db.users.find({"checkins.location":"Lotus Flower"})

  31. 31
    31
    // find all users who've checked in here:
    > db.users.find({"checkins.location":"Lotus Flower"})
    // find the last 10 checkins here?
    > db.users.find({"checkins.location":"Lotus Flower"})
    .sort({"checkins.ts": -1}).limit(10)

  32. 32
    32
    // find all users who've checked in here:
    > db.users.find({"checkins.location":"Lotus Flower"})
    // find the last 10 checkins here: - Warning!
    > db.users.find({"checkins.location":"Lotus Flower"})
    .sort({"checkins.ts": -1}).limit(10)

    Hard to query for last 10

  33. 33
    33
    > user_2 = {
    _id: "[email protected]",
    name: "Asya",
    twitter: "asya999",
    }
    > checkin_1 = {
    location: location_id,
    user: user_id,
    ts: "20/03/2010"
    }
    > db.checkins.ensureIndex({user: 1})
    > db.checkins.find({user: user_id})

  34. 34
    34
    // find all users who've checked in here:
    > location_id = db.locations.findOne({name: "Lotus Flower"})._id
    > u_ids = db.checkins.find({location: location_id},
    {_id: 0, user: 1}).map(function(c) { return c.user; })
    > users = db.users.find({_id: {$in: u_ids}})
    // find the last 10 checkins here:
    > db.checkins.find({location: location_id})
    .sort({ts: -1}).limit(10)
    // count how many checked in today:
    > db.checkins.find({location: location_id,
    ts: {$gt: midnight}}
    ).count()
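
The "last 10" and "checked in today" queries above only work if `ts` sorts chronologically. The following is a minimal sketch in plain JavaScript (no MongoDB server involved) of the same two lookups. The sample check-ins and the helper names `lastCheckins` and `countSince` are hypothetical, made up for illustration, and `ts` is assumed to hold real dates rather than "DD/MM/YYYY" strings.

```javascript
// Hypothetical check-in documents; ts holds real Date values.
const checkins = [
  { location: "loc1", user: "u1", ts: new Date("2012-03-27T18:00:00Z") },
  { location: "loc1", user: "u2", ts: new Date("2012-03-28T09:30:00Z") },
  { location: "loc2", user: "u1", ts: new Date("2012-03-28T10:00:00Z") },
  { location: "loc1", user: "u3", ts: new Date("2012-03-28T12:15:00Z") },
];

// Roughly find({location: id}).sort({ts: -1}).limit(n)
function lastCheckins(locationId, n) {
  return checkins
    .filter((c) => c.location === locationId)
    .sort((a, b) => b.ts - a.ts) // newest first
    .slice(0, n);
}

// Roughly find({location: id, ts: {$gt: since}}).count()
function countSince(locationId, since) {
  return checkins.filter((c) => c.location === locationId && c.ts > since).length;
}

const midnight = new Date("2012-03-28T00:00:00Z");
console.log(lastCheckins("loc1", 2).map((c) => c.user));
console.log(countSince("loc1", midnight));
```

Date values compare in time order. Strings such as "28/03/2012" compare character by character, so "28/03/2012" sorts after "01/04/2012" even though it is the earlier day.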

  35. 35
    35
    // Find most popular locations
    > agg = db.checkins.aggregate(
    {$match: {ts: {$gt: now_minus_3_hrs}}},
    {$group: {_id: "$location",
    numEntries: {$sum: 1}}}
    )
    > agg.result
    [{"_id": "Lotus Flower", "numEntries" : 17}]

  36. 36
    36
    // Find most popular locations
    > map_func = function() {
    emit(this.location, 1);
    }
    > reduce_func = function(key, values) {
    return Array.sum(values);
    }
    > db.checkins.mapReduce(map_func, reduce_func,
    {query: {ts: {$gt: now_minus_3_hrs}},
    out: "result"})
    > db.result.findOne()
    {"_id": "Lotus Flower", "value" : 17}
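
To make the map and reduce steps above easier to follow, here is a minimal plain-JavaScript emulation of the flow. It is a sketch, not MongoDB's implementation: the sample check-ins are made up, `mapReduce` is a hypothetical helper, and `emit` is passed in as a parameter, whereas in the shell `map` uses `this` and a global `emit`.

```javascript
// Made-up check-ins standing in for the checkins collection.
const checkins = [
  { location: "Lotus Flower", ts: 3 },
  { location: "Meridian Hotel", ts: 5 },
  { location: "Lotus Flower", ts: 8 },
];

// map emits one (key, value) pair per check-in.
function mapFunc(doc, emit) {
  emit(doc.location, 1);
}

// reduce folds all values emitted for one key into a single value.
function reduceFunc(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical driver: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const popular = mapReduce(checkins, mapFunc, reduceFunc);
console.log(popular);
```

The output documents have the same `_id` / `value` shape as the shell result shown above.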

  37. 37
    37
    Deployment

  38. 38
    38
    •  Single server
    - need a strong backup
    plan
    [Diagram: a single primary (P)]

  39. 39
    39
    •  Single server
    - need a strong backup
    plan
    •  Replica sets
    - High availability
    - Automatic failover
    [Diagram: a single primary (P); a replica set with one primary and two secondaries (P S S)]

  40. 40
    40
    •  Single server
    - need a strong backup
    plan
    •  Replica sets
    - High availability
    - Automatic failover
    •  Sharded
    - Horizontally scale
    - Auto balancing
    [Diagram: a single primary (P); a replica set (P S S); a sharded cluster whose shards are replica sets (P S S each)]

  41. 41
    41
    User Data Management
    High Volume Data Feeds
    Content Management
    Operational Intelligence
    E-Commerce

  42. 42
    42

  43. 43
    43
    @mongodb
    conferences, appearances, and meetups
    http://www.10gen.com/events
    http://bit.ly/mongofb
    Facebook | Twitter | LinkedIn
    http://linkd.in/joinmongo
    download at mongodb.org
    support, training, and this talk brought to you by

  44. 44
    44
    Schema Design with MongoDB

  45. Schema Design
    with MongoDB
    Antoine Girbal
    [email protected]
    @antoinegirbal

  46. So why model data?
    http://www.flickr.com/photos/42304632@N00/493639870/  

  47. Goals
    Avoid anomalies when inserting, updating or deleting
    Minimize redesign when extending the schema
    Avoid bias toward a particular query
    Make use of all SQL features
    In MongoDB
    Similar goals apply but rules are different
    Denormalization for optimization is an option: most features still work, unlike with BLOBs
    Normalization

  48. Terminology
    RDBMS           MongoDB
    Table           Collection
    Row(s)          JSON Document
    Index           Index
    Join            Embedding & Linking
    Partition       Shard
    Partition Key   Shard Key

  49. Equivalent to a Table in SQL
    Cheap to create (max 24000)
    Collections don’t have a fixed schema
    Common for documents in a collection to share a schema
    Document schema can evolve
    Consider using multiple related collections tied together by a naming convention:
    e.g. LogData-2011-02-08
    Collections Basics

  50. Elements are name/value pairs, equivalent to column value in SQL
    elements can be nested
    Rich data types for values
    JSON for the human eye
    BSON for all internals
    16MB maximum size (many books..)
    What you see is what is stored
    Document basics

  51. Schema Design - Relational

  52. Schema Design - MongoDB

  53. Schema Design - MongoDB
    embedding  

  54. Schema Design - MongoDB
    embedding  
    linking  

  55. Design documents that simply map to your application
    > post = { author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: ["comic", "adventure"]
    }
    > db.blogs.save(post)
    Design Session

  56. > db.blogs.find()
    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: [ "comic", "adventure" ]
    }
    Notes:
    •  ID must be unique, but can be anything you’d like
    •  MongoDB will generate a default ID if one is not supplied
    Find the document

  57. Secondary index for “author”
    // 1 means ascending, -1 means descending
    > db.blogs.ensureIndex( { author: 1 } )
    > db.blogs.find( { author: 'Hergé' } )
    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    date: ISODate("2011-09-18T09:56:06.298Z"),
    author: "Hergé",
    ... }
    Add an index, find via index

  58. > db.blogs.find( { author: "Hergé" } ).explain()
    {
    "cursor" : "BtreeCursor author_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 5,
    "indexBounds" : {
    "author" : [
    [
    "Hergé",
    "Hergé"
    ]
    ]
    }
    }
    Examine the query plan

  59. > db.blogs.find( { author: "Hergé" } ).explain()
    {
    "cursor" : "BtreeCursor author_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 5,
    "indexBounds" : {
    "author" : [
    [
    "Hergé",
    "Hergé"
    ]
    ]
    }
    }
    Examine the query plan

  60. Conditional operators:
    $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
    $lt, $lte, $gt, $gte, ...
    // find posts with any tags
    > db.blogs.find( { tags: { $exists: true } } )
    Query operators

  61. Conditional operators:
    $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
    $lt, $lte, $gt, $gte, ...
    // find posts with any tags
    > db.blogs.find( { tags: { $exists: true } } )
    Regular expressions:
    // posts where author starts with h
    > db.blogs.find( { author: /^h/i } )
    Query operators

  62. Conditional operators:
    $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
    $lt, $lte, $gt, $gte, ...
    // find posts with any tags
    > db.blogs.find( { tags: { $exists: true } } )
    Regular expressions:
    // posts where author starts with h
    > db.blogs.find( { author: /^h/i } )
    Counting:
    // number of posts written by Hergé
    > db.blogs.find( { author: "Hergé" } ).count()
    Query operators

  63. > new_comment =
    { author: "Kyle",
    date: new Date(),
    text: "great book" }
    > db.blogs.update(
    { text: "Destination Moon" },
    { "$push": { comments: new_comment },
    "$inc": { comments_count: 1 }
    } )
    Extending the Schema
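
As a rough illustration of what the update above does to the stored document, here is a plain-JavaScript model of `$push` and `$inc`. It is a sketch under stated assumptions, not MongoDB's implementation: `applyUpdate` is a hypothetical helper that handles top-level fields only (no dotted paths, no other operators).

```javascript
// Hypothetical model of $push and $inc for top-level fields only.
function applyUpdate(doc, update) {
  const result = { ...doc }; // leave the input document untouched
  for (const [field, value] of Object.entries(update.$push || {})) {
    // $push appends to the array, creating it when the field is missing.
    result[field] = [...(result[field] || []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // $inc adds to the number, treating a missing field as 0.
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const post = { author: "Hergé", text: "Destination Moon" };
const newComment = { author: "Kyle", text: "great book" };
const update = { $push: { comments: newComment }, $inc: { comments_count: 1 } };

const once = applyUpdate(post, update);
const twice = applyUpdate(once, update);
console.log(once.comments_count, twice.comments.length);
```

Neither `comments` nor `comments_count` exists on the original post; both appear on first use, which is what lets the schema be extended without a migration step.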

  64. > db.blogs.find( { author: "Hergé"} )
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "Hergé",
    date : ISODate("2011-09-18T09:56:06.298Z"),
    text : "Destination Moon",
    tags : [ "comic", "adventure" ],
    comments : [
    {
    author : "Kyle",
    date : ISODate("2011-09-19T09:56:06.298Z"),
    text : "great book"
    }
    ],
    comments_count: 1
    }
    Extending the Schema

  65. // create index on nested documents:
    > db.blogs.ensureIndex( { "comments.author": 1 } )
    > db.blogs.find( { "comments.author": "Kyle" } )
    Extending the Schema

  66. // create index on nested documents:
    > db.blogs.ensureIndex( { "comments.author": 1 } )
    > db.blogs.find( { "comments.author": "Kyle" } )
    // find last 5 posts:
    > db.blogs.find().sort( { date: -1 } ).limit(5)
    Extending the Schema

  67. // create index on nested documents:
    > db.blogs.ensureIndex( { "comments.author": 1 } )
    > db.blogs.find( { "comments.author": "Kyle" } )
    // find last 5 posts:
    > db.blogs.find().sort( { date: -1 } ).limit(5)
    // most commented post:
    > db.blogs.find().sort( { comments_count: -1 } ).limit(1)
    When sorting, check if you need an index
    Extending the Schema

  68. Patterns:
    •  Inheritance
    •  one to one
    •  one to many
    •  many to many
    Common Patterns

  69. Inheritance

  70. shapes table
    Single Table Inheritance -
    MongoDB
    id   type     area   radius   length   width
    1    circle   3.14   1
    2    square   4               2
    3    rect     10              5        2

  71. > db.shapes.find()
    { _id: "1", type: "c", area: 3.14, radius: 1}
    { _id: "2", type: "s", area: 4, length: 2}
    { _id: "3", type: "r", area: 10, length: 5, width: 2}
    Single Table Inheritance - MongoDB
    missing values not stored!

  72. > db.shapes.find()
    { _id: "1", type: "c", area: 3.14, radius: 1}
    { _id: "2", type: "s", area: 4, length: 2}
    { _id: "3", type: "r", area: 10, length: 5, width: 2}
    // find shapes where radius > 0
    > db.shapes.find( { radius: { $gt: 0 } } )
    Single Table Inheritance - MongoDB

  73. > db.shapes.find()
    { _id: "1", type: "c", area: 3.14, radius: 1}
    { _id: "2", type: "s", area: 4, length: 2}
    { _id: "3", type: "r", area: 10, length: 5, width: 2}
    // find shapes where radius > 0
    > db.shapes.find( { radius: { $gt: 0 } } )
    // create index
    > db.shapes.ensureIndex( { radius: 1 }, { sparse:true } )
    Single Table Inheritance - MongoDB
    index only values present!

  74. One to Many
    Either:
    • Embedded Array / Document:
    –  improves read speed
    –  simplifies schema
    • Normalize:
    –  if list grows significantly
    –  if sub items are updated often
    –  if sub items are more than 1 level deep and need updating

  75. One to Many
    Embedded Array:
    • $slice operator to return subset of comments
    • some queries become harder (e.g. find latest comments
    across all blogs)
    blogs: {
    author : "Hergé",
    date : ISODate("2011-09-18T09:56:06.298Z"),
    comments : [
    {
    author : "Kyle",
    date : ISODate("2011-09-19T09:56:06.298Z"),
    text : "great book"
    }
    ]
    }

  76. One to Many
    Normalized (2 collections)
    • most flexible
    • more queries
    blogs: { _id: 1000,
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z") }
    comments : { _id : 1,
    blogId: 1000,
    author : "Kyle",
    date : ISODate("2011-09-19T09:56:06.298Z") }
    > blog = db.blogs.findOne( { text: "Destination Moon" } );
    > db.comments.ensureIndex( { blogId: 1 } ) // important!
    > db.comments.find( { blogId: blog._id } );

  77. Example:
    •  Product can be in many categories
    •  Category can have many products
    Many - Many

  78. // Each product lists the IDs of the categories
    products:
    { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }
    Many - Many

  79. // Each product lists the IDs of the categories
    products:
    { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }
    // Each category lists the IDs of the products
    categories:
    { _id: 20, name: "adventure",
    product_ids: [ 10, 11, 12 ] }
    categories:
    { _id: 21, name: "movie",
    product_ids: [ 10 ] }
    Many - Many

  80. // Each product lists the IDs of the categories
    products:
    { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }
    // Each category lists the IDs of the products
    categories:
    { _id: 20, name: "adventure",
    product_ids: [ 10, 11, 12 ] }
    categories:
    { _id: 21, name: "movie",
    product_ids: [ 10 ] }
    Cuts mapping table and 2 indexes, but:
    •  potential consistency issue
    •  lists can grow too large
    Many - Many

  81. // Each product lists the IDs of the categories
    products:
    { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }
    // Association not stored on the categories
    categories:
    { _id: 20,
    name: "adventure"}
    Alternative

  82. // Each product lists the IDs of the categories
    products:
    { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }
    // Association not stored on the categories
    categories:
    { _id: 20,
    name: "adventure"}
    // All products for a given category
    > db.products.ensureIndex( { category_ids: 1} ) // yes!
    > db.products.find( { category_ids: 20 } )
    Alternative
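
The alternative above keeps the association on one side only, so both directions of the relationship have to be answerable from `category_ids`. The following plain-JavaScript sketch (no MongoDB server) shows one way to read it both ways. The second product, the name of category 30, and the helpers `productsInCategory` and `categoriesOfProduct` are hypothetical, added only for illustration.

```javascript
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] }, // hypothetical
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" }, // hypothetical name for category 30
];

// Like find({ category_ids: id }): an array field matches when any element equals id.
function productsInCategory(id) {
  return products.filter((p) => p.category_ids.includes(id));
}

// The reverse direction takes two steps: load the product, then its categories,
// roughly find({ _id: { $in: product.category_ids } }).
function categoriesOfProduct(id) {
  const product = products.find((p) => p._id === id);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p.name));
console.log(categoriesOfProduct(10).map((c) => c.name));
```

Because each association is written in exactly one place, the two sides cannot drift apart, at the cost of a second lookup in one direction.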

  83. Use cases:
    •  Trees
    •  Time Series
    Common Use Cases

  84. Hierarchical information
     
     
         
    Trees

  85. Full Tree in Document
     
    { retweet: [
    { who: "Kyle", text: "...",
    retweet: [
    {who: "James", text: "...",
    retweet: []}
    ]}
    ]
    }
     
    Pros: Single Document, Performance, Intuitive
    Cons: Hard to search or update, document can easily get too large
     
         
    Trees

  86. // Store all Ancestors of a node
    { _id: "a" }
    { _id: "b", tree: [ "a" ], retweet: "a" }
    { _id: "c", tree: [ "a", "b" ], retweet: "b" }
    { _id: "d", tree: [ "a", "b" ], retweet: "b" }
    { _id: "e", tree: [ "a" ], retweet: "a" }
    { _id: "f", tree: [ "a", "e" ], retweet: "e" }
    // find all direct retweets of "b"
    > db.tweets.find( { retweet: "b" } )
    Array of Ancestors
    [Diagram: tree with root A; B and E under A; C and D under B; F under E]

  87. // Store all Ancestors of a node
    { _id: "a" }
    { _id: "b", tree: [ "a" ], retweet: "a" }
    { _id: "c", tree: [ "a", "b" ], retweet: "b" }
    { _id: "d", tree: [ "a", "b" ], retweet: "b" }
    { _id: "e", tree: [ "a" ], retweet: "a" }
    { _id: "f", tree: [ "a", "e" ], retweet: "e" }
    // find all direct retweets of "b"
    > db.tweets.find( { retweet: "b" } )
    // find all retweets of "e" anywhere in tree
    > db.tweets.find( { tree: "e" } )
    Array of Ancestors
    [Diagram: tree with root A; B and E under A; C and D under B; F under E]

  88. // Store all Ancestors of a node
    { _id: "a" }
    { _id: "b", tree: [ "a" ], retweet: "a" }
    { _id: "c", tree: [ "a", "b" ], retweet: "b" }
    { _id: "d", tree: [ "a", "b" ], retweet: "b" }
    { _id: "e", tree: [ "a" ], retweet: "a" }
    { _id: "f", tree: [ "a", "e" ], retweet: "e" }
    // find all direct retweets of "b"
    > db.tweets.find( { retweet: "b" } )
    // find all retweets of "e" anywhere in tree
    > db.tweets.find( { tree: "e" } )
    // find tweet history of f:
    > tweets = db.tweets.findOne( { _id: "f" } ).tree
    > db.tweets.find( { _id: { $in : tweets } } )
    Array of Ancestors
    [Diagram: tree with root A; B and E under A; C and D under B; F under E]
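
The three lookups above can be tried without a database. This is a minimal plain-JavaScript sketch over the same six documents; the helper names `directRetweetsOf`, `allRetweetsOf`, and `historyOf` are hypothetical and stand in for the shell queries.

```javascript
const tweets = [
  { _id: "a" },
  { _id: "b", tree: ["a"], retweet: "a" },
  { _id: "c", tree: ["a", "b"], retweet: "b" },
  { _id: "d", tree: ["a", "b"], retweet: "b" },
  { _id: "e", tree: ["a"], retweet: "a" },
  { _id: "f", tree: ["a", "e"], retweet: "e" },
];

const ids = (docs) => docs.map((t) => t._id);

// Direct retweets: the parent pointer equals the node (find({ retweet: id })).
const directRetweetsOf = (id) => tweets.filter((t) => t.retweet === id);

// All descendants: the node appears anywhere in the ancestor array (find({ tree: id })).
const allRetweetsOf = (id) => tweets.filter((t) => (t.tree || []).includes(id));

// History: read the ancestor list off the node, then fetch those documents ($in).
const historyOf = (id) => {
  const node = tweets.find((t) => t._id === id);
  const ancestors = (node && node.tree) || [];
  return tweets.filter((t) => ancestors.includes(t._id));
};

console.log(ids(directRetweetsOf("b")));
console.log(ids(allRetweetsOf("e")));
console.log(ids(historyOf("f")));
```

Each lookup is a single pass with no recursion, which is the main attraction of storing the full ancestor list on every node.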

  89. Store hierarchy as a path expression
    •  Separate each node by a delimiter, e.g. ","
    •  Use a text (regular-expression) search to find parts of a tree
    •  search must be left-rooted and use an index
    { retweets: [
    { _id: "a", text: "initial tweet",
    path: "a" },
    { _id: "b", text: "retweet with comment",
    path: "a,b" },
    { _id: "c", text: "reply to retweet",
    path : "a,b,c"} ] }
    // Find the conversations "a" started
    > db.tweets.find( { path: /^a/ } )
    // Find the conversations under a branch
    > db.tweets.find( { path: /^a,b/ } )
    Trees as Paths
    [Diagram: tree of nodes A to F]
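
A left-rooted search on the path is a prefix match. The sketch below, in plain JavaScript with no MongoDB involved, shows the idea and one detail worth watching: matching on the prefix plus the delimiter, so that an unrelated node whose id merely starts with the same characters is not picked up. The nodes "e" and "ab" and the helper `inBranch` are hypothetical additions for illustration.

```javascript
const tweets = [
  { _id: "a", path: "a" },
  { _id: "b", path: "a,b" },
  { _id: "c", path: "a,b,c" },
  { _id: "e", path: "a,e" }, // hypothetical sibling branch
  { _id: "ab", path: "ab" }, // hypothetical unrelated root
];

// Nodes in the branch rooted at branchPath: the node itself, plus every path
// that continues with the delimiter after the prefix.
function inBranch(docs, branchPath) {
  return docs.filter(
    (d) => d.path === branchPath || d.path.startsWith(branchPath + ",")
  );
}

console.log(inBranch(tweets, "a").map((d) => d._id));
console.log(inBranch(tweets, "a,b").map((d) => d._id));
```

A bare prefix test on "a" would also return the unrelated "ab" node; including the delimiter in the match avoids that.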

  90. •  Records stats by
    •  Day, Hour, Minute
    •  Show time series
    Time Series

  91. // Time series buckets, hour and minute sub-docs
    { _id: "20111209-1231",
    ts: ISODate("2011-12-09T00:00:00.000Z"),
    daily: 67,
    hourly: { 0: 3, 1: 14, 2: 19 ... 23: 72 },
    minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 }
    }
    // Add one to the last minute before midnight
    > db.votes.update(
    { _id: "20111209-1231",
    ts: ISODate("2011-12-09T00:00:00.000Z") },
    { $inc: { "hourly.23": 1, "minute.1439": 1 } })
    Time Series

  92. •  Sequence of key/value pairs
    •  NOT a hash map
    •  Optimized to scan quickly
    BSON Storage
    [Diagram: keys 0, 1, 2, 3, ... 1439 stored in sequence]
    What is the cost of updating the minute before midnight?

  93. •  Can skip sub-documents
    BSON Storage
    [Diagram: minute keys 0 ... 1439 grouped under hour sub-documents 0 ... 23]
    How could this change the schema?

  94. Use more of a Tree structure by nesting!
    // Time series buckets, each hour a sub-document
    { _id: "20111209-1231",
    ts: ISODate("2011-12-09T00:00:00.000Z"),
    daily: 67,
    minute: { 0: { 0: 0, 1: 7, ... 59: 2 },
    ...
    23: { 0: 15, ... 59: 6 }
    }
    }
    // Add one to the last minute before midnight
    > db.votes.update(
    { _id: "20111209-1231",
    ts: ISODate("2011-12-09T00:00:00.000Z") },
    { $inc: { "minute.23.59": 1 } })
    Time Series
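
An application has to turn an event timestamp into the field path it increments. Here is a minimal plain-JavaScript sketch of that step for both layouts. The helper names `nestedMinuteField`, `flatMinuteField`, and `incrementFor` are hypothetical, and the buckets are assumed to be keyed by UTC hour and minute.

```javascript
// Field path for the nested layout: minute.<hour>.<minute within the hour>
function nestedMinuteField(date) {
  return "minute." + date.getUTCHours() + "." + date.getUTCMinutes();
}

// Field path for the flat layout: minute.<minute of the day, 0..1439>
function flatMinuteField(date) {
  return "minute." + (date.getUTCHours() * 60 + date.getUTCMinutes());
}

// Update document that increments the counter for the minute containing `date`.
function incrementFor(date) {
  return { $inc: { [nestedMinuteField(date)]: 1 } };
}

const lastMinute = new Date("2011-12-09T23:59:30.000Z");
console.log(flatMinuteField(lastMinute));
console.log(JSON.stringify(incrementFor(lastMinute)));
```

If keys are scanned in order, as the BSON Storage slides suggest, the flat layout may pass up to 1,439 keys before reaching the last minute of the day, while the nested layout passes at most 23 hour keys and then 59 minute keys.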

  95. Document to represent a shopping order:
    { _id: 1234,
    ts: ISODate("2011-12-09T00:00:00.000Z"),
    customerId: 67,
    total_price: 1050,
    items: [{ sku: 123, quantity: 2, price: 50,
    name: "macbook", thumbnail: "macbook.png" },
    { sku: 234, quantity: 1, price: 20,
    name: "iphone", thumbnail: "iphone.png" },
    ...
    ]
    }
    The item information is duplicated in every order that references it.
    Mongo’s flexible schema makes it easy!
    Duplicate data

  96. Pros:
    only 1 query to get all information needed to display the order
    processing on the db is as fast as a BLOB
    can achieve much higher performance
    Cons:
    more storage used ... cheap enough
    updates are much more complicated ... just consider fields immutable
    Duplicate data

  97. Basic data design principles stay the same ...
    But MongoDB is more flexible and brings possibilities
    embed or duplicate data to speed up operations, cut down the number of collections and indexes
    watch for documents growing too large
    make sure to use the proper indexes for querying and sorting
    schema should feel natural to your application!
    Summary

  98. @mongodb
    conferences, appearances, and meetups
    http://www.10gen.com/events
    http://bit.ly/mongo_
    Facebook | Twitter | LinkedIn
    http://linkd.in/joinmongo
    download at mongodb.org

  99. 99
    99
    Replication and Replica Sets

  100. 100
    100
    Why Have Replication?

  101. 101
    101
    •  High Availability (auto-failover)
    •  Read Scaling (extra copies to read from)
    •  Backups
    –  Online, Delayed Copy (fat finger)
    –  Point in Time (PiT) backups
    •  Use (hidden) replica for secondary workload
    –  Analytics
    –  Data-processing
    –  Integration with external systems

  102. 102
    102
    Planned
    –  Hardware upgrade
    –  O/S or file-system tuning
    –  Relocation of data to new file-system / storage
    –  Software upgrade
    Unplanned
    –  Hardware failure
    –  Data center failure
    –  Region outage
    –  Human error
    –  Application corruption

  103. 103
    103
    •  A cluster of N servers
    •  All writes to primary
    •  Reads can be to primary (default) or a
    secondary
    •  Any (one) node can be primary
    •  Consensus election of primary
    •  Automatic failover
    •  Automatic recovery

  104. 104
    104
    •  Replica Set is made up of 2 or more
    nodes
    Member 1
    Member 2
    Member 3

  105. 105
    105
    •  Election establishes the PRIMARY
    •  Data replication from PRIMARY to SECONDARY
    Member 1
    Member 2
    Primary
    Member 3

  106. 106
    106
    •  PRIMARY may fail
    •  Automatic election of new PRIMARY
    if majority exists
    Member 1
    Member 2
    DOWN
    Member 3
    negotiate
    new master

  107. 107
    107
    Member 1
    Member 2
    DOWN
    Member 3
    Primary
    negotiate
    new master
    •  New PRIMARY elected
    •  Replica Set re-established

  108. 108
    108
    •  Automatic recovery
    Member 1
    Member 3

    Primary
    Member 2
    Recovering

  109. 109
    109
    •  Replica Set re-established
    Member 1
    Member 3

    Primary
    Member 2

  110. 110
    110
    Understanding automatic
    failover

  111. 111
    111
    Primary
    Secondary
    Secondary
    As long as a partition
    can see a majority
    (>50%) of the cluster,
    then it will elect a
    primary.
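
The majority rule above is simple arithmetic. This is a minimal sketch in plain JavaScript, assuming every member of the set carries exactly one vote; the function name `canElectPrimary` is hypothetical.

```javascript
// A partition can elect a primary only if it sees strictly more than half
// of the set's members (assuming one vote per member).
function canElectPrimary(visibleMembers, totalMembers) {
  return visibleMembers > totalMembers / 2;
}

console.log(canElectPrimary(2, 3)); // 2 of 3 visible
console.log(canElectPrimary(1, 3)); // 1 of 3 visible
console.log(canElectPrimary(2, 4)); // 2 of 4 visible
```

Two of three members is a majority, one of three is not, and two of four is exactly half, which is also not a majority. That last case is why an even split across two sites leaves both halves without a primary.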

  112. 112
    112
    Primary
    Failed
    Node
    Secondary
    66% of cluster visible.
    Primary is elected

  113. 113
    113
    Failed
    Node
    33% of cluster visible.
    Read only mode.
    Failed
    Node
    Secondary

  114. 114
    114
    Primary
    Secondary
    Secondary

  115. 115
    115
    Primary
    Secondary
    Secondary
    Primary
    Failed
    Node
    Secondary
    66% of cluster visible
    Primary is elected

  116. 116
    116
    Secondary
    33% of cluster visible
    Read only mode.
    Primary
    Secondary
    Failed
    Node
    Failed
    Node
    Secondary

  117. 117
    117
    Primary
    Secondary
    Secondary
    Secondary

  118. 118
    118
    Primary
    Secondary
    Secondary
    Secondary
    Failed
    Node
    Secondary
    Failed
    Node
    50% of cluster visible
    Read only mode.
    Secondary

  119. 119
    119
    Primary
    Secondary
    Failed
    Node
    Secondary
    Failed
    Node
    50% of cluster visible
    Read only mode.
    Secondary
    Secondary
    Secondary

  120. 120
    120
    Avoid single points of
    failure

  121. 121
    121

  122. 122
    122
    Primary
    Secondary
    Secondary
    Top of rack switch
    Rack falls over

    View Slide

  123. 123
    123
    Primary
    Secondary
    Secondary
    Loss of internet
    Building burns down

  124. 124
    124
    Primary
    Secondary
    Secondary
    San Francisco
    Dallas

  125. 125
    125
    Primary
    Secondary
    Secondary
    San Francisco
    Dallas
    Priority 1
    Priority 1
    Priority 0
    Disaster recovery data center. Will
    never become primary
    automatically.

  126. 126
    126
    Primary
    Secondary
    Secondary
    San Francisco
    Dallas
    New York

  127. 127
    127
    Fast recovery

  128. 128
    128
    Primary
    Arbiter
    Secondary
    Is this a good idea?

    View Slide

  129. 129
    129
    Primary
    Arbiter
    Secondary
    1

    View Slide

  130. 130
    130
    Primary
    Arbiter
    Secondary
    Primary
    Arbiter
    Secondary
    1 2

    View Slide

  131. 131
    131
    Primary
    Arbiter
    Secondary
    Primary
    Arbiter
    Secondary
    1 2
    Primary
    Arbiter
    Secondary
    3
    Secondary
    Full Sync
    Uh oh. Full Sync is going to use
    a lot of resources on the
    primary. So I may have
    downtime or degraded
    performance.

    View Slide

  132. 132
    132
    Primary
    Secondary
    1
    Secondary

    View Slide

  133. 133
    133
    Primary
    Secondary
    Primary
    Secondary
    1 2
    Secondary Secondary

    View Slide

  134. 134
    134
    Primary
    Secondary
    Primary
    Secondary
    1 2
    Primary
    Secondary
    3
    Secondary
    Full Sync
    Sync can happen from
    secondary, which will not impact
    traffic on Primary.
    Secondary Secondary Secondary

    View Slide

  135. 135
    135
    •  Avoid single points of failure
    – Separate racks
    – Separate data centers
    •  Avoid long recovery downtime
    – Use journaling
    – Use 3+ replicas
    •  Keep your actives close
    – Use priority to control where failovers happen

    View Slide

  136. 136
    136
    Q&A after this session

    View Slide

  137. 137
    137
    Introducing MongoDB into your
    Organization

    View Slide

  138. 138
    138
    Introducing MongoDB into
    your Organization
    Edouard Servan-Schreiber, Ph.D.
    Director for Solution Architecture
    [email protected]
    @edouardss

    View Slide

  139. 139
    139
    •  You are using, or want to use, MongoDB
    –  What benefits?
    –  Potential use cases
    –  Steering the adoption of MongoDB
    •  Why is MongoDB safe?
    –  Execution
    –  Operational
    –  Financial
    •  Why 10gen?
    –  People
    –  Company
    –  Future

    View Slide

  140. 140
    140
    Your First MongoDB Project

    View Slide

  141. 141
    141
    Big Data!
    New
    Programming
    models
    New Hardware
    Architecture

    View Slide

  142. 142
    142
    Horizontally Scalable
    { author: "roger",
    date: new Date(),
    text: "Spirited Away",
    tags: ["Tezuka", "Manga"]}
    Document
    Oriented
    High
    Performance
    -indexes
    -RAM
    Application

    View Slide

  143. 143
    143
    User Data Management
    High Volume Data Feeds
    Content Management
    Operational Intelligence
    Product Data Mgt

    View Slide

  144. 144
    144
    •  “NoSQL databases are proving
    valuable for scaling out cloud and
    on-premises uses of numerous content
    types, and document-oriented
    open-source solutions are emerging as one
    of the leading choices.”

    View Slide

  145. 145
    145
    •  Reassuring the Ops Team
    •  Reassuring the Business Team
    •  Start with low stakes – learn to trust
    •  Grow towards a mission critical use case
    •  LET US HELP YOU! → [email protected]

    View Slide

  146. 146
    146
    Execution

    View Slide

  147. 147
    147

    View Slide

  148. 148
    148
    {
      _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "roger",
      date : "Sat Jul 24 2010 19:47:11",
      text : "Spirited Away",
      tags : [ "Tezuka", "Manga" ],
      comments : [
        { author : "Fred",
          date : "Sat Jul 24 2010 20:51:03",
          text : "Best Movie Ever" },
        { author : "Bill",
          date : "Sat Jul 24 2010 21:13:23",
          text : "No Way !!" }
      ]
    }

    View Slide

  149. Iteration

    View Slide

  150. 150
    150
    •  Start  
    •  Develop  
    •  Scale  

    View Slide

  151. 151
    151
    Operational

    View Slide

  152. 152
    152
    •  Elastic capacity
    •  Data center outages
    •  Upgrading DB versions
    •  Upgrade App versions
    •  Change/Evolve schema/representation

    View Slide

  153. 153
    153
    •  Data  Durability    
    –  Journal  
    –  Replicated  Writes  
    •  Data  Consistency  
    –  Single  Master  
    –  Shard  to  Scale  
    •  YOU  are  in  control!  
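
    One way to read "YOU are in control" is that the application chooses, per operation, how durable a write must be before it is acknowledged. The sketch below builds a `getLastError` command document, which was the acknowledgement mechanism in MongoDB releases of this era (later releases attach a write concern to the write itself). The helper name `durabilityCommand` and its option names are hypothetical; `j`, `w`, and `wtimeout` are the real command fields.

    ```javascript
    // Sketch: build a getLastError command document from durability choices.
    // durabilityCommand and its option names are illustrative, not a MongoDB API.
    function durabilityCommand(options) {
      const command = { getLastError: 1 };
      if (options.journal) {
        command.j = true; // wait until the write is in the on-disk journal
      }
      if (options.replicas !== undefined) {
        command.w = options.replicas; // a member count, or "majority"
      }
      if (options.timeoutMs !== undefined) {
        command.wtimeout = options.timeoutMs; // stop waiting for replication after this many ms
      }
      return command;
    }

    const cmd = durabilityCommand({ journal: true, replicas: "majority", timeoutMs: 5000 });
    console.log(JSON.stringify(cmd));
    // In the mongo shell of that era this would be issued right after a write:
    //   db.runCommand(cmd)
    ```

    Leaving out both options gives the fastest acknowledgement; adding `j` and `w` trades latency for the journal and replicated-write guarantees listed on the slide.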

    View Slide

  154. 154
    154
    •  Millions  of  IO  ops/sec  
    •  Petabytes  of  data  
    •  Commodity  hardware  –  Virtual  hardware  

    View Slide

  155. 155
    155
    Economics

    View Slide

  156. 156
    156
    •  Less code
    •  More productive coding
    •  Easier to maintain
    •  Contingency plans for turnover
    •  Commodity hardware
    •  No upfront license, pay for value over time
    •  Cost visibility for growth of usage

    View Slide

  157. 157
    157
    Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire
    text corpus – 3.5T of data in 20 billion records
    Problem
    •  Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
    •  Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
    •  Initially launched entirely on MySQL but quickly hit performance roadblocks
    Why MongoDB
    •  Migrated 5 billion records in a single day with zero downtime
    •  MongoDB powers every website request: 20m API calls per day
    •  Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error
    Impact
    •  Reduced code by 75% compared to MySQL
    •  Fetch time cut from 400ms to 60ms
    •  Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
    •  Significant cost savings and 15% reduction in servers
    “Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.”
    Tony Tam, Vice President of Engineering and Technical Co-founder
    View Slide

  158. 158
    158
    Why 10gen?

    View Slide

  159. 159
    159
    Dwight Merriman – CEO
    Founder, CTO DoubleClick
    Max Schireson – President
    COO MarkLogic
    9 Years at Oracle
    Eliot Horowitz – CTO
    Co-founder of Shopwiki,
    DoubleClick
    Erik Frieberg – VP Marketing
    HP Software, Borland, BEA
    Ben Sabrin – VP of Sales
    VP of Sales at JBoss, over 9
    years of Open Source
    experience

    View Slide

  160. 160
    160
    •  Community  and  Commercial  
    •  Dedicated  support  staff  across  the  globe  
    –  NY  
    –  CA  
    –  Dublin  
    –  London  
    –  Australia  

    View Slide

  161. 161
    161
    •  Union  Square  Ventures  
    •  Sequoia  Capital  
    •  Flybridge  Capital  
    •  NEA  
    •  $80M  raised  overall  
    •  Most  recent  round:  $42M  in  May…  

    View Slide

  162. 162
    162
    What’s in store…

    View Slide

  163. 163
    163
    •  Authentication
    •  Data encryption
    –  At rest
    –  In flight
    •  Full Text Search
    •  Global Database lock?
    •  Monitoring

    View Slide

  164. 164
    164
    Version 2.2 (now)
    •  Database level locking
    •  Aggregation Framework
    •  TTL collections
    •  Geo-aware sharding
    •  Read Preferences
    Version 2.4 (Q4 2012)
    •  Kerberos/LDAP authentication
    •  Collection level locking
    •  Full Text Search
    •  Improved Aggregation Framework

    View Slide

  165. 165
    165
    [email protected]
    Easy to start
    Easy to develop
    Easy to scale

    View Slide