Webinar Intro to Schema Design

mongodb
August 09, 2012

MongoDB has been designed for versatility, but the techniques you might use to build, say, an analytics engine or a hierarchical data store might not be obvious. In this talk, we'll learn about MongoDB in practice by looking at hypothetical application designs (based on real-world designs, of course). Topics to be covered include schema design, indexing, transactions (gasp!), trees, what's fast, and what's not. Sprinkled with tips, tricks, chutes, ladders, and trap doors, you're guaranteed to learn something new in this interdisciplinary talk.


Transcript

  1. Schema Design by Example

    Kevin Hanson
    Solutions Architect
    @hungarianhc
    [email protected]
    Audio should start immediately when you log into the event via Audio Broadcast.
    If you are having issues connecting, please dial
    1-877-668-4493 or +1-408-600-3600. Access code: 667 326 336
    Global dial-in numbers can be found on the Event Info tab of your WebEx Event Center screen. There is
    a Q&A following the talk. Please enter all questions in the WebEx chat box. A recording of the webinar
    will be available 24 hours after the event is complete.

  2. Agenda
    •  MongoDB Data Model
    •  Blog Posts & Comments
    •  Geospatial Check-Ins
    •  Food For Thought

  3. MongoDB Data Model: Rich Documents
    {
      title: "Who Needs Rows?",
      reasons: [
        { name: "scalability",
          desc: "no more joins!" },
        { name: "human readable",
          desc: "ah this is nice..." }
      ],
      model: {
        relational: false,
        awesome: true
      }
    }

  4. Embedded Documents
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "roger",
      date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
      text : "Spirited Away",
      tags : [ "Tezuka", "Manga" ],
      comments : [
        {
          author : "Fred",
          date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
          text : "Best Movie Ever"
        }
      ]}

  5. Parallels
    RDBMS           MongoDB
    Table           Collection
    Row             Document
    Column          Field
    Index           Index
    Join            Embedding & Linking
    Schema Object

  6. Relational
    User
    • Name
    • Email Address
    Category
    • Name
    • Url
    Article
    • Name
    • Slug
    • Publish date
    • Text
    Tag
    • Name
    • Url
    Comment
    • Comment
    • Date
    • Author

  7. MongoDB
    User
    • Name
    • Email Address
    Article
    • Name
    • Slug
    • Publish date
    • Text
    • Author
    Tag[]
    • Value
    Comment[]
    • Comment
    • Date
    • Author
    Category[]
    • Value

  8. Blog Posts and Comments

  9. How Should the Documents Look?
    What Are We Going to Do with the Data?

  10. 1) Fully Embedded
    {
      "blog-title": "Commuting to Work",
      "blog-text": [
        "This section is about airplanes",
        "this section is about trains"
      ],
      comments: [
        { author: "Kevin Hanson",
          comment: "dude, what about driving?" },
        { author: "John Smith",
          comment: "this blog is aWful!!11!!!!" }
      ]
    }

  11. 1) Fully Embedded
    Pros
    •  Can query the comments or the blog for results
    •  Cleanly encapsulated
    Cons
    •  What if we get too many comments? (16MB MongoDB document size limit)
    •  What if we want our results to be comments, not blog posts?

  12. 2) Separating Blog & Comments
    {
      _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
      "comment-ref": ObjectId("4c4ba5c0672c685e5e8aabf4"),
      "blog-title": "Commuting to Work",
      "blog-text": [
        "This section is about airplanes",
        "this section is about trains"
      ]
    }
    {
      _id: ObjectId("4c4ba5c0672c685e5e8aabf4"),
      "blog-ref": ObjectId("4c4ba5c0672c685e5e8aabf3"),
      comments: [
        { author: "Kevin Hanson",
          comment: "dude, what about driving?" },
        { author: "John Smith",
          comment: "this blog is aWful!!11!!!!" }
      ]
    }

  13. 2) Separating Blog & Comments
    Pros
    •  Blog Post Size Stays Constant
    •  Can Search Sets of Comments
    Cons
    •  Too Many Comments? (same problem)
    •  Managing Document Links

  14. 3) Each Comment Gets Own Doc
    {
      "blog-title": "Commuting to Work",
      "blog-text": [
        "This section is about airplanes",
        "this section is about trains" ]
    }
    {
      commenter: "Kevin Hanson",
      comment: "dude, what about driving?"
    }
    {
      commenter: "John Smith",
      comment: "this blog is aWful!!11!!!!"
    }

  15. 3) Each Comment Gets Own Doc
    Pros
    •  Can Query Individual Comments
    •  Never Need to Worry About Doc Size
    Cons
    •  Many Documents
    •  Standard Use Cases Become Complicated

  16. Managing Arrays
    Pushing to an Array Infinitely...
    •  Document Will Grow Larger than Allocated Space
    •  Document May Exceed Max Doc Size of 16MB
    Can this be avoided??
    •  Yes!
    •  A Hybrid of Linking and Embedding
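
One way to read the hybrid suggested above is a bucketing scheme: comments stay embedded, but in a series of fixed-size bucket documents that link back to the post, so no single document grows without bound. The sketch below uses plain in-memory JavaScript rather than a live database. `BUCKET_SIZE`, `addComment`, and the `postId`/`seq` fields are hypothetical names chosen for illustration and do not come from the talk.

```javascript
// Assumed cap on comments per bucket; a real application would choose a
// larger value that keeps a full bucket far below the 16MB limit.
const BUCKET_SIZE = 3;

// buckets is an in-memory array standing in for a collection of bucket documents.
function addComment(buckets, postId, comment) {
  // Find the newest bucket that links to this post.
  let bucket = buckets.filter(b => b.postId === postId).pop();
  // Start a new bucket when there is none yet or the newest one is full.
  if (!bucket || bucket.comments.length >= BUCKET_SIZE) {
    const seq = bucket ? bucket.seq + 1 : 0;
    bucket = { postId: postId, seq: seq, comments: [] };
    buckets.push(bucket);
  }
  // Embed the comment in the bucket (the counterpart of a $push).
  bucket.comments.push(comment);
  return bucket;
}

const buckets = [];
for (let i = 1; i <= 7; i++) {
  addComment(buckets, "post-1", { author: "user" + i, comment: "comment " + i });
}
// Seven comments with a cap of three end up in buckets of 3, 3 and 1.
console.log(buckets.map(b => b.comments.length));
```

The blog post document itself stays a constant size, and a page of comments can be fetched by reading one bucket at a time.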

  17. Geospatial Check-Ins

  18. We Need 3 Things
    Places Check-Ins Users

  19. Places
    Q: Current location
    A: Places near location
    User Generated Content
    Places

  20. Inserting a Place
     
    var p = { name: "10gen HQ",
              address: "578 Broadway, 7th Floor",
              city: "New York",
              zip: "10012" }

    > db.places.save(p)

  21. Tags, Geo Coordinates, and Tips
    { name: "10gen HQ",
      address: "578 Broadway, 7th Floor",
      city: "New York",
      zip: "10012",
      tags: ["MongoDB", "business"],
      latlong: [40.0, 72.0],
      tips: [{ user: "kevin", time: "3/15/2012",
               tip: "Make sure to stop by for office hours!" }] }

  22. Updating Tips
    db.places.update({ name: "10gen HQ" },
      { $push: { tips: { user: "nosh", time: "3/15/2012",
                         tip: "stop by for office hours on Wednesdays from 4-6" } } })
     

  23. Querying Places
    ★ Creating Indexes
    db.places.ensureIndex({tags: 1})
    db.places.ensureIndex({name: 1})
    db.places.ensureIndex({latlong: "2d"})
    ★ Finding Places
    db.places.find({latlong: {$near: [40, 70]}})
    ★ Regular Expressions
    db.places.find({name: /^typeaheadstring/})
    ★ Using Tags
    db.places.find({tags: "business"})
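
The typeahead query above only behaves like a prefix search if the user's text is anchored with `^` and any regex metacharacters in it are escaped. A minimal sketch in plain JavaScript, with an in-memory array standing in for the places collection; `escapeRegex` and `prefixQuery` are hypothetical helper names, and the sample places are made up:

```javascript
// Escape characters that have special meaning in a regular expression,
// so typed text such as "a+b" is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter object of the same shape as the one on the slide:
// { name: /^typed-text/ }
function prefixQuery(typed) {
  return { name: new RegExp("^" + escapeRegex(typed)) };
}

// In-memory stand-in for db.places (sample data, not from the talk).
const places = [
  { name: "10gen HQ" },
  { name: "10th Street Cafe" },
  { name: "Cafe 10" }
];

const query = prefixQuery("10");
// Only names that start with "10" match; "Cafe 10" does not.
const matches = places.filter(p => query.name.test(p.name)).map(p => p.name);
console.log(matches);
```

An anchored, case-sensitive prefix pattern like this is also the form of regex query that can take advantage of the `name` index created above.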

  24. User Check-Ins
    Record User Check-Ins
    Check-Ins
    Users
    Stats

  25. Users
    user1 = { name: "Kevin Hanson",
              "e-mail": "[email protected]",
              "check-ins": [ ObjectId("4b97e62bf1d8c7152c9ccb74"), ObjectId("5a20e62bf1d8c736ab") ] }
    check-ins[] = ObjectId references to the Check-Ins collection

  26. Check-Ins
    checkin = { place: "10gen HQ", ts: 9/20/2010 10:12:00, userId:
    }
    Every Check-In is Two Operations
    •  Insert a Check-In Object (check-ins collection)
    •  Update ($push) user object with check-in ID (users collection)
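
The two operations listed above can be sketched as a single helper. This is an illustration in plain JavaScript with in-memory arrays, not code from the talk: `store`, `recordCheckIn`, the string ids, and the `checkIns` field name are all assumptions made for the example.

```javascript
// In-memory stand-ins for the check-ins and users collections.
const store = {
  checkins: [],
  users: [{ _id: "u1", name: "Kevin Hanson", checkIns: [] }]
};
let nextId = 1;

function recordCheckIn(userId, place, ts) {
  const user = store.users.find(u => u._id === userId);
  if (!user) throw new Error("unknown user: " + userId);

  // Operation 1: insert a check-in object into the check-ins collection.
  const checkin = { _id: "c" + nextId++, place: place, ts: ts, userId: userId };
  store.checkins.push(checkin);

  // Operation 2: push the new check-in's id onto the user's array,
  // the in-memory counterpart of an update with $push.
  user.checkIns.push(checkin._id);
  return checkin;
}

recordCheckIn("u1", "10gen HQ", new Date("2010-09-20T10:12:00Z"));
recordCheckIn("u1", "10gen HQ", new Date("2010-09-21T09:00:00Z"));
console.log(store.users[0].checkIns);
```

Against a real database these are two separate writes to two collections, so the application should be prepared for the first to succeed without the second.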

  27. Simple Stats
    db.checkins.find({place: "10gen HQ"})
    db.checkins.find({place: "10gen HQ"}).sort({ts: -1}).limit(10)
    db.checkins.find({place: "10gen HQ", ts: {$gt: midnight}}).count()
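
The last query relies on a `midnight` variable that the slide does not define. A minimal sketch of one way to compute it, assuming `ts` is stored as a Date; `startOfDay` is a hypothetical helper, and the in-memory array with its sample check-ins only stands in for the check-ins collection:

```javascript
// Return a new Date set to local midnight of the given day.
function startOfDay(now) {
  const midnight = new Date(now.getTime());
  midnight.setHours(0, 0, 0, 0);
  return midnight;
}

// A fixed "now" keeps the example reproducible: 9 Aug 2012, 14:30 local time.
const now = new Date(2012, 7, 9, 14, 30);
const midnight = startOfDay(now);

// Sample check-ins (invented data).
const checkins = [
  { place: "10gen HQ", ts: new Date(2012, 7, 8, 23, 0) },
  { place: "10gen HQ", ts: new Date(2012, 7, 9, 9, 15) },
  { place: "10gen HQ", ts: new Date(2012, 7, 9, 12, 0) },
  { place: "Elsewhere", ts: new Date(2012, 7, 9, 10, 0) }
];

// In-memory counterpart of find({place: ..., ts: {$gt: midnight}}).count().
const todayAtHq = checkins.filter(
  c => c.place === "10gen HQ" && c.ts > midnight
).length;
console.log(todayAtHq);
```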

  28. Stats w/ MapReduce
    mapFunc = function() { emit(this.place, 1); }
    reduceFunc = function(key, values) { return Array.sum(values); }
    res = db.checkins.mapReduce(mapFunc, reduceFunc,
            {query: {ts: {$gt: nowminus3hrs}}})
    res = [{_id: "10gen HQ", value: 17}, ..., ...]
    ... or try using the new aggregation framework!
    Available in MongoDB 2.2!
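
To make the map and reduce steps above concrete, here is a sketch that emulates them in plain JavaScript on an in-memory array. `runMapReduce` is a hypothetical stand-in for the server-side `mapReduce` call, `emit` is passed in as an argument instead of being a shell global, and the reducer uses `Array.prototype.reduce` because `Array.sum` is a mongo shell helper rather than standard JavaScript. The sample check-ins are invented.

```javascript
// Emulate map/reduce: call mapFunc with each document as `this`,
// group emitted values by key, then reduce each group to one value.
function runMapReduce(docs, mapFunc, reduceFunc) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFunc.call(doc, emit);
  return Array.from(groups, ([key, values]) => ({
    _id: key,
    value: reduceFunc(key, values)
  }));
}

// Same logic as the slide: emit 1 per check-in, keyed by place.
const mapFunc = function (emit) { emit(this.place, 1); };
// Sum the emitted 1s to get a count per place.
const reduceFunc = function (key, values) {
  return values.reduce((total, v) => total + v, 0);
};

// Check-ins assumed to have already passed the time filter.
const recent = [
  { place: "10gen HQ" },
  { place: "Cafe" },
  { place: "10gen HQ" }
];

const res = runMapReduce(recent, mapFunc, reduceFunc);
console.log(res);
```

The result has the same shape as the `res` array on the slide: one `{_id, value}` entry per place.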

  29. Food For Thought

  30. Data How the App Wants It
    Think About How the Application Wants the Data,
    Not How it is most “Normalized”

    Example: Our Business Cards

  31. @mongodb
    http://bit.ly/mongox
    Facebook | Twitter | LinkedIn
    http://linkd.in/joinmongo
    More info at http://www.mongodb.org/
    Kevin Hanson
    Solutions Architect, 10gen
    twitter: @hungarianhc
    [email protected]
