
DIBI workshop intro

rozza
October 07, 2013

Transcript

  1. Agenda
    •  Welcome!
    •  Who’s who?
    •  Introduction to MongoDB
    •  Tutorial & exercises
    •  High Availability tutorial if time
  2. About MongoDB
    •  Background
      –  Founded in 2007
      –  First release of MongoDB in 2009
      –  $231M in funding
    •  MongoDB
      –  Core server
      –  Native drivers
    •  Subscriptions, Consulting, Training
    •  Monitoring (MMS)
  3. RDBMS Strengths
    •  Data stored is very compact
    •  Rigid schemas have led to powerful query capabilities
    •  Data is optimised for joins and storage
    •  Robust ecosystem of tools, libraries, integrations
    •  40+ years old!
  4. Enter “Big Data”
    •  Gartner defines it with 3Vs
    •  Volume
      –  Vast amounts of data being collected
    •  Variety
      –  Evolving data
      –  Uncontrolled formats, no single schema
      –  Unknown at design time
    •  Velocity
      –  Inbound data speed
      –  Fast read/write operations
      –  Low latency
  5. Mapping Big Data to RDBMS
    •  Difficult to store uncontrolled data formats
    •  Scaling via big iron or custom data marts/partitioning schemes
    •  Schema must be known at design time
    •  Impedance mismatch with agile development and deployment techniques
    •  Doesn’t map well to native language constructs
  6. Goals
    •  Scale horizontally over commodity systems
    •  Incorporate what works for RDBMSs
      –  Rich data models, ad-hoc queries, full indexes
    •  Drop what doesn’t work well
      –  Multi-row transactions, complex joins
    •  Do not homogenize APIs
    •  Match agile development and deployment workflows
  7. Key Features
    •  Data stored as documents (JSON)
      –  Schema-free
    •  Full CRUD support (Create, Read, Update, Delete)
      –  Atomic in-place updates
      –  Ad-hoc queries: Equality, RegEx, Ranges, Geospatial
    •  Secondary indexes
    •  Replication – redundancy, failover
    •  Sharding – partitioning for read/write scalability
  8. Document Oriented, Schema Free
    {name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill"], gender: "Male", boss: "ben"}
    {name: "tina", birthplace: "NCE", boss: "ben"}
    {name: "ross", boss: "ben"}
    {name: "ben", hat: "yes"}
    {name: "matt", pizza: "DiGiorno", age: 28}
  9. Extent allocation
    [Diagram: data files foo.0, foo.1, foo.2, ending in zero-filled preallocated space]
    •  Allocated per namespace: foo.$freelist, foo.baz, foo.bar, foo.test
    •  ns details stored in foo.ns
  10. Disk seeks and data locality
    •  Seek = 5+ ms
    •  Read = really really fast
    [Diagram: User, Comment, Article]
  11. MongoDB Security
    •  SSL
      –  Between your app and MongoDB
      –  Between nodes in MongoDB cluster
    •  Authorization at the database level
      –  Read Only / Read + Write / Administrator
    •  Roadmap
      –  2.4: Pluggable Authentication
      –  2.6: Cell level security
  12. _id
    •  _id is the primary key in MongoDB
    •  Automatically indexed
    •  Automatically created as an ObjectId if not provided
    •  Any unique immutable value could be used
  13. ObjectId
    •  ObjectId is a special 12-byte value
    •  Guaranteed to be unique across your cluster
    ObjectId("50ed3c5cab4ef39dc735664b")
              |------||----||--||----|
                 ts    mac  pid  inc
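The field layout on the ObjectId slide can be checked by slicing the hex string by hand. This is a minimal sketch in plain JavaScript, assuming the 4/3/2/3-byte split (ts, mac, pid, inc) shown on the slide; `decodeObjectId` is a hypothetical helper written for this illustration, not a MongoDB shell function.

```javascript
// Hypothetical helper: split a 24-character ObjectId hex string into the
// four fields named on the slide (ts, mac, pid, inc).
function decodeObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4 bytes: Unix timestamp in seconds
  return {
    ts: new Date(seconds * 1000),                // creation time as a Date
    mac: hex.slice(8, 14),                       // 3 bytes: machine identifier (kept as hex)
    pid: parseInt(hex.slice(14, 18), 16),        // 2 bytes: process id
    inc: parseInt(hex.slice(18, 24), 16),        // 3 bytes: incrementing counter
  };
}

const decoded = decodeObjectId("50ed3c5cab4ef39dc735664b");
console.log(decoded.ts.toISOString(), decoded.mac, decoded.pid, decoded.inc);
```

Because the timestamp comes first, ObjectIds created later compare as larger, so sorting on _id roughly follows insertion order.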
  14. Query Operators
    // find users with any tags
    > db.users.find( {tags: {$exists: true }} )
    // find users matching a regular expression
    > db.users.find( {username: /^ro*/i } )
    // count users with the username "Ross"
    > db.users.find( {username: "Ross"} ).count()
    Conditional Operators
      –  $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
      –  $lt, $lte, $gt, $gte
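To make the meaning of a few of these conditional operators concrete without a running server, here is a rough sketch in plain JavaScript. It is a simplification, not how MongoDB evaluates queries: `matches` and the `operators` table are hypothetical names, only top-level fields are handled, and the sample documents (including the ages) are made up for the illustration.

```javascript
// Hypothetical in-memory versions of some operators from the slide.
const operators = {
  $exists: (value, wanted) => (value !== undefined) === wanted,
  $ne: (value, target) => value !== target,
  $in: (value, list) => list.includes(value),
  $nin: (value, list) => !list.includes(value),
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
};

// Return true when every field condition in `query` holds for `doc`.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
      // Operator document such as {$gte: 25, $lt: 40}: all operators must hold.
      return Object.entries(condition).every(([op, arg]) => {
        if (!(op in operators)) throw new Error("unsupported operator: " + op);
        return operators[op](value, arg);
      });
    }
    return value === condition; // plain equality
  });
}

// Made-up sample data.
const users = [
  { username: "ross", age: 30, tags: ["superuser"] },
  { username: "Rita", age: 25 },
  { username: "ben", age: 41 },
];

const tagged = users.filter((u) => matches(u, { tags: { $exists: true } }));
const startsWithR = users.filter((u) => matches(u, { username: /^ro*/i }));
const inRange = users.filter((u) => matches(u, { age: { $gte: 25, $lt: 40 } }));
console.log(tagged.length, startsWithR.length, inRange.length);
```

Note that `/^ro*/i` matches any username beginning with "r" or "R", since `o*` also accepts zero occurrences of "o".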
  15. Update
    > tags = ["superuser", "db_admin"]
    > address = { street: "Scrutton Street", city: "London" }
    > db.users.update({}, {"$pushAll": {"tags": tags}, "$set": {"address": address}, "$inc": {"tag_count": 2}})
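The update on this slide combines three operators in one call. As a rough illustration of what each one does to a document, the sketch below applies them to a plain JavaScript object. `applyUpdate` is a hypothetical helper, not the server's implementation: it ignores dotted field paths and every other operator, and the starting document is an assumption.

```javascript
// Hypothetical helper that mimics $set, $inc and $pushAll on a plain object.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a deep copy
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;                                    // overwrite or create the field
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount;            // a missing field counts as 0
  }
  for (const [field, values] of Object.entries(update.$pushAll || {})) {
    result[field] = (result[field] || []).concat(values);     // append every value
  }
  return result;
}

const before = { username: "ross" }; // assumed starting document
const after = applyUpdate(before, {
  $pushAll: { tags: ["superuser", "db_admin"] },
  $set: { address: { street: "Scrutton Street", city: "London" } },
  $inc: { tag_count: 2 },
});
console.log(JSON.stringify(after, null, 2));
```

On the server these changes are applied to the matched document as a single atomic operation, which is what the copy-and-return style here cannot show.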
  16. Read (Query)
    > db.users.findOne()
    {
      "_id" : ObjectId("50ed3c5cab4ef39dc735664b"),
      "address" : { "street" : "Zetland House", "city" : "London" },
      "first_name" : "Ross",
      "last_name" : "Lawley",
      "tag_count" : 2,
      "tags" : [ "superuser", "db_admin" ],
      "username" : "ross"
    }
  17. Atomic operators
    •  Scalar
      –  $set, $unset, $inc
    •  Array
      –  $push, $pushAll, $pull, $pullAll, $addToSet
  18. Secondary Indexes
    // 1 means ascending, -1 means descending
    > db.users.ensureIndex({username: 1})
    > db.users.find({username: "ross"}).explain()
    // Multi-key indexes
    > db.users.ensureIndex({tags: 1})
    // index nested field
    > db.users.ensureIndex({"address.city": 1})
    // Compound indexes
    > db.users.ensureIndex({ "username": 1, "address.city": 1 })
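The index on `tags` is called multi-key because an index on an array field holds one entry per array element. The sketch below illustrates that idea in plain JavaScript with made-up documents. `buildIndex` and `lookup` are hypothetical helpers; the real index is a B-tree maintained by the server, not a sorted array.

```javascript
// Hypothetical helper: build ascending index entries for one field.
// An array value contributes one entry per element (the multi-key case).
function buildIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      entries.push({ key: key, id: doc._id });
    }
  }
  // Ascending key order, corresponding to the `1` in ensureIndex({tags: 1}).
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return entries;
}

// Hypothetical helper: return the _id of every document indexed under `key`.
function lookup(index, key) {
  return index.filter((entry) => entry.key === key).map((entry) => entry.id);
}

// Made-up sample data.
const sampleUsers = [
  { _id: 1, username: "ross", tags: ["superuser", "db_admin"] },
  { _id: 2, username: "ben", tags: ["db_admin"] },
];

const tagIndex = buildIndex(sampleUsers, "tags");
console.log(tagIndex.map((entry) => entry.key));
console.log(lookup(tagIndex, "db_admin"));
```

A query such as `db.users.find({tags: "db_admin"})` can then be answered from the index entries for that one key instead of scanning every document.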