DIBI workshop intro

rozza
October 07, 2013

Transcript

  1. Agenda
     •  Welcome!
     •  Who’s who?
     •  Introduction to MongoDB
     •  Tutorial & exercises
     •  High Availability tutorial if time
  2. About MongoDB
     •  Background
        –  Founded in 2007
        –  First release of MongoDB in 2009
        –  $231M in funding
     •  MongoDB
        –  Core server
        –  Native drivers
     •  Subscriptions, Consulting, Training
     •  Monitoring (MMS)
  3. RDBMS Strengths
     •  Data stored is very compact
     •  Rigid schemas have led to powerful query capabilities
     •  Data is optimised for joins and storage
     •  Robust ecosystem of tools, libraries, integrations
     •  40+ years old!
  4. Enter “Big Data”
     •  Gartner defines it with 3Vs
     •  Volume
        –  Vast amounts of data being collected
     •  Variety
        –  Evolving data
        –  Uncontrolled formats, no single schema
        –  Unknown at design time
     •  Velocity
        –  Inbound data speed
        –  Fast read/write operations
        –  Low latency
  5. Mapping Big Data to RDBMS
     •  Difficult to store uncontrolled data formats
     •  Scaling via big iron or custom data marts/partitioning schemes
     •  Schema must be known at design time
     •  Impedance mismatch with agile development and deployment techniques
     •  Doesn’t map well to native language constructs
  6. Goals
     •  Scale horizontally over commodity systems
     •  Incorporate what works for RDBMSs
        –  Rich data models, ad-hoc queries, full indexes
     •  Drop what doesn’t work well
        –  Multi-row transactions, complex joins
     •  Do not homogenize APIs
     •  Match agile development and deployment workflows
  7. Key Features
     •  Data stored as documents (JSON)
        –  Schema-free
     •  Full CRUD support (Create, Read, Update, Delete)
        –  Atomic in-place updates
        –  Ad-hoc queries: Equality, RegEx, Ranges, Geospatial
     •  Secondary indexes
     •  Replication – redundancy, failover
     •  Sharding – partitioning for read/write scalability
  8. Document Oriented, Schema Free
     {name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill"], gender: "Male", boss: "ben"}
     {name: "tina", birthplace: "NCE", boss: "ben"}
     {name: "ross", boss: "ben"}
     {name: "ben", hat: "yes"}
     {name: "matt", pizza: "DiGiorno", age: 28}
  9. Extent allocation
     [Diagram: data files foo.0, foo.1, foo.2, with zero-filled preallocated space; extents allocated per namespace (foo.bar, foo.baz, foo.test, foo.$freelist); ns details stored in foo.ns]
  10. Disk seeks and data locality
      •  Seek = 5+ ms
      •  Read = really really fast
      [Diagram labels: User, Comment, Article]
  11. MongoDB Security
      •  SSL
         –  Between your app and MongoDB
         –  Between nodes in MongoDB cluster
      •  Authorization at the database level
         –  Read Only / Read + Write / Administrator
      •  Roadmap
         –  2.4: Pluggable Authentication
         –  2.6: Cell level security
  12. _id
      •  _id is the primary key in MongoDB
      •  Automatically indexed
      •  Automatically created as an ObjectId if not provided
      •  Any unique immutable value could be used
  13. ObjectId
      •  ObjectId is a special 12-byte value
      •  Guaranteed to be unique across your cluster
      ObjectId("50ed3c5cab4ef39dc735664b")
                |------||----||--||----|
                   ts    mac  pid  inc
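The ts / mac / pid / inc layout on the slide can be made concrete by slicing the hex string by hand. The following is a minimal sketch in plain JavaScript, assuming the four-field layout shown above; `parseObjectId` is a hypothetical helper written for illustration, not part of the MongoDB shell or drivers.

```javascript
// Split a 24-character ObjectId hex string into the four fields from the slide.
// parseObjectId is a hypothetical helper, not a MongoDB API.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  // First 4 bytes: creation time in seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    timestamp: new Date(seconds * 1000),
    machine: hex.slice(8, 14),            // 3-byte machine identifier (kept as hex)
    pid: parseInt(hex.slice(14, 18), 16), // 2-byte process id
    inc: parseInt(hex.slice(18, 24), 16), // 3-byte counter
  };
}

const parts = parseObjectId("50ed3c5cab4ef39dc735664b");
console.log(parts.timestamp.toISOString()); // 2013-01-09T09:46:04.000Z
console.log(parts.machine, parts.pid, parts.inc);
```

Because the timestamp comes first, ObjectIds created later sort after earlier ones, at one-second granularity.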
  14. Query Operators
      // find users with any tags
      > db.users.find( {tags: {$exists: true }} )
      // find users matching a regular expression
      > db.users.find( {username: /^ro*/i } )
      // count users with a given username
      > db.users.find( {username: "Ross"} ).count()
      Conditional Operators
      –  $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
      –  $lt, $lte, $gt, $gte
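To get a feel for what these query documents mean without a running server, here is a rough in-memory sketch in plain JavaScript. The `users` array, the `matches` function and the `find` wrapper are all assumptions made up for this illustration; they cover only a few operators on top-level scalar fields and are not how MongoDB evaluates queries internally.

```javascript
// Made-up documents standing in for db.users.
const users = [
  { username: "ross", tags: ["superuser", "db_admin"], age: 30 },
  { username: "Robert", age: 17 },
  { username: "tina", tags: [] },
];

// Hypothetical matcher for a small subset of operators; illustration only.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    // A regular expression condition is tested against string values.
    if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
    // A plain value means equality.
    if (cond === null || typeof cond !== "object") return value === cond;
    // Otherwise every operator in the condition document must hold.
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (field in doc) === arg;
        case "$gt": return value > arg;
        case "$gte": return value >= arg;
        case "$lt": return value < arg;
        case "$lte": return value <= arg;
        case "$ne": return value !== arg;
        case "$in": return arg.includes(value);
        default: throw new Error("unsupported operator: " + op);
      }
    });
  });
}

const find = (query) => users.filter((doc) => matches(doc, query));

console.log(find({ tags: { $exists: true } }).length); // 2 (an empty array still exists)
console.log(find({ username: /^ro*/i }).map((u) => u.username)); // [ 'ross', 'Robert' ]
console.log(find({ age: { $gte: 18 } }).length); // 1
```

Note that `/^ro*/i` matches any username starting with "r", since `o*` allows zero occurrences of "o".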
  15. Update
      > tags = ["superuser", "db_admin"]
      > address = { street: "Scrutton Street", city: "London" }
      > db.users.update({}, {"$pushAll": {"tags": tags}, "$set": {"address": address}, "$inc": {"tag_count": 2}})
  16. Read (Query)
      > db.users.findOne()
      {
        "_id" : ObjectId("50ed3c5cab4ef39dc735664b"),
        "address" : { "street" : "Zetland House", "city" : "London" },
        "first_name" : "Ross",
        "last_name" : "Lawley",
        "tag_count" : 2,
        "tags" : [ "superuser", "db_admin" ],
        "username" : "ross"
      }
  17. Atomic operators
      •  Scalar
         –  $set, $unset, $inc
      •  Array
         –  $push, $pushAll, $pull, $pullAll, $addToSet
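The effect of these operators on a single document can be sketched with a plain JavaScript object. `applyUpdate` below is a hypothetical helper written for this illustration: it only handles top-level fields, it says nothing about atomicity, and it is not MongoDB's implementation. It is meant only to show what shape a document ends up in after an update like the one on the Update slide.

```javascript
// Hypothetical helper mimicking a handful of update operators on a plain object.
// Top-level fields only; illustration, not MongoDB's implementation.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set": doc[field] = arg; break;
        case "$unset": delete doc[field]; break;
        case "$inc": doc[field] = (doc[field] || 0) + arg; break;
        case "$push": (doc[field] = doc[field] || []).push(arg); break;
        case "$pushAll": (doc[field] = doc[field] || []).push(...arg); break;
        case "$pull":
          // Remove every element equal to arg; a missing field is left alone.
          if (Array.isArray(doc[field])) doc[field] = doc[field].filter((v) => v !== arg);
          break;
        case "$addToSet":
          // Append only if the value is not already present.
          doc[field] = doc[field] || [];
          if (!doc[field].includes(arg)) doc[field].push(arg);
          break;
        default: throw new Error("unsupported operator: " + op);
      }
    }
  }
  return doc;
}

const user = { username: "ross" };
applyUpdate(user, {
  $pushAll: { tags: ["superuser", "db_admin"] },
  $set: { address: { street: "Scrutton Street", city: "London" } },
  $inc: { tag_count: 2 },
});
applyUpdate(user, { $addToSet: { tags: "superuser" } }); // already present, no change
console.log(user);
```

The second call shows the difference between `$push` and `$addToSet`: the former would have appended a duplicate "superuser" tag.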
  18. Secondary Indexes
      // 1 means ascending, -1 means descending
      > db.users.ensureIndex({username: 1})
      > db.users.find({username: "ross"}).explain()
      // Multi-key indexes
      > db.users.ensureIndex({tags: 1})
      // index nested field
      > db.users.ensureIndex({"address.city": 1})
      // Compound indexes
      > db.users.ensureIndex({ "username": 1, "address.city": 1 })
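As a rough mental model of the multi-key and nested-field indexes above, the sketch below builds a lookup table from index key to document `_id`s in plain JavaScript. The `docs` array, `getPath` and `buildIndex` are assumptions invented for this illustration. A real index is an ordered structure maintained by the server, and this sketch simply skips documents that lack the field, so treat it as a picture of the idea rather than of the implementation.

```javascript
// Made-up documents standing in for the users collection.
const docs = [
  { _id: 1, username: "ross", tags: ["superuser", "db_admin"], address: { city: "London" } },
  { _id: 2, username: "tina", tags: ["db_admin"], address: { city: "Nice" } },
  { _id: 3, username: "ben", address: { city: "London" } },
];

// Resolve a dotted path such as "address.city" against a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Hypothetical index builder: maps each key to the _ids of documents holding it.
// An array value contributes one entry per element, which is the multi-key idea.
function buildIndex(collection, path) {
  const entries = new Map();
  for (const doc of collection) {
    const value = getPath(doc, path);
    if (value === undefined) continue; // simplification: missing fields are skipped
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      if (!entries.has(key)) entries.set(key, []);
      entries.get(key).push(doc._id);
    }
  }
  return entries;
}

const byTag = buildIndex(docs, "tags");
const byCity = buildIndex(docs, "address.city");
console.log(byTag.get("db_admin")); // [ 1, 2 ]
console.log(byCity.get("London")); // [ 1, 3 ]
```

A lookup by tag or city then touches only the matching entries instead of scanning every document, which is the kind of difference `explain()` lets you inspect on a real collection.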