DIBI workshop intro - Speaker Deck

Slide 1

Slide 1 text

MongoDB technical introduction
Ross Lawley - @RossC0 #dibi13

Slide 2

Slide 2 text

Agenda
•  Welcome!
•  Who’s who?
•  Introduction to MongoDB
•  Tutorial & exercises
•  High Availability tutorial if time

Slide 3

Slide 3 text

About MongoDB
•  Background
   –  Founded in 2007
   –  First release of MongoDB in 2009
   –  $231M in funding
•  MongoDB
   –  Core server
   –  Native drivers
•  Subscriptions, Consulting, Training
•  Monitoring (MMS)

Slide 4

Slide 4 text

Relational Databases

Slide 5

Slide 5 text

Relational Databases

Slide 6

Slide 6 text

RDBMS Strengths
•  Data stored is very compact
•  Rigid schemas have led to powerful query capabilities
•  Data is optimised for joins and storage
•  Robust ecosystem of tools, libraries, integrations
•  40+ years old!

Slide 7

Slide 7 text

Enter “Big Data”
•  Gartner defines it with 3Vs
•  Volume
   –  Vast amounts of data being collected
•  Variety
   –  Evolving data
   –  Uncontrolled formats, no single schema
   –  Unknown at design time
•  Velocity
   –  Inbound data speed
   –  Fast read/write operations
   –  Low latency

Slide 8

Slide 8 text

Mapping Big Data to RDBMS
•  Difficult to store uncontrolled data formats
•  Scaling via big iron or custom data marts/partitioning schemes
•  Schema must be known at design time
•  Impedance mismatch with agile development and deployment techniques
•  Doesn’t map well to native language constructs

Slide 9

Slide 9 text

MongoDB Features

Slide 10

Slide 10 text

Goals
•  Scale horizontally over commodity systems
•  Incorporate what works for RDBMSs
   –  Rich data models, ad-hoc queries, full indexes
•  Drop what doesn’t work well
   –  Multi-row transactions, complex joins
•  Do not homogenize APIs
•  Match agile development and deployment workflows

Slide 11

Slide 11 text

Key Features
•  Data stored as documents (JSON)
   –  Schema-free
•  Full CRUD support (Create, Read, Update, Delete)
   –  Atomic in-place updates
   –  Ad-hoc queries: Equality, RegEx, Ranges, Geospatial
•  Secondary indexes
•  Replication – redundancy, failover
•  Sharding – partitioning for read/write scalability

Slide 12

Slide 12 text

Document Oriented, Schema Free
{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill"], gender: "Male", boss: "ben"}
{name: "tina", birthplace: "NCE", boss: "ben"}
{name: "ross", boss: "ben"}
{name: "ben", hat: "yes"}
{name: "matt", pizza: "DiGiorno", age: 28}

Slide 13

Slide 13 text

BSON – bsonspec.org

Slide 14

Slide 14 text

Extent allocation
[Diagram: data files foo.0, foo.1, foo.2 with zero-filled preallocated space; extents allocated per namespace (foo.$freelist, foo.baz, foo.bar, foo.test); ns details stored in foo.ns]

Slide 15

Slide 15 text

Record Allocation
[Diagram: a Deleted Record (Size, Offset, Next); a record made of Header (Size, Offset, Next, Prev), BSON Data and Padding]

Slide 16

Slide 16 text

Disk seeks and data locality
•  Seek = 5+ ms
•  Read = really really fast
[Diagram labels: User, Comment, Article]

Slide 17

Slide 17 text

Disk seeks and data locality
[Diagram labels: Article, User, Comment (five times)]
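The two locality slides can be read as a back-of-the-envelope calculation. The sketch below is only an illustration: the 5 ms seek figure comes from the slide, while the helper name `estimatedSeekMs` and the "one seek per separately stored record" model are simplifying assumptions, not measurements.

```javascript
// Rough seek-cost estimate. SEEK_MS is the "5+ ms" figure from the slide;
// charging exactly one seek per separately stored record is an assumption.
const SEEK_MS = 5;

function estimatedSeekMs(recordsRead) {
  return recordsRead * SEEK_MS;
}

const commentCount = 5; // the number of Comment boxes drawn on the slide

// Article, user and every comment stored in different places on disk.
const separateMs = estimatedSeekMs(1 + 1 + commentCount);

// Everything embedded in one contiguous document: a single seek.
const embeddedMs = estimatedSeekMs(1);

console.log(`separate records: ~${separateMs} ms, one document: ~${embeddedMs} ms`);
```

Under these assumptions, seek time grows with the number of related records when they are stored apart, and stays flat when they are read together.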

Slide 18

Slide 18 text

MongoDB Security
•  SSL
   –  Between your app and MongoDB
   –  Between nodes in MongoDB cluster
•  Authorization at the database level
   –  Read Only / Read + Write / Administrator
•  Roadmap
   –  2.4: Pluggable Authentication
   –  2.6: Cell level security

Slide 19

Slide 19 text

Working with MongoDB

Slide 20

Slide 20 text

Create (Insert)
> user = { username: "ross", first_name: "Ross", last_name: "Lawley" }
> db.users.insert(user)

Slide 21

Slide 21 text

Read (Query)
> db.users.findOne()
{
    "_id" : ObjectId("50ed3c5cab4ef39dc735664b"),
    "username" : "ross",
    "first_name" : "Ross",
    "last_name" : "Lawley"
}

Slide 22

Slide 22 text

_id
•  _id is the primary key in MongoDB
•  Automatically indexed
•  Automatically created as an ObjectId if not provided
•  Any unique immutable value could be used

Slide 23

Slide 23 text

ObjectId
•  ObjectId is a special 12 byte value
•  Guaranteed to be unique across your cluster

ObjectId("50ed3c5cab4ef39dc735664b")
          |------||----||--||----|
             ts    mac  pid  inc

ts = timestamp (4 bytes), mac = machine id (3 bytes), pid = process id (2 bytes), inc = counter (3 bytes)
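To make the layout concrete, here is a minimal sketch in plain JavaScript that splits the hex string from the slide into its four parts. The function name `decodeObjectId` is hypothetical (it is not a driver API), and the ts/mac/pid/inc layout is the one drawn on this slide; other MongoDB versions may fill the middle bytes differently.

```javascript
// Split a 24-character ObjectId hex string into the four fields shown on
// the slide. decodeObjectId is an illustrative helper, not a MongoDB API.
function decodeObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  return {
    timestamp: parseInt(hex.slice(0, 8), 16),  // 4 bytes: seconds since the Unix epoch
    machine: hex.slice(8, 14),                 // 3 bytes: machine identifier
    pid: parseInt(hex.slice(14, 18), 16),      // 2 bytes: process id
    counter: parseInt(hex.slice(18, 24), 16),  // 3 bytes: incrementing counter
  };
}

const parts = decodeObjectId("50ed3c5cab4ef39dc735664b");
const createdAt = new Date(parts.timestamp * 1000);

console.log(parts);
console.log(createdAt.toISOString());
```

Because the leading four bytes are a timestamp, an ObjectId also records roughly when the document was created.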

Slide 24

Slide 24 text

Query Operators
// find users with any tags
> db.users.find( {tags: {$exists: true }} )
// find users matching a regular expression
> db.users.find( {username: /^ro*/i } )
// count users with a given username
> db.users.find( {username: "ross"} ).count()

Conditional Operators
–  $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
–  $lt, $lte, $gt, $gte
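As a way of reading these operators, the following toy matcher evaluates a handful of them against plain JavaScript objects. It is a sketch only and not how MongoDB evaluates queries: `matches`, `findUsers` and the sample users are made up for illustration, only top-level fields are handled, and array-aware behaviour is ignored.

```javascript
// Toy in-memory matcher for a few operators from the slide.
// NOT MongoDB's implementation: top-level fields only, no array semantics.
const operators = {
  $exists: (value, expected) => (value !== undefined) === expected,
  $ne: (value, operand) => value !== operand,
  $in: (value, list) => list.includes(value),
  $nin: (value, list) => !list.includes(value),
  $lt: (value, operand) => value < operand,
  $lte: (value, operand) => value <= operand,
  $gt: (value, operand) => value > operand,
  $gte: (value, operand) => value >= operand,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, operand]) => {
        if (!(op in operators)) throw new Error("unsupported operator: " + op);
        return operators[op](value, operand);
      });
    }
    return value === condition; // plain equality
  });
}

// Made-up sample data, loosely based on the documents in this deck.
const users = [
  { username: "ross", tags: ["superuser", "db_admin"], tag_count: 2 },
  { username: "Rob", tag_count: 0 },
  { username: "tina" },
];

const findUsers = (query) =>
  users.filter((user) => matches(user, query)).map((user) => user.username);

console.log(findUsers({ tags: { $exists: true } }));
console.log(findUsers({ username: /^ro/i }));
console.log(findUsers({ tag_count: { $gte: 1 } }));
console.log(findUsers({ username: { $in: ["tina", "ben"] } }));
```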

Slide 25

Slide 25 text

Update
> tags = ["superuser", "db_admin"]
> address = { street: "Scrutton Street", city: "London" }
> db.users.update({},
      {"$pushAll": {"tags": tags},
       "$set": {"address": address},
       "$inc": {"tag_count": 2}})

Slide 26

Slide 26 text

Read (Query)
> db.users.findOne()
{
    "_id" : ObjectId("50ed3c5cab4ef39dc735664b"),
    "address" : { "street" : "Scrutton Street", "city" : "London" },
    "first_name" : "Ross",
    "last_name" : "Lawley",
    "tag_count" : 2,
    "tags" : [ "superuser", "db_admin" ],
    "username" : "ross"
}

Slide 27

Slide 27 text

Atomic operators
•  Scalar
   –  $set, $unset, $inc
•  Array
   –  $push, $pushAll, $pull, $pullAll, $addToSet
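The effect of each operator can be sketched on a plain object. This is a toy model only: `applyUpdate` is a hypothetical helper, it handles top-level fields without dotted paths, and it says nothing about atomicity, which the server provides by applying the whole update to a document in place.

```javascript
// Toy model of what the update operators do to one document.
// applyUpdate is illustrative only: top-level fields, plain JSON data.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy of plain data
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      const current = result[field];
      switch (op) {
        case "$set": result[field] = arg; break;
        case "$unset": delete result[field]; break;
        case "$inc": result[field] = (current || 0) + arg; break;
        case "$push": result[field] = [...(current || []), arg]; break;
        case "$pushAll": result[field] = [...(current || []), ...arg]; break;
        case "$addToSet":
          result[field] = (current || []).includes(arg)
            ? current
            : [...(current || []), arg];
          break;
        case "$pull": result[field] = (current || []).filter((v) => v !== arg); break;
        case "$pullAll": result[field] = (current || []).filter((v) => !arg.includes(v)); break;
        default: throw new Error("unsupported operator: " + op);
      }
    }
  }
  return result;
}

// The update from the earlier slide, applied to a bare user document.
const updated = applyUpdate(
  { username: "ross" },
  {
    $pushAll: { tags: ["superuser", "db_admin"] },
    $set: { address: { street: "Scrutton Street", city: "London" } },
    $inc: { tag_count: 2 },
  }
);

// $addToSet leaves an existing value alone; $pull removes matching values.
const afterAddToSet = applyUpdate(updated, { $addToSet: { tags: "superuser" } });
const afterPull = applyUpdate(updated, { $pull: { tags: "db_admin" } });

console.log(updated, afterAddToSet.tags, afterPull.tags);
```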

Slide 28

Slide 28 text

Secondary Indexes
// 1 means ascending, -1 means descending
> db.users.ensureIndex({username: 1})
> db.users.find({username: "ross"}).explain()
// Multi-key indexes
> db.users.ensureIndex({tags: 1})
// index nested field
> db.users.ensureIndex({"address.city": 1})
// Compound indexes
> db.users.ensureIndex({ "username": 1, "address.city": 1 })
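One way to picture a multi-key index such as `{tags: 1}` is one index entry per array element, each pointing back at the document. The sketch below models that with a `Map`; it is an assumption-laden illustration (MongoDB's real indexes are B-trees, and `buildMultikeyIndex` plus the sample documents are invented here).

```javascript
// Toy multi-key index: one entry per array element, mapping the element
// to the _ids of the documents that contain it. Illustration only.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // this sketch simply skips missing fields
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// Made-up documents with small integer _ids for readability.
const docs = [
  { _id: 1, username: "ross", tags: ["superuser", "db_admin"] },
  { _id: 2, username: "tina", tags: ["superuser"] },
  { _id: 3, username: "ben", tags: ["db_admin"] },
  { _id: 4, username: "matt" },
];

const tagIndex = buildMultikeyIndex(docs, "tags");

// Looking up a tag is now a key lookup rather than a scan of every document.
console.log(tagIndex.get("db_admin"));
console.log(tagIndex.get("superuser"));
```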

Slide 29

Slide 29 text

Enough talk, let's get started!