MongoDB

Pooria Azimi

Outline
• What is NoSQL?
• What is MongoDB?
• Who uses MongoDB?
• JSON & BSON
• Demo
• Query Documents
• Demo 2
• Cursors
• Demo
• Aggregation Framework
• Map/Reduce
• MongoDB Drivers
• (Good) MongoDB Use Cases
• Indexes
• GridFS
• DBaaS & Monitoring
• ACID
• Mongoish PostgreSQL?
• Learn More

What is NoSQL?
‘Non-relational’ or ‘structured databases’ might be better names. Informally: anything but Oracle, MS SQL Server, MySQL, PostgreSQL, DB2, and other familiar relational databases (which happen to use SQL).

NoSQL
• No fixed table schema
• Complex data structures
• Scale horizontally
• Avoid joins and foreign keys
• …

NoSQL Categories
• Key/value storage: Voldemort, Dynamo, Redis
• Column oriented: BigTable, Cassandra, HBase
• Document oriented: CouchDB, MongoDB
• Graph: Neo4j, InfoGrid

What’s Wrong With RDBMSes?*
• RDBMSes were originally built to minimize disk footprint (by utilizing normalization and foreign keys), but disk space is now cheap: doubling disk space costs a fraction of doubling processor speed.
• Sometimes it’s hard to model data as proper relations, and most applications are not as relational as we hope.
• They’re not very good at scaling, and distributing them introduces significant delay and network traffic.
* According to NoSQL proponents. Might not actually be true!

When to Use RDBMS?*
• Great need for enforcing data integrity
• Data is normalized (no orphans or duplicates)
• Result:
  • More tables
  • More relations
  • More keys
  • More indexes
* According to NoSQL proponents. Might not actually be true!

When to Use NoSQL?*
• Scalability problems (billions of writes/updates/queries per day)
• Data does not have complex relations (like Facebook’s messages, which are key:value pairs, for the most part anyway)
• Real-time applications (like Facebook’s messages, which have to build an inverted index on <keyword, message*> in real time)
• Automatic sharding and distributed applications (on a grid with thousands of nodes, using map/reduce to aggregate data)
• Need for versioning (with timestamps) without significant overhead
• Prototyping (you don’t have to migrate the database schema during development)
* According to NoSQL proponents. Might not actually be true!

NoSQL Databases

BigTable
• A high-performance, sparse, distributed, multi-dimensional, sorted map, built on top of the Google File System (GFS) and designed to handle petabytes of data across thousands of commodity servers
• Many Google services use BigTable:
  • Gmail, Google Reader
  • Google Maps, Google Earth
  • Google Code, Blogger
  • YouTube, Orkut
  • Google Search History
Source: http://research.google.com/archive/bigtable.html

HBase
• A BigTable-like database running on top of Hadoop. Currently a top-level project at the Apache Software Foundation.
• Used by:
  • Facebook: Messages
  • Twitter: People search
  • Yahoo!
  • Adobe
  • ... thousands of other real-time applications
Source: http://wiki.apache.org/hadoop/Hbase/PoweredBy

Cassandra
• Facebook’s clone of Amazon’s Dynamo, which powered Facebook’s messaging platform
• A scalable, highly available, fault-tolerant, elastic and durable structured key-value store with no single point of failure
• Open-sourced in 2008
• Donated to the Apache Software Foundation in 2009 (Facebook moved to HBase in late 2010); currently a top-level project
• Currently used at: Digg, Netflix, Cisco, Twitter, ...

CouchDB
• A document-oriented NoSQL database written in Erlang. Currently a top-level project at the Apache Software Foundation.
• Characteristics: ACID, CRUD, highly distributed, eventually consistent, map/reduce, fault tolerant, RESTful APIs, ...
• Many corporations and websites use CouchDB: http://wiki.apache.org/couchdb/CouchDB_in_the_wild

Redis
• In-memory, persistent key-value store
• Unbelievably fast and efficient
• Complex data structures (scalars, strings, lists, hashes, sets, sorted sets)
• Pub/Sub, expiring keys
• Supports transactions
• Completely scriptable (Lua)
• High availability: replication

Other NoSQL Databases
A list of all NoSQL databases is available at: http://nosql-database.org/
Other notable NoSQL databases:
• Voldemort (key/value storage)
• Neo4j (spatial and graphs)

MongoDB
• A high-performance, scalable, document-oriented database
• Development started in 2007 by 10gen
• First public release in 2009

MongoDB is…
• NoSQL
• Schemaless
• Map/Reduce
• Document-oriented
• Not ACID-compliant
• Automatic sharding
• BSON (Binary JSON) style documents

Who Uses MongoDB?

• Content Management Systems:
  • Craigslist: used for archiving old posts
  • SAP
  • VIACOM
  • whitehouse.gov
• Operational Intelligence:
  • GitHub
  • Intuit: 500,000+ websites

• User Data Management:
  • foursquare: 3 billion check-ins, 5 million per day
  • Disney: backend for their online gaming networks
  • IGN
  • SourceForge
  • Viber
  • Cisco
  • ebay
  • O2
  • DISQUS

• High-Volume Data Feeds:
  • bit.ly: shortening 130M+ URLs per day
  • CERN LHC: for its data aggregation system
  • The New York Times
  • Forbes
  • Guardian
  • Stripe
  • ShareThis
  • Wordnik: 3.5TB of data across 20B records

• Metadata Management:
  • Springer
  • GILT
  • Shutterfly
  • MTV Networks
  • Grooveshark
  • lulu
Sources: http://www.mongodb.org/about/production-deployments and http://www.slideshare.net/mongodb/nosql-now-2012-mongodb-use-cases

Growth
Source: http://www.flickr.com/photos/sog/5909237401/

JSON (& BSON)

http://json.org
http://docs.mongodb.org/manual/core/document/

user = {
  'username': 'eve',
  'age': 24,
  'address': { 'city': 'new york', 'state': 'ny' },
  'privileges': [
    'user',
    { 'moderator': ['forum1', 'forum2', 'forum3'] }
  ],
  'dead': false
}

{
  _id: 1,
  name: { first: 'John', last: 'Backus' },
  birth: new Date('Dec 03, 1924'),
  death: new Date('Mar 17, 2007'),
  contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
  awards: [
    { award: 'National Medal of Science', year: 1975, by: 'National Science Foundation' },
    { award: 'Turing Award', year: 1977, by: 'ACM' }
  ]
}

• BSON is JSON + a few additions
• New data types:
  • ObjectId
  • Timestamp
  • Date

• BSON documents can be quite large: the hard limit is 16MB!
• BSON documents are used extensively in MongoDB:
  • Record documents
  • Query specification documents
  • Update specification documents
  • Index specification documents
  • Sort order specification documents
  • …

ObjectId
ObjectId is a 12-byte BSON type, constructed using:
• a 4-byte value representing the seconds since the Unix epoch,
• a 3-byte machine identifier,
• a 2-byte process id, and
• a 3-byte counter, starting with a random value.
Source: http://docs.mongodb.org/manual/reference/object-id/

> x = ObjectId()
ObjectId("519260aed64d21e6a46eeba1")
> y = ObjectId("519260aed64d21e6a46eeabb")
ObjectId("519260aed64d21e6a46eeabb")
> y.getTimestamp()
ISODate("2013-05-14T16:05:02Z")
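The 12-byte layout can be checked by hand. Below is a minimal sketch in plain JavaScript (no MongoDB server needed) that slices the 24-character hex form of an ObjectId into its four fields. `parseObjectId` is a hypothetical helper, not part of the mongo shell, and it assumes the 2013-era layout described on the ObjectId slide; later MongoDB versions changed the middle fields, but the leading 4-byte timestamp is the part `getTimestamp()` reads.

```javascript
// Hypothetical helper: split a 24-hex-character ObjectId into the 2013-era fields
// 4-byte timestamp | 3-byte machine id | 2-byte process id | 3-byte counter.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters, got: " + hex);
  }
  const seconds = parseInt(hex.slice(0, 8), 16);  // bytes 0-3: seconds since the epoch
  return {
    timestamp: new Date(seconds * 1000),          // what getTimestamp() reports
    machine: hex.slice(8, 14),                    // bytes 4-6
    pid: parseInt(hex.slice(14, 18), 16),         // bytes 7-8
    counter: parseInt(hex.slice(18, 24), 16),     // bytes 9-11
  };
}

const y = parseObjectId("519260aed64d21e6a46eeabb");
console.log(y.timestamp.toISOString()); // 2013-05-14T16:05:02.000Z
console.log(y.machine, y.pid, y.counter);
```

Run on the `y` value from the shell session, it recovers the same instant that `getTimestamp()` prints, 2013-05-14T16:05:02Z. Because the timestamp is the leading field, ObjectIds created in later seconds also sort after earlier ones.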

Date
getDate, getDay, getFullYear, getHours, getMilliseconds, getMinutes, getMonth, getSeconds, getTime, getTimezoneOffset, getUTCDate, getUTCDay, getUTCFullYear, getUTCHours, getUTCMilliseconds, getUTCMinutes, getUTCMonth, getUTCSeconds, getYear, setDate, setFullYear, setHours, setMilliseconds, setMinutes, setMonth, setSeconds, setTime, setUTCDate, setUTCFullYear, setUTCHours, setUTCMilliseconds, setUTCMinutes, setUTCMonth, setUTCSeconds, setYear, toDateString, toGMTString, toISOString, tojson, toLocaleDateString, toLocaleString, toLocaleTimeString, toString, toTimeString, toUTCString

MongoDB Terminology
Relational DBs | MongoDB
database | database
index | index
table | collection
row (tuple) | (BSON) document
column | (BSON) field
primary key | _id field
foreign key | (none)
Source: http://docs.mongodb.org/manual/reference/sql-comparison/

> show dbs
> use test
> show collections
> help
> db.help()
> db.users.help()
> db.users.find
> db.users.find().help()

DEMO

Query Documents

General syntax:
db.collection.find( <query>, <projection> )
Example:
db.users.find( { age: { $gte: 25 } }, { age: 1, name: 1 } )

MongoDB query syntax:
db.users.find( { age: { $gte: 25 } }, { age: 1, name: 1 } )
SQL:
SELECT _id, age, name FROM users WHERE age >= 25;

MongoDB:
db.users.find( {} )
db.users.find()
SQL:
SELECT * FROM users;

MongoDB:
db.users.find( { age: { $gte: 25 } }, {} )
db.users.find( { age: { $gte: 25 } } )
SQL:
SELECT * FROM users WHERE age >= 25;

MongoDB:
db.users.find( {}, { name: 0 } )
SQL (assuming the other fields are age, dob, height and weight):
SELECT _id, age, dob, height, weight FROM users;

Simple Equality Checks
db.inventory.find( { type: "snacks" } )
db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
SELECT * FROM inventory WHERE type IN ('food', 'snacks');

db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
SELECT * FROM inventory WHERE type = 'food' AND ( qty > 100 OR price < 9.95 );

Subdocuments
db.inventory.find( { producer: { company: 'ABC123', address: '123 Street' } } )
db.inventory.find( { 'producer.company': 'ABC123' } )
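These two subdocument queries are not interchangeable. The following in-memory sketch in plain JavaScript models the difference on made-up documents; `matchesExact` and `getPath` are hypothetical helpers, not MongoDB APIs. An exact subdocument match needs the embedded document to be identical, field order included, while dot notation tests one field.

```javascript
// Exact match: the embedded document must be identical, field order included.
// JSON.stringify is order-sensitive, which mimics that rule for plain data.
function matchesExact(doc, field, value) {
  return JSON.stringify(doc[field]) === JSON.stringify(value);
}

// Dot notation: walk a path such as "producer.company" one segment at a time.
function getPath(doc, path) {
  return path.split(".").reduce(
    (cur, key) => (cur !== null && typeof cur === "object" ? cur[key] : undefined),
    doc
  );
}

const inventory = [
  { _id: 1, producer: { company: "ABC123", address: "123 Street" } },
  { _id: 2, producer: { address: "123 Street", company: "ABC123" } }, // same fields, other order
  { _id: 3, producer: { company: "ABC123", address: "456 Avenue" } },
];

const exact = inventory.filter(d =>
  matchesExact(d, "producer", { company: "ABC123", address: "123 Street" }));
const dotted = inventory.filter(d => getPath(d, "producer.company") === "ABC123");

console.log(exact.map(d => d._id));  // [ 1 ]
console.log(dotted.map(d => d._id)); // [ 1, 2, 3 ]
```

Document 2 holds the same data as document 1 in a different field order, so only the dot-notation query finds it.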

Arrays
db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } )
db.inventory.find( { tags: 'fruit' } )
db.inventory.find( { 'tags.0' : 'fruit' } )
db.inventory.find( { memos: { $elemMatch: { memo : 'on time', by: 'shipping' } } } )
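Each of the four array queries matches a different set of documents. Here is a plain-JavaScript model of their semantics on two made-up documents (an illustration, not MongoDB's matcher). A fifth query, with separate conditions on `memos.memo` and `memos.by`, shows what `$elemMatch` adds.

```javascript
const sameArray = (a, b) => JSON.stringify(a) === JSON.stringify(b);
const ids = docs => docs.map(d => d._id);

const inventory = [
  { _id: 1, tags: ["fruit", "food", "citrus"],
    memos: [{ memo: "on time", by: "shipping" }] },
  { _id: 2, tags: ["food", "fruit"],
    memos: [{ memo: "on time", by: "payment" }, { memo: "delayed", by: "shipping" }] },
];

// { tags: [ 'fruit', 'food', 'citrus' ] }: whole array, same elements, same order
const exact = ids(inventory.filter(d => sameArray(d.tags, ["fruit", "food", "citrus"])));
// { tags: 'fruit' }: at least one element equals 'fruit'
const contains = ids(inventory.filter(d => d.tags.includes("fruit")));
// { 'tags.0': 'fruit' }: the element at position 0 equals 'fruit'
const firstIsFruit = ids(inventory.filter(d => d.tags[0] === "fruit"));
// { memos: { $elemMatch: { memo: 'on time', by: 'shipping' } } }:
// a single element must satisfy both conditions
const elemMatch = ids(inventory.filter(d =>
  d.memos.some(m => m.memo === "on time" && m.by === "shipping")));
// { 'memos.memo': 'on time', 'memos.by': 'shipping' }: each condition may be
// satisfied by a different element
const loose = ids(inventory.filter(d =>
  d.memos.some(m => m.memo === "on time") && d.memos.some(m => m.by === "shipping")));

console.log({ exact, contains, firstIsFruit, elemMatch, loose });
// exact: [1], contains: [1, 2], firstIsFruit: [1], elemMatch: [1], loose: [1, 2]
```

Document 2 has an "on time" memo and a "shipping" memo, but not in the same element, which is exactly the case `$elemMatch` rejects.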

Operators
• $gt, $gte, $lt, $lte
• $and, $or, $nor, $not
• $regex, $where
• $exists, $size
• $all, $type
• $elemMatch, $in, $nin, $ne
Source: http://docs.mongodb.org/manual/reference/operator/

Regex
db.users.find( { name: /^P/ } )
SELECT * FROM users WHERE name LIKE 'P%';
db.users.find( { email: /@gmail\.com$/ } )
SELECT * FROM users WHERE email LIKE '%@gmail.com';

Regex
db.users.find( { username: /^P/i } )
db.users.find( { email: /^[\w\d\-_+]{0,25}@[\w]{0,5}mail\.com$/ } )
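Regex queries in the mongo shell are written as JavaScript regex literals, so plain JavaScript can show what these patterns accept (the server's regex engine agrees with JavaScript on patterns this simple). The sample strings are made up. The unescaped-dot variant is included to show why the dot before "com" is worth escaping.

```javascript
const startsWithP = /^P/;                // like SQL LIKE 'P%' (case-sensitive)
const startsWithPAnyCase = /^P/i;        // the i flag makes it case-insensitive
const gmailEscaped = /@gmail\.com$/;     // "\." matches only a literal dot
const gmailUnescaped = /@gmail.com$/;    // a bare "." matches any character
const mailDotCom = /^[\w\d\-_+]{0,25}@[\w]{0,5}mail\.com$/;

console.log(startsWithP.test("Pooria"));                    // true
console.log(startsWithP.test("pooria"));                    // false
console.log(startsWithPAnyCase.test("pooria"));             // true
console.log(gmailEscaped.test("eve@gmail.com"));            // true
console.log(gmailEscaped.test("eve@gmailxcom"));            // false
console.log(gmailUnescaped.test("eve@gmailxcom"));          // true: too permissive
console.log(mailDotCom.test("eve_smith+tag@hotmail.com"));  // true
console.log(mailDotCom.test("eve.smith@gmail.com"));        // false: "." is not allowed before the @
```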

.distinct()
db.inventory.distinct( 'name', { type: "snacks" } )
SELECT DISTINCT(name) FROM inventory WHERE type = 'snacks';

.findOne()
db.inventory.findOne( { type: "snacks" } )
SELECT * FROM inventory WHERE type = 'snacks' LIMIT 1;

.count()
db.inventory.find( { tags: { $in: [ 'fruit', 'food', 'citrus' ] } } ).count()
SELECT COUNT(*) FROM inventory WHERE tags IN ('fruit', 'food', 'citrus');

.sort()
db.inventory.find( { tags: { $in: [ 'fruit', 'food', 'citrus' ] } } ).sort( { purchased_at: 1, price: -1 } )
SELECT * FROM inventory WHERE tags IN ('fruit', 'food', 'citrus') ORDER BY purchased_at ASC, price DESC;

.limit()
db.inventory.find( { tags: { $in: [ 'fruit', 'food', 'citrus' ] } } ).sort( { purchased_at: 1, price: -1 } ).limit(30)
SELECT * FROM inventory WHERE tags IN ('fruit', 'food', 'citrus') ORDER BY purchased_at ASC, price DESC LIMIT 30;

.skip()
db.inventory.find().limit(30).skip(30*5)
SELECT * FROM inventory LIMIT 30 OFFSET 150;
(MySQL) SELECT * FROM inventory LIMIT 150, 30;
(page 6, with 30 documents per page)
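The skip and limit values for a page are simple arithmetic. A small sketch, assuming 1-based page numbers and using an in-memory array in place of a collection; `pageBounds` and `findPage` are hypothetical names, not shell functions.

```javascript
// Hypothetical helper: skip/limit values for a 1-based page number.
function pageBounds(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) throw new Error("page must be an integer >= 1");
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Stand-in for a collection: 200 documents numbered 0..199.
const docs = Array.from({ length: 200 }, (_, i) => ({ n: i }));

// Equivalent of db.inventory.find().skip(skip).limit(limit) on the array.
function findPage(page, pageSize) {
  const { skip, limit } = pageBounds(page, pageSize);
  return docs.slice(skip, skip + limit);
}

const page6 = findPage(6, 30);
console.log(page6[0].n, page6[page6.length - 1].n); // 150 179
```

Page 6 at 30 documents per page starts at offset (6 - 1) * 30 = 150; an offset of 149 would instead repeat the last document of page 5.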

DEMO

CRUD
http://docs.mongodb.org/manual/core/create/
http://docs.mongodb.org/manual/core/read/
http://docs.mongodb.org/manual/core/update/
http://docs.mongodb.org/manual/core/delete/


Cursors

var cursor = db.users.find( { 'age' : { $gte : 20 } } );
while (cursor.hasNext()) {
  var curr_doc = cursor.next();
  print(curr_doc.age);
}

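var cursor = db.users.find( { 'age' : { $gte : 20 } } );  // a fresh cursor: the loop above exhausted the first one
var index = 0;
while (cursor.hasNext()) {
  var u = cursor.next();
  print((++index) + ") " + u.username + " : " + u.age);
}
1) 8731040 : 46
2) 8731041 : 45
3) 8731042 : 46
4) 8731043 : 44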

DEMO

Aggregation Framework

• Aggregation Framework is MongoDB’s GROUP BY
• Perform complex queries
• Simpler than Map/Reduce
• Can $project the output to:
  • Add or compute new fields
  • Create virtual sub-objects
  • Extract sub-fields into top-level objects

Pipeline
• Like pipes in a Unix shell, documents pass through multiple pipeline operators
• Pipeline operators can produce zero, one or multiple “new” documents
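The pipeline idea can be modeled with ordinary functions: each stage takes an array of documents and returns a new array, and the pipeline feeds one stage's output into the next. This is a conceptual sketch in plain JavaScript, not how the server executes stages; `match`, `unwind`, `limit` and `runPipeline` are hypothetical names, and the user documents are made up. `unwind` shows one document becoming several (or none), and `match` shows a document being dropped.

```javascript
const match = pred => docs => docs.filter(pred);             // 0 or 1 doc out per doc in
const unwind = field => docs => docs.flatMap(d =>            // 0..n docs out per doc in
  (d[field] || []).map(v => ({ ...d, [field]: v })));
const limit = n => docs => docs.slice(0, n);

// Feed the output of each stage into the next, like | in a shell.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const users = [
  { _id: "jane", likes: ["golf", "racquetball"] },
  { _id: "joe", likes: ["tennis", "golf", "swimming"] },
  { _id: "kim", likes: [] },
];

const out = runPipeline(users, [
  unwind("likes"),                    // jane -> 2 docs, joe -> 3 docs, kim -> 0 docs
  match(d => d.likes !== "tennis"),   // drops joe's "tennis" document
  limit(3),
]);
console.log(out);
// jane/golf, jane/racquetball, joe/golf
```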

Pipeline Operators
$project, $match, $limit, $skip, $unwind, $group, $sort, $geoNear
Source: http://docs.mongodb.org/manual/reference/aggregation/#pipeline

SQL to Aggregation Framework Mapping Chart
SQL | MongoDB Aggregation
WHERE | $match
GROUP BY | $group
HAVING | $match
SELECT | $project
ORDER BY | $sort
LIMIT | $limit
SUM() | $sum
COUNT() | $sum
Source: http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/

Examples

Document Format
{ "_id": "10280", "city": "NEW YORK", "state": "NY", "pop": 5574, "loc": [ -74.016323, 40.710537 ] }
Source: http://docs.mongodb.org/manual/tutorial/aggregation-examples/#states-with-populations-over-10-million

States with Populations Over 10 Million
db.zipcodes.aggregate(
  { $group : { _id : "$state", totalPop : { $sum : "$pop" } } },
  { $match : { totalPop : { $gte : 10*1000*1000 } } }
)

$group by _id
[Figure: input documents (doc3, doc7, doc14, doc9, doc12, doc19, doc23, doc24, …) are collected into one group per state: NY, TX, RI, …]

$sum
NY totalPop: 19570
TX totalPop: 26059
RI totalPop: 1050
IL totalPop: 12875
NH totalPop: 1320
PY totalPop: 12763

$match
[Figure: the per-state totals from the $sum step; RI and NH are filtered out, leaving NY, TX, IL and PY]

{ _id: 'NY', totalPop: 19570 },
{ _id: 'TX', totalPop: 26059 },
{ _id: 'IL', totalPop: 12875 },
{ _id: 'PY', totalPop: 12763 }

In SQL
SELECT state, SUM(pop) AS totalPop FROM zips GROUP BY state HAVING SUM(pop) >= 10*1000*1000;
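To make the $group/$match pair concrete, here is a plain-JavaScript sketch of the same two stages over made-up zip-code documents (only the fields the pipeline reads). `statesOver` is a hypothetical helper, and the threshold is a parameter so any cut-off can be tried. The real $group does not guarantee an output order; the Map used here just happens to keep first-seen order.

```javascript
// Hypothetical in-memory equivalent of:
//   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }
//   { $match: { totalPop: { $gte: threshold } } }
function statesOver(zipcodes, threshold) {
  // $group: one running total per state (like GROUP BY state + SUM(pop))
  const totals = new Map();
  for (const z of zipcodes) {
    totals.set(z.state, (totals.get(z.state) || 0) + z.pop);
  }
  // $match runs on the grouped documents, which is why it plays the role of HAVING
  return [...totals]
    .map(([state, totalPop]) => ({ _id: state, totalPop }))
    .filter(d => d.totalPop >= threshold);
}

// Made-up documents.
const zipcodes = [
  { _id: "z1", state: "NY", pop: 9000000 },
  { _id: "z2", state: "NY", pop: 2500000 },
  { _id: "z3", state: "RI", pop: 1050000 },
  { _id: "z4", state: "TX", pop: 12000000 },
];
const bigStates = statesOver(zipcodes, 10 * 1000 * 1000);
console.log(bigStates);
// NY (11,500,000) and TX (12,000,000); RI is filtered out
```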

Document Format
{ "_id": "10280", "city": "NEW YORK", "state": "NY", "pop": 5574, "loc": [ -74.016323, 40.710537 ] }
Source: http://docs.mongodb.org/manual/tutorial/aggregation-examples/#largest-and-smallest-cities-by-state

Largest and Smallest Cities by State
db.zipcodes.aggregate(
  { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
  { $sort: { pop: 1 } },
  { $group: {
      _id : "$_id.state",
      biggestCity: { $last: "$_id.city" },
      biggestPop: { $last: "$pop" },
      smallestCity: { $first: "$_id.city" },
      smallestPop: { $first: "$pop" }
  } },
  { $project: {
      _id: 0,
      state: "$_id",
      biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
      smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
  } }
)

1st $group
{ "_id" : { "state" : "CO", "city" : "EDGEWATER" }, "pop" : 13154 }

2nd $group
{ "_id" : "WA", "biggestCity" : "SEATTLE", "biggestPop" : 520096, "smallestCity" : "BENGE", "smallestPop" : 2 }

$project
{ "state" : "RI", "biggestCity" : { "name" : "CRANSTON", "pop" : 176404 }, "smallestCity" : { "name" : "CLAYVILLE", "pop" : 45 } }


1st $group (state, city, pop)
A: a 114, b 14, c 93
B: a 18, b 44, c 64, d 65, e 23, f 112
C: a 65, b 13
D: a 65, b 87, c 142, e 123, f 98
E: a 23, b 61, c 27, d 51, e 92, f 3, g 64, h 57

$sort
[Figure: the same (state, city) documents re-ordered by pop, ascending]

2nd $group (1), (2)
[Figure: the sorted documents are split into one group per state (A, B, C, D, E); within each group they keep their ascending pop order]

2nd $group (3)
A: smallestPop b 14, biggestPop a 114
B: smallestPop a 18, biggestPop f 112
C: smallestPop b 13, biggestPop a 65
D: smallestPop a 65, biggestPop c 142
E: smallestPop f 3, biggestPop e 92
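Because the documents reach the 2nd $group already sorted by pop, `$first` and `$last` simply pick the two ends of each state's run. This plain-JavaScript sketch replays the toy A to E data from the illustration through the $sort, 2nd $group and $project steps; it models the semantics only, and the variable names are illustrative.

```javascript
// Toy (state, city, pop) documents from the "1st $group" illustration.
const raw = {
  A: { a: 114, b: 14, c: 93 },
  B: { a: 18, b: 44, c: 64, d: 65, e: 23, f: 112 },
  C: { a: 65, b: 13 },
  D: { a: 65, b: 87, c: 142, e: 123, f: 98 },
  E: { a: 23, b: 61, c: 27, d: 51, e: 92, f: 3, g: 64, h: 57 },
};
const docs = Object.entries(raw).flatMap(([state, cities]) =>
  Object.entries(cities).map(([city, pop]) => ({ _id: { state, city }, pop })));

// $sort: { pop: 1 } (copy first so the input array is left untouched)
const sorted = [...docs].sort((x, y) => x.pop - y.pop);

// 2nd $group: the first document seen per state is the smallest ($first),
// the last one seen is the biggest ($last)
const byState = new Map();
for (const d of sorted) {
  const s = d._id.state;
  if (!byState.has(s)) byState.set(s, { first: d, last: d });
  byState.get(s).last = d;
}

// $project: reshape into { state, biggestCity: {...}, smallestCity: {...} }
const result = [...byState].map(([state, { first, last }]) => ({
  state,
  biggestCity: { name: last._id.city, pop: last.pop },
  smallestCity: { name: first._id.city, pop: first.pop },
}));

for (const r of result) {
  console.log(r.state, "smallest:", r.smallestCity.name, r.smallestCity.pop,
              "biggest:", r.biggestCity.name, r.biggestCity.pop);
}
```

The printed pairs match the "2nd $group (3)" summary, for example E: smallest f 3, biggest e 92.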

Document Format
{ _id : "jane", joined : ISODate("2011-03-02"), likes : ["golf", "racquetball"] }
{ _id : "joe", joined : ISODate("2012-07-02"), likes : ["tennis", "golf", "swimming"] }
Source: http://docs.mongodb.org/manual/tutorial/aggregation-examples/#aggregation-with-user-preference-data

Normalize and Sort Documents
db.users.aggregate( [
  { $project : { name: { $toUpper: "$_id" }, _id: 0 } },
  { $sort : { name : 1 } }
] )

{ "name" : "JANE" },
{ "name" : "JILL" },
{ "name" : "JOE" }

Return Usernames Ordered by Join Month
db.users.aggregate( [
  { $project : { month_joined : { $month : "$joined" }, name : "$_id", _id : 0 } },
  { $sort : { month_joined : 1 } }
] )

{ "month_joined" : 1, "name" : "ruth" },
{ "month_joined" : 1, "name" : "harold" },
{ "month_joined" : 1, "name" : "kate" },
{ "month_joined" : 2, "name" : "jill" }

Return Total Number of Joins per Month
db.users.aggregate( [
  { $project : { month_joined : { $month : "$joined" } } },
  { $group : { _id : { month_joined : "$month_joined" }, number : { $sum : 1 } } },
  { $sort : { "_id.month_joined" : 1 } }
] )

{ "_id" : { "month_joined" : 1 }, "number" : 3 },
{ "_id" : { "month_joined" : 2 }, "number" : 9 },
{ "_id" : { "month_joined" : 3 }, "number" : 5 }

Return the Five Most Common “Likes”
db.users.aggregate( [
  { $unwind : "$likes" },
  { $group : { _id : "$likes", number : { $sum : 1 } } },
  { $sort : { number : -1 } },
  { $limit : 5 }
] )
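The four stages of the "most common likes" query can be replayed in plain JavaScript to see what each one contributes. This is an in-memory sketch with made-up users; `topLikes` is a hypothetical name. MongoDB does not define an order for ties after `$sort`, so only the counts should be relied on when two likes are equally common.

```javascript
const users = [
  { _id: "jane", likes: ["golf", "racquetball"] },
  { _id: "joe", likes: ["tennis", "golf", "swimming"] },
  { _id: "jill", likes: ["golf", "tennis"] },
  { _id: "ruth", likes: ["swimming", "golf"] },
];

function topLikes(users, n) {
  // $unwind: one document per (user, like) pair
  const unwound = users.flatMap(u => u.likes.map(like => ({ _id: u._id, likes: like })));
  // $group by like with { $sum: 1 }: count the documents in each group
  const counts = new Map();
  for (const d of unwound) counts.set(d.likes, (counts.get(d.likes) || 0) + 1);
  return [...counts]
    .map(([like, number]) => ({ _id: like, number }))
    .sort((a, b) => b.number - a.number)   // $sort: { number: -1 }
    .slice(0, n);                          // $limit: n
}

console.log(topLikes(users, 5));
// golf: 4, then tennis and swimming: 2 each, then racquetball: 1
```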

{ "_id" : "golf", "number" : 33 },
{ "_id" : "racquetball", "number" : 31 },
{ "_id" : "swimming", "number" : 24 },
{ "_id" : "handball", "number" : 19 },
{ "_id" : "tennis", "number" : 18 }

Map/Reduce

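• Map
  • Called once per document
  • Can emit zero, one or more “new” documents (<key, value>)
• Reduce
  • Called for each key that was emitted with more than one value (possibly several times for the same key, so its output must be reducible again)
  • Processes <key, Array[value]> and reduces all the values into a single one
• Finalize
  • Optional: post-processes each key’s reduced value (e.g., turning a total and a count into an average)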
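The three phases can be modeled in a few lines of plain JavaScript. This is a hypothetical in-memory driver, not MongoDB's `mapReduce` command: in MongoDB the map function reads the document as `this` and calls a global `emit()`, whereas here `emit` is passed in as a parameter to keep the sketch self-contained. Like MongoDB, it skips reduce for a key with only one value, and the reduce function returns the same shape it receives so its output could be reduced again. The zip-code figures are made up.

```javascript
function mapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  for (const doc of docs) {
    // map: called once per document; may emit zero or more <key, value> pairs
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const out = [];
  for (const [key, values] of groups) {
    // reduce: only needed when a key has more than one value
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (finalize) value = finalize(key, value);   // optional post-processing
    out.push({ _id: key, value });
  }
  return out;
}

// Average population per state.
const zips = [
  { state: "NY", pop: 5574 },
  { state: "NY", pop: 4426 },
  { state: "RI", pop: 1050 },
];
const result = mapReduce(
  zips,
  (doc, emit) => emit(doc.state, { count: 1, total: doc.pop }),
  // Same shape in and out, so a partial result can be reduced again.
  (key, values) => values.reduce(
    (acc, v) => ({ count: acc.count + v.count, total: acc.total + v.total }),
    { count: 0, total: 0 }),
  (key, v) => ({ ...v, avg: v.total / v.count })
);
console.log(result);
// NY: count 2, total 10000, avg 5000; RI: count 1, total 1050, avg 1050
```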

Drivers and Client Libraries

Java
http://docs.mongodb.org/ecosystem/drivers/java and http://api.mongodb.org/java/current/
import com.mongodb.*;
import java.util.Set;

MongoClient mongoClient = new MongoClient( "localhost" , 27017 );
DB db = mongoClient.getDB( "mydb" );

// Getting a list of collections
Set<String> colls = db.getCollectionNames();
for (String s : colls) {
    System.out.println(s);
}

// Inserting a document
DBCollection coll = db.getCollection("testCollection");
BasicDBObject doc = new BasicDBObject("name", "MongoDB")
        .append("type", "database")
        .append("count", 1)
        .append("info", new BasicDBObject("x", 203).append("y", 102));
coll.insert(doc);
System.out.println(coll.getCount());

Java
http://docs.mongodb.org/ecosystem/drivers/java and http://api.mongodb.org/java/current/
// Using a cursor to get all the documents
DBCursor cursor = coll.find();
try {
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
} finally {
    cursor.close();
}

// Getting the documents that match a query
BasicDBObject query = new BasicDBObject("i", 71);
cursor = coll.find(query);
try {
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
} finally {
    cursor.close();
}

PHP
http://docs.mongodb.org/ecosystem/drivers/php and http://ir2.php.net/mongo/
<?php
// connect
$m = new MongoClient();
// select a database
$db = $m->comedy;
// select a collection (analogous to a relational database's table)
$collection = $db->cartoons;
// add a record
$document = array( "title" => "Calvin and Hobbes", "author" => "Bill Watterson" );
$collection->insert($document);
// add another record, with a different "shape"
$document = array( "title" => "XKCD", "online" => true );
$collection->insert($document);
// find everything in the collection
$cursor = $collection->find();
// iterate through the results
foreach ($cursor as $document) {
    echo $document["title"] . "\n";
}
?>

Ruby
require 'rubygems'
require 'mongo'
include Mongo

@client = MongoClient.new('localhost', 27017)
@db     = @client['sample-db']
@coll   = @db['test']

@coll.remove
3.times do |i|
  @coll.insert({'a' => i+1})
end
puts "There are #{@coll.count} records. Here they are:"
@coll.find.each { |doc| puts doc.inspect }
http://docs.mongodb.org/ecosystem/drivers/ruby and http://api.mongodb.org/ruby/current/

(Good) Use Cases

Important Factors
• Existing skill set and tooling
• Existing architecture and infrastructure
• Growth expectation
Source: http://www.palominodb.com/blog/2012/03/06/when-mongodb-right-choice-your-business-we-explore-detailed-use-cases

• Prototyping
  • Fast, rapid schema changes
• Logging
  • Asynchronous logs
  • Capped collections
  • Flexible structure (schemaless)

• Archiving
  • Old data might be in a different format
  • Craigslist
• Content Management
  • SAP / Wordnik
• Queue Management

Index

• Index any property
• Index properties of subdocuments and sub-subdocuments
• Arrays!
• Compound, reverse, unique, hashed, sparse, geospatial and text index types

> db.inventory.find( { type: 'food' } ).explain()
{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 5,
  "nscannedObjects" : 4000006,
  "nscanned" : 4000006,
  "nscannedObjectsAllPlans" : 4000006,
  "nscannedAllPlans" : 4000006,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 2,
  "nChunkSkips" : 0,
  "millis" : 1591,
  "indexBounds" : { },
  "server" : "mongodb0.example.net:27017"
}

// ascending
db.inventory.ensureIndex( { type: 1 } )
// descending
db.inventory.ensureIndex( { created_at: -1 } )
// non-blocking, built in the background
db.inventory.ensureIndex( { type: 1 }, { background: true } )
// compound
db.inventory.ensureIndex( { name: 1, type: 1 } )
// sparse
db.collection.ensureIndex( { a: 1 }, { sparse: true } )
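Why does an index on `{ type: 1 }` change `explain()` so drastically? A collection scan (BasicCursor) examines every document, while a B-tree index lets the query jump to the `["food", "food"]` bounds and walk only the matching entries. The sketch below models that in plain JavaScript, with a sorted array plus binary search standing in for the B-tree; the 100,000 made-up documents and all function names are illustrative, not MongoDB internals.

```javascript
const docs = [];
for (let i = 0; i < 100000; i++) {
  docs.push({ _id: i, type: i % 20000 === 0 ? "food" : "other" + (i % 7) }); // 5 "food" docs
}

function collectionScan(docs, value) {
  let nscanned = 0, n = 0;
  for (const d of docs) {
    nscanned++;                        // every document is examined
    if (d.type === value) n++;
  }
  return { n, nscanned };
}

// The "index": [key, position] pairs sorted by key, like { type: 1 }.
const index = docs
  .map((d, pos) => [d.type, pos])
  .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));

// First position whose key is >= value (binary search, standing in for a tree descent).
function lowerBound(index, value) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

function indexScan(index, value) {
  let nscanned = 0;
  for (let i = lowerBound(index, value); i < index.length && index[i][0] === value; i++) {
    nscanned++;                        // only entries inside the bounds are examined
  }
  return { n: nscanned, nscanned };
}

console.log(collectionScan(docs, "food")); // { n: 5, nscanned: 100000 }
console.log(indexScan(index, "food"));     // { n: 5, nscanned: 5 }
```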

> db.inventory.find( { type: 'food' } ).explain()
{
  "cursor" : "BtreeCursor type_1",
  "isMultiKey" : false,
  "n" : 5,
  "nscannedObjects" : 5,
  "nscanned" : 5,
  "nscannedObjectsAllPlans" : 5,
  "nscannedAllPlans" : 5,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 0,
  "indexBounds" : { "type" : [ [ "food", "food" ] ] },
  "server" : "mongodb0.example.net:27017"
}

GridFS

• Can store large objects (larger than 16MB)
• Files are divided into 256KB chunks, and can be re-assembled fully or partially
• No need to load the whole file into memory (can “skip” to the middle of a video)
• Uses the fs.chunks and fs.files collections by default
Source: http://docs.mongodb.org/manual/core/gridfs/
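The partial re-assembly and the "skip to the middle of a video" behavior come down to chunk arithmetic. A hypothetical sketch in plain JavaScript: `CHUNK_SIZE` uses the 256KB figure quoted for GridFS as a parameter (it is not read from a server), and `n` mirrors the sequence-number field of documents in fs.chunks. The 100MB video is made up.

```javascript
const CHUNK_SIZE = 256 * 1024;

// Number of fs.chunks documents needed for a file of `length` bytes.
const chunkCount = length => Math.ceil(length / CHUNK_SIZE);

// Which chunk (sequence number n) holds byte `offset`, and where inside it.
const locate = offset => ({
  n: Math.floor(offset / CHUNK_SIZE),
  offsetInChunk: offset % CHUNK_SIZE,
});

// Sequence numbers of the chunks covering `count` bytes starting at `offset`:
// all a reader has to fetch to "skip" into the middle of a file.
function chunksForRange(offset, count) {
  if (count <= 0) return [];
  const first = Math.floor(offset / CHUNK_SIZE);
  const last = Math.floor((offset + count - 1) / CHUNK_SIZE);
  return Array.from({ length: last - first + 1 }, (_, i) => first + i);
}

const videoLength = 100 * 1024 * 1024;                      // a made-up 100MB video
console.log(chunkCount(videoLength));                       // 400
console.log(locate(videoLength / 2));                       // { n: 200, offsetInChunk: 0 }
console.log(chunksForRange(videoLength / 2, 1024 * 1024));  // [ 200, 201, 202, 203 ]
```

Reading 1MB from the middle of the file touches 4 of the 400 chunks, which is why the whole file never has to be loaded into memory.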

Java
import com.mongodb.gridfs.*;
import java.io.File;

// returns the default GridFS bucket (i.e. the "fs" collections)
GridFS myFS = new GridFS(myDatabase);
// saves the file to the "fs" GridFS bucket (createFile prepares it, save() writes it)
myFS.createFile(new File("/tmp/largething.mpg")).save();
// returns the GridFS bucket named "contracts"
GridFS myContracts = new GridFS(myDatabase, "contracts");
// retrieve the GridFS object "smithco"
GridFSDBFile file = myContracts.findOne("smithco");
// saves the GridFS file to the file system
file.writeTo(new File("/tmp/smithco.pdf"));

Ruby
# Write a file on disk to the Grid
file = File.open('image.jpg')
grid = Mongo::Grid.new(db)
id   = grid.put(file)

# Retrieve the file
file = grid.get(id)
file.read

# Get all the file's metadata
file.filename
file.content_type
file.metadata

DBaaS

https://www.mongohq.com/pricing

https://mongolab.com/products/pricing/

Monitoring

$ mongostat
MongoHub.app
http://blog.mongohq.com/blog/2012/10/10/new-mongohq/

ACID
Sources: http://css.dzone.com/articles/how-acid-mongodb and http://en.wikipedia.org/wiki/ACID
☒ Atomicity requires that each transaction is executed in its entirety, or fails without any change being applied.
☒ Consistency requires that the database only passes from one valid state to the next, without intermediate points.
☑ Isolation requires that if transactions are executed concurrently, the result is equivalent to their serial execution. A transaction cannot see the partial result of the application of another one.
☒ Durability means that the result of a committed transaction is permanent, even if the database crashes immediately afterward or in the event of a power loss.

Atomic Operations
• $set
• $unset
• $inc
• $push
• $pushAll
• $pop
• $pull
• $pullAll
• $addToSet
• $rename
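MongoDB applies update operators such as these to a single document atomically, which is what makes them useful without multi-document transactions. To show what a few of them do to a document, here is a hypothetical in-memory model in plain JavaScript. `applyUpdate` is an illustrative name, it handles only top-level fields, and it ignores rules the server enforces (for example, rejecting two operators on the same field).

```javascript
const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

function applyUpdate(doc, update) {
  const out = JSON.parse(JSON.stringify(doc)); // deep copy: the input is left untouched
  for (const [op, fields] of Object.entries(update)) {
    for (const [f, v] of Object.entries(fields)) {
      switch (op) {
        case "$set": out[f] = v; break;
        case "$unset": delete out[f]; break;
        case "$inc": out[f] = (out[f] || 0) + v; break;        // missing field counts as 0
        case "$push": (out[f] = out[f] || []).push(v); break;   // creates the array if needed
        case "$addToSet":                                       // push only if not present
          out[f] = out[f] || [];
          if (!out[f].some(x => same(x, v))) out[f].push(v);
          break;
        case "$pull":                                           // remove every equal element
          if (Array.isArray(out[f])) out[f] = out[f].filter(x => !same(x, v));
          break;
        case "$rename":                                         // v is the new field name
          if (f in out) { out[v] = out[f]; delete out[f]; }
          break;
        default:
          throw new Error("operator not modeled here: " + op);
      }
    }
  }
  return out;
}

const eve = { _id: "eve", age: 24, visits: 1, privileges: ["user"], tmp: true };
const updated = applyUpdate(eve, {
  $inc: { visits: 1 },
  $addToSet: { privileges: "user" },   // already present, so no change
  $push: { log: "login" },             // field missing, so the array is created
  $unset: { tmp: "" },
  $rename: { age: "years" },
});
console.log(updated);
// { _id: "eve", visits: 2, privileges: ["user"], log: ["login"], years: 24 }
```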

Mongoish PostgreSQL?

• PostgreSQL can store “schemaless” data:
  • XML
  • hstore
  • JSON

hstore
• A hierarchical data structure specific to PostgreSQL
• Maps string keys to string values, or other hstore values
• h -> 'a' (get the value for key a)
• h ? 'a' (does h contain key a?)
• h @> 'a=>2' (does h contain the pair a => 2?)

JSON
• Validates JSON data (when storing)
• Expression indexing
• PL/V8
Sources: https://news.ycombinator.com/item?id=5467865 and https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf

Docs & Tutorials
• MongoDB Docs http://docs.mongodb.org/manual/
• MongoTips http://www.mongotips.com
• “The Little MongoDB Book” http://openmymind.net/mongodb.pdf
• “Why MongoDB Is Awesome?” http://www.slideshare.net/jnunemaker/why-mongodb-is-awesome
