
MongoDB


Pooria Azimi

May 16, 2013

Transcript

  1. Outline
    • What is NoSQL?
    • What is MongoDB?
    • Who uses MongoDB?
    • JSON & BSON
    • Demo
    • Query Documents
    • Demo
  2. Outline
    • Cursors
    • Demo
    • Aggregation Framework
    • Map/Reduce
    • MongoDB Drivers
    • (Good) MongoDB Use Cases
  3. Outline
    • Indexes
    • GridFS
    • DBaaS & Monitoring
    • ACID
    • Mongoish PostgreSQL?
    • Learn More
  4. What is NoSQL?
    ‘Non-relational’ or ‘structured databases’ might be better names. Informally: anything but Oracle, MS SQL Server, MySQL, PostgreSQL, DB2, and other familiar relational databases (which happen to use SQL).
  6. NoSQL
    • No fixed table schema
    • Complex data structures
    • Scale horizontally
    • Avoid joins and foreign keys
    • …
  8. NoSQL Categories
    • Key/value storage: Voldemort, Dynamo, Redis
    • Column oriented: BigTable, Cassandra, HBase
    • Document oriented: CouchDB, MongoDB
    • Graph: Neo4j, InfoGrid
  11. What’s Wrong With RDBMSes?*
    • RDBMSes were originally built to minimize disk footprint (through normalization and foreign keys), but disk space is now cheap: doubling disk space costs a fraction of doubling processor speed.
    • Sometimes it’s hard to model data into proper relations, and most applications are not as relational as we hope.
    • They’re not very good at scaling, and distributed deployments introduce significant delay and network traffic.
    * According to NoSQL proponents. Might not actually be true!
  14. When to Use RDBMS?*
    • Great need for enforcing data integrity
    • Data is normalized (no orphans or duplicates)
    • Result:
      • More tables
      • More relations
      • More keys
      • More indexes
    * According to NoSQL proponents. Might not actually be true!
  20. When to Use NoSQL?*
    • Scalability problems (billions of writes/updates/queries per day).
    • Data does not have complex relations (like Facebook’s messages, which are key:value pairs, for the most part anyway).
    • Real-time applications (like Facebook’s messages, which have to build an inverted index on <keyword, message*> in real time).
    • Automatic sharding and distributed applications (on a grid with thousands of nodes, using map/reduce to aggregate data).
    • Need for versioning (with timestamps) without significant overhead.
    • Prototyping (you don’t have to migrate the database schema during development).
    * According to NoSQL proponents. Might not actually be true!
  22. BigTable
    • A high-performance, sparse, distributed, multi-dimensional sorted map, built on top of the Google File System (GFS) and designed to handle petabytes of data on a cluster of hundreds of thousands of nodes.
    • Many Google services use BigTable:
      • Gmail, Google Reader
      • Google Maps, Google Earth
      • Google Code, Blogger
      • YouTube, Orkut
      • Google Search History
    Source: http://research.google.com/archive/bigtable.html
  24. HBase
    • A BigTable-like database running on top of Hadoop. Currently a top-level project at the Apache Software Foundation.
    • Used by:
      • Facebook: Messages
      • Twitter: People search
      • Yahoo!
      • Adobe
      • ... thousands of other real-time applications
    Source: http://wiki.apache.org/hadoop/Hbase/PoweredBy
  30. Cassandra
    • Facebook’s clone of Amazon Dynamo (storage engine of S3), which powered Facebook’s messaging platform.
    • A scalable, highly available, fault-tolerant, elastic and durable structured key-value store with no single point of failure.
    • Open sourced in 2008.
    • Donated to the Apache Software Foundation in 2009 (Facebook moved to HBase in late 2010); currently a top-level project.
    • Currently being used at:
      • Digg, Netflix, Cisco, Twitter, ...
  33. CouchDB
    • A document-oriented NoSQL database written in Erlang. Currently a top-level project at the Apache Software Foundation.
    • Characteristics:
      • ACID, CRUD, highly distributed, eventually consistent, map/reduce, fault tolerant, RESTful APIs, ...
    • Many corporations and websites use CouchDB: http://wiki.apache.org/couchdb/CouchDB_in_the_wild
  39. Redis
    • In-memory, persistent key-value store
    • Unbelievably fast and efficient
    • Complex data structures (scalars, strings, lists, hashes, sets, sorted sets)
    • Pub/Sub, expiring keys
    • Supports transactions
    • Completely scriptable (Lua)
    • High availability, replication
  41. Other NoSQL Databases
    A list of all NoSQL databases is available at: http://nosql-database.org/
    Other notable NoSQL databases:
    • Voldemort (key/value storage)
    • Neo4j (spatial and graphs)
  43. MongoDB
    • A high-performance, scalable, document-oriented database
    • Development started in 2007 by 10gen
    • First public release in 2009
  45. MongoDB is…
    • NoSQL
    • Schemaless
    • Map/Reduce
    • Document-oriented
    • Not ACID-compliant
    • Automatic sharding
    • BSON (Binary JSON) style documents
  46. Who uses MongoDB?
    • Content Management System:
      • Craigslist: used for archiving old posts
      • SAP
      • VIACOM
      • whitehouse.gov
    • Operational Intelligence:
      • GitHub
      • intuit: 500,000+ websites
  47. Who uses MongoDB?
    • User Data Management:
      • foursquare: 3 billion check-ins, 5 million per day
      • Disney: backend for their online gaming networks
      • IGN
      • SourceForge
      • Viber
      • Cisco
      • ebay
      • O2
      • DISQUS
  48. Who uses MongoDB?
    • High Volume Data feeds:
      • bit.ly: shortening 130M+ urls per day
      • CERN LHC: for data aggregation system
      • The New York Times
      • Forbes
      • Guardian
      • Stripe
      • ShareThis
      • Wordnik: 3.5TB of data across 20B records
  49. Who uses MongoDB?
    • Metadata Management:
      • Springer
      • GILT
      • Shutterfly
      • MTV Networks
      • Grooveshark
      • lulu
    Source: http://www.mongodb.org/about/production-deployments and http://www.slideshare.net/mongodb/nosql-now-2012-mongodb-use-cases
  50. user = {
        'username': 'eve',
        'age': 24,
        'address': { 'city': 'new york', 'state': 'ny' },
        'privileges': [
          'user',
          { 'moderator': ['forum1', 'forum2', 'forum3'] }
        ],
        'dead': false
      }
  51. {
        _id: 1,
        name: { first: 'John', last: 'Backus' },
        birth: new Date('Dec 03, 1924'),
        death: new Date('Mar 17, 2007'),
        contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
        awards: [
          { award: 'National Medal of Science', year: 1975, by: 'National Science Foundation' },
          { award: 'Turing Award', year: 1977, by: 'ACM' }
        ]
      }
  53. BSON
    • BSON is JSON + a few additions
    • New data types:
      • ObjectID
      • Timestamp
      • Date
  61. BSON
    • BSON documents can be quite large: the hard limit is 16MB!
    • BSON documents are used extensively in MongoDB:
      • Record documents
      • Query specification documents
      • Update specification documents
      • Index specification documents
      • Sort order specification documents
      • …
  62. ObjectID
    ObjectId is a 12-byte BSON type, constructed using:
    • a 4-byte value representing the seconds since the Unix epoch,
    • a 3-byte machine identifier,
    • a 2-byte process id, and
    • a 3-byte counter, starting with a random value.
    http://docs.mongodb.org/manual/reference/object-id/
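The byte layout listed on the ObjectID slide can be illustrated by slicing the 24-character hex form of an id. This is only a sketch in plain JavaScript, not driver code: `decodeObjectId` is a hypothetical helper, the sample id is an arbitrary example value, and the field boundaries simply follow the 4/3/2/3-byte split stated on the slide.

```javascript
// Hypothetical helper: split a 24-character hex ObjectId string into the
// four fields described on the slide (4 + 3 + 2 + 3 bytes = 12 bytes).
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // bytes 0-3: seconds since the Unix epoch
  return {
    seconds: seconds,
    createdAt: new Date(seconds * 1000),         // same value as a JavaScript Date
    machine: hex.slice(8, 14),                   // bytes 4-6: machine identifier (kept as hex)
    pid: parseInt(hex.slice(14, 18), 16),        // bytes 7-8: process id
    counter: parseInt(hex.slice(18, 24), 16),    // bytes 9-11: counter
  };
}

// Arbitrary example id, used only for illustration.
const parts = decodeObjectId("507f1f77bcf86cd799439011");
console.log(parts.createdAt.toISOString(), parts.machine, parts.pid, parts.counter);
```

Because the timestamp occupies the leading bytes, ids generated later compare greater than earlier ones at one-second granularity.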
  63. Date
    getDate, getDay, getFullYear, getHours, getMilliseconds, getMinutes, getMonth, getSeconds, getTime, getTimezoneOffset,
    getUTCDate, getUTCDay, getUTCFullYear, getUTCHours, getUTCMilliseconds, getUTCMinutes, getUTCMonth, getUTCSeconds, getYear,
    setDate, setFullYear, setHours, setMilliseconds, setMinutes, setMonth, setSeconds, setTime,
    setUTCDate, setUTCFullYear, setUTCHours, setUTCMilliseconds, setUTCMinutes, setUTCMonth, setUTCSeconds, setYear,
    toDateString, toGMTString, toISOString, tojson, toLocaleDateString, toLocaleString, toLocaleTimeString, toString, toTimeString, toUTCString
  64. MongoDB Terminology
      Relational DBs    MongoDB
      database          database
      index             index
      table             collection
      row (tuple)       (BSON) document
      column            (BSON) field
      primary key       _id field
      foreign key       –
    http://docs.mongodb.org/manual/reference/sql-comparison/
  72. > show dbs
      > use test
      > show collections
      > help
      > db.help()
      > db.users.help()
      > db.users.find
      > db.users.find().help
  73. MongoDB query syntax:
      db.users.find( { age: { $gte: 25 } }, { age: 1, name: 1 } )
    SQL:
      SELECT _id, age, name FROM users WHERE age >= 25;
  74. MongoDB:
      db.users.find( { age: { $gte: 25 } }, {} )
      db.users.find( { age: { $gte: 25 } } )
    SQL:
      SELECT * FROM users WHERE age >= 25;
  75. MongoDB:
      db.users.find( {}, { name: 0 } )
    SQL:
      SELECT _id, age, dob, height, weight FROM users;
  76. Simple Equality Checks
      db.inventory.find( { type: "snacks" } )
      db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
    SQL:
      SELECT * FROM inventory WHERE type IN ('food', 'snacks');
  77. Simple Equality Checks
      db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
      db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
    SQL:
      SELECT * FROM inventory WHERE type = 'food' AND ( qty > 100 OR price < 9.95 );
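To make the reading of such a query document concrete (top-level fields are ANDed together, `$or` takes a list of alternatives), here is a toy matcher in plain JavaScript. It is a sketch for illustration only and not how MongoDB evaluates queries: `matches`, `comparators` and the sample `inventory` array are all made up, and only the handful of operators shown are handled.

```javascript
// Toy comparison operators; only the ones used in the query slides are covered.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $in: (a, list) => list.includes(a),
};

// Returns true when `doc` satisfies every condition in `query`.
function matches(doc, query) {
  return Object.keys(query).every((key) => {
    const cond = query[key];
    if (key === "$or") {
      // $or: at least one of the sub-queries must match.
      return cond.some((sub) => matches(doc, sub));
    }
    const value = doc[key];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Operator document such as { $lt: 9.95 }: all operators must hold.
      return Object.keys(cond).every((op) => comparators[op](value, cond[op]));
    }
    return value === cond; // plain equality check
  });
}

// Made-up sample data.
const inventory = [
  { item: "apple", type: "food", qty: 150, price: 12.0 },
  { item: "bread", type: "food", qty: 20, price: 4.5 },
  { item: "rice", type: "food", qty: 50, price: 15.0 },
  { item: "chips", type: "snacks", qty: 500, price: 2.0 },
];

const query = { type: "food", $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] };
const found = inventory.filter((doc) => matches(doc, query)).map((doc) => doc.item);
console.log(found);
```

With this sample data the query keeps the food items that are either in large stock or cheap, mirroring the `AND ( ... OR ... )` shape of the SQL.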
  78. Subdocuments
      db.inventory.find( { producer: { company: 'ABC123', address: '123 Street' } } )
      db.inventory.find( { 'producer.company': 'ABC123' } )
  79. Arrays
      db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } )
      db.inventory.find( { tags: 'fruit' } )
      db.inventory.find( { 'tags.0' : 'fruit' } )
      db.inventory.find( { memos: { $elemMatch: { memo : 'on time', by: 'shipping' } } } )
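The dotted keys used above ('producer.company', 'tags.0') can be read as a path walked one segment at a time. A minimal sketch in plain JavaScript, assuming a hypothetical `getPath` helper and a made-up document; real MongoDB dot notation does more (for example, it also looks inside arrays of subdocuments), which this sketch does not attempt.

```javascript
// Hypothetical helper: resolve a dotted path against a document,
// returning undefined as soon as a segment is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Made-up document shaped like the ones queried on the slides.
const item = {
  producer: { company: "ABC123", address: "123 Street" },
  tags: ["fruit", "food", "citrus"],
};

console.log(getPath(item, "producer.company")); // field of a subdocument
console.log(getPath(item, "tags.0"));           // first array element
console.log(getPath(item, "producer.phone.area")); // missing path
```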
  80. Operators
    • $gt, $gte, $lt, $lte
    • $and, $or, $nor, $not
    • $regex, $where
    • $exists, $size
    • $all, $type
    • $elemMatch, $in, $nin, $ne
    http://docs.mongodb.org/manual/reference/operator/
  81. Regex
      db.users.find( { name: /^P/ } )
      SELECT * FROM users WHERE name LIKE 'P%';
      db.users.find( { email: /@gmail\.com$/ } )
      SELECT * FROM users WHERE email LIKE '%@gmail.com';
  82. Regex
      db.users.find( { username: /^P/i } )
      db.users.find( { email: /^[\w\d\-_+]{0,25}@[\w]{0,5}mail\.com$/ } )
  83. .findOne()
      db.inventory.findOne( { type: "snacks" } )
      SELECT * FROM inventory WHERE type = 'snacks' LIMIT 1;
  84. .count()
      db.inventory.find( { tags: { $in: [ 'fruit', 'food', 'citrus' ] } } ).count()
      SELECT COUNT(*) FROM inventory WHERE tags IN ('fruit', 'food', 'citrus');
  85. .sort()
      db.inventory.find( { tags: { $in: [ 'fruit', 'food', 'citrus' ] } } ).sort( { purchased_at: 1, price: -1 } )
      SELECT * FROM inventory WHERE tags IN ('fruit', 'food', 'citrus') ORDER BY purchased_at ASC, price DESC;
  86. .limit()
      db.inventory.find( { tags: { $in: [ 'fruit', 'food', 'citrus' ] } } ).sort( { purchased_at: 1, price: -1 } ).limit(30)
      SELECT * FROM inventory WHERE tags IN ('fruit', 'food', 'citrus') ORDER BY purchased_at ASC, price DESC LIMIT 30;
  87. .skip()
      db.inventory.find().limit(30).skip(30*5 - 1)
    SQL:
      SELECT * FROM inventory LIMIT 30 OFFSET 149;
    MySQL:
      SELECT * FROM inventory LIMIT 149, 30;
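The skip/limit arithmetic can be checked on an in-memory array. The sketch below is plain JavaScript with a hypothetical `skipThenLimit` helper and made-up documents numbered 1 to 200; it applies the skip first and the limit second, which is also what a MongoDB cursor does regardless of the order in which `.limit()` and `.skip()` are chained.

```javascript
// Hypothetical helper: drop `skip` documents, then keep at most `limit`.
function skipThenLimit(docs, skip, limit) {
  return docs.slice(skip, skip + limit);
}

// Made-up collection: documents numbered 1..200.
const docs = Array.from({ length: 200 }, (_, i) => ({ n: i + 1 }));

// Same numbers as the slide: skip 30*5 - 1 = 149 documents, return 30.
const page = skipThenLimit(docs, 30 * 5 - 1, 30);
console.log(page.length, page[0].n, page[page.length - 1].n);
```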
  88. var cursor = db.users.find( { 'age' : { $gte : 20 } } );
      while (cursor.hasNext()) {
        var curr_doc = cursor.next();
        print(curr_doc.age);
      }
  89. var index = 1;  // start at 1 so the numbering matches the output below
      while (cursor.hasNext()) {
        var u = cursor.next();
        print((index++) + ") " + u.username + " : " + u.age*2);
      }

      1) 8731040 : 46
      2) 8731041 : 45
      3) 8731042 : 46
      4) 8731043 : 44
  95. Aggregation Framework
    • Aggregation Framework is MongoDB’s GROUP BY
    • Perform complex queries
    • Simpler than Map/Reduce
    • Can $project the output to:
      • Add or compute new fields
      • Create virtual sub-objects
      • Extract sub-fields into top-level objects
  96. Pipeline
    • As in a Unix shell pipeline, documents pass through multiple pipeline operators
    • Pipeline operators can produce zero, one or multiple “new” documents
  97. Pipeline Operators
    • $project
    • $match
    • $limit
    • $skip
    • $unwind
    • $group
    • $sort
    • $geoNear
    http://docs.mongodb.org/manual/reference/aggregation/#pipeline
  98. SQL to Aggregation Framework Mapping Chart
      SQL         MongoDB Aggregation
      WHERE       $match
      GROUP BY    $group
      HAVING      $match
      SELECT      $project
      ORDER BY    $sort
      LIMIT       $limit
      SUM         $sum
      COUNT       $sum
    http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
  107. Document Format
      {
        "_id": "10280",
        "city": "NEW YORK",
        "state": "NY",
        "pop": 5574,
        "loc": [ -74.016323, 40.710537 ]
      }
    http://docs.mongodb.org/manual/tutorial/aggregation-examples/#states-with-populations-over-10-million
  108. States with Populations Over 10 Million
      db.zipcodes.aggregate(
        { $group : { _id : "$state", totalPop : { $sum : "$pop" } } },
        { $match : { totalPop : { $gte : 10*1000 } } }
      )
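What the `$group` / `$sum` / `$match` stages do to the documents can be mimicked on a plain array. This is a sketch in standalone JavaScript under stated assumptions, not the aggregation framework itself: `groupSum` is a hypothetical helper, and all sample zip codes except the first (taken from the Document Format slide) are invented.

```javascript
// Sample documents; only the first comes from the slide, the rest are made up.
const zipcodes = [
  { _id: "10280", city: "NEW YORK", state: "NY", pop: 5574 },
  { _id: "10001", city: "NEW YORK", state: "NY", pop: 6000 },
  { _id: "02903", city: "PROVIDENCE", state: "RI", pop: 1050 },
  { _id: "75001", city: "ADDISON", state: "TX", pop: 12000 },
];

// Rough equivalent of { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }.
function groupSum(docs, keyField, sumField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + doc[sumField]);
  }
  // One output document per distinct key, like the output of $group.
  return Array.from(totals, ([key, total]) => ({ _id: key, totalPop: total }));
}

// Rough equivalent of { $match: { totalPop: { $gte: 10*1000 } } }.
const result = groupSum(zipcodes, "state", "pop").filter((g) => g.totalPop >= 10 * 1000);
console.log(result);
```

Note that the filter runs on the grouped totals, which is why the same `$match` operator plays the role of SQL's `HAVING` here.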
  109. $group by _id
    [Diagram: documents (doc3, doc7, doc9, doc12, doc14, doc19, doc23, doc24, …) are placed into one bucket per state: NY, TX, RI, …]
  110. $sum
      NY  totalPop: 19570
      TX  totalPop: 26059
      RI  totalPop: 1050
      IL  totalPop: 12875
      NH  totalPop: 1320
      PY  totalPop: 12763
  111. $match
      NY  totalPop: 19570
      TX  totalPop: 26059
      RI  totalPop: 1050
      IL  totalPop: 12875
      NH  totalPop: 1320
      PY  totalPop: 12763
  112. { _id: 'NY', totalPop: 19570 },
      { _id: 'TX', totalPop: 26059 },
      { _id: 'IL', totalPop: 12875 },
      { _id: 'PY', totalPop: 12763 }
  113. Document Format
      {
        "_id": "10280",
        "city": "NEW YORK",
        "state": "NY",
        "pop": 5574,
        "loc": [ -74.016323, 40.710537 ]
      }
    http://docs.mongodb.org/manual/tutorial/aggregation-examples/#largest-and-smallest-cities-by-state
  114. Largest and Smallest Cities by State
      db.zipcodes.aggregate(
        { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
        { $sort: { pop: 1 } },
        { $group: {
            _id : "$_id.state",
            biggestCity: { $last: "$_id.city" },
            biggestPop: { $last: "$pop" },
            smallestCity: { $first: "$_id.city" },
            smallestPop: { $first: "$pop" }
        } },
        { $project: {
            _id: 0,
            state: "$_id",
            biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
            smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
        } }
      )
  115. 1st $group
      { "_id" : { "state" : "CO", "city" : "EDGEWATER" }, "pop" : 13154 }
  116. 2nd $group
      { "_id" : "WA", "biggestCity" : "SEATTLE", "biggestPop" : 520096, "smallestCity" : "BENGE", "smallestPop" : 2 }
  117. $project
      { "state" : "RI", "biggestCity" : { "name" : "CRANSTON", "pop" : 176404 }, "smallestCity" : { "name" : "CLAYVILLE", "pop" : 45 } }
  119. 1st $group
    [Diagram: one document per (state, city) pair with its summed pop]
      A:  a 114, b 14, c 93
      B:  a 18, b 44, c 64, d 65, e 23, f 112
      C:  a 65, b 13
      D:  a 65, b 87, c 142, e 123, f 98
      E:  a 23, b 61, c 27, d 51, e 92, f 3, g 64, h 57
  120. $sort
    [Diagram: the same (state, city) documents as on the previous slide]
  121. 2nd $group (1)
    [Diagram: the (state, city) documents are placed into one bucket per state: A, B, C, D, E]
  122. 2nd $group (2)
    [Diagram: the (state, city) documents are placed into one bucket per state: A, B, C, D, E]
  123. 2nd $group (3)
      A:  SmallestPop b 14,  BiggestPop a 114
      B:  SmallestPop a 18,  BiggestPop f 112
      C:  SmallestPop b 13,  BiggestPop a 65
      D:  SmallestPop a 65,  BiggestPop c 142
      E:  SmallestPop f 3,   BiggestPop e 92
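The walk-through above relies on one idea: once the per-city totals are sorted by pop ascending, the first document seen for a state is its smallest city and the last one is its biggest. A sketch of that idea in plain JavaScript, using a subset of the illustrative A/C values from the slides; the variable names are made up and this only imitates `$sort` followed by `$group` with `$first` / `$last`.

```javascript
// Output of the 1st $group: one document per (state, city), values from the slides.
const cityTotals = [
  { _id: { state: "A", city: "a" }, pop: 114 },
  { _id: { state: "A", city: "b" }, pop: 14 },
  { _id: { state: "A", city: "c" }, pop: 93 },
  { _id: { state: "C", city: "a" }, pop: 65 },
  { _id: { state: "C", city: "b" }, pop: 13 },
];

// $sort: { pop: 1 } (ascending); copy first so the input stays untouched.
const sorted = [...cityTotals].sort((x, y) => x.pop - y.pop);

// 2nd $group: the first document per state gives $first, the last gives $last.
const byState = new Map();
for (const doc of sorted) {
  const state = doc._id.state;
  if (!byState.has(state)) {
    byState.set(state, { state: state, smallestCity: { name: doc._id.city, pop: doc.pop } });
  }
  // Overwritten on every document, so it ends up holding the last (largest) one.
  byState.get(state).biggestCity = { name: doc._id.city, pop: doc.pop };
}

const summary = Array.from(byState.values());
console.log(summary);
```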
  124. Document Format
      { _id : "jane", joined : ISODate("2011-03-02"), likes : ["golf", "racquetball"] }
      { _id : "joe", joined : ISODate("2012-07-02"), likes : ["tennis", "golf", "swimming"] }
    http://docs.mongodb.org/manual/tutorial/aggregation-examples/#aggregation-with-user-preference-data
  125. Normalize and Sort Documents
      db.users.aggregate( [
        { $project : { name: { $toUpper: "$_id" }, _id: 0 } },
        { $sort : { name : 1 } }
      ] )
  126. Return Usernames Ordered by Join Month
      db.users.aggregate( [
        { $project : { month_joined : { $month : "$joined" }, name : "$_id", _id : 0 } },
        { $sort : { month_joined : 1 } }
      ] )
  127. { "month_joined" : 1, "name" : "ruth" },
      { "month_joined" : 1, "name" : "harold" },
      { "month_joined" : 1, "name" : "kate" }
      { "month_joined" : 2, "name" : "jill" }
  128. Return Total Number of Joins per Month
      db.users.aggregate( [
        { $project : { month_joined : { $month : "$joined" } } },
        { $group : { _id : { month_joined: "$month_joined" }, number : { $sum : 1 } } },
        { $sort : { "_id.month_joined" : 1 } }
      ] )
  129. { "_id" : { "month_joined" : 1 }, "number" : 3 },
      { "_id" : { "month_joined" : 2 }, "number" : 9 },
      { "_id" : { "month_joined" : 3 }, "number" : 5 }
  130. Return the Five Most Common “Likes”
      db.users.aggregate( [
        { $unwind : "$likes" },
        { $group : { _id : "$likes", number : { $sum : 1 } } },
        { $sort : { number : -1 } },
        { $limit : 5 }
      ] )
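The role of `$unwind` in this pipeline is easiest to see on a small array: each user document is expanded into one document per element of `likes`, and only then counted. The sketch below is plain JavaScript, not the aggregation framework; `unwind` and `countBy` are hypothetical helpers, the user "ann" is invented, and the result is limited to the top two instead of five to keep the sample small.

```javascript
// The first two users come from the Document Format slide; "ann" is made up.
const users = [
  { _id: "jane", likes: ["golf", "racquetball"] },
  { _id: "joe", likes: ["tennis", "golf", "swimming"] },
  { _id: "ann", likes: ["golf", "tennis"] },
];

// Rough equivalent of { $unwind: "$likes" }: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((value) => ({ ...doc, [field]: value })));
}

// Rough equivalent of { $group: { _id: "$likes", number: { $sum: 1 } } }.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return Array.from(counts, ([key, number]) => ({ _id: key, number: number }));
}

const unwound = unwind(users, "likes");
const top = countBy(unwound, "likes")
  .sort((a, b) => b.number - a.number) // { $sort: { number: -1 } }
  .slice(0, 2);                        // { $limit: 2 }
console.log(unwound.length, top);
```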
  131. { "_id" : "golf", "number" : 33 },
      { "_id" : "racquetball", "number" : 31 },
      { "_id" : "swimming", "number" : 24 },
      { "_id" : "handball", "number" : 19 },
      { "_id" : "tennis", "number" : 18 }
  132. Map/Reduce
    • Map
      • Called once per document
      • Can emit zero, one or more “new” documents (<key, value>)
    • Reduce
      • Called once per key emitted
      • Processes <key, Array[value]> and reduces all the values into a single one
    • Finalize
      • Optional: post-processes the reduced data
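The three phases can be sketched as an in-memory function in plain JavaScript. This is a simplified illustration, not MongoDB's `mapReduce` command: the `mapReduce` function below is hypothetical, it passes the document and `emit` as arguments (in MongoDB the map function receives the document as `this`), and it calls reduce exactly once per key as the slide describes.

```javascript
// Simplified in-memory map/reduce: map emits <key, value> pairs, reduce folds
// all values of one key into a single value, finalize is an optional last step.
function mapReduce(docs, map, reduce, finalize) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };

  // Map phase: called once per document, may emit zero, one or more pairs.
  for (const doc of docs) map(doc, emit);

  // Reduce phase (plus optional finalize): one result per emitted key.
  const results = [];
  for (const [key, values] of emitted) {
    const reduced = reduce(key, values);
    results.push({ _id: key, value: finalize ? finalize(key, reduced) : reduced });
  }
  return results;
}

// Made-up documents: count how often each tag occurs.
const posts = [{ tags: ["a", "b"] }, { tags: [] }, { tags: ["a"] }];

const tagCounts = mapReduce(
  posts,
  (doc, emit) => doc.tags.forEach((tag) => emit(tag, 1)),  // map
  (key, values) => values.reduce((sum, v) => sum + v, 0)   // reduce
);
console.log(tagCounts);
```

The second post emits nothing, which shows the "zero, one or more" behaviour of the map phase.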
  133. Java
    http://docs.mongodb.org/ecosystem/drivers/java and http://api.mongodb.org/java/current/
      import com.mongodb.*;
      import java.util.Arrays;
      import java.util.Set;

      MongoClient mongoClient = new MongoClient( "localhost" , 27017 );
      DB db = mongoClient.getDB( "mydb" );

      // Getting a List Of Collections
      Set<String> colls = db.getCollectionNames();
      for (String s : colls) {
          System.out.println(s);
      }

      DBCollection coll = db.getCollection("testCollection");
      BasicDBObject doc = new BasicDBObject("name", "MongoDB").
              append("type", "database").
              append("count", 1).
              append("info", new BasicDBObject("x", 203).append("y", 102));
      coll.insert(doc);
      System.out.println(coll.getCount());
  134. Java
    http://docs.mongodb.org/ecosystem/drivers/java and http://api.mongodb.org/java/current/
      // Using a Cursor to Get All the Documents
      DBCursor cursor = coll.find();
      try {
          while (cursor.hasNext()) {
              System.out.println(cursor.next());
          }
      } finally {
          cursor.close();
      }

      // Getting A Single Document with A Query
      BasicDBObject query = new BasicDBObject("i", 71);
      cursor = coll.find(query);
      try {
          while (cursor.hasNext()) {
              System.out.println(cursor.next());
          }
      } finally {
          cursor.close();
      }
  135. PHP
    http://docs.mongodb.org/ecosystem/drivers/php and http://ir2.php.net/mongo/
      <?php
      // connect
      $m = new MongoClient();

      // select a database
      $db = $m->comedy;

      // select a collection (analogous to a relational database's table)
      $collection = $db->cartoons;

      // add a record
      $document = array( "title" => "Calvin and Hobbes", "author" => "Bill Watterson" );
      $collection->insert($document);

      // add another record, with a different "shape"
      $document = array( "title" => "XKCD", "online" => true );
      $collection->insert($document);

      // find everything in the collection
      $cursor = $collection->find();

      // iterate through the results
      foreach ($cursor as $document) {
          echo $document["title"] . "\n";
      }
      ?>
  136. Ruby
    http://docs.mongodb.org/ecosystem/drivers/ruby and http://api.mongodb.org/ruby/current/
      require 'rubygems'
      require 'mongo'
      include Mongo

      @client = MongoClient.new('localhost', 27017)
      @db = @client['sample-db']
      @coll = @db['test']

      @coll.remove
      3.times do |i|
        @coll.insert({'a' => i+1})
      end

      puts "There are #{@coll.count} records. Here they are:"
      @coll.find.each { |doc| puts doc.inspect }
  137. Important Factors • Existing skill set and tooling • Existing

    architecture and infrastructure • Growth expectation 105 http://www.palominodb.com/blog/2012/03/06/when-mongodb-right-choice-your-business-we-explore-detailed-use-cases
  138. • Prototyping • Fast, rapid schema changes • Logging •

    Asynchronous logs • Capped collections • Flexible structure (schemaless) 106
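
    A capped collection behaves like a fixed-size circular buffer: once it is full, each insert pushes out the oldest entry, which is why it suits logging. The plain-JavaScript sketch below only imitates that behaviour in memory. `CappedLog` and its methods are hypothetical names, not part of MongoDB or any driver, and the cap here counts documents for simplicity, whereas a real capped collection is sized in bytes.

```javascript
// In-memory imitation of a capped collection (illustrative only).
// Real capped collections are sized in bytes; this sketch caps by document count.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  // Append a document; drop the oldest one when the cap is exceeded.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Return documents in insertion order, oldest first.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedLog(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i, msg: "event " + i });
}
console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

    Only the three most recent events survive, with no explicit cleanup job.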
  139. • Archiving • Old data might be in a different

    format • Craigslist • Content Management • SAP / Wordnik • Queue Management 107
  140. • Index any property • Index properties of subdocuments and

    sub-subdocuments • Arrays! • Compound, reverse, unique, hashed, sparse, geospatial and text index types
  141. > db.inventory.find( { type: 'food' } ).explain()

    {
      "cursor" : "BasicCursor",
      "isMultiKey" : false,
      "n" : 5,
      "nscannedObjects" : 4000006,
      "nscanned" : 4000006,
      "nscannedObjectsAllPlans" : 4000006,
      "nscannedAllPlans" : 4000006,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 2,
      "nChunkSkips" : 0,
      "millis" : 1591,
      "indexBounds" : { },
      "server" : "mongodb0.example.net:27017"
    }
  142. Creating indexes

    // ascending
    db.inventory.ensureIndex( { type: 1 } )

    // descending
    db.inventory.ensureIndex( { created_at: -1 } )

    // non-blocking in the background
    db.inventory.ensureIndex( { type: 1 }, { background: true } )

    // compound
    db.inventory.ensureIndex( { name: 1, type: 1 } )

    // sparse
    db.collection.ensureIndex( { a: 1 }, { sparse: true } )
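
    An index on `type` changes how `find( { type: 'food' } )` runs: instead of examining every document, the server looks up only the matching ones, which is what the `nscanned` field in the explain() output on the neighbouring slides reports. The sketch below imitates that difference in plain JavaScript. It is a simplification under stated assumptions: MongoDB indexes are B-trees, a `Map` stands in for one here, and `collectionScan`, `buildIndex` and `indexScan` are hypothetical helper names.

```javascript
// Illustrative only: MongoDB indexes are B-trees; a Map stands in for one here.
const inventory = [];
for (let i = 0; i < 1000; i++) {
  inventory.push({ _id: i, type: i % 200 === 0 ? "food" : "other" });
}

// Without an index, every document is examined.
function collectionScan(docs, field, value) {
  let nscanned = 0;
  const results = [];
  for (const doc of docs) {
    nscanned++;
    if (doc[field] === value) results.push(doc);
  }
  return { n: results.length, nscanned };
}

// Map each value of `field` to the positions of the documents holding it.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(pos);
  });
  return index;
}

// With the index, only the matching documents are touched.
function indexScan(docs, index, value) {
  const positions = index.get(value) || [];
  const results = positions.map((pos) => docs[pos]);
  return { n: results.length, nscanned: positions.length };
}

const typeIndex = buildIndex(inventory, "type");
const scanStats = collectionScan(inventory, "type", "food");
const indexStats = indexScan(inventory, typeIndex, "food");
console.log(scanStats);  // { n: 5, nscanned: 1000 }
console.log(indexStats); // { n: 5, nscanned: 5 }
```

    Both paths return the same five documents; only the amount of work differs.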
  143. > db.inventory.find( { type: 'food' } ).explain()

    {
      "cursor" : "BtreeCursor type_1",
      "isMultiKey" : false,
      "n" : 5,
      "nscannedObjects" : 5,
      "nscanned" : 5,
      "nscannedObjectsAllPlans" : 5,
      "nscannedAllPlans" : 5,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 0,
      "nChunkSkips" : 0,
      "millis" : 0,
      "indexBounds" : { "type" : [ [ "food", "food" ] ] },
      "server" : "mongodb0.example.net:27017"
    }
  144. • Can store large objects (larger than 16MB) • Files

    are divided into 256KB chunks, and can be re-assembled fully or partially • No need to load the whole file into memory (can “skip” to the middle of a video) • Uses fs.chunks and fs.files collections by default 114 http://docs.mongodb.org/manual/core/gridfs/
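
    The chunking described above comes down to simple arithmetic, which also explains why a reader can skip to the middle of a file: only the chunks covering the requested byte range are needed. The sketch below works through that arithmetic in plain JavaScript. `chunkCount` and `chunksForRange` are hypothetical helpers for illustration; the actual splitting and reassembly is done by the driver.

```javascript
// Illustrative arithmetic only; the real splitting is done by the driver.
const CHUNK_SIZE = 256 * 1024; // 256KB, the chunk size named above

// Number of chunks needed to hold a file of the given length in bytes.
function chunkCount(fileLength) {
  return Math.ceil(fileLength / CHUNK_SIZE);
}

// Chunk numbers covering bytes [start, end) of the file.
function chunksForRange(start, end) {
  const first = Math.floor(start / CHUNK_SIZE);
  const last = Math.floor((end - 1) / CHUNK_SIZE);
  const chunks = [];
  for (let n = first; n <= last; n++) chunks.push(n);
  return chunks;
}

const fileLength = 20 * 1024 * 1024; // a 20MB file, above the 16MB limit
console.log(chunkCount(fileLength)); // 80

// Reading 300KB starting at the 10MB mark touches only two chunks.
const start = 10 * 1024 * 1024;
console.log(chunksForRange(start, start + 300 * 1024)); // [ 40, 41 ]
```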
  145. Java

    // returns default GridFS bucket (i.e. "fs" collection)
    GridFS myFS = new GridFS(myDatabase);

    // saves the file to "fs" GridFS bucket
    myFS.createFile(new File("/tmp/largething.mpg"));

    // returns GridFS bucket named "contracts"
    GridFS myContracts = new GridFS(myDatabase, "contracts");

    // retrieve GridFS object "smithco"
    GridFSDBFile file = myContracts.findOne("smithco");

    // saves the GridFS file to the file system
    file.writeTo(new File("/tmp/smithco.pdf"));
  146. Ruby

    # Write a file on disk to the Grid
    file = File.open('image.jpg')
    grid = Mongo::Grid.new(db)
    id   = grid.put(file)

    # Retrieve the file
    file = grid.get(id)
    file.read

    # Get all the file's metadata
    file.filename
    file.content_type
    file.metadata
  155. ACID http://css.dzone.com/articles/how-acid-mongodb and http://en.wikipedia.org/wiki/ACID

    ☒ Atomicity requires that each transaction is executed in its entirety, or fails without any change being applied.
    ☒ Consistency requires that the database only passes from a valid state to the next one, without intermediate points.
    ☑ Isolation requires that if transactions are executed concurrently, the result is equivalent to their serial execution. A transaction cannot see the partial result of the application of another one.
    ☒ Durability means that the result of a committed transaction is permanent, even if the database crashes immediately or in the event of a power loss.
  156. Atomic Operations • $set • $unset • $inc • $push

    • $pushAll • $pop • $pull • $pullAll • $addToSet • $rename 129
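
    Each of the operators listed above modifies a single document in one step on the server. The sketch below only shows what a few of them do to a document's fields; it says nothing about how the server achieves atomicity. `applyUpdate` is a hypothetical helper, not a driver function, and it handles top-level fields with primitive values only.

```javascript
// Illustrative only: imitates a few update operators on a plain object.
// Handles top-level fields and primitive values; not MongoDB's implementation.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a deep copy
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      const current = result[field];
      switch (op) {
        case "$set":
          result[field] = arg;
          break;
        case "$unset":
          delete result[field];
          break;
        case "$inc":
          result[field] = (current || 0) + arg;
          break;
        case "$push":
          result[field] = (current || []).concat([arg]);
          break;
        case "$addToSet":
          // add the value only if it is not already present
          result[field] = (current || []).includes(arg)
            ? current
            : (current || []).concat([arg]);
          break;
        case "$pop":
          // assumes the field holds an array; 1 drops the last element, -1 the first
          result[field] = arg === 1 ? current.slice(0, -1) : current.slice(1);
          break;
        case "$pull":
          // assumes the field holds an array; removes every matching value
          result[field] = current.filter((v) => v !== arg);
          break;
        default:
          throw new Error("unsupported operator: " + op);
      }
    }
  }
  return result;
}

const before = { _id: 1, title: "XKCD", views: 10, tags: ["comic", "web"] };
const after = applyUpdate(before, {
  $set: { online: true },
  $inc: { views: 1 },
  $addToSet: { tags: "web" }, // already present, so tags is unchanged
  $push: { ratings: 5 },      // field is missing, so a new array is created
});
console.log(after);
```

    In the shell, the same update document would be passed as the second argument to `db.collection.update()`.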
  157. hstore • A hierarchical data structure specific to PostgreSQL • Maps

    string keys to string values, or other hstore values • h -> 'a' (get value for key a) • h ? 'a' (does h contain key a?) • h @> 'a=>2' (does h contain the pair a => 2?)
  158. • Validates JSON data (when storing) • Expression indexing •

    PL/V8 133 JSON https://news.ycombinator.com/item?id=5467865 https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf
  160. Docs & Tutorials • MongoDB Docs http://docs.mongodb.org/manual/ • MongoTips http://www.mongotips.com

    • “The Little MongoDB Book” http://openmymind.net/mongodb.pdf • “Why MongoDB Is Awesome?” http://www.slideshare.net/jnunemaker/why-mongodb-is-awesome 135