Slide 1

Slide 1 text

Devoxx Workshop Brendan McAdams @rit 10gen, Inc. [email protected] Tuesday, November 15, 11

Slide 2

Slide 2 text

Introductions
• Brendan McAdams - [email protected]
• 10gen Engineer
• Open Source: MongoDB Core Contributor
• Java Driver
• Scala Support
• Official Scala Driver (Casbah)
• Scala Community Integration
• Hadoop + MongoDB Integration
• 10gen (Commercial MongoDB): Support, Training, Consulting
• MongoDB User for ~3 years (Early 2009)

Slide 3

Slide 3 text

Agenda
• MongoDB Overview
• Who, What, When, Where, Why?
• CRUD & the MongoDB Shell
• Indexing & Query Optimization
• Scalability & MongoDB - Replica Sets & Sharding
• MongoDB Drivers
• How to talk to MongoDB from your Application

Slide 4

Slide 4 text

Ask Questions! • Stop me if I’m going too fast, too slow ... or speaking incoherently

Slide 5

Slide 5 text

What Is MongoDB?

Slide 6

Slide 6 text

So We’ve Built an Application with a Database

Slide 7

Slide 7 text

So We’ve Built an Application with a Database How do we integrate that database with our application’s object hierarchy?

Slide 8

Slide 8 text

I know! Let’s use an ORM!

Slide 9

Slide 9 text

I know! Let’s use an ORM! Congratulations: Now we’ve got 2 problems! (or is it n+1?)

Slide 10

Slide 10 text

Let’s Face It ... SQL Sucks.

Slide 11

Slide 11 text

Let’s Face It ... SQL Sucks. For some problems at least.

Slide 12

Slide 12 text

Stuffing an object graph into a relational model is like fitting a square peg into a round hole.

Slide 13

Slide 13 text

Sure, we can use an ORM. But who are we really fooling?

Slide 14

Slide 14 text

Sure, we can use an ORM. But who are we really fooling? ... and who/what are we going to wake up next to in the morning?

Slide 15

Slide 15 text

This is the SQL Model

mysql> select * from book;
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | The Demon-Haunted World: Science as a Candle in the Dark |
|  2 | Cosmos                                                   |
|  3 | Programming in Scala                                     |
+----+----------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> select * from bookauthor;
+---------+-----------+
| book_id | author_id |
+---------+-----------+
|       1 |         1 |
|       2 |         1 |
|       3 |         2 |
|       3 |         3 |
|       3 |         4 |
+---------+-----------+
5 rows in set (0.00 sec)

mysql> select * from author;
+----+-----------+------------+-------------+-------------+---------------+
| id | last_name | first_name | middle_name | nationality | year_of_birth |
+----+-----------+------------+-------------+-------------+---------------+
|  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
|  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
|  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
|  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
+----+-----------+------------+-------------+-------------+---------------+
4 rows in set (0.00 sec)

Slide 16

Slide 16 text

Joins are great and all ...

Slide 17

Slide 17 text

Joins are great and all ... • Potentially organizationally messy

Slide 18

Slide 18 text

Joins are great and all ... • Potentially organizationally messy • Structure of a single object is NOT immediately clear to someone glancing at the shell data

Slide 19

Slide 19 text

Joins are great and all ... • Potentially organizationally messy • Structure of a single object is NOT immediately clear to someone glancing at the shell data • We have to flatten our object out into three tables

Slide 20

Slide 20 text

Joins are great and all ... • Potentially organizationally messy • Structure of a single object is NOT immediately clear to someone glancing at the shell data • We have to flatten our object out into three tables • 7 separate inserts just to add “Programming in Scala”

Slide 21

Slide 21 text

Joins are great and all ... • Potentially organizationally messy • Structure of a single object is NOT immediately clear to someone glancing at the shell data • We have to flatten our object out into three tables • 7 separate inserts just to add “Programming in Scala” • Once we turn the relational data back into objects ...

Slide 22

Slide 22 text

Joins are great and all ... • Potentially organizationally messy • Structure of a single object is NOT immediately clear to someone glancing at the shell data • We have to flatten our object out into three tables • 7 separate inserts just to add “Programming in Scala” • Once we turn the relational data back into objects ... • We still need to convert it to data for our frontend

Slide 23

Slide 23 text

Joins are great and all ...
• Potentially organizationally messy
• Structure of a single object is NOT immediately clear to someone glancing at the shell data
• We have to flatten our object out into three tables
• 7 separate inserts just to add “Programming in Scala”
• Once we turn the relational data back into objects ...
• We still need to convert it to data for our frontend
• I don’t know about you, but I have better things to do with my time.
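The "7 separate inserts" figure can be checked with a small sketch. The helper below is hypothetical (it is not from the talk); it flattens one nested book object into rows for the three tables shown earlier (book, author, bookauthor) and counts them. The id values are assumed to continue from the existing rows.

```javascript
// Hypothetical helper: flatten one nested "book" object into relational rows.
// Table and column names follow the schema on the SQL slide.
function flattenBook(book, bookId, firstAuthorId) {
  const rows = { book: [], author: [], bookauthor: [] };
  rows.book.push({ id: bookId, title: book.title });
  book.author.forEach((a, i) => {
    const authorId = firstAuthorId + i;
    rows.author.push({
      id: authorId,
      last_name: a.last_name,
      first_name: a.first_name,
      // Fields the object does not carry become NULL columns.
      middle_name: a.middle_name ?? null,
      nationality: a.nationality ?? null,
      year_of_birth: a.year_of_birth ?? null,
    });
    // One join-table row per (book, author) pair.
    rows.bookauthor.push({ book_id: bookId, author_id: authorId });
  });
  return rows;
}

const programmingInScala = {
  title: "Programming in Scala",
  author: [
    { first_name: "Martin", last_name: "Odersky", nationality: "DE", year_of_birth: 1958 },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

const rows = flattenBook(programmingInScala, 3, 2);
const insertCount = rows.book.length + rows.author.length + rows.bookauthor.length;
console.log(insertCount); // 7 (1 book + 3 authors + 3 join rows)
```

Reading the object back requires the reverse trip: a join across all three tables, then reassembly into one nested structure.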

Slide 24

Slide 24 text

The Same Data in MongoDB

> db.books.find().forEach(printjson)
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda3"),
    "title" : "The Demon-Haunted World: Science as a Candle in the Dark",
    "author" : [
        {
            "first_name" : "Carl",
            "last_name" : "Sagan",
            "middle_name" : "Edward",
            "year_of_birth" : 1934
        }
    ]
}
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda4"),
    "title" : "Cosmos",
    "author" : [
        {
            "first_name" : "Carl",
            "last_name" : "Sagan",
            "middle_name" : "Edward",
            "year_of_birth" : 1934
        }
    ]
}

Slide 25

Slide 25 text

The Same Data in MongoDB (Part 2)

{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "title" : "Programming in Scala",
    "author" : [
        {
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ]
}

Slide 26

Slide 26 text

Access to the embedded objects is integral

> db.books.find({"author.first_name": "Martin", "author.last_name": "Odersky"})
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "title" : "Programming in Scala",
    "author" : [
        {
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ]
}
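How a dotted path such as "author.first_name" reaches into an array of embedded documents can be sketched in plain JavaScript. This is an illustrative model only, not the server's matching code; `valuesAt` and `matches` are hypothetical names, and only equality on scalar values is handled.

```javascript
// Collect every value reachable through a dotted path, descending into arrays.
function valuesAt(value, parts) {
  if (Array.isArray(value)) return value.flatMap((v) => valuesAt(v, parts));
  if (parts.length === 0) return [value];
  if (value !== null && typeof value === "object" && parts[0] in value) {
    return valuesAt(value[parts[0]], parts.slice(1));
  }
  return [];
}

// A document matches when every path in the query reaches an equal value.
function matches(doc, query) {
  return Object.entries(query).every(([path, expected]) =>
    valuesAt(doc, path.split(".")).includes(expected));
}

const book = {
  title: "Programming in Scala",
  author: [
    { first_name: "Martin", last_name: "Odersky" },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

console.log(matches(book, { "author.first_name": "Martin", "author.last_name": "Odersky" })); // true
console.log(matches(book, { "author.first_name": "Carl" })); // false
// Each condition is checked independently, so different array elements may
// satisfy different conditions:
console.log(matches(book, { "author.first_name": "Martin", "author.last_name": "Venners" })); // true
```

The last case is worth knowing about: to require that a single array element satisfies all conditions, MongoDB provides the $elemMatch operator.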

Slide 27

Slide 27 text

As is manipulation of the embedded data

> db.books.update({"author.first_name": "Bill", "author.last_name": "Venners"},
...                 {$set: {"author.$.company": "Artima, Inc."}})
> db.books.update({"author.first_name": "Martin", "author.last_name": "Odersky"},
...                 {$set: {"author.$.company": "Typesafe, Inc."}})
> db.books.findOne({"title": /Scala$/})
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "author" : [
        {
            "company" : "Typesafe, Inc.",
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "company" : "Artima, Inc.",
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ],
    "title" : "Programming in Scala"
}

Slide 28

Slide 28 text

NoSQL Really Means... non-relational, next-generation operational datastores and databases

Slide 29

Slide 29 text

NoSQL Really Means... non-relational, next-generation operational datastores and databases ... focus on the “non-relational” bit.

Slide 30

Slide 30 text

Horizontally Scalable Architectures no joins + no complex transactions

Slide 31

Slide 31 text

no joins + no complex transactions

Slide 33

Slide 33 text

New Data Models no joins + no complex transactions

Slide 34

Slide 34 text

Best Use Cases
• Web Applications
• “Scaling Out”
• High Volume Traffic
• Caching

Slide 35

Slide 35 text

Less Suited For
• highly transactional applications
• problems which require SQL
• ad-hoc business intelligence

Slide 36

Slide 36 text

Scale out
(diagram labels: write, read)
• shard1: rep_a1, rep_b1, rep_c1
• shard2: rep_a2, rep_b2, rep_c2
• shard3: rep_a3, rep_b3, rep_c3
• mongos / config server (x3)

Slide 37

Slide 37 text

Memory
• MongoDB revolves around memory-mapped files

Slide 38

Slide 38 text

Operating System maps files on the Filesystem to Virtual Memory
• 200 gigs of MongoDB files creates 200 gigs of virtual memory
• OS controls what data is in RAM
• When a piece of data isn't found in RAM, a page fault occurs (Expensive + Locking!)
• OS goes to disk to fetch the data
• Indexes are part of the regular database files
• Deployment Trick: Pre-warm your database (pre-warming your cache) to prevent cold-start slowdown

Slide 39

Slide 39 text

A Few Words on OS Choice
• For production: Use a 64-bit OS and a 64-bit MongoDB build
• 32-bit has a 2 gig limit, imposed by the operating systems for memory-mapped files
• Clients can be 32-bit
• MongoDB supports (little endian only):
  • Linux, FreeBSD, OS X (on Intel, not PowerPC)
  • Windows
  • Solaris (Intel only; Joyent offers a cloud service which works for Mongo)

Slide 40

Slide 40 text

Filesystems “sort of” Matter
• MongoDB is filesystem neutral
• ext3, ext4 and XFS are most used
• BUT ... ext4, XFS or any other filesystem with posix_fallocate() is preferred
• Turn off “atime” (/etc/fstab: add noatime, nodiratime)

Slide 41

Slide 41 text

_id
• If not specified, drivers will add a default: ObjectId("4bface1a2231316e04f3c434")
• Made up of: timestamp, machine id, process id, counter
• http://www.mongodb.org/display/DOCS/Object+IDs
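The four parts named on the slide can be pulled out of the example ObjectId with a few lines of JavaScript. This is a sketch that assumes the 12-byte layout documented at the URL above (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter); `parseObjectId` is a hypothetical helper, not a driver API.

```javascript
// Split a 24-character hex ObjectId into its four documented parts.
// Assumed layout: 4-byte timestamp, 3-byte machine id, 2-byte pid, 3-byte counter.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  return {
    timestamp: parseInt(hex.slice(0, 8), 16), // seconds since the Unix epoch
    machine: hex.slice(8, 14),                // left as hex
    pid: hex.slice(14, 18),                   // left as hex
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("4bface1a2231316e04f3c434");
console.log(parts.timestamp); // 1274727962
console.log(parts.counter);   // 15975476
console.log(new Date(parts.timestamp * 1000).toISOString());
```

Because the timestamp leads, ObjectIds generated over time sort roughly in creation order.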

Slide 42

Slide 42 text

BSON Encoding
{ _id: ObjectId(XXXXXXXXXXXX), hello: "world" }
\x27\x00\x00\x00 \x07 _ i d \x00 X X X X X X X X X X X X \x02 h e l l o \x00 \x06\x00\x00\x00 w o r l d \x00 \x00
http://bsonspec.org
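The byte layout on this slide can be reproduced by hand. The sketch below encodes only this one document shape (an ObjectId field plus a string field) following bsonspec.org; `encodeHelloWorld` and `cstring` are hypothetical helpers, and real applications would rely on a driver's BSON encoder instead.

```javascript
// Encode a NUL-terminated string (BSON "cstring").
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

// Hand-encode { _id: ObjectId(<12 bytes>), hello: "world" }.
function encodeHelloWorld(objectIdHex) {
  const oid = Buffer.from(objectIdHex, "hex");
  if (oid.length !== 12) throw new Error("an ObjectId is 12 bytes");

  const world = cstring("world");
  const worldLength = Buffer.alloc(4);
  worldLength.writeInt32LE(world.length, 0); // 6: five characters plus the NUL

  const body = Buffer.concat([
    Buffer.from([0x07]), cstring("_id"), oid,                  // type 0x07 = ObjectId
    Buffer.from([0x02]), cstring("hello"), worldLength, world, // type 0x02 = string
    Buffer.from([0x00]),                                       // end of document
  ]);

  const totalLength = Buffer.alloc(4);
  totalLength.writeInt32LE(body.length + 4, 0); // the length prefix counts itself
  return Buffer.concat([totalLength, body]);
}

const bytes = encodeHelloWorld("4bface1a2231316e04f3c434");
console.log(bytes.length);                     // 39
console.log(bytes.subarray(0, 4).toString("hex")); // 27000000
```

The total of 39 bytes is the \x27 at the start of the slide's byte string; all integers are little-endian.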

Slide 43

Slide 43 text

bsonspec.org

Slide 44

Slide 44 text

Extent Allocation
(diagram) data files foo.0, foo.1, foo.2; the zero-filled tail is labeled "preallocated space"
• Extents are allocated per namespace: foo.$freelist, foo.baz, foo.bar, foo.test
• ns details stored in foo.ns

Slide 45

Slide 45 text

Record Allocation
(diagram) records laid out in sequence:
• Header (Size, Offset, Next, Prev)
• BSON Data
• Padding
• Deleted Record (Size, Offset, Next)

Slide 46

Slide 46 text

Insert Message (TCP/IP)
• message length: \x68\x00\x00\x00
• message id: \xXX\xXX\xXX\xXX
• response id: \x00\x00\x00\x00
• op code (insert): \xd2\x07\x00\x00
• reserved: \x00\x00\x00\x00
• collection name: f o o . t e s t \x00
• document(s): BSON Data
http://www.mongodb.org/display/DOCS/Mongo+Wire+Protocol
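The fields of the insert message can be assembled with Node's Buffer API. This is a sketch of the framing only: `buildInsert` is a hypothetical helper, the request id is arbitrary, and the payload is the smallest valid BSON document (an empty one) rather than real data. Nothing is sent over a socket.

```javascript
// Encode a NUL-terminated string.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

// Frame an insert message: 16-byte header, reserved word, collection name, documents.
function buildInsert(requestId, fullCollectionName, bsonDocuments) {
  const body = Buffer.concat([
    Buffer.alloc(4),             // reserved, all zero
    cstring(fullCollectionName), // e.g. "foo.test"
    ...bsonDocuments,            // one or more already-encoded BSON documents
  ]);
  const header = Buffer.alloc(16);
  header.writeInt32LE(16 + body.length, 0); // message length, header included
  header.writeInt32LE(requestId, 4);        // message id
  header.writeInt32LE(0, 8);                // response id: zero for a request
  header.writeInt32LE(2002, 12);            // op code for insert (0x07d2)
  return Buffer.concat([header, body]);
}

const emptyDocument = Buffer.from([5, 0, 0, 0, 0]); // BSON encoding of {}
const message = buildInsert(1, "foo.test", [emptyDocument]);
console.log(message.length);                           // 34
console.log(message.subarray(12, 16).toString("hex")); // d2070000
```

The op code bytes match the slide (\xd2\x07\x00\x00); the slide's message length of \x68 (104 bytes) reflects a larger document than the empty one used here.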

Slide 47

Slide 47 text

Modeling Schemas in MongoDB
• Think “Documents” not “Rows”
• Documents are arranged in “collections” just as “rows” are organized in “tables”
• Documents can be embedded in one another, or in Lists, etc.
• Favor embedding over referencing
• No datastore enforced foreign key relationships

Slide 48

Slide 48 text

MapReduce
• MongoDB offers an implementation of MapReduce for calculating data aggregation
• Functions are written in JavaScript
• Support for “Incremental” Jobs to build on prior job outputs as data changes w/o rerunning whole dataset
• Limitations of JavaScript engines reduce parallelism*
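The shape of a map/reduce job can be shown without a server. The sketch below is a toy, in-memory model of the idea and not MongoDB's implementation: `mapReduce` is a hypothetical function, and `emit` is passed to the map function as an argument (in the MongoDB shell it is a global and the document is `this`). It counts books per author last name using the book data from the earlier slides.

```javascript
// Toy map/reduce: run map over every document, group emitted values by key,
// then reduce each group to a single value.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const books = [
  { title: "The Demon-Haunted World", author: [{ last_name: "Sagan" }] },
  { title: "Cosmos", author: [{ last_name: "Sagan" }] },
  {
    title: "Programming in Scala",
    author: [{ last_name: "Odersky" }, { last_name: "Spoon" }, { last_name: "Venners" }],
  },
];

const booksPerAuthor = mapReduce(
  books,
  function (emit) { this.author.forEach((a) => emit(a.last_name, 1)); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(booksPerAuthor); // { Sagan: 2, Odersky: 1, Spoon: 1, Venners: 1 }
```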

Slide 49

Slide 49 text

Geospatial Indexing
“Where the hell am I?”
• Search by (2D) Geospatial proximity with MongoDB
• One GeoIndex per collection
• Can index on an array or a subdocument
• Searches against the index can treat the dataset as flat (map-like), spherical (like a globe), and complex (box/rectangle, circles, concave polygons and convex polygons)

Slide 50

Slide 50 text

GridFS File Storage
• Specification for storing large files in MongoDB; supported in all official drivers as reference implementation
• Works around BSON document size limits by breaking files into chunks
• Two collections: ‘fs.files’ for metadata, ‘fs.chunks’ for individual file pieces
• Sharding: Individual file chunks don’t shard, but files themselves will
• Experimental Modules for Lighttpd and Nginx
• On JVM: get back java.io.File handles on GridFS read!
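The chunking idea behind GridFS can be sketched in memory. `toGridFsDocs` is a hypothetical helper, the chunk size of 5 bytes is chosen only to keep the example small, and a string stands in for the ObjectId a driver would generate. The field names (files_id, n, data, length, chunkSize) follow the GridFS specification.

```javascript
// Split a file's bytes into one metadata document (for fs.files) and a list
// of numbered chunk documents (for fs.chunks).
function toGridFsDocs(filesId, filename, data, chunkSize) {
  const chunks = [];
  for (let n = 0; n * chunkSize < data.length; n++) {
    chunks.push({
      files_id: filesId, // points back at the fs.files document
      n,                 // chunk sequence number, starting at 0
      data: data.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
  const file = { _id: filesId, filename, length: data.length, chunkSize };
  return { file, chunks };
}

const { file, chunks } = toGridFsDocs("file-1", "hello.txt", Buffer.from("hello gridfs!"), 5);
console.log(chunks.length);                  // 3
console.log(chunks.map((c) => c.data.length)); // [ 5, 5, 3 ]

// Reading is the reverse: fetch chunks ordered by n and concatenate.
const restored = Buffer.concat(chunks.map((c) => c.data)).toString("utf8");
console.log(restored); // hello gridfs!
```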

Slide 51

Slide 51 text

CRUD & the JavaScript Shell

Slide 52

Slide 52 text

db.test.find({hello: "world"})

Slide 53

Slide 53 text

Shell Functions
• Leaving off the () in the shell prints the function:

> db.coll.find
function (query, fields, limit, skip) {
    return new DBQuery(this._mongo, this._db, this,
                       this._fullName, this._massageObject(query),
                       fields, limit, skip);
}

Slide 54

Slide 54 text

Query Language
{first_name: "Brendan", age: {$gte: 20, $lt: 40}}
“query by example” plus $ modifiers: http://www.mongodb.org/display/DOCS/Advanced+Queries
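"Query by example plus $ modifiers" can be modeled with a small matcher. This is an illustrative sketch, not the server's query engine: `matchesQuery` is a hypothetical function that handles only top-level fields, plain equality, and five comparison operators.

```javascript
// Comparison operators supported by this sketch.
const operators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $ne: (a, b) => a !== b,
};

// A condition is either a literal (query by example) or an object whose keys
// are all $ operators, every one of which must hold.
function matchesField(value, condition) {
  const isOperatorDoc =
    condition !== null &&
    typeof condition === "object" &&
    Object.keys(condition).every((k) => k.startsWith("$"));
  if (!isOperatorDoc) return value === condition;
  return Object.entries(condition).every(([op, operand]) => {
    if (!(op in operators)) throw new Error("unsupported operator: " + op);
    return operators[op](value, operand);
  });
}

function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, condition]) =>
    matchesField(doc[field], condition));
}

const query = { first_name: "Brendan", age: { $gte: 20, $lt: 40 } };
console.log(matchesQuery({ first_name: "Brendan", age: 30 }, query)); // true
console.log(matchesQuery({ first_name: "Brendan", age: 40 }, query)); // false
console.log(matchesQuery({ first_name: "Eliot", age: 30 }, query));   // false
```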

Slide 55

Slide 55 text

Advanced Queries
$gt, $lt, $gte, $lte, $ne, $all, $in, $nin, $or, $not, $mod, $size, $exists, $type, $elemMatch
db.posts.find({$where: "this.author == 'brendan' || this.title == 'foo'"})

Slide 56

Slide 56 text

Cursors

> var c = db.test.find({x: 20}).skip(20).limit(10)
> c.next()
> c.next()
...

(diagram) client/server exchange:
• query → first N results + cursor id
• getMore w/ cursor id → next N results + cursor id or 0
• ...
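The query/getMore exchange from the diagram can be simulated in memory. `FakeServer` and `fetchAll` are hypothetical names and no network is involved; the sketch only shows the control flow: the first reply carries N results plus a cursor id, each getMore returns the next N, and a cursor id of 0 signals that the results are exhausted.

```javascript
// In-memory stand-in for a server that hands out results in batches.
class FakeServer {
  constructor(docs, batchSize) {
    this.docs = docs;
    this.batchSize = batchSize;
    this.cursors = new Map(); // cursor id -> offset of the next result
    this.nextCursorId = 1;
  }

  query() {
    return this.batch(this.nextCursorId++, 0);
  }

  getMore(cursorId) {
    if (!this.cursors.has(cursorId)) throw new Error("unknown cursor " + cursorId);
    return this.batch(cursorId, this.cursors.get(cursorId));
  }

  batch(cursorId, offset) {
    const results = this.docs.slice(offset, offset + this.batchSize);
    const end = offset + results.length;
    if (end >= this.docs.length) {
      this.cursors.delete(cursorId);
      return { results, cursorId: 0 }; // 0 means nothing left to fetch
    }
    this.cursors.set(cursorId, end);
    return { results, cursorId };
  }
}

// Client side: keep calling getMore until the server returns cursor id 0.
function fetchAll(server) {
  let reply = server.query();
  const all = [...reply.results];
  let roundTrips = 1;
  while (reply.cursorId !== 0) {
    reply = server.getMore(reply.cursorId);
    all.push(...reply.results);
    roundTrips++;
  }
  return { all, roundTrips };
}

const server = new FakeServer([1, 2, 3, 4, 5, 6, 7], 3);
const fetched = fetchAll(server);
console.log(fetched.all.length); // 7
console.log(fetched.roundTrips); // 3 (batches of 3, 3 and 1)
```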

Slide 57

Slide 57 text

Commands drop, count, copydb, findAndModify, serverStatus, etc .... http://www.mongodb.org/display/DOCS/Commands

Slide 58

Slide 58 text

http://www.mongodb.org/display/DOCS/Commands
db.$cmd.find({drop: "foo"}).limit(-1);
= db.foo.drop();
= db.foo.runCommand({drop: "foo"});
= db.$cmd.findOne({drop: "foo"});

Slide 59

Slide 59 text

Indexing and Query Optimization

Slide 60

Slide 60 text

Cursors
find({x: 10, y: "foo"})
(diagram) candidate plans: scan, index on x, index on y; labels: remember, terminate

Slide 61

Slide 61 text

Posts With A Tag
db.posts.find({"tags": "mongodb"})
... Queried Fast! (multi-key indexes)
db.posts.ensureIndex({"tags": 1})

Slide 62

Slide 62 text

Indexing / Querying on Embedded Docs (dot notation)
db.posts.ensureIndex({"comments.author": 1})
db.posts.find({"comments.author": "eliot"})

Slide 63

Slide 63 text

Full Tablescans
• Surprise: Queries which don't hit indexes make heavy use of CPU & Disk
• Deployment Trick: Avoid counting & computing on the fly by caching & precomputing data

Slide 64

Slide 64 text

Working set is crucial!!!
• Working set should be, as much as possible, in memory
• Your entire dataset need not be!

Slide 66

Slide 66 text

Indexes
• Index on “Foo, Bar, Baz” works for “Foo”, “Foo, Bar” and “Foo, Bar, Baz”
• The Query Optimizer figures out the order but can’t do things in reverse
• You can pass hints to force a specific index: db.collection.find({username: 'foo', city: 'New York'}).hint({'username': 1})
• Missing Values are indexed as “null”
• This includes unique indexes
• system.indexes
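The prefix rule in the first bullet can be written down as a short function. `usablePrefix` is a hypothetical helper for illustration, not part of MongoDB: given the fields of a compound index in order and the fields a query filters on, it returns the leading run of index fields the query can use.

```javascript
// Return the leading index fields that the query also filters on.
// An empty result means the compound index does not help this query.
function usablePrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // a gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const index = ["foo", "bar", "baz"];
console.log(usablePrefix(index, ["foo"]));        // [ 'foo' ]
console.log(usablePrefix(index, ["bar", "foo"])); // [ 'foo', 'bar' ]
console.log(usablePrefix(index, ["bar", "baz"])); // []
console.log(usablePrefix(index, ["foo", "baz"])); // [ 'foo' ]
```

The second call shows that the order of fields in the query does not matter; the third shows that a query skipping the leading field gets no help from this index.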

Slide 67

Slide 67 text

Lots of Other Fancy Indexes
• Geospatial
  • Where on Earth?
  • Where On Scrabble?
  • Where On ?
• Covered Indexes
  • Answering Queries Entirely From Index (1.8+)
• Sparse Indexes
  • Reduce Storage of Null Fields
  • Can’t answer “not in index” queries
  • Combine with Unique!

Slide 68

Slide 68 text

DB Profiling is your Friend
• Ensure your queries are being executed correctly
• Enable profiling
  • db.setProfilingLevel(n)
  • n=1: slow operations, n=2: all operations
• Viewing profile information
  • db.system.profile.find({info: /test.foo/})
  • http://www.mongodb.org/display/DOCS/Database+Profiler
• Query execution plan:
  • db.xx.find({..}).explain()
  • http://www.mongodb.org/display/DOCS/Optimization
• Make sure your Queries are properly indexed.
• Deployment Trick: Start mongod with --notablescan to disable tablescans

Slide 69

Slide 69 text

MongoDB and Java

Slide 70

Slide 70 text

MongoDB from Java

import com.mongodb.*;
import org.bson.types.ObjectId;

import java.net.*;
import java.util.*;

class Book {

    public Book(ObjectId id, List<Author> author,
                String isbn, Price price, int publicationYear,
                List<String> tags, String title, String publisher,
                String edition) {
        _id = id;
        _author = author;
        _isbn = isbn;
        _price = price;
        _publicationYear = publicationYear;
        _tags = tags;
        _title = title;
        _publisher = publisher;
        _edition = edition;
    }

    public Book(DBObject doc) {
        _id = (ObjectId) doc.get( "_id" );
        BasicDBList authors = (BasicDBList) doc.get( "author" );
        _author = new ArrayList<Author>( );

Slide 71

Slide 71 text

MongoDB from Java

        System.out.println( "authors: " + authors );
        for (Object e : authors.toArray()) {
            _author.add( new Author( (String) e ) );
        }

        _isbn = (String) doc.get( "isbn" );
        _price = new Price( (DBObject) doc.get( "price" ) );
        //
        _publicationYear = ((Number) doc.get( "publicationYear" )).intValue();
        _tags = new ArrayList<String>( );
        for (Object t : ((BasicDBList) doc.get( "tags" )).toArray() ) {
            _tags.add( (String) t );
        }

        _title = (String) doc.get( "title" );
        _publisher = (String) doc.get( "publisher" );
        _edition = (String) doc.get( "edition" );
    }

    public List<Author> getAuthor(){
        return _author;
    }

    public void setAuthor( List<Author> author ){
        _author = author;
    }

Slide 72

Slide 72 text

MongoDB from Java

    public String getEdition(){
        return _edition;
    }

    public void setEdition( String edition ){
        _edition = edition;
    }

    public ObjectId getId(){
        return _id;
    }

    public void setId( ObjectId id ){
        _id = id;
    }

    public String getIsbn(){
        return _isbn;
    }

    public void setIsbn( String isbn ){
        _isbn = isbn;
    }

    public Price getPrice(){
        return _price;
    }

Slide 73

Slide 73 text

MongoDB from Java

    public void setPrice( Price price ){
        _price = price;
    }

    public int getPublicationYear(){
        return _publicationYear;
    }

    public void setPublicationYear( int publicationYear ){
        _publicationYear = publicationYear;
    }

    public String getPublisher(){
        return _publisher;
    }

    public void setPublisher( String publisher ){
        _publisher = publisher;
    }

    public List<String> getTags(){
        return _tags;
    }

    public void setTags( List<String> tags ){
        _tags = tags;
    }

Slide 74

Slide 74 text

MongoDB from Java

    public String getTitle(){
        return _title;
    }

    public void setTitle( String title ){
        _title = title;
    }


    private ObjectId _id;
    private List<Author> _author;
    private String _isbn;
    private Price _price;

    private int _publicationYear;
    private List<String> _tags;
    private String _title;
    private String _publisher;
    private String _edition;

    public String toString(){
        return "Book{" +
               "_author=" + _author +
               ", _id=" + _id +
               ", _isbn='" + _isbn + '\'' +
               ", _price=" + _price +
               ", _publicationYear=" + _publicationYear +

Slide 75

Slide 75 text

MongoDB from Java

               ", _tags=" + _tags +
               ", _title='" + _title + '\'' +
               ", _publisher='" + _publisher + '\'' +
               ", _edition='" + _edition + '\'' +
               '}';
    }

    private DBObject toDBObject(){
        BasicDBList authors = new BasicDBList();
        for (Author a : _author) {
            authors.add( a.getName() );
        }
        return BasicDBObjectBuilder.start( ).add( "author", authors ).
                add( "_id", _id ).
                add( "isbn", _isbn ).
                add( "price", _price.toDBObject() ).
                add( "publicationYear", _publicationYear ).
                add( "tags", _tags ).
                add( "title", _title ).
                add( "publisher", _publisher ).
                add( "edition", _edition ).get();
    }

    public static class Author {
        public Author(String name) {
            _name = name;
        }

Slide 76

Slide 76 text

MongoDB from Java

        public String getName(){
            return _name;
        }

        public void setName( String name ){
            _name = name;
        }

        private String _name;

        public String toString(){
            return "Author{" +
                   "_name='" + _name + '\'' +
                   '}';
        }


    }

    public static class Price {

        public Price(String currency, double discount,
                     double msrp) {
            _currency = currency;
            _discount = discount;
            _msrp = msrp;
        }

Slide 77

Slide 77 text

MongoDB from Java

        public Price( DBObject price ){
            _currency = (String) price.get( "currency" );
            _discount = ((Number) price.get( "discount" )).doubleValue();
            _msrp = ((Number) price.get( "msrp" )).doubleValue();
        }

        public String getCurrency(){
            return _currency;
        }

        public void setCurrency( String currency ){
            _currency = currency;
        }

        public double getDiscount(){
            return _discount;
        }

        public void setDiscount( double discount ){
            _discount = discount;
        }

        public double getMsrp(){
            return _msrp;
        }

        public void setMsrp( double msrp ){
            _msrp = msrp;
        }

Slide 78

Slide 78 text

MongoDB from Java

        private String _currency;
        private double _discount;
        private double _msrp;

        public String toString(){
            return "Price{" +
                   "_currency='" + _currency + '\'' +
                   ", _discount=" + _discount +
                   ", _msrp=" + _msrp +
                   '}';
        }

        public DBObject toDBObject(){
            return BasicDBObjectBuilder.start( ).add( "currency", _currency ).
                    add( "discount", _discount ).
                    add( "msrp", _msrp ).get();
        }
    }



    public static List<Book> findAll()
            throws UnknownHostException{
        List<Book> books = new ArrayList<Book>( );
        for (DBObject doc : c.find( )) {
            books.add( new Book( doc ) );
        }
        return books;
    }

Slide 79

Slide 79 text

MongoDB from Java

    private static Mongo m;

    static{
        try {
            m = new Mongo();
        }
        catch ( UnknownHostException e ) {
            e.printStackTrace();
        }
    }

    private static DB db = m.getDB( "bookstore" );
    private static DBCollection c = db.getCollection( "books" );

    public static void main(String[] args)
            throws UnknownHostException{
        for (Book book : findAll()) {
            System.out.println( " " + book );
        }

        Author tim = new Author( "Timothy Perrett" );
        List<Author> authors = new ArrayList<Author>( );
        authors.add( tim );

        Price price = new Price( "USD", 39.99, 39.99 );

Slide 80

Slide 80 text

MongoDB from Java

        List<String> tags = new ArrayList<String>( );

        tags.add( "functional programming" );
        tags.add( "scala" );
        tags.add( "web development" );
        tags.add( "lift" );
        tags.add( "#legendofklang" );

        Book liftInAction = new Book( new ObjectId(), authors, "9781935182801",
                price, 2011, tags, "Lift in Action: The Simply Functional Web Framework for Scala",
                "Manning Publications Co.", "First");

        c.insert( liftInAction.toDBObject() );

    }


}

Slide 81

Slide 81 text

MongoDB and Scala

Slide 82

Slide 82 text

MongoDB from Scala (casbah)

import com.mongodb.casbah.Imports._

case class Book(id: ObjectId, author: Seq[Author], isbn: String,
                price: Price, publicationYear: Int, tags: Seq[String],
                title: String, publisher: String, edition: Option[String]) {

  def toDBObject = MongoDBObject(
    "author" -> author.map { a => a.name },
    "_id" -> id,
    "isbn" -> isbn,
    "price" -> price.toDBObject,
    "publicationYear" -> publicationYear,
    "tags" -> tags,
    "title" -> title,
    "publisher" -> publisher,
    "edition" -> edition
  )
}

case class Author(name: String)

case class Price(currency: String, discount: Double, msrp: Double) {
  def toDBObject = MongoDBObject(
    "currency" -> currency,
    "discount" -> discount,
    "msrp" -> msrp
  )
}

Slide 83

Slide 83 text

MongoDB from Scala (casbah)

object Books extends App {
  val mongo = MongoConnection()("bookstore")("books")

  findAll().foreach(b => println(" " + b))

  val tim = Author("Timothy Perrett")
  val authors = Seq(tim)
  val price = Price("USD", 39.99, 39.99)

  val tags = Seq("functional programming", "scala", "web development", "lift", "#legendofklang")

  val liftInAction = Book(new ObjectId, authors, "9781935182801",
    price, 2011, tags, "Lift in Action: The Simply Functional Web Framework for Scala",
    "Manning Publications Co.", Some("First"))

  mongo.insert(liftInAction.toDBObject)

  def findAll() = for ( book <- mongo.find() ) yield newBook(book)

  def newBook(doc: MongoDBObject) = Book(
    doc.as[ObjectId]("_id"),
    doc.as[BasicDBList]("author").map(a => Author(a.asInstanceOf[String])).toSeq,
    doc.as[String]("isbn"),

Slide 84

Slide 84 text

MongoDB from Scala (casbah)

    newPrice(doc.as[DBObject]("price")),
    doc.as[Number]("publicationYear").intValue,
    doc.as[BasicDBList]("tags").map(_.toString).toSeq,
    doc.as[String]("title"),
    doc.as[String]("publisher"),
    doc.getAs[String]("edition")
  )

  def newPrice(doc: MongoDBObject) = Price(
    doc.as[String]("currency"),
    doc.as[Number]("discount").doubleValue,
    doc.as[Number]("msrp").doubleValue
  )
}

Slide 85

Slide 85 text

Object Document Mapping

Slide 86

Slide 86 text

A New Paradigm for Mapping Objects <-> Documents

Slide 87

Slide 87 text

A New Paradigm for Mapping Objects <-> Documents • While the ORM Pattern can be a disaster, well designed Documents map well to a typical object hierarchy

Slide 88

Slide 88 text

A New Paradigm for Mapping Objects <-> Documents • While the ORM Pattern can be a disaster, well designed Documents map well to a typical object hierarchy • The world of ODMs for MongoDB has evolved in many languages, with fantastic tools in Scala, Java, Python and Ruby

Slide 89

Slide 89 text

A New Paradigm for Mapping Objects <-> Documents
• While the ORM Pattern can be a disaster, well designed Documents map well to a typical object hierarchy
• The world of ODMs for MongoDB has evolved in many languages, with fantastic tools in Scala, Java, Python and Ruby
• Typically “relationship” fields can be defined to be either “embedded” or “referenced”

Slide 90

Slide 90 text

ODM Systems for the JVM < Java >

Slide 91

Slide 91 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World

Slide 92

Slide 92 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World • Morphia

Slide 93

Slide 93 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World • Morphia • JPA Inspired

Slide 94

Slide 94 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World • Morphia • JPA Inspired • Annotation Driven; Able to integrate with existing objects

Slide 95

Slide 95 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World • Morphia • JPA Inspired • Annotation Driven; Able to integrate with existing objects • Written purely for MongoDB, strong coupling

Slide 96

Slide 96 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World • Morphia • JPA Inspired • Annotation Driven; Able to integrate with existing objects • Written purely for MongoDB, strong coupling • Spring-Data-Document

Slide 97

Slide 97 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World • Morphia • JPA Inspired • Annotation Driven; Able to integrate with existing objects • Written purely for MongoDB, strong coupling • Spring-Data-Document • Part of the Spring-Data System

Slide 98

Slide 98 text

ODM Systems for the JVM < Java > • Two Major ODMs in the Java World • Morphia • JPA Inspired • Annotation Driven; Able to integrate with existing objects • Written purely for MongoDB, strong coupling • Spring-Data-Document • Part of the Spring-Data System • Follows the Spring paradigms; comfortable to the Spring veteran

Slide 99

Slide 99 text

ODM Systems for the JVM < Java >
• Two Major ODMs in the Java World
• Morphia
  • JPA Inspired
  • Annotation Driven; Able to integrate with existing objects
  • Written purely for MongoDB, strong coupling
• Spring-Data-Document
  • Part of the Spring-Data System
  • Follows the Spring paradigms; comfortable to the Spring veteran
  • Designed for multiple datastores (Couch support forthcoming); less strongly coupled

Slide 100

Slide 100 text

Object Mapping with Morphia

@Entity("books")
class Book {

    public Book() {}


    public Book(ObjectId id, List<String> author,
                String isbn, Price price, int publicationYear,
                List<String> tags, String title, String publisher,
                String edition) {
        this.id = id;
        this.author = author;
        this.isbn = isbn;
        this.price = price;
        this.year = publicationYear;
        this.tags = tags;
        this.title = title;
        this.publisher = publisher;
        this.edition = edition;
    }

    @Id private ObjectId id;

    private List<String> author = new ArrayList<String>();
    private String isbn;

Slide 101

Slide 101 text

Object Mapping with Morphia

    /**
     * Could also use "reference", which are stored to
     * their own collection and loaded automatically
     *
     * Morphia uses the field name for where to store the value,
     * can override with a string arg to the annotation.
     */
    @Embedded private Price price;

    /**
     * Can rename a field for how stored in MongoDB
     */
    @Property("publicationYear") private int year;

    private List<String> tags = new ArrayList<String>();
    private String title;
    private String publisher;
    private String edition;

    public String toString(){
        return "Book{" +
               "author=" + author +
               ", id=" + id +
               ", isbn='" + isbn + '\'' +
               ", price=" + price +
               ", year=" + year +
               ", tags=" + tags +

Slide 102

Slide 102 text

Object Mapping with Morphia

               ", title='" + title + '\'' +
               ", publisher='" + publisher + '\'' +
               ", edition='" + edition + '\'' +
               '}';
    }

    public static class Price {

        public Price(){}

        public Price( String currency, double discount, double msrp ){
            this.currency = currency;
            this.discount = discount;
            this.msrp = msrp;
        }

        private String currency;
        private double discount;
        private double msrp;

        public String toString(){
            return "Price{" +
                   "currency='" + currency + '\'' +
                   ", discount=" + discount +
                   ", msrp=" + msrp +
                   '}';
        }
    }

Slide 103

Slide 103 text

Object Mapping with Morphia

    public static void main(String args[])
            throws UnknownHostException{
        Datastore ds = new Morphia().createDatastore("bookstore");

        for (Book book : ds.find(Book.class))
            System.out.println( book );

        List authors = new ArrayList( );
        authors.add( "Timothy Perrett" );

        Price price = new Price( "USD", 39.99, 39.99 );

        List tags = new ArrayList( );

        tags.add( "functional programming" );
        tags.add( "scala" );
        tags.add( "web development" );
        tags.add( "lift" );
        tags.add( "#legendofklang" );
        ds.save( new Book( new ObjectId(), authors, "9781935182801",
                price, 2011, tags, "Lift in Action: The Simply Functional Web Framework for Scala",
                "Manning Publications Co.", "First") );
    }
}

Slide 104

Slide 104 text

ODM Systems for the JVM < Scala > Tuesday, November 15, 11

Slide 105

Slide 105 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World Tuesday, November 15, 11

Slide 106

Slide 106 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record Tuesday, November 15, 11

Slide 107

Slide 107 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern Tuesday, November 15, 11

Slide 108

Slide 108 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern • Requires entirely custom objects following Record paradigm Tuesday, November 15, 11

Slide 109

Slide 109 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern • Requires entirely custom objects following Record paradigm • Strongly coupled to MongoDB but still bound by Record Tuesday, November 15, 11

Slide 110

Slide 110 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern • Requires entirely custom objects following Record paradigm • Strongly coupled to MongoDB but still bound by Record • Bonus: Absolutely incredible “Rogue” DSL from Foursquare for taking Lift-MongoDB-Record further Tuesday, November 15, 11

Slide 111

Slide 111 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern • Requires entirely custom objects following Record paradigm • Strongly coupled to MongoDB but still bound by Record • Bonus: Absolutely incredible “Rogue” DSL from Foursquare for taking Lift-MongoDB-Record further • Salat Tuesday, November 15, 11

Slide 112

Slide 112 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern • Requires entirely custom objects following Record paradigm • Strongly coupled to MongoDB but still bound by Record • Bonus: Absolutely incredible “Rogue” DSL from Foursquare for taking Lift-MongoDB-Record further • Salat • Built by same team who helped start Casbah (scala driver) Tuesday, November 15, 11

Slide 113

Slide 113 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern • Requires entirely custom objects following Record paradigm • Strongly coupled to MongoDB but still bound by Record • Bonus: Absolutely incredible “Rogue” DSL from Foursquare for taking Lift-MongoDB-Record further • Salat • Built by same team who helped start Casbah (scala driver) • Annotation driven, built from ground up to apply onto existing business objects cleanly Tuesday, November 15, 11

Slide 114

Slide 114 text

ODM Systems for the JVM < Scala > • Two Major ODMs in the Scala World • Lift-MongoDB-Record • Based on the Record pattern • Requires entirely custom objects following Record paradigm • Strongly coupled to MongoDB but still bound by Record • Bonus: Absolutely incredible “Rogue” DSL from Foursquare for taking Lift-MongoDB-Record further • Salat • Built by same team who helped start Casbah (scala driver) • Annotation driven, built from ground up to apply onto existing business objects cleanly • Strongly coupled to MongoDB Tuesday, November 15, 11

Slide 115

Slide 115 text

Object Mapping with Lift

class Book private() extends BsonRecord[Book] with ObjectIdPk[Book] {
  def meta = Book

  object author extends MongoListField[Book, String](this)
  object isbn extends StringField(this, 64)
  object price extends BsonRecordField(this, Price)
  object publicationYear extends IntField(this)
  object tags extends MongoListField[Book, String](this)
  object title extends StringField(this, 255)
  object publisher extends StringField(this, 128)
  object edition extends StringField(this, 32) {
    override def optional_? = true
  }

}

object Book extends Book with MongoMetaRecord[Book]

class Price private() extends BsonRecord[Price] {
  def meta = Price

  object currency extends StringField(this, 3)
  object discount extends DoubleField(this)
  object msrp extends DoubleField(this)
}

Slide 116

Slide 116 text

Object Mapping with Lift

val price = Price.createRecord.currency("USD").discount(39.99).msrp(39.99)
val liftInAction = Book.createRecord.author(Seq("Timothy Perrett")).
  isbn("9781935182801").
  price(price).
  publicationYear(2011).
  tags(Seq("functional programming", "scala", "web development", "lift", "#legendofklang")).
  title("Lift in Action: The Simply Functional Web Framework for Scala").
  publisher("Manning Publications Co.").
  edition("First").save


// vim: set ts=2 sw=2 sts=2 et:

Slide 117

Slide 117 text

Object Mapping with Salat

/** Using our existing "Books" case classes, Salat can simply
 * serialize them to MongoDB with its "grater system"
 */

val tim = Author("Timothy Perrett")
val authors = Seq(tim)
val price = Price("USD", 39.99, 39.99)

val tags = Seq("functional programming", "scala", "web development", "lift", "#legendofklang")

val liftInAction = Book(new ObjectId, authors, "9781935182801",
  price, 2011, tags, "Lift in Action: The Simply Functional Web Framework for Scala",
  "Manning Publications Co.", Some("First"))

/**
 * Instead of our custom "toDBObject" method,
 * the Salat Grater uses runtime Scala type reflection to
 * generate a MongoDB Object.
 */
val dbo = grater[Book].asDBObject(liftInAction)

mongo.save(dbo)

Slide 118

Slide 118 text

Advanced MongoDB Tuesday, November 15, 11

Slide 119

Slide 119 text

Hadoop and MongoDB • Input and Output Formats for MongoDB + Hadoop • Process MongoDB data inside of Hadoop, output back out to MongoDB • Growing support for the “deeper” Hadoop infrastructure such as Pig, Cascading, Hive, etc. Tuesday, November 15, 11

Slide 120

Slide 120 text

Beyond Web Applications Tuesday, November 15, 11

Slide 121

Slide 121 text

Some Solid “non-web” Use Patterns • Event / Pipeline Processing • Logging • Graylog2 • Flume Sink • Durable Messaging • Broadcast Messaging • Pub/Sub Tuesday, November 15, 11

Slide 122

Slide 122 text

Today, Let’s Discuss Messaging Tuesday, November 15, 11

Slide 123

Slide 123 text

Today, Let’s Discuss Messaging • Messaging is something that, with MongoDB, you “roll your own” Tuesday, November 15, 11

Slide 124

Slide 124 text

Today, Let’s Discuss Messaging • Messaging is something that, with MongoDB, you “roll your own” • But there are great builtin facilities to make it easier for you to do it Tuesday, November 15, 11

Slide 125

Slide 125 text

Today, Let’s Discuss Messaging • Messaging is something that, with MongoDB, you “roll your own” • But there are great builtin facilities to make it easier for you to do it • Capped Collections Tuesday, November 15, 11

Slide 126

Slide 126 text

Today, Let’s Discuss Messaging • Messaging is something that, with MongoDB, you “roll your own” • But there are great builtin facilities to make it easier for you to do it • Capped Collections • Tailable cursors Tuesday, November 15, 11

Slide 127

Slide 127 text

Today, Let’s Discuss Messaging • Messaging is something that, with MongoDB, you “roll your own” • But there are great builtin facilities to make it easier for you to do it • Capped Collections • Tailable cursors • findAndModify Tuesday, November 15, 11

Slide 128

Slide 128 text

WTF Is A Capped Collection? Tuesday, November 15, 11

Slide 129

Slide 129 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication Tuesday, November 15, 11

Slide 130

Slide 130 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication • Created specially with a number of bytes it may hold Tuesday, November 15, 11

Slide 131

Slide 131 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication • Created specially with a number of bytes it may hold • No _id index Tuesday, November 15, 11

Slide 132

Slide 132 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication • Created specially with a number of bytes it may hold • No _id index • Documents are maintained in insertion order Tuesday, November 15, 11

Slide 133

Slide 133 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication • Created specially with a number of bytes it may hold • No _id index • Documents are maintained in insertion order • No deletes allowed Tuesday, November 15, 11

Slide 134

Slide 134 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication • Created specially with a number of bytes it may hold • No _id index • Documents are maintained in insertion order • No deletes allowed • Updates only allowed if document won’t “grow” Tuesday, November 15, 11

Slide 135

Slide 135 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication • Created specially with a number of bytes it may hold • No _id index • Documents are maintained in insertion order • No deletes allowed • Updates only allowed if document won’t “grow” • As collection fills up, oldest entries “fall out” Tuesday, November 15, 11

Slide 136

Slide 136 text

WTF Is A Capped Collection? • Special size bounded MongoDB collection designed for Replication • Created specially with a number of bytes it may hold • No _id index • Documents are maintained in insertion order • No deletes allowed • Updates only allowed if document won’t “grow” • As collection fills up, oldest entries “fall out” • Allow for a special cursor type: Tailable Cursors Tuesday, November 15, 11
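The capped-collection behaviour listed above (bounded by bytes, kept in insertion order, oldest entries falling out) can be illustrated with a small in-memory model. This is a sketch only and not how MongoDB stores data: the `CappedLog` class is a hypothetical name, and measuring document size with `JSON.stringify` is an assumption made for the example.

```javascript
// Illustrative in-memory model of a capped collection (not MongoDB internals).
// Size accounting via JSON.stringify length is an assumption of this sketch.
class CappedLog {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.docs = []; // kept in insertion order
  }

  insert(doc) {
    const size = JSON.stringify(doc).length;
    if (size > this.maxBytes) throw new Error("document larger than the cap");
    // Evict the oldest documents until the new one fits.
    while (this.usedBytes + size > this.maxBytes) {
      const oldest = this.docs.shift();
      this.usedBytes -= JSON.stringify(oldest).length;
    }
    this.docs.push(doc);
    this.usedBytes += size;
  }
}

const log = new CappedLog(40);
for (let i = 1; i <= 5; i++) log.insert({ n: i }); // each doc is 7 bytes as JSON
log.insert({ n: 6 }); // does not fit, so the oldest entry is dropped first
console.log(log.docs.map((d) => d.n));
```

Because eviction happens on insert, writers never need to issue deletes, which is what makes the structure attractive for logs and message streams.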

Slide 137

Slide 137 text

tail -f `mongo ‘db.data.find()’` Tuesday, November 15, 11

Slide 138

Slide 138 text

tail -f `mongo ‘db.data.find()’` • Tailable cursors are a special cursor mode in MongoDB

Slide 139

Slide 139 text

tail -f `mongo ‘db.data.find()’` • Tailable cursors are a special cursor mode in MongoDB • Similar to Unix’s ‘tail -f’, they maintain a pointer to the last document seen and keep moving forward as new documents are added

Slide 140

Slide 140 text

tail -f `mongo ‘db.data.find()’` • Tailable cursors are a special cursor mode in MongoDB • Similar to Unix’s ‘tail -f’, they maintain a pointer to the last document seen and keep moving forward as new documents are added • With the “Await” cursor mode, they can poll until new documents arrive

Slide 141

Slide 141 text

tail -f `mongo ‘db.data.find()’` • Tailable cursors are a special cursor mode in MongoDB • Similar to Unix’s ‘tail -f’, they maintain a pointer to the last document seen and keep moving forward as new documents are added • With the “Await” cursor mode, they can poll until new documents arrive • Incredibly efficient for non-indexed queries
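The "pointer to the last document seen" idea can be sketched without a database. The snippet below is a toy model, not the driver API: `TailCursor` and `readNew` are hypothetical names, and a plain array stands in for the capped collection (eviction of old entries is not modelled).

```javascript
// Illustrative model of a tailable cursor over an append-only list.
// TailCursor is a hypothetical name; it only mimics "remember my position".
class TailCursor {
  constructor(source) {
    this.source = source; // array that producers append to
    this.position = 0;    // index of the next unseen document
  }

  // Return every document appended since the last call (possibly none).
  readNew() {
    const fresh = this.source.slice(this.position);
    this.position = this.source.length;
    return fresh;
  }
}

const messages = [{ msg: "a" }, { msg: "b" }];
const cursor = new TailCursor(messages);
const firstBatch = cursor.readNew();  // both existing documents
const idleBatch = cursor.readNew();   // nothing new yet
messages.push({ msg: "c" });
const secondBatch = cursor.readNew(); // only the newly appended document
```

The cursor never re-reads or re-sorts old documents, which is why tailing works well even though no index is involved.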

Slide 142

Slide 142 text

Broadcast Messaging Made Easy Tuesday, November 15, 11

Slide 143

Slide 143 text

Broadcast Messaging Made Easy • This mechanism allows for very easy broadcast messaging Tuesday, November 15, 11

Slide 144

Slide 144 text

Broadcast Messaging Made Easy • This mechanism allows for very easy broadcast messaging • ... In fact, it is exactly how MongoDB does replication Tuesday, November 15, 11

Slide 145

Slide 145 text

Broadcast Messaging Made Easy • This mechanism allows for very easy broadcast messaging • ... In fact, it is exactly how MongoDB does replication • Because you can’t delete messages this wouldn’t be ideal for pub/sub Tuesday, November 15, 11

Slide 146

Slide 146 text

Broadcast Messaging Made Easy • This mechanism allows for very easy broadcast messaging • ... In fact, it is exactly how MongoDB does replication • Because you can’t delete messages this wouldn’t be ideal for pub/sub • But could be paired carefully with findAndModify Tuesday, November 15, 11

Slide 147

Slide 147 text

Pub/Sub and findAndModify Tuesday, November 15, 11

Slide 148

Slide 148 text

Pub/Sub and findAndModify • Compare and Swap / ABA Problems can be tricky • AKA “Distributed Locking is Hard - Let’s Go Shopping!” • MongoDB’s update doesn’t allow you to fetch the exact document(s) changed • The findAndModify command enables a proper mechanism • Find and modify first matching document and return new doc or old one • Find and remove first matching document and return the pre-removed document • Isolated; two competing threads won’t get the same document Tuesday, November 15, 11
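A worker queue built on this pattern might look like the sketch below. It is an in-memory stand-in rather than the real command: the local `findAndModify` helper, the `jobs` array and the `state`/`owner` fields are all assumptions for illustration, and atomicity here comes only from JavaScript running one call at a time.

```javascript
// In-memory stand-in for the find-and-modify pattern (not the driver API).
// Single-threaded JavaScript makes each call effectively atomic.
function findAndModify(collection, matches, update) {
  const doc = collection.find(matches);
  if (!doc) return null;
  const before = { ...doc };  // snapshot of the pre-update document
  Object.assign(doc, update); // apply the change in place
  return before;
}

const jobs = [
  { id: 1, state: "new", owner: null },
  { id: 2, state: "new", owner: null },
];

// Each worker claims the first unclaimed job and marks it as taken.
function claim(worker) {
  return findAndModify(jobs, (j) => j.state === "new", { state: "taken", owner: worker });
}

const a = claim("worker-a");
const b = claim("worker-b");
const c = claim("worker-c"); // nothing left to claim
```

Because the match and the update happen in one step, two workers can never walk away with the same job, which is the property the slide calls "isolated".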

Slide 149

Slide 149 text

Lots of Ideas to be Explored • Akka (Scala distributed computing & Actor framework) now includes a MongoDB based durable mailbox, using these concepts for unbounded (soon: bounded) messaging • 10gen’s MMS monitoring service uses findAndModify to facilitate worker queues Tuesday, November 15, 11

Slide 150

Slide 150 text

Example of Akka Durable

class MongoBasedMailboxSpec extends DurableMailboxSpec("mongodb", MongoNaiveDurableMailboxStorage) {
  import org.apache.log4j.{ Logger, Level }
  import com.mongodb.async._

  val mongo = MongoConnection("localhost", 27017)("akka")

  mongo.dropDatabase() { success => }

  Logger.getRootLogger.setLevel(Level.DEBUG)
}

object DurableMongoMailboxSpecActorFactory {

  class MongoMailboxTestActor extends Actor {
    self.lifeCycle = Temporary
    def receive = {
      case "sum" => self.reply("sum")
    }
  }

  def createMongoMailboxTestActor(id: String)(implicit dispatcher: MessageDispatcher): ActorRef = {
    val queueActor = localActorOf[MongoMailboxTestActor]
    queueActor.dispatcher = dispatcher
    queueActor.start
  }
}

Slide 151

Slide 151 text

Example of Akka Durable

class MongoBasedMailboxSpec extends WordSpec with MustMatchers with BeforeAndAfterEach with BeforeAndAfterAll {
  import DurableMongoMailboxSpecActorFactory._

  implicit val dispatcher = DurableDispatcher("mongodb", MongoNaiveDurableMailboxStorage, 1)

  "A MongoDB based naive mailbox backed actor" should {
    "should handle reply to ! for 1 message" in {
      val latch = new CountDownLatch(1)
      val queueActor = createMongoMailboxTestActor("mongoDB Backend should handle Reply to !")
      val sender = localActorOf(new Actor { def receive = { case "sum" => latch.countDown } }).start

      queueActor.!("sum")(Some(sender))
      latch.await(10, TimeUnit.SECONDS) must be (true)
    }

    "should handle reply to ! for multiple messages" in {
      val latch = new CountDownLatch(5)
      val queueActor = createMongoMailboxTestActor("mongoDB Backend should handle reply to !")
      val sender = localActorOf( new Actor { def receive = { case "sum" => latch.countDown } } ).start

Slide 152

Slide 152 text

Example of Akka Durable

      queueActor.!("sum")(Some(sender))
      queueActor.!("sum")(Some(sender))
      queueActor.!("sum")(Some(sender))
      queueActor.!("sum")(Some(sender))
      queueActor.!("sum")(Some(sender))
      latch.await(10, TimeUnit.SECONDS) must be (true)
    }
  }

  override def beforeEach() {
    registry.local.shutdownAll
  }
}

Slide 153

Slide 153 text

Scaling Tuesday, November 15, 11

Slide 154

Slide 154 text

Scaling •Operations/sec go up •Storage needs go up •Capacity •IOPs •Complexity goes up •Caching Tuesday, November 15, 11

Slide 155

Slide 155 text

• Optimization & Tuning • Schema & Index Design • O/S tuning • Hardware configuration • Vertical scaling • Hardware is expensive • Hard to scale in cloud How do you scale now? $$$ throughput Tuesday, November 15, 11

Slide 156

Slide 156 text

MongoDB Scaling - Single Node write read node_a1 Tuesday, November 15, 11

Slide 157

Slide 157 text

Read scaling - add Replicas write read node_b1 node_a1 Tuesday, November 15, 11

Slide 158

Slide 158 text

Read scaling - add Replicas write read node_c1 node_b1 node_a1 Tuesday, November 15, 11

Slide 159

Slide 159 text

Write scaling - Sharding write read shard1 node_c1 node_b1 node_a1 Tuesday, November 15, 11

Slide 160

Slide 160 text

Write scaling - add Shards write read shard1 node_c1 node_b1 node_a1 shard2 node_c2 node_b2 node_a2 Tuesday, November 15, 11

Slide 161

Slide 161 text

Write scaling - add Shards write read shard1 node_c1 node_b1 node_a1 shard2 node_c2 node_b2 node_a2 shard3 node_c3 node_b3 node_a3 Tuesday, November 15, 11

Slide 162

Slide 162 text

Scaling with MongoDB • Schema & Index Design • Sharding • Replication Tuesday, November 15, 11

Slide 163

Slide 163 text

Schemas Tuesday, November 15, 11

Slide 164

Slide 164 text

Schema • Data model affects performance • Embedding versus Linking • Roundtrips to database • Disk seek time • Size of data to read & write • Partial versus full document writes • Partial versus full document reads • Schema and Schema usage critical for scaling and performance

Slide 165

Slide 165 text

Indexes • Index common queries • Do not over-index • Indexes on (A) and (A,B) both serve queries on A, so keep only one • Right-balanced indexes keep working set small

Slide 166

Slide 166 text

Query for {a: 7} [Figure: with an index, the key space is divided into ranges [-∞, 5), [5, 10), [10, ∞); the [5, 10) range is subdivided into [5, 7), [7, 9), [9, 10), and each range points to its bucket of documents. Without an index, every document is scanned.]
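The difference between the indexed lookup and the full scan can be made concrete by counting how many entries each approach examines. In this sketch a sorted array with binary search stands in for the B-tree index; that substitution, and the function names `scanLookup` and `indexLookup`, are assumptions for illustration.

```javascript
// Count how many entries are examined to find a == target,
// first by scanning every document, then via a sorted index (binary search).
function scanLookup(docs, target) {
  let examined = 0;
  const hits = [];
  for (const d of docs) {
    examined++;
    if (d.a === target) hits.push(d);
  }
  return { hits, examined };
}

function indexLookup(sortedKeys, target) {
  let lo = 0, hi = sortedKeys.length - 1, examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (sortedKeys[mid] === target) return { found: true, examined };
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, examined };
}

const docs = Array.from({ length: 1000 }, (_, i) => ({ a: i }));
const keys = docs.map((d) => d.a); // already sorted for this example
const scan = scanLookup(docs, 7);
const viaIndex = indexLookup(keys, 7);
console.log(scan.examined, viaIndex.examined);
```

The scan touches all 1000 documents, while the ordered structure needs at most about ten comparisons for the same collection.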

Slide 167

Slide 167 text

Indexing Embedded Documents & Multikeys

db.posts.save({
  title: "My First blog",
  tags: ["mongodb", "cool"],
  comments: [
    {author: "James", ts : new Date()}
  ]
});
db.posts.ensureIndex({"tags": 1})
db.posts.ensureIndex({"comments.author": 1})

Slide 168

Slide 168 text

Picking an Index find({x: 10, y: "foo"}) [Figure: three candidate plans (scan, index on x, index on y), annotated with the labels “remember” and “terminate”]

Slide 169

Slide 169 text

Sharding Tuesday, November 15, 11

Slide 170

Slide 170 text

What is Sharding • Ad-hoc partitioning • Consistent hashing • Amazon Dynamo • Range based partitioning • Google BigTable • Yahoo! PNUTS • MongoDB Tuesday, November 15, 11

Slide 171

Slide 171 text

MongoDB Sharding • Automatic partitioning and management • Range based • Convert to sharded system with no downtime • Fully consistent Tuesday, November 15, 11

Slide 172

Slide 172 text

How MongoDB Sharding works
> db.runCommand( { addshard : "shard1" } );
> db.runCommand(
    { shardCollection : "mydb.blogs",
      key : { age : 1} } )
[Diagram: a single range from -∞ to +∞]
• Range keys from -∞ to +∞
• Ranges are stored as “chunks”

Slide 173

Slide 173 text

How MongoDB Sharding works
> db.posts.save( {age:40} )
[Diagram: the range -∞ to +∞ splits into -∞ to 40 and 41 to +∞]
• Data is inserted
• Ranges are split into more “chunks”

Slide 174

Slide 174 text

How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )
[Diagram: -∞ to +∞ splits into -∞ to 40 and 41 to +∞; 41 to +∞ then splits into 41 to 50 and 51 to +∞]
• More data is inserted
• Ranges are split into more “chunks”

Slide 175

Slide 175 text

How MongoDB Sharding works >  db.posts.save(  {age:40}  ) >  db.posts.save(  {age:50}  ) >  db.posts.save(  {age:60}  ) -∞ +∞ -∞ 40 41 +∞ 41 50 51 +∞ 61 +∞ 51 60 Tuesday, November 15, 11

Slide 176

Slide 176 text

-∞ +∞ 41 +∞ 51 +∞ How MongoDB Sharding works >  db.posts.save(  {age:40}  ) >  db.posts.save(  {age:50}  ) >  db.posts.save(  {age:60}  ) -∞ 40 41 50 61 +∞ 51 60 Tuesday, November 15, 11

Slide 177

Slide 177 text

How MongoDB Sharding works -∞ 40 41 50 61 +∞ 51 60 shard1 Tuesday, November 15, 11

Slide 178

Slide 178 text

How MongoDB Sharding works >  db.runCommand(  {  addshard  :  "shard2"  }  ); -∞ 40 41 50 61 +∞ 51 60 Tuesday, November 15, 11

Slide 179

Slide 179 text

How MongoDB Sharding works >  db.runCommand(  {  addshard  :  "shard2"  }  ); -∞ 40 41 50 61 +∞ 51 60 shard1 Tuesday, November 15, 11

Slide 180

Slide 180 text

How MongoDB Sharding works >  db.runCommand(  {  addshard  :  "shard2"  }  ); -∞ 40 41 50 61 +∞ 51 60 shard1 shard2 Tuesday, November 15, 11

Slide 181

Slide 181 text

How MongoDB Sharding works >  db.runCommand(  {  addshard  :  "shard2"  }  ); -∞ 40 41 50 61 +∞ 51 60 shard1 shard2 >  db.runCommand(  {  addshard  :  "shard3"  }  ); shard3 Tuesday, November 15, 11
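The chunk picture built up over these slides can be modelled as a list of key ranges that starts as one chunk and is split as data arrives. This is a simplified sketch: the half-open `[min, max)` representation, the `chunkFor` and `splitAt` helpers, and splitting at caller-chosen keys are assumptions for the example; the real system chooses split points itself.

```javascript
// Chunks are half-open key ranges [min, max) covering -Infinity..+Infinity.
// Splitting at a caller-chosen key is a simplification for this sketch.
const chunks = [{ min: -Infinity, max: Infinity, shard: "shard1" }];

function chunkFor(key) {
  return chunks.find((c) => key >= c.min && key < c.max);
}

function splitAt(key) {
  const c = chunkFor(key);
  if (c.min === key) return; // already a chunk boundary
  const right = { min: key, max: c.max, shard: c.shard };
  c.max = key;
  chunks.splice(chunks.indexOf(c) + 1, 0, right);
}

splitAt(41);
splitAt(51);
splitAt(61);
const ranges = chunks.map((c) => `[${c.min}, ${c.max})`);
console.log(ranges.join(" "));

// Migrating a chunk only changes which shard owns the range.
chunkFor(55).shard = "shard2";
```

After the three splits the ranges match the slide's picture (up to 40, 41 to 50, 51 to 60, 61 and up), and moving a chunk to a new shard does not change any boundaries.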

Slide 182

Slide 182 text

Sharding Features • Shard data with no downtime • Automatic balancing as data is written • Commands routed (switched) to correct node • Inserts - must have the Shard Key • Updates - must have the Shard Key • Queries • With Shard Key - routed to nodes • Without Shard Key - scatter gather • Indexed Queries • With Shard Key - routed in order • Without Shard Key - distributed sort merge
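The routing rule for queries (targeted when the shard key is present, scatter/gather when it is not) can be sketched as a tiny router. The chunk layout, the `age` shard key and the `targetShards` function are made up for the example, and only equality matches on the shard key are handled.

```javascript
// Toy mongos-style router: a query that includes the shard key goes to one
// shard; a query without it is sent to every shard. Chunk layout is made up.
const chunkMap = [
  { min: -Infinity, max: 41, shard: "shard1" },
  { min: 41, max: 61, shard: "shard2" },
  { min: 61, max: Infinity, shard: "shard3" },
];
const SHARD_KEY = "age";

function targetShards(query) {
  if (SHARD_KEY in query) {
    const k = query[SHARD_KEY];
    const chunk = chunkMap.find((c) => k >= c.min && k < c.max);
    return [chunk.shard];
  }
  // No shard key: scatter to all shards, then gather the results.
  return [...new Set(chunkMap.map((c) => c.shard))];
}

const routed = targetShards({ age: 50 });
const scattered = targetShards({ name: "rit" });
console.log(routed, scattered);
```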

Slide 183

Slide 183 text

Replication Tuesday, November 15, 11

Slide 184

Slide 184 text

MongoDB Replication • MongoDB replication is like MySQL replication • Asynchronous master/slave • Variations: • Master / slave • Replica Sets

Slide 185

Slide 185 text

• A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover • Automatic recovery • All writes to primary • Reads can be to primary (default) or a secondary Replica Set features Tuesday, November 15, 11

Slide 186

Slide 186 text

How MongoDB Replication works Member  1 Member  2 Member  3 •Set is made up of 2 or more nodes Tuesday, November 15, 11

Slide 187

Slide 187 text

How MongoDB Replication works Member  1 Member  2 PRIMARY Member  3 •Election establishes the PRIMARY •Data replication from PRIMARY to SECONDARY Tuesday, November 15, 11

Slide 188

Slide 188 text

How MongoDB Replication works Member  1 Member  2 DOWN Member  3 negotiate   new  master •PRIMARY may fail •Automatic election of new PRIMARY Tuesday, November 15, 11

Slide 189

Slide 189 text

How MongoDB Replication works Member  1 Member  2 DOWN Member  3 PRIMARY •New PRIMARY elected •Replication Set re-established Tuesday, November 15, 11

Slide 190

Slide 190 text

How MongoDB Replication works Member  1 Member  2 RECOVERING Member  3 PRIMARY •Automatic recovery Tuesday, November 15, 11

Slide 191

Slide 191 text

How MongoDB Replication works Member  1 Member  2 Member  3 PRIMARY •Replication Set re-established Tuesday, November 15, 11
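One detail worth keeping in mind during failover is that a new primary can only be elected while a strict majority of the set's voting members are reachable. The check below is a sketch of that rule alone, not of the election protocol; `canElectPrimary` and the member names are placeholders.

```javascript
// A replica set can elect a primary only while a strict majority of its
// voting members are up. Member names are placeholders.
function canElectPrimary(members) {
  const up = members.filter((m) => m.up).length;
  return up > members.length / 2;
}

const set = [
  { host: "member1", up: true },
  { host: "member2", up: false }, // the failed primary
  { host: "member3", up: true },
];
const threeNodeOk = canElectPrimary(set);           // 2 of 3 members up
const twoNodeOk = canElectPrimary(set.slice(0, 2)); // 1 of 2 members up
console.log(threeNodeOk, twoNodeOk);
```

This is why a two-member set is usually paired with an arbiter: losing one of two members leaves no majority, while losing one of three still does.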

Slide 192

Slide 192 text

Creating a Replica Set >  cfg  =  {        _id  :  "acme_a",        members  :  [            {  _id  :  0,  host  :  "sf1.acme.com"  },            {  _id  :  1,  host  :  "sf2.acme.com"  },            {  _id  :  2,  host  :  "sf3.acme.com"  }  ]  } >  use  admin >  db.runCommand(  {  replSetInitiate  :  cfg  }  ) Tuesday, November 15, 11

Slide 193

Slide 193 text

Replica Set Member Types • Normal {priority:1} • Passive {priority:0} • Cannot be elected as PRIMARY • Arbiters • Can vote in an election • Do not hold any data • Hidden {hidden:true} • Tagging - New in 2.0 • tags : {"dc": "ny", "rack": "r23s5"}

Slide 194

Slide 194 text

Using Replicas
slaveOk()
- driver will send read requests to Secondaries
- driver will always send writes to Primary
Java examples
- DB.slaveOk()
- Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);

Slide 195

Slide 195 text

Safe Writes •  db.runCommand({getLastError:  1,  w  :  1}) - ensure write is synchronous - command returns after primary has written to memory •  w=n  or  w='majority' - n is the number of nodes data must be replicated to - driver will always send writes to Primary • w='myTag' [MongoDB 2.0] - Each member is "tagged" e.g. "US_EAST", "EMEA", "US_WEST" - Ensure that the write is executed in each tagged "region" Tuesday, November 15, 11
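The meaning of `w` can be sketched as a simple acknowledgement rule: the write is confirmed once it has been applied on at least `w` members, counting the primary. The function below is illustrative only; `isAcknowledged` is a hypothetical helper and tag-based values such as `w='myTag'` are not modelled.

```javascript
// Models when a write with concern w is acknowledged: once at least w
// members (counting the primary) have applied it. Purely illustrative.
function isAcknowledged(w, appliedOn, totalMembers) {
  const needed = w === "majority" ? Math.floor(totalMembers / 2) + 1 : w;
  return appliedOn >= needed;
}

const primaryOnly = isAcknowledged(1, 1, 3);       // w=1: primary is enough
const needsTwo = isAcknowledged(2, 1, 3);          // w=2: still waiting on a secondary
const majority = isAcknowledged("majority", 2, 3); // 2 of 3 is a majority
console.log(primaryOnly, needsTwo, majority);
```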

Slide 196

Slide 196 text

Safe Writes • fsync:true - Ensures changed disk blocks are flushed to disk • j:true - Ensures changes are flushed to the journal

Slide 197

Slide 197 text

Replication features • Reads from Primary are always consistent • Reads from Secondaries are eventually consistent • Automatic failover if a Primary fails • Automatic recovery when a node joins the set Tuesday, November 15, 11

Slide 198

Slide 198 text

Scaling Use Cases Tuesday, November 15, 11

Slide 199

Slide 199 text

Scaling Use Case • User profile information • Multiple ways to identify a "user" • Facebook ID • Twitter Name • Email address • SSN# / National Identifier • What is the best schema, index and sharding strategy? Tuesday, November 15, 11

Slide 200

Slide 200 text

Schema #1
> db.profiles.save(
    { _id : "",
      facebook_name : null,
      twitter_name : "rit",
      linkedin_name : "bwmcadams",
      details : { loc: [51.49790, -0.18213], ...} })
> db.profiles.ensureIndex({twitter_name:1})
> db.runCommand(
    { shardCollection : "social.profiles",
      key : { twitter_name : 1} } )
> db.profiles.find({twitter_name: "rit"})
> db.profiles.find({linkedin_name:"bwmcadams"})

Slide 201

Slide 201 text

Schema #1 Good: • Schema is simple to understand • Easy to add new identifiers, e.g. foursquare name • Query is routed to a shard db.profiles.find({twitter_name:  "rit"}) Bad: • Each identifier needs a separate index • More indexes means less data in memory • Memory contention and disk paging • Query is scatter/gathered across cluster db.profiles.find({linkedin_name:"bwmcadams"}) Tuesday, November 15, 11

Slide 202

Slide 202 text

Schema #2
> db.profiles.save(
    { _id : ObjectId("1234"),
      details : {loc: [51.49790, -0.18213], ...} })
// One identifier document for each of the user's identifiers, linking back to the profile
> db.identifiers.save(
    { _id : {type: "twitter_name", value: "rit"}, profile: ObjectId("1234") })
> db.identifiers.save(
    { _id : {type: "linkedin_name", value: "bwmcadams"}, profile: ObjectId("1234") })
> db.runCommand(
    { shardCollection : "social.identifiers", key : { _id : 1} } )
> db.runCommand(
    { shardCollection : "social.profiles", key : { _id : 1} } )
> db.identifiers.findOne(
    {_id : {type: "twitter_name", value: "rit"}})
> db.profiles.findOne(
    {_id : ObjectId("1234")})

Slide 203

Slide 203 text

Schema #2
Good:
• Easy to add new identifiers, e.g. foursquare name
• All queries are routed to a shard
> db.identifiers.find( {_id : {type: "twitter_name", value: "rit"}})
> db.identifiers.find( {_id : {type: "foursquare_id", value: "bwmcadams"}})
Bad:
• Schema is more complex
• Two lookups are required for each access (but both routed)
• Need to maintain links (data relationships)

Slide 204

Slide 204 text

Sharding Key Examples
• Good : {server:1}
  • All data for one server is in a single chunk
  • Chunk cannot be split any smaller
• Better : {server:1,time:1}
  • Chunk can be split by millisecond
{ server : "ny153.example.com" ,
  application : "apache" ,
  time : "2011-01-02T21:21:56.249Z" ,
  level : "ERROR" ,
  msg : "something is broken" }

Slide 205

Slide 205 text

Sharding Key Examples
• Good : {time : 1}
  • Time is an increasing number
  • All data will be first written to a single shard
  • Data balanced to other shards later
• Better : {server:1,application:1,time:1}
  • More key values to enable writes to all shards
{ server : "ny153.example.com" ,
  application : "apache" ,
  time : "2011-01-02T21:21:56.249Z" ,
  level : "ERROR" ,
  msg : "something is broken" }
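The effect of an ever-increasing shard key can be shown by counting where a burst of new log entries lands. Everything in this sketch is invented for illustration: the server names, the timestamps, the chunk boundaries and the `countByChunk` helper. For the compound key only the leading `server` field is compared, which is a simplification.

```javascript
// Compare how two shard keys spread a burst of log writes across chunks.
// Chunk boundaries and the sample events are invented for the example.
function countByChunk(events, keyOf, boundaries) {
  // boundaries are sorted split points; chunk i holds keys < boundaries[i]
  const counts = new Array(boundaries.length + 1).fill(0);
  for (const e of events) {
    const k = keyOf(e);
    let i = boundaries.findIndex((b) => k < b);
    if (i === -1) i = boundaries.length; // falls in the last chunk
    counts[i]++;
  }
  return counts;
}

const servers = ["ny1", "ny2", "sf1", "sf2"];
const events = [];
for (let t = 1000; t < 1100; t++) {
  events.push({ server: servers[t % 4], time: t });
}

// Key {time: 1}: existing chunks cover older timestamps, so every new
// write lands in the last chunk.
const byTime = countByChunk(events, (e) => e.time, [250, 500, 750]);
// Key led by server: writes are spread over all chunks.
const byServer = countByChunk(events, (e) => e.server, ["ny2", "sf1", "sf2"]);
console.log(byTime, byServer);
```

With the time-only key all 100 writes hit one chunk, and therefore one shard, until the balancer catches up; with the server-led key each chunk receives a quarter of them.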

Slide 206

Slide 206 text

Epilogue Sharding Administration and the Balancer Tuesday, November 15, 11

Slide 207

Slide 207 text

Config Servers are Crucial to Sharding • All of the information about the sharding setup is stored in the config servers; it’s important you don’t lose them • You may have 1 or 3 config servers; this is the only valid configuration (Two Phase Commit) • Production deployments should always have 3 • If any config server fails ... • Chunk splitting will stop • Migration / balancing will stop • ... Until all 3 servers are back up • This can lead to unbalanced shard situations • Through mongos the config info is in the “config” db Tuesday, November 15, 11

Slide 208

Slide 208 text

Keeping Your Balance • The Balancer is crucial to good sharding • Basic unit of transfer: “chunk” • Default size of 64 MB proves to be a “sweet spot” • More: Migration takes too long, queries la • Less: Overhead of moving doesn’t pay off • The idea is to keep a balance of data & load on each server. Even is good! • Once a threshold of “imbalance” is reached, the balancer kicks in • Usually about ~8 chunks: Don’t want to balance on one doc diff. Tuesday, November 15, 11

Slide 209

Slide 209 text

Balancer Migrations • The balancer migrates chunks one at a time • Known as balancer “rounds” • Balancing rounds continue until difference between any two shards is only 2 chunks • Common Question – “Why isn’t collection $x being balanced?!” • Commonly, it just doesn’t need to. Not enough chunk diff, and the cost of balancing would outweigh the benefit. • Alternately, the balancer may be running but not progressing Tuesday, November 15, 11
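The two thresholds mentioned in these slides (start balancing at a difference of roughly 8 chunks, stop once shards are within 2 chunks of each other) can be put together in a toy loop. This is a sketch of the described behaviour, not the balancer's code; the `balance` function and its parameter names are assumptions.

```javascript
// Toy balancer: start once the spread between the fullest and emptiest shard
// reaches startThreshold chunks, then move one chunk per round until the
// spread is at most stopThreshold. 8 and 2 are the figures quoted in the slides.
function balance(chunkCounts, startThreshold = 8, stopThreshold = 2) {
  const counts = { ...chunkCounts };
  const spread = () => {
    const v = Object.values(counts);
    return Math.max(...v) - Math.min(...v);
  };
  let rounds = 0;
  if (spread() < startThreshold) return { counts, rounds };
  while (spread() > stopThreshold) {
    const names = Object.keys(counts);
    const from = names.reduce((a, b) => (counts[a] >= counts[b] ? a : b));
    const to = names.reduce((a, b) => (counts[a] <= counts[b] ? a : b));
    counts[from]--; // migrate a single chunk
    counts[to]++;
    rounds++;
  }
  return { counts, rounds };
}

const untouched = balance({ shard1: 10, shard2: 5 }); // spread 5: nothing to do
const balanced = balance({ shard1: 12, shard2: 2 });  // spread 10: rebalance
console.log(untouched, balanced);
```

The first call shows the common "why isn't it balancing?" case: a difference of 5 chunks is simply below the threshold.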

Slide 210

Slide 210 text

Checking Balancer Status
• One mongos process will be chosen to coordinate balancing rounds
• In config server (“config” database): db.locks.find( { _id: "balancer" } )
• Output:
{ "_id" : "balancer",
  "process" : "guaruja:1292810611:1804289383",
  "state" : 1,
  "ts" : ObjectId("4d0f872630c42d1978be8a2e"),
  "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)",
  "who" : "guaruja:1292810611:1804289383:Balancer:846930886",
  "why" : "doing balance round" }

Slide 211

Slide 211 text

Understanding the Lock Entry • "state" : 1 • State will be “2” for MongoDB v2.0 • Indicates the lock is taken • "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)" • This is when the balancer began running for this round • "who" : "guaruja:1292810611:1804289383:Balancer:846930886" • The name of the server the balancer round is running on Tuesday, November 15, 11

Slide 212

Slide 212 text

What is the Balancer Doing?
• Look at the mongos log for the machine running balancing, for entries starting with [Balancer]
Mon Dec 20 11:53:00 [Balancer] chose [shard0001] to [shard0000] { _id: "test.foo-_id_52.0", lastmod: Timestamp 2000|1, ns: "test.foo", min: { _id: 52.0 }, max: { _id: 105.0 }, shard: "shard0001" }
• The balancer here is migrating chunk _id:[52 ... 105] from shard0001 to shard0000
• If you want to stop the balancer ...
// connect to mongos
> use config
> db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );

Slide 218

Slide 218 text

Setting the Balancer Window • The balancer operates continuously by default • At times, this may not be desirable • High traffic during peak hours, when the balancer slows things down • Sites which don’t write much data during the day, but where a running balancer may still be disruptive • MongoDB allows you to set a time frame in which the balancer runs

Slide 226

Slide 226 text

Setting the Balancer Window • Balance chunks only from 9am to 9pm • Currently, only time of day is supported for scheduling // connect to mongos > use config > db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "9:00", stop : "21:00" } } }, true )
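
To make the time-of-day semantics concrete, here is a minimal sketch of how an activeWindow setting could be evaluated. It is an illustration, not the mongos implementation: `toMinutes` and `inActiveWindow` are hypothetical names, and treating the start as inclusive, the stop as exclusive, and a start later than the stop as a window that wraps past midnight are all assumptions.

```javascript
// Converts "H:MM" or "HH:MM" into minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// Illustrative check only; NOT the mongos implementation.
// Assumptions: start is inclusive, stop is exclusive, and a window whose
// start is later than its stop wraps past midnight.
function inActiveWindow(now, activeWindow) {
  const t = toMinutes(now);
  const start = toMinutes(activeWindow.start);
  const stop = toMinutes(activeWindow.stop);
  if (start <= stop) {
    return t >= start && t < stop; // same-day window, e.g. 9:00 to 21:00
  }
  return t >= start || t < stop;   // overnight window, e.g. 23:00 to 6:00
}

// The window from the slide: balance only from 9am to 9pm.
const slideWindow = { start: "9:00", stop: "21:00" };
console.log(inActiveWindow("10:30", slideWindow));
console.log(inActiveWindow("22:00", slideWindow));
```

An overnight window such as { start : "23:00", stop : "6:00" } is a common way to keep balancing out of peak hours; it is handled by the second branch of this sketch.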

Slide 227

Slide 227 text

@mongodb conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongo: Facebook | Twitter | LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring! [email protected] (twitter: @rit)