Slide 1

Slide 1 text

Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rick Copeland @rick446
Thursday, January 10, 13

Slide 2

Slide 2 text

Roadmap
• Brief overview of MongoDB
• Getting started with PyMongo
• Sprinkle in some Ming schemas
• Object-Document Mapping: When a dict just won’t do

Slide 3

Slide 3 text

MongoDB Terminology
• MongoDB databases contain collections
• MongoDB collections contain documents

Relational    MongoDB
Database      Database
Table         Collection
Index         Index
Row           Document
Column        Field

Slide 4

Slide 4 text

JSON and BSON
• JSON: JavaScript Object Notation
• BSON: Binary JSON
  • Extra types: ObjectId, datetime, UUID, Binary, etc.
  • Restrictions on keys: stay away from “.” and “$”
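Because of the key restrictions, it can help to check documents client-side before writing them. The sketch below assumes documents are plain Python dicts and lists. `find_bad_keys` is a hypothetical helper, not part of PyMongo, and it conservatively flags any key containing "." or "$", following the slide's advice rather than the server's exact rules (which differ between MongoDB versions).

```python
from typing import Any, List


def find_bad_keys(doc: Any, path: str = "") -> List[str]:
    """Return dotted paths of keys containing '.' or '$' (hypothetical helper)."""
    bad: List[str] = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            here = f"{path}.{key}" if path else key
            if "." in key or "$" in key:
                bad.append(here)
            # Recurse into sub-documents and arrays.
            bad.extend(find_bad_keys(value, here))
    elif isinstance(doc, list):
        for i, item in enumerate(doc):
            bad.extend(find_bad_keys(item, f"{path}.{i}"))
    return bad


doc = {"title": "ok", "prices": {"$usd": 5}, "tags": [{"a.b": 1}]}
print(find_bad_keys(doc))  # ['prices.$usd', 'tags.0.a.b']
```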

Slide 9

Slide 9 text

BSON Example
    {
      "_id": ObjectId(...),
      "title": "MongoDB for Developers",
      "date": ISODate("2012-09-17..."),
      "instructor": { "first": "Rick", "last": "Copeland" },
      "topics": [ "MongoDB", "Python" ]
    }
• "_id": the document “primary key”
• "date": datetime stored as a 64-bit signed integer count of milliseconds since the Unix epoch
• "instructor": compound sub-document
• "topics": arrays for embedding “1:N” relations
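To make the "64-bit signed # of ms" concrete, here is a stdlib-only sketch of how a datetime maps to that representation: whole milliseconds since the Unix epoch (UTC), packed as a little-endian signed 64-bit integer, which is how BSON stores it. The helper names are hypothetical. The conversion drops sub-millisecond precision, which is why microseconds don't survive a round trip through BSON.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    """Whole milliseconds since the Unix epoch; expects an aware datetime."""
    delta = dt - EPOCH
    # Integer arithmetic avoids float rounding; // floors, so pre-1970 works too.
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000


def from_millis(ms: int) -> datetime:
    return EPOCH + timedelta(milliseconds=ms)


dt = datetime(2012, 9, 17, 14, 30, 0, 123_456, tzinfo=timezone.utc)
ms = to_millis(dt)
packed = struct.pack("<q", ms)            # 8 bytes: little-endian signed int64
restored = from_millis(struct.unpack("<q", packed)[0])
print(len(packed), restored)              # 8 2012-09-17 14:30:00.123000+00:00
```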

Slide 10

Slide 10 text

MongoDB Queries
• BSON-based query language
• Query by example
    db.foo.find({'name': 'Rick'})
• Various query operators
    db.foo.find({'rating': { '$gt': 4 } })
• Query “into” arrays/subdocuments
    db.foo.find({'comments.author': 'Rick'})
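Query by example, operators, and dotted paths into arrays compose in a way that's easy to model. This is a hypothetical, heavily simplified matcher over plain dicts (equality plus $gt/$gte/$lt/$lte, with an array matching if any element matches). It is meant to show the semantics the slide describes, not to stand in for the server.

```python
from typing import Any, Dict, List

# Comparison operators supported by this simplified sketch.
OPS = {
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
}


def values_at(doc: Any, parts: List[str]) -> List[Any]:
    """Collect every value at a dotted path, fanning out over arrays."""
    if not parts:
        # An array field matches if any of its elements matches.
        return list(doc) if isinstance(doc, list) else [doc]
    if isinstance(doc, dict):
        return values_at(doc[parts[0]], parts[1:]) if parts[0] in doc else []
    if isinstance(doc, list):
        found: List[Any] = []
        for item in doc:
            found.extend(values_at(item, parts))
        return found
    return []


def compare(op: str, value: Any, arg: Any) -> bool:
    try:
        return OPS[op](value, arg)
    except TypeError:
        return False  # mismatched types simply don't match here


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every field condition in the query holds for the document."""
    for field, cond in query.items():
        candidates = values_at(doc, field.split("."))
        if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
            ok = any(all(compare(op, v, arg) for op, arg in cond.items())
                     for v in candidates)
        else:
            ok = cond in candidates  # query by example: plain equality
        if not ok:
            return False
    return True


posts = [
    {"name": "Rick", "rating": 5, "comments": [{"author": "Rick"}, {"author": "Ann"}]},
    {"name": "Ann", "rating": 3, "comments": [{"author": "Bob"}]},
]
print([p["name"] for p in posts if matches(p, {"comments.author": "Rick"})])  # ['Rick']
```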

Slide 11

Slide 11 text

MongoDB Updates
• db.collection.update({spec}, {update})
• Default is replacement
    db.foo.update({'_id': ...}, { k0: v0, k1: v1, ... })
• Can also do partial update with operators
    db.posts.update({'_id': ObjectId(...)}, {'$push': { 'comments': 'This is cool' } })
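The difference between replacement and operator updates is easy to miss, so here is a simplified in-memory model of it. `apply_update` is hypothetical, handles only top-level $set and $push, and keeps the original _id on replacement. It illustrates the semantics on the slide rather than reproducing the server's update engine.

```python
import copy
from typing import Any, Dict


def apply_update(doc: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return the updated document without mutating the input.

    Simplified sketch: whole-document replacement plus top-level $set and $push.
    """
    if not any(key.startswith("$") for key in update):
        # Replacement: the new document wholesale, keeping the original _id.
        new_doc = copy.deepcopy(update)
        if "_id" in doc:
            new_doc["_id"] = doc["_id"]
        return new_doc
    new_doc = copy.deepcopy(doc)
    for op, fields in update.items():
        if op == "$set":
            for field, value in fields.items():
                new_doc[field] = copy.deepcopy(value)
        elif op == "$push":
            for field, value in fields.items():
                new_doc.setdefault(field, []).append(copy.deepcopy(value))
        else:
            raise ValueError(f"unsupported in this sketch: {op}")
    return new_doc


post = {"_id": 1, "title": "Hello", "comments": []}
print(apply_update(post, {"title": "Replaced"}))
# {'title': 'Replaced', '_id': 1}   (the comments field is gone)
print(apply_update(post, {"$push": {"comments": "This is cool"}}))
# {'_id': 1, 'title': 'Hello', 'comments': ['This is cool']}
```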

Slide 12

Slide 12 text

MongoDB Indexing
• At most one index is used for any given query/update
• Most indexes are B-tree based
• Geospatial indexes and queries
• Brand-new experimental full-text search (http://blog.serverdensity.com/full-text-search-in-mongodb/)

Slide 13

Slide 13 text

Scaling MongoDB
[Diagram: four shards, each a replica set of one Primary and two Secondaries, owning shard-key ranges Shard 1 (0..10), Shard 2 (10..20), Shard 3 (20..30), and Shard 4 (30..40). Two MongoS routers sit in front of the shards, alongside a configuration tier of three config servers (Config 1, Config 2, Config 3).]
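Range-based sharding means a router can map a shard-key value to exactly one chunk. The sketch below models the diagram's ranges as half-open intervals [lower, upper) (an assumption, since the slide doesn't say which end is inclusive) and shows why a query on the shard key can be sent to a single shard while a range query only touches the overlapping shards. The chunk table and function names are hypothetical.

```python
import bisect
from typing import List, Tuple

# Hypothetical chunk table mirroring the diagram: [lower, upper) -> shard.
CHUNKS: List[Tuple[int, int, str]] = [
    (0, 10, "shard1"),
    (10, 20, "shard2"),
    (20, 30, "shard3"),
    (30, 40, "shard4"),
]
LOWER_BOUNDS = [lo for lo, _, _ in CHUNKS]


def route(key: int) -> str:
    """Return the single shard owning a shard-key value (a targeted operation)."""
    i = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    if i < 0 or key >= CHUNKS[i][1]:
        raise KeyError(f"no chunk covers shard key {key!r}")
    return CHUNKS[i][2]


def route_range(lo: int, hi: int) -> List[str]:
    """Shards whose chunks overlap [lo, hi); a range query only visits these."""
    return [shard for c_lo, c_hi, shard in CHUNKS if c_lo < hi and lo < c_hi]


print(route(15))            # shard2
print(route_range(5, 15))   # ['shard1', 'shard2']
```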

Slide 14

Slide 14 text

Roadmap
• Brief overview of MongoDB
• Getting started with PyMongo
• Sprinkle in some Ming schemas
• Object-Document Mapping: When a dict just won’t do

Slide 20

Slide 20 text

PyMongo: Connections and Databases
    >>> import pymongo
    >>> cli = pymongo.MongoClient()  # Get a connection
    >>> cli
    MongoClient('localhost', 27017)
    >>> cli.test  # Get a database
    Database(MongoClient('localhost', 27017), u'test')
    >>> cli.test.foo  # Get a collection
    Collection(Database(MongoClient('localhost', 27017), u'test'), u'foo')
    >>> cli['test-db']  # Names that aren't valid Python identifiers
    Database(MongoClient('localhost', 27017), u'test-db')
    >>> cli['test-db']['foo-collection']
    Collection(Database(MongoClient('localhost', 27017), u'test-db'), u'foo-collection')
    >>> cli.test.foo.bar.baz  # Collections with '.' embedded
    Collection(Database(MongoClient('localhost', 27017), u'test'), u'foo.bar.baz')

Slide 23

Slide 23 text

PyMongo: Insert/Update/Delete
    >>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
    >>> id  # Auto-generated _id
    ObjectId('...')
    >>> db.foo.find()
    >>> list(db.foo.find())
    [{u'bar': 1, u'_id': ObjectId('...'), u'baz': [1, 2, {u'k': 5}]}]
• A cursor behaves like a Python generator: it is lazy, and results are fetched only as you iterate
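A PyMongo Cursor is an iterator rather than a literal generator, but the laziness the slide points at can be sketched with one: nothing is fetched until you iterate, and further batches are requested only as earlier ones are consumed. `fetch_batch` is a hypothetical stand-in for a server round trip, and the offset-based paging is a simplification (the real protocol continues a server-side cursor with getMore).

```python
from typing import Callable, Dict, Iterator, List

Doc = Dict[str, object]


def lazy_cursor(fetch_batch: Callable[[int, int], List[Doc]],
                batch_size: int = 2) -> Iterator[Doc]:
    """Yield documents one by one, fetching a new batch only when needed."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)   # one simulated round trip
        if not batch:
            return
        yield from batch
        offset += len(batch)


DATA = [{"x": i} for i in range(5)]
calls: List[int] = []


def fake_fetch(offset: int, limit: int) -> List[Doc]:
    calls.append(offset)
    return DATA[offset:offset + limit]


cur = lazy_cursor(fake_fetch, batch_size=2)
print(calls)       # []  (creating the cursor fetched nothing)
print(next(cur))   # {'x': 0}
print(calls)       # [0]
print(list(cur))   # [{'x': 1}, {'x': 2}, {'x': 3}, {'x': 4}]
print(calls)       # [0, 2, 4, 5]
```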

Slide 26

Slide 26 text

PyMongo: Insert/Update/Delete
    >>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})  # Partial update
    {u'updatedExisting': True, u'connectionId': 24, u'ok': 1.0, u'err': None, u'n': 1}
    >>> db.foo.find_one()
    {u'bar': 2, u'_id': ObjectId('...'), u'baz': [1, 2, {u'k': 5}]}
    >>> db.foo.remove({'_id': id})  # Remove: same query language as find()
    {u'connectionId': 24, u'ok': 1.0, u'err': None, u'n': 1}
    >>> list(db.foo.find())
    []

Slide 31

Slide 31 text

PyMongo: Queries
    >>> db.foo.insert([dict(x=i) for i in range(4)])  # Batching inserts
    [ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]
    >>> list(db.foo.find())  # Find all documents
    [{u'x': 0, u'_id': ObjectId('...')}, {u'x': 1, u'_id': ObjectId('...')}, {u'x': 2, u'_id': ObjectId('...')}, {u'x': 3, u'_id': ObjectId('...')}]
    >>> list(db.foo.find({'x': {'$gte': 2}}))  # Restrict by range (>=)
    [{u'x': 2, u'_id': ObjectId('...')}, {u'x': 3, u'_id': ObjectId('...')}]
    >>> list(db.foo.find({'x': {'$gte': 2}}, {'_id': 0}))  # Retrieve partial results
    [{u'x': 2}, {u'x': 3}]
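The second argument to find() is a projection. Here is a simplified model of it, assuming top-level fields only. `project` is a hypothetical helper following the usual rules: listing fields with 1 includes only those (plus _id by default), setting fields to 0 excludes them, and _id is the one field that may be excluded alongside inclusions.

```python
from typing import Any, Dict


def project(doc: Dict[str, Any], spec: Dict[str, int]) -> Dict[str, Any]:
    """Simplified top-level projection in the style of find()'s second argument."""
    include_id = spec.get("_id", 1) != 0
    modes = {bool(v) for k, v in spec.items() if k != "_id"}
    if len(modes) > 1:
        raise ValueError("cannot mix inclusion and exclusion (except for _id)")
    out: Dict[str, Any] = {}
    if modes == {True}:
        # Inclusion mode: only the listed fields, plus _id unless excluded.
        if include_id and "_id" in doc:
            out["_id"] = doc["_id"]
        for field in spec:
            if field != "_id" and field in doc:
                out[field] = doc[field]
        return out
    # Exclusion mode (or an _id-only spec): drop the fields set to 0.
    excluded = {k for k, v in spec.items() if not v}
    return {k: v for k, v in doc.items() if k not in excluded}


docs = [{"_id": "a", "x": 2}, {"_id": "b", "x": 3}]
print([project(d, {"_id": 0}) for d in docs])  # [{'x': 2}, {'x': 3}]
```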

Slide 35

Slide 35 text

PyMongo: Indexes
    >>> db.foo.find({'x': {'$gte': 2}}).explain()
    {..., u'n': 2, u'cursor': u'BasicCursor', ..., u'nscannedObjects': 4, ..., u'nscanned': 4}
    >>> db.foo.ensure_index('x')
    u'x_1'
    >>> db.foo.find({'x': {'$gte': 2}}).explain()
    {..., u'n': 2, u'cursor': u'BtreeCursor x_1', ..., u'nscannedObjects': 2, ..., u'nscanned': 2}
    >>> db.foo.find({'x': {'$gte': 2}},
    ...             {'x': 1, '_id': 0}).explain()
    {..., u'indexOnly': True, ...}
• No index (BasicCursor): scan all the documents
• With index (BtreeCursor x_1): skip straight to the matching documents
• indexOnly: the query is answered from the index alone; the document isn't even loaded
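The nscanned numbers above fall out of how the two access paths work. In this hypothetical model, a sorted list of (key, _id) pairs stands in for the B-tree on x: bisect jumps to the first matching key the way a BtreeCursor does, while a BasicCursor has to look at every document. The covered variant answers the {'x': 1, '_id': 0} projection from the index entries alone, which is what indexOnly: True reports.

```python
import bisect
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]
docs: List[Doc] = [{"_id": i, "x": i} for i in range(4)]


def collection_scan(min_x: int) -> Tuple[List[Doc], int]:
    """No index: examine every document (like BasicCursor)."""
    hits = [d for d in docs if d["x"] >= min_x]
    return hits, len(docs)


# Sorted (key, _id) pairs stand in for the B-tree index on x.
index = sorted((d["x"], d["_id"]) for d in docs)
by_id = {d["_id"]: d for d in docs}


def index_scan(min_x: int) -> Tuple[List[Doc], int]:
    """With an index: jump to the first key >= min_x, then walk forward."""
    start = bisect.bisect_left(index, (min_x,))
    entries = index[start:]
    return [by_id[_id] for _, _id in entries], len(entries)


def covered_scan(min_x: int) -> List[Doc]:
    """Covered query: build the {'x': 1, '_id': 0} result from index keys only."""
    start = bisect.bisect_left(index, (min_x,))
    return [{"x": key} for key, _ in index[start:]]


print(collection_scan(2)[1], index_scan(2)[1])  # 4 2
```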

Slide 37

Slide 37 text

And if you really must...
    >>> list(db.foo.find({'$where': 'this.bar >= 1'}))
    [{u'bar': 2, u'_id': ObjectId('...'), u'baz': [1, 2, {u'k': 5}]}]
The $where value is a JavaScript expression, evaluated in the context of each document (this).
• ...but please don’t, if you want performance
  • JavaScript global interpreter lock
  • BSON/JS translation
  • Forget about indexes (for the $where clause, at least)

Slide 42

Slide 42 text

PyMongo: Aggregation
    >>> db.foo.insert([dict(x=i, y=i%4) for i in range(10)])
    [ObjectId('...'), ...]
    >>> db.foo.count()
    10
    >>> db.foo.find({'x': {'$lt': 4}}).count()
    4
    >>> db.foo.find({'x': {'$lt': 4}}).distinct('y')
    [0, 1, 2, 3]
    >>> db.foo.group(
    ...     ['y'], {'x': {'$gt': 4}},
    ...     {'count': 0},
    ...     'function(cur, acc) { acc.count += 1; }')
    [{u'y': 1.0, u'count': 2.0}, {u'y': 2.0, u'count': 1.0}, {u'y': 3.0, u'count': 1.0}, {u'y': 0.0, u'count': 1.0}]
• count(): always fast
• count() with a query: sometimes slow
• distinct(): limited result size
• group(): 20k results, uses JS, can’t shard
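Since the sample data is tiny, the results above can be cross-checked in plain Python. This client-side equivalent is written for illustration only: it pulls every document into memory, which is exactly what the server-side commands avoid. It uses the slide's data (x = 0..9, y = x % 4).

```python
from collections import Counter

# Same data as the slide: x = 0..9, y = x % 4.
docs = [dict(x=i, y=i % 4) for i in range(10)]

count_all = len(docs)
count_lt4 = sum(1 for d in docs if d["x"] < 4)
distinct_y = sorted({d["y"] for d in docs if d["x"] < 4})

# Equivalent of group(key=['y'], condition={'x': {'$gt': 4}}, count += 1).
grouped = Counter(d["y"] for d in docs if d["x"] > 4)

print(count_all, count_lt4, distinct_y)  # 10 4 [0, 1, 2, 3]
print(dict(grouped))                     # {1: 2, 2: 1, 3: 1, 0: 1}
```

The server returns 2.0 rather than 2 because group() runs in JavaScript, where every number is a double.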

Slide 43

Slide 43 text

PyMongo: MapReduce
    { doc } → map() → (key, {doc}) pairs → group by key → (key, [{docs}]) pairs → reduce(key, values) → finalize(key, value) → { value } → write back / return
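The flow in the diagram can be sketched in a few lines of Python. `map_reduce` below is a hypothetical in-memory stand-in, not PyMongo's map_reduce(); the mapper, reducer, and finalizer are Python ports of the mapf, reducef, and finalizef JavaScript functions from the cities example, run over made-up sample data. One property carries over directly: MongoDB may call reduce on partial results and feed its output into another reduce, so the reducer must return values shaped like the ones the mapper emits.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Doc = Dict[str, Any]


def map_reduce(docs: Iterable[Doc],
               mapper: Callable[[Doc], Iterator[Tuple[Any, Any]]],
               reducer: Callable[[Any, List[Any]], Any],
               finalizer: Optional[Callable[[Any, Any], Any]] = None) -> Dict[Any, Any]:
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:                       # map(): doc -> (key, value) pairs
        for key, value in mapper(doc):
            groups[key].append(value)      # group by key
    results: Dict[Any, Any] = {}
    for key, values in groups.items():
        # Like MongoDB, skip reduce() for keys that emitted a single value.
        value = values[0] if len(values) == 1 else reducer(key, values)
        results[key] = finalizer(key, value) if finalizer else value
    return results


def mapper(city: Doc) -> Iterator[Tuple[Any, Any]]:
    yield city["admin1_code"], {"count": 1, "pop": city["population"]}


def reducer(key: Any, values: List[Doc]) -> Doc:
    # Same shape as the mapped values, so the output can be re-reduced.
    return {"count": sum(v["count"] for v in values),
            "pop": sum(v["pop"] for v in values)}


def finalizer(key: Any, value: Doc) -> Doc:
    return dict(value, mean_pop=value["pop"] / value["count"])


cities = [  # made-up sample data
    {"admin1_code": "VA", "population": 100},
    {"admin1_code": "VA", "population": 300},
    {"admin1_code": "AK", "population": 50},
]
print(map_reduce(cities, mapper, reducer, finalizer))
# {'VA': {'count': 2, 'pop': 400, 'mean_pop': 200.0}, 'AK': {'count': 1, 'pop': 50, 'mean_pop': 50.0}}
```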

Slide 44

Slide 44 text

PyMongo: MapReduce
    >>> db.cities.find_one(
    ...     {'country_code': 'US'},
    ...     {'_id': 0, 'name': 1, 'country_code': 1, 'admin1_code': 1, 'population': 1})
    {u'admin1_code': u'VA', u'name': u'Fort Hunt', u'country_code': u'US', u'population': 16045L}
    >>> mapf = '''function() {
    ...     emit(this.admin1_code,
    ...          { count: 1, pop: this.population } );
    ... }'''

Slide 45

Slide 45 text

PyMongo: MapReduce
    >>> reducef = '''function(key, docs) {
    ...     var result = { count: 0, pop: 0 };
    ...     docs.forEach(function(doc) {
    ...         result.count += doc.count;
    ...         result.pop += doc.pop;
    ...     });
    ...     return result;
    ... }'''
    >>> finalizef = '''function(key, doc) {
    ...     return {
    ...         count: doc.count,
    ...         pop: doc.pop,
    ...         mean_pop: doc.pop / doc.count};
    ... }'''

Slide 46

Slide 46 text

PyMongo: MapReduce
    >>> db.cities.map_reduce(
    ...     map=mapf,
    ...     reduce=reducef,
    ...     out='state_pop',
    ...     query={'country_code': 'US'},
    ...     finalize=finalizef)
    Collection(Database(MongoClient('localhost', 27017), u'tutorial'), u'state_pop')
    >>> db.state_pop.find_one()
    {u'_id': u'AK', u'value': {u'count': 4.0, u'mean_pop': 93529.5, u'pop': 374118.0}}

Slide 47

Slide 47 text

PyMongo: MapReduce
• Still uses JS, but can parallelize across shards
• Can write results back to a collection (suitable for large batch processes)

Slide 48

Slide 48 text

PyMongo: Aggregation Framework
• Pipeline of operators
  • $match
  • $project
  • $skip, $limit, $sort
  • $unwind
  • $group

Slide 49

Slide 49 text

PyMongo: Aggregation Framework Pipeline
    All docs in collection → $match → matched docs → $project → reshaped docs → $unwind → unwound docs → $group → grouped docs → $sort → sorted docs
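The pipeline in the diagram is essentially function composition: each stage consumes a stream of documents and produces another. The sketch below is a hypothetical pure-Python model with simplified stages ($match on equality only, $unwind on one array field, a $group that only counts, $sort on one key). It is meant to show the data flow, not every edge case of MongoDB's semantics.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]
Stage = Callable[[Iterable[Doc]], Iterable[Doc]]


def match(query: Doc) -> Stage:
    # Equality-only $match.
    return lambda docs: (d for d in docs
                         if all(d.get(k) == v for k, v in query.items()))


def unwind(field: str) -> Stage:
    # One copy of the document per array element; docs without the field drop out.
    def stage(docs: Iterable[Doc]) -> Iterator[Doc]:
        for d in docs:
            for item in d.get(field, []):
                yield {**d, field: item}
    return stage


def group_count(field: str) -> Stage:
    # Like {'$group': {'_id': '$<field>', 'count': {'$sum': 1}}}.
    def stage(docs: Iterable[Doc]) -> Iterator[Doc]:
        counts: Dict[Any, int] = {}
        for d in docs:
            counts[d[field]] = counts.get(d[field], 0) + 1
        return ({"_id": k, "count": n} for k, n in counts.items())
    return stage


def sort_by(field: str, descending: bool = False) -> Stage:
    return lambda docs: sorted(docs, key=lambda d: d[field], reverse=descending)


def run_pipeline(docs: Iterable[Doc], stages: List[Stage]) -> List[Doc]:
    for stage in stages:
        docs = stage(docs)
    return list(docs)


posts = [
    {"author": "rick", "tags": ["mongodb", "python"]},
    {"author": "rick", "tags": ["python"]},
    {"author": "ann", "tags": ["mongodb"]},
]
result = run_pipeline(posts, [
    match({"author": "rick"}),
    unwind("tags"),
    group_count("tags"),
    sort_by("count", descending=True),
])
print(result)  # [{'_id': 'python', 'count': 2}, {'_id': 'mongodb', 'count': 1}]
```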

Slide 50

Slide 50 text

PyMongo: Aggregation Framework Example
    >>> db.cities.aggregate([
    ...     {'$match': {'name': 'Atlanta'}},
    ...     {'$project': {
    ...         'name': 1,
    ...         'country_code': 1,
    ...         'position': {'lon': '$longitude',
    ...                      'lat': '$latitude'}
    ...     }}
    ... ])
    {u'ok': 1.0, u'result': [{u'position': {u'lat': 33.749, u'lon': -84.38798}, u'_id': 4180439, u'name': u'Atlanta', u'country_code': u'US'}]}

Slide 51

Slide 51 text

PyMongo: Aggregation Framework
• No JavaScript GIL
• Sharding supported
• Results limited to a single document (the 16MB BSON document size limit)
• “Super-find”

Slide 52

Slide 52 text

PyMongo: GridFS
    >>> import gridfs
    >>> fs = gridfs.GridFS(db)
    >>> with fs.new_file() as fp:
    ...     fp.write('The file')
    ...
    >>> fp
    >>> fp._id
    ObjectId('...')
    >>> fs.get(fp._id).read()
    'The file'
• File-like abstraction for data >16MB (larger than a single BSON document can hold)
• Files are opened for reading or writing, not both
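GridFS gets around the document size limit by storing file metadata in a files collection (fs.files by default) and the content in a chunks collection (fs.chunks), where each chunk is a document with files_id, n (its sequence number), and data. The helpers below are a hypothetical in-memory sketch of that split and reassembly; PyMongo's gridfs module does this for you, including choosing the chunk size and recording extra metadata.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def split_file(file_id: Any, data: bytes, chunk_size: int) -> Tuple[Doc, List[Doc]]:
    """Return one 'files' document and its ordered 'chunks' documents."""
    chunks = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    meta = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return meta, chunks


def join_file(meta: Doc, chunks: List[Doc]) -> bytes:
    """Reassemble a file from its chunks, in whatever order they were fetched."""
    mine = sorted((c for c in chunks if c["files_id"] == meta["_id"]),
                  key=lambda c: c["n"])
    data = b"".join(c["data"] for c in mine)
    if len(data) != meta["length"]:
        raise ValueError("missing or extra chunks")
    return data


meta, chunks = split_file("report", b"The file" * 3, chunk_size=10)
print([len(c["data"]) for c in chunks])        # [10, 10, 4]
print(join_file(meta, list(reversed(chunks))))  # b'The fileThe fileThe file'
```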

Slide 53

Slide 53 text

Roadmap
• Brief overview of MongoDB
• Getting started with PyMongo
• Sprinkle in some Ming schemas
• Object-Document Mapping: When a dict just won’t do

Slide 54

Slide 54 text

Why Ming?
• Your data has a schema (even if the DB doesn’t enforce it)
• Sometimes you need migrations
• “Unit of work”: sometimes it’s nice to queue up your updates

Slide 55

Slide 55 text

Ming: Session, Model, Datastore
[Diagram: Model (schema), Session, and Datastore (database)]

Slide 56

Slide 56 text

Ming: Datastore and Session
    >>> import ming
    >>> ds = ming.create_datastore('test')
    >>> ds.db
    Database(MongoClient('localhost', 27017), u'test')
    >>> sess = ming.Session(ds)
    >>> sess.db
    Database(MongoClient('localhost', 27017), u'test')
    >>> import ming.config
    >>> ming.config.configure_from_nested_dict(
    ...     {'main': {'uri': 'mongodb://localhost:27017/test'}})
    >>> sess = ming.Session.by_name('main')
    >>> sess.db
    Database(MongoClient(u'localhost', 27017), u'test')

Slide 59

Slide 59 text

Ming: Define your Schema
    from ming import collection, Field, schema

    WikiDoc = collection('wiki_page', session,
        Field('_id', schema.ObjectId()),
        Field('title', str, index=True),  # Index ensured on session configuration
        Field('text', str))               # str is shorthand for schema.String()

    CommentDoc = collection('comment', session,
        Field('_id', schema.ObjectId()),
        Field('page_id', schema.ObjectId(), index=True),
        Field('text', str))

Slide 60

Slide 60 text

Ming Schema for the Classically Inclined
    class WikiDoc(Document):
        class __mongometa__:
            session = Session.by_name('main')
            name = 'wiki_page'
            indexes = [ 'title' ]
        title = Field(str)
        text = Field(str)

Slide 63

Slide 63 text

Using Ming Models
    >>> from wiki import WikiDoc
    >>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
    >>> doc.m.save()
    >>> WikiDoc.m.find()
    >>> WikiDoc.m.find().all()
    [{'text': u'I can haz cheezburger?', '_id': ObjectId('50eddf6bfb72f03b78a3823c'), 'title': u'Cats'}]
    >>> WikiDoc.m.find().one().text
    u'I can haz cheezburger?'
    >>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
    >>> doc.m.save()
    Traceback (most recent call last):
      ...
    ming.schema.Invalid: Extra keys: set(['tietul'])
• A Document is a dict subclass
• Ming validates data on save
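The Extra keys error comes from validating the document against the declared fields before it is saved. Here is a minimal sketch of that kind of check, assuming a flat schema mapping field names to Python types. `validate` and `Invalid` are hypothetical and far simpler than Ming's own schema classes, which also handle things like nested documents and default values.

```python
from typing import Any, Dict


class Invalid(Exception):
    """Hypothetical validation error (not ming.schema.Invalid)."""


def validate(doc: Dict[str, Any], fields: Dict[str, type]) -> Dict[str, Any]:
    """Reject unknown keys and values of the wrong type; return the doc if valid."""
    extra = set(doc) - set(fields)
    if extra:
        raise Invalid(f"Extra keys: {sorted(extra)}")
    for name, expected in fields.items():
        if name in doc and not isinstance(doc[name], expected):
            raise Invalid(f"{name}: expected {expected.__name__}, "
                          f"got {type(doc[name]).__name__}")
    return doc


WIKI_FIELDS = {"_id": object, "title": str, "text": str}

validate({"title": "Cats", "text": "I can haz cheezburger?"}, WIKI_FIELDS)
try:
    validate({"tietul": "LOL", "text": "Invisible bicycle"}, WIKI_FIELDS)
except Invalid as exc:
    print(exc)  # Extra keys: ['tietul']
```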

Slide 64

Slide 64 text

Ming Bonus: MIM
• In-memory partial PyMongo implementation
• Useful for unit tests
• Does not scale well (SmallData)
    >>> ming.create_datastore('mim:///test').db
    mim.Database(test)

Slide 65

Slide 65 text

Roadmap
• Brief overview of MongoDB
• Getting started with PyMongo
• Sprinkle in some Ming schemas
• Object-Document Mapping: When a dict just won’t do

Slide 69

Slide 69 text

Ming ODM: Classes and Collections
    from ming.odm import ODMSession, RelationProperty, ForeignIdProperty

    odmsession = ODMSession(session)

    # Plain old Python classes
    class WikiPage(object): pass
    class Comment(object): pass

    # Map class to collection + session, declaring "relations"
    odmsession.mapper(WikiPage, WikiDoc, properties=dict(
        comments=RelationProperty('Comment')))
    odmsession.mapper(Comment, CommentDoc, properties=dict(
        page_id=ForeignIdProperty('WikiPage'),
        page=RelationProperty('WikiPage')))

Slide 70

Slide 70 text

And again, if you like classes...
    from ming import schema as S
    from ming.odm import MappedClass, FieldProperty, RelationProperty

    class WikiPage(MappedClass):
        class __mongometa__:
            session = main_odm_session
            name = 'wiki_page'
            indexes = [ 'title' ]
        _id = FieldProperty(S.ObjectId)
        title = FieldProperty(str)
        text = FieldProperty(str)
        comments = RelationProperty('Comment')

Slide 71

Slide 71 text

Ming ODM: Sessions and Queries
• Session ==> ODMSession
• collection.m.... ==> MappedClass.query...
• Session actually does stuff
  • Track object identity
  • Track object modifications
  • Unit of work to save everything at once
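The three session duties listed above (identity, modification tracking, unit of work) can be sketched with a dict standing in for the collection. `UnitOfWorkSession` is hypothetical and detects modifications by comparing each object against a snapshot taken when it was loaded or last flushed; Ming's ODM tracks state its own way, so treat this as an illustration of the pattern only.

```python
import copy
from typing import Any, Dict, Optional

Doc = Dict[str, Any]


class UnitOfWorkSession:
    """Hypothetical session: identity map + change tracking + flush()."""

    def __init__(self, collection: Dict[Any, Doc]):
        self.collection = collection                 # stands in for a collection
        self.identity_map: Dict[Any, Doc] = {}
        self.snapshots: Dict[Any, Optional[Doc]] = {}
        self.writes = 0                              # simulated round trips

    def get(self, _id: Any) -> Doc:
        # Identity map: the same _id always yields the same Python object.
        if _id not in self.identity_map:
            doc = copy.deepcopy(self.collection[_id])
            self.identity_map[_id] = doc
            self.snapshots[_id] = copy.deepcopy(doc)
        return self.identity_map[_id]

    def add(self, doc: Doc) -> None:
        self.identity_map[doc["_id"]] = doc
        self.snapshots[doc["_id"]] = None            # new: never saved

    def flush(self) -> None:
        # Unit of work: write only objects that are new or changed.
        for _id, doc in self.identity_map.items():
            if doc != self.snapshots[_id]:
                self.collection[_id] = copy.deepcopy(doc)
                self.snapshots[_id] = copy.deepcopy(doc)
                self.writes += 1

    def clear(self) -> None:
        self.identity_map.clear()
        self.snapshots.clear()


db = {1: {"_id": 1, "title": "Cats", "text": "I can haz cheezburger?"}}
s = UnitOfWorkSession(db)
page = s.get(1)
page["text"] = "edited"
s.add({"_id": 2, "title": "MyPage", "text": "is here"})
print(len(db))  # 1  (nothing written yet)
s.flush()
print(len(db), db[1]["text"], s.writes)  # 2 edited 2
```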

Slide 72

Slide 72 text

Ming ODM: Sessions and Queries
    >>> pg = WikiPage(title='MyPage', text='is here')
    >>> session.db.wiki_page.count()
    0
    >>> odmsession
    WikiPage : ... =>
    >>> odmsession.flush()
    >>> session.db.wiki_page.count()
    1

Slide 73

Slide 73 text

Integration with Python Web Frameworks
• ThreadLocalODMSession
• ming.odm.middleware.MingMiddleware
  • flush all sessions on success
  • clear all sessions on exception
  • when you don’t have real transactions, fake ’em
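The flush-on-success, clear-on-exception pattern fits in a few lines of WSGI. This is a hypothetical sketch, not ming.odm.middleware.MingMiddleware itself: `sessions` is assumed to be any list of objects with flush() and clear() methods, and the response body is fully materialized so that errors raised while producing it also trigger clear(), at the cost of streaming.

```python
from typing import Any, Callable, List


class SessionMiddleware:
    """Hypothetical WSGI middleware: flush sessions on success, clear on error."""

    def __init__(self, app: Callable, sessions: List[Any]):
        self.app = app
        self.sessions = sessions

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
            try:
                body = list(result)        # run lazy bodies inside the try too
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            for session in self.sessions:
                session.clear()            # drop queued changes: a fake rollback
            raise
        for session in self.sessions:
            session.flush()                # commit the unit of work
        return body
```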

Slide 74

Slide 74 text

Wrapping Up
• MongoDB: Scalable document store
  • http://mongodb.org
• PyMongo: Python API mapping dicts to docs
  • http://api.mongodb.org/python/current/
• Ming: Schema validation and ODM
  • http://sf.net/p/merciless

Slide 75

Slide 75 text

Questions?
MongoDB Applied Design Patterns: coming out Real Soon Now
MongoDB with Python and Ming ebook: http://arborian.com/book
Need MongoDB or Python help?
Rick Copeland @rick446
http://arborian.com