Slide 1

Slide 1 text

A Crash Course in MongoDB
PyCon US 2013

Slide 2

Slide 2 text

hi. I’m Andy Dirnberger
Engineering @ CBS Local
@dirnonline
github.com/dirn

Slide 3

Slide 3 text

So what is MongoDB?
http://mongodb.org

Slide 4

Slide 4 text

MongoDB is...
‣ Document-oriented
‣ JSON-like (BSON)
‣ Dynamic schema*
‣ Scalable
‣ Open Source (GNU AGPL v3.0)**
*not the same thing as schemaless
**drivers use the Apache license

Slide 5

Slide 5 text

MongoDB can be used for...
‣ Metrics
‣ Logging*
‣ Message Queues
‣ Blog
‣ Content Management
‣ Anything you want
*Capped collections behave as fixed-size FIFO queues
*TTL collections have a special index that will automatically remove old data
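The FIFO behavior of a capped collection can be pictured with a bounded deque. This is only an in-memory analogy built on Python's standard library, not MongoDB code, and the size of 3 is an arbitrary assumption.

```python
from collections import deque

# Analogy only: a bounded deque drops its oldest entry once it is full,
# much as a capped collection overwrites its oldest documents.
log = deque(maxlen=3)  # maxlen=3 is an arbitrary size for illustration

for n in range(1, 6):
    log.append({'event': n})

# Only the three most recent events are still present.
remaining = [entry['event'] for entry in log]
print(remaining)
```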

Slide 6

Slide 6 text

To run MongoDB...
Download it: http://mongodb.org/downloads
or install it:
$ sudo apt-get install mongodb
$ brew install mongodb
Run it:
$ mongod
$ mongod --dbpath /var/lib/mongodb/
$ mongod --fork
http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/

Slide 7

Slide 7 text

Using MongoDB with Python: PyMongo
https://github.com/mongodb/mongo-python-driver

Slide 8

Slide 8 text

The driver...
Install it:
$ pip install pymongo
http://api.mongodb.org/python/current/
Packages: pymongo, bson, gridfs

Slide 9

Slide 9 text

BSON supports...
‣ int
‣ float
‣ basestring
‣ list
‣ dict
‣ datetime.datetime
http://bsonspec.org/

Slide 10

Slide 10 text

Object IDs are made of...
‣ 4-byte timestamp (50d4dce7)
‣ 3-byte machine identifier (0ea5fa)
‣ 2-byte process ID (e6fb)
‣ 3-byte counter (84e44b)
50d4dce70ea5fae6fb84e44b
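The layout above can be checked by slicing the example ID by hand. This is a minimal sketch in Python 3 that uses only the standard library; split_object_id is a hypothetical helper, not part of the bson package, and it assumes the four-field layout listed on this slide.

```python
from collections import namedtuple
from datetime import datetime, timezone

ObjectIdParts = namedtuple('ObjectIdParts', 'timestamp machine pid counter')

def split_object_id(hex_id):
    """Split a 24-character hex ObjectId into the four fields listed above."""
    if len(hex_id) != 24:
        raise ValueError('expected 24 hex characters')
    return ObjectIdParts(
        timestamp=int(hex_id[0:8], 16),   # 4 bytes: seconds since the Unix epoch
        machine=hex_id[8:14],             # 3 bytes: machine identifier
        pid=int(hex_id[14:18], 16),       # 2 bytes: process ID
        counter=int(hex_id[18:24], 16),   # 3 bytes: counter
    )

parts = split_object_id('50d4dce70ea5fae6fb84e44b')

# The leading timestamp records when the ID was generated.
created = datetime.fromtimestamp(parts.timestamp, tz=timezone.utc)
print(parts)
print(created.isoformat())
```

Because the timestamp comes first, IDs sort roughly by creation time.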

Slide 11

Slide 11 text

Connect with MongoClient
>>> from pymongo import MongoClient
>>>
>>> MongoClient(host='localhost', port=27017)
MongoClient('localhost', 27017)
>>>
>>> MongoClient(host='mongodb://localhost:27017')
MongoClient('localhost', 27017)
>>>
>>> MongoClient('mongodb://localhost:27017').pycon
Database(MongoClient('localhost', 27017), u'pycon')

Slide 12

Slide 12 text

Querying

Slide 13

Slide 13 text

Documents can be retrieved with...
>>> coll = db.talks
>>> coll.find_one({
...     'name': 'A Crash Course in MongoDB'})
{
    u'track': 2,
    u'_id': ObjectId('5145e5380ea5fa321fa97064'),
    u'speaker': u'Andy Dirnberger',
    u'name': u'A Crash Course in MongoDB',
    u'language': u'python',
    u'time': datetime.datetime(2013, 3, 17, 14, 30)
}

Slide 14

Slide 14 text

Documents can be retrieved with...
>>> from datetime import datetime
>>> cursor = coll.find(
...     {'track': 2,
...      'time': {'$gte': datetime(2013, 3, 17),
...               '$lt': datetime(2013, 3, 18)}},
...     {'name': 1})
http://docs.mongodb.org/manual/reference/operators/#query-selectors
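What the filter and projection in that find() call mean can be sketched in plain Python 3. This is an illustration only: matches is a hypothetical helper that handles just equality, $gte and $lt, the second talk is a made-up document, and none of this is how the server actually evaluates queries.

```python
from datetime import datetime
import operator

# Only the two comparison operators used above; real MongoDB supports many more.
OPERATORS = {'$gte': operator.ge, '$lt': operator.lt}

def matches(doc, query):
    """Return True if doc satisfies every condition in query (hypothetical helper)."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form: every operator must hold for the field's value.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain form: the field must equal the given value.
            return False
    return True

query = {'track': 2,
         'time': {'$gte': datetime(2013, 3, 17), '$lt': datetime(2013, 3, 18)}}

talks = [
    {'name': 'A Crash Course in MongoDB', 'track': 2,
     'time': datetime(2013, 3, 17, 14, 30)},
    {'name': 'Some other talk', 'track': 2,  # made-up document, a day earlier
     'time': datetime(2013, 3, 16, 14, 30)},
]

# The second argument to find(), {'name': 1}, is a projection: keep only name.
# (MongoDB also returns _id unless it is explicitly excluded.)
names = [{'name': talk['name']} for talk in talks if matches(talk, query)]
print(names)
```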

Slide 15

Slide 15 text

What’s in the cursor?
>>> for doc in cursor:
...     print doc
...
{u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
{u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
{u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}
http://api.mongodb.org/python/current/api/pymongo/cursor.html

Slide 16

Slide 16 text

Updating

Slide 17

Slide 17 text

Documents can be removed with...
>>> coll.remove({'language': 'ruby'})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Slide 18

Slide 18 text

Documents can be removed with...
>>> coll.remove({
...     'language': {'$in': ['php', 'node.js']}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Slide 19

Slide 19 text

Documents can be removed with...
>>> coll.remove({'language': {'$ne': 'python'}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Slide 20

Slide 20 text

Documents can be inserted with...
>>> db.tracks.insert({
...     'number': 2,
...     'room': 'Grand Ballroom CD'})
ObjectId('5145eb4e0ea5fa321fa97065')

Slide 21

Slide 21 text

Documents can be inserted with...
>>> db.sessions.update(
...     {'track': 2},
...     {'track': 2,
...      'date': datetime(2013, 3, 17),
...      'order': 1,
...      'chair': 'Megan Speir',
...      'runner': 'Erik Bray'},
...     upsert=True)
{
    ...
    u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
    u'n': 1,
    u'updatedExisting': False
}

Slide 22

Slide 22 text

A couple of other methods...
save(): Works like update(..., upsert=True) if _id is specified, insert() if it’s not
find_and_modify(): Modifies the document in the database; returns the original by default, the updated with new=True

Slide 23

Slide 23 text

A note about update()
>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'num_talks': 3})
{...}
>>>
>>> # The document has been replaced
>>> db.sessions.find_one({
...     '_id': ObjectId('5145ecfd3f69a773554253e8')})
{
    u'_id': ObjectId('5145ecfd3f69a773554253e8'),
    u'num_talks': 3
}

Slide 24

Slide 24 text

Using update operators to target specific fields...
>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'$set': {'num_talks': 3}})
{
    u'updatedExisting': True,
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 1
}
http://docs.mongodb.org/manual/reference/operators/#update
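The difference between a replacement update and a $set update can be mimicked with plain dicts. This is a rough Python 3 sketch: apply_update is a hypothetical helper that understands only whole-document replacement and $set, and the session document with _id 1 is made up for illustration.

```python
import copy

def apply_update(doc, update):
    """Mimic update(): hypothetical helper handling replacement and $set only."""
    if any(key.startswith('$') for key in update):
        # Operator update: change only the named fields.
        result = copy.deepcopy(doc)
        result.update(update.get('$set', {}))
        return result
    # Replacement update: everything except _id is swapped out.
    result = copy.deepcopy(update)
    result['_id'] = doc['_id']
    return result

session = {'_id': 1, 'track': 2, 'chair': 'Megan Speir'}  # made-up document

replaced = apply_update(session, {'num_talks': 3})
targeted = apply_update(session, {'$set': {'num_talks': 3}})
print(replaced)   # track and chair are gone
print(targeted)   # track and chair survive
```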

Slide 25

Slide 25 text

Write concern...
w: The number of servers that must acknowledge the write, including the primary
wtimeout: The timeout for the write; without it the write could block forever
http://docs.mongodb.org/manual/core/write-operations/#write-concern
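One place these two settings can live is the connection string, through the w and wtimeoutMS options. The sketch below (Python 3, standard library only) just builds such a string; uri_with_write_concern is a hypothetical helper, and the values 2 and 5000 are arbitrary assumptions.

```python
from urllib.parse import urlencode

def uri_with_write_concern(host, port, w, wtimeout_ms):
    """Build a MongoDB connection string carrying w and wtimeoutMS (hypothetical helper)."""
    options = urlencode({'w': w, 'wtimeoutMS': wtimeout_ms})
    return 'mongodb://{}:{}/?{}'.format(host, port, options)

# Wait for 2 servers to acknowledge each write, but give up after 5 seconds.
uri = uri_with_write_concern('localhost', 27017, w=2, wtimeout_ms=5000)
print(uri)
```

The resulting string can be passed to MongoClient in the same way as the mongodb:// URIs shown earlier.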

Slide 26

Slide 26 text

Write concern... is turned on by default in MongoClient

Slide 27

Slide 27 text

Indexes

Slide 28

Slide 28 text

You can create an index with...
create_index(): Unconditionally creates an index on one or more fields
ensure_index(): Works like create_index() except the driver will “remember” that the index was already made

Slide 29

Slide 29 text

Indexes...
Are directional
>>> db.sessions.ensure_index([
...     ('date', pymongo.ASCENDING),
...     ('order', pymongo.DESCENDING)])
u'date_1_order_-1'
Can be sparse
Only documents containing all fields in the index will be included in the index
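The ordering that a date ascending, order descending index describes can be pictured with an ordinary sort. This is only an in-memory Python 3 illustration with made-up session documents; it says nothing about how MongoDB stores the index.

```python
from datetime import datetime

sessions = [  # made-up documents
    {'date': datetime(2013, 3, 17), 'order': 1},
    {'date': datetime(2013, 3, 16), 'order': 2},
    {'date': datetime(2013, 3, 17), 'order': 2},
]

# date ascending first, then order descending within each date.
ordered = sorted(sessions, key=lambda s: (s['date'], -s['order']))

keys = [(s['date'].day, s['order']) for s in ordered]
print(keys)
```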

Slide 30

Slide 30 text

Explain plans...
{
    'cursor' : '',
    'n' : ,
    'nscanned': ,
    'scanAndOrder': ,
}
http://docs.mongodb.org/manual/reference/explain/
You want n and nscanned to be as close together as possible
If scanAndOrder is True, the index can’t be used for sorting

Slide 31

Slide 31 text

GridFS

Slide 32

Slide 32 text

Storing files with GridFS...
‣ Files are stored in chunks
‣ 4MB of RAM
‣ Replication and Sharding
http://docs.mongodb.org/manual/applications/gridfs/
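The idea of storing a file in chunks can be sketched without a database. In this Python 3 illustration split_into_chunks is a hypothetical helper, not the gridfs API, and the 4-byte chunk size is deliberately tiny so the result is easy to read.

```python
def split_into_chunks(data, chunk_size):
    """Split bytes into numbered chunks (hypothetical helper, not the gridfs API)."""
    return [
        {'n': index, 'data': data[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]

payload = b'PyCon 2013'
chunks = split_into_chunks(payload, chunk_size=4)  # tiny size, for illustration only
print(chunks)

# Reading the file back means joining the chunks in order of n.
restored = b''.join(c['data'] for c in sorted(chunks, key=lambda c: c['n']))
print(restored)
```

Because each chunk is its own small document, a reader never has to hold the whole file in memory at once.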

Slide 33

Slide 33 text

To use GridFS...
>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> file_id = fs.put('PyCon 2013',
...                  city='Santa Clara', state='CA')
>>> file = fs.get(file_id)
>>> file.read()
'PyCon 2013'
>>> file.upload_date
datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
>>> file.city, file.state
(u'Santa Clara', u'CA')

Slide 34

Slide 34 text

GridFS is versioned...
get_last_version(): Gets the most recent file matching the query
get_version(): Works like get_last_version() except it can request specific versions of a file

Slide 35

Slide 35 text

Geospatial

Slide 36

Slide 36 text

Create an index...
>>> # $set adds loc without replacing the rest of the document
>>> db.tracks.update(
...     {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
...     {'$set': {'loc': [37.3542, 121.9542]}})
{...}
>>> db.tracks.ensure_index([
...     ('loc', pymongo.GEO2D)])
u'loc_2d'
http://docs.mongodb.org/manual/applications/geospatial-indexes/

Slide 37

Slide 37 text

Query, query, query...
>>> db.tracks.find({'loc': [37.3542, 121.9542]})
>>> db.tracks.find({
...     'loc': {'$near': [37.3542, 121.9542]}})

Slide 38

Slide 38 text

You can query $within shapes...
‣ {'$center': [center, radius]}
‣ {'$box': [[x1, y1], [x2, y2]]}
‣ {'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
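What the first two shapes test can be sketched with flat, two-dimensional geometry. The Python 3 helpers below are hypothetical stand-ins for the server-side checks, and the centers, radius and corners are made-up values chosen around the loc used earlier.

```python
import math

def within_center(point, center, radius):
    """True if point lies inside the circle, as with {'$center': [center, radius]}."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius

def within_box(point, corner_a, corner_b):
    """True if point lies inside the rectangle, as with {'$box': [[x1, y1], [x2, y2]]}."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    return (min(x1, x2) <= point[0] <= max(x1, x2)
            and min(y1, y2) <= point[1] <= max(y1, y2))

loc = [37.3542, 121.9542]

in_circle = within_center(loc, [37.0, 122.0], 1.0)      # made-up circle
in_box = within_box(loc, [37.0, 121.0], [38.0, 122.0])  # made-up box around loc
far_box = within_box(loc, [0.0, 0.0], [1.0, 1.0])       # made-up box far away
print(in_circle, in_box, far_box)
```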

Slide 39

Slide 39 text

Anything else...
Aggregation Framework: Helps with simple map reduce queries, but is subject to the same 16MB limit as documents
Libraries: http://api.mongodb.org/python/current/tools.html

Slide 40

Slide 40 text

Thank you! dirn.it/PyCon2013 Questions?