A Crash Course in MongoDB

PyCon 2013
March 17, 2013

Transcript

  1. MongoDB is...
    ‣ Document-oriented
    ‣ JSON-like (BSON)
    ‣ Dynamic schema*
    ‣ Scalable
    ‣ Open Source (GNU AGPL v3.0)**
    *not the same thing as schemaless
    **drivers use the Apache license
  2. MongoDB can be used for...
    ‣ Metrics
    ‣ Logging*
    ‣ Messaging Queues
    ‣ Blog
    ‣ Content Management
    ‣ Anything you want
    *Capped collections behave as fixed-sized FIFO queues
    *TTL collections have a special index that will automatically remove old data
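The FIFO behavior of a capped collection noted on slide 2 can be pictured with a bounded deque from Python's standard library. This is only an analogy, not MongoDB code: real capped collections are bounded by size in bytes (and optionally by document count), and the `max_docs` limit and the log messages below are made up for illustration.

```python
from collections import deque

# Hypothetical limit: keep only the three most recent log entries.
max_docs = 3
capped = deque(maxlen=max_docs)

for n in range(1, 6):
    # Appending to a full deque silently drops the oldest entry,
    # much as a full capped collection overwrites its oldest document.
    capped.append({"msg": "event %d" % n})

remaining = [doc["msg"] for doc in capped]
print(remaining)
```

Insertion order is preserved, so reading the deque front to back yields the surviving entries oldest first.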
  3. To run MongoDB...
    Download it: http://mongodb.org/downloads
    or install it:
    $ sudo apt-get install mongodb
    $ brew install mongodb
    Run it:
    $ mongod
    $ mongod --dbpath /var/lib/mongodb/
    $ mongod --fork
    http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/
  4. BSON supports...
    ‣ int
    ‣ float
    ‣ basestring
    ‣ list
    ‣ dict
    ‣ datetime.datetime
    http://bsonspec.org/
  5. Object IDs are made of...
    ‣ 4-byte timestamp (50d4dce7)
    ‣ 3-byte machine identifier (0ea5fa)
    ‣ 2-byte process ID (e6fb)
    ‣ 3-byte counter (84e44b)
    50d4dce70ea5fae6fb84e44b
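The byte layout on slide 5 can be checked by slicing the example ID by hand. The sketch below is Python 3 and uses only the standard library; `split_object_id` is a hypothetical helper, not part of PyMongo, and it assumes the 4/3/2/3 layout exactly as the slide describes it. The machine and process fields are left as hex because the slide does not say how they should be interpreted.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character hex ObjectId into the four fields from the slide."""
    if len(hex_id) != 24:
        raise ValueError("expected 24 hex characters")
    raw = bytes.fromhex(hex_id)
    # The timestamp is seconds since the Unix epoch, stored big-endian.
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),
        "pid": raw[7:9].hex(),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = split_object_id("50d4dce70ea5fae6fb84e44b")
print(parts)
```

Because the timestamp leads the ID, sorting by `_id` roughly sorts documents by creation time.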
  6. Connect with MongoClient
    >>> from pymongo import MongoClient
    >>>
    >>> MongoClient(host='localhost', port=27017)
    MongoClient('localhost', 27017)
    >>>
    >>> MongoClient(host='mongodb://localhost:27017')
    MongoClient('localhost', 27017)
    >>>
    >>> MongoClient('mongodb://localhost:27017').pycon
    Database(MongoClient('localhost', 27017), u'pycon')
  7. Documents can be retrieved with...
    >>> coll = db.talks
    >>> coll.find_one({'name': 'A Crash Course in MongoDB'})
    {
      u'track': 2,
      u'_id': ObjectId('5145e5380ea5fa321fa97064'),
      u'speaker': u'Andy Dirnberger',
      u'name': u'A Crash Course in MongoDB',
      u'language': u'python',
      u'time': datetime.datetime(2013, 3, 17, 14, 30)
    }
  8. Documents can be retrieved with...
    >>> coll.find({
          'track': 2,
          'time': {'$gte': datetime(2013, 3, 17),
                   '$lt': datetime(2013, 3, 18)}},
          {'name': 1})
    <pymongo.cursor.Cursor object at 0x10da4ed90>
    http://docs.mongodb.org/manual/reference/operators/#query-selectors
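The filter on slide 8 is an ordinary dictionary, so it can be built and inspected without a server. The sketch below shows one way to generate the same half-open date range for any day; `talks_on_day` is a hypothetical helper introduced here, and only the `$gte` and `$lt` operators and the field names come from the slide.

```python
from datetime import datetime, timedelta


def talks_on_day(track, day):
    """Build a filter for one track's talks on a single calendar day."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    # $gte/$lt give a half-open range [start, end), so midnight of the
    # following day is excluded.
    return {"track": track, "time": {"$gte": start, "$lt": end}}


query = talks_on_day(2, datetime(2013, 3, 17))
# Second argument to find(): return only `name` (plus `_id`, which is
# included by default).
projection = {"name": 1}
print(query)
# With a live collection this would be passed as coll.find(query, projection).
```

Using `$lt` on the next day's midnight avoids having to guess the last representable instant of the day.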
  9. What’s in the cursor?
    >>> for doc in cursor:
    ...     print doc
    ...
    {u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
    {u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
    {u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}
    http://api.mongodb.org/python/current/api/pymongo/cursor.html
  10. Documents can be removed with...
    >>> coll.remove({'language': {'$in': ['php', 'node.js']}})
    {
      u'connectionId': 8,
      u'ok': 1.0,
      u'err': None,
      u'n': 0
    }
  11. Documents can be removed with...
    >>> coll.remove({'language': {'$ne': 'python'}})
    {
      u'connectionId': 8,
      u'ok': 1.0,
      u'err': None,
      u'n': 0
    }
  12. Documents can be inserted with...
    >>> db.tracks.insert({'number': 2, 'room': 'Grand Ballroom CD'})
    ObjectId('5145eb4e0ea5fa321fa97065')
  13. Documents can be inserted with...
    >>> db.sessions.update(
          {'track': 2},
          {'track': 2,
           'date': datetime(2013, 3, 17),
           'order': 1,
           'chair': 'Megan Speir',
           'runner': 'Erik Bray'},
          upsert=True)
    {
      ...
      u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
      u'n': 1,
      u'updatedExisting': False
    }
  14. A couple of other methods...
    save(): Works like update(..., upsert=True) if _id is specified, insert() if it’s not
    find_and_modify(): Modifies the document in the database, returns the original by default, the updated with new=True
  15. A note about update()
    >>> db.sessions.update(
          {'_id': ObjectId('5145ecfd3f69a773554253e8')},
          {'num_talks': 3})
    {...}
    >>>
    >>> # The document has been replaced
    >>> db.sessions.find_one({'_id': ObjectId('5145ecfd3f69a773554253e8')})
    {
      u'_id': ObjectId('5145ecfd3f69a773554253e8'),
      u'num_talks': 3
    }
  16. Using update operators to target specific fields...
    >>> db.sessions.update(
          {'_id': ObjectId('5145ecfd3f69a773554253e8')},
          {'$set': {'num_talks': 3}})
    {
      u'updatedExisting': True,
      u'connectionId': 8,
      u'ok': 1.0,
      u'err': None,
      u'n': 1
    }
    http://docs.mongodb.org/manual/reference/operators/#update
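The difference between slides 15 and 16 is easy to miss, so here is a toy in-memory model of the two behaviors. It is not PyMongo code and does not talk to a server: `apply_update` is a hypothetical function that handles only top-level `$set`, and the integer `_id` is a stand-in for a real ObjectId.

```python
import copy


def apply_update(doc, update):
    """Toy model of the two update styles on the slides (not pymongo code)."""
    if any(key.startswith("$") for key in update):
        # Operator update: only top-level $set is modeled here.
        unsupported = set(update) - {"$set"}
        if unsupported:
            raise NotImplementedError("only $set is modeled: %r" % unsupported)
        result = copy.deepcopy(doc)
        result.update(update["$set"])
        return result
    # Plain document: every field except _id is replaced.
    result = copy.deepcopy(update)
    result["_id"] = doc["_id"]
    return result


session = {"_id": 1, "track": 2, "order": 1, "chair": "Megan Speir"}
replaced = apply_update(session, {"num_talks": 3})
targeted = apply_update(session, {"$set": {"num_talks": 3}})
print(replaced)   # only _id and num_talks survive
print(targeted)   # original fields kept, num_talks added
```

The replaced document loses `track`, `order`, and `chair`, which is the surprise slide 15 warns about.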
  17. Write concern...
    w: The number of servers that must acknowledge the write, including the primary
    wtimeout: The timeout for the write, without it the write could block forever
    http://docs.mongodb.org/manual/core/write-operations/#write-concern
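One place these two settings can be expressed is the connection string, where the standard option names are `w` and `wtimeoutMS`. The sketch below (Python 3, standard library only) just assembles such a URI; `uri_with_write_concern` is a hypothetical helper, and the values 2 and 5000 are arbitrary examples rather than recommendations.

```python
from urllib.parse import urlencode


def uri_with_write_concern(host, port, w, wtimeout_ms):
    """Build a MongoDB URI that carries write-concern options."""
    # w: how many servers must acknowledge each write.
    # wtimeoutMS: how long to wait for those acknowledgements.
    options = urlencode({"w": w, "wtimeoutMS": wtimeout_ms})
    return "mongodb://%s:%d/?%s" % (host, port, options)


uri = uri_with_write_concern("localhost", 27017, w=2, wtimeout_ms=5000)
print(uri)
```

The resulting string can be passed to `MongoClient` in the same way as the URI on slide 6.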
  18. You can create an index with...
    create_index(): Unconditionally creates an index on one or more fields
    ensure_index(): Works like create_index() except the driver will “remember” that the index was already made
  19. Indexes...
    Are directional
    >>> db.sessions.ensure_index([
          ('date', pymongo.ASCENDING),
          ('order', pymongo.DESCENDING)])
    u'date_1_order_-1'
    Can be sparse
    Only documents containing all fields in the index will be included in the index
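The name returned on slide 19, `date_1_order_-1`, encodes each field together with its direction. The sketch below reproduces that naming pattern so the result is easier to read; `index_name` is a hypothetical helper, and the constants are redefined locally with the values PyMongo uses so the example runs without the driver installed.

```python
# Local copies of the PyMongo constants, so no driver import is needed.
ASCENDING = 1
DESCENDING = -1
GEO2D = "2d"


def index_name(keys):
    """Join (field, direction) pairs the way the names on the slides are formed."""
    return "_".join("%s_%s" % (field, direction) for field, direction in keys)


compound = index_name([("date", ASCENDING), ("order", DESCENDING)])
geo = index_name([("loc", GEO2D)])
print(compound)
print(geo)
```

Key order matters: swapping the two pairs gives a different name and describes a different index.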
  20. Explain plans...
    {
      'cursor' : '<Cursor Type and Index>',
      'n' : <num (documents matching query)>,
      'nscanned': <num (documents scanned)>,
      'scanAndOrder': <boolean>,
    }
    http://docs.mongodb.org/manual/reference/explain/
    You want n and nscanned to be as close together as possible
    If scanAndOrder is True, the index can’t be used for sorting
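The two rules of thumb on slide 20 can be turned into a small check over an explain result. This is a sketch under assumptions: `review_explain` is a hypothetical helper, it relies only on the field names shown on the slide, and both sample plans are invented rather than captured from a real server.

```python
def review_explain(plan):
    """Return warnings for an explain() result shaped like the one on the slide."""
    warnings = []
    n, nscanned = plan["n"], plan["nscanned"]
    if nscanned > n:
        # Many more documents scanned than returned suggests a missing index.
        warnings.append("scanned %d documents to return %d" % (nscanned, n))
    if plan.get("scanAndOrder"):
        warnings.append("results were sorted in memory, not by the index")
    return warnings


# Invented sample plans for illustration only.
slow_plan = {"cursor": "BasicCursor", "n": 3, "nscanned": 1000, "scanAndOrder": True}
good_plan = {"cursor": "BtreeCursor date_1_order_-1", "n": 3, "nscanned": 3, "scanAndOrder": False}

slow_warnings = review_explain(slow_plan)
good_warnings = review_explain(good_plan)
print(slow_warnings)
print(good_warnings)
```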
  21. Storing files with GridFS...
    ‣ Files are stored in chunks
    ‣ 4MB of RAM
    ‣ Replication and Sharding
    http://docs.mongodb.org/manual/applications/gridfs/
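To make "files are stored in chunks" concrete, the sketch below splits a byte string into numbered pieces and joins them back together. It is a toy model rather than the GridFS implementation: `split_into_chunks` is hypothetical, the chunk size of 4 bytes is chosen only to keep the output small, and real chunk documents carry more fields than the `n` and `data` shown here.

```python
def split_into_chunks(data, chunk_size):
    """Split bytes into ordered chunk documents with a sequence number."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]


chunks = split_into_chunks(b"PyCon 2013", chunk_size=4)
# Reading a file back means concatenating its chunks in order of n.
restored = b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
print(chunks)
print(restored)
```

Because each chunk is an ordinary document, a reader can fetch a slice of a large file without loading all of it.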
  22. To use GridFS...
    >>> import gridfs
    >>> fs = gridfs.GridFS(db)
    >>> file_id = fs.put('PyCon 2013', city='Santa Clara', state='CA')
    >>> file = fs.get(file_id)
    >>> file.read()
    'PyCon 2013'
    >>> file.upload_date
    datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
    >>> file.city, file.state
    (u'Santa Clara', u'CA')
  23. GridFS is versioned...
    get_last_version(): Gets the most recent file matching the query
    get_version(): Works like get_last_version() except it can request specific versions of a file
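PyMongo numbers GridFS versions by upload order: version 0 is the first upload of a file, 1 the second, and -1 the most recent. The toy model below mimics that numbering with a plain list; `pick_version` is a hypothetical stand-in for `get_version()`, and the upload dates and contents are invented.

```python
from datetime import datetime


def pick_version(uploads, version=-1):
    """Pick one upload of a file by version number (toy model, not GridFS).

    uploads is a list of (upload_date, content) pairs for a single filename.
    """
    ordered = sorted(uploads, key=lambda item: item[0])
    # List indexing matches the numbering: 0 is oldest, -1 is newest.
    return ordered[version][1]


# Invented upload history for one file.
uploads = [
    (datetime(2013, 3, 17, 9, 0), "draft"),
    (datetime(2013, 3, 15, 9, 0), "outline"),
    (datetime(2013, 3, 17, 14, 0), "final"),
]

latest = pick_version(uploads)
first = pick_version(uploads, version=0)
previous = pick_version(uploads, version=-2)
print(latest, first, previous)
```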
  24. Create an index...
    >>> db.tracks.update(
          {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
          {'loc': [37.3542, 121.9542]})
    {...}
    >>> db.tracks.ensure_index([('loc', pymongo.GEO2D)])
    u'loc_2d'
    http://docs.mongodb.org/manual/applications/geospatial-indexes/
  25. Query, query, query...
    >>> db.tracks.find({'loc': [37.3542, 121.9542]})
    <pymongo.cursor.Cursor object at 0x10e14eb90>
    >>> db.tracks.find({'loc': {'$near': [37.3542, 121.9542]}})
    <pymongo.cursor.Cursor object at 0x10e14edd0>
  26. You can query $within shapes...
    ‣ {'$center': [center, radius]}
    ‣ {'$box': [[x1, y1], [x2, y2]]}
    ‣ {'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
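The shapes on slide 26 are the inner part of a filter; the sketch below wraps them into complete query documents for the `loc` field used earlier. `within` is a hypothetical helper, the radius and box corners are made-up values, and the `$within` operator is used as it appears on the slide (MongoDB 2.4 and later name it `$geoWithin`).

```python
SHAPES = ("$center", "$box", "$polygon")


def within(field, shape, coords):
    """Build a filter matching documents whose field lies inside a shape."""
    if shape not in SHAPES:
        raise ValueError("unknown shape: %r" % shape)
    return {field: {"$within": {shape: coords}}}


# The center is the point from the slides; radius and corners are invented.
circle = within("loc", "$center", [[37.3542, 121.9542], 0.5])
box = within("loc", "$box", [[37.0, 121.0], [38.0, 122.0]])
print(circle)
print(box)
# With a live collection: db.tracks.find(circle)
```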
  27. Anything else...
    Aggregation Framework: Helps with simple map reduce queries, but is subject to the same 16MB limit as documents
    Libraries: http://api.mongodb.org/python/current/tools.html