MongoEngine: Lessons learnt building an ODM

rozza
October 14, 2012

My PyCon Ireland 2012 talk about lessons learnt maintaining MongoEngine and working with the community.

Transcript

  1. WHAT IS MONGODB?

    A document database
    Highly scalable
    Developer friendly
    http://mongodb.org

    In BSON:

        {
          _id : ObjectId("..."),
          author : "Ross",
          date : ISODate("2012-07-05..."),
          text : "About MongoDB...",
          tags : [ "tech", "databases" ],
          comments : [{
            author : "Tim",
            date : ISODate("2012-07-05..."),
            text : "Best Post Ever!"
          }],
          comment_count : 1
        }
  2. WHAT IS MONGODB?

    In Python:

        {
          "_id" : ObjectId("..."),
          "author" : "Ross",
          "date" : datetime(2012,7,5,10,0),
          "text" : "About MongoDB...",
          "tags" : ["tech", "databases"],
          "comments" : [{
            "author" : "Tim",
            "date" : datetime(2012,7,5,11,35),
            "text" : "Best Post Ever!"
          }],
          "comment_count" : 1
        }

    In BSON:

        {
          _id : ObjectId("..."),
          author : "Ross",
          date : ISODate("2012-07-05..."),
          text : "About MongoDB...",
          tags : [ "tech", "databases" ],
          comments : [{
            author : "Tim",
            date : ISODate("2012-07-05..."),
            text : "Best Post Ever!"
          }],
          comment_count : 1
        }
  3. MongoDB a good fit

    Documents schema in code
    Enforces schema
    Data validation
    Speeds up development
    Build tooling off it
    Can DRY up code...

    SCHEMA LESS != CHAOS
  4. Inspired by Django's ORM

    Supports Python 2.5 - Python 3.3
    Originally authored by Harry Marr in 2010
    I took over development in May 2011
    Current release 0.7.5
    http://github.com/MongoEngine/mongoengine
  5. INTRODUCING MONGOENGINE

        from mongoengine import (Document, EmbeddedDocument, StringField,
                                 ReferenceField, ListField,
                                 EmbeddedDocumentField)

        class Post(Document):
            title = StringField(max_length=120, required=True)
            author = ReferenceField('User')
            tags = ListField(StringField(max_length=30))
            comments = ListField(EmbeddedDocumentField('Comment'))

        class Comment(EmbeddedDocument):
            content = StringField()
            name = StringField(max_length=120)

        class User(Document):
            email = StringField(required=True)
            first_name = StringField(max_length=50)
            last_name = StringField(max_length=50)
  6. CREATING A MODEL

        class Post(Document):
            title = StringField(max_length=120, required=True)
            author = ReferenceField('User')
            tags = ListField(StringField(max_length=30))
            comments = ListField(EmbeddedDocumentField('Comment'))

    Define a class inheriting from Document
    Map each field to a defined data type: strings, ints, binary, files, lists etc.
    By default, declared fields are not required
    Pass keyword arguments to apply constraints, e.g. unique, max_length, default values
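To make the constraint keywords above concrete, here is a minimal pure-Python sketch of how a field might check `required` and `max_length`. It is an illustration only, not MongoEngine's implementation: this `StringField`, `ValidationError` and `validate_document` are hypothetical stand-ins, and the `body` field is invented for the example.

```python
class ValidationError(Exception):
    """Raised when a value breaks a field constraint (hypothetical)."""


class StringField(object):
    """Hypothetical stand-in for a string field with two constraints."""

    def __init__(self, max_length=None, required=False, default=None):
        self.max_length = max_length
        self.required = required  # fields are optional unless told otherwise
        self.default = default

    def validate(self, name, value):
        if value is None:
            if self.required:
                raise ValidationError("%s is required" % name)
            return
        if not isinstance(value, str):
            raise ValidationError("%s must be a string" % name)
        if self.max_length is not None and len(value) > self.max_length:
            raise ValidationError("%s is longer than %d" % (name, self.max_length))


def validate_document(fields, data):
    """Check a dict of values against a dict of field definitions."""
    for name, field in fields.items():
        field.validate(name, data.get(name, field.default))


post_fields = {
    "title": StringField(max_length=120, required=True),
    "body": StringField(),  # invented field, not required by default
}

validate_document(post_fields, {"title": "mongoengine"})  # passes silently

try:
    validate_document(post_fields, {"body": "no title"})
except ValidationError as exc:
    error = str(exc)
```

Keeping the constraints on the field object is what lets the schema live in code: the same definitions can drive validation, forms or other tooling.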
  7. INSERTING DATA

        # Pass data into the constructor
        user = User(email="[email protected]", first_name="Ross").save()

        # Create instance and edit / update in place
        post = Post()
        post.title = "mongoengine"
        post.author = user
        post.tags = ['odm', 'mongodb', 'python']
        post.save()

    Create an instance of the object
    Update its attributes
    Call save, insert or update to persist the data
  8. QUERYING DATA

        # An `objects` manager is added to every `Document` class
        users = User.objects(email='[email protected]')

        # Queries built from kwargs are lazy and can be extended as needed
        users.filter(auth=True)

        # Iterating evaluates the queryset
        print([u for u in users])

    Documents have a queryset manager (objects) for querying
    You can continually extend it
    The queryset is evaluated on iteration
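The lazy behaviour described above can be sketched without a database. The `LazyQuerySet` class below is hypothetical and filters a plain list of dicts; it only shows the idea that `filter` builds up conditions and nothing is scanned until iteration. The sample rows and the `country` and `active` keys are invented for the example.

```python
class LazyQuerySet(object):
    """Hypothetical lazy query: collects filters, runs them only on iteration."""

    def __init__(self, rows, filters=None):
        self._rows = rows
        self._filters = dict(filters or {})
        self.evaluations = 0  # how many times the data was actually scanned

    def filter(self, **kwargs):
        # Return a new queryset with the extra conditions; nothing runs yet.
        merged = dict(self._filters)
        merged.update(kwargs)
        return LazyQuerySet(self._rows, merged)

    def __iter__(self):
        # Only now is the data scanned against the collected filters.
        self.evaluations += 1
        for row in self._rows:
            if all(row.get(k) == v for k, v in self._filters.items()):
                yield row


rows = [
    {"name": "Ross", "country": "uk", "active": True},
    {"name": "Tim", "country": "uk", "active": False},
    {"name": "Ann", "country": "ie", "active": True},
]

users = LazyQuerySet(rows).filter(country="uk")
active = users.filter(active=True)        # still no scan
scans_before = active.evaluations
names = [u["name"] for u in active]       # iteration triggers the scan
```

Returning a new object from `filter` is a design choice in this sketch: it keeps `users` reusable after `active` has been derived from it.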
  9. In May 2011

    >200 forks
    >100 issues
    ~50 pull requests

    I needed it
    Volunteered to help
    Started reviewing issues
    Supported Harry and community

    PROJECT STALLED
  10. WHAT'S NEEDED TO MAKE AN ORM?

    Instance methods
        validate data
        manipulate data
        convert data to and from mongodb
    Queryset methods
        Finding data
        Bulk changes
  11. METACLASSES

        class Document(object):
            __metaclass__ = TopLevelDocumentMetaclass
            ...

        class EmbeddedDocument(object):
            __metaclass__ = DocumentMetaclass
            ...

    Needed to:
    1. inspect the object inheritance
    2. inject functionality into the class

    It's surprisingly simple - all we need is: __new__
  12. METACLASSES

    [Diagram: TopLevelDocument, Document, python's type; direction: IN]

    Creates default meta data: inheritance rules, id_field, index information, default ordering
    Merges in parents' meta
    Validation:
        abstract flag on an inherited class
        collection set on a subclass
    Manipulates the attrs going in
  13. METACLASSES

    [Diagram: TopLevelDocument, Document, python's type; direction: IN]

    Merges all fields from parents
    Adds in own field definitions
    Creates lookups:
        _db_field_map
        _reverse_db_field_map
    Determines superclasses (for model inheritance)
  14. METACLASSES

    [Diagram: TopLevelDocument, Document, python's type; direction: OUT]

    Adds in handling for delete rules
        So we can handle deleted References
    Adds class to the registry
        So we can load the data into the correct class
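The metaclass steps on the last few slides can be sketched in a few lines. This is a simplified illustration rather than MongoEngine's code: `Field`, `REGISTRY` and this `DocumentMetaclass` are hypothetical stand-ins, the `Admin` class and the `fn` database key are invented, meta handling and delete rules are left out, and it uses the Python 3 `metaclass=` syntax instead of the `__metaclass__` attribute shown on the slides.

```python
class Field(object):
    """Hypothetical field: only remembers the key it is stored under."""

    def __init__(self, db_field=None):
        self.db_field = db_field


REGISTRY = {}  # class name -> class, so loaded data can find its class


class DocumentMetaclass(type):
    def __new__(mcs, name, bases, attrs):
        fields = {}
        # IN: merge the fields already collected on the parents ...
        for base in bases:
            fields.update(getattr(base, "_fields", {}))
        # ... then add this class's own field definitions.
        for attr_name, value in attrs.items():
            if isinstance(value, Field):
                if value.db_field is None:
                    value.db_field = attr_name
                fields[attr_name] = value
        # Build the lookups between attribute names and stored keys.
        attrs["_fields"] = fields
        attrs["_db_field_map"] = dict(
            (k, f.db_field) for k, f in fields.items())
        attrs["_reverse_db_field_map"] = dict(
            (f.db_field, k) for k, f in fields.items())
        new_class = super(DocumentMetaclass, mcs).__new__(
            mcs, name, bases, attrs)
        # OUT: register the finished class.
        REGISTRY[name] = new_class
        return new_class


class Document(metaclass=DocumentMetaclass):
    pass


class User(Document):
    email = Field()
    first_name = Field(db_field="fn")  # stored under a shorter key


class Admin(User):
    level = Field()
```

Here `Admin._fields` ends up holding the inherited `email` and `first_name` fields plus its own `level`, and `REGISTRY["Admin"]` returns the class, which is the hook a loader would use to rebuild the right type.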
  15. LESSONS LEARNT

    Spend time learning what is being done and why
    Don't continually patch: I rewrote the metaclasses in 0.7
  16. LESSON 3: STRAYING FROM THE PATH

    http://www.flickr.com/photos/51838104@N02/5841690990
  17. REWRITING THE QUERY LANGUAGE

        # In pymongo you pass dictionaries to query
        uk_pages = db.page.find({"published": True})

        # In mongoengine
        uk_pages = Page.objects(published=True)

        # pymongo dot syntax to query sub documents
        uk_pages = db.page.find({"author.country": "uk"})

        # In mongoengine we use dunder instead
        uk_pages = Page.objects(author__country='uk')
  18. REWRITING THE QUERY LANGUAGE

        # Some things are nicer - regular expression search
        db.post.find({'title': re.compile('MongoDB', re.IGNORECASE)})
        Post.objects(title__icontains='MongoDB')  # In mongoengine

        # Chaining is nicer
        db.post.update({"published": False},
                       {"$set": {"published": True}}, multi=True)
        Post.objects(published=False).update(set__published=True)
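The mapping between the two query styles can be sketched as a small translation step. The helpers below, `to_mongo_query` and `to_mongo_update`, are hypothetical names and cover only the cases shown on these slides (plain equality, dotted sub-document paths, `icontains`, and `set__` updates); the real translation layer handles many more operators.

```python
import re


def to_mongo_query(**kwargs):
    """Hypothetical translation of dunder kwargs into a MongoDB query dict."""
    query = {}
    for key, value in kwargs.items():
        parts = key.split("__")
        if parts[-1] == "icontains":
            # Case-insensitive substring match becomes a compiled regex.
            field = ".".join(parts[:-1])
            query[field] = re.compile(re.escape(value), re.IGNORECASE)
        else:
            # Remaining double underscores address sub-documents with dots.
            query[".".join(parts)] = value
    return query


def to_mongo_update(**kwargs):
    """Turn op__field=value kwargs (e.g. set__published) into an update dict."""
    update = {}
    for key, value in kwargs.items():
        op, _, field = key.partition("__")
        update.setdefault("$" + op, {})[field.replace("__", ".")] = value
    return update


sub_document_query = to_mongo_query(author__country="uk")
title_query = to_mongo_query(title__icontains="MongoDB")
publish_update = to_mongo_update(set__published=True)
```

With these definitions `sub_document_query` is `{"author.country": "uk"}` and `publish_update` is `{"$set": {"published": True}}`, matching the pymongo calls on the slides.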
  19. LESSON 4: NOT ALL IDEAS ARE GOOD

    http://www.flickr.com/photos/abiding_silence/6951229015
  20. CHANGING SAVE

        # In pymongo, save replaces the whole document
        db.post.save({'_id': 'my_id', 'title': 'MongoDB', 'published': True})

        # In mongoengine we track changes
        post = Post.objects(id='my_id').first()
        post.published = True
        post.save()

        # Results in:
        db.post.update({'_id': 'my_id'}, {'$set': {'published': True}})
  21. CHANGING SAVE

    Any field changes are noted
    How to monitor lists and dicts?
        Custom List and Dict classes
        Observe changes and mark the field as dirty
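A rough sketch of the observing-list idea follows. `TrackedList` and this plain `Post` class are hypothetical and much smaller than the real custom list and dict classes: only `append`, item assignment and item deletion are intercepted here, whereas a complete version would need to wrap every mutating method.

```python
class TrackedList(list):
    """Hypothetical list that reports mutations to the object that owns it."""

    def __init__(self, items, owner, name):
        super(TrackedList, self).__init__(items)
        self._owner = owner
        self._name = name

    def _mark(self):
        self._owner.changed.add(self._name)

    def append(self, item):
        super(TrackedList, self).append(item)
        self._mark()

    def __setitem__(self, index, value):
        super(TrackedList, self).__setitem__(index, value)
        self._mark()

    def __delitem__(self, index):
        super(TrackedList, self).__delitem__(index)
        self._mark()


class Post(object):
    """Plain object standing in for a document; tracks dirty field names."""

    def __init__(self, title, tags):
        self.changed = set()
        self.title = title
        self.tags = TrackedList(tags, self, "tags")
        self.changed.clear()  # a freshly loaded document starts clean

    def __setattr__(self, name, value):
        # Direct attribute assignment is the easy case to notice.
        if name != "changed" and hasattr(self, "changed"):
            self.changed.add(name)
        object.__setattr__(self, name, value)


post = Post("mongoengine", ["odm"])
clean_at_start = set(post.changed)
post.tags.append("python")   # in-place mutation, no attribute assignment
after_append = set(post.changed)
post.title = "MongoEngine"
after_assignment = set(post.changed)
```

The point of the wrapper is the `append` case: without it, an in-place change to `post.tags` never passes through `__setattr__` and would go unnoticed.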
  22. HOW IT WORKS

        class Post(Document):
            title = StringField(max_length=120, required=True)
            author = ReferenceField('User')
            tags = ListField(StringField(max_length=30))
            comments = ListField(EmbeddedDocumentField('Comment'))

        class User(Document):
            email = StringField(required=True)
            first_name = StringField(max_length=50)
            last_name = StringField(max_length=50)

        class Comment(EmbeddedDocument):
            content = StringField()
            name = StringField(max_length=120)
  23. HOW IT WORKS

    [Diagram: Post -> comments -> comment, comment, comment]

        post = Post.objects.first()
        post.comments[1].name = 'Fred'
        post.save()
  24. HOW IT WORKS

    [Diagram: Post -> comments -> comment, comment, comment]

    1. Convert the comments data to a BaseList
       BaseList stores the instance and name / location

        post.comments[1].name = 'Fred'
  25. HOW IT WORKS

    [Diagram: Post -> comments -> comment, comment, comment]

    2. Convert the comment data to a BaseDict
       Sets name as: 'comments.1'

        post.comments[1].name = 'Fred'
  26. HOW IT WORKS

    [Diagram: Post -> comments -> comment, comment, comment]

    3. Change name to "Fred"
    4. Tell Post that 'comments.1.name' has changed

        post.comments[1].name = 'Fred'
  27. HOW IT WORKS

    [Diagram: Post -> comments -> comment, comment, comment]

    5. On save(), iterate all the changes on post and run $set / $unset queries

        post.save()

        db.post.update(
            {'_id': 'my_id'},
            {'$set': {'comments.1.name': 'Fred'}}
        )
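Steps 1 to 5 can be approximated in plain Python as follows. This is a sketch under simplifying assumptions, not the library's code: `Comment`, `Post`, `mark_changed` and `delta` are hypothetical, each item is told its owner and dotted location directly instead of through BaseList and BaseDict wrappers, only `$set` is produced (no `$unset`), and the second comment's data is invented.

```python
class Comment(object):
    """Hypothetical embedded document that knows where it lives in its parent."""

    def __init__(self, name, content):
        object.__setattr__(self, "_owner", None)
        object.__setattr__(self, "_path", None)
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "content", content)

    def __setattr__(self, attr, value):
        object.__setattr__(self, attr, value)
        if self._owner is not None:
            # Step 4: report the full dotted path to the top-level document.
            self._owner.mark_changed("%s.%s" % (self._path, attr))


class Post(object):
    def __init__(self, _id, comments):
        self._id = _id
        self._changed = []
        self.comments = list(comments)
        # Steps 1-2: give each item its owner and location, e.g. 'comments.1'.
        for index, comment in enumerate(self.comments):
            object.__setattr__(comment, "_owner", self)
            object.__setattr__(comment, "_path", "comments.%d" % index)

    def mark_changed(self, path):
        if path not in self._changed:
            self._changed.append(path)

    def delta(self):
        """Step 5: build the (query, update) pair that save() would send."""
        sets = {}
        for path in self._changed:
            value = self
            # Walk the dotted path: digits index lists, names read attributes.
            for part in path.split("."):
                if part.isdigit():
                    value = value[int(part)]
                else:
                    value = getattr(value, part)
            sets[path] = value
        return {"_id": self._id}, {"$set": sets}


post = Post("my_id", [Comment("Tim", "Best Post Ever!"),
                      Comment("Bob", "Agreed")])
post.comments[1].name = "Fred"   # step 3
query, update = post.delta()
```

Here `query` is `{"_id": "my_id"}` and `update` is `{"$set": {"comments.1.name": "Fred"}}`, the same shape as the update on the slide; only the changed path is sent, not the whole document.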
  28. A GOOD IDEA?

    + Makes it easier to use
    + save acts how people think it should
    - It's expensive
    - Doesn't help people understand MongoDB
  29. LESSON 5: MANAGING A COMMUNITY

    http://kingscross.the-hub.net/
  30. Github effect

    >10 django mongoengine projects
    None active on pypi
    Little cross-project communication

    CODERS JUST WANT TO CODE*

    * Side effect of being stalled?
  31. Flask-mongoengine on pypi

    There were 2 different projects
    Now has extra maintainers from flask-mongorest

    Django-mongoengine*
    Spoke to the authors of 7 projects and merged their work together to form a single library

    * Coming soon!

    REACH OUT
  32. THE COMMUNITY

    Not all ideas are good!
    Vocal people don't always have great ideas
    Travis is great*
    * but you still have to read the pull request
    Communities have to be managed
    I've only just started to learn how to herd cats
  33. LESSON 6: DON'T BE AFRAID TO ASK

    http://www.flickr.com/photos/kandyjaxx/2012468692