Christine Spang - To ORM or not to ORM

Christine Spang - To ORM or not to ORM

Database ORMs make it really convenient to pythonically query a database, but it's difficult to decide when to use them and when not to---and what the alternatives are. In this talk you'll learn strategies for deciding when and where to use an ORM, when to be cautious, and how to tell that you're doing the right thing, drawn from real-world lessons learned building the Inbox email platform.

https://us.pycon.org/2015/schedule/presentation/371/

D5710b3bca38f1233274b4cbc523dc4b?s=128

PyCon 2015

April 18, 2015
Tweet

Transcript

  1. To ORM or Not to ORM Christine Spang Cofounder, Nylas

    @spang
  2. 25k lines of Python developed over 2 years. Nylas Sync

    Engine
  3. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  4. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  5. Object-Relational Mapping Object in memory Object in database Two-way sync

  6. Interpreter function Object-Relational Mapping interpreter .py file executable byte code

    ORM Python code SQL
  7. the ORM that powers Nylas

  8. ORMs centralize the data model Example from SQLAlchemy from sqlalchemy.ext.declarative

    import declarative_base Base = declarative_base() class Contact(Base): __tablename__ = 'contact' id = Column(Integer, primary_key=True) name = Column(String(50)) email_address = Column(String(255))
  9. Generating SQL from an ORM Example from SQLAlchemy db_session =

    session_scope() c = Contact(name='Ben Bitdiddle', email_address='benbitdiddle@nylas.com') db_session.add(c) db_session.commit() contact_id = c.id INSERT INTO contact (name, email_address) VALUES ('Ben Bitdiddle', 'benbitdiddle@nylas.com') # later on c = db_session.query(Contact).get(contact_id) SELECT * FROM contact WHERE id = 1567;
  10. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  11. None
  12. l Should have used an ORM! • Built for SQLite,

    migrated to MySQL • Manual SQL querying without an ORM
  13. import MySQLdb db = MySQLdb.connect(passwd="example", db="billing") c = db.cursor() c.execute("SELECT

    id FROM users WHERE billing_enabled = 1") rows = c.fetchall() for col1, col2 in rows: # insert some rows in another table c.execute(“INSERT INTO invoices SET date={} amount={} line_items={} customer_id={}”.format(datetime.now(), [1534, 235555], col1)
  14. None
  15. Python in the data model Better interfaces and backwards compatibility

    class ImapAccount(Account): # private columns – never directly accessed _imap_server_host = Column(String(255), nullable=True) _imap_server_port = Column(Integer, nullable=False, server_default='993') @property def imap_endpoint(self): if self._imap_server_host is not None: return (self._imap_server_host, self._imap_server_port) else: return self.provider_info['imap']
  16. Python in the data model Polymorphism class Account(MailSyncBase, HasPublicID, HasEmailAddress):

    discriminator = Column('type', String(16)) __mapper_args__ = {'polymorphic_identity': 'account', 'polymorphic_on': discriminator} class ImapAccount(Account): id = Column(Integer, ForeignKey(Account.id, ondelete='CASCADE'), primary_key=True) __mapper_args__ = {'polymorphic_identity': 'imapaccount'} class ExchangeAccount(Account): id = Column(Integer, ForeignKey('account.id', ondelete='CASCADE'), primary_key=True) __mapper_args__ = {'polymorphic_identity': 'easaccount'}
  17. Python in the data model Abstracting many-to-many relationships class Thread(MailSyncBase,

    HasPublicID, HasRevisions): tags = association_proxy( 'tagitems', 'tag', creator=lambda tag: TagItem(tag=tag)) class TagItem(MailSyncBase): """Mapping between user tags and threads.""" thread_id = Column(Integer, ForeignKey(Thread.id, ondelete='CASCADE'), nullable=False) tag_id = Column(Integer, ForeignKey(Tag.id, ondelete='CASCADE'), nullable=False) tag = relationship(Tag, backref='tagitems', cascade='all, delete-orphan', lazy='dynamic'))
  18. Python in the data model Clean up sessions with the

    `with` statement from inbox.models.session import session_scope with session_scope() as mailsync_session: raise Exception("still cleans up session afterwards")
  19. Python in the data model Clean up sessions with the

    `with` statement from inbox.models.session import session_scope with session_scope() as mailsync_session: raise Exception("still cleans up session afterwards")
  20. Python in the data model Batch flushing changes for performance

    with session_scope() as mailsync_session: t = mailsync_session.query(Thread).get(1234) t.apply_tag(Tag('inbox')) t.messages.append(Message()) # generates SQL update / inserts based on batched changes mailsync_session.commit()
  21. When Not to Use ORMs

  22. A profiler will show you.

  23. None
  24. Database specific code • Data types • Performance optimizations •

    Configuration options Even with an ORM!
  25. Database specific code Typical conditionals from inbox.config import get_db_info if

    get_db_info()['engine'] == 'mysql': from inbox.sqlalchemy_ext.util import BigJSON, BLOB, Base36UID BIGJSON_TYPE = BigJSON BLOB_TYPE = BLOB BASE36_TYPE = Base36UID else: from sqlalchemy.dialects.postgresql import JSON, BYTEA BIGJSON_TYPE = JSON BLOB_TYPE = BYTEA BASE36_TYPE = BYTEA
  26. ORMs hide queries class Message(Base): __tablename__ = 'message' parts =

    relationship('Part', ...) @property def has_attached_events(self): return 'text/calendar' in \ [p.block.content_type for p in self.parts] m = db_session.query(Message).get(1) # issue lots of queries behind the scenes! if m.has_attached_events: pass
  27. Intuitively translating ORM to SQL Learn to customize SQL output

    from sqlalchemy import create_engine engine = create_engine( 'mysql://scott:tiger@localhost/foo', echo=True)
  28. Intuitively translating ORM to SQL imap_uid_entries = db_session.query(ImapUid)\ .options(load_only('msg_uid'), lazyload('folder'),

    joinedload('message').load_only('g_msgid') .lazyload('namespace'))\ .filter_by(account_id=self.account_id, folder_id=self.folder_id) # loading exactly the right amount of data In [5]: uid = db_session.query(ImapUid).options(load_only('msg_uid'), lazyload('folder'), joinedload('message').load_only('g_msgid').lazyload('namespace')).filter_by(account_id=1, folder_id=1).first() 2015-03-11 15:17:54,228 INFO sqlalchemy.engine.base.Engine SELECT imapuid.id AS imapuid_id, imapuid.msg_uid AS imapuid_msg_uid, imapuid.message_id AS imapuid_message_id, message_1.id AS message_1_id, message_1.g_msgid AS message_1_g_msgid FROM imapuid LEFT OUTER JOIN message AS message_1 ON message_1.id = imapuid.message_id WHERE imapuid.account_id = %s AND imapuid.folder_id = %s LIMIT %s 2015-03-11 15:17:54,229 INFO sqlalchemy.engine.base.Engine (1, 1, 1) In [6]: uid.message.g_msgid Out[6]: 1489348545793534853 How to customize SQL output
  29. None
  30. When Not to Use ORMs • In simple scripts •

    In migrations
  31. Don't use ORMs in migrations conn = op.get_bind() conn.execute(text(""" ALTER

    TABLE folder ADD COLUMN identifier VARCHAR(:len) NULL, ADD CONSTRAINT account_id UNIQUE (account_id, name, canonical_name, identifier)"""), len=767) # Set canonical tags: inbox, sent, drafts, trash, archive too conn.execute(text(""" UPDATE folder INNER JOIN account ON folder.id=account.inbox_folder_id SET folder.canonical_name='inbox' WHERE account.type='easaccount'"""))
  32. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  33. • For apps ➔ Use built-in Django ORM • For

    everything else ➔ Use Recommended options
  34. • ORMs solve a hard problem • Abstractions manage data

    model complexity • Learn to use SQL echo! What did we learn? Say hi! spang@nylas.com More code online: https://github.com/nylas/sync-engine More Q&A w/Nylas team @ 3 Brasseurs, 6pm! https://nylas.com/pycon