Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Christine Spang - To ORM or not to ORM

Christine Spang - To ORM or not to ORM

Database ORMs make it really convenient to pythonically query a database, but it's difficult to decide when to use them and when not to---and what the alternatives are. In this talk you'll learn strategies for deciding when and where to use an ORM, when to be cautious, and how to tell that you're doing the right thing, drawn from real-world lessons learned building the Inbox email platform.

https://us.pycon.org/2015/schedule/presentation/371/

PyCon 2015

April 18, 2015
Tweet

More Decks by PyCon 2015

Other Decks in Programming

Transcript

  1. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  2. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  3. ORMs centralize the data model Example from SQLAlchemy from sqlalchemy.ext.declarative

    import declarative_base Base = declarative_base() class Contact(Base): __tablename__ = 'contact' id = Column(Integer, primary_key=True) name = Column(String(50)) email_address = Column(String(255))
  4. Generating SQL from an ORM Example from SQLAlchemy db_session =

    session_scope() c = Contact(name='Ben Bitdiddle', email_address='[email protected]') db_session.add(c) db_session.commit() contact_id = c.id INSERT INTO contact (name, email_address) VALUES ('Ben Bitdiddle', '[email protected]') # later on c = db_session.query(Contact).get(contact_id) SELECT * FROM contact WHERE id = 1567;
  5. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  6. l Should have used an ORM! • Built for SQLite,

    migrated to MySQL • Manual SQL querying without an ORM
  7. import MySQLdb db = MySQLdb.connect(passwd="example", db="billing") c = db.cursor() c.execute("SELECT

    id FROM users WHERE billing_enabled = 1") rows = c.fetchall() for col1, col2 in rows: # insert some rows in another table c.execute(“INSERT INTO invoices SET date={} amount={} line_items={} customer_id={}”.format(datetime.now(), [1534, 235555], col1)
  8. Python in the data model Better interfaces and backwards compatibility

    class ImapAccount(Account): # private columns – never directly accessed _imap_server_host = Column(String(255), nullable=True) _imap_server_port = Column(Integer, nullable=False, server_default='993') @property def imap_endpoint(self): if self._imap_server_host is not None: return (self._imap_server_host, self._imap_server_port) else: return self.provider_info['imap']
  9. Python in the data model Polymorphism class Account(MailSyncBase, HasPublicID, HasEmailAddress):

    discriminator = Column('type', String(16)) __mapper_args__ = {'polymorphic_identity': 'account', 'polymorphic_on': discriminator} class ImapAccount(Account): id = Column(Integer, ForeignKey(Account.id, ondelete='CASCADE'), primary_key=True) __mapper_args__ = {'polymorphic_identity': 'imapaccount'} class ExchangeAccount(Account): id = Column(Integer, ForeignKey('account.id', ondelete='CASCADE'), primary_key=True) __mapper_args__ = {'polymorphic_identity': 'easaccount'}
  10. Python in the data model Abstracting many-to-many relationships class Thread(MailSyncBase,

    HasPublicID, HasRevisions): tags = association_proxy( 'tagitems', 'tag', creator=lambda tag: TagItem(tag=tag)) class TagItem(MailSyncBase): """Mapping between user tags and threads.""" thread_id = Column(Integer, ForeignKey(Thread.id, ondelete='CASCADE'), nullable=False) tag_id = Column(Integer, ForeignKey(Tag.id, ondelete='CASCADE'), nullable=False) tag = relationship(Tag, backref='tagitems', cascade='all, delete-orphan', lazy='dynamic'))
  11. Python in the data model Clean up sessions with the

    `with` statement from inbox.models.session import session_scope with session_scope() as mailsync_session: raise Exception("still cleans up session afterwards")
  12. Python in the data model Clean up sessions with the

    `with` statement from inbox.models.session import session_scope with session_scope() as mailsync_session: raise Exception("still cleans up session afterwards")
  13. Python in the data model Batch flushing changes for performance

    with session_scope() as mailsync_session: t = mailsync_session.query(Thread).get(1234) t.apply_tag(Tag('inbox')) t.messages.append(Message()) # generates SQL update / inserts based on batched changes mailsync_session.commit()
  14. Database specific code Typical conditionals from inbox.config import get_db_info if

    get_db_info()['engine'] == 'mysql': from inbox.sqlalchemy_ext.util import BigJSON, BLOB, Base36UID BIGJSON_TYPE = BigJSON BLOB_TYPE = BLOB BASE36_TYPE = Base36UID else: from sqlalchemy.dialects.postgresql import JSON, BYTEA BIGJSON_TYPE = JSON BLOB_TYPE = BYTEA BASE36_TYPE = BYTEA
  15. ORMs hide queries class Message(Base): __tablename__ = 'message' parts =

    relationship('Part', ...) @property def has_attached_events(self): return 'text/calendar' in \ [p.block.content_type for p in self.parts] m = db_session.query(Message).get(1) # issue lots of queries behind the scenes! if m.has_attached_events: pass
  16. Intuitively translating ORM to SQL Learn to customize SQL output

    from sqlalchemy import create_engine engine = create_engine( 'mysql://scott:tiger@localhost/foo', echo=True)
  17. Intuitively translating ORM to SQL imap_uid_entries = db_session.query(ImapUid)\ .options(load_only('msg_uid'), lazyload('folder'),

    joinedload('message').load_only('g_msgid') .lazyload('namespace'))\ .filter_by(account_id=self.account_id, folder_id=self.folder_id) # loading exactly the right amount of data In [5]: uid = db_session.query(ImapUid).options(load_only('msg_uid'), lazyload('folder'), joinedload('message').load_only('g_msgid').lazyload('namespace')).filter_by(account_id=1, folder_id=1).first() 2015-03-11 15:17:54,228 INFO sqlalchemy.engine.base.Engine SELECT imapuid.id AS imapuid_id, imapuid.msg_uid AS imapuid_msg_uid, imapuid.message_id AS imapuid_message_id, message_1.id AS message_1_id, message_1.g_msgid AS message_1_g_msgid FROM imapuid LEFT OUTER JOIN message AS message_1 ON message_1.id = imapuid.message_id WHERE imapuid.account_id = %s AND imapuid.folder_id = %s LIMIT %s 2015-03-11 15:17:54,229 INFO sqlalchemy.engine.base.Engine (1, 1, 1) In [6]: uid.message.g_msgid Out[6]: 1489348545793534853 How to customize SQL output
  18. Don't use ORMs in migrations conn = op.get_bind() conn.execute(text(""" ALTER

    TABLE folder ADD COLUMN identifier VARCHAR(:len) NULL, ADD CONSTRAINT account_id UNIQUE (account_id, name, canonical_name, identifier)"""), len=767) # Set canonical tags: inbox, sent, drafts, trash, archive too conn.execute(text(""" UPDATE folder INNER JOIN account ON folder.id=account.inbox_folder_id SET folder.canonical_name='inbox' WHERE account.type='easaccount'"""))
  19. This talk • What is an ORM? • Why should

    you use one? • Which one should you use? Lessons learned at Nylas
  20. • For apps ➔ Use built-in Django ORM • For

    everything else ➔ Use Recommended options
  21. • ORMs solve a hard problem • Abstractions manage data

    model complexity • Learn to use SQL echo! What did we learn? Say hi! [email protected] More code online: https://github.com/nylas/sync-engine More Q&A w/Nylas team @ 3 Brasseurs, 6pm! https://nylas.com/pycon