Slide 1

Slide 1 text

To ORM or Not to ORM Christine Spang Cofounder, Nylas @spang

Slide 2

Slide 2 text

25k lines of Python developed over 2 years. Nylas Sync Engine

Slide 3

Slide 3 text

This talk ● What is an ORM? ● Why should you use one? ● Which one should you use? Lessons learned at Nylas

Slide 4

Slide 4 text

This talk ● What is an ORM? ● Why should you use one? ● Which one should you use? Lessons learned at Nylas

Slide 5

Slide 5 text

Object-Relational Mapping Object in memory Object in database Two-way sync

Slide 6

Slide 6 text

Interpreter function Object-Relational Mapping interpreter .py file executable byte code ORM Python code SQL

Slide 7

Slide 7 text

the ORM that powers Nylas

Slide 8

Slide 8 text

ORMs centralize the data model Example from SQLAlchemy from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() class Contact(Base): __tablename__ = 'contact' id = Column(Integer, primary_key=True) name = Column(String(50)) email_address = Column(String(255))

Slide 9

Slide 9 text

Generating SQL from an ORM Example from SQLAlchemy db_session = session_scope() c = Contact(name='Ben Bitdiddle', email_address='[email protected]') db_session.add(c) db_session.commit() contact_id = c.id INSERT INTO contact (name, email_address) VALUES ('Ben Bitdiddle', '[email protected]') # later on c = db_session.query(Contact).get(contact_id) SELECT * FROM contact WHERE id = 1567;

Slide 10

Slide 10 text

This talk ● What is an ORM? ● Why should you use one? ● Which one should you use? Lessons learned at Nylas

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

l Should have used an ORM! ● Built for SQLite, migrated to MySQL ● Manual SQL querying without an ORM

Slide 13

Slide 13 text

import MySQLdb db = MySQLdb.connect(passwd="example", db="billing") c = db.cursor() c.execute("SELECT id FROM users WHERE billing_enabled = 1") rows = c.fetchall() for col1, col2 in rows: # insert some rows in another table c.execute(“INSERT INTO invoices SET date={} amount={} line_items={} customer_id={}”.format(datetime.now(), [1534, 235555], col1)

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Python in the data model Better interfaces and backwards compatibility class ImapAccount(Account): # private columns – never directly accessed _imap_server_host = Column(String(255), nullable=True) _imap_server_port = Column(Integer, nullable=False, server_default='993') @property def imap_endpoint(self): if self._imap_server_host is not None: return (self._imap_server_host, self._imap_server_port) else: return self.provider_info['imap']

Slide 16

Slide 16 text

Python in the data model Polymorphism class Account(MailSyncBase, HasPublicID, HasEmailAddress): discriminator = Column('type', String(16)) __mapper_args__ = {'polymorphic_identity': 'account', 'polymorphic_on': discriminator} class ImapAccount(Account): id = Column(Integer, ForeignKey(Account.id, ondelete='CASCADE'), primary_key=True) __mapper_args__ = {'polymorphic_identity': 'imapaccount'} class ExchangeAccount(Account): id = Column(Integer, ForeignKey('account.id', ondelete='CASCADE'), primary_key=True) __mapper_args__ = {'polymorphic_identity': 'easaccount'}

Slide 17

Slide 17 text

Python in the data model Abstracting many-to-many relationships class Thread(MailSyncBase, HasPublicID, HasRevisions): tags = association_proxy( 'tagitems', 'tag', creator=lambda tag: TagItem(tag=tag)) class TagItem(MailSyncBase): """Mapping between user tags and threads.""" thread_id = Column(Integer, ForeignKey(Thread.id, ondelete='CASCADE'), nullable=False) tag_id = Column(Integer, ForeignKey(Tag.id, ondelete='CASCADE'), nullable=False) tag = relationship(Tag, backref='tagitems', cascade='all, delete-orphan', lazy='dynamic'))

Slide 18

Slide 18 text

Python in the data model Clean up sessions with the `with` statement from inbox.models.session import session_scope with session_scope() as mailsync_session: raise Exception("still cleans up session afterwards")

Slide 19

Slide 19 text

Python in the data model Clean up sessions with the `with` statement from inbox.models.session import session_scope with session_scope() as mailsync_session: raise Exception("still cleans up session afterwards")

Slide 20

Slide 20 text

Python in the data model Batch flushing changes for performance with session_scope() as mailsync_session: t = mailsync_session.query(Thread).get(1234) t.apply_tag(Tag('inbox')) t.messages.append(Message()) # generates SQL update / inserts based on batched changes mailsync_session.commit()

Slide 21

Slide 21 text

When Not to Use ORMs

Slide 22

Slide 22 text

A profiler will show you.

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Database specific code ● Data types ● Performance optimizations ● Configuration options Even with an ORM!

Slide 25

Slide 25 text

Database specific code Typical conditionals from inbox.config import get_db_info if get_db_info()['engine'] == 'mysql': from inbox.sqlalchemy_ext.util import BigJSON, BLOB, Base36UID BIGJSON_TYPE = BigJSON BLOB_TYPE = BLOB BASE36_TYPE = Base36UID else: from sqlalchemy.dialects.postgresql import JSON, BYTEA BIGJSON_TYPE = JSON BLOB_TYPE = BYTEA BASE36_TYPE = BYTEA

Slide 26

Slide 26 text

ORMs hide queries class Message(Base): __tablename__ = 'message' parts = relationship('Part', ...) @property def has_attached_events(self): return 'text/calendar' in \ [p.block.content_type for p in self.parts] m = db_session.query(Message).get(1) # issue lots of queries behind the scenes! if m.has_attached_events: pass

Slide 27

Slide 27 text

Intuitively translating ORM to SQL Learn to customize SQL output from sqlalchemy import create_engine engine = create_engine( 'mysql://scott:tiger@localhost/foo', echo=True)

Slide 28

Slide 28 text

Intuitively translating ORM to SQL imap_uid_entries = db_session.query(ImapUid)\ .options(load_only('msg_uid'), lazyload('folder'), joinedload('message').load_only('g_msgid') .lazyload('namespace'))\ .filter_by(account_id=self.account_id, folder_id=self.folder_id) # loading exactly the right amount of data In [5]: uid = db_session.query(ImapUid).options(load_only('msg_uid'), lazyload('folder'), joinedload('message').load_only('g_msgid').lazyload('namespace')).filter_by(account_id=1, folder_id=1).first() 2015-03-11 15:17:54,228 INFO sqlalchemy.engine.base.Engine SELECT imapuid.id AS imapuid_id, imapuid.msg_uid AS imapuid_msg_uid, imapuid.message_id AS imapuid_message_id, message_1.id AS message_1_id, message_1.g_msgid AS message_1_g_msgid FROM imapuid LEFT OUTER JOIN message AS message_1 ON message_1.id = imapuid.message_id WHERE imapuid.account_id = %s AND imapuid.folder_id = %s LIMIT %s 2015-03-11 15:17:54,229 INFO sqlalchemy.engine.base.Engine (1, 1, 1) In [6]: uid.message.g_msgid Out[6]: 1489348545793534853 How to customize SQL output

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

When Not to Use ORMs ● In simple scripts ● In migrations

Slide 31

Slide 31 text

Don't use ORMs in migrations conn = op.get_bind() conn.execute(text(""" ALTER TABLE folder ADD COLUMN identifier VARCHAR(:len) NULL, ADD CONSTRAINT account_id UNIQUE (account_id, name, canonical_name, identifier)"""), len=767) # Set canonical tags: inbox, sent, drafts, trash, archive too conn.execute(text(""" UPDATE folder INNER JOIN account ON folder.id=account.inbox_folder_id SET folder.canonical_name='inbox' WHERE account.type='easaccount'"""))

Slide 32

Slide 32 text

This talk ● What is an ORM? ● Why should you use one? ● Which one should you use? Lessons learned at Nylas

Slide 33

Slide 33 text

● For apps ➔ Use built-in Django ORM ● For everything else ➔ Use Recommended options

Slide 34

Slide 34 text

● ORMs solve a hard problem ● Abstractions manage data model complexity ● Learn to use SQL echo! What did we learn? Say hi! [email protected] More code online: https://github.com/nylas/sync-engine More Q&A w/Nylas team @ 3 Brasseurs, 6pm! https://nylas.com/pycon