Description
The open-source Nylas Sync Engine provides a RESTful API on top of a powerful email sync platform, making it easy to build messaging into apps. It’s built using Python and gevent and has synced billions of messages over the lifetime of its deployment. In this talk, we’ll show you how it’s built and what technical challenges we’ve solved along the way.
Abstract
Why a sync engine?
If you’ve ever tried to build anything that works with email, you know it’s a problem full of twisty corners. The protocols themselves are obscure and need entire RFCs just to describe how to implement sync with them. If you want your integration to work with everyone’s email, you face implementing several different protocols or flavours of protocols (IMAP with CONDSTORE, IMAP without CONDSTORE, Gmail IMAP, Exchange Web Services, Exchange ActiveSync, Office 365 REST), plus OAuth authentication for different providers. And once you’ve gotten data flowing, you still need to parse email, which involves a complex format known as MIME as well as pretty much every way of encoding non-ASCII text as ASCII that has ever been invented.
We’ve built a platform that layers a sync engine over 30 years of email history and lets developers read and write to mailboxes and calendars using a modern REST API. It’s not just a simple proxy that makes calls to IMAP or Exchange behind the scenes. To meet the speed and reliability our customers require, when a user connects their email account to a developer’s app, our infrastructure syncs a copy of that mailbox and keeps it up to date as changes are made from that app or from traditional web, mobile, and desktop email clients. This is a demanding technical challenge, and it wasn’t easy to build.
How a sync engine?
A semi-monolithic application composed of several services that all share a common database and a fair amount of code, but run on separate server fleets (email sync, API frontend, webhooks, etc.)
~90k lines of Python, including tests and migrations
MySQL: one sharded database and one global database
Major libraries we use: Flask, gevent, SQLAlchemy, pytest
In production: haproxy, nginx+gunicorn w/gevent pywsgi adapter
Technical challenges (so far!)
What are the major problems that we’ve solved?
A universal API across providers
Philosophy: whenever possible, unify the API across providers
We should allow developers to build one integration, not many
A few exceptions: Folders vs labels
Transactions, delta streaming & webhooks
Capturing mailbox changes & allowing apps to subscribe to them
For now: SQLAlchemy events & MySQL are the backbone
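The change-capture idea can be illustrated without any database at all: every create, modify, or delete appends a row to an append-only log, and a subscriber asks for everything after the last cursor it saw. What follows is a minimal in-memory sketch, not the engine's code. `Transaction`, `TransactionLog`, `record`, and `deltas_since` are hypothetical names; in a real deployment something like `record` would presumably be driven by SQLAlchemy session events and the rows stored in MySQL.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Transaction:
    id: int           # monotonically increasing; doubles as the client cursor
    object_type: str  # e.g. "message" or "thread"
    object_id: str
    event: str        # "create", "modify", or "delete"


@dataclass
class TransactionLog:
    """Append-only log of mailbox changes (in-memory stand-in for a table)."""
    _rows: list = field(default_factory=list)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def record(self, object_type, object_id, event):
        txn = Transaction(next(self._ids), object_type, object_id, event)
        self._rows.append(txn)
        return txn

    def deltas_since(self, cursor=0, limit=100):
        """Return (changes after `cursor`, new cursor to pass next time)."""
        batch = [t for t in self._rows if t.id > cursor][:limit]
        new_cursor = batch[-1].id if batch else cursor
        return batch, new_cursor


# Usage: a subscriber pages through changes two at a time.
log = TransactionLog()
log.record("message", "m1", "create")
log.record("message", "m1", "modify")
log.record("thread", "t9", "delete")

first, cursor = log.deltas_since(0, limit=2)
rest, cursor = log.deltas_since(cursor)
```

Because the cursor is just the id of the last transaction seen, a polling delta endpoint, a streaming endpoint, and a webhook sender could all read from the same log.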
Error handling & retries using gevent
Wrapping greenlets to implement backoff
Saving & aggregating errors
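As a rough illustration of retrying with backoff, here is a sketch using only the standard library. `retry_with_backoff`, its parameters, and the `on_error` hook are hypothetical names chosen for this example, not the engine's API. Under gevent, one would presumably pass `sleep=gevent.sleep` and run the wrapper inside a greenlet, so that a sleeping retry loop yields to other greenlets instead of blocking them.

```python
import random
import time


def retry_with_backoff(fn, retry_on=(Exception,), max_attempts=5,
                       base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep, on_error=None):
    """Call fn() until it succeeds, sleeping longer after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if on_error is not None:
                on_error(attempt, exc)   # e.g. save or aggregate the error
            if attempt == max_attempts:
                raise                    # out of attempts: surface the error
            # Exponential backoff with full jitter, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Usage: a sync step that fails twice, then succeeds.
errors, delays = [], []
calls = {"n": 0}


def flaky_sync():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("IMAP connection dropped")
    return "synced"


result = retry_with_backoff(
    flaky_sync,
    retry_on=(ConnectionError,),
    sleep=delays.append,  # record the delays instead of actually sleeping
    on_error=lambda attempt, exc: errors.append((attempt, str(exc))),
)
```

Injecting `sleep` keeps the wrapper testable and lets the same function run with or without gevent; the `on_error` hook is where saving and aggregating errors would plug in.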
Sharded data store
How we split data across multiple MySQL clusters
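One common way to route a row to its shard is to encode the shard number in the high bits of the primary key, so that any id can be mapped to a cluster without a lookup table. The sketch below assumes that scheme purely for illustration; the 48-bit split, the function names, and the hostnames are all hypothetical and are not taken from the talk.

```python
LOCAL_BITS = 48  # assumed: the low 48 bits hold the per-shard row id

# Hypothetical shard-to-cluster map; several shards may share one cluster.
SHARD_HOSTS = {
    0: "mysql-a.example.internal",
    1: "mysql-a.example.internal",
    2: "mysql-b.example.internal",
}


def make_id(shard_id, local_id):
    """Pack a shard number and a per-shard row id into one integer key."""
    if shard_id not in SHARD_HOSTS:
        raise ValueError(f"unknown shard {shard_id}")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local id does not fit in LOCAL_BITS")
    return (shard_id << LOCAL_BITS) | local_id


def shard_for_id(object_id):
    """Recover the shard number from the high bits of a key."""
    return object_id >> LOCAL_BITS


def host_for_id(object_id):
    """Pick the MySQL cluster that stores the row with this key."""
    return SHARD_HOSTS[shard_for_id(object_id)]


# Usage: row 7 on shard 2 routes to the second cluster.
key = make_id(2, 7)
host = host_for_id(key)
```

Keeping the shard-to-host map separate from the id layout means a shard can move to a different cluster by changing configuration, without rewriting keys.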
Performance instrumentation
Extensive custom instrumentation built on top of greenlets
Available for you to use: nylas-perftools
Load balancing
Mail accounts are heterogeneous: different protocols, sizes, rates of new mail receipt…
How to distribute load across a fleet of servers & keep them balanced?
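One simple baseline for the balancing question is a greedy assignment: estimate a load figure per account, then place the heaviest accounts first, each onto whichever server currently carries the least. The sketch below shows that baseline only; it is not presented as the approach the talk describes. The function name, the account names, and the load numbers are invented for illustration.

```python
import heapq


def assign_accounts(account_loads, servers):
    """Greedy balancing: heaviest account first, onto the least-loaded server."""
    heap = [(0.0, name) for name in servers]  # (current load, server name)
    heapq.heapify(heap)
    assignment = {name: [] for name in servers}
    # Sort by load descending; break ties by account name for determinism.
    ordered = sorted(account_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for account, load in ordered:
        total, name = heapq.heappop(heap)     # least-loaded server so far
        assignment[name].append(account)
        heapq.heappush(heap, (total + load, name))
    totals = {name: total for total, name in heap}
    return assignment, totals


# Usage: four accounts with very different estimated loads, two servers.
loads = {
    "big-exchange": 8.0,
    "gmail-1": 5.0,
    "gmail-2": 4.0,
    "imap-small": 1.0,
}
assignment, totals = assign_accounts(loads, ["sync-1", "sync-2"])
```

A static assignment like this only holds while the load estimates do; since accounts differ in protocol, size, and rate of new mail, a production system would also need to measure real load and rebalance over time.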
The future
mypy, Python 3, Kafka, more flexible MySQL clusters, and beyond!
Bio
Christine went to MIT, dropped out of an operating systems graduate program to be an early engineer at Ksplice, and most recently cofounded Nylas, a startup building an email platform. When she's not building rock-solid infrastructure for the Internet or speaking around the world at conferences like DebConf and PyCon, rumour has it she can be found on cliff walls, remote trails, and dance floors. She lives in Oakland, California.