SQLAlchemy ORM https://docs.sqlalchemy.org/en/14/orm/tutorial.html >>> from sqlalchemy.orm import sessionmaker >>> Session = sessionmaker(bind=engine) >>> session = Session() >>> ed_user = User(name='ed', fullname='Ed Jones', nickname='edsnickname') >>> session.add(ed_user) >>> session.commit() >>> for instance in session.query(User).order_by(User.id): ... print(instance.name, instance.fullname) ed Ed Jones wendy Wendy Williams mary Mary Contrary fred Fred Flintstone 6
– asyncpg + asyncpgsa § ORM on the Greenlet ecosystem – peewee § Yet another ORM with multi-database backends – databases + orm Python ORMs in the asyncio ecosystem 7
lots of boilerplate codes to convert between Row objects and actual Python objects. § Still, SQLAlchemy offers: – Structured schema version management via alembic – Seamless custom type conversion for complex-type columns (e.g., JSON) § Binding – Tried asyncpgsa at first but later moved to aiopg... Backend.AI + SQLAlchemy 11
transaction – The timing looks like a slit but it happens in production! – Solution: had to wrap many db transactions used in long-running websocket connection handlers with asyncio.shield()... ▹ asyncio.shield() itself is not an ideal API since it still may get cancelled upon event loop shutdown. – Patched after 1.5yr since the first report (aio-libs/aiopg#332) Technical issues with aiopg 12
– Tricky case: adding/removing enum constants ▹ Need to cast enum constants to text back and forth ▹ Need to handle removal for downgrade – Easy-to-mistake case: Referring other table's schema in migrations ▹ Migrations may run in arbitrary git commits & branches! ▹ Referred table schemas must be duplicated in the migration script to reflect its definition at the time of writing the migration. Technical issues with Alembic 14
SQLAlchemy in at least ten years” – Native asyncio support for both Core and ORM! – Type annotation support (sqlalchemy2-stubs) – Universal & transparent SQL compilation cache – 2.0-style transaction models (optional) – Core & ORM queries with unified query composition Landing of SQLAlchemy v1.4 17
2.0 ü A stepping stone for existing apps ü Ensure no deprecation warnings raised ü Subtle changes ü Changed behavior of SELECT and join() methods ü Python 3.6+ required ü Turn on "RemovedIn20Warning" ü Explicit connection context manager required ü Explicit binding for metadata required ü Add "future" flag to Engine and Session objects for new transaction behaviors ü No library-level autocommit ü Rows are now named tuples instead of dicts ü session.execute() everywhere in ORM ü ... 18
should have reported “operation in progress” DBAPI error – Patched through a series of v1.4 patch releases (v1.4.13+) ▹ nested transaction begin() missing Technical issues during SA 1.4 migration 21
and not released yet at the time of writing this CFP…..) – Yes, “proper” cancellation is a difficult problem in asyncio…. § greenlet is missing! – pip install "SQLAlchemy[postgresql_asyncpg]" – Update poetry to 1.1.8 or later! – Some envs may still need to install greenlet manually (e.g., M1 Mac) ... Technical issues during SA 1.4 migration 22
involves many async tasks on multiple different nodes ▹ Should be globally synchronized to guarantee resource constraints! – READ COMMITTED vs. REPEATABLE READ vs. SERIALIZABLE Technical issues during SA 1.4 migration 1) Backend.AI session is a collection of containers running in one or more computing agent nodes in GPU clusters. 23
▹ Is RDBMS suitable for strongly-consistent distributed state management? ▹ In early versions of Backend.AI, we used Redis but have migrated to PostgreSQL in favor of well-structured data schemes as more features added. – Why now (with SQLAlchemy v1.4 and not with aiopg)? ▹ Maybe due to their internal connection pooling mechanism differences... ▹ SA reuses the underlying DBAPI connection but recreates the connection wrapper (“fairy”) objects everytime Technical issues during SA 1.4 migration 27
Core APIs based on asyncpg § Bleeding edge is bleeding edge... – It takes time for everyone (even experience developers) to deeply get used with asyncio (+cancellation) § Strongly-consistent state management on RDBMS is difficult. – Consider non-RDBMS solutions if possible – Use a proper transaction isolation level and retry serialization failures Summary 28