
[PyCon KR] The next-generation SQLAlchemy with asyncio



Joongi Kim

October 02, 2021

Transcript

  1. The next-generation SQLAlchemy with asyncio
     Joongi Kim / Lablup Inc.
  2. None
  3. Agenda
     § SQLAlchemy in 1 minute – Core API & ORM
     § Python databases in the asyncio ecosystem
       – aiopg + aiopg.sa / asyncpg + asyncpgsa
     § Technical issues when using aiopg & SQLAlchemy
     § Landing of SQLAlchemy v1.4
     § Technical issues during the migration
  4. SQLAlchemy in 1 minute
     § “The de-facto standard DBAPI & ORM for Python programs”
     § Layered architecture: ORM on top of Core (SQL Engine, Dialect,
       Connection Pool, Schema / Types), which sits on the DBAPI and the
       Database itself
     https://speakerdeck.com/zzzeek/introduction-to-sqlalchemy-pycon-2013?slide=8
  5. SQLAlchemy Core API
     § 1-to-1 translation of Python's chained function-call constructs to SQL
       statements

     >>> from sqlalchemy import Table, Column, Integer, String, MetaData
     >>> metadata = MetaData()
     >>> users = Table('users', metadata,
     ...     Column('id', Integer, primary_key=True),
     ...     Column('name', String),
     ...     Column('fullname', String),
     ... )
     >>> metadata.create_all(engine)

     https://docs.sqlalchemy.org/en/14/core/tutorial.html
  6. SQLAlchemy Core API (cont.)
     >>> ins = users.insert().values(name='jack', fullname='Jack Jones')
     >>> str(ins)
     'INSERT INTO users (name, fullname) VALUES (:name, :fullname)'
     >>> s = select(users.c.name, users.c.fullname)
     >>> result = conn.execute(s)
     >>> for row in result:
     ...     print(row)
     ('jack', 'Jack Jones')
     ('wendy', 'Wendy Williams')

     https://docs.sqlalchemy.org/en/14/core/tutorial.html
  7. SQLAlchemy ORM
     § 1-to-1 mapping of Python objects to SQL table rows

     >>> from sqlalchemy.orm import declarative_base
     >>> from sqlalchemy import Column, Integer, String
     >>> Base = declarative_base()
     >>> class User(Base):
     ...     __tablename__ = 'users'
     ...     id = Column(Integer, primary_key=True)
     ...     name = Column(String)
     ...     fullname = Column(String)
     ...     nickname = Column(String)
     ...
     ...     def __repr__(self):
     ...         return "<User(name='%s', fullname='%s', nickname='%s')>" % (
     ...             self.name, self.fullname, self.nickname,
     ...         )
     >>> Base.metadata.create_all(engine)

     https://docs.sqlalchemy.org/en/14/orm/tutorial.html
  8. SQLAlchemy ORM (cont.)
     >>> from sqlalchemy.orm import sessionmaker
     >>> Session = sessionmaker(bind=engine)
     >>> session = Session()
     >>> ed_user = User(name='ed', fullname='Ed Jones', nickname='edsnickname')
     >>> session.add(ed_user)
     >>> session.commit()
     >>> for instance in session.query(User).order_by(User.id):
     ...     print(instance.name, instance.fullname)
     ed Ed Jones
     wendy Wendy Williams
     mary Mary Contrary
     fred Fred Flintstone

     https://docs.sqlalchemy.org/en/14/orm/tutorial.html
  9. Python ORMs in the asyncio ecosystem
     § SQLAlchemy Core bindings for asyncio
       – aiopg + aiopg.sa
       – asyncpg + asyncpgsa
     § ORM on the greenlet ecosystem
       – peewee
     § Yet another ORM with multi-database backends
       – databases + orm
  10. aiopg + aiopg.sa (Core only!)

      import asyncio
      import sqlalchemy as sa
      from aiopg.sa import create_engine

      metadata = sa.MetaData()
      tbl = sa.Table(
          'tbl', metadata,
          sa.Column('id', sa.Integer, primary_key=True),
          sa.Column('val', sa.String(255)),
      )

      async def create_table(conn):
          await conn.execute('DROP TABLE IF EXISTS tbl')
          await conn.execute(
              'CREATE TABLE tbl (id serial PRIMARY KEY, val varchar(255))'
          )

      async def main():
          async with create_engine(
              user='postgres', password='helloworld',
              host='127.0.0.1', port=15432, database='testing',
          ) as engine:
              async with engine.acquire() as conn:
                  await create_table(conn)
              async with engine.acquire() as conn:
                  await conn.execute(tbl.insert().values(val='abc'))
                  async for row in conn.execute(tbl.select()):
                      print(row.id, row.val)
  11. asyncpg + asyncpgsa (Core only!)

      import asyncio
      import sqlalchemy as sa
      import asyncpgsa

      metadata = sa.MetaData()
      tbl = sa.Table(
          'tbl', metadata,
          sa.Column('id', sa.Integer, primary_key=True),
          sa.Column('val', sa.String(255)),
      )

      async def create_table(conn):
          await conn.execute('DROP TABLE IF EXISTS tbl')
          await conn.execute(
              'CREATE TABLE tbl (id serial PRIMARY KEY, val varchar(255))'
          )

      async def main():
          pool = await asyncpgsa.create_pool(
              host='127.0.0.1', port=15432, database='testing',
              user='postgres', password='helloworld',
              min_size=5, max_size=10,
          )
          async with pool.acquire() as conn:
              await create_table(conn)
          async with pool.acquire() as conn:
              await conn.execute(tbl.insert().values(val='abc'))
              for row in await conn.fetch(tbl.select()):
                  print(row['id'], row['val'])
  12. databases + SQLAlchemy (Core only)

      import asyncio
      import sqlalchemy as sa
      from databases import Database

      metadata = sa.MetaData()
      tbl = sa.Table(…)

      async def create_table(conn):
          await conn.execute('DROP TABLE IF EXISTS tbl')
          await conn.execute(
              'CREATE TABLE tbl (id serial PRIMARY KEY, val varchar(255))'
          )

      async def main():
          database = Database(
              'postgresql://postgres:helloworld@127.0.0.1:15432/testing'
          )
          await database.connect()
          async with database.connection() as conn:
              async with conn.transaction():
                  await create_table(conn)
          async with database.connection() as conn:
              async with conn.transaction():
                  await conn.execute(query=tbl.insert(), values={'val': 'abc'})
                  for row in await conn.fetch_all(query=tbl.select()):
                      print(row['id'], row['val'])
          await database.disconnect()
  13. Backend.AI + SQLAlchemy
      § So, we could not use the ORM. 😭
        – Introduced lots of boilerplate code to convert between Row objects
          and actual Python objects.
      § Still, SQLAlchemy offers:
        – Structured schema version management via alembic
        – Seamless custom type conversion for complex-typed columns (e.g., JSON)
      § Binding
        – Tried asyncpgsa at first but later moved to aiopg...
  14. Technical issues with aiopg
      § Unable to cancel the async task while a transaction is beginning
        – The timing window is slit-thin, but it happens in production!
        – Solution: had to wrap many DB transactions used in long-running
          websocket connection handlers with asyncio.shield()...
          ▹ asyncio.shield() itself is not an ideal API since the shielded
            task may still get cancelled upon event-loop shutdown.
        – Patched 1.5 years after the first report (aio-libs/aiopg#332)
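The asyncio.shield() workaround mentioned above can be illustrated with a minimal, database-free sketch (my own illustration, not code from the talk): cancelling the outer handler task does not cancel the shielded "commit", which keeps running to completion.

```python
import asyncio

results = []

async def commit_transaction():
    # simulates a DB commit that must not be interrupted mid-way
    await asyncio.sleep(0.05)
    results.append("committed")

async def handler():
    # shield the commit so that cancelling this handler does not
    # cancel the in-flight "transaction"
    await asyncio.shield(commit_transaction())

async def main():
    task = asyncio.create_task(handler())
    await asyncio.sleep(0.01)
    task.cancel()           # e.g., the websocket connection dropped
    try:
        await task          # the handler itself observes CancelledError
    except asyncio.CancelledError:
        pass
    # give the shielded inner task time to finish before loop shutdown;
    # this is exactly the caveat from the slide: shield does not survive
    # an event-loop shutdown
    await asyncio.sleep(0.1)

asyncio.run(main())
print(results)
```

Note how the sleep at the end is doing real work: without it, loop shutdown would still cancel the shielded task, which is the slide's point about shield() being a non-ideal API.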
  15. Technical issues with aiopg
      § No Windows support!
        – Backend.AI plans to support Windows setups.
        – AbstractEventLoop.{add_reader(), remove_reader()} are not
          implemented on ProactorEventLoop
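A common workaround for this limitation (an assumption on my part, not something stated in the talk) is to select the selector-based event loop on Windows, which does implement add_reader()/remove_reader():

```python
import asyncio
import sys

# On Windows, the default ProactorEventLoop lacks add_reader() /
# remove_reader(), which aiopg needs; fall back to the selector-based
# loop there. On other platforms this is a no-op.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

async def main():
    return "ok"

result = asyncio.run(main())
print(result)
```

The trade-off is that the selector loop lacks some Proactor features (e.g., efficient subprocess pipes on Windows), so it is a workaround rather than a full fix.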
  16. Technical issues with Alembic
      § Handling schema migrations
        – One answer for SQLAlchemy: alembic
        – Tricky case: adding/removing enum constants
          ▹ Need to cast enum constants to text and back
          ▹ Need to handle removal for downgrade
        – Easy-to-mistake case: referring to other tables' schemas in migrations
          ▹ Migrations may run at arbitrary git commits & branches!
          ▹ Referred table schemas must be duplicated in the migration script
            to reflect their definitions at the time the migration was written.
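The enum text-cast round trip described above can be sketched as follows, with a stub standing in for Alembic's real `op` object and entirely hypothetical type/table names (`session_status`, `sessions.status`) — this shows the SQL flow, not the talk's actual migration:

```python
class StubOp:
    """Collects SQL instead of executing it (stands in for alembic.op)."""
    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)

def upgrade(op):
    # PostgreSQL can append enum values in place, so upgrades are easy
    op.execute("ALTER TYPE session_status ADD VALUE IF NOT EXISTS 'PULLING'")

def downgrade(op):
    # Removing a value has no direct SQL; cast the column to text,
    # recreate the type without the value, then cast back.
    op.execute("ALTER TABLE sessions ALTER COLUMN status TYPE text")
    op.execute("DROP TYPE session_status")
    op.execute(
        "CREATE TYPE session_status AS ENUM "
        "('PENDING', 'RUNNING', 'TERMINATED')"
    )
    op.execute(
        "ALTER TABLE sessions ALTER COLUMN status TYPE session_status "
        "USING status::session_status"
    )

op = StubOp()
downgrade(op)
for stmt in op.statements:
    print(stmt)
```

This is also why the downgrade path must be written explicitly: Alembic cannot derive the text-cast dance automatically.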
  17. Table definitions in Alembic migrations

  18. Technical issues with SQLAlchemy
      § Incremental updates of PostgreSQL JSONB columns
        – How to “update” a hierarchy of nested objects by merging with the
          existing value?
        – Solution: wrote a custom helper that generates the SQL value
          expression

      def sql_json_merge(
          col,
          key: Tuple[str, ...],
          obj: Mapping[str, Any],
          *,
          _depth: int = 0,
      ):
          expr = sa.func.coalesce(
              col if _depth == 0 else col[key[:_depth]],
              sa.text("'{}'::jsonb"),
          ).concat(
              sa.func.jsonb_build_object(
                  key[_depth],
                  (
                      sa.func.coalesce(col[key], sa.text("'{}'::jsonb"))
                      .concat(sa.func.cast(obj, psql.JSONB))
                      if _depth == len(key) - 1
                      else sql_json_merge(col, key, obj=obj, _depth=_depth + 1)
                  )
              )
          )
          return expr

      sa.update(tbl).values(status_data=sql_json_merge(
          tbl.c.status_data,
          ("scheduler",),
          {
              "last_try": datetime.now(tzutc()).isoformat(),
              "passed_predicates": ["concurrency", "keypair_resource_policy"],
          },
      ))

      # Example of a value being merged into:
      # {
      #   "scheduler": {
      #     "first_try": "2021-05-01T02:01:01.00Z",
      #     "last_try": "2021-05-01T02:03:04.55Z",
      #     "passed_predicates": ["concurrency"],
      #   }
      # }
  19. Landing of SQLAlchemy v1.4
      § https://www.sqlalchemy.org/blog/2021/03/15/sqlalchemy-1.4.0-released/
        – “This is the most extensive release for SQLAlchemy in at least
          ten years”
        – Native asyncio support for both Core and ORM!
        – Type annotation support (sqlalchemy2-stubs)
        – Universal & transparent SQL compilation cache
        – 2.0-style transaction models (optional)
        – Core & ORM queries with unified query composition
  20. SQLAlchemy Roadmap: 1.3 → 1.4 → 2.0
      § Recommended upgrade path
        – https://docs.sqlalchemy.org/en/14/changelog/migration_20.html
      ✓ A stepping stone for existing apps
      ✓ Ensure no deprecation warnings are raised
      ✓ Subtle changes
      ✓ Changed behavior of SELECT and join() methods
      ✓ Python 3.6+ required
      ✓ Turn on "RemovedIn20Warning"
      ✓ Explicit connection context manager required
      ✓ Explicit binding for metadata required
      ✓ Add the "future" flag to Engine and Session objects for the new
        transaction behaviors
      ✓ No library-level autocommit
      ✓ Rows are now named tuples instead of dicts
      ✓ session.execute() everywhere in the ORM
      ✓ ...
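Several of the listed behaviors (explicit transactions, no library-level autocommit, named-tuple rows) can be seen in a minimal sketch of 1.4's "future" mode. This is my own illustration on a throwaway in-memory SQLite database, not the talk's PostgreSQL setup:

```python
import sqlalchemy as sa

# future=True opts in to the 2.0-style engine behaviors on 1.4
engine = sa.create_engine("sqlite://", future=True)
metadata = sa.MetaData()
tbl = sa.Table(
    "tbl", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("val", sa.String(255)),
)
metadata.create_all(engine)

# engine.begin() is an explicit transaction context manager that
# commits on success; there is no implicit library-level autocommit
with engine.begin() as conn:
    conn.execute(sa.insert(tbl).values(val="abc"))

with engine.connect() as conn:
    row = conn.execute(sa.select(tbl)).one()

# 1.4+ rows behave like named tuples, not dicts
print(row.val)   # attribute access
print(row[0])    # positional access
```

The same `sa.insert()` / `sa.select()` constructs are what the async examples on the following slides pass to `conn.execute()`, which is the "unified query composition" point from slide 19.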
  21. SQLAlchemy v1.4 Core

      import asyncio
      import sqlalchemy as sa
      from sqlalchemy.ext.asyncio import create_async_engine

      metadata = sa.MetaData()
      tbl = sa.Table(…)

      async def create_table(conn):
          await conn.run_sync(metadata.drop_all)
          await conn.run_sync(metadata.create_all)

      async def main():
          engine = create_async_engine(
              "postgresql+asyncpg://postgres:helloworld@127.0.0.1:15432/testing"
          )
          async with engine.begin() as conn:
              await create_table(conn)
          async with engine.begin() as conn:
              await conn.execute(sa.insert(tbl).values(val='abc'))
              result = await conn.stream(sa.select(tbl))
              async for row in result:
                  print(row.id, row.val)
          await engine.dispose()
  22. SQLAlchemy v1.4 ORM

      import asyncio
      import sqlalchemy as sa
      from sqlalchemy.ext.asyncio import (
          AsyncSession, create_async_engine,
      )
      from sqlalchemy.orm import declarative_base, sessionmaker

      Base = declarative_base()

      class A(Base):
          __tablename__ = "a"
          __mapper_args__ = {"eager_defaults": True}
          id = sa.Column(sa.Integer, primary_key=True)
          val = sa.Column(sa.String(255))
          created_at = sa.Column(
              sa.DateTime,
              server_default=sa.func.now(),
          )

      async def create_table(conn):
          await conn.run_sync(Base.metadata.drop_all)
          await conn.run_sync(Base.metadata.create_all)

      async def main():
          engine = create_async_engine(
              "postgresql+asyncpg://...",
          )
          async with engine.begin() as conn:
              await create_table(conn)
          async_session = sessionmaker(
              bind=engine, expire_on_commit=False, class_=AsyncSession,
          )
          async with async_session() as session:
              async with session.begin():
                  session.add_all(
                      [A(val="abc"), A(val="def")]
                  )
              await session.commit()
          async with async_session() as session:
              result = await session.execute(sa.select(A))
              for a in result.scalars():
                  print(a.id, a.val, a.created_at)
          await engine.dispose()
  23. Technical issues during SA 1.4 migration
      § Invisible errors
        – Transactions silently hang!
          ▹ It should have reported an “operation in progress” DBAPI error
        – Patched through a series of v1.4 patch releases (v1.4.13+)
          ▹ nested transaction begin() was missing
  24. Technical issues during SA 1.4 migration
      § Swallowing of task cancellation
        – Patched in v1.4.19 (discovered but not yet released at the time of
          writing this CFP...)
        – Yes, “proper” cancellation is a difficult problem in asyncio...
      § greenlet is missing!
        – pip install "SQLAlchemy[postgresql_asyncpg]"
        – Update poetry to 1.1.8 or later!
        – Some envs may still need to install greenlet manually (e.g., M1 Mac)
  25. Technical issues during SA 1.4 migration
      § Transaction isolation levels
        – Status transitions of Backend.AI sessions¹ involve many async tasks
          on multiple different nodes
          ▹ They should be globally synchronized to guarantee resource
            constraints!
        – READ COMMITTED vs. REPEATABLE READ vs. SERIALIZABLE
      1) A Backend.AI session is a collection of containers running on one or
         more computing agent nodes in GPU clusters.
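For reference, the isolation level being compared here can be pinned per engine or per connection through SQLAlchemy execution options. A minimal sketch, shown on SQLite purely to stay self-contained (the talk's production setup is PostgreSQL, where all three levels are meaningful):

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://", future=True)

# Pin the isolation level for this connection via execution options;
# on PostgreSQL the same call accepts "READ COMMITTED",
# "REPEATABLE READ", or "SERIALIZABLE".
with engine.connect().execution_options(
    isolation_level="SERIALIZABLE",
) as conn:
    level = conn.get_isolation_level()

print(level)
```

The level can also be set engine-wide with `create_engine(..., isolation_level=...)`, which is closer to what a globally-synchronized state store would want.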
  26. Technical issues during SA 1.4 migration
      § Transaction isolation levels
        – Solution: SERIALIZABLE + “proper” retrying of transactions +
          advisory locks + marking “read-only” transactions explicitly
          for performance

      class ExtendedAsyncSAEngine(sqlalchemy.ext.asyncio.AsyncEngine):

          @aiotools.actxmgr
          async def begin_readonly(
              self, deferrable: bool = False,
          ) -> AsyncIterator[AsyncConnection]:
              async with self.connect() as conn:
                  conn_with_exec_opts = await conn.execution_options(
                      postgresql_readonly=True,
                      postgresql_deferrable=deferrable,
                  )
                  async with conn_with_exec_opts.begin():
                      yield conn_with_exec_opts

          @aiotools.actxmgr
          async def advisory_lock(self, lock_id: int) -> AsyncIterator[None]:
              async with self.connect() as lock_conn:
                  await lock_conn.exec_driver_sql(
                      f"SELECT pg_advisory_lock({lock_id:d})")
                  try:
                      yield
                  finally:
                      await lock_conn.exec_driver_sql(
                          f"SELECT pg_advisory_unlock({lock_id:d})")
  27. Technical issues during SA 1.4 migration
      § Transaction isolation levels
        – Solution: SERIALIZABLE + “proper” retrying of transactions +
          advisory locks + marking “read-only” transactions explicitly
          for performance

      async def enumerate_instances(self, check_shadow=True):
          async with self.db.begin_readonly() as conn:
              query = sa.select('*').select_from(agents)
              ...
              async for row in (await conn.stream(query)):
                  yield row

      async def update_instance(self, inst_id, updated_fields):
          async def _update() -> None:
              async with self.db.begin() as conn:
                  query = (
                      sa.update(agents)
                      .values(**updated_fields)
                      .where(agents.c.id == inst_id)
                  )
                  await conn.execute(query)
          await execute_with_retry(_update)

      TQueryResult = TypeVar('TQueryResult')

      async def execute_with_retry(
          txn_func: Callable[[], Awaitable[TQueryResult]],
      ) -> TQueryResult:
          max_retries: Final = 10
          num_retries = 0
          while True:
              if num_retries == max_retries:
                  raise RuntimeError("serialization failed")
              try:
                  return await txn_func()
              except DBAPIError as e:
                  num_retries += 1
                  # PostgreSQL error code 40001: serialization_failure
                  if getattr(e.orig, 'pgcode', None) == '40001':
                      await asyncio.sleep((num_retries - 1) * 0.02)
                      continue
                  raise
  28. SA 1.4 ORM adoption
      § Planning removal of boilerplate like the below, which existed because
        there was no async ORM...
        – Still ongoing!

      @attr.s(auto_attribs=True, slots=True)
      class PendingSession:
          kernels: List[KernelInfo]
          access_key: AccessKey
          session_id: SessionId
          ...
          internal_data: Optional[MutableMapping[str, Any]]
          preopen_ports: List[int]

          @classmethod
          def from_row(cls, row: Row) -> PendingSession:
              return cls(
                  kernels=[],
                  access_key=row['access_key'],
                  session_creation_id=row['session_creation_id'],
                  session_id=row['session_id'],
                  ...
                  internal_data=row['internal_data'],
                  preopen_ports=row['preopen_ports'],
              )
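A sketch of where this refactoring could go: with a 1.4 ORM-mapped class, the session hydrates objects directly and the hand-written `from_row()` converters disappear. All class and column names below are illustrative, not Backend.AI's actual schema, and SQLite stands in for PostgreSQL:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class PendingSession(Base):
    # Hypothetical mapped class replacing the attrs-based DTO above
    __tablename__ = "sessions"
    id = sa.Column(sa.Integer, primary_key=True)
    access_key = sa.Column(sa.String(64))
    preopen_ports = sa.Column(sa.JSON)

engine = sa.create_engine("sqlite://", future=True)
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(PendingSession(access_key="demo-key", preopen_ports=[8080]))
    session.commit()

with Session(engine) as session:
    # the ORM returns mapped objects directly; no manual from_row() needed
    obj = session.execute(sa.select(PendingSession)).scalar_one()
    print(obj.access_key, obj.preopen_ports)
```

With `AsyncSession` from slide 22, the same mapped class works unchanged in async code, which is exactly why the boilerplate removal became feasible only after v1.4.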
  29. Technical issues during SA 1.4 migration
      § Transaction isolation levels
        – Bigger view of the problem
          ▹ Is an RDBMS suitable for strongly-consistent distributed state
            management?
          ▹ In early versions of Backend.AI we used Redis, but we migrated to
            PostgreSQL in favor of well-structured data schemas as more
            features were added.
        – Why now (with SQLAlchemy v1.4, and not with aiopg)?
          ▹ Maybe due to differences in their internal connection pooling
            mechanisms...
          ▹ SA reuses the underlying DBAPI connection but recreates the
            connection wrapper (“fairy”) objects every time
  30. Summary
      § SQLAlchemy v1.4 is out!
        – with asyncio-native ORM and Core APIs based on asyncpg
      § Bleeding edge is bleeding edge...
        – It takes time for everyone (even experienced developers) to get
          deeply used to asyncio (+ cancellation)
      § Strongly-consistent state management on an RDBMS is difficult.
        – Consider non-RDBMS solutions if possible
        – Use a proper transaction isolation level and retry serialization
          failures
  31. Bonus: What it's like replacing the DB engine of a live product
      EDS TV Commercial (2000)
      https://www.youtube.com/watch?v=L2zqTYgcpfg
  32. References
      § https://github.com/achimnol/pycon-kr2021-samples
      § https://docs.sqlalchemy.org/en/14/
        – https://docs.sqlalchemy.org/en/14/_static/dragons.png
      § https://aiopg.readthedocs.io/en/stable/examples.html
      § https://asyncpgsa.readthedocs.io/en/latest/