Upgrade to Pro — share decks privately, control downloads, hide ads and more …

SQLAlchemy - un ami qui vous veut du bien

SQLAlchemy - un ami qui vous veut du bien

Retour d'expérience sur l'utilisation de SQLAlchemy - en particulier les points de douleur.

Avatar for Ordoquy Xavier - Linovia

Ordoquy Xavier - Linovia

November 11, 2025
Tweet

More Decks by Ordoquy Xavier - Linovia

Other Decks in Programming

Transcript

  1. Disclaimer • Je ne vais parler que de l’ORM •

    Je vais prendre des raccourcis • Points de douleurs avec SQLAlchemy (venant du monde Django) • La vrai vie est plus compliquée que les extraits de code montrés ici
  2. Qui suis-je ? • Xavier Ordoquy • Python (depuis ~2000)

    • Django (depuis ~2010) • Ancien mainteneur Django REST framework / contributeur Sentry • A mon compte (Linovia - 2004)
  3. IntegrityError en lecture Résultat Traceback (most recent call last): File

    "/pyconfr2025/step1.py", line 11, in <module> print(session.query(User).all()) ~~~~~~~~~~~~~~~~~~~~~~~^^ sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pyconfr2025_user_email_key" … user = session.query(User).filter(User.email=="[email protected]").one() user.email = "[email protected]" users = session.query(User).all() autres_modifications_user(user, users) session.flush()
  4. • Ne pas contrôler quand on veut envoyer les informations

    en base de donnée • ou pas assez fi nement (flush() marche en tout ou rien) • no_autoflush règle globalement le problème • Mais même si autoflush désactivé, certaines commandes font un flush :
 commit(), begin_nested() IntegrityError en lecture Points de douleur
  5. start = datetime.datetime.now() users = session.query(User).all() for user in users:

    print(f"User {user.email} has:") print("groups:") for group in user.groups: print(f"- {group.name}") end = datetime.datetime.now() print(f"Printed {len(users)} users in {(end - start).total_seconds()} seconds") joinedload Code naïf - posons le cadre
  6. start = datetime.datetime.now() users = session.query(User).all() for user in users:

    print(f"User {user.email} has:") print("groups:") for group in user.groups: print(f"- {group.name}") end = datetime.datetime.now() print(f"Printed {len(users)} users in {(end - start).total_seconds()} seconds") joinedload Code naïf - posons le cadre
  7. start = datetime.datetime.now() users = session.query(User).all() for user in users:

    print(f"User {user.email} has:") print("groups:") for group in user.groups: print(f"- {group.name}") end = datetime.datetime.now() print(f"Printed {len(users)} users in {(end - start).total_seconds()} seconds") joinedload Code naïf - posons le cadre
  8. start = datetime.datetime.now() users = session.query(User).all() for user in users:

    print(f"User {user.email} has:") print("groups:") for group in user.groups: print(f"- {group.name}") end = datetime.datetime.now() print(f"Printed {len(users)} users in {(end - start).total_seconds()} seconds") joinedload Code naïf - posons le cadre
  9. joinedload Qu’est-ce ? • Charger les informations des objets liées

    • Avec moins de requêtes SQL • Exemple: • A ffi cher les utilisateurs et l’entreprise dans laquelle ils travaillent
  10. joinedload Exemple simple Utilisateur 001 Entreprise A Utilisateur 002 Entreprise

    B Utilisateur 003 Entreprise C Utilisateur 004 Entreprise A Utilisateur 005 None / NULL SELECT user.*, company.* FROM user LEFT OUTER JOIN company ON user.company_id = company
  11. joinedload utilisateurs + groupes start = datetime.datetime.now() users = session.query(User).options(

    sqlalchemy.orm.joinedload(User.groups), ).all() for user in users: print(f"User {user.email} has:") print("groups:") for group in user.groups: print(f"- {group.name}") end = datetime.datetime.now() print(f"Loaded {len(users)} users in {(end - start).total_seconds()} seconds")
  12. joinedload utilisateurs + groupes + tags start = datetime.datetime.now() users

    = session.query(User).options( sqlalchemy.orm.joinedload(User.groups), sqlalchemy.orm.joinedload(User.tags), ).all() for user in users: print(f"User {user.email} has:") print("groups:") for group in user.groups: print(f"- {group.name}") print("tags:") for tag in user.tags: print(f"- {tag.name}") end = datetime.datetime.now() print(f"Loaded {len(users)} users in {(end - start).total_seconds()} seconds")
  13. joinedload Pourquoi ~5s pour 100 lignes ? User001 Tag001 Group001

    User001 Tag001 Group002 User001 Tag001 Group003 User001 Tag002 Group001 User001 Tag002 Group002 User001 Tag002 Group003
  14. joinedload Pourquoi ~5s pour 100 lignes ? User001 Tag001 Group001

    User001 Tag001 Group002 User001 Tag001 Group003 User001 Tag002 Group001 User001 Tag002 Group002 User001 Tag002 Group003 100 100 100 x x 1 000 000
  15. joinedload SQLAlchemy style start = datetime.datetime.now() users = session.query(User).options( sqlalchemy.orm.selectinload(User.groups),

    sqlalchemy.orm.selectinload(User.tags), ).all() for user in users: print(f"User {user.email} has:") print("groups:") for group in user.groups: print(f"- {group.name}") print("tags:") for tag in user.tags: print(f"- {tag.name}") end = datetime.datetime.now() print(f"Loaded {len(users)} users in {(end - start).total_seconds()} seconds")
  16. commit users = session.query(User).options( sqlalchemy.orm.selectinload(User.groups), sqlalchemy.orm.selectinload(User.tags), ).all() created_at = datetime.now()

    - timedelta(weeks=4) for user in users: user.created_at = ( created_at if len(tags)>5 else datetime.now() ) session.commit()
  17. commit users = session.query(User).options( sqlalchemy.orm.selectinload(User.groups), sqlalchemy.orm.selectinload(User.tags), ).all() created_at = datetime.now()

    - timedelta(weeks=4) for user in users: user.created_at = ( created_at if len(tags)>5 else datetime.now() ) session.commit() 3 requêtes SELECT N requêtes UPDATE
  18. commit users = session.query(User).options( sqlalchemy.orm.selectinload(User.groups), sqlalchemy.orm.selectinload(User.tags), ).all() created_at = datetime.now()

    - timedelta(weeks=4) for user in users: user.created_at = ( created_at if len(tags)>5 else datetime.now() ) session.commit() 3 requêtes SELECT N requêtes UPDATE 3xN requêtes SELECT
  19. commit Solution par lots user_ids = session.query(User.id).all() created_at = datetime.now()

    - timedelta(weeks=3) for batch_ids in batched(user_ids, 50): users = session.query(User).options( sqlalchemy.orm.selectinload(User.groups), sqlalchemy.orm.selectinload(User.tags), ).filter(User.id.in_([i for i, in batch_ids])).all() for user in users: user.created_at = created_at session.commit()
  20. commit Alternative expire_on_commit¶ – Defaults to True. When True, all

    instances will be fully expired after each commit(), so that all attribute/object access subsequent to a completed transaction will load from the most recent database state.
  21. Boucles et objets liés print(session.query(User).count()) # 100 for user in

    initial_company.users: user.company = destination_company session.flush() print(len(initial_company.users)) # 0 ? print(len(destination_company.users)) # 100 ?
  22. Boucles et objets liés Solution print(session.query(User).count()) # 100 for user

    in list(initial_enterprise.users): user.enterprise = destination_enterprise session.flush() assert len(initial_enterprise.users)) == 0 assert len(destination_enterprise.users)) == 100