Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Python & PostgreSQL - A Wonderful Wedding

Python & PostgreSQL - A Wonderful Wedding

Stéphane Wirtel

October 25, 2014
Tweet

More Decks by Stéphane Wirtel

Other Decks in Programming

Transcript

  1. Python, c’est quoi ? Langage de Programmation Syntaxe simple &

    lisible Interprété Multi-plateformes Typage Dynamique Fort Multi-paradigmes Garbage Collector
  2. PostgreSQL - Story Développée à Berkeley Réécriture de Ingres 1985

    -> Postgres (vient de Post-Ingres) 1995 -> Postgres95 (0.01) (Ajout d’un interpréter SQL) 1997 -> PostgreSQL 6.0 2013 -> PostgreSQL 9.3 2014 -> PostgreSQL 9.4 (b4)
  3. PostgreSQL - TL;DR ORDBMS, ACID SQL:2011 DataTypes Transactional DDL Concurrent

    Index Extensions Common Table Expression MultiVersion Concurrency Control Cross-Platform
  4. PostgreSQL - TL;DR Replication Foreign Data Wrappers Procedural Languages Triggers

    Full text Search Views (Materialized) Table Inheritance Listen/Notify
  5. PostgreSQL, PL ? Étend la base de données Fonctions stockées

    Safe (sandbox, SQL, PL/pgSQL) / Unsafe (C) PL/Python, PL/V8, PL/PERL
  6. DB-API 2.0 aka PEP-249 API pour les connecteurs de Base

    de Données Facile à utiliser, et à comprendre Utilise deux concepts: • Connection • Curseur http://legacy.python.org/dev/peps/pep-0249/
  7. Connection Emballe une connection vers la base de données Gère

    les transactions et leur état (Commit/ Rollback) N’exécute pas de requêtes SQL.
  8. Connection - Exemple import driver conn = driver.connect(database='database', host='localhost', port=5432)

    try: # cree le curseur # utilise le curseur except Exception: conn.rollback() else: conn.commit() conn.close()
  9. Curseur Créé via une instance d’une connection Utilisé pour la

    manipulation et interrogation de la Base de Données cursor = conn.cursor() cursor.execute(""" SELECT column1, column2 FROM tableA """) for column1, column2 in cursor.fetchall(): print(column1, column2)
  10. Curseur - API • callproc(procname[, parameters]) • execute(query[, parameters]) •

    fetchone(), fetchmany([size=cursor.arraysize]), fetchall() • close()
  11. Curseur - execute execute(query[, parameters]) Performance et Sécurité JAMAIS utiliser

    l’interpolation (%) et la concaténation (+) => SQL Injection
  12. Curseur - execute (2) Accepte un formatage de la requête.

    qmark Question mark WHERE field = ? numeric Numeric positional WHERE field = :1 named Named WHERE field = :code format ANSI C print format WHERE field = %s pyformat Python format WHERE field = %(name)s
  13. Exemple plus complet ;-) import driver conn = driver.connect(database='database', host='localhost',

    port=5432) cursor = conn.cursor() try: cursor.execute("SELECT column1, column2 FROM tableA") for column1, column2 in cursor.fetchall(): print(column1, column2) except Exception: conn.rollback() else: conn.commit() finally: cursor.close() conn.close()
  14. Introduction PostgreSQL Adaptor, basé sur libpq DB-API 2.0 Compliant Multi-thread

    Pool de Connections Full Asynchronous, Coroutines Supporte TOUS les types de données de PostgreSQL (Json, Hstore, …) Python 2.5+, 3.1+, PostgreSQL 7.4+
  15. Curseurs cursor = conn.cursor() cursor = conn.cursor(name=“pycon_cursor”) • Curseur coté

    PostgreSQL • Interrogation, manipulation des données de la base. • Nommé => Utilisé pour les gros ensembles de données
  16. Requêtes SQL cursor.execute("SELECT * FROM table") cursor.execute("INSERT INTO table (field1,

    field2) VALUES (%s, %s)", (field1, field2)) cursor.execute("DELETE FROM table") cursor.execute("UPDATE table SET field1=%s", ('value',)) cursor.execute("CREATE DATABASE database") cursor.execute("DROP DATABASE database")
  17. Exemple import psycopg2 conn = psycopg2.connect(host='localhost', port=5432, user='username', password='password', database='database')

    cursor = conn.cursor() try: cursor.execute("SELECT column1, column2 FROM tableA") for column1, column2 in cursor.fetchall(): print(column1, column2) except Exception: conn.rollback() else: conn.commit() finally: cursor.close() conn.close()
  18. Exemple - Context Manager import psycopg2 DSN = dict(host='localhost', port=5432,

    user='username', password='password', database='database') with psycopg2.connect(**DSN) as conn: with conn.cursor() as cursor: cursor.execute("SELECT column1, column2 FROM tableA") for column1, column2 in cursor.fetchall(): print(column1, column2) conn.commit()
  19. Introduction “Technique de Programmation créant l’illusion d’une base de données

    orientée objet à partir d’une base de données relationnelle en définissant des correspondances entre cette base de données et les objets du langage utilisé.” Wikipedia
  20. En bref • Mapper une classe Business sur une table

    ! • Méthodes pour ajouter, modifier, supprimer et rechercher
  21. Peewee ORM Simple, facile à comprendre et extensible Python 2.6+

    et 3.2+ Extensions 3200 lignes de code (MySQL, Sqlite3 et PostgreSQL) Très bonne documentation
  22. Modèles class BaseModel(peewee.Model): id = peewee.PrimaryKeyField() class Meta: database =

    database class Address(BaseModel): street = peewee.CharField() zipcode = peewee.CharField() country = peewee.CharField() class Contact(BaseModel): firstname = peewee.CharField(null=False) lastname = peewee.CharField(null=False) email = peewee.CharField(unique=True, null=False, index=True) address = peewee.ForeignKeyField(Address, null=False, related_name=‘contacts')
  23. Create with database.transaction(): address = Address.create( street='Rue de Lyon', zipcode='90001',

    country='France') contact = Contact.create( firstname='Stephane', lastname='Wirtel', email='[email protected]', address=address ) print(contact.id)
  24. Read contact = Contact.get(Contact.email == '[email protected]') print(contact.firstname) contacts = Contact.select()

    for contact in contacts: print(contact.firstname) for contact in contacts.where(Contact.email == '[email protected]'): print(contact.firstname)
  25. Delete # Suppression pour une collection contacts = Contact.delete().where(Contact.email ==

    '[email protected]') contacts.execute() # Suppression d'un seul enregistrement contact = Contact.get(Contact.email == '[email protected]') contact.delete_instance()
  26. Modèles from sqlalchemy import Column, Integer, String, ForeignKey from sqlalchemy.ext.declarative

    import declarative_base from sqlalchemy.orm import relationship Base = declarative_base() class Address(Base): __tablename__ = 'address' id = Column(Integer, primary_key=True) street = Column(String) zipcode = Column(String) country = Column(String, nullable=False) class Contact(Base): __tablename__ = 'contact' id = Column(Integer, primary_key=True) firstname = Column(String, nullable=False) lastname = Column(String, nullable=False) email = Column(String, nullable=False) address_id = Column(Integer, ForeignKey(Address.id), nullable=False) address = relationship('Address')
  27. Create session = Session(engine) address = Address(street='SQLAlchemy Street', zipcode='5432', country='Belgique')

    contact = Contact( firstname='Stephane', lastname='Wirtel', email='[email protected]', address=address ) session.add(contact) session.commit()
  28. Read contact = session.query(Contact).filter_by(email='[email protected]').first() print(contact.firstname) contacts = session.query(Contact).all() for contact

    in contacts: print(contact.firstname) contacts = session.query(Contact).filter_by(email='[email protected]').all() for contact in contacts: print(contact.firstname)
  29. Alembic Basé sur SQLAlchemy Utilise un environnement de Migration Fichier

    de configuration INI Scripts de migration http://alembic.readthedocs.org/
  30. Alembic cd votreproject alembic init migrations > tree migrations/ !""

    README !"" alembic.ini !"" env.py !"" script.py.mako #"" versions
  31. Alembic revision = '22630db6f519' down_revision = None from alembic import

    op import sqlalchemy as sa def upgrade(): op.create_table('user', sa.Column('id', sa.Integer(), nullable=False), sa.Column('created_at', sa.DateTime(), nullable=False), sa.Column('name', sa.String(length=255), nullable=False), sa.Column('email', sa.String(length=255), nullable=True), sa.Column('password', sa.String(length=255), nullable=True), sa.Column('active', sa.Boolean(), nullable=True), sa.Column('confirmed_at', sa.DateTime(), nullable=True), sa.PrimaryKeyConstraint('id'), sa.UniqueConstraint('email') ) def downgrade(): op.drop_table('user') alembic revision -m “bootstrap" alembic upgrade head
  32. Alembic revision = '7e4b6a43e6c' down_revision = '22630db6f519' from alembic import

    op import sqlalchemy as sa def upgrade(): op.add_column('user', sa.Column('slug', sa.Unicode(length=255), nullable=False, server_default='')) def downgrade(): op.drop_column('user', 'slug') alembic revision -m “add_slug” alembic upgrade head
  33. Alembic > tree migrations/ !"" README !"" alembic.ini !"" env.py

    !"" script.py.mako #"" versions !"" 22630db6f519_bootstrap.py #"" 7e4b6a43e6c_add_slug.py
  34. Alembic from alembic import op from sqlalchemy.orm import Session from

    youproject import models def upgrade(): engine = op.get_bind() session = Session(bind=engine) for contact in session.query(models.Contact).all(): session.add( models.WifiUser( login=contact.name.lower(), password=random_string() ) ) session.commit() alembic upgrade head
  35. Alembic alembic revision alembic upgrade head alembic upgrade +2 alembic

    downgrade -1 alembic downgrade base alembic current alembic history
  36. Rappel: Foreign Data Wrappers Utiliser une source externe à PostgreSQL

    Twitter, RSS, CSV, XML FileSystem Processes PostgreSQL, Oracle, MongoDB
  37. declarations for dynamic loading */ _MODULE_MAGIC; G_FUNCTION_INFO_V1(mongo_fdw_handler); PG_FUNCTION_INFO_V1(mongo_fdw_validator); /* *

    mongo_fdw_handler creates and returns a struct with pointers to foreign table * callback functions. */ Datum mongo_fdw_handler(PG_FUNCTION_ARGS) { FdwRoutine *fdwRoutine = makeNode(FdwRoutine); fdwRoutine->GetForeignRelSize = MongoGetForeignRelSize; fdwRoutine->GetForeignPaths = MongoGetForeignPaths; fdwRoutine->GetForeignPlan = MongoGetForeignPlan; fdwRoutine->ExplainForeignScan = MongoExplainForeignScan; fdwRoutine->BeginForeignScan = MongoBeginForeignScan; fdwRoutine->IterateForeignScan = MongoIterateForeignScan; fdwRoutine->ReScanForeignScan = MongoReScanForeignScan; fdwRoutine->EndForeignScan = MongoEndForeignScan; fdwRoutine->AnalyzeForeignTable = MongoAnalyzeForeignTable; PG_RETURN_POINTER(fdwRoutine);
  38. Multicorn • Extension PostgreSQL • Permet d’écrire un Foreign Data

    Wrapper • Wrapper Python de l’API C de PostgreSQL • Tous les outils utilisant SQL peuvent utiliser un FDW • Support FULL SQL ;-)
  39. FDW de base • RSS, CSV, XML • LDAP •

    Gmail, IMAP • Google Search • SQLAlchemy (mysql, sqlite, oracle, …)
  40. Simple exemple (2) CREATE SERVER wirtel_be_srv FOREIGN DATA WRAPPER multicorn

    OPTIONS ( wrapper 'multicorn.rssfdw.RssFdw' ); CREATE FOREIGN TABLE wirtel_be_rss ( "pubDate" TIMESTAMP, description CHARACTER VARYING, title CHARACTER VARYING, link CHARACTER VARYING ) SERVER wirtel_be_srv OPTIONS( url 'http://wirtel.be/feeds/python.rss.xml' );
  41. Simple exemple (3) SELECT "pubDate", title, link FROM wirtel_be_rss LIMIT

    1; -[ RECORD 1 ]------------------------------------------------------------------------ pubDate | 2014-10-19 00:00:00 title | Python @ FOSDEM 2015 - Call For Proposals link | http://wirtel.be/posts/en/2014/10/19/python-fosdem-2015-call-for-proposals/
  42. avec OpenERP from multicorn import ForeignDataWrapper import erppeek class OpenERPForeignDataWrapper(ForeignDataWrapper):

    def __init__(self, options, columns): super(OpenERPForeignDataWrapper, self).__init__(options, columns) url = 'http://{hostname}:{password}'.format(**options) self.client = erppeek.Client(url, options['database'], options['username'], options['password']) self.object_name = options['object'] def execute(self, quals, columns): proxy = self.client.model(self.object_name) item = {} for record in proxy.browse([]): for column in columns: item[column] = record[column] yield item
  43. avec OpenERP (2) CREATE EXTENSION multicorn; CREATE SERVER multicorn_openerp FOREIGN

    DATA WRAPPER multicorn OPTIONS (wrapper 'multicorn.openerpfdw.OpenERPForeignDataWrapper'); CREATE FOREIGN TABLE oe_users ( login character varying, name character varying) SERVER multicorn_openerp OPTIONS ( hostname 'localhost', port '8069', database 'openerp', username 'admin', password 'admin', object 'res.users' ); SELECT id, login, name, active FROM oe_users WHERE login = 'admin';
  44. PL/Python Python 2 ou Python 3 Utilise toutes les librairies

    Python (PyPI) Apprentissage plus rapide que PL/pgSQL
  45. PL/Python CREATE OR REPLACE FUNCTION str_title(s VARCHAR) RETURNS VARCHAR AS

    $$ return s.title() $$ LANGUAGE plpythonu; demo=# select str_title('hello world'); -[ RECORD 1 ]---------- str_title | Hello World
  46. PL/Python - DataT ypes PostgreSQL Python integer, bigint int, long

    boolean bool text types str SQL Array list Custom Types dict
  47. PL/Python - Debug Fonctions de base pour afficher les messages

    • plpy.notice(“<msg>”) • plpy.debug(“<msg>”) • plpy.error(“<msg>”) • plpy.fatal(“<msg>”) Oublier le ”print”
  48. PL/Python CREATE OR REPLACE FUNCTION get_pid_cpu_mem(pid INT) RETURNS TABLE(pid INT,

    cpu_perc FLOAT, mem_perc FLOAT) AS $$ import psutil process = psutil.Process(pid) return [ { 'pid': pid, 'cpu_perc': process.get_cpu_percent(interval=0), 'mem_perc': process.get_memory_percent() } ] $$ LANGUAGE plpythonu;
  49. PL/Python -[ RECORD 1 ]----+--------------------------------- datid | 16416 datname |

    demo pid | 14680 usesysid | 16384 usename | stephane application_name | psql client_addr | client_hostname | client_port | -1 backend_start | 2014-10-25 04:22:08.235532+02 xact_start | 2014-10-25 04:25:28.712812+02 query_start | 2014-10-25 04:25:28.712812+02 state_change | 2014-10-25 04:25:28.712815+02 waiting | f state | active query | select * from pg_stat_activity ; SELECT * FROM pg_stat_activity;
  50. PL/Python pid | cpu_perc | mem_perc -------+----------+-------------- 14680 | 0

    | 1.4454081372 (1 row) SELECT * FROM get_pid_cpu_mem(14680);
  51. PL/Python pid | cpu_perc | mem_perc | application_name -------+----------+---------------+------------------ 14680

    | 0 | 1.50435626678 | psql (1 row) WITH stats AS ( SELECT psa.*, get_pid_cpu_mem(psa.pid) as attrs from pg_stat_activity psa ) SELECT (stats.attrs).pid, (stats.attrs).cpu_perc, (stats.attrs).mem_perc, stats.application_name FROM stats;
  52. PL/Python - Attention Unsafe -> pas de sandbox Difficile à

    maintenir et à debugger Comment rendre fou votre DBA ;-) Pas de virtualenv et demande les privilèges superuser.
  53. PL/Python - Pistes ;-) Traitement sur grosses masses de données

    Ajouter des contraintes fortes et logique dans le système. Ajouter de nouvelles fonctionnalités dans PostgreSQL “str_title” Triggers ? Utilisation des libs de PyPI (requests, redis, zmq, smtp…) Accès complet à la base de données !!!