Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Python & PostgreSQL - A Wonderful Wedding (English)

Python & PostgreSQL - A Wonderful Wedding (English)

Python and PostgreSQL, two tools we like to use for our projects but do you know everything about them?

The talk will give an overview of psycopg2, Peewee, SQLAlchemy, Alembic and PL/Python, these libraries can be used with PostgreSQL.

* psycopg2, the well known connector, this basic component is really useful, well documented and battle-tested and used by the most famous toolkits of the Python ecosystem.
* Peewee is a basic but awesome Object Relational Mapper.
* SQLAlchemy, a Python SQL toolkit and Object Relational Mapper, you can use this library to create your models and interact with them.
* Alembic, a lightweight database migration tool for usage with SQLAlchemy, allows to create some migration scripts for your project.
* PL/Python, a procedural language for PostgreSQL, allows to write functions in the Python language.
* MultiCorn, a Foreign Data Wrapper in Python

Stéphane Wirtel

January 31, 2015
Tweet

More Decks by Stéphane Wirtel

Other Decks in Programming

Transcript

  1. Python, what is it? Programming language Clear and easy syntax

    Interpreted Multi-plateforms Dynamic Typing Multi-paradigms Garbage Collector
  2. PostgreSQL - Story Developed at Berkeley Rewriting of Ingres 1985

    -> Postgres (come from Post-Ingres) 1995 -> Postgres95 (0.01) (Add the SQL Interpreter) 1997 -> PostgreSQL 6.0 2013 -> PostgreSQL 9.3 2014 -> PostgreSQL 9.4 (b4)
  3. PostgreSQL - TL;DR ORDBMS, ACID SQL:2011 DataTypes Transactional DDL Concurrent

    Index Extensions Common Table Expression MultiVersion Concurrency Control Cross-Platform
  4. PostgreSQL - TL;DR Replication Foreign Data Wrappers Procedural Languages Triggers

    Full text Search Views (Materialized) Table Inheritance Listen/Notify
  5. PostgreSQL, FDW ? Allow to use external resources from PostgreSQL

    Twitter, RSS, CSV, XML FileSystem Processes
  6. PostgreSQL, PL ? PL => Procedural Languages Stored Functions Safe

    (sandbox, SQL, PL/pgSQL) / Unsafe (C) PL/Python, PL/V8, PL/PERL
  7. DB-API 2.0 aka PEP-249 API for the Database Engines Connectors

    Easy to use and to understand Use 2 concepts: • Connection • Cursor http://legacy.python.org/dev/peps/pep-0249/
  8. Connection Wrap a database connection Handle the transactions and their

    internal state (commit and rollback) You cant’ execute SQL queries.
  9. Connection - Example import driver conn = driver.connect(database='database', host='localhost', port=5432)

    try: # create the cursor # use the cursor except Exception: conn.rollback() else: conn.commit() conn.close()
  10. Cursor Created from an existing connection Used for DML (Database

    Manipulation Language), INSERT, SELECT, UPDATE, DELETE cursor = conn.cursor() cursor.execute(""" SELECT column1, column2 FROM tableA """) for column1, column2 in cursor.fetchall(): print(column1, column2)
  11. Cursor - API • callproc(procname[, parameters]) • execute(query[, parameters]) •

    fetchone(), fetchmany([size=cursor.arraysize]), fetchall() • close()
  12. Cursor - execute execute(query[, parameters]) Performance and Security NEVER use

    the string interpolation (%) and the concatenation (+) => SQL Injection
  13. Cursor - execute (2) Accept string formatting for the request.

    qmark Question mark WHERE field = ? numeric Numeric positional WHERE field = :1 named Named WHERE field = :code format ANSI C print format WHERE field = %s pyformat Python format WHERE field = %(name)s
  14. “Advanced” Example ;-) import driver conn = driver.connect(database='database', host='localhost', port=5432)

    cursor = conn.cursor() try: cursor.execute("SELECT column1, column2 FROM tableA") for column1, column2 in cursor.fetchall(): print(column1, column2) except Exception: conn.rollback() else: conn.commit() finally: cursor.close() conn.close()
  15. Introduction PostgreSQL Adaptor, based on libpq (C lib for PostgreSQL)

    DB-API 2.0 Compliant Multi-thread / Thread-safe Connection Pool Full Asynchronous, Coroutines Support ALL the data types(Json, Hstore, …) Python 2.5+, 3.1+, PostgreSQL 7.4+
  16. SQL Requests cursor.execute("SELECT * FROM table") cursor.execute("INSERT INTO table (field1,

    field2) VALUES (%s, %s)", (field1, field2)) cursor.execute("DELETE FROM table") cursor.execute("UPDATE table SET field1=%s", ('value',)) cursor.execute("CREATE DATABASE database") cursor.execute("DROP DATABASE database")
  17. Example import psycopg2 conn = psycopg2.connect(host='localhost', port=5432, user='username', password='password', database='database')

    cursor = conn.cursor() try: cursor.execute("SELECT column1, column2 FROM tableA") for column1, column2 in cursor.fetchall(): print(column1, column2) except Exception: conn.rollback() else: conn.commit() finally: cursor.close() conn.close()
  18. Example - Context Manager import psycopg2 DSN = dict(host='localhost', port=5432,

    user='username', password='password', database='database') with psycopg2.connect(**DSN) as conn: with conn.cursor() as cursor: cursor.execute("SELECT column1, column2 FROM tableA") for column1, column2 in cursor.fetchall(): print(column1, column2) conn.commit()
  19. Introduction “Object-relational mapping in computer science is a programming technique

    for converting data between incompatible type systems in object- oriented programming languages.” Wikipedia
  20. In brief • Map a Business Class on a SQL

    Table! • Add methods for add, modify, delete and search
  21. Peewee ORM Simple, Easy to understand and extensible Python 2.6+

    et 3.2+ Extensions CLOC = 3200 (MySQL, Sqlite3 et PostgreSQL) Document is very good
  22. Models class BaseModel(peewee.Model): id = peewee.PrimaryKeyField() class Meta: database =

    database class Address(BaseModel): street = peewee.CharField() zipcode = peewee.CharField() country = peewee.CharField() class Contact(BaseModel): firstname = peewee.CharField(null=False) lastname = peewee.CharField(null=False) email = peewee.CharField(unique=True, null=False, index=True) address = peewee.ForeignKeyField(Address, null=False, related_name=‘contacts')
  23. Create with database.transaction(): address = Address.create( street='Avenue Franklin D. Roosevelt

    50', zipcode='1050', country='Brussels') contact = Contact.create( firstname='Stephane', lastname='Wirtel', email='[email protected]', address=address ) print(contact.id)
  24. Read contact = Contact.get(Contact.email == '[email protected]') print(contact.firstname) contacts = Contact.select()

    for contact in contacts: print(contact.firstname) for contact in contacts.where(Contact.email == '[email protected]'): print(contact.firstname)
  25. Delete # Remove an entire collection contacts = Contact.delete().where(Contact.email ==

    '[email protected]') contacts.execute() # Remove one record contact = Contact.get(Contact.email == '[email protected]') contact.delete_instance()
  26. Models from sqlalchemy import Column, Integer, String, ForeignKey from sqlalchemy.ext.declarative

    import declarative_base from sqlalchemy.orm import relationship Base = declarative_base() class Address(Base): __tablename__ = 'address' id = Column(Integer, primary_key=True) street = Column(String) zipcode = Column(String) country = Column(String, nullable=False) class Contact(Base): __tablename__ = 'contact' id = Column(Integer, primary_key=True) firstname = Column(String, nullable=False) lastname = Column(String, nullable=False) email = Column(String, nullable=False) address_id = Column(Integer, ForeignKey(Address.id), nullable=False) address = relationship('Address')
  27. Create session = Session(engine) address = Address(street='SQLAlchemy Street', zipcode='5432', country='Belgique')

    contact = Contact( firstname='Stephane', lastname='Wirtel', email='[email protected]', address=address ) session.add(contact) session.commit()
  28. Read contact = session.query(Contact).filter_by(email='[email protected]').first() print(contact.firstname) contacts = session.query(Contact).all() for contact

    in contacts: print(contact.firstname) contacts = session.query(Contact).filter_by(email='[email protected]').all() for contact in contacts: print(contact.firstname)
  29. Alembic Based on SQLAlchemy Use a Migration environment Configuration File

    INI Migration Scripts http://alembic.readthedocs.org/
  30. Alembic cd votreproject alembic init migrations > tree migrations/ !""

    README !"" alembic.ini !"" env.py !"" script.py.mako #"" versions
  31. Alembic revision = '22630db6f519' down_revision = None from alembic import

    op import sqlalchemy as sa def upgrade(): op.create_table('user', sa.Column('id', sa.Integer(), nullable=False), sa.Column('created_at', sa.DateTime(), nullable=False), sa.Column('name', sa.String(length=255), nullable=False), sa.Column('email', sa.String(length=255), nullable=True), sa.Column('password', sa.String(length=255), nullable=True), sa.Column('active', sa.Boolean(), nullable=True), sa.Column('confirmed_at', sa.DateTime(), nullable=True), sa.PrimaryKeyConstraint('id'), sa.UniqueConstraint('email') ) def downgrade(): op.drop_table('user') alembic revision -m “bootstrap" alembic upgrade head
  32. Alembic revision = '7e4b6a43e6c' down_revision = '22630db6f519' from alembic import

    op import sqlalchemy as sa def upgrade(): op.add_column('user', sa.Column('slug', sa.Unicode(length=255), nullable=False, server_default='')) def downgrade(): op.drop_column('user', 'slug') alembic revision -m “add_slug” alembic upgrade head
  33. Alembic > tree migrations/ !"" README !"" alembic.ini !"" env.py

    !"" script.py.mako #"" versions !"" 22630db6f519_bootstrap.py #"" 7e4b6a43e6c_add_slug.py
  34. Alembic from alembic import op from sqlalchemy.orm import Session from

    youproject import models def upgrade(): engine = op.get_bind() session = Session(bind=engine) for contact in session.query(models.Contact).all(): session.add( models.WifiUser( login=contact.name.lower(), password=random_string() ) ) session.commit() alembic upgrade head
  35. Alembic alembic revision alembic upgrade head alembic upgrade +2 alembic

    downgrade -1 alembic downgrade base alembic current alembic history
  36. Reminder: Foreign Data Wrappers Use an external resource from PostgreSQL

    Twitter, RSS, CSV, XML FileSystem Processes PostgreSQL, Oracle, MongoDB
  37. declarations for dynamic loading */ _MODULE_MAGIC; G_FUNCTION_INFO_V1(mongo_fdw_handler); PG_FUNCTION_INFO_V1(mongo_fdw_validator); /* *

    mongo_fdw_handler creates and returns a struct with pointers to foreign table * callback functions. */ Datum mongo_fdw_handler(PG_FUNCTION_ARGS) { FdwRoutine *fdwRoutine = makeNode(FdwRoutine); fdwRoutine->GetForeignRelSize = MongoGetForeignRelSize; fdwRoutine->GetForeignPaths = MongoGetForeignPaths; fdwRoutine->GetForeignPlan = MongoGetForeignPlan; fdwRoutine->ExplainForeignScan = MongoExplainForeignScan; fdwRoutine->BeginForeignScan = MongoBeginForeignScan; fdwRoutine->IterateForeignScan = MongoIterateForeignScan; fdwRoutine->ReScanForeignScan = MongoReScanForeignScan; fdwRoutine->EndForeignScan = MongoEndForeignScan; fdwRoutine->AnalyzeForeignTable = MongoAnalyzeForeignTable; PG_RETURN_POINTER(fdwRoutine);
  38. Multicorn • PostgreSQL Extension • Allow to write a Foreign

    Data Wrapper in Python • You can use your favourites SQL tools with FDW • Support FULL SQL ;-)
  39. Examples of Basic FDW • RSS, CSV, XML • LDAP

    • Gmail, IMAP • Google Search • SQLAlchemy (mysql, sqlite, oracle, …)
  40. Simple Example (2) CREATE SERVER wirtel_be_srv FOREIGN DATA WRAPPER multicorn

    OPTIONS ( wrapper 'multicorn.rssfdw.RssFdw' ); CREATE FOREIGN TABLE wirtel_be_rss ( "pubDate" TIMESTAMP, description CHARACTER VARYING, title CHARACTER VARYING, link CHARACTER VARYING ) SERVER wirtel_be_srv OPTIONS( url 'http://wirtel.be/feeds/python.rss.xml' );
  41. Simple Example (3) SELECT "pubDate", title, link FROM wirtel_be_rss LIMIT

    1; -[ RECORD 1 ]------------------------------------------------------------------------ pubDate | 2014-10-19 00:00:00 title | Python @ FOSDEM 2015 - Call For Proposals link | http://wirtel.be/posts/en/2014/10/19/python-fosdem-2015-call-for-proposals/
  42. with Odoo (OpenERP) from multicorn import ForeignDataWrapper import erppeek class

    OdooForeignDataWrapper(ForeignDataWrapper): def __init__(self, options, columns): super(OdooForeignDataWrapper, self).__init__(options, columns) url = 'http://{hostname}:{password}'.format(**options) self.client = erppeek.Client(url, options['database'], options['username'], options['password']) self.object_name = options['object'] def execute(self, quals, columns): proxy = self.client.model(self.object_name) item = {} for record in proxy.browse([]): for column in columns: item[column] = record[column] yield item
  43. with Odoo (OpenERP) (2) CREATE EXTENSION multicorn; CREATE SERVER multicorn_odoo

    FOREIGN DATA WRAPPER multicorn OPTIONS (wrapper 'multicorn.odoofdw.OdooForeignDataWrapper'); CREATE FOREIGN TABLE odoo_users ( login character varying, name character varying) SERVER multicorn_odoo OPTIONS ( hostname 'localhost', port '8069', database 'odoo', username 'admin', password 'admin', object 'res.users' ); SELECT id, login, name, active FROM odoo_users WHERE login = 'admin';
  44. PL/Python Python 2 or Python 3 Use all your favourite

    Python libraries (PyPI) Learning Curve very small PL/pgSQL Untrusted
  45. PL/Python CREATE OR REPLACE FUNCTION str_title(s VARCHAR) RETURNS VARCHAR AS

    $$ return s.title() $$ LANGUAGE plpythonu; demo=# select str_title('hello world'); -[ RECORD 1 ]---------- str_title | Hello World
  46. PL/Python - DataT ypes PostgreSQL Python integer, bigint int, long

    boolean bool text types str SQL Array list Custom Types dict
  47. PL/Python - Debug Built-in Functions • plpy.notice(“<msg>”) • plpy.debug(“<msg>”) •

    plpy.error(“<msg>”) • plpy.fatal(“<msg>”) Don’t use ”print”
  48. PL/Python CREATE OR REPLACE FUNCTION get_pid_cpu_mem(pid INT) RETURNS TABLE(pid INT,

    cpu_perc FLOAT, mem_perc FLOAT) AS $$ import psutil process = psutil.Process(pid) return [ { 'pid': pid, 'cpu_perc': process.get_cpu_percent(interval=0), 'mem_perc': process.get_memory_percent() } ] $$ LANGUAGE plpythonu;
  49. PL/Python -[ RECORD 1 ]----+--------------------------------- datid | 16416 datname |

    demo pid | 14680 usesysid | 16384 usename | stephane application_name | psql client_addr | client_hostname | client_port | -1 backend_start | 2014-10-25 04:22:08.235532+02 xact_start | 2014-10-25 04:25:28.712812+02 query_start | 2014-10-25 04:25:28.712812+02 state_change | 2014-10-25 04:25:28.712815+02 waiting | f state | active query | select * from pg_stat_activity ; SELECT * FROM pg_stat_activity;
  50. PL/Python pid | cpu_perc | mem_perc -------+----------+-------------- 14680 | 0

    | 1.4454081372 (1 row) SELECT * FROM get_pid_cpu_mem(14680);
  51. PL/Python pid | cpu_perc | mem_perc | application_name -------+----------+---------------+------------------ 14680

    | 0 | 1.50435626678 | psql (1 row) WITH stats AS ( SELECT psa.*, get_pid_cpu_mem(psa.pid) as attrs from pg_stat_activity psa ) SELECT (stats.attrs).pid, (stats.attrs).cpu_perc, (stats.attrs).mem_perc, stats.application_name FROM stats;
  52. PL/Python - Attention Unsafe -> pas de sandbox Difficult to

    maintain and to debug Your DBA will become Crazy ;-) No virtualenv and ask the superuser privileges.
  53. PL/Python - Pistes ;-) Treatment of big results Add hard

    constraints and business logic in the system. Add new features to PostgreSQL (see “str_title”) Triggers ? Use external libs from PyPI (requests, redis, zeromq, smtp…) Full Access to the database !!!