Party Like it's ANSI 1999: Using SQL from Python Applications

Party Like it's ANSI 1999: Using SQL from Python Applications

SQL is back! Well it never went away. Relational database systems development is accelerating more so than ever before. To keep up and access the latest features from your application, you have to write SQL directly, bypassing ORMs. But how do you do that without filling your code with SQL strings?

4553a14dba1de0cb759092543a124e32?s=128

HUSSTECH

June 09, 2018
Tweet

Transcript

  1. Party Like it’s ANSI 1999 1

  2. I used to work at an Investment Bank 2

  3. ...definitely not this! 3

  4. ...more like this 4

  5. ...specifically 5

  6. Hi I’m Huss Co-founder & CTO @ 9fin Investment Banking

    Technology • Equities Trading • Bespoke Trader Desk tools • Risk and Control systems Aeronautical Engineering • Lots of aeroplane stuff 9fin.com @9finHQ 6 @HUSSTECH
  7. American National Standards Institute We’re at SQL: 2016… but also,

    I like puns This better be fun…. 7 Party Like it’s ANSI 1999
  8. Let’s talk about Databases • Relational DBMS specifically • A

    long history since the early 1970s 8 Berkley Ingress Sybase SQL Server - fork - PostgreSQL MS SQL Server MySQL mSQL 1970s 1980s 1990s PEP 248: DB 1.0 API (1996)
  9. Software Archeology: BigCo EE • Silo groups with big databases

    • Technology functions replicated for each system • System of record and final destination for all data ◦ So was treated very carefully • Specific SQL developers and DBAs (Database Administrators aka wizards) 9 Sales Finance Inventory
  10. Software Archeology: The “fat client” • Direct connection from UI

    to Database • Tightly coupled ◦ Roles/Permissions were on the Database ◦ Database down = application down • Not just a pattern in the enterprise • Web Servers were also multi-tasking ◦ Hosting, executing app code, database calls 10 Application
  11. Software Archeology: Three Tier • Intermediate layer between UI/Client and

    Datastore ◦ Server, Cache, Proxy, another client ◦ Typically endpoints on enterprise message bus or direct HTTP ◦ Decoupled the data from the application layer • Evolved further today with microservices and mesh like layers • Decoupling helped an explosion in dynamic languages with rich tooling for web development 11 Application Server
  12. The Problem with State • Application code has been living

    in source control for decades now ◦ Code is stateless ◦ Data only passes through it ◦ Arbitrary progression backwards and forwards through time ◦ Amazing tooling and support • Source control of a database still unsolved problem ◦ Database holds code as well as data ◦ Data is resident in the database ◦ Points in time are linked to a database state ◦ Some good solutions which are source control for the structure of the database exist • So it’s best to assume nothing of the database and treat it as a dumb container? 12 Application nope! Source Control Source Control
  13. SQL and Apps Today: Object Relational Mapping • Model your

    entities using an ORM library for your application • Takes care of data retrieval and persistence • Great for bootstrapping ◦ Focus on app code not storage ◦ Strong control of your business objects • Rich libraries in many languages 13 Application runtime Models ORM Backend DB Driver
  14. SQL and Apps Today: We can do more! • It

    is a high performance machine...why not use it? ◦ Make use of advanced data structures ▪ JSON, XML, Hstore, IP, Geo… ◦ Do your computation where the data is ▪ Aggregates, pivots, windowing ▪ Save bandwidth ◦ Enforce business rules ▪ Guarantee the integrity of your data ▪ Asynchronously act on events • How can we use all of this in a simple and easy way from our Python code? • How do we solve the source control problem? 14
  15. SQL and Apps Today: JSON Example 15 { "forms": [

    { "url": "...api/v2/pokemon-form/25/", "name": "pikachu" } ], "abilities": [ { "slot": 3, "is_hidden": true, "ability": { "url": ".../api/v2/ability/31/", "name": "lightning-rod" } }, ... SELECT jsonb_array_elements( data->'abilities' ) ->'ability' ->>'name' FROM pokemon ability | --------------| lightning-rod | static |
  16. SQLpy • Write SQL *next* to your application code •

    Prepares a ready to execute functools.partial object for you • No query strings dotted all over source files • A light wrapper around your DB2.0 API • Genesis from YeSQL 16 In source control! queries.sql Application runtime Whatever you want DB Driver Source Control
  17. That awkward moment when you need SQL 17 # set

    up 2 query objects one for subquery one for window function subquery = db.session.query(func.DATE(Payment.payment_date).label('payme nt_date'), func.sum(Payment.amount).label('amount') ).group_by(func.DATE(Payment.payment_date))\ .order_by(func.DATE(Payment.payment_date))\ .subquery('s') window = db.session.query(subquery.c.payment_date, subquery.c.amount, func.sum(subquery.c.amount).over( order_by=subquery.c.payment_date) ).select_entity_from(subquery).order_by(subquery.c.payment_date) • A nested data question • 10+ (readable) lines • This is quite nice syntax actually • But the semantics are halfway between Python object orientation and SQL declarative style How to do the same in SQL?
  18. That awkward moment when you need SQL 18 How to

    do the same in SQL? SELECT payment_date, amount, sum(amount) OVER (ORDER BY payment_date) FROM ( SELECT CAST(payment_date AS DATE) AS payment_date, SUM(amount) AS amount FROM payment GROUP BY CAST(payment_date AS DATE)) p ORDER BY payment_date; from itertools import accumulate # List of dates and amount pairs payments = [(payment_date, amount)] for (pay_dt, amt), cum_pay in zip(payments, accumulate(payments)): print(f'{pay_dt} {amt} {cum_pay}') Easier in Python!
  19. SQLpy + Models • Still like linking behaviours and actions

    via objects? • Use the DAO pattern - define the models how you want ◦ Add extra methods to serialize or validate eg. Marshmallow 19 class Hello: def __init__(self, queryfunc, cur, args): self._query = queryfunc self._result = self.result(cur, args) @property def result(self): return self._result @result.setter def result(self, cur, args): self._result = self._query(cur, args) def refresh(self, cur, args): return self._query(cur, args) def jsonout(self): return json.dumps(self.result)
  20. SQLpy + Composition • Constructing SQL queries is legitimately dangerous!

    • How to do this in SQLpy, but not destroy the valid SQL syntax in queries file? • Answer: BUILT SQL ◦ Clauses added based on parameters passed to the wrapping function ◦ Transparent and lightweight • Query still passes through the DB2 .0 API library sanitiser for safe execution 20
  21. SQLpy + Composition 21 -- name: built_sql_query$ SELECT * FROM

    messages WHERE 1=1 AND id = %(id_low)s OR id = %(id_high)s AND message = %(msg)s; sql = sqlpy.Queries('queries.sql') .... args = {'id_low': 1} results = sql.BUILT_SQL_QUERY(cur, args) SELECT * FROM messages WHERE 1=1 AND id = 1; id message next_message 1 hello there 2 SQLpy NULL 3 databases rule! 4 hello friend
  22. Beyond web apps: Python taking over! • Multitude of tools

    have made Python the interface language of choice • But the glue has been clunky bash scripts, python scripts, CSV dumps and loads • Why not use use Python all the way through and really integrate your systems and pipelines with one universal language? 22 Database Object Store --> Log Streaming --> Distributed Computation
  23. Data Science • Moving data from A to B is

    a very frequent task ◦ Data cleaning and transformation takes up a much higher portion of our time than we’d like :( ◦ Any efficiency gain on the ingestion and processing pipeline is a win • ORMs sometimes do not fit very well here as there are no “entities” to model ◦ We’re experimenting: cutting, pivoting, adding and dropping attributes of interest on the fly • Use SQLpy to have your data modelling language next to your computation language 23
  24. Data Science - SQL is coming for you • Next

    generation datastores are cloud native, distributed and big in scale • They’re not traditionally relational ◦ But many are converging on SQL as their data modelling language ◦ Not full SQL standard, but most common operations present ◦ No traditional relational Database ORM support • All provide Python libraries that expose their query language interfaces ◦ kSQL (kafka) ◦ sparkSQL (Apache Spark) ◦ BigQuery (Google Cloud) ◦ Athena (AWS S3) • Let’s modify SQLpy to work with these too 24
  25. Google BigQuery API with SQL • Since SQLpy has no

    opinion on the backend datastore interface mechanism, it just expects a DB 2.0 API compliant cursor object ◦ Make an object defining execute, fetchone, fetchall etc… methods 25 https://github.com/HUSSTECH/talks-code/tree/master/party-like-its-ansi-1999
  26. sqlpy.readthedocs.io • github.com/9fin/sqlpy (MIT) • Made for PostgreSQL first ◦

    Other DBs working in theory, but not tested! • Still <v1.0 ◦ ...so critical comments and other contributions welcome v0.3.0 stable today! (develop-0-3-0 branch) 26 /9fin/sqlpy
  27. Thank You! References and Credits [slide#] [3] image: By Own

    Oil Industry News - Own Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=8266714 [4] image: CC0 license [5] image: An unnamed colleague [8] Database timeline based on: https://hpi.de/naumann/projects/rdbms-genealogy.html [15] PokeAPI for json data: https://pokeapi.co/docsv2/ [24] ksql: https://www.confluent.io/product/ksql/ [24] SparkSQL: https://spark.apache.org/sql/ [24] BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators [24] Athena: https://aws.amazon.com/athena/ [25] Google BigQuery sample applciation: https://cloud.google.com/bigquery/create-simple-app-api 27