Party Like it's ANSI 1999: Using SQL from Python Applications

Party Like it’s ANSI 1999 1

I used to work at an Investment Bank 2

...definitely not this! 3

...more like this 4

...specifically 5

Hi I’m Huss Co-founder & CTO @ 9fin Investment Banking
Technology • Equities Trading • Bespoke Trader Desk tools • Risk and Control systems Aeronautical Engineering • Lots of aeroplane stuff 9fin.com @9finHQ 6 @HUSSTECH

American National Standards Institute We’re at SQL: 2016… but also,
I like puns This better be fun…. 7 Party Like it’s ANSI 1999

Let’s talk about Databases • Relational DBMS specifically • A
long history since the early 1970s 8 Berkley Ingress Sybase SQL Server - fork - PostgreSQL MS SQL Server MySQL mSQL 1970s 1980s 1990s PEP 248: DB 1.0 API (1996)

Software Archeology: BigCo EE • Silo groups with big databases
• Technology functions replicated for each system • System of record and final destination for all data ◦ So was treated very carefully • Specific SQL developers and DBAs (Database Administrators aka wizards) 9 Sales Finance Inventory

Software Archeology: The “fat client” • Direct connection from UI
to Database • Tightly coupled ◦ Roles/Permissions were on the Database ◦ Database down = application down • Not just a pattern in the enterprise • Web Servers were also multi-tasking ◦ Hosting, executing app code, database calls 10 Application

Software Archeology: Three Tier • Intermediate layer between UI/Client and
Datastore ◦ Server, Cache, Proxy, another client ◦ Typically endpoints on enterprise message bus or direct HTTP ◦ Decoupled the data from the application layer • Evolved further today with microservices and mesh like layers • Decoupling helped an explosion in dynamic languages with rich tooling for web development 11 Application Server

The Problem with State • Application code has been living
in source control for decades now ◦ Code is stateless ◦ Data only passes through it ◦ Arbitrary progression backwards and forwards through time ◦ Amazing tooling and support • Source control of a database still unsolved problem ◦ Database holds code as well as data ◦ Data is resident in the database ◦ Points in time are linked to a database state ◦ Some good solutions which are source control for the structure of the database exist • So it’s best to assume nothing of the database and treat it as a dumb container? 12 Application nope! Source Control Source Control

SQL and Apps Today: Object Relational Mapping • Model your
entities using an ORM library for your application • Takes care of data retrieval and persistence • Great for bootstrapping ◦ Focus on app code not storage ◦ Strong control of your business objects • Rich libraries in many languages 13 Application runtime Models ORM Backend DB Driver

SQL and Apps Today: We can do more! • It
is a high performance machine...why not use it? ◦ Make use of advanced data structures ▪ JSON, XML, Hstore, IP, Geo… ◦ Do your computation where the data is ▪ Aggregates, pivots, windowing ▪ Save bandwidth ◦ Enforce business rules ▪ Guarantee the integrity of your data ▪ Asynchronously act on events • How can we use all of this in a simple and easy way from our Python code? • How do we solve the source control problem? 14

SQL and Apps Today: JSON Example 15 { "forms": [
{ "url": "...api/v2/pokemon-form/25/", "name": "pikachu" } ], "abilities": [ { "slot": 3, "is_hidden": true, "ability": { "url": ".../api/v2/ability/31/", "name": "lightning-rod" } }, ... SELECT jsonb_array_elements( data->'abilities' ) ->'ability' ->>'name' FROM pokemon ability | --------------| lightning-rod | static |

SQLpy • Write SQL *next* to your application code •
Prepares a ready to execute functools.partial object for you • No query strings dotted all over source files • A light wrapper around your DB2.0 API • Genesis from YeSQL 16 In source control! queries.sql Application runtime Whatever you want DB Driver Source Control

That awkward moment when you need SQL 17 # set
up 2 query objects one for subquery one for window function subquery = db.session.query(func.DATE(Payment.payment_date).label('payme nt_date'), func.sum(Payment.amount).label('amount') ).group_by(func.DATE(Payment.payment_date))\ .order_by(func.DATE(Payment.payment_date))\ .subquery('s') window = db.session.query(subquery.c.payment_date, subquery.c.amount, func.sum(subquery.c.amount).over( order_by=subquery.c.payment_date) ).select_entity_from(subquery).order_by(subquery.c.payment_date) • A nested data question • 10+ (readable) lines • This is quite nice syntax actually • But the semantics are halfway between Python object orientation and SQL declarative style How to do the same in SQL?

That awkward moment when you need SQL 18 How to
do the same in SQL? SELECT payment_date, amount, sum(amount) OVER (ORDER BY payment_date) FROM ( SELECT CAST(payment_date AS DATE) AS payment_date, SUM(amount) AS amount FROM payment GROUP BY CAST(payment_date AS DATE)) p ORDER BY payment_date; from itertools import accumulate # List of dates and amount pairs payments = [(payment_date, amount)] for (pay_dt, amt), cum_pay in zip(payments, accumulate(payments)): print(f'{pay_dt} {amt} {cum_pay}') Easier in Python!

SQLpy + Models • Still like linking behaviours and actions
via objects? • Use the DAO pattern - define the models how you want ◦ Add extra methods to serialize or validate eg. Marshmallow 19 class Hello: def __init__(self, queryfunc, cur, args): self._query = queryfunc self._result = self.result(cur, args) @property def result(self): return self._result @result.setter def result(self, cur, args): self._result = self._query(cur, args) def refresh(self, cur, args): return self._query(cur, args) def jsonout(self): return json.dumps(self.result)

SQLpy + Composition • Constructing SQL queries is legitimately dangerous!
• How to do this in SQLpy, but not destroy the valid SQL syntax in queries file? • Answer: BUILT SQL ◦ Clauses added based on parameters passed to the wrapping function ◦ Transparent and lightweight • Query still passes through the DB2 .0 API library sanitiser for safe execution 20

SQLpy + Composition 21 -- name: built_sql_query$ SELECT * FROM
messages WHERE 1=1 AND id = %(id_low)s OR id = %(id_high)s AND message = %(msg)s; sql = sqlpy.Queries('queries.sql') .... args = {'id_low': 1} results = sql.BUILT_SQL_QUERY(cur, args) SELECT * FROM messages WHERE 1=1 AND id = 1; id message next_message 1 hello there 2 SQLpy NULL 3 databases rule! 4 hello friend

Beyond web apps: Python taking over! • Multitude of tools
have made Python the interface language of choice • But the glue has been clunky bash scripts, python scripts, CSV dumps and loads • Why not use use Python all the way through and really integrate your systems and pipelines with one universal language? 22 Database Object Store --> Log Streaming --> Distributed Computation

Data Science • Moving data from A to B is
a very frequent task ◦ Data cleaning and transformation takes up a much higher portion of our time than we’d like :( ◦ Any efficiency gain on the ingestion and processing pipeline is a win • ORMs sometimes do not fit very well here as there are no “entities” to model ◦ We’re experimenting: cutting, pivoting, adding and dropping attributes of interest on the fly • Use SQLpy to have your data modelling language next to your computation language 23

Data Science - SQL is coming for you • Next
generation datastores are cloud native, distributed and big in scale • They’re not traditionally relational ◦ But many are converging on SQL as their data modelling language ◦ Not full SQL standard, but most common operations present ◦ No traditional relational Database ORM support • All provide Python libraries that expose their query language interfaces ◦ kSQL (kafka) ◦ sparkSQL (Apache Spark) ◦ BigQuery (Google Cloud) ◦ Athena (AWS S3) • Let’s modify SQLpy to work with these too 24

Google BigQuery API with SQL • Since SQLpy has no
opinion on the backend datastore interface mechanism, it just expects a DB 2.0 API compliant cursor object ◦ Make an object defining execute, fetchone, fetchall etc… methods 25 https://github.com/HUSSTECH/talks-code/tree/master/party-like-its-ansi-1999

sqlpy.readthedocs.io • github.com/9fin/sqlpy (MIT) • Made for PostgreSQL first ◦
Other DBs working in theory, but not tested! • Still <v1.0 ◦ ...so critical comments and other contributions welcome v0.3.0 stable today! (develop-0-3-0 branch) 26 /9fin/sqlpy

Thank You! References and Credits [slide#] [3] image: By Own
Oil Industry News - Own Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=8266714 [4] image: CC0 license [5] image: An unnamed colleague [8] Database timeline based on: https://hpi.de/naumann/projects/rdbms-genealogy.html [15] PokeAPI for json data: https://pokeapi.co/docsv2/ [24] ksql: https://www.confluent.io/product/ksql/ [24] SparkSQL: https://spark.apache.org/sql/ [24] BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators [24] Athena: https://aws.amazon.com/athena/ [25] Google BigQuery sample applciation: https://cloud.google.com/bigquery/create-simple-app-api 27

Party Like it's ANSI 1999: Using SQL from Pytho...

Party Like it's ANSI 1999: Using SQL from Python Applications

HUSSTECH

More Decks by HUSSTECH

Other Decks in Programming

Featured

Transcript

Party Like it’s ANSI 1999 1

I used to work at an Investment Bank 2

...definitely not this! 3

...more like this 4

...specifically 5

Hi I’m Huss Co-founder & CTO @ 9fin Investment Banking

American National Standards Institute We’re at SQL: 2016… but also,

Let’s talk about Databases • Relational DBMS specifically • A

Software Archeology: BigCo EE • Silo groups with big databases

Software Archeology: The “fat client” • Direct connection from UI

Software Archeology: Three Tier • Intermediate layer between UI/Client and

The Problem with State • Application code has been living

SQL and Apps Today: Object Relational Mapping • Model your

SQL and Apps Today: We can do more! • It

SQL and Apps Today: JSON Example 15 { "forms": [

SQLpy • Write SQL next to your application code •

That awkward moment when you need SQL 17 # set

That awkward moment when you need SQL 18 How to

SQLpy + Models • Still like linking behaviours and actions

SQLpy + Composition • Constructing SQL queries is legitimately dangerous!

SQLpy + Composition 21 -- name: built_sql_query$ SELECT * FROM

Beyond web apps: Python taking over! • Multitude of tools

Data Science • Moving data from A to B is

Data Science - SQL is coming for you • Next

Google BigQuery API with SQL • Since SQLpy has no

sqlpy.readthedocs.io • github.com/9fin/sqlpy (MIT) • Made for PostgreSQL first ◦

Thank You! References and Credits [slide#] [3] image: By Own