Software Archeology ● BigCo EE: Silo groups with big databases ● “Fat Client” architecture ● 3 tier relatively recent ○ Typically endpoints on enterprise message bus ○ SOAP! 8
BigCo Org Structure ● Hierarchical ● Long communication and decision chains ● “shared services” idea emerges 2008+ ● DBAs and Sysadmins moved here ○ Away from business aligned teams ● Now hard to diagnose issues quickly ○ Everyone is not sitting together anymore 9
WebCo 2.0 ● Flatter structure - mixed teams, practitioners of many technologies ● Databases are out-of-the-box “good enough” ○ Heroku, RDS, et al ● The DBA function merged into Sr. Devs usually ● Explosion in dynamic languages with rich tooling for web development 10
Object Relational Mapping ● Model your entities using an ORM library for your application ● Takes care of data retrieval and persistence ● Great for bootstrapping ○ Focus on app code not storage ○ Strong control of your business objects ● Rich libraries in many languages 11
That awkward moment when you need SQL ● At some point you’re going to want to take control of the query generation ○ Performance tune ○ Force an index use ○ Use advanced DB features ■ WITH RECURSIVE, Subqueries/CTEs, PARTITION OVER, JSON ● Raw SQL support in ORMs varies Example: Cumulative total of revenue made at a fictional DVD rental business? 12 from flask import Flask from flask_sqlalchemy import SQLAlchemy from sqlalchemy.sql import func # setup SQLAlchemy and model object app = Flask(__name__) app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://xxx' db = SQLAlchemy(app) class Payment(db.Model): __tablename__ = 'payment' payment_id = db.Column(db.Integer, nullable=False, primary_key=True) rental_id = db.Column(db.SmallInteger, nullable=False) amount = db.Column(db.Numeric(scale=(4, 2), asdecimal=True), nullable=False) payment_date = db.Column(db.DateTime, nullable=False)
That awkward moment when you need SQL 13 # set up two query objects one for a subquery one for the window function subquery = db.session.query(func.DATE(Payment.payment_date).label('payment_date'), func.sum(Payment.amount).label('amount') ).group_by(func.DATE(Payment.payment_date))\ .order_by(func.DATE(Payment.payment_date))\ .subquery('s') window = db.session.query(subquery.c.payment_date, subquery.c.amount, func.sum(subquery.c.amount).over( order_by=subquery.c.payment_date) ).select_entity_from(subquery).order_by(subquery.c.payment_date) ● 10+ (readable) lines ● This is quite nice syntax actually ● But the semantics are halfway between Python object orientation and SQL declarative style How to do the same in SQL? SELECT payment_date, amount, sum(amount) OVER (ORDER BY payment_date) FROM ( SELECT CAST(payment_date AS DATE) AS payment_date, SUM(amount) AS amount FROM payment GROUP BY CAST(payment_date AS DATE)) p ORDER BY payment_date;
SQLpy - a (not) new idea ● Write SQL *next* to your application code ● No query strings dotted all over source files ● A light wrapper around your DB2.0 API ● Bases on YeSQL by Kris Jenkins 14
SQLpy + Functions ● Prefer your app only make calls to stored procedures? ● At least the list of sprocs used is all in one place 16 -- name: cool_proc SELECT public.cool_proc(); -- name: cooler_proc SELECT public.cooler_proc(%s, %s); sql = sqlpy.Queries('queries.sql') .... resultsA = sql.COOL_PROC(cur, 0) resultsB = sql.COOLER_PROC(cur, 0, args)
SQLpy + Composition ● Constructing SQL queries is legitimately dangerous! ● How to do this in SQLpy, but not destroy the valid SQL syntax in queries file? ● BUILT SQL ○ Clauses added based on parameters passed to wrapping function ○ Transparent and lightweight ● Query still passes through the DB2.0 API sanitiser for safe execution 17
SQLpy + Composition ● Constructing SQL queries is legitimately dangerous! ● How to do this in SQLpy, but not destroy the valid SQL syntax in queries file? ● BUILT SQL ○ Clauses added based on parameters passed to wrapping function ○ Transparent and lightweight ● Query still passes through the DB2.0 API sanitiser for safe execution 18 -- name: built_sql_statement$ -- a built up sql statement SELECT * FROM hello WHERE 1=1 AND id = %(id_low)s OR id = %(id_high)s AND message = %(msg)s; sql = sqlpy.Queries('queries.sql') .... kwargs = {'id_low': 1} results = sql.BUILT_SQL_STATEMENT(cur, 0, **kwargs) SELECT * FROM hello WHERE 1=1 AND id = 1;
SQLpy + Your SQL ● Essentially this is the one central idea behind SQLpy ● Let’s play with PostgreSQL ○ Correlate pg_stat_statements with the queries in SQLpy ○ Because you know what was logged (pg_stat_statements) and what was run (sqlpy) ○ See how your database is actually being used ○ Critical data for indexing and optimization decisions 19 SQLpy pg_stat_statements REGEX All the query context!
sqlpy.readthedocs.io ● github.com/9fin/sqlpy (MIT) ● Made for PostgreSQL first ○ Other DBs working in theory, but not tested! ● Still ○ ...so critical comments and other contributions welcome ● Check out other ports for YeSQL ○ Js, Go, C#, Ruby, Erlang, Clojure, PHP ○ github.com/krisajenkins/yesql/#othe r-languages 20
Thank You! References and Credits [slide#] [3] image: By Raiffeisenverband Salzburg reg. Gen. m. b. H., Schwarzstr. 13-15, 5024 Salzburg - Raiffeisenverband Salzburg reg. Gen. m. b. H., Schwarzstr. 13-15, 5024 Salzburg (transferred from de:Image:RVS Handelsraum.jpg), CC BY 2.0 at, https://commons.wikimedia.org/w/index.php?curid=5009720 [4] image: CC0 license [5] image: An unnamed colleague 21