Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Party Like it's ANSI 1999: Using SQL from Python Applications

Party Like it's ANSI 1999: Using SQL from Python Applications

SQL is back! Well it never went away. Relational database systems development is accelerating more so than ever before. To keep up and access the latest features from your application, you have to write SQL directly, bypassing ORMs. But how do you do that without filling your code with SQL strings?

HUSSTECH

June 09, 2018
Tweet

More Decks by HUSSTECH

Other Decks in Programming

Transcript

  1. Party Like it’s ANSI 1999
    1

    View full-size slide

  2. I used to work at an Investment Bank
    2

    View full-size slide

  3. ...definitely not this!
    3

    View full-size slide

  4. ...more like this
    4

    View full-size slide

  5. ...specifically
    5

    View full-size slide

  6. Hi I’m Huss
    Co-founder & CTO @ 9fin
    Investment Banking Technology
    ● Equities Trading
    ● Bespoke Trader Desk tools
    ● Risk and Control systems
    Aeronautical Engineering
    ● Lots of aeroplane stuff
    9fin.com
    @9finHQ
    6
    @HUSSTECH

    View full-size slide

  7. American National Standards Institute
    We’re at SQL: 2016…
    but also, I like puns
    This better be fun….
    7
    Party Like it’s ANSI 1999

    View full-size slide

  8. Let’s talk about Databases
    ● Relational DBMS specifically
    ● A long history since the early 1970s
    8
    Berkley Ingress
    Sybase SQL
    Server
    - fork - PostgreSQL
    MS SQL Server
    MySQL
    mSQL
    1970s 1980s 1990s
    PEP 248: DB
    1.0 API (1996)

    View full-size slide

  9. Software Archeology: BigCo EE
    ● Silo groups with big databases
    ● Technology functions replicated for each system
    ● System of record and final destination for all data
    ○ So was treated very carefully
    ● Specific SQL developers and DBAs (Database
    Administrators aka wizards)
    9
    Sales Finance Inventory

    View full-size slide

  10. Software Archeology: The “fat client”
    ● Direct connection from UI to Database
    ● Tightly coupled
    ○ Roles/Permissions were on the Database
    ○ Database down = application down
    ● Not just a pattern in the enterprise
    ● Web Servers were also multi-tasking
    ○ Hosting, executing app code, database calls
    10
    Application

    View full-size slide

  11. Software Archeology: Three Tier
    ● Intermediate layer between UI/Client and Datastore
    ○ Server, Cache, Proxy, another client
    ○ Typically endpoints on enterprise message bus
    or direct HTTP
    ○ Decoupled the data from the application layer
    ● Evolved further today with microservices and mesh
    like layers
    ● Decoupling helped an explosion in dynamic languages
    with rich tooling for web development
    11
    Application
    Server

    View full-size slide

  12. The Problem with State
    ● Application code has been living in source control for decades now
    ○ Code is stateless
    ○ Data only passes through it
    ○ Arbitrary progression backwards and forwards through time
    ○ Amazing tooling and support
    ● Source control of a database still unsolved problem
    ○ Database holds code as well as data
    ○ Data is resident in the database
    ○ Points in time are linked to a database state
    ○ Some good solutions which are source control for the
    structure of the database exist
    ● So it’s best to assume nothing of the database and treat it as a
    dumb container? 12
    Application
    nope!
    Source Control
    Source Control

    View full-size slide

  13. SQL and Apps Today: Object Relational
    Mapping
    ● Model your entities using an ORM library
    for your application
    ● Takes care of data retrieval and
    persistence
    ● Great for bootstrapping
    ○ Focus on app code not storage
    ○ Strong control of your business
    objects
    ● Rich libraries in many languages
    13
    Application runtime
    Models
    ORM
    Backend
    DB Driver

    View full-size slide

  14. SQL and Apps Today: We can do more!
    ● It is a high performance machine...why not use it?
    ○ Make use of advanced data structures
    ■ JSON, XML, Hstore, IP, Geo…
    ○ Do your computation where the data is
    ■ Aggregates, pivots, windowing
    ■ Save bandwidth
    ○ Enforce business rules
    ■ Guarantee the integrity of your data
    ■ Asynchronously act on events
    ● How can we use all of this in a simple and easy way from our Python code?
    ● How do we solve the source control problem?
    14

    View full-size slide

  15. SQL and Apps Today: JSON Example
    15
    {
    "forms": [
    {
    "url": "...api/v2/pokemon-form/25/",
    "name": "pikachu"
    }
    ],
    "abilities": [
    {
    "slot": 3,
    "is_hidden": true,
    "ability": {
    "url": ".../api/v2/ability/31/",
    "name": "lightning-rod"
    }
    },
    ...
    SELECT
    jsonb_array_elements(
    data->'abilities'
    )
    ->'ability'
    ->>'name'
    FROM pokemon
    ability |
    --------------|
    lightning-rod |
    static |

    View full-size slide

  16. SQLpy
    ● Write SQL *next* to your
    application code
    ● Prepares a ready to execute
    functools.partial object
    for you
    ● No query strings dotted all over
    source files
    ● A light wrapper around your
    DB2.0 API
    ● Genesis from YeSQL
    16
    In source control!
    queries.sql
    Application runtime
    Whatever you want
    DB Driver
    Source Control

    View full-size slide

  17. That awkward moment when you need SQL
    17
    # set up 2 query objects one for subquery one for window function
    subquery =
    db.session.query(func.DATE(Payment.payment_date).label('payme
    nt_date'),
    func.sum(Payment.amount).label('amount')
    ).group_by(func.DATE(Payment.payment_date))\
    .order_by(func.DATE(Payment.payment_date))\
    .subquery('s')
    window = db.session.query(subquery.c.payment_date,
    subquery.c.amount,
    func.sum(subquery.c.amount).over(
    order_by=subquery.c.payment_date)
    ).select_entity_from(subquery).order_by(subquery.c.payment_date)
    ● A nested data question
    ● 10+ (readable) lines
    ● This is quite nice syntax
    actually
    ● But the semantics are
    halfway between Python
    object orientation and SQL
    declarative style
    How to do the same in SQL?

    View full-size slide

  18. That awkward moment when you need SQL
    18
    How to do the same in SQL?
    SELECT payment_date, amount, sum(amount) OVER (ORDER BY payment_date)
    FROM ( SELECT CAST(payment_date AS DATE) AS payment_date, SUM(amount)
    AS amount
    FROM payment
    GROUP BY CAST(payment_date AS DATE)) p
    ORDER BY payment_date;
    from itertools import accumulate
    # List of dates and amount pairs
    payments = [(payment_date, amount)]
    for (pay_dt, amt), cum_pay in zip(payments, accumulate(payments)):
    print(f'{pay_dt} {amt} {cum_pay}')
    Easier in Python!

    View full-size slide

  19. SQLpy + Models
    ● Still like linking behaviours and actions via
    objects?
    ● Use the DAO pattern - define the models
    how you want
    ○ Add extra methods to serialize or
    validate eg. Marshmallow
    19
    class Hello:
    def __init__(self, queryfunc, cur, args):
    self._query = queryfunc
    self._result = self.result(cur, args)
    @property
    def result(self):
    return self._result
    @result.setter
    def result(self, cur, args):
    self._result = self._query(cur, args)
    def refresh(self, cur, args):
    return self._query(cur, args)
    def jsonout(self):
    return json.dumps(self.result)

    View full-size slide

  20. SQLpy + Composition
    ● Constructing SQL queries is legitimately dangerous!
    ● How to do this in SQLpy, but not destroy the valid SQL syntax in queries file?
    ● Answer: BUILT SQL
    ○ Clauses added based on parameters passed to the wrapping function
    ○ Transparent and lightweight
    ● Query still passes through the DB2 .0 API library sanitiser for safe execution
    20

    View full-size slide

  21. SQLpy + Composition
    21
    -- name: built_sql_query$
    SELECT * FROM messages
    WHERE 1=1
    AND id = %(id_low)s
    OR id = %(id_high)s
    AND message = %(msg)s;
    sql = sqlpy.Queries('queries.sql')
    ....
    args = {'id_low': 1}
    results = sql.BUILT_SQL_QUERY(cur, args)
    SELECT * FROM messages
    WHERE 1=1
    AND id = 1;
    id message next_message
    1 hello there
    2 SQLpy NULL
    3 databases rule!
    4 hello friend

    View full-size slide

  22. Beyond web apps: Python taking over!
    ● Multitude of tools have made Python the interface language of choice
    ● But the glue has been clunky bash scripts, python scripts, CSV dumps and loads
    ● Why not use use Python all the way through and really integrate your systems and
    pipelines with one universal language?
    22
    Database
    Object
    Store
    --> Log Streaming -->
    Distributed Computation

    View full-size slide

  23. Data Science
    ● Moving data from A to B is a very frequent task
    ○ Data cleaning and transformation takes up a much higher portion of our time
    than we’d like :(
    ○ Any efficiency gain on the ingestion and processing pipeline is a win
    ● ORMs sometimes do not fit very well here as there are no “entities” to model
    ○ We’re experimenting: cutting, pivoting, adding and dropping attributes of
    interest on the fly
    ● Use SQLpy to have your data modelling language next to your computation language
    23

    View full-size slide

  24. Data Science - SQL is coming for you
    ● Next generation datastores are cloud native, distributed and big in scale
    ● They’re not traditionally relational
    ○ But many are converging on SQL as their data modelling language
    ○ Not full SQL standard, but most common operations present
    ○ No traditional relational Database ORM support
    ● All provide Python libraries that expose their query language interfaces
    ○ kSQL (kafka)
    ○ sparkSQL (Apache Spark)
    ○ BigQuery (Google Cloud)
    ○ Athena (AWS S3)
    ● Let’s modify SQLpy to work with these too
    24

    View full-size slide

  25. Google BigQuery API with SQL
    ● Since SQLpy has no opinion on the backend datastore interface mechanism, it just
    expects a DB 2.0 API compliant cursor object
    ○ Make an object defining execute, fetchone, fetchall etc… methods
    25
    https://github.com/HUSSTECH/talks-code/tree/master/party-like-its-ansi-1999

    View full-size slide

  26. sqlpy.readthedocs.io
    ● github.com/9fin/sqlpy (MIT)
    ● Made for PostgreSQL first
    ○ Other DBs working in theory, but not
    tested!
    ● Still ○ ...so critical comments and other
    contributions welcome
    v0.3.0 stable today!
    (develop-0-3-0 branch)
    26
    /9fin/sqlpy

    View full-size slide

  27. Thank You!
    References and Credits [slide#]
    [3] image: By Own Oil Industry News - Own Own work, Public Domain,
    https://commons.wikimedia.org/w/index.php?curid=8266714
    [4] image: CC0 license
    [5] image: An unnamed colleague
    [8] Database timeline based on: https://hpi.de/naumann/projects/rdbms-genealogy.html
    [15] PokeAPI for json data: https://pokeapi.co/docsv2/
    [24] ksql: https://www.confluent.io/product/ksql/
    [24] SparkSQL: https://spark.apache.org/sql/
    [24] BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
    [24] Athena: https://aws.amazon.com/athena/
    [25] Google BigQuery sample applciation: https://cloud.google.com/bigquery/create-simple-app-api
    27

    View full-size slide