Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Instant serverless APIs, powered by SQLite

Instant serverless APIs, powered by SQLite

Serverless computing is all about paying only for what you use: it can scale up to handle millions of requests, but it can also scale down to 0, costing you nothing if your application is not receiving any traffic.

Serverless tends to get expensive when databases are involved.... but if your data is static or changes infrequently, you can use serverless tools to provide powerful interactive APIs extremely cheaply.

Datasette https://datasette.readthedocs.io/ is an open-source Python tool that provides an instant, read-only JSON API for any SQLite database. It also provides tools for packaging the database up as a Docker container and instantly deploying that container to a number of different serverless hosting platforms.

This makes it a powerful tool for sharing interesting data online, in a way that allows users to both explore that data themselves and build their own interpretations of the data using the Datasette JSON API.

In this session I'll show you how to use Datasette to publish data, and illustrate examples of the exciting things people have already built using the tool - including a number of real-world data journalism projects.

Presented at PyCon US 2019

E172168287724cd3051588354ded616b?s=128

Simon Willison

May 04, 2019
Tweet

Transcript

  1. Instant serverless APIs, powered by SQLite Simon Willison @simonw PyCon

    US 2019
  2. Serverless?

  3. Serverless Scale-to-zero

  4. Serverless Only pay for the computing resources you use. Scale-to-zero

  5. None
  6. None
  7. Stateless

  8. What if you ship static data as part of your

    app?
  9. Data Journalism

  10. None
  11. None
  12. None
  13. A better way of publishing data?

  14. Datasette A tool for exploring and publishing data datasette.io

  15. None
  16. None
  17. None
  18. None
  19. Some handy features • Filtering and faceting • Custom SQL

    queries • JSON API to everything • Export table (or query results) as CSV
  20. The secret sauce is SQLite • Small, fast, ubiquitous •

    A database is a single file • Doesn’t scale well for writes… • … but who cares if your data is read-only? • Ship your data and code in the same container!
  21. pip install datasette (Python 3 only)

  22. find ~/Library -iname '*.sqlite*' \ -type f -exec du -h

    {} + | sort -r -h
  23. find ~/Library -iname '*.sqlite*' \ -type f -exec du -h

    {} + | sort -r -h
  24. CLSBusinessCategoryCache.db ?

  25. None
  26. None
  27. None
  28. Creating SQLite databases

  29. csvs-to-sqlite

  30. None
  31. None
  32. None
  33. csvs-to-sqlite \ SORP_2012.csv \ ohio.db

  34. https://glitch.com/edit/#!/datasette-csvs

  35. None
  36. sqlite-utils

  37. Building a database of PyPI packages in Jupyter

  38. None
  39. None
  40. None
  41. Publish it with Google Cloud Run

  42. datasette publish cloudrun \ /tmp/pypi.db \ --title="PyPI Top 1500 Packages"

    \ --name=pypi \ --install=datasette-vega \ --branch=master \ --extra-options="--cors" Thanks, Romain Primet!
  43. None
  44. Let’s do something with that API

  45. https://glitch.com/edit/#!/pypi-top-1500-autocomplete Based on the technique described in https://24ways.org/2018/fast-autocomplete-search-for- your-website/

  46. None
  47. Some more interesting projects

  48. Russian IRA ads https://simonwillison.net/2018/Aug/6/russian-facebook-ads/ Credit: Ed Summers, for cleaning the

    data
  49. None
  50. Baltimore Salaries https://salaries.news.baltimoresun.com/ Credit: Carl Johnson, Baltimore Sun

  51. None
  52. None
  53. The world is full of interesting data…

  54. … let’s publish it in the most useful way possible

  55. simonwillison.net @simonw on Twitter