Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Datasette (CSVConf 2019)

Datasette (CSVConf 2019)

Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. It's accompanying tool csvs-to-sqlite makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette

This talk presented at csv,conf,v4 in Portland on 9th May 2019

E172168287724cd3051588354ded616b?s=128

Simon Willison

May 09, 2019
Tweet

Transcript

  1. Datasette Simon Willison @simonw csv,conf,v4 - May 9th 2019

  2. None
  3. None
  4. None
  5. A better way of publishing data?

  6. Datasette A tool for exploring and publishing data datasette.io

  7. None
  8. None
  9. None
  10. None
  11. Some handy features • Filtering and faceting • Custom SQL

    queries • JSON API to everything • Export table (or query results) as CSV
  12. The secret sauce is SQLite • Small, fast, ubiquitous •

    A database is a single file • Doesn’t scale well for writes… • … but who cares if your data is read-only? • Ship your data and code in the same container!
  13. pip install datasette (Python 3 only)

  14. find ~/Library -iname '*.sqlite*' \ -type f -exec du -h

    {} + | sort -r -h
  15. find ~/Library -iname '*.sqlite*' \ -type f -exec du -h

    {} + | sort -r -h
  16. CLSBusinessCategoryCache.db ?

  17. None
  18. None
  19. None
  20. Creating SQLite databases

  21. csvs-to-sqlite

  22. None
  23. csvs-to-sqlite \ WashSqPark2.csv \ washington.db

  24. https://glitch.com/edit/#!/datasette-csvs

  25. None
  26. sqlite-utils

  27. Building now-and- next for CSVConf

  28. Hacky on-a-plane Jupyter scraping script https://nbviewer.jupyter.org/gist/simonw/1d0aa0e7c434cb8cb161b918f56d9440

  29. Publish it with Google Cloud Run

  30. datasette publish cloudrun csvconf.db \ --title="csv,conf,v4" \ --name=csvconf \ --source_url="https://csvconf.com/"

    \ --extra-options="--cors" \ --service="csvconf"
  31. datasette publish cloudrun csvconf.db \ --title="csv,conf,v4" \ --name=csvconf \ --source_url="https://csvconf.com/"

    \ --extra-options="--cors" \ --service="csvconf"
  32. datasette publish cloudrun csvconf.db \ --title="csv,conf,v4" \ --name=csvconf \ --source_url="https://csvconf.com/"

    \ --extra-options="--cors" \ --service="csvconf" Thanks, Romain Primet!
  33. None
  34. Let’s do something with that API

  35. select "now" as nownext, * from talks where datetime("now", "-7

    hours") > datetime([datetime]) and datetime("now", "-7 hours") < datetime([datetime], "+30 minutes") union select "next" as nownext, * from talks where datetime(datetime("now", "-7 hours"), "+30 minutes") > datetime([datetime]) and datetime(datetime("now", "-7 hours"), "+30 minutes") < datetime([datetime], "+30 minutes") union select "later" as nownext, * from talks where datetime(datetime("now", "-7 hours"), "+60 minutes") < datetime([datetime]) order by [datetime] limit 12;
  36. https://csvconf-j7hipcg4aq-uc.a.run.app/csvconf-5ae783b?sql=select+ %22now%22+as+nownext%2C+*+from+talks+where%0D%0A+ +datetime%28%22now%22%2C+%22-7+hours%22%29+ %3E+datetime%28%5Bdatetime%5D%29+and%0D%0A+ +datetime%28%22now%22%2C+%22-7+hours%22%29+ %3C+datetime%28%5Bdatetime%5D%2C+ %22%2B30+minutes%22%29%0D%0Aunion+select+ %22next%22+as+nownext%2C+*+from+talks+where%0D%0A+ +datetime%28datetime%28%22now%22%2C+%22-7+hours%22%29%2C+ %22%2B30+minutes%22%29+

    %3E+datetime%28%5Bdatetime%5D%29+and%0D%0A+ +datetime%28datetime%28%22now%22%2C+%22-7+hours%22%29%2C+ %22%2B30+minutes%22%29+%3C+datetime%28%5Bdatetime%5D%2C+ %22%2B30+minutes%22%29%0D%0Aunion+select+ %22later%22+as+nownext%2C+*+from+talks+where%0D%0A+ +datetime%28datetime%28%22now%22%2C+%22-7+hours%22%29%2C+ %22%2B60+minutes%22%29+ %3C+datetime%28%5Bdatetime%5D%29%0D%0Aorder+by+ %5Bdatetime%5D+limit+12%3B Thank goodness for URL shorteners…
  37. An entire application in a URL you can bookmark or

    share
  38. .. but let’s build something a bit more friendly

  39. https://glitch.com/~csvconf-now-and-next

  40. Some more interesting projects

  41. Russian IRA ads https://simonwillison.net/2018/Aug/6/russian-facebook-ads/ Credit: Ed Summers, for cleaning the

    data
  42. None
  43. Baltimore Salaries https://salaries.news.baltimoresun.com/ Credit: Carl Johnson, Baltimore Sun

  44. None
  45. None
  46. The world is full of interesting data…

  47. … let’s publish it in the most useful way possible

  48. simonwillison.net @simonw on Twitter