Slide 1

Slide 1 text

Datasette
Simon Willison @simonw
csv,conf,v4 - May 9th 2019

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

A better way of publishing data?

Slide 6

Slide 6 text

Datasette
A tool for exploring and publishing data
datasette.io

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Some handy features
• Filtering and faceting
• Custom SQL queries
• JSON API to everything
• Export table (or query results) as CSV
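As a rough illustration of how these features map onto URLs, the sketch below builds links to a table page and to its JSON and CSV exports. The host, database, table and column names are hypothetical; the `.json` / `.csv` suffixes and the `_facet` and `_shape` parameters follow Datasette's URL conventions.

```python
from urllib.parse import urlencode

# Hypothetical instance, used only for illustration.
BASE_URL = "https://example-datasette.test"


def table_url(database, table, fmt="", **params):
    """Build a URL for a table page, or its export when fmt is 'json' or 'csv'."""
    suffix = f".{fmt}" if fmt else ""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/{database}/{table}{suffix}{query}"


# HTML page filtered to one room and faceted by day (column names are assumptions).
print(table_url("csvconf", "talks", room="Main", _facet="day"))
# The same table as JSON, shaped as a plain array of objects.
print(table_url("csvconf", "talks", "json", _shape="array"))
# The same table as CSV.
print(table_url("csvconf", "talks", "csv"))
```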

Slide 12

Slide 12 text

The secret sauce is SQLite
• Small, fast, ubiquitous
• A database is a single file
• Doesn’t scale well for writes…
• … but who cares if your data is read-only?
• Ship your data and code in the same container!
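The "single file, read-only" idea can be tried with Python's built-in `sqlite3` module. This is a minimal sketch with a made-up file and table, not a description of how Datasette itself opens its databases: it writes a small database file, reopens it read-only through a SQLite URI, and shows that writes are then rejected.

```python
import os
import sqlite3
import tempfile

# Create a small database file (path and table are illustrative).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("create table talks (title text)")
conn.execute("insert into talks values ('Datasette')")
conn.commit()
conn.close()

# Reopen the same file read-only via a SQLite URI.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute("select title from talks").fetchall()
print(rows)

# Any attempt to write through the read-only connection raises an error.
try:
    ro.execute("insert into talks values ('nope')")
    write_failed = False
except sqlite3.OperationalError as exc:
    write_failed = True
    print("write rejected:", exc)
ro.close()
```

Because the whole database is the one `demo.db` file, copying that file next to the code is all "shipping the data" amounts to.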

Slide 13

Slide 13 text

pip install datasette (Python 3 only)

Slide 14

Slide 14 text

find ~/Library -iname '*.sqlite*' \
  -type f -exec du -h {} + | sort -r -h
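For readers without `find` and `du` to hand, roughly the same search can be sketched in Python: walk a directory, keep files whose name contains `.sqlite` (case-insensitively, like `-iname`), and list them largest first. The helper name `find_sqlite_files` and the demo files are hypothetical; sizes are reported in bytes rather than `du`'s human-readable units.

```python
import tempfile
from pathlib import Path


def find_sqlite_files(root):
    """Return (size_in_bytes, path) pairs for files named like *.sqlite*, largest first."""
    found = []
    for path in Path(root).rglob("*"):
        if path.is_file() and ".sqlite" in path.name.lower():
            found.append((path.stat().st_size, path))
    return sorted(found, key=lambda pair: pair[0], reverse=True)


# Demo on a throwaway directory with made-up files.
demo = Path(tempfile.mkdtemp())
(demo / "small.sqlite").write_bytes(b"x" * 10)
(demo / "Big.SQLITE3").write_bytes(b"x" * 1000)
(demo / "notes.txt").write_bytes(b"x" * 5000)

results = find_sqlite_files(demo)
for size, path in results:
    print(size, path.name)
```

To scan the same location as the slide, call `find_sqlite_files(Path.home() / "Library")`.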

Slide 15

Slide 15 text

find ~/Library -iname '*.sqlite*' \
  -type f -exec du -h {} + | sort -r -h

Slide 16

Slide 16 text

CLSBusinessCategoryCache.db ?

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Creating SQLite databases

Slide 21

Slide 21 text

csvs-to-sqlite

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

csvs-to-sqlite \
  WashSqPark2.csv \
  washington.db
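To give a feel for what a CSV-to-SQLite conversion involves, here is a minimal sketch using only Python's standard library. It is not how csvs-to-sqlite is implemented; the helper `load_csv` and the sample data are made up, and every value is stored as text with no type detection.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f"[{name}]" for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"create table [{table}] ({columns})")
    conn.executemany(f"insert into [{table}] values ({placeholders})", reader)
    conn.commit()


# Made-up sample rows standing in for the contents of WashSqPark2.csv.
sample = io.StringIO("species,sightings\nsquirrel,12\npigeon,40\n")
conn = sqlite3.connect(":memory:")
load_csv(conn, "WashSqPark2", sample)

rows = conn.execute("select species, sightings from WashSqPark2").fetchall()
print(rows)
```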

Slide 24

Slide 24 text

https://glitch.com/edit/#!/datasette-csvs

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

sqlite-utils

Slide 27

Slide 27 text

Building now-and-next for CSVConf

Slide 28

Slide 28 text

Hacky on-a-plane Jupyter scraping script
https://nbviewer.jupyter.org/gist/simonw/1d0aa0e7c434cb8cb161b918f56d9440

Slide 29

Slide 29 text

Publish it with Google Cloud Run

Slide 30

Slide 30 text

datasette publish cloudrun csvconf.db \
  --title="csv,conf,v4" \
  --name=csvconf \
  --source_url="https://csvconf.com/" \
  --extra-options="--cors" \
  --service="csvconf"

Slide 31

Slide 31 text

datasette publish cloudrun csvconf.db \
  --title="csv,conf,v4" \
  --name=csvconf \
  --source_url="https://csvconf.com/" \
  --extra-options="--cors" \
  --service="csvconf"

Slide 32

Slide 32 text

datasette publish cloudrun csvconf.db \
  --title="csv,conf,v4" \
  --name=csvconf \
  --source_url="https://csvconf.com/" \
  --extra-options="--cors" \
  --service="csvconf"
Thanks, Romain Primet!

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Let’s do something with that API

Slide 35

Slide 35 text

select "now" as nownext, * from talks where
  datetime("now", "-7 hours") > datetime([datetime]) and
  datetime("now", "-7 hours") < datetime([datetime], "+30 minutes")
union select "next" as nownext, * from talks where
  datetime(datetime("now", "-7 hours"), "+30 minutes") > datetime([datetime]) and
  datetime(datetime("now", "-7 hours"), "+30 minutes") < datetime([datetime], "+30 minutes")
union select "later" as nownext, * from talks where
  datetime(datetime("now", "-7 hours"), "+60 minutes") < datetime([datetime])
order by [datetime] limit 12;
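The query's three branches can be tried locally with a fixed clock. The sketch below keeps the same comparisons but makes two substitutions for reproducibility: a `:now` parameter replaces `datetime("now", "-7 hours")`, and named columns replace `*`. The `talks` schema and the three rows are assumptions, since the real table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: a title plus a start time stored as text.
conn.execute("create table talks (title text, [datetime] text)")
conn.executemany(
    "insert into talks values (?, ?)",
    [
        ("Talk A", "2019-05-09 10:00:00"),
        ("Talk B", "2019-05-09 10:30:00"),
        ("Talk C", "2019-05-09 11:30:00"),
    ],
)

QUERY = """
select 'now' as nownext, title, [datetime] from talks where
  datetime(:now) > datetime([datetime]) and
  datetime(:now) < datetime([datetime], '+30 minutes')
union select 'next' as nownext, title, [datetime] from talks where
  datetime(:now, '+30 minutes') > datetime([datetime]) and
  datetime(:now, '+30 minutes') < datetime([datetime], '+30 minutes')
union select 'later' as nownext, title, [datetime] from talks where
  datetime(:now, '+60 minutes') < datetime([datetime])
order by [datetime] limit 12
"""

# Pretend the local time is 10:10, ten minutes into Talk A.
schedule = conn.execute(QUERY, {"now": "2019-05-09 10:10:00"}).fetchall()
for label, title, starts in schedule:
    print(label, title, starts)
```

Each branch assumes 30-minute slots: "now" is a talk that started within the last half hour, "next" is whatever will be running half an hour from now, and "later" is anything starting more than an hour ahead.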

Slide 36

Slide 36 text

https://csvconf-j7hipcg4aq-uc.a.run.app/csvconf-5ae783b?sql=select+%22now%22+as+nownext%2C+*+from+talks+where%0D%0A++datetime%28%22now%22%2C+%22-7+hours%22%29+%3E+datetime%28%5Bdatetime%5D%29+and%0D%0A++datetime%28%22now%22%2C+%22-7+hours%22%29+%3C+datetime%28%5Bdatetime%5D%2C+%22%2B30+minutes%22%29%0D%0Aunion+select+%22next%22+as+nownext%2C+*+from+talks+where%0D%0A++datetime%28datetime%28%22now%22%2C+%22-7+hours%22%29%2C+%22%2B30+minutes%22%29+%3E+datetime%28%5Bdatetime%5D%29+and%0D%0A++datetime%28datetime%28%22now%22%2C+%22-7+hours%22%29%2C+%22%2B30+minutes%22%29+%3C+datetime%28%5Bdatetime%5D%2C+%22%2B30+minutes%22%29%0D%0Aunion+select+%22later%22+as+nownext%2C+*+from+talks+where%0D%0A++datetime%28datetime%28%22now%22%2C+%22-7+hours%22%29%2C+%22%2B60+minutes%22%29+%3C+datetime%28%5Bdatetime%5D%29%0D%0Aorder+by+%5Bdatetime%5D+limit+12%3B
Thank goodness for URL shorteners…

Slide 37

Slide 37 text

An entire application in a URL you can bookmark or share
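A shareable link of this kind is just the SQL text URL-encoded into a `sql` query parameter. The sketch below builds one with Python's standard library and decodes it again; the host and the short query are hypothetical stand-ins, while the `?sql=` parameter and the `.json` suffix follow Datasette's URL conventions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical instance and database path.
BASE = "https://example-datasette.test/csvconf"

# A short stand-in query; any read-only SQL could be used here.
sql = "select title, [datetime] from talks order by [datetime] limit 12;"

# Bookmarkable HTML page showing the results of the query.
shareable = f"{BASE}?{urlencode({'sql': sql})}"
# The same query as a JSON API call returning an array of objects.
api = f"{BASE}.json?{urlencode({'sql': sql, '_shape': 'array'})}"
print(shareable)
print(api)

# Decoding the link recovers the original SQL text unchanged.
recovered = parse_qs(urlsplit(shareable).query)["sql"][0]
```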

Slide 38

Slide 38 text

… but let’s build something a bit more friendly

Slide 39

Slide 39 text

https://glitch.com/~csvconf-now-and-next

Slide 40

Slide 40 text

Some more interesting projects

Slide 41

Slide 41 text

Russian IRA ads
https://simonwillison.net/2018/Aug/6/russian-facebook-ads/
Credit: Ed Summers, for cleaning the data

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

Baltimore Salaries
https://salaries.news.baltimoresun.com/
Credit: Carl Johnson, Baltimore Sun

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

The world is full of interesting data…

Slide 47

Slide 47 text

… let’s publish it in the most useful way possible

Slide 48

Slide 48 text

simonwillison.net @simonw on Twitter