Big Opportunities in Small Data
Simon Willison, Citus Con 2023
Slide 2
Slide 2 text
Small Data?
Slide 3
Slide 3 text
Data Journalism
Slide 4
Slide 4 text
What’s the best possible way of
publishing structured data?
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
Demo: SF City Facilities
Slide 7
Slide 7 text
Read-only SQL queries
via an API
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
No content
Slide 11
Slide 11 text
No content
Slide 12
Slide 12 text
No content
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
SQLite
Slide 15
Slide 15 text
It’s installed already (even on my phone)
A SQLite database is a single file
Very stable file format
Slide 16
Slide 16 text
Datasette Python web app
SQLite database files
(read-only data)
One deployable container
Docker, K8s, Vercel, AWS Lambda,
Google Cloud Run, Fly…
Baked Data
Slide 17
Slide 17 text
HTTP load balancer
Datasette Python web app
SQLite database files
Datasette Python web app
SQLite database files
Datasette Python web app
SQLite database files
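The Baked Data pattern from the last two slides can be sketched roughly as follows, assuming a build step that writes one SQLite file and a deploy step that hands every instance its own copy. The file names, table and rows are hypothetical, and a loop over local files stands in for the containers behind the load balancer.

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()

# Build step: bake the data into a single SQLite file (hypothetical table).
baked_path = os.path.join(workdir, "baked.db")
build = sqlite3.connect(baked_path)
build.execute("create table facilities (name text)")
build.executemany(
    "insert into facilities values (?)",
    [("Main Library",), ("City Hall",)],
)
build.commit()
build.close()

# Deploy step: each instance gets its own copy of the file.
replica_paths = []
for i in range(3):
    replica = os.path.join(workdir, f"instance-{i}.db")
    shutil.copyfile(baked_path, replica)
    replica_paths.append(replica)

# Serve step: every instance opens its copy read-only and answers on its own.
counts = []
for replica in replica_paths:
    conn = sqlite3.connect(f"file:{replica}?mode=ro", uri=True)
    counts.append(conn.execute("select count(*) from facilities").fetchone()[0])
    conn.close()
```

Because the data never changes after the build, the copies need no coordination; scaling up is just more copies, and updating the data means building and deploying a new file.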
Slide 18
Slide 18 text
SQL plus HTTP is a
fantastic integration tool
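One way to picture SQL plus HTTP: the query travels as an ordinary URL parameter, so anything that can fetch a URL can use it. The sketch below only builds such a URL and does not make a request. The host and database name are hypothetical, and the /database.json?sql=... path with _shape=array follows Datasette's JSON API as I understand it, so check the Datasette documentation before relying on it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def datasette_query_url(base_url, database, sql):
    """Build a URL that asks a Datasette instance to run `sql` and return JSON."""
    query = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url}/{database}.json?{query}"


sql = "select name from facilities limit 5"
# example.com and "facilities" are placeholders, not a real deployment.
url = datasette_query_url("https://example.com", "facilities", sql)

# The SQL survives URL encoding and can be recovered on the other side.
decoded_sql = parse_qs(urlsplit(url).query)["sql"][0]
```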
Slide 19
Slide 19 text
No content
Slide 20
Slide 20 text
No content
Slide 21
Slide 21 text
No content
Slide 22
Slide 22 text
No content
Slide 23
Slide 23 text
Baked Data with PostgreSQL?
Deploy a container with
a full read-only database
Slide 24
Slide 24 text
Datasette in WebAssembly
in your browser
Slide 25
Slide 25 text
Demo: Datasette Lite
Slide 26
Slide 26 text
PostgreSQL WebAssembly
Slide 27
Slide 27 text
SQLite is serverless
(before serverless was cool)
Slide 28
Slide 28 text
SQLite, DuckDB… PostgreSQL?
Slide 29
Slide 29 text
SQLite, DuckDB… PostgreSQL?
Slide 30
Slide 30 text
PostgreSQL as a library?
$ pip install postgreslib (or npm, etc)
import postgreslib
db = postgreslib.connect(
    "/mnt/data.postgresql"
)
db.execute("create table …")
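For comparison, the library-style workflow wished for above already exists for SQLite. This sketch uses Python's built-in sqlite3 module; the file path, table and row are hypothetical. The postgreslib name on the slide is a wish, not a real package, and nothing here implies otherwise.

```python
import os
import sqlite3
import tempfile

# The whole database is one file on disk; no server process is involved.
path = os.path.join(tempfile.mkdtemp(), "data.sqlite")

db = sqlite3.connect(path)
db.execute("create table notes (id integer primary key, body text)")
db.execute("insert into notes (body) values (?)", ("hello",))
db.commit()

result = db.execute("select body from notes").fetchall()
db.close()
```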
Slide 31
Slide 31 text
Ideas to take away
• Small Data deserves more tooling
• Scale read-only data up and down with the Baked Data pattern
• Read-only SQL APIs are a really good idea!
• Especially since you can reformat data with a SQL query
• You can run databases in the browser now
• Please build me PostgreSQL as a library!
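As a small illustration of the "reformat data with a SQL query" point, the query below reshapes two columns into a single display label, so the client receives the data already in the form it needs. The table and rows are hypothetical, and an in-memory SQLite database stands in for a published one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table facilities (name text, city text)")
db.executemany(
    "insert into facilities values (?, ?)",
    [("Main Library", "San Francisco"), ("City Hall", "San Francisco")],
)

# The query, not the client, decides the output shape.
labels = [
    row[0]
    for row in db.execute(
        "select name || ' (' || city || ')' as label "
        "from facilities order by name"
    )
]
db.close()
```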