Andrew Godwin
@andrewgodwin
GOOD SCHEMA DESIGN
WHY IT MATTERS
and
Slide 2
Slide 2 text
Andrew Godwin
Core Developer
Senior Engineer
Author & Maintainer
Slide 3
Slide 3 text
Schemas
Explicit & Implicit
Slide 4
Slide 4 text
Explicit
PostgreSQL
MySQL
Oracle
SQLite
CouchDB
MongoDB
Redis
ZODB
Implicit
Slide 5
Slide 5 text
Explicit Schema
ID int Name text Weight uint
1
2
3
Alice
Bob
Charles
76
84
65
Implicit Schema
{
"id": 342,
"name": "David",
"weight": 44,
}
Slide 6
Slide 6 text
Explicit Schema
Normalised or semi normalised structure
JOINs to retrieve related data
Implicit Schema
Embedded structure
Related data retrieved naturally with object
You don't have to use a document DB
(like CouchDB, MongoDB, etc.)
Slide 23
Slide 23 text
Schemaless Columns
ID int Name text Weight uint Data json
1 Alice 76 { "nickname": "Al",
"bgcolor": "#ff0033" }
Slide 24
Slide 24 text
But that must be slower...
Right?
Slide 25
Slide 25 text
Comparison
(never trust benchmarks)
Loading 1.2 million records
PostgreSQL
MongoDB
76 sec
8 min
Sequential scan
PostgreSQL
MongoDB
980 ms
980 ms
Index scan (Postgres GINhash)
PostgreSQL
MongoDB
0.7 ms
1 ms
Slide 26
Slide 26 text
Load Shapes
Slide 27
Slide 27 text
Read-heavy
Write-heavy Large size
Slide 28
Slide 28 text
Read-heavy
Write-heavy Large size
Wikipedia
TV show page
Minecraft
Forums
Amazon Glacier
Eventbrite
Logging
Slide 29
Slide 29 text
Read-heavy
Write-heavy Large size
Offline storage
Append
formats
In-memory cache
Many indexes
Fewer indexes
Slide 30
Slide 30 text
Your load changes over time
Scaling is not just a flat multiplier
Slide 31
Slide 31 text
General Advice
Write heavy → Fewer indexes
Read heavy → Denormalise
Keep large data away from read/write heavy data
Blob stores/filesystems are DBs too
Slide 32
Slide 32 text
Lessons
They're near the end so you remember them.
Slide 33
Slide 33 text
Re-evaulate as you grow
Different things matter at different sizes
Slide 34
Slide 34 text
Adding NULL columns is great
Always prefer this if nothing else
Slide 35
Slide 35 text
You'll need more than one DBMS
But don't use too many, you'll be swamped
Slide 36
Slide 36 text
Indexes aren't free
You pay the price at write/restore time
Slide 37
Slide 37 text
Relational DBs are flexible
They can do a lot more than JOINing normalised tables
Slide 38
Slide 38 text
Thanks!
Andrew Godwin
@andrewgodwin
eventbrite.com/jobs
are hiring: