Optimising Queries
Optimising Queries
For Fun And Profit
For Fun And Profit
Andrew Godwin / @andrewgodwin
Andrew Godwin / @andrewgodwin
flickr.com/stuckincustoms
flickr.com/stuckincustoms
Slide 2
Slide 2 text
· Django core dev
· Django core dev
· South
· South
· Lanyrd
· Lanyrd
Slide 3
Slide 3 text
Databases are old.
Databases are old.
(and they're not magic!)
(and they're not magic!)
flickr.com/stuckincustoms
flickr.com/stuckincustoms
Slide 4
Slide 4 text
· Computers are slow
· Computers are slow
Otherwise databases would be useless
Otherwise databases would be useless
Slide 5
Slide 5 text
· Indexes, joins or replication
· Indexes, joins or replication
Things a database gives you over a flat file
Things a database gives you over a flat file
Slide 6
Slide 6 text
Databases still need you.
Databases still need you.
They can't solve every problem without hints
They can't solve every problem without hints
flickr.com/96dpi
flickr.com/96dpi
Slide 7
Slide 7 text
Basic Problems
Basic Problems
Slide 8
Slide 8 text
· n+1 queries
· n+1 queries
Far too easy to do!
Far too easy to do!
Slide 9
Slide 9 text
photos = []
photos = []
for person in People.objects.all():
for person in People.objects.all():
photos.append(person.photo.file)
photos.append(person.photo.file)
photos = []
photos = []
for person in People.objects.select_related("photo"):
for person in People.objects.select_related("photo"):
photos.append(person.photo.file)
photos.append(person.photo.file)
Slide 12
Slide 12 text
· Unrestricted joins
· Unrestricted joins
Those exponentials add up
Those exponentials add up
Slide 13
Slide 13 text
1 million rows JOINed with 1 million rows
1 million rows JOINed with 1 million rows
...is 1,000,000,000,000 potential comparisons
...is 1,000,000,000,000 potential comparisons
Slide 14
Slide 14 text
· Denormalise
· Denormalise
It'll save you a lot of time.
It'll save you a lot of time.
Slide 15
Slide 15 text
Finer Optimisation
Finer Optimisation
Slide 16
Slide 16 text
· Your goal: Less queries
· Your goal: Less queries
Databases are speedier than your code
Databases are speedier than your code
Slide 17
Slide 17 text
· Log queries for each page
· Log queries for each page
Django Debug Toolbar works well for this
Django Debug Toolbar works well for this
Slide 18
Slide 18 text
SELECT * FROM people_person WHERE age > 13 AND foo;
SELECT * FROM people_person WHERE age > 13 AND foo;
SELECT * FROM people_photo WHERE id = 42;
SELECT * FROM people_photo WHERE id = 42;
SELECT * FROM people_photo WHERE id = 31;
SELECT * FROM people_photo WHERE id = 31;
SELECT * FROM people_photo WHERE id = 214;
SELECT * FROM people_photo WHERE id = 214;
SELECT * FROM people_photo WHERE id = 84;
SELECT * FROM people_photo WHERE id = 84;
SELECT * FROM people_photo WHERE id = 28;
SELECT * FROM people_photo WHERE id = 28;
SELECT * FROM people_photo WHERE id = 90;
SELECT * FROM people_photo WHERE id = 90;
Slide 19
Slide 19 text
SELECT * FROM people_person AS pr
SELECT * FROM people_person AS pr
INNER JOIN people_photo AS ph ON ph.id = pr.photo_id
INNER JOIN people_photo AS ph ON ph.id = pr.photo_id
WHERE age > 13 AND foo;
WHERE age > 13 AND foo;
Slide 20
Slide 20 text
· At most 100 per page
· At most 100 per page
Preferably, you want 20 or less
Preferably, you want 20 or less
Slide 21
Slide 21 text
· Log slow queries
· Log slow queries
There's settings for this in PostgreSQL/MySQL
There's settings for this in PostgreSQL/MySQL
Slide 22
Slide 22 text
· Analyse slow queries
· Analyse slow queries
See if they need indexes
See if they need indexes
Slide 23
Slide 23 text
# explain select * from aeblog_post where id = 50;
# explain select * from aeblog_post where id = 50;
QUERY PLAN
QUERY PLAN
---------------------------------------------------
---------------------------------------------------
Index Scan using aeblog_post_pkey on aeblog_post
Index Scan using aeblog_post_pkey on aeblog_post
(cost=0.00..8.27 rows=1 width=1812)
(cost=0.00..8.27 rows=1 width=1812)
Index Cond: (id = 50)
Index Cond: (id = 50)
Slide 24
Slide 24 text
# explain select * from aeblog_post
# explain select * from aeblog_post
where section = 'trip-notes';
where section = 'trip-notes';
QUERY PLAN
QUERY PLAN
---------------------------------------------------
---------------------------------------------------
Seq Scan on aeblog_post (cost=0.00..14.21 rows=1
Seq Scan on aeblog_post (cost=0.00..14.21 rows=1
width=1812)
width=1812)
Filter: ((section)::text = 'trip-notes'::text)
Filter: ((section)::text = 'trip-notes'::text)
Slide 25
Slide 25 text
· Don't fetch every column
· Don't fetch every column
Especially if you have big ones
Especially if you have big ones
Slide 26
Slide 26 text
Less Downtime
Less Downtime
Slide 27
Slide 27 text
· Watch out for locks
· Watch out for locks
While doing big operations (e.g. adding columns)
While doing big operations (e.g. adding columns)
Slide 28
Slide 28 text
· Try not to remove columns
· Try not to remove columns
It's a lot easier for code to ignore new ones
It's a lot easier for code to ignore new ones
Slide 29
Slide 29 text
· Have a warm failover
· Have a warm failover
But not automatic failover, unless you know why
But not automatic failover, unless you know why
Slide 30
Slide 30 text
· Have a read-only mode
· Have a read-only mode
It means you never truly go down
It means you never truly go down
Slide 31
Slide 31 text
Column Types
Column Types
Slide 32
Slide 32 text
· Don't use text for everything
· Don't use text for everything
Especially numbers and dates
Especially numbers and dates
Slide 33
Slide 33 text
· Store all times in UTC
· Store all times in UTC
Convert to and from in the code.
Convert to and from in the code.
Slide 34
Slide 34 text
· Make columns NULLable
· Make columns NULLable
Rather than using blank strings.
Rather than using blank strings.
Slide 35
Slide 35 text
· Think about custom types
· Think about custom types
Geometry, IP addresses, JSON...
Geometry, IP addresses, JSON...
Slide 36
Slide 36 text
'NoSQL'
'NoSQL'
Slide 37
Slide 37 text
· Use non-relational wisely
· Use non-relational wisely
Otherwise you'll make everything slower
Otherwise you'll make everything slower
Slide 38
Slide 38 text
· Have a single source of truth
· Have a single source of truth
So you can rebuild if something goes awry
So you can rebuild if something goes awry
Slide 39
Slide 39 text
· Enforce constraints in code
· Enforce constraints in code
Before your data becomes useless
Before your data becomes useless
Slide 40
Slide 40 text
· Postgres does schemaless too
· Postgres does schemaless too
hstore and now JSON column types
hstore and now JSON column types