Optimising Queries For Fun And Profit

Optimising Queries For Fun And Profit

A talk I gave at PyCon Ukraine 2012

077e9a0cb34fa3eba2699240c9509717?s=128

Andrew Godwin

October 20, 2012
Tweet

Transcript

  1. Optimising Queries Optimising Queries For Fun And Profit For Fun

    And Profit Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin flickr.com/stuckincustoms flickr.com/stuckincustoms
  2. · Django core dev · Django core dev · South

    · South · Lanyrd · Lanyrd
  3. Databases are old. Databases are old. (and they're not magic!)

    (and they're not magic!) flickr.com/stuckincustoms flickr.com/stuckincustoms
  4. · Computers are slow · Computers are slow Otherwise databases

    would be useless Otherwise databases would be useless
  5. · Indexes, joins or replication · Indexes, joins or replication

    Things a database gives you over a flat file Things a database gives you over a flat file
  6. Databases still need you. Databases still need you. They can't

    solve every problem without hints They can't solve every problem without hints flickr.com/96dpi flickr.com/96dpi
  7. Basic Problems Basic Problems

  8. · n+1 queries · n+1 queries Far too easy to

    do! Far too easy to do!
  9. photos = [] photos = [] for person in People.objects.all():

    for person in People.objects.all(): photos.append(person.photo.file) photos.append(person.photo.file)
  10. photo_ids = People.objects.values_list( photo_ids = People.objects.values_list( "photo", flat=True) "photo", flat=True)

    photos = Photo.objects.filter(id__in=photo_ids) photos = Photo.objects.filter(id__in=photo_ids)
  11. photos = [] photos = [] for person in People.objects.select_related("photo"):

    for person in People.objects.select_related("photo"): photos.append(person.photo.file) photos.append(person.photo.file)
  12. · Unrestricted joins · Unrestricted joins Those exponentials add up

    Those exponentials add up
  13. 1 million rows JOINed with 1 million rows 1 million

    rows JOINed with 1 million rows ...is 1,000,000,000,000 potential comparisons ...is 1,000,000,000,000 potential comparisons
  14. · Denormalise · Denormalise It'll save you a lot of

    time. It'll save you a lot of time.
  15. Finer Optimisation Finer Optimisation

  16. · Your goal: Less queries · Your goal: Less queries

    Databases are speedier than your code Databases are speedier than your code
  17. · Log queries for each page · Log queries for

    each page Django Debug Toolbar works well for this Django Debug Toolbar works well for this
  18. SELECT * FROM people_person WHERE age > 13 AND foo;

    SELECT * FROM people_person WHERE age > 13 AND foo; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 90; SELECT * FROM people_photo WHERE id = 90;
  19. SELECT * FROM people_person AS pr SELECT * FROM people_person

    AS pr INNER JOIN people_photo AS ph ON ph.id = pr.photo_id INNER JOIN people_photo AS ph ON ph.id = pr.photo_id WHERE age > 13 AND foo; WHERE age > 13 AND foo;
  20. · At most 100 per page · At most 100

    per page Preferably, you want 20 or less Preferably, you want 20 or less
  21. · Log slow queries · Log slow queries There's settings

    for this in PostgreSQL/MySQL There's settings for this in PostgreSQL/MySQL
  22. · Analyse slow queries · Analyse slow queries See if

    they need indexes See if they need indexes
  23. # explain select * from aeblog_post where id = 50;

    # explain select * from aeblog_post where id = 50; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Index Scan using aeblog_post_pkey on aeblog_post Index Scan using aeblog_post_pkey on aeblog_post (cost=0.00..8.27 rows=1 width=1812) (cost=0.00..8.27 rows=1 width=1812) Index Cond: (id = 50) Index Cond: (id = 50)
  24. # explain select * from aeblog_post # explain select *

    from aeblog_post where section = 'trip-notes'; where section = 'trip-notes'; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 width=1812) width=1812) Filter: ((section)::text = 'trip-notes'::text) Filter: ((section)::text = 'trip-notes'::text)
  25. · Don't fetch every column · Don't fetch every column

    Especially if you have big ones Especially if you have big ones
  26. Less Downtime Less Downtime

  27. · Watch out for locks · Watch out for locks

    While doing big operations (e.g. adding columns) While doing big operations (e.g. adding columns)
  28. · Try not to remove columns · Try not to

    remove columns It's a lot easier for code to ignore new ones It's a lot easier for code to ignore new ones
  29. · Have a warm failover · Have a warm failover

    But not automatic failover, unless you know why But not automatic failover, unless you know why
  30. · Have a read-only mode · Have a read-only mode

    It means you never truly go down It means you never truly go down
  31. Column Types Column Types

  32. · Don't use text for everything · Don't use text

    for everything Especially numbers and dates Especially numbers and dates
  33. · Store all times in UTC · Store all times

    in UTC Convert to and from in the code. Convert to and from in the code.
  34. · Make columns NULLable · Make columns NULLable Rather than

    using blank strings. Rather than using blank strings.
  35. · Think about custom types · Think about custom types

    Geometry, IP addresses, JSON... Geometry, IP addresses, JSON...
  36. 'NoSQL' 'NoSQL'

  37. · Use non-relational wisely · Use non-relational wisely Otherwise you'll

    make everything slower Otherwise you'll make everything slower
  38. · Have a single source of truth · Have a

    single source of truth So you can rebuild if something goes awry So you can rebuild if something goes awry
  39. · Enforce constraints in code · Enforce constraints in code

    Before your data becomes useless Before your data becomes useless
  40. · Postgres does schemaless too · Postgres does schemaless too

    hstore and now JSON column types hstore and now JSON column types
  41. Basically... Basically... flickr.com/stuckincustoms flickr.com/stuckincustoms

  42. · Less queries · Less queries

  43. · Less queries · Less queries · Less JOINs ·

    Less JOINs
  44. · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes
  45. · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes · (Faster servers) · (Faster servers)
  46. Thanks. Thanks. Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin

    flickr.com/oimax flickr.com/oimax