Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Optimising Queries For Fun And Profit

Optimising Queries For Fun And Profit

A talk I gave at PyCon Ukraine 2012

077e9a0cb34fa3eba2699240c9509717?s=128

Andrew Godwin

October 20, 2012
Tweet

More Decks by Andrew Godwin

Other Decks in Programming

Transcript

  1. Optimising Queries Optimising Queries For Fun And Profit For Fun

    And Profit Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin flickr.com/stuckincustoms flickr.com/stuckincustoms
  2. · Django core dev · Django core dev · South

    · South · Lanyrd · Lanyrd
  3. Databases are old. Databases are old. (and they're not magic!)

    (and they're not magic!) flickr.com/stuckincustoms flickr.com/stuckincustoms
  4. · Computers are slow · Computers are slow Otherwise databases

    would be useless Otherwise databases would be useless
  5. · Indexes, joins or replication · Indexes, joins or replication

    Things a database gives you over a flat file Things a database gives you over a flat file
  6. Databases still need you. Databases still need you. They can't

    solve every problem without hints They can't solve every problem without hints flickr.com/96dpi flickr.com/96dpi
  7. Basic Problems Basic Problems

  8. · n+1 queries · n+1 queries Far too easy to

    do! Far too easy to do!
  9. photos = [] photos = [] for person in People.objects.all():

    for person in People.objects.all(): photos.append(person.photo.file) photos.append(person.photo.file)
  10. photo_ids = People.objects.values_list( photo_ids = People.objects.values_list( "photo", flat=True) "photo", flat=True)

    photos = Photo.objects.filter(id__in=photo_ids) photos = Photo.objects.filter(id__in=photo_ids)
  11. photos = [] photos = [] for person in People.objects.select_related("photo"):

    for person in People.objects.select_related("photo"): photos.append(person.photo.file) photos.append(person.photo.file)
  12. · Unrestricted joins · Unrestricted joins Those exponentials add up

    Those exponentials add up
  13. 1 million rows JOINed with 1 million rows 1 million

    rows JOINed with 1 million rows ...is 1,000,000,000,000 potential comparisons ...is 1,000,000,000,000 potential comparisons
  14. · Denormalise · Denormalise It'll save you a lot of

    time. It'll save you a lot of time.
  15. Finer Optimisation Finer Optimisation

  16. · Your goal: Less queries · Your goal: Less queries

    Databases are speedier than your code Databases are speedier than your code
  17. · Log queries for each page · Log queries for

    each page Django Debug Toolbar works well for this Django Debug Toolbar works well for this
  18. SELECT * FROM people_person WHERE age > 13 AND foo;

    SELECT * FROM people_person WHERE age > 13 AND foo; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 90; SELECT * FROM people_photo WHERE id = 90;
  19. SELECT * FROM people_person AS pr SELECT * FROM people_person

    AS pr INNER JOIN people_photo AS ph ON ph.id = pr.photo_id INNER JOIN people_photo AS ph ON ph.id = pr.photo_id WHERE age > 13 AND foo; WHERE age > 13 AND foo;
  20. · At most 100 per page · At most 100

    per page Preferably, you want 20 or less Preferably, you want 20 or less
  21. · Log slow queries · Log slow queries There's settings

    for this in PostgreSQL/MySQL There's settings for this in PostgreSQL/MySQL
  22. · Analyse slow queries · Analyse slow queries See if

    they need indexes See if they need indexes
  23. # explain select * from aeblog_post where id = 50;

    # explain select * from aeblog_post where id = 50; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Index Scan using aeblog_post_pkey on aeblog_post Index Scan using aeblog_post_pkey on aeblog_post (cost=0.00..8.27 rows=1 width=1812) (cost=0.00..8.27 rows=1 width=1812) Index Cond: (id = 50) Index Cond: (id = 50)
  24. # explain select * from aeblog_post # explain select *

    from aeblog_post where section = 'trip-notes'; where section = 'trip-notes'; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 width=1812) width=1812) Filter: ((section)::text = 'trip-notes'::text) Filter: ((section)::text = 'trip-notes'::text)
  25. · Don't fetch every column · Don't fetch every column

    Especially if you have big ones Especially if you have big ones
  26. Less Downtime Less Downtime

  27. · Watch out for locks · Watch out for locks

    While doing big operations (e.g. adding columns) While doing big operations (e.g. adding columns)
  28. · Try not to remove columns · Try not to

    remove columns It's a lot easier for code to ignore new ones It's a lot easier for code to ignore new ones
  29. · Have a warm failover · Have a warm failover

    But not automatic failover, unless you know why But not automatic failover, unless you know why
  30. · Have a read-only mode · Have a read-only mode

    It means you never truly go down It means you never truly go down
  31. Column Types Column Types

  32. · Don't use text for everything · Don't use text

    for everything Especially numbers and dates Especially numbers and dates
  33. · Store all times in UTC · Store all times

    in UTC Convert to and from in the code. Convert to and from in the code.
  34. · Make columns NULLable · Make columns NULLable Rather than

    using blank strings. Rather than using blank strings.
  35. · Think about custom types · Think about custom types

    Geometry, IP addresses, JSON... Geometry, IP addresses, JSON...
  36. 'NoSQL' 'NoSQL'

  37. · Use non-relational wisely · Use non-relational wisely Otherwise you'll

    make everything slower Otherwise you'll make everything slower
  38. · Have a single source of truth · Have a

    single source of truth So you can rebuild if something goes awry So you can rebuild if something goes awry
  39. · Enforce constraints in code · Enforce constraints in code

    Before your data becomes useless Before your data becomes useless
  40. · Postgres does schemaless too · Postgres does schemaless too

    hstore and now JSON column types hstore and now JSON column types
  41. Basically... Basically... flickr.com/stuckincustoms flickr.com/stuckincustoms

  42. · Less queries · Less queries

  43. · Less queries · Less queries · Less JOINs ·

    Less JOINs
  44. · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes
  45. · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes · (Faster servers) · (Faster servers)
  46. Thanks. Thanks. Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin

    flickr.com/oimax flickr.com/oimax