Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Optimising Queries For Fun And Profit

Optimising Queries For Fun And Profit

A talk I gave at PyCon Ukraine 2012

Andrew Godwin

October 20, 2012
Tweet

More Decks by Andrew Godwin

Other Decks in Programming

Transcript

  1. Optimising Queries Optimising Queries For Fun And Profit For Fun

    And Profit Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin flickr.com/stuckincustoms flickr.com/stuckincustoms
  2. · Django core dev · Django core dev · South

    · South · Lanyrd · Lanyrd
  3. Databases are old. Databases are old. (and they're not magic!)

    (and they're not magic!) flickr.com/stuckincustoms flickr.com/stuckincustoms
  4. · Computers are slow · Computers are slow Otherwise databases

    would be useless Otherwise databases would be useless
  5. · Indexes, joins or replication · Indexes, joins or replication

    Things a database gives you over a flat file Things a database gives you over a flat file
  6. Databases still need you. Databases still need you. They can't

    solve every problem without hints They can't solve every problem without hints flickr.com/96dpi flickr.com/96dpi
  7. photos = [] photos = [] for person in People.objects.all():

    for person in People.objects.all(): photos.append(person.photo.file) photos.append(person.photo.file)
  8. photo_ids = People.objects.values_list( photo_ids = People.objects.values_list( "photo", flat=True) "photo", flat=True)

    photos = Photo.objects.filter(id__in=photo_ids) photos = Photo.objects.filter(id__in=photo_ids)
  9. photos = [] photos = [] for person in People.objects.select_related("photo"):

    for person in People.objects.select_related("photo"): photos.append(person.photo.file) photos.append(person.photo.file)
  10. 1 million rows JOINed with 1 million rows 1 million

    rows JOINed with 1 million rows ...is 1,000,000,000,000 potential comparisons ...is 1,000,000,000,000 potential comparisons
  11. · Denormalise · Denormalise It'll save you a lot of

    time. It'll save you a lot of time.
  12. · Your goal: Less queries · Your goal: Less queries

    Databases are speedier than your code Databases are speedier than your code
  13. · Log queries for each page · Log queries for

    each page Django Debug Toolbar works well for this Django Debug Toolbar works well for this
  14. SELECT * FROM people_person WHERE age > 13 AND foo;

    SELECT * FROM people_person WHERE age > 13 AND foo; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 90; SELECT * FROM people_photo WHERE id = 90;
  15. SELECT * FROM people_person AS pr SELECT * FROM people_person

    AS pr INNER JOIN people_photo AS ph ON ph.id = pr.photo_id INNER JOIN people_photo AS ph ON ph.id = pr.photo_id WHERE age > 13 AND foo; WHERE age > 13 AND foo;
  16. · At most 100 per page · At most 100

    per page Preferably, you want 20 or less Preferably, you want 20 or less
  17. · Log slow queries · Log slow queries There's settings

    for this in PostgreSQL/MySQL There's settings for this in PostgreSQL/MySQL
  18. · Analyse slow queries · Analyse slow queries See if

    they need indexes See if they need indexes
  19. # explain select * from aeblog_post where id = 50;

    # explain select * from aeblog_post where id = 50; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Index Scan using aeblog_post_pkey on aeblog_post Index Scan using aeblog_post_pkey on aeblog_post (cost=0.00..8.27 rows=1 width=1812) (cost=0.00..8.27 rows=1 width=1812) Index Cond: (id = 50) Index Cond: (id = 50)
  20. # explain select * from aeblog_post # explain select *

    from aeblog_post where section = 'trip-notes'; where section = 'trip-notes'; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 width=1812) width=1812) Filter: ((section)::text = 'trip-notes'::text) Filter: ((section)::text = 'trip-notes'::text)
  21. · Don't fetch every column · Don't fetch every column

    Especially if you have big ones Especially if you have big ones
  22. · Watch out for locks · Watch out for locks

    While doing big operations (e.g. adding columns) While doing big operations (e.g. adding columns)
  23. · Try not to remove columns · Try not to

    remove columns It's a lot easier for code to ignore new ones It's a lot easier for code to ignore new ones
  24. · Have a warm failover · Have a warm failover

    But not automatic failover, unless you know why But not automatic failover, unless you know why
  25. · Have a read-only mode · Have a read-only mode

    It means you never truly go down It means you never truly go down
  26. · Don't use text for everything · Don't use text

    for everything Especially numbers and dates Especially numbers and dates
  27. · Store all times in UTC · Store all times

    in UTC Convert to and from in the code. Convert to and from in the code.
  28. · Make columns NULLable · Make columns NULLable Rather than

    using blank strings. Rather than using blank strings.
  29. · Think about custom types · Think about custom types

    Geometry, IP addresses, JSON... Geometry, IP addresses, JSON...
  30. · Use non-relational wisely · Use non-relational wisely Otherwise you'll

    make everything slower Otherwise you'll make everything slower
  31. · Have a single source of truth · Have a

    single source of truth So you can rebuild if something goes awry So you can rebuild if something goes awry
  32. · Enforce constraints in code · Enforce constraints in code

    Before your data becomes useless Before your data becomes useless
  33. · Postgres does schemaless too · Postgres does schemaless too

    hstore and now JSON column types hstore and now JSON column types
  34. · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes
  35. · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes · (Faster servers) · (Faster servers)