Optimising Queries For Fun And Profit

Optimising Queries For Fun And Profit

A talk I gave at PyCon Ukraine 2012

077e9a0cb34fa3eba2699240c9509717?s=128

Andrew Godwin

October 20, 2012
Tweet

Transcript

  1. 1.

    Optimising Queries Optimising Queries For Fun And Profit For Fun

    And Profit Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin flickr.com/stuckincustoms flickr.com/stuckincustoms
  2. 2.

    · Django core dev · Django core dev · South

    · South · Lanyrd · Lanyrd
  3. 3.

    Databases are old. Databases are old. (and they're not magic!)

    (and they're not magic!) flickr.com/stuckincustoms flickr.com/stuckincustoms
  4. 4.

    · Computers are slow · Computers are slow Otherwise databases

    would be useless Otherwise databases would be useless
  5. 5.

    · Indexes, joins or replication · Indexes, joins or replication

    Things a database gives you over a flat file Things a database gives you over a flat file
  6. 6.

    Databases still need you. Databases still need you. They can't

    solve every problem without hints They can't solve every problem without hints flickr.com/96dpi flickr.com/96dpi
  7. 9.

    photos = [] photos = [] for person in People.objects.all():

    for person in People.objects.all(): photos.append(person.photo.file) photos.append(person.photo.file)
  8. 10.

    photo_ids = People.objects.values_list( photo_ids = People.objects.values_list( "photo", flat=True) "photo", flat=True)

    photos = Photo.objects.filter(id__in=photo_ids) photos = Photo.objects.filter(id__in=photo_ids)
  9. 11.

    photos = [] photos = [] for person in People.objects.select_related("photo"):

    for person in People.objects.select_related("photo"): photos.append(person.photo.file) photos.append(person.photo.file)
  10. 13.

    1 million rows JOINed with 1 million rows 1 million

    rows JOINed with 1 million rows ...is 1,000,000,000,000 potential comparisons ...is 1,000,000,000,000 potential comparisons
  11. 14.

    · Denormalise · Denormalise It'll save you a lot of

    time. It'll save you a lot of time.
  12. 16.

    · Your goal: Less queries · Your goal: Less queries

    Databases are speedier than your code Databases are speedier than your code
  13. 17.

    · Log queries for each page · Log queries for

    each page Django Debug Toolbar works well for this Django Debug Toolbar works well for this
  14. 18.

    SELECT * FROM people_person WHERE age > 13 AND foo;

    SELECT * FROM people_person WHERE age > 13 AND foo; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 42; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 31; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 214; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 84; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 28; SELECT * FROM people_photo WHERE id = 90; SELECT * FROM people_photo WHERE id = 90;
  15. 19.

    SELECT * FROM people_person AS pr SELECT * FROM people_person

    AS pr INNER JOIN people_photo AS ph ON ph.id = pr.photo_id INNER JOIN people_photo AS ph ON ph.id = pr.photo_id WHERE age > 13 AND foo; WHERE age > 13 AND foo;
  16. 20.

    · At most 100 per page · At most 100

    per page Preferably, you want 20 or less Preferably, you want 20 or less
  17. 21.

    · Log slow queries · Log slow queries There's settings

    for this in PostgreSQL/MySQL There's settings for this in PostgreSQL/MySQL
  18. 22.

    · Analyse slow queries · Analyse slow queries See if

    they need indexes See if they need indexes
  19. 23.

    # explain select * from aeblog_post where id = 50;

    # explain select * from aeblog_post where id = 50; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Index Scan using aeblog_post_pkey on aeblog_post Index Scan using aeblog_post_pkey on aeblog_post (cost=0.00..8.27 rows=1 width=1812) (cost=0.00..8.27 rows=1 width=1812) Index Cond: (id = 50) Index Cond: (id = 50)
  20. 24.

    # explain select * from aeblog_post # explain select *

    from aeblog_post where section = 'trip-notes'; where section = 'trip-notes'; QUERY PLAN QUERY PLAN --------------------------------------------------- --------------------------------------------------- Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 width=1812) width=1812) Filter: ((section)::text = 'trip-notes'::text) Filter: ((section)::text = 'trip-notes'::text)
  21. 25.

    · Don't fetch every column · Don't fetch every column

    Especially if you have big ones Especially if you have big ones
  22. 27.

    · Watch out for locks · Watch out for locks

    While doing big operations (e.g. adding columns) While doing big operations (e.g. adding columns)
  23. 28.

    · Try not to remove columns · Try not to

    remove columns It's a lot easier for code to ignore new ones It's a lot easier for code to ignore new ones
  24. 29.

    · Have a warm failover · Have a warm failover

    But not automatic failover, unless you know why But not automatic failover, unless you know why
  25. 30.

    · Have a read-only mode · Have a read-only mode

    It means you never truly go down It means you never truly go down
  26. 32.

    · Don't use text for everything · Don't use text

    for everything Especially numbers and dates Especially numbers and dates
  27. 33.

    · Store all times in UTC · Store all times

    in UTC Convert to and from in the code. Convert to and from in the code.
  28. 34.

    · Make columns NULLable · Make columns NULLable Rather than

    using blank strings. Rather than using blank strings.
  29. 35.

    · Think about custom types · Think about custom types

    Geometry, IP addresses, JSON... Geometry, IP addresses, JSON...
  30. 37.

    · Use non-relational wisely · Use non-relational wisely Otherwise you'll

    make everything slower Otherwise you'll make everything slower
  31. 38.

    · Have a single source of truth · Have a

    single source of truth So you can rebuild if something goes awry So you can rebuild if something goes awry
  32. 39.

    · Enforce constraints in code · Enforce constraints in code

    Before your data becomes useless Before your data becomes useless
  33. 40.

    · Postgres does schemaless too · Postgres does schemaless too

    hstore and now JSON column types hstore and now JSON column types
  34. 44.

    · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes
  35. 45.

    · Less queries · Less queries · Less JOINs ·

    Less JOINs · Better indexes · Better indexes · (Faster servers) · (Faster servers)