Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Optimising Queries For Fun And Profit

Optimising Queries For Fun And Profit

A talk I gave at PyCon Ukraine 2012

Andrew Godwin

October 20, 2012
Tweet

More Decks by Andrew Godwin

Other Decks in Programming

Transcript

  1. Optimising Queries
    Optimising Queries
    For Fun And Profit
    For Fun And Profit
    Andrew Godwin / @andrewgodwin
    Andrew Godwin / @andrewgodwin
    flickr.com/stuckincustoms
    flickr.com/stuckincustoms

    View Slide

  2. · Django core dev
    · Django core dev
    · South
    · South
    · Lanyrd
    · Lanyrd

    View Slide

  3. Databases are old.
    Databases are old.
    (and they're not magic!)
    (and they're not magic!)
    flickr.com/stuckincustoms
    flickr.com/stuckincustoms

    View Slide

  4. · Computers are slow
    · Computers are slow
    Otherwise databases would be useless
    Otherwise databases would be useless

    View Slide

  5. · Indexes, joins or replication
    · Indexes, joins or replication
    Things a database gives you over a flat file
    Things a database gives you over a flat file

    View Slide

  6. Databases still need you.
    Databases still need you.
    They can't solve every problem without hints
    They can't solve every problem without hints
    flickr.com/96dpi
    flickr.com/96dpi

    View Slide

  7. Basic Problems
    Basic Problems

    View Slide

  8. · n+1 queries
    · n+1 queries
    Far too easy to do!
    Far too easy to do!

    View Slide

  9. photos = []
    photos = []
    for person in People.objects.all():
    for person in People.objects.all():
    photos.append(person.photo.file)
    photos.append(person.photo.file)

    View Slide

  10. photo_ids = People.objects.values_list(
    photo_ids = People.objects.values_list(
    "photo", flat=True)
    "photo", flat=True)
    photos = Photo.objects.filter(id__in=photo_ids)
    photos = Photo.objects.filter(id__in=photo_ids)

    View Slide

  11. photos = []
    photos = []
    for person in People.objects.select_related("photo"):
    for person in People.objects.select_related("photo"):
    photos.append(person.photo.file)
    photos.append(person.photo.file)

    View Slide

  12. · Unrestricted joins
    · Unrestricted joins
    Those exponentials add up
    Those exponentials add up

    View Slide

  13. 1 million rows JOINed with 1 million rows
    1 million rows JOINed with 1 million rows
    ...is 1,000,000,000,000 potential comparisons
    ...is 1,000,000,000,000 potential comparisons

    View Slide

  14. · Denormalise
    · Denormalise
    It'll save you a lot of time.
    It'll save you a lot of time.

    View Slide

  15. Finer Optimisation
    Finer Optimisation

    View Slide

  16. · Your goal: Less queries
    · Your goal: Less queries
    Databases are speedier than your code
    Databases are speedier than your code

    View Slide

  17. · Log queries for each page
    · Log queries for each page
    Django Debug Toolbar works well for this
    Django Debug Toolbar works well for this

    View Slide

  18. SELECT * FROM people_person WHERE age > 13 AND foo;
    SELECT * FROM people_person WHERE age > 13 AND foo;
    SELECT * FROM people_photo WHERE id = 42;
    SELECT * FROM people_photo WHERE id = 42;
    SELECT * FROM people_photo WHERE id = 31;
    SELECT * FROM people_photo WHERE id = 31;
    SELECT * FROM people_photo WHERE id = 214;
    SELECT * FROM people_photo WHERE id = 214;
    SELECT * FROM people_photo WHERE id = 84;
    SELECT * FROM people_photo WHERE id = 84;
    SELECT * FROM people_photo WHERE id = 28;
    SELECT * FROM people_photo WHERE id = 28;
    SELECT * FROM people_photo WHERE id = 90;
    SELECT * FROM people_photo WHERE id = 90;

    View Slide

  19. SELECT * FROM people_person AS pr
    SELECT * FROM people_person AS pr
    INNER JOIN people_photo AS ph ON ph.id = pr.photo_id
    INNER JOIN people_photo AS ph ON ph.id = pr.photo_id
    WHERE age > 13 AND foo;
    WHERE age > 13 AND foo;

    View Slide

  20. · At most 100 per page
    · At most 100 per page
    Preferably, you want 20 or less
    Preferably, you want 20 or less

    View Slide

  21. · Log slow queries
    · Log slow queries
    There's settings for this in PostgreSQL/MySQL
    There's settings for this in PostgreSQL/MySQL

    View Slide

  22. · Analyse slow queries
    · Analyse slow queries
    See if they need indexes
    See if they need indexes

    View Slide

  23. # explain select * from aeblog_post where id = 50;
    # explain select * from aeblog_post where id = 50;
    QUERY PLAN
    QUERY PLAN
    ---------------------------------------------------
    ---------------------------------------------------
    Index Scan using aeblog_post_pkey on aeblog_post
    Index Scan using aeblog_post_pkey on aeblog_post
    (cost=0.00..8.27 rows=1 width=1812)
    (cost=0.00..8.27 rows=1 width=1812)
    Index Cond: (id = 50)
    Index Cond: (id = 50)

    View Slide

  24. # explain select * from aeblog_post
    # explain select * from aeblog_post
    where section = 'trip-notes';
    where section = 'trip-notes';
    QUERY PLAN
    QUERY PLAN
    ---------------------------------------------------
    ---------------------------------------------------
    Seq Scan on aeblog_post (cost=0.00..14.21 rows=1
    Seq Scan on aeblog_post (cost=0.00..14.21 rows=1
    width=1812)
    width=1812)
    Filter: ((section)::text = 'trip-notes'::text)
    Filter: ((section)::text = 'trip-notes'::text)

    View Slide

  25. · Don't fetch every column
    · Don't fetch every column
    Especially if you have big ones
    Especially if you have big ones

    View Slide

  26. Less Downtime
    Less Downtime

    View Slide

  27. · Watch out for locks
    · Watch out for locks
    While doing big operations (e.g. adding columns)
    While doing big operations (e.g. adding columns)

    View Slide

  28. · Try not to remove columns
    · Try not to remove columns
    It's a lot easier for code to ignore new ones
    It's a lot easier for code to ignore new ones

    View Slide

  29. · Have a warm failover
    · Have a warm failover
    But not automatic failover, unless you know why
    But not automatic failover, unless you know why

    View Slide

  30. · Have a read-only mode
    · Have a read-only mode
    It means you never truly go down
    It means you never truly go down

    View Slide

  31. Column Types
    Column Types

    View Slide

  32. · Don't use text for everything
    · Don't use text for everything
    Especially numbers and dates
    Especially numbers and dates

    View Slide

  33. · Store all times in UTC
    · Store all times in UTC
    Convert to and from in the code.
    Convert to and from in the code.

    View Slide

  34. · Make columns NULLable
    · Make columns NULLable
    Rather than using blank strings.
    Rather than using blank strings.

    View Slide

  35. · Think about custom types
    · Think about custom types
    Geometry, IP addresses, JSON...
    Geometry, IP addresses, JSON...

    View Slide

  36. 'NoSQL'
    'NoSQL'

    View Slide

  37. · Use non-relational wisely
    · Use non-relational wisely
    Otherwise you'll make everything slower
    Otherwise you'll make everything slower

    View Slide

  38. · Have a single source of truth
    · Have a single source of truth
    So you can rebuild if something goes awry
    So you can rebuild if something goes awry

    View Slide

  39. · Enforce constraints in code
    · Enforce constraints in code
    Before your data becomes useless
    Before your data becomes useless

    View Slide

  40. · Postgres does schemaless too
    · Postgres does schemaless too
    hstore and now JSON column types
    hstore and now JSON column types

    View Slide

  41. Basically...
    Basically...
    flickr.com/stuckincustoms
    flickr.com/stuckincustoms

    View Slide

  42. · Less queries
    · Less queries

    View Slide

  43. · Less queries
    · Less queries
    · Less JOINs
    · Less JOINs

    View Slide

  44. · Less queries
    · Less queries
    · Less JOINs
    · Less JOINs
    · Better indexes
    · Better indexes

    View Slide

  45. · Less queries
    · Less queries
    · Less JOINs
    · Less JOINs
    · Better indexes
    · Better indexes
    · (Faster servers)
    · (Faster servers)

    View Slide

  46. Thanks.
    Thanks.
    Andrew Godwin / @andrewgodwin
    Andrew Godwin / @andrewgodwin
    flickr.com/oimax
    flickr.com/oimax

    View Slide