
Optimizing your Django QuerySet Performance

Breaking the Speed Limit with Django QuerySets: learn no-duh, oh-okay, and holy-moly methods to make your Django QuerySets fast.

Ben Greene
November 21, 2013

Transcript

  1. Breaking the Speed Limit with Django QuerySets
     Django Boston / November 21st, 2013
     Ben Greene / [email protected] / @bengreene / #heybengreenewassup
  2. Me.

  3. Test datasets
                            S        M         L          XL
     Employees              25       25        25         25
     Campaigns              25       25        25         25
     OpportunityStages      10       10        10         10
     Leads                  1,000    10,000    100,000    1,000,000
     CampaignLeads          3,009    30,064    299,856    2,998,044
     Opportunities          100      1,000     10,000     100,000
     Deals                  17       103       975        10,055
  4. Profiling
     • Django Debug Toolbar
     • Repoze
     • New Relic
     • Hotshot
     • Django Live Profiler
     • django-snippetscream
     • ... and of course, MySQL
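     Beyond the tools above, core Django can already answer "how many queries did that block cost?" on its own. A minimal sketch using connection.queries (recorded only when DEBUG=True; the Opportunity query is just an example borrowed from a later slide):

         from django.db import connection, reset_queries

         reset_queries()
         list(Opportunity.objects.filter(value__gte=5000))
         print(len(connection.queries))        # how many queries that block ran
         print(connection.queries[0]['sql'])   # Django also records the raw SQL and timing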
  5. Data size and complexity: some basic techniques
     • Don't load objects you don't need: opportunity.lead.id vs. opportunity.lead_id
     • Don't load fields you don't need: qs.only('name')
     • Load specific data, not entire objects: qs.values(), qs.values_list()
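     A minimal sketch of all three techniques together, assuming the Opportunity model whose columns appear in the SQL on the following slides:

         # Given an Opportunity instance `opportunity`:
         lead_pk = opportunity.lead_id            # reads the FK column already on the row;
                                                  # opportunity.lead.id would fetch the Lead first
         # Defer every column except the ones named:
         names_only = Opportunity.objects.only('name')
         # Skip model instances entirely: dicts or tuples straight from the database:
         rows = Opportunity.objects.values('name', 'value')
         pairs = Opportunity.objects.values_list('name', 'value')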
  6. Puzzler: Caching (cont'd)
         deals = list(Deal.objects.order_by("-value"))
         print deals[0].value
         print deals[0].value
         print deals[0].value
         print deals[1].value
         print deals[1].value
         print deals[1].value
         print deals[2].value
         print deals[2].value
         print deals[2].value
         . . .
     And how many queries now? 1
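     For contrast, a sketch (not from the deck) of what the result cache buys you: indexing an unevaluated QuerySet issues a fresh LIMIT query each time, while list() evaluates once and every later access hits the Python-side cache:

         qs = Deal.objects.order_by("-value")
         qs[0].value        # SELECT ... LIMIT 1
         qs[0].value        # another query - nothing was cached yet

         deals = list(qs)   # one query; results now cached in Python
         deals[0].value     # no query
         deals[1].value     # no query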
  7. Beware of queries in loops!
         data = []
         opps = Opportunity.objects.filter(value__gte=5000)
         for opp in opps:
             data.append('%s ($%s): %s' % (opp.name, opp.value, opp.stage.name))
     SELECT `main_opportunity`.`id`, `main_opportunity`.`owner_id`, `main_opportunity`.`stage_id`, `main_opportunity`.`lead_id`, `main_opportunity`.`created_date`, `main_opportunity`.`name`, `main_opportunity`.`value`, `main_opportunity`.`closed_date` FROM `main_opportunity` WHERE `main_opportunity`.`value` >= 5000
     SELECT `main_opportunitystage`.`id`, `main_opportunitystage`.`name`, `main_opportunitystage`.`is_active`, `main_opportunitystage`.`is_closed` FROM `main_opportunitystage` WHERE `main_opportunitystage`.`id` = 75
     SELECT `main_opportunitystage`.`id`, `main_opportunitystage`.`name`, `main_opportunitystage`.`is_active`, `main_opportunitystage`.`is_closed` FROM `main_opportunitystage` WHERE `main_opportunitystage`.`id` = 73
     SELECT `main_opportunitystage`.`id`, `main_opportunitystage`.`name`, `main_opportunitystage`.`is_active`, `main_opportunitystage`.`is_closed` FROM `main_opportunitystage` WHERE `main_opportunitystage`.`id` = 79
     . . .
     N+1 queries
  8. Reduce queries with select_related()
         data = []
         opps = Opportunity.objects.filter(value__gte=5000).select_related('stage')
         for opp in opps:
             data.append('%s ($%s): %s' % (opp.name, opp.value, opp.stage.name))
     SELECT `main_opportunity`.`id`, `main_opportunity`.`owner_id`, `main_opportunity`.`stage_id`, `main_opportunity`.`lead_id`, `main_opportunity`.`created_date`, `main_opportunity`.`name`, `main_opportunity`.`value`, `main_opportunity`.`closed_date`, `main_opportunitystage`.`id`, `main_opportunitystage`.`name`, `main_opportunitystage`.`is_active`, `main_opportunitystage`.`is_closed` FROM `main_opportunity` INNER JOIN `main_opportunitystage` ON ( `main_opportunity`.`stage_id` = `main_opportunitystage`.`id` ) WHERE `main_opportunity`.`value` >= 5000
     1 query
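     select_related() only follows the forward foreign keys it can JOIN, as above. For reverse and many-to-many relations the analogous tool is prefetch_related(); a sketch (not from the deck), assuming the campaign_set accessor used in the next slide's query and an illustrative lead_score filter:

         # A bounded number of queries (one per relation level) instead of one per lead:
         leads = Lead.objects.filter(lead_score__gte=50).prefetch_related('campaign_set__campaign')
         for lead in leads:
             names = [cl.campaign.name for cl in lead.campaign_set.all()]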
  9. More complex querysets
         Lead.objects.filter(campaign_set__campaign__name__startswith='PyCon')
     explain SELECT `main_lead`.`id`, `main_lead`.`owner_id`, `main_lead`.`name`, `main_lead`.`created_date`, `main_lead`.`lead_score`
     FROM `main_lead`
     INNER JOIN `main_campaignlead` ON ( `main_lead`.`id` = `main_campaignlead`.`lead_id` )
     INNER JOIN `main_campaign` ON ( `main_campaignlead`.`campaign_id` = `main_campaign`.`id` )
     WHERE `main_campaign`.`name` LIKE BINARY "PyCon%";
     | id | select_type | table             | type   | possible_keys                                          | key                        | key_len | ref                                     | rows | Extra       |
     | 1  | SIMPLE      | main_campaign     | ALL    | PRIMARY                                                | NULL                       | NULL    | NULL                                    | 25   | Using where |
     | 1  | SIMPLE      | main_campaignlead | ref    | main_campaignlead_b6460dac,main_campaignlead_6cd0afe2  | main_campaignlead_6cd0afe2 | 4       | yalytics_1000.main_campaign.id          | 120  | NULL        |
     | 1  | SIMPLE      | main_lead         | eq_ref | PRIMARY                                                | PRIMARY                    | 4       | yalytics_1000.main_campaignlead.lead_id | 1    | NULL        |
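     A sketch of one way to get the SQL a QuerySet will run, so it can be pasted behind EXPLAIN in the mysql client:

         qs = Lead.objects.filter(campaign_set__campaign__name__startswith='PyCon')
         print(qs.query)   # roughly the SELECT above; prefix it with "explain" in the mysql shell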
  10. class Campaign(models.Model):
          name = models.CharField(max_length=255)
          external_id = models.CharField(max_length=255)

          class Meta:
              index_together = [
                  ['name', 'id'],
              ]
      . . .
      ben@nero $ ./manage.py schemamigration main --auto
      . . .
      class Migration(SchemaMigration):

          def forwards(self, orm):
              # Adding index on 'Campaign', fields ['name', u'id']
              db.create_index(u'main_campaign', ['name', u'id'])
      . . .
      ben@nero $ ./manage.py migrate main
  11. Explain, revisited
          Lead.objects.filter(campaign_set__campaign__name__startswith='PyCon')
      explain SELECT `main_lead`.`id`, `main_lead`.`owner_id`, `main_lead`.`name`, `main_lead`.`created_date`, `main_lead`.`lead_score`
      FROM `main_lead`
      INNER JOIN `main_campaignlead` ON ( `main_lead`.`id` = `main_campaignlead`.`lead_id` )
      INNER JOIN `main_campaign` ON ( `main_campaignlead`.`campaign_id` = `main_campaign`.`id` )
      WHERE `main_campaign`.`name` LIKE BINARY "PyCon%";
      | id | select_type | table             | type   | possible_keys                                          | key                        | key_len | ref                                        | rows | Extra                    |
      | 1  | SIMPLE      | main_campaign     | index  | PRIMARY,name_id                                        | name_id                    | 771     | NULL                                       | 25   | Using where; Using index |
      | 1  | SIMPLE      | main_campaignlead | ref    | main_campaignlead_b6460dac,main_campaignlead_6cd0afe2  | main_campaignlead_6cd0afe2 | 4       | yalytics_million.main_campaign.id          | 200  | NULL                     |
      | 1  | SIMPLE      | main_lead         | eq_ref | PRIMARY                                                | PRIMARY                    | 4       | yalytics_million.main_campaignlead.lead_id | 1    | NULL                     |
  12. Another scenario
          # Given
          very_complex_leads_queryset = ...
          # Need deals off opps from those leads, and then some
          Deal.objects.filter(opportunity__lead__in=very_complex_leads_queryset)
              .filter(...)
              .order_by(...)
      Sad truth: There does come a point at which MySQL can't handle the madness anymore.
  13. Possible solution!
          # Given
          very_complex_leads_queryset = ...
          very_complex_leads_queryset_ids = list(very_complex_leads_queryset.values_list('id', flat=True))
          # Need deals off opps from those leads, and then some
          Deal.objects.filter(opportunity__lead_id__in=very_complex_leads_queryset_ids)
              .filter(...)
              .order_by(...)
      Bad news: This ALSO sucks, because Django.
  14. Django vs. MySQL
      Django QuerySets aren't great at loading large amounts of data from MySQL, and they're even worse at sending them back. Additionally, MySQL can't optimize a join against 100,000 random ids. What it CAN do is sh*t the bed.
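      A quick sketch (not from the deck) of the scale involved, reusing very_complex_leads_queryset_ids from slide 13:

          qs = Deal.objects.filter(opportunity__lead_id__in=very_complex_leads_queryset_ids)
          print(len(very_complex_leads_queryset_ids))   # e.g. ~100,000 ids pulled into Python...
          print(len(str(qs.query)))                     # ...then shipped back inside one giant IN (...) clause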
  15. import datetime
      import random

      from django.db import connections, models
      # EmptyResultSet lives under django.db.models.sql in the Django 1.x of this talk;
      # newer versions also expose it as django.core.exceptions.EmptyResultSet.
      from django.db.models.sql.datastructures import EmptyResultSet


      class TemporaryID(models.Model):
          class Meta:
              abstract = True


      def tempify_queryset(base_queryset, temp_field_name):
          # Compile base_queryset down to a "SELECT id FROM ..." statement.
          source_id_column_name = 'id'
          target_queryset = base_queryset.values(source_id_column_name)
          temp_field = models.ForeignKey(target_queryset.model, primary_key=True,
                                         null=False, db_column=source_id_column_name)
          # Build a throwaway model class pointing at a uniquely named temp table.
          temp_class_name = "Temp_%s_%s" % (random.randint(100, 999),
                                            datetime.datetime.now().strftime('%s_%f')[::-1])
          temp_class = type(temp_class_name, (TemporaryID,),
                            {temp_field_name: temp_field, '__module__': __name__})
          temp_app_label = temp_class._meta.app_label
          temp_table_name = ("%s_%s" % (temp_app_label, temp_class_name)).lower()[:64]
          temp_class._meta.db_table = temp_class.Meta.db_table = temp_table_name
          query_compiler = target_queryset.query.get_compiler(using=target_queryset.db)
          try:
              sql, params = query_compiler.as_sql()
          except EmptyResultSet:
              return_queryset = base_queryset.none()
          else:
              # Materialize the ids into an in-memory, primary-keyed temporary table.
              create_table_param_dict = {
                  "temp_table_name": temp_table_name,
                  "source_id_column_name": source_id_column_name,
                  "sql": sql,
              }
              create_table_sql = "CREATE TEMPORARY TABLE %(temp_table_name)s " \
                                 "(%(source_id_column_name)s INT NOT NULL, " \
                                 "PRIMARY KEY (%(source_id_column_name)s)) AS (%(sql)s)" % create_table_param_dict
              cursor = connections[target_queryset.db].cursor()
              cursor.execute(create_table_sql, params)
              return_queryset = temp_class.objects.values(temp_field_name)
          return return_queryset
  16. # Given
      very_complex_leads_queryset = ...
      # Calculate the query but keep it in MySQL and build a queryset to reference it
      # (second argument added here: it names the FK field on the temp model; any identifier works)
      temp_queryset = tempify_queryset(very_complex_leads_queryset, 'lead')
      # Now doing a simple join against an indexed already-in-memory table (holla)
      Deal.objects.filter(opportunity__lead_id__in=temp_queryset)
          .filter(...)
          .order_by(...)
      • calculated IDs are stored in memory by MySQL, so no cost to transfer
      • temporary table is fully indexed, making it very fast for joining
      • temp_queryset is a fully-fledged QuerySet - no functionality lost*
      • the whole process has negligible overhead
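      One operational footnote, an assumption on my part rather than something from the deck: MySQL drops TEMPORARY tables automatically when the connection closes, but a long-lived connection can drop one sooner by hand:

          from django.db import connections

          # temp_queryset.model is the throwaway class that tempify_queryset() built
          table_name = temp_queryset.model._meta.db_table
          connections[temp_queryset.db].cursor().execute(
              "DROP TEMPORARY TABLE IF EXISTS %s" % table_name)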