redesign: changed one of the core models to fit business logic
  ◦ Schema migration
  ◦ Data migration
• Statistics on the admin page
• Successfully deployed to dev and QA
I was ready for it
• Production was down for 5 hours
  ◦ Kernel panic!
• I deployed the previous version and restored the DB from a snapshot – lost the last 3 hours of data
request at all
• Memory usage was fine
• CPU was fine
• Network was fine
• Actually, Django was responding with HUGE latency
  ◦ The best case was 5 minutes, even for the simplest request!
40 minutes → 7 minutes
  ◦ select_related
• Move all long-running tasks to Celery tasks
• To prevent a race between Celery and Django, we run them on separate instances
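Django's `select_related` follows foreign keys with a SQL JOIN, so related rows arrive in the same query instead of one extra query per row. The sketch below illustrates only that idea, using the standard-library `sqlite3` module rather than the Django ORM. The `author` and `book` tables, the `CountingDB` wrapper, and the function names are hypothetical and do not come from the talk.

```python
import sqlite3


def make_db():
    """Build a small in-memory database with hypothetical author/book tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            title TEXT,
            author_id INTEGER REFERENCES author(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO author (id, name) VALUES (?, ?)",
        [(i, f"author-{i}") for i in range(1, 4)],
    )
    conn.executemany(
        "INSERT INTO book (id, title, author_id) VALUES (?, ?, ?)",
        [(i, f"book-{i}", (i % 3) + 1) for i in range(1, 10)],
    )
    return conn


class CountingDB:
    """Thin wrapper that counts how many SELECTs are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def titles_with_authors_n_plus_one(db):
    """One query for the books, then one more query per book for its author."""
    result = []
    for title, author_id in db.query("SELECT title, author_id FROM book ORDER BY id"):
        name = db.query("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result


def titles_with_authors_joined(db):
    """A single JOIN, which is the kind of query select_related generates."""
    return db.query(
        "SELECT book.title, author.name "
        "FROM book JOIN author ON author.id = book.author_id "
        "ORDER BY book.id"
    )
```

With nine books, the first function issues ten queries and the second issues one, while both return the same rows. On a page that lists many objects, that difference grows with the number of rows.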
the disk space
  ◦ The free disk space metric has a reverse sawtooth form
• Super hot fix: turn off the metrics task
  ◦ The free disk space metric has the same period as the periodic task that calculates metrics
to use the raw query in Django
  ◦ There are no reasons to do so
• Attempts:
  ◦ Remove metrics that require CASEs
  ◦ Reduce the number of COUNTs and JOINs
  ◦ Remove DISTINCT – Fetch row by row
  ◦ Use one query for each metric
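The last attempt, one query per metric, can be sketched as follows. This shows only the shape of the change, not its effect on the original system. The `customer` and `purchase` tables and the three metrics are hypothetical, and standard-library `sqlite3` stands in for the real database.

```python
import sqlite3


def make_metrics_db():
    """In-memory database with hypothetical customer/purchase tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY);
        CREATE TABLE purchase (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            status TEXT
        );
        INSERT INTO customer (id) VALUES (1), (2), (3);
        INSERT INTO purchase (id, customer_id, status) VALUES
            (1, 1, 'paid'), (2, 1, 'refunded'), (3, 2, 'paid'), (4, 2, 'paid');
        """
    )
    return conn


def metrics_combined(conn):
    """All metrics in one statement: needs a JOIN, CASEs and DISTINCTs."""
    row = conn.execute(
        """
        SELECT
            COUNT(DISTINCT c.id),
            COUNT(DISTINCT CASE WHEN p.status = 'paid' THEN p.id END),
            COUNT(DISTINCT CASE WHEN p.id IS NOT NULL THEN c.id END)
        FROM customer AS c
        LEFT JOIN purchase AS p ON p.customer_id = c.id
        """
    ).fetchone()
    return {"customers": row[0], "paid_purchases": row[1], "buyers": row[2]}


def metrics_one_query_each(conn):
    """One small query per metric: no JOIN, no CASE, no DISTINCT."""
    queries = {
        "customers": "SELECT COUNT(*) FROM customer",
        "paid_purchases": "SELECT COUNT(*) FROM purchase WHERE status = 'paid'",
        "buyers": (
            "SELECT COUNT(*) FROM customer "
            "WHERE EXISTS (SELECT 1 FROM purchase WHERE customer_id = customer.id)"
        ),
    }
    return {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
```

Both functions return the same numbers for this data. The second trades one heavy statement for several simple ones, each of which can be inspected, timed, or dropped on its own.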
• Deploy more frequently
• Do not use data migrations as is – use commands
• Django admin is not efficient for aggregation queries
• Analysis and synthesis matter
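As a framework-agnostic sketch of the "use commands" lesson: the data change lives in a plain function that a command could call separately from the deploy. It works in small committed batches and can be re-run safely. Batching and re-runnability are assumptions about why a command helps, not something the slides state. The `person` table and the `backfill_full_name` function are hypothetical. In a Django project the same loop would sit inside a management command's `handle()` method.

```python
import sqlite3


def make_people_db():
    """In-memory table with a new, still-empty full_name column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, full_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO person (id, first_name, last_name) VALUES (?, ?, ?)",
        [(i, f"first{i}", f"last{i}") for i in range(1, 6)],
    )
    conn.commit()
    return conn


def backfill_full_name(conn, batch_size=2):
    """Fill full_name where it is still NULL, one committed batch at a time.

    Returns the number of rows updated. Running it again after it finishes
    (or after an interruption) only touches the rows that are still NULL.
    """
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, first_name, last_name FROM person "
            "WHERE full_name IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "UPDATE person SET full_name = ? WHERE id = ?",
            [(f"{first} {last}", pk) for pk, first, last in rows],
        )
        conn.commit()  # each batch is durable on its own
        total += len(rows)
    return total
```

The first call on the sample table updates all five rows in three batches. A second call finds nothing left to do and returns 0, which makes the step safe to repeat.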