Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Socorro, crash-stats.mozilla.com and Postgres

Socorro, crash-stats.mozilla.com and Postgres

A brain dump of all the stuff related to Postgres and crash-stats.mozilla.com

Selena Deckelmann

March 25, 2014

More Decks by Selena Deckelmann

Other Decks in Technology


  1. Stuff to cover •How Postgres fits in • Migrations •

    Middleware, stored procedures, crontabber • How to love JSON • Writing Queries • Common Admin issues
  2. socorro/external/postgresql/model.py schema socorro1 DB processor01 processor10 … sp-admin01 mware01 mware01

    … socorro1.webapp socorro4.webapp … raw_crashes processed_crashes reports Pretty much all the other tables Django-specific tables webapp-django/crashstats/* models
  3. socorro1 socorro3 WAL Socorro1 .dev Socorro1. stage base_backup copy Sunday

    noon PT streaming rep Prod socorro2 backup4 base_backup & pg_dump backup reporting1 WAL socorro-db-zeus-rw socorro-db-zeus-ro very architecture very architecture such replicas such replicas wow wow
  4. Making a test database PYTHONPATH=. \ socorro/external/postgresql/setupdb_app.py \ --database_name=socorro_test --dropdb

    PYTHONPATH=. \ socorro/external/postgresql/setupdb_app.py \ --database_name=socorro_test --dropdb \ --fakedata --fakedata_days=2
  5. What setupdb_app.py does • Connects as database superuser, creates a

    database, creates socorro users, disconnects • Connects as user, creates a schema based on socorro/external/postgresql/model.py • Optional: loads data using fakedata.py • Stamps alembic revision to latest
  6. Stuff to cover • How Postgres fits in •Migrations •

    Middleware, stored procedures, crontabber • How to love JSON • Writing Queries • Common Admin issues
  7. Running migrations cp config/alembic.ini-dist config/alembic.ini PYTHONPATH=. \ alembic -c config/alembic.ini

    revision \ --autogenerate -m “Fixes bug XX add table” alembic -c config/alembic.ini upgrade HEAD alembic -c config/alembic.ini downgrade -1 alembic -c config/alembic.ini history alembic -c config/alembic.ini current alembic -c config/alembic.ini branches
  8. Important files for alembic • socorro/lib/migrations.py • load_stored_proc(op, [procfile, ...

    ]) • fix_permissions(op, tablename) • socorro/lib/*type.py • citexttype.CitextType • jsontype.JsonType
  9. Most common issues • Failure to test migration (see: Makefile)

    • Problem with non-linear history (two migrations inherit from same parent) • Assumptions that don’t pan out on downgrade (protip: run downgrade then upgrade after a setupdb_app.py run)
  10. Stuff to cover • How Postgres fits in • Migrations

    •Middleware, stored procedures, crontabber • How to love JSON • Writing Queries
  11. Postgres DB table Internal Middleware API Django model webapp view

    Crontabber Job Stored procedure Crontabber Config Data from: reports_clean raw_crashes processed_crashes Data into: Brand new table! Lives in: socorro/cron/jobs Lives in: Socorro puppet module Lives in: socorro/external/ postgresql/*.py Lives in: webapp-django/ crashstats/crashstats/ model.py Lives in: webapp-django/ crashstats/crashstats/ views.py
  12. Stored Procedure • socorro/external/postgresql/raw_sql/procs • update_XXX.sql and backfill_XXX.sql • see

    also: backfill_named_table.sql • backfill_matviews.sql • TODO: invoke a crontabber job to backfill instead (better performance) • Tests? (fakedata may invoke)
  13. Django Model • And here’s where stuff gets hazy... •

    webapp-django/crashstats/crashstats/ models.py
  14. Alternatives • What if we pull out aggregation data into

    a different data store? • Why? Extreme Simplification • Provide JSON aggregates, use D3
  15. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber •How to love JSON • Writing Queries • Common Admin issues
  16. { "client_crash_date":"2014-03-24 00:05:03.000000", "hang_type":0, "dump":"====PIPE DUMP ENDS===\n", "startedDateTime":"2014-03-24 00:00:13.050209", "java_stack_trace":null,

    "product":"Firefox", "crashedThread":null, "cpu_info":null, "pluginVersion":null, "install_age":2765858, "distributor":null, "topmost_filenames":"", "processor_notes":"sp-processor03_phx1_mozilla_com. 5265:2012; HybridCrashProcessor; MDSW emitted no header l ines; MDSW did not identify the crashing thread; CSignatu
  17. with a as ( select processed_crash from processed_crashes limit 1

    ), b as ( select json_array_elements(processed_crash->'addons') as addons from a ) select addons->>0 from b;
  18. JSON operators • Extract JSON object: -> • Extract array

    element: ->>2 • Extract text element: ->>‘blah’ • JSON type (can be confusing cuz doesn’t automatically cast to TEXT)
  19. But, easier than reports • Big unanswered question: Can we

    kill the reports table? • advanced search • reports/list • update_reports_clean() • 3 reports THAT SHOULD DIE
  20. Upgrading to 9.3 • Would be nice. • Not a

    priority. • Might just go to 9.4 in the fall......... Discuss?
  21. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber • How to love JSON •Writing Queries • Common Admin issues
  22. Protips! • Don’t ever use now()::timestamp (magical, immediate performance killer)

    • Run EXPLAIN, talk to a buddy • Check for unnecessary JOINs • Don’t expect local/stage performance to be similar to prod performance • Always use “WHERE date_processed ...” (more on this later!)
  23. Writing queries in python • Isolate your queries in a

    function with the first parameter as the connection • DO NOT PASS CURSORS AROUND
  24. Always use date_processed • WHERE date_processed BETWEEN ... AND ...

    • Tables affected: • raw_crashes, processed_crashes • reports, reports_clean • plugins, plugins_modules
  25. Avoid reports • Avoid using the reports table • Deprecation

    target AND usually slower than reports_clean • matview? Use raw_crashes or processed_crashes and make indexes!
  26. Adding indexes to partitioned tables • see report_partition_info • Happens

    weekly • To add to old partitions, do “manually” in a migration based on pg_class data
  27. Use EXPLAIN • Check for unnecessary JOINs (remove!) • Don’t

    SELECT * ever • Start with WITH • Test performance with EXPLAIN ANALYZE • ANALYZE tables
  28. Adding other indexes • Must exactly match predicate! • Best

    practice: Create in a transaction and test exactly the query you want to use the index to verify your index is correct
  29. Altering tables • Never specify a DEFAULT in a CREATE

    or ALTER that creates a column on an existing table • Drop children before parents • Never LOCK EXCLUSIVE on parents unless you want to have a downtime • DROPs and ALTERs on children might deadlock with processors (they retry, NBD)
  30. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber • How to love JSON • Writing Queries •Common Admin issues
  31. High load on master • Check if a backup is

    running (OOPS) SOLUTION: kill backup LONG TERM: Investigate why backup was running against master, correct configuration/process documents
  32. High load on master • Check if load is coming

    from middleware (cacheable query?) SOLUTION: kill middleware processes, sometimes have to restart middleware LONG TERM: Cache a query or refactor SQL
  33. High load on master • Check the logs :) (checkpoints,

    temp files) • Check for a Suspicious Process or Processes (Long running, memory consuming)
  34. Maintenance of tables • We now drop old partitions without

    an exclusive lock on the parent • This is mostly safe -- might screw up READ ONLY queries looking at the data at the time of the drops • Worth it, because otherwise we need downtime windows • TODO: Now can be automated!
  35. Maintenance of tables • Probably could be doing more •

    Probably not worth the time given that we drop old partitions • Newer Postgres will automatically FREEZE tables, reducing overhead of periodic VACUUMs for transaction wraparound