Socorro, crash-stats.mozilla.com and Postgres

Socorro, crash-stats.mozilla.com and Postgres

A brain dump of all the stuff related to Postgres and crash-stats.mozilla.com

4535e53ad45275fa955c5b05684342c5?s=128

Selena Deckelmann

March 25, 2014
Tweet

Transcript

  1. Postgres and You Selena Deckelmann WebEng Workweek Q1 2014

  2. Stuff to cover •How Postgres fits in • Migrations •

    Middleware, stored procedures, crontabber • How to love JSON • Writing Queries • Common Admin issues
  3. processors rabbitmq webapp crontabber middleware jenkins builds failing setupdb_app.py fakedata

  4. http://uncommonrose.com/ presentations/socorro-crash-flow-2014/

  5. socorro/external/postgresql/model.py schema socorro1 DB processor01 processor10 … sp-admin01 mware01 mware01

    … socorro1.webapp socorro4.webapp … raw_crashes processed_crashes reports Pretty much all the other tables Django-specific tables webapp-django/crashstats/* models
  6. None
  7. None
  8. socorro1 socorro3 WAL Socorro1 .dev Socorro1. stage base_backup copy Sunday

    noon PT streaming rep Prod socorro2 backup4 base_backup & pg_dump backup reporting1 WAL socorro-db-zeus-rw socorro-db-zeus-ro very architecture very architecture such replicas such replicas wow wow
  9. Making a test database PYTHONPATH=. \ socorro/external/postgresql/setupdb_app.py \ --database_name=socorro_test --dropdb

    PYTHONPATH=. \ socorro/external/postgresql/setupdb_app.py \ --database_name=socorro_test --dropdb \ --fakedata --fakedata_days=2
  10. What setupdb_app.py does • Connects as database superuser, creates a

    database, creates socorro users, disconnects • Connects as user, creates a schema based on socorro/external/postgresql/model.py • Optional: loads data using fakedata.py • Stamps alembic revision to latest
  11. Running tests make test-socorro or setupdb_app.py + nose

  12. Stuff to cover • How Postgres fits in •Migrations •

    Middleware, stored procedures, crontabber • How to love JSON • Writing Queries • Common Admin issues
  13. Running migrations cp config/alembic.ini-dist config/alembic.ini PYTHONPATH=. \ alembic -c config/alembic.ini

    revision \ --autogenerate -m “Fixes bug XX add table” alembic -c config/alembic.ini upgrade HEAD alembic -c config/alembic.ini downgrade -1 alembic -c config/alembic.ini history alembic -c config/alembic.ini current alembic -c config/alembic.ini branches
  14. Important files for alembic • socorro/lib/migrations.py • load_stored_proc(op, [procfile, ...

    ]) • fix_permissions(op, tablename) • socorro/lib/*type.py • citexttype.CitextType • jsontype.JsonType
  15. Most common issues • Failure to test migration (see: Makefile)

    • Problem with non-linear history (two migrations inherit from same parent) • Assumptions that don’t pan out on downgrade (protip: run downgrade then upgrade after a setupdb_app.py run)
  16. Stuff to cover • How Postgres fits in • Migrations

    •Middleware, stored procedures, crontabber • How to love JSON • Writing Queries
  17. Postgres DB table Internal Middleware API Django model webapp view

    Crontabber Job Stored procedure Crontabber Config Data from: reports_clean raw_crashes processed_crashes Data into: Brand new table! Lives in: socorro/cron/jobs Lives in: Socorro puppet module Lives in: socorro/external/ postgresql/*.py Lives in: webapp-django/ crashstats/crashstats/ model.py Lives in: webapp-django/ crashstats/crashstats/ views.py
  18. Stored Procedure • socorro/external/postgresql/raw_sql/procs • update_XXX.sql and backfill_XXX.sql • see

    also: backfill_named_table.sql • backfill_matviews.sql • TODO: invoke a crontabber job to backfill instead (better performance) • Tests? (fakedata may invoke)
  19. Adding a Postgres table • socorro/external/postgresql/models.py • socorro/external/postgresql/fakedata.py • maybe

    also: setupdb_app.py (but prob not) • Tests!
  20. Crontabber job • socorro/cron/jobs/matview.py • Remember to check dependencies •

    socorro/cron/crontabber.py - DEFAULT_JOBS
  21. Crontabber config • puppet: modules/socorro/files/stage/etc- socorro/crontabber.ini • puppet: modules/socorro/files/prod/etc- socorro/crontabber.ini

  22. Internal middleware • socorro/external/postgresql/XXX.py • see: socorro/external/postgresql/ crashes.py for an

    example • Expose a route to the new endpoint
  23. Django Model • And here’s where stuff gets hazy... •

    webapp-django/crashstats/crashstats/ models.py
  24. Webapp view • webapp-django/crashstats/crashstats/ views.py

  25. Problems

  26. Caching config? • Has solved performance problems in the past

    *cough* TCBS *cough*
  27. Complex middleware • Rob? :)

  28. Alternatives • What if we pull out aggregation data into

    a different data store? • Why? Extreme Simplification • Provide JSON aggregates, use D3
  29. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber •How to love JSON • Writing Queries • Common Admin issues
  30. Our JSON is complicated

  31. { "client_crash_date":"2014-03-24 00:05:03.000000", "hang_type":0, "dump":"====PIPE DUMP ENDS===\n", "startedDateTime":"2014-03-24 00:00:13.050209", "java_stack_trace":null,

    "product":"Firefox", "crashedThread":null, "cpu_info":null, "pluginVersion":null, "install_age":2765858, "distributor":null, "topmost_filenames":"", "processor_notes":"sp-processor03_phx1_mozilla_com. 5265:2012; HybridCrashProcessor; MDSW emitted no header l ines; MDSW did not identify the crashing thread; CSignatu
  32. "addons":[ [ "ffxtlbr@metacrawler.com", "1.6.0" ], [ "{60364604-8b4c-42f4-a2ca-a76ca7b61b37}", "9.5.3.1" ], [

    "extension21810@extension21810.com", "0.94.61" ], [ "WebSiteRecommendation@weliketheweb.com", "1.1.2"
  33. with a as ( select processed_crash from processed_crashes limit 1

    ), b as ( select json_array_elements(processed_crash->'addons') as addons from a ) select addons->>0 from b;
  34. JSON operators • Extract JSON object: -> • Extract array

    element: ->>2 • Extract text element: ->>‘blah’ • JSON type (can be confusing cuz doesn’t automatically cast to TEXT)
  35. JSON functions • json_enhancements: https://bitbucket.org/IVC-Inc/ json_enhancements

  36. But, easier than reports • Big unanswered question: Can we

    kill the reports table? • advanced search • reports/list • update_reports_clean() • 3 reports THAT SHOULD DIE
  37. Upgrading to 9.3 • Would be nice. • Not a

    priority. • Might just go to 9.4 in the fall......... Discuss?
  38. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber • How to love JSON •Writing Queries • Common Admin issues
  39. Protips! • Don’t ever use now()::timestamp (magical, immediate performance killer)

    • Run EXPLAIN, talk to a buddy • Check for unnecessary JOINs • Don’t expect local/stage performance to be similar to prod performance • Always use “WHERE date_processed ...” (more on this later!)
  40. Writing queries in python • Isolate your queries in a

    function with the first parameter as the connection • DO NOT PASS CURSORS AROUND
  41. Always use date_processed • WHERE date_processed BETWEEN ... AND ...

    • Tables affected: • raw_crashes, processed_crashes • reports, reports_clean • plugins, plugins_modules
  42. Avoid reports • Avoid using the reports table • Deprecation

    target AND usually slower than reports_clean • matview? Use raw_crashes or processed_crashes and make indexes!
  43. Adding indexes to partitioned tables • see report_partition_info • Happens

    weekly • To add to old partitions, do “manually” in a migration based on pg_class data
  44. Use EXPLAIN • Check for unnecessary JOINs (remove!) • Don’t

    SELECT * ever • Start with WITH • Test performance with EXPLAIN ANALYZE • ANALYZE tables
  45. Adding other indexes • Must exactly match predicate! • Best

    practice: Create in a transaction and test exactly the query you want to use the index to verify your index is correct
  46. INSERTs • ALWAYS EXPLICITLY NAME COLUMNS

  47. Altering tables • Never specify a DEFAULT in a CREATE

    or ALTER that creates a column on an existing table • Drop children before parents • Never LOCK EXCLUSIVE on parents unless you want to have a downtime • DROPs and ALTERs on children might deadlock with processors (they retry, NBD)
  48. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber • How to love JSON • Writing Queries •Common Admin issues
  49. High load on master • Check if a backup is

    running (OOPS) SOLUTION: kill backup LONG TERM: Investigate why backup was running against master, correct configuration/process documents
  50. High load on master • Check if load is coming

    from middleware (cacheable query?) SOLUTION: kill middleware processes, sometimes have to restart middleware LONG TERM: Cache a query or refactor SQL
  51. High load on master • Check the logs :) (checkpoints,

    temp files) • Check for a Suspicious Process or Processes (Long running, memory consuming)
  52. Maintenance of tables • We now drop old partitions without

    an exclusive lock on the parent • This is mostly safe -- might screw up READ ONLY queries looking at the data at the time of the drops • Worth it, because otherwise we need downtime windows • TODO: Now can be automated!
  53. Maintenance of tables • Probably could be doing more •

    Probably not worth the time given that we drop old partitions • Newer Postgres will automatically FREEZE tables, reducing overhead of periodic VACUUMs for transaction wraparound