Socorro, and Postgres

Socorro, and Postgres

A brain dump of all the stuff related to Postgres and


Selena Deckelmann

March 25, 2014


  1. Postgres and You Selena Deckelmann WebEng Workweek Q1 2014

  2. Stuff to cover •How Postgres fits in • Migrations •

    Middleware, stored procedures, crontabber • How to love JSON • Writing Queries • Common Admin issues
  3. processors rabbitmq webapp crontabber middleware jenkins builds failing fakedata

  4. presentations/socorro-crash-flow-2014/

  5. socorro/external/postgresql/ schema socorro1 DB processor01 processor10 … sp-admin01 mware01 mware01

    … socorro1.webapp socorro4.webapp … raw_crashes processed_crashes reports Pretty much all the other tables Django-specific tables webapp-django/crashstats/* models
  6. None
  7. None
  8. socorro1 socorro3 WAL Socorro1 .dev Socorro1. stage base_backup copy Sunday

    noon PT streaming rep Prod socorro2 backup4 base_backup & pg_dump backup reporting1 WAL socorro-db-zeus-rw socorro-db-zeus-ro very architecture very architecture such replicas such replicas wow wow
  9. Making a test database PYTHONPATH=. \ socorro/external/postgresql/ \ --database_name=socorro_test --dropdb

    PYTHONPATH=. \ socorro/external/postgresql/ \ --database_name=socorro_test --dropdb \ --fakedata --fakedata_days=2
  10. What does • Connects as database superuser, creates a

    database, creates socorro users, disconnects • Connects as user, creates a schema based on socorro/external/postgresql/ • Optional: loads data using • Stamps alembic revision to latest
  11. Running tests make test-socorro or + nose

  12. Stuff to cover • How Postgres fits in •Migrations •

    Middleware, stored procedures, crontabber • How to love JSON • Writing Queries • Common Admin issues
  13. Running migrations cp config/alembic.ini-dist config/alembic.ini PYTHONPATH=. \ alembic -c config/alembic.ini

    revision \ --autogenerate -m “Fixes bug XX add table” alembic -c config/alembic.ini upgrade HEAD alembic -c config/alembic.ini downgrade -1 alembic -c config/alembic.ini history alembic -c config/alembic.ini current alembic -c config/alembic.ini branches
  14. Important files for alembic • socorro/lib/ • load_stored_proc(op, [procfile, ...

    ]) • fix_permissions(op, tablename) • socorro/lib/* • citexttype.CitextType • jsontype.JsonType
  15. Most common issues • Failure to test migration (see: Makefile)

    • Problem with non-linear history (two migrations inherit from same parent) • Assumptions that don’t pan out on downgrade (protip: run downgrade then upgrade after a run)
  16. Stuff to cover • How Postgres fits in • Migrations

    •Middleware, stored procedures, crontabber • How to love JSON • Writing Queries
  17. Postgres DB table Internal Middleware API Django model webapp view

    Crontabber Job Stored procedure Crontabber Config Data from: reports_clean raw_crashes processed_crashes Data into: Brand new table! Lives in: socorro/cron/jobs Lives in: Socorro puppet module Lives in: socorro/external/ postgresql/*.py Lives in: webapp-django/ crashstats/crashstats/ Lives in: webapp-django/ crashstats/crashstats/
  18. Stored Procedure • socorro/external/postgresql/raw_sql/procs • update_XXX.sql and backfill_XXX.sql • see

    also: backfill_named_table.sql • backfill_matviews.sql • TODO: invoke a crontabber job to backfill instead (better performance) • Tests? (fakedata may invoke)
  19. Adding a Postgres table • socorro/external/postgresql/ • socorro/external/postgresql/ • maybe

    also: (but prob not) • Tests!
  20. Crontabber job • socorro/cron/jobs/ • Remember to check dependencies •

    socorro/cron/ - DEFAULT_JOBS
  21. Crontabber config • puppet: modules/socorro/files/stage/etc- socorro/crontabber.ini • puppet: modules/socorro/files/prod/etc- socorro/crontabber.ini

  22. Internal middleware • socorro/external/postgresql/ • see: socorro/external/postgresql/ for an

    example • Expose a route to the new endpoint
  23. Django Model • And here’s where stuff gets hazy... •

  24. Webapp view • webapp-django/crashstats/crashstats/

  25. Problems

  26. Caching config? • Has solved performance problems in the past

    *cough* TCBS *cough*
  27. Complex middleware • Rob? :)

  28. Alternatives • What if we pull out aggregation data into

    a different data store? • Why? Extreme Simplification • Provide JSON aggregates, use D3
  29. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber •How to love JSON • Writing Queries • Common Admin issues
  30. Our JSON is complicated

  31. { "client_crash_date":"2014-03-24 00:05:03.000000", "hang_type":0, "dump":"====PIPE DUMP ENDS===\n", "startedDateTime":"2014-03-24 00:00:13.050209", "java_stack_trace":null,

    "product":"Firefox", "crashedThread":null, "cpu_info":null, "pluginVersion":null, "install_age":2765858, "distributor":null, "topmost_filenames":"", "processor_notes":"sp-processor03_phx1_mozilla_com. 5265:2012; HybridCrashProcessor; MDSW emitted no header l ines; MDSW did not identify the crashing thread; CSignatu
  32. "addons":[ [ "", "1.6.0" ], [ "{60364604-8b4c-42f4-a2ca-a76ca7b61b37}", "" ], [

    "", "0.94.61" ], [ "", "1.1.2"
  33. with a as ( select processed_crash from processed_crashes limit 1

    ), b as ( select json_array_elements(processed_crash->'addons') as addons from a ) select addons->>0 from b;
  34. JSON operators • Extract JSON object: -> • Extract array

    element: ->>2 • Extract text element: ->>‘blah’ • JSON type (can be confusing cuz doesn’t automatically cast to TEXT)
  35. JSON functions • json_enhancements: json_enhancements

  36. But, easier than reports • Big unanswered question: Can we

    kill the reports table? • advanced search • reports/list • update_reports_clean() • 3 reports THAT SHOULD DIE
  37. Upgrading to 9.3 • Would be nice. • Not a

    priority. • Might just go to 9.4 in the fall......... Discuss?
  38. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber • How to love JSON •Writing Queries • Common Admin issues
  39. Protips! • Don’t ever use now()::timestamp (magical, immediate performance killer)

    • Run EXPLAIN, talk to a buddy • Check for unnecessary JOINs • Don’t expect local/stage performance to be similar to prod performance • Always use “WHERE date_processed ...” (more on this later!)
  40. Writing queries in python • Isolate your queries in a

    function with the first parameter as the connection • DO NOT PASS CURSORS AROUND
  41. Always use date_processed • WHERE date_processed BETWEEN ... AND ...

    • Tables affected: • raw_crashes, processed_crashes • reports, reports_clean • plugins, plugins_modules
  42. Avoid reports • Avoid using the reports table • Deprecation

    target AND usually slower than reports_clean • matview? Use raw_crashes or processed_crashes and make indexes!
  43. Adding indexes to partitioned tables • see report_partition_info • Happens

    weekly • To add to old partitions, do “manually” in a migration based on pg_class data
  44. Use EXPLAIN • Check for unnecessary JOINs (remove!) • Don’t

    SELECT * ever • Start with WITH • Test performance with EXPLAIN ANALYZE • ANALYZE tables
  45. Adding other indexes • Must exactly match predicate! • Best

    practice: Create in a transaction and test exactly the query you want to use the index to verify your index is correct

  47. Altering tables • Never specify a DEFAULT in a CREATE

    or ALTER that creates a column on an existing table • Drop children before parents • Never LOCK EXCLUSIVE on parents unless you want to have a downtime • DROPs and ALTERs on children might deadlock with processors (they retry, NBD)
  48. Stuff to cover • How Postgres fits in • Migrations

    • Middleware, stored procedures, crontabber • How to love JSON • Writing Queries •Common Admin issues
  49. High load on master • Check if a backup is

    running (OOPS) SOLUTION: kill backup LONG TERM: Investigate why backup was running against master, correct configuration/process documents
  50. High load on master • Check if load is coming

    from middleware (cacheable query?) SOLUTION: kill middleware processes, sometimes have to restart middleware LONG TERM: Cache a query or refactor SQL
  51. High load on master • Check the logs :) (checkpoints,

    temp files) • Check for a Suspicious Process or Processes (Long running, memory consuming)
  52. Maintenance of tables • We now drop old partitions without

    an exclusive lock on the parent • This is mostly safe -- might screw up READ ONLY queries looking at the data at the time of the drops • Worth it, because otherwise we need downtime windows • TODO: Now can be automated!
  53. Maintenance of tables • Probably could be doing more •

    Probably not worth the time given that we drop old partitions • Newer Postgres will automatically FREEZE tables, reducing overhead of periodic VACUUMs for transaction wraparound