$30 off During Our Annual Pro Sale. View Details »

Socorro, crash-stats.mozilla.com and Postgres

Socorro, crash-stats.mozilla.com and Postgres

A brain dump of all the stuff related to Postgres and crash-stats.mozilla.com

Selena Deckelmann

March 25, 2014
Tweet

More Decks by Selena Deckelmann

Other Decks in Technology

Transcript

  1. Postgres and You
    Selena Deckelmann
    WebEng Workweek Q1 2014

    View Slide

  2. Stuff to cover
    •How Postgres fits in
    • Migrations
    • Middleware, stored procedures, crontabber
    • How to love JSON
    • Writing Queries
    • Common Admin issues

    View Slide

  3. processors
    rabbitmq
    webapp
    crontabber
    middleware
    jenkins builds failing
    setupdb_app.py
    fakedata

    View Slide

  4. http://uncommonrose.com/
    presentations/socorro-crash-flow-2014/

    View Slide

  5. socorro/external/postgresql/model.py
    schema
    socorro1
    DB
    processor01
    processor10

    sp-admin01
    mware01
    mware01

    socorro1.webapp
    socorro4.webapp

    raw_crashes
    processed_crashes
    reports
    Pretty much all
    the other tables
    Django-specific tables
    webapp-django/crashstats/* models

    View Slide

  6. View Slide

  7. View Slide

  8. socorro1 socorro3
    WAL
    Socorro1
    .dev
    Socorro1.
    stage
    base_backup copy
    Sunday noon PT
    streaming rep
    Prod
    socorro2
    backup4
    base_backup &
    pg_dump backup
    reporting1
    WAL
    socorro-db-zeus-rw
    socorro-db-zeus-ro
    very architecture
    very architecture
    such replicas
    such replicas
    wow
    wow

    View Slide

  9. Making a test database
    PYTHONPATH=. \
    socorro/external/postgresql/setupdb_app.py \
    --database_name=socorro_test --dropdb
    PYTHONPATH=. \
    socorro/external/postgresql/setupdb_app.py \
    --database_name=socorro_test --dropdb \
    --fakedata --fakedata_days=2

    View Slide

  10. What setupdb_app.py
    does
    • Connects as database superuser, creates a
    database, creates socorro users,
    disconnects
    • Connects as user, creates a schema based
    on socorro/external/postgresql/model.py
    • Optional: loads data using fakedata.py
    • Stamps alembic revision to latest

    View Slide

  11. Running tests
    make test-socorro
    or
    setupdb_app.py + nose

    View Slide

  12. Stuff to cover
    • How Postgres fits in
    •Migrations
    • Middleware, stored procedures, crontabber
    • How to love JSON
    • Writing Queries
    • Common Admin issues

    View Slide

  13. Running migrations
    cp config/alembic.ini-dist config/alembic.ini
    PYTHONPATH=. \
    alembic -c config/alembic.ini revision \
    --autogenerate -m “Fixes bug XX add table”
    alembic -c config/alembic.ini upgrade HEAD
    alembic -c config/alembic.ini downgrade -1
    alembic -c config/alembic.ini history
    alembic -c config/alembic.ini current
    alembic -c config/alembic.ini branches

    View Slide

  14. Important files for
    alembic
    • socorro/lib/migrations.py
    • load_stored_proc(op, [procfile, ... ])
    • fix_permissions(op, tablename)
    • socorro/lib/*type.py
    • citexttype.CitextType
    • jsontype.JsonType

    View Slide

  15. Most common issues
    • Failure to test migration (see: Makefile)
    • Problem with non-linear history (two
    migrations inherit from same parent)
    • Assumptions that don’t pan out on
    downgrade (protip: run downgrade then
    upgrade after a setupdb_app.py run)

    View Slide

  16. Stuff to cover
    • How Postgres fits in
    • Migrations
    •Middleware, stored procedures,
    crontabber
    • How to love JSON
    • Writing Queries

    View Slide

  17. Postgres DB
    table
    Internal
    Middleware API
    Django model webapp view
    Crontabber Job
    Stored procedure Crontabber Config
    Data from:
    reports_clean
    raw_crashes
    processed_crashes
    Data into:
    Brand new table!
    Lives in:
    socorro/cron/jobs
    Lives in:
    Socorro puppet
    module
    Lives in:
    socorro/external/
    postgresql/*.py
    Lives in:
    webapp-django/
    crashstats/crashstats/
    model.py
    Lives in:
    webapp-django/
    crashstats/crashstats/
    views.py

    View Slide

  18. Stored Procedure
    • socorro/external/postgresql/raw_sql/procs
    • update_XXX.sql and backfill_XXX.sql
    • see also: backfill_named_table.sql
    • backfill_matviews.sql
    • TODO: invoke a crontabber job to
    backfill instead (better performance)
    • Tests? (fakedata may invoke)

    View Slide

  19. Adding a Postgres table
    • socorro/external/postgresql/models.py
    • socorro/external/postgresql/fakedata.py
    • maybe also: setupdb_app.py (but prob not)
    • Tests!

    View Slide

  20. Crontabber job
    • socorro/cron/jobs/matview.py
    • Remember to check dependencies
    • socorro/cron/crontabber.py -
    DEFAULT_JOBS

    View Slide

  21. Crontabber config
    • puppet: modules/socorro/files/stage/etc-
    socorro/crontabber.ini
    • puppet: modules/socorro/files/prod/etc-
    socorro/crontabber.ini

    View Slide

  22. Internal middleware
    • socorro/external/postgresql/XXX.py
    • see: socorro/external/postgresql/
    crashes.py for an example
    • Expose a route to the new endpoint

    View Slide

  23. Django Model
    • And here’s where stuff gets hazy...
    • webapp-django/crashstats/crashstats/
    models.py

    View Slide

  24. Webapp view
    • webapp-django/crashstats/crashstats/
    views.py

    View Slide

  25. Problems

    View Slide

  26. Caching config?
    • Has solved performance problems in the
    past *cough* TCBS *cough*

    View Slide

  27. Complex middleware
    • Rob? :)

    View Slide

  28. Alternatives
    • What if we pull out aggregation data into a
    different data store?
    • Why? Extreme Simplification
    • Provide JSON aggregates, use D3

    View Slide

  29. Stuff to cover
    • How Postgres fits in
    • Migrations
    • Middleware, stored procedures, crontabber
    •How to love JSON
    • Writing Queries
    • Common Admin issues

    View Slide

  30. Our JSON is
    complicated

    View Slide

  31. {
    "client_crash_date":"2014-03-24 00:05:03.000000",
    "hang_type":0,
    "dump":"====PIPE DUMP ENDS===\n",
    "startedDateTime":"2014-03-24 00:00:13.050209",
    "java_stack_trace":null,
    "product":"Firefox",
    "crashedThread":null,
    "cpu_info":null,
    "pluginVersion":null,
    "install_age":2765858,
    "distributor":null,
    "topmost_filenames":"",
    "processor_notes":"sp-processor03_phx1_mozilla_com.
    5265:2012; HybridCrashProcessor; MDSW emitted no header l
    ines; MDSW did not identify the crashing thread; CSignatu

    View Slide

  32. "addons":[
    [
    "[email protected]",
    "1.6.0"
    ],
    [
    "{60364604-8b4c-42f4-a2ca-a76ca7b61b37}",
    "9.5.3.1"
    ],
    [
    "[email protected]",
    "0.94.61"
    ],
    [
    "[email protected]",
    "1.1.2"

    View Slide

  33. with a as (
    select processed_crash
    from processed_crashes limit 1
    ),
    b as (
    select json_array_elements(processed_crash->'addons')
    as addons
    from a
    )
    select addons->>0 from b;

    View Slide

  34. JSON operators
    • Extract JSON object: ->
    • Extract array element: ->>2
    • Extract text element: ->>‘blah’
    • JSON type (can be confusing cuz doesn’t
    automatically cast to TEXT)

    View Slide

  35. JSON functions
    • json_enhancements:
    https://bitbucket.org/IVC-Inc/
    json_enhancements

    View Slide

  36. But, easier than reports
    • Big unanswered question:
    Can we kill the reports table?
    • advanced search
    • reports/list
    • update_reports_clean()
    • 3 reports THAT SHOULD DIE

    View Slide

  37. Upgrading to 9.3
    • Would be nice.
    • Not a priority.
    • Might just go to 9.4 in the fall......... Discuss?

    View Slide

  38. Stuff to cover
    • How Postgres fits in
    • Migrations
    • Middleware, stored procedures, crontabber
    • How to love JSON
    •Writing Queries
    • Common Admin issues

    View Slide

  39. Protips!
    • Don’t ever use now()::timestamp (magical,
    immediate performance killer)
    • Run EXPLAIN, talk to a buddy
    • Check for unnecessary JOINs
    • Don’t expect local/stage performance to be
    similar to prod performance
    • Always use “WHERE
    date_processed ...” (more on this later!)

    View Slide

  40. Writing queries in
    python
    • Isolate your queries in a function with the
    first parameter as the connection
    • DO NOT PASS CURSORS AROUND

    View Slide

  41. Always use
    date_processed
    • WHERE date_processed BETWEEN ...
    AND ...
    • Tables affected:
    • raw_crashes, processed_crashes
    • reports, reports_clean
    • plugins, plugins_modules

    View Slide

  42. Avoid reports
    • Avoid using the reports table
    • Deprecation target AND usually slower
    than reports_clean
    • matview? Use raw_crashes or
    processed_crashes and make indexes!

    View Slide

  43. Adding indexes to
    partitioned tables
    • see report_partition_info
    • Happens weekly
    • To add to old partitions, do “manually” in a
    migration based on pg_class data

    View Slide

  44. Use EXPLAIN
    • Check for unnecessary JOINs (remove!)
    • Don’t SELECT * ever
    • Start with WITH
    • Test performance with EXPLAIN
    ANALYZE
    • ANALYZE tables

    View Slide

  45. Adding other indexes
    • Must exactly match predicate!
    • Best practice: Create in a transaction and
    test exactly the query you want to use the
    index to verify your index is correct

    View Slide

  46. INSERTs
    • ALWAYS EXPLICITLY NAME COLUMNS

    View Slide

  47. Altering tables
    • Never specify a DEFAULT in a CREATE or
    ALTER that creates a column on an existing
    table
    • Drop children before parents
    • Never LOCK EXCLUSIVE on parents
    unless you want to have a downtime
    • DROPs and ALTERs on children might
    deadlock with processors (they retry,
    NBD)

    View Slide

  48. Stuff to cover
    • How Postgres fits in
    • Migrations
    • Middleware, stored procedures, crontabber
    • How to love JSON
    • Writing Queries
    •Common Admin issues

    View Slide

  49. High load on master
    • Check if a backup is running (OOPS)
    SOLUTION: kill backup
    LONG TERM: Investigate why backup was
    running against master, correct
    configuration/process documents

    View Slide

  50. High load on master
    • Check if load is coming from middleware
    (cacheable query?)
    SOLUTION: kill middleware processes,
    sometimes have to restart middleware
    LONG TERM: Cache a query or refactor
    SQL

    View Slide

  51. High load on master
    • Check the logs :) (checkpoints, temp files)
    • Check for a Suspicious Process or
    Processes (Long running, memory
    consuming)

    View Slide

  52. Maintenance of tables
    • We now drop old partitions without an
    exclusive lock on the parent
    • This is mostly safe -- might screw up READ
    ONLY queries looking at the data at the
    time of the drops
    • Worth it, because otherwise we need
    downtime windows
    • TODO: Now can be automated!

    View Slide

  53. Maintenance of tables
    • Probably could be doing more
    • Probably not worth the time given that we
    drop old partitions
    • Newer Postgres will automatically FREEZE
    tables, reducing overhead of periodic
    VACUUMs for transaction wraparound

    View Slide