Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres – A Data Platform

Craig Kerstiens
October 09, 2014
420

Postgres – A Data Platform

Craig Kerstiens

October 09, 2014
Tweet

Transcript

  1. Postgres:
    A Data Platform
    craigkerstiens

    View full-size slide

  2. Who am I
    Product @ Heroku
    craigkerstiens.com
    postgresweekly.com
    postgresguide.com
    craigkerstiens

    View full-size slide

  3. Postgres – TLDR;
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum
    craigkerstiens

    View full-size slide

  4. Rough outline
    Context
    Day to day
    Datatypes
    Performance
    Expanding postgres

    View full-size slide

  5. 1989
    • Gas was $1.09
    • Movie ticket $2.75
    • Microsoft releases Windows 1.0
    • Postgres
    • Project called Postgres is released
    • Roots in Ingres (post-ingres)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  6. 1996
    It might help to explain that the pronunciation is "post-gres" or!
    "post-gres-cue-ell", not "post-gray-something".!
    !
    I heard people making this same mistake in presentations at this!
    past weekend's Postgres Anniversary Conference :-( Arguably,!
    the 1996 decision to call it PostgreSQL instead of reverting to!
    plain Postgres was the single worst mistake this project ever
    made.!
    It seems far too late to change now, though.!
    !
    ! ! ! regards, tom lane!
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  7. MVCC
    Immutable structure
    !
    Reads don’t block writes, each
    transaction sees what the state was
    when it started
    !
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  8. Under the covers
    A giant append only log
    !
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    Writing data adds to the log
    Updating data adds to the log
    Deleting data adds to the log

    View full-size slide

  9. A tangent
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  10. What is a relational DB
    • Stores the data and how its related
    • Data is in a flat two dimensional space
    • Has relationships between the data
    !
    • Really though it’s math
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  11. What is SQL
    • SQL means SQL to access it
    • SQL also implies relational
    • A car has an owner
    • An owner has an address
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  12. NoSQL is overloaded
    It can relate to CAP but doesn’t necessarily
    • Consistency
    • Availability
    • Partition Tolerance
    !
    • It can relate to SQL or user experience
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  13. Databases broken down
    • Relational databases
    • Key-Value stores
    • Document databases
    • Text search solutions
    • Distributed data stores
    • Time series
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  14. Postgres bag of tricks
    • My editor of choice
    • Writing better SQL
    • Misc tips/tricks
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  15. PSQL
    \e - opens your default $EDITOR
    \d - describe something
    \dt - list all tables
    \x auto - pretty results
    \x help - help
    .psqlrc - really geek out
    (pro tip - name your queries)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  16. Formatting SQL
    SELECT foo
    FROM bar a,
    baz b
    WHERE a.this = b.that
    AND condition2 = false
    GROUP BY 1
    ORDER BY 1 ASC;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  17. More readable SQL

    View full-size slide

  18. Common Table Expressions

    View full-size slide

  19. An example
    !
    --- Calculates the projects per each user
    tasks_per_project_per_user AS (
    SELECT
    user_id,
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY user_id, project_id
    ),
    !
    --- Gets user ids that have over 50% of tasks assigned
    overloaded_users AS (
    SELECT tasks_per_project_per_user.user_id,
    !
    FROM tasks_per_project_per_user,
    total_tasks_per_project
    WHERE tasks_per_project_per_user.task_count >
    (total_tasks_per_project / 2)
    )
    !
    SELECT
    email,
    task_list,
    title
    FROM
    users_tasks,
    overloaded_users
    WHERE
    users_tasks.user_id = overloaded_users.user_id
    --- Initial query to grab project title and tasks per user
    WITH users_tasks AS (
    SELECT
    users.id as user_id,
    users.email,
    array_agg(tasks.name) as task_list,
    projects.title
    FROM
    users,
    tasks,
    project
    WHERE
    users.id = tasks.user_id
    projects.title = tasks.project_id
    GROUP BY
    users.email,
    projects.title
    ),
    !
    --- Calculates the total tasks per each project
    total_tasks_per_project AS (
    SELECT
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY project_id
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  20. An example
    --- Initial query to grab project title and tasks per user
    WITH users_tasks AS (
    SELECT
    users.id as user_id,
    users.email,
    array_agg(tasks.name) as task_list,
    projects.title
    FROM
    users,
    tasks,
    project
    WHERE
    users.id = tasks.user_id
    projects.title = tasks.project_id
    GROUP BY
    users.email,
    projects.title
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  21. An example
    --- Calculates the total tasks per each project
    total_tasks_per_project AS (
    SELECT
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY project_id
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  22. An example
    --- Calculates the projects per each user
    tasks_per_project_per_user AS (
    SELECT
    user_id,
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY user_id, project_id
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  23. An example
    --- Gets user ids that have over 50% of tasks assigned
    overloaded_users AS (
    SELECT tasks_per_project_per_user.user_id,
    !
    FROM tasks_per_project_per_user,
    total_tasks_per_project
    WHERE tasks_per_project_per_user.task_count
    > (total_tasks_per_project / 2)
    )
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  24. An example
    SELECT
    email,
    task_list,
    title
    FROM
    users_tasks,
    overloaded_users
    WHERE
    users_tasks.user_id = overloaded_users.user_id
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  25. Window Functions

    View full-size slide

  26. An Example
    SELECT last_name,
    salary,
    department,
    rank() OVER
    (PARTITION BY department
    ORDER BY salary DESC)
    FROM employees;
    !
    !
    !
    last_name salary department rank
    Jones 45000 Accounting 1
    Williams 37000 Accounting 2
    Smith 55000 Sales 1
    Adams 50000 Sales 2
    Johnson 40000 Marketing 1

    View full-size slide

  27. Extensions
    hstore
    citext
    db_link
    uuid_ossp
    cube
    PostGIS
    intarray
    pg_crypto
    isn
    table_func
    unaccent
    dict_int
    dict_xsyn
    earth_distance
    pg_stat_tuple
    ltree
    pg_row_locks
    trigram

    View full-size slide

  28. The list
    integer
    float
    serial
    money
    character
    bytea
    timestamp
    timestamp with tz
    date
    time
    interval
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    boolean
    enums
    point
    line
    box
    path
    polygon
    circle
    inet
    cidr
    bit
    tsvector
    tsquery
    UUID
    XML
    JSON
    array
    rangetypes

    View full-size slide

  29. Money
    • Don’t use it
    • Created when the world was flat
    • Money only knows a single currency
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  30. Serial
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  31. Serial
    • Serial has a limit
    • UUID scales
    • UUID doesn’t leak your primary keys
    • uuid_ossp extension
    UUID
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  32. Shapes
    • Basic ones for a shortcut
    • earth_distance
    !
    • PostGIS for robustness
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  33. XML
    Just kidding

    View full-size slide

  34. JSON vs. hStore
    • NoSQL often means schema-less
    !
    • Adding columns is annoying
    • Just add data
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  35. hStore
    • Key value store directly in Postgres
    !
    • Can filter where a key does/doesn’t exist
    • Can filter for specific values of keys
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  36. JSON
    Postgres 9.2 was the JSON release
    Marketing
    Postgres 9.3
    started to become usable
    operators, basic indexing, etc.
    Postgres 9.4
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  37. JSON
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  38. JSONB
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  39. Others
    • Timestamps with timezone
    • Intervals
    (now() - ‘1 hour’::interval)
    • Array
    • Range types
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  40. Mo cache, Less problems

    View full-size slide

  41. Cache
    SELECT
    'index hit rate' as name,
    (sum(idx_blks_hit) - sum(idx_blks_read)) /
    sum(idx_blks_hit + idx_blks_read) as ratio
    FROM pg_statio_user_indexes
    union all
    SELECT
    'cache hit rate' as name,
    case sum(idx_blks_hit)
    when 0 then 'NaN'::numeric
    else to_char((sum(idx_blks_hit) -
    sum(idx_blks_read)) / sum(idx_blks_hit +
    idx_blks_read), '99.99')::numeric
    end as ratio
    FROM pg_statio_user_indexes)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  42. Cache
    name | ratio
    ----------------+------------------------
    cache hit rate | 0.99
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  43. Index Hit Rate
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan),
    n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  44. Index Hit Rate
    relname | percent_used | rows_in_table
    ------------------------------------+---------------
    events | 0 | 669917
    app_infos_user_info | 0 | 198218
    app_infos | 50 | 175640
    user_info | 3 | 46718
    rollouts | 0 | 34078
    favorites | 0 | 3059
    schema_migrations | 0 | 2
    authorizations | 0 | 0
    delayed_jobs | 23 | 0
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  45. Rule of thumb
    Cache > 99%
    Index hit rate > 95%
    where rows > 10000
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  46. Specific Query
    Performance
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  47. An example
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  48. Context | Day to day | Datatypes | Performance | Expanding Postgres
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    !
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Query plans

    View full-size slide

  49. Query plans
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    !
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  50. Query plans
    # EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    !
    Filter: (salary >= 50000)
    (3 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  51. Indexes fix it
    # CREATE INDEX idx_emps ON employees (salary);
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  52. EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Index Scan using idx_emps on employees
    (cost=0.00..8.49 rows=1 width=6) (actual time =
    0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    Indexes fix it

    View full-size slide

  53. pg_stat_statments
    $ select *
    from pg_stat_statements
    where query ~ 'from users where email';
    !
    !
    userid │ 16384
    dbid │ 16388
    query │ select * from users where email = ?;
    calls │ 2
    total_time │ 0.000268
    rows │ 2
    shared_blks_hit │ 16
    shared_blks_read │ 0
    shared_blks_dirtied │ 0
    shared_blks_written │ 0
    local_blks_hit │ 0
    local_blks_read │ 0
    local_blks_dirtied │ 0
    local_blks_written │ 0
    ...
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  54. pg_stat_statements
    SELECT
    (total_time / 1000 / 60) as total,
    (total_time/calls) as avg,
    query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  55. pg_stat_statements
    total | avg | query
    --------+--------+-------------------------
    295.76 | 10.13 | SELECT id FROM users...
    219.13 | 80.24 | SELECT * FROM ...
    (2 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  56. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  57. Indexes
    Which do I use?
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    craigkerstiens

    View full-size slide

  58. BTree
    If you don’t know better, you’re using
    this. This is usually okay.
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  59. Generalized Inverted Index (GIN)
    Multiple values in a single column.
    !
    Rule of thumb is Array/hStore
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  60. Generalized Search Tree (GiST)
    Values that span across a boundary
    !
    Rule of thumb is full text and shapes
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  61. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    VODKA (coming soon)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  62. There’s more
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  63. Conditional & Functional
    Conditional
    !
    CREATE INDEX … WHERE deleted_at is null
    !
    Functional
    !
    CREATE INDEX … LOWER(firstname)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  64. Backups
    Logical
    !
    Portable
    Has load on DB
    !
    < 50 GB
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    Phsyical
    !
    More setup
    Limited load on DB
    !
    > 50 GB

    View full-size slide

  65. Replication
    Slony
    Londiste
    Bucardo
    PgPool
    WAL-e
    Barman
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  66. Replication
    Slony
    Londiste
    Bucardo
    PgPool
    WAL-e
    Barman
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  67. Expanding Postgres

    View full-size slide

  68. Databases broken down
    • Relational databases
    • Key-Value stores
    • Document databases
    • Text search solutions
    • Distributed data stores
    • Time series
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  69. Databases broken down
    • Relational databases
    • Key-Value stores
    • Document databases
    • Text search solutions
    • Distributed data stores
    • Time series
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  70. What is Postgres
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  71. Foreign Data Wrappers
    • Query from Postgres 1 to Postgres 2
    • Query other things from Postgres
    • Redis
    • Mongo
    • LDAP
    • Twitter?
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  72. PL’s
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  73. PLV8
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View full-size slide

  74. Postgres – TLDR;
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum
    craigkerstiens

    View full-size slide

  75. Fin.
    craigkerstiens

    View full-size slide