Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres – A Data Platform

Craig Kerstiens
October 09, 2014
400

Postgres – A Data Platform

Craig Kerstiens

October 09, 2014
Tweet

Transcript

  1. Postgres:
    A Data Platform
    craigkerstiens

    View Slide

  2. Who am I
    Product @ Heroku
    craigkerstiens.com
    postgresweekly.com
    postgresguide.com
    craigkerstiens

    View Slide

  3. Postgres – TLDR;
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum
    craigkerstiens

    View Slide

  4. Rough outline
    Context
    Day to day
    Datatypes
    Performance
    Expanding postgres

    View Slide

  5. 1989
    • Gas was $1.09
    • Movie ticket $2.75
    • Microsoft releases Windows 1.0
    • Postgres
    • Project called Postgres is released
    • Roots in Ingres (post-ingres)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  6. 1996
    It might help to explain that the pronunciation is "post-gres" or!
    "post-gres-cue-ell", not "post-gray-something".!
    !
    I heard people making this same mistake in presentations at this!
    past weekend's Postgres Anniversary Conference :-( Arguably,!
    the 1996 decision to call it PostgreSQL instead of reverting to!
    plain Postgres was the single worst mistake this project ever
    made.!
    It seems far too late to change now, though.!
    !
    ! ! ! regards, tom lane!
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  7. MVCC
    Immutable structure
    !
    Reads don’t block writes, each
    transaction sees what the state was
    when it started
    !
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  8. Under the covers
    A giant append only log
    !
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    Writing data adds to the log
    Updating data adds to the log
    Deleting data adds to the log

    View Slide

  9. A tangent
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  10. What is a relational DB
    • Stores the data and how its related
    • Data is in a flat two dimensional space
    • Has relationships between the data
    !
    • Really though it’s math
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  11. What is SQL
    • SQL means SQL to access it
    • SQL also implies relational
    • A car has an owner
    • An owner has an address
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  12. NoSQL is overloaded
    It can relate to CAP but doesn’t necessarily
    • Consistency
    • Availability
    • Partition Tolerance
    !
    • It can relate to SQL or user experience
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  13. Databases broken down
    • Relational databases
    • Key-Value stores
    • Document databases
    • Text search solutions
    • Distributed data stores
    • Time series
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  14. Day to Day

    View Slide

  15. Postgres bag of tricks
    • My editor of choice
    • Writing better SQL
    • Misc tips/tricks
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  16. PSQL

    View Slide

  17. PSQL
    \e - opens your default $EDITOR
    \d - describe something
    \dt - list all tables
    \x auto - pretty results
    \x help - help
    .psqlrc - really geek out
    (pro tip - name your queries)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  18. Formatting SQL
    SELECT foo
    FROM bar a,
    baz b
    WHERE a.this = b.that
    AND condition2 = false
    GROUP BY 1
    ORDER BY 1 ASC;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  19. More readable SQL

    View Slide

  20. Common Table Expressions

    View Slide

  21. An example
    !
    --- Calculates the projects per each user
    tasks_per_project_per_user AS (
    SELECT
    user_id,
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY user_id, project_id
    ),
    !
    --- Gets user ids that have over 50% of tasks assigned
    overloaded_users AS (
    SELECT tasks_per_project_per_user.user_id,
    !
    FROM tasks_per_project_per_user,
    total_tasks_per_project
    WHERE tasks_per_project_per_user.task_count >
    (total_tasks_per_project / 2)
    )
    !
    SELECT
    email,
    task_list,
    title
    FROM
    users_tasks,
    overloaded_users
    WHERE
    users_tasks.user_id = overloaded_users.user_id
    --- Initial query to grab project title and tasks per user
    WITH users_tasks AS (
    SELECT
    users.id as user_id,
    users.email,
    array_agg(tasks.name) as task_list,
    projects.title
    FROM
    users,
    tasks,
    project
    WHERE
    users.id = tasks.user_id
    projects.title = tasks.project_id
    GROUP BY
    users.email,
    projects.title
    ),
    !
    --- Calculates the total tasks per each project
    total_tasks_per_project AS (
    SELECT
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY project_id
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  22. An example
    --- Initial query to grab project title and tasks per user
    WITH users_tasks AS (
    SELECT
    users.id as user_id,
    users.email,
    array_agg(tasks.name) as task_list,
    projects.title
    FROM
    users,
    tasks,
    project
    WHERE
    users.id = tasks.user_id
    projects.title = tasks.project_id
    GROUP BY
    users.email,
    projects.title
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  23. An example
    --- Calculates the total tasks per each project
    total_tasks_per_project AS (
    SELECT
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY project_id
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  24. An example
    --- Calculates the projects per each user
    tasks_per_project_per_user AS (
    SELECT
    user_id,
    project_id,
    count(*) as task_count
    FROM tasks
    GROUP BY user_id, project_id
    ),
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  25. An example
    --- Gets user ids that have over 50% of tasks assigned
    overloaded_users AS (
    SELECT tasks_per_project_per_user.user_id,
    !
    FROM tasks_per_project_per_user,
    total_tasks_per_project
    WHERE tasks_per_project_per_user.task_count
    > (total_tasks_per_project / 2)
    )
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  26. An example
    SELECT
    email,
    task_list,
    title
    FROM
    users_tasks,
    overloaded_users
    WHERE
    users_tasks.user_id = overloaded_users.user_id
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  27. Window Functions

    View Slide

  28. An Example
    SELECT last_name,
    salary,
    department,
    rank() OVER
    (PARTITION BY department
    ORDER BY salary DESC)
    FROM employees;
    !
    !
    !
    last_name salary department rank
    Jones 45000 Accounting 1
    Williams 37000 Accounting 2
    Smith 55000 Sales 1
    Adams 50000 Sales 2
    Johnson 40000 Marketing 1

    View Slide

  29. Extensions

    View Slide

  30. Extensions
    hstore
    citext
    db_link
    uuid_ossp
    cube
    PostGIS
    intarray
    pg_crypto
    isn
    table_func
    unaccent
    dict_int
    dict_xsyn
    earth_distance
    pg_stat_tuple
    ltree
    pg_row_locks
    trigram

    View Slide

  31. Datatypes

    View Slide

  32. The list
    integer
    float
    serial
    money
    character
    bytea
    timestamp
    timestamp with tz
    date
    time
    interval
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    boolean
    enums
    point
    line
    box
    path
    polygon
    circle
    inet
    cidr
    bit
    tsvector
    tsquery
    UUID
    XML
    JSON
    array
    rangetypes

    View Slide

  33. Money
    • Don’t use it
    • Created when the world was flat
    • Money only knows a single currency
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  34. Serial
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  35. Serial
    • Serial has a limit
    • UUID scales
    • UUID doesn’t leak your primary keys
    • uuid_ossp extension
    UUID
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  36. Shapes
    • Basic ones for a shortcut
    • earth_distance
    !
    • PostGIS for robustness
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  37. XML
    Just kidding

    View Slide

  38. XML

    View Slide

  39. JSON vs. hStore
    • NoSQL often means schema-less
    !
    • Adding columns is annoying
    • Just add data
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  40. hStore
    • Key value store directly in Postgres
    !
    • Can filter where a key does/doesn’t exist
    • Can filter for specific values of keys
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  41. JSON
    Postgres 9.2 was the JSON release
    Marketing
    Postgres 9.3
    started to become usable
    operators, basic indexing, etc.
    Postgres 9.4
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  42. JSON
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  43. JSONB
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  44. Others
    • Timestamps with timezone
    • Intervals
    (now() - ‘1 hour’::interval)
    • Array
    • Range types
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  45. Performance

    View Slide

  46. Mo cache, Less problems

    View Slide

  47. Cache
    SELECT
    'index hit rate' as name,
    (sum(idx_blks_hit) - sum(idx_blks_read)) /
    sum(idx_blks_hit + idx_blks_read) as ratio
    FROM pg_statio_user_indexes
    union all
    SELECT
    'cache hit rate' as name,
    case sum(idx_blks_hit)
    when 0 then 'NaN'::numeric
    else to_char((sum(idx_blks_hit) -
    sum(idx_blks_read)) / sum(idx_blks_hit +
    idx_blks_read), '99.99')::numeric
    end as ratio
    FROM pg_statio_user_indexes)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  48. Cache
    name | ratio
    ----------------+------------------------
    cache hit rate | 0.99
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  49. Index Hit Rate
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan),
    n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  50. Index Hit Rate
    relname | percent_used | rows_in_table
    ------------------------------------+---------------
    events | 0 | 669917
    app_infos_user_info | 0 | 198218
    app_infos | 50 | 175640
    user_info | 3 | 46718
    rollouts | 0 | 34078
    favorites | 0 | 3059
    schema_migrations | 0 | 2
    authorizations | 0 | 0
    delayed_jobs | 23 | 0
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  51. Rule of thumb
    Cache > 99%
    Index hit rate > 95%
    where rows > 10000
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  52. Specific Query
    Performance
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  53. An example
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  54. Context | Day to day | Datatypes | Performance | Expanding Postgres
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    !
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Query plans

    View Slide

  55. Query plans
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    !
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  56. Query plans
    # EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    !
    Filter: (salary >= 50000)
    (3 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  57. Indexes fix it
    # CREATE INDEX idx_emps ON employees (salary);
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  58. EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Index Scan using idx_emps on employees
    (cost=0.00..8.49 rows=1 width=6) (actual time =
    0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    Indexes fix it

    View Slide

  59. pg_stat_statments
    $ select *
    from pg_stat_statements
    where query ~ 'from users where email';
    !
    !
    userid │ 16384
    dbid │ 16388
    query │ select * from users where email = ?;
    calls │ 2
    total_time │ 0.000268
    rows │ 2
    shared_blks_hit │ 16
    shared_blks_read │ 0
    shared_blks_dirtied │ 0
    shared_blks_written │ 0
    local_blks_hit │ 0
    local_blks_read │ 0
    local_blks_dirtied │ 0
    local_blks_written │ 0
    ...
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  60. pg_stat_statements
    SELECT
    (total_time / 1000 / 60) as total,
    (total_time/calls) as avg,
    query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  61. pg_stat_statements
    total | avg | query
    --------+--------+-------------------------
    295.76 | 10.13 | SELECT id FROM users...
    219.13 | 80.24 | SELECT * FROM ...
    (2 rows)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  62. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  63. Indexes
    Which do I use?
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    craigkerstiens

    View Slide

  64. BTree
    If you don’t know better, you’re using
    this. This is usually okay.
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  65. Generalized Inverted Index (GIN)
    Multiple values in a single column.
    !
    Rule of thumb is Array/hStore
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  66. Generalized Search Tree (GiST)
    Values that span across a boundary
    !
    Rule of thumb is full text and shapes
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  67. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    VODKA (coming soon)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  68. There’s more
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  69. Conditional & Functional
    Conditional
    !
    CREATE INDEX … WHERE deleted_at is null
    !
    Functional
    !
    CREATE INDEX … LOWER(firstname)
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  70. Backups
    Logical
    !
    Portable
    Has load on DB
    !
    < 50 GB
    Context | Day to day | Datatypes | Performance | Expanding Postgres
    Phsyical
    !
    More setup
    Limited load on DB
    !
    > 50 GB

    View Slide

  71. Replication
    Slony
    Londiste
    Bucardo
    PgPool
    WAL-e
    Barman
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  72. Replication
    Slony
    Londiste
    Bucardo
    PgPool
    WAL-e
    Barman
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  73. Expanding Postgres

    View Slide

  74. Databases broken down
    • Relational databases
    • Key-Value stores
    • Document databases
    • Text search solutions
    • Distributed data stores
    • Time series
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  75. Databases broken down
    • Relational databases
    • Key-Value stores
    • Document databases
    • Text search solutions
    • Distributed data stores
    • Time series
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  76. What is Postgres
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  77. Foreign Data Wrappers
    • Query from Postgres 1 to Postgres 2
    • Query other things from Postgres
    • Redis
    • Mongo
    • LDAP
    • Twitter?
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  78. PL’s
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  79. PLV8
    Context | Day to day | Datatypes | Performance | Expanding Postgres

    View Slide

  80. Postgres – TLDR;
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum
    craigkerstiens

    View Slide

  81. Fin.
    craigkerstiens

    View Slide