The State of Postgres | Strata Data Conference San Jose 2018 | Umur Cubukcu

PostgreSQL is often regarded as the world’s most advanced open source database—and it’s on fire. Umur Cubukcu, the CEO of Citus Data, moves beyond the typical list of features in the next release to explore why so many new projects “just use Postgres” as their system of record (or system of engagement) at scale. Along the way, you’ll learn how PostgreSQL’s extension APIs are fueling innovations in relational databases.

Topics include: a framework for thinking about modern workloads, the evolution of database infrastructure, extensibility for the database and PostgreSQL as an ecosystem.

Citus Data

March 08, 2018

Transcript

  1. The State of Postgres
    For Modern, Scalable Applications
    Umur Cubukcu | Citus Data | Strata Data Conference 2018
    @umurc | @citusdata | citusdata.com

  2. About me & Citus Data
    • Umur Cubukcu, Co-Founder & CEO of Citus Data
    • Citus: Distributed PostgreSQL
    • Founded 2011, HQ in SOMA
    @umurc | @citusdata
    github.com/citusdata/citus
    Photo: Citus Data co-founders, left to right: Ozgun Erdogan, Sumedh Pathak, Umur Cubukcu (photo credit: Willy Johnson 2017)

  3. Databases used to be simple (2008)
    Diagram: workloads, Operations (OLTP) and Analytics (OLAP), against Proprietary and Open Source, all served by the RDBMS
    Umur Cubukcu | Strata Data Conference | March 2018 | The State of Postgres

  4. Data Growth >> Silicon Growth…
    Two challenges for the relational database changed the landscape:
    • Data growth: data grows 2x every 15 months, while Moore’s Law gives 2x every 24 months
    • Data with less structure (e.g., logs)
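
    The two doubling periods on this slide compound quickly. The short Python sketch below compares the growth factors implied by "2x every 15 months" (data) and "2x every 24 months" (Moore’s Law); the 10-year horizon is an arbitrary choice for illustration, not a figure from the talk.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Growth factor after `months` for a quantity that doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)

HORIZON_MONTHS = 120  # illustrative 10-year horizon (assumption, not from the slide)

data = growth_factor(HORIZON_MONTHS, 15)     # data: 2x every 15 months
silicon = growth_factor(HORIZON_MONTHS, 24)  # Moore's Law: 2x every 24 months

print(f"data: {data:.0f}x, silicon: {silicon:.0f}x, gap: {data / silicon:.0f}x")
# prints "data: 256x, silicon: 32x, gap: 8x"
```

    Under these assumptions, after ten years a single machine is about 8x further behind the data it has to process, which is the gap that scaling out is meant to close.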

  5. (image only)

  6. Meanwhile: Short history of Postgres
    Not the first time seeing similar challenges
    1. SQL or not? (1995)
      • Post-Ingres
      • Started life as object store
      • Added SQL API in 1995
    2. Scaling out to handle data growth (2005)
      • For analytics only: MPPs
      • So many forks! AsterData, Netezza, ParAccel (Redshift), Greenplum

  7. Introducing PostgreSQL Extension APIs (2011)
    Amplifying vs. breaking the ecosystem
    Diagram: an extension (.so), installed with CREATE EXTENSION ..., plugs into PostgreSQL through APIs for the planner, executor, custom scan, commit / abort, access methods, foreign tables, functions, ...
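
    In C, PostgreSQL extensions commonly attach to the planner through a global hook variable such as `planner_hook`: the extension’s `_PG_init` saves the previously installed hook and installs its own, which handles the queries it cares about and otherwise defers to the saved hook or `standard_planner`. The Python sketch below only mimics that chaining pattern; the Python functions and the two "Distributed" and "Columnar" extensions are hypothetical stand-ins, not PostgreSQL’s C API.

```python
from typing import Callable, Optional

Planner = Callable[[str], str]

def standard_planner(query: str) -> str:
    """Stand-in for the built-in planner: returns a plan description."""
    return f"LocalPlan({query})"

# Global hook slot; None means "use the standard planner" (mirrors the C pattern).
planner_hook: Optional[Planner] = None

def plan(query: str) -> str:
    """Entry point: call the installed hook if there is one, else the standard planner."""
    return planner_hook(query) if planner_hook else standard_planner(query)

def load_extension(name: str, handles: Callable[[str], bool]) -> None:
    """Hypothetical extension init: save the previous hook, then install a chained one."""
    global planner_hook
    prev_hook = planner_hook

    def extension_planner(query: str) -> str:
        if handles(query):
            return f"{name}Plan({query})"
        # Not ours: defer to the previously installed hook, or the standard planner.
        return prev_hook(query) if prev_hook else standard_planner(query)

    planner_hook = extension_planner

# Two hypothetical extensions loaded one after the other.
load_extension("Distributed", lambda q: "events" in q)
load_extension("Columnar", lambda q: "archive" in q)

print(plan("SELECT * FROM events"))   # DistributedPlan(SELECT * FROM events)
print(plan("SELECT * FROM archive"))  # ColumnarPlan(SELECT * FROM archive)
print(plan("SELECT 1"))               # LocalPlan(SELECT 1)
```

    Chaining through the saved hook is what lets several extensions coexist in one server instead of each forking the database.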

  8. Addressing challenges to RDBMS
    1. To structure, or not to structure?
    2. Scaling out: compute & performance

  9. Start from file system (Hadoop)
    (+) Any data, any structure
    (+) ‘Infinitely’ scalable storage
    (+) Write fast
    (-) Pay cost at query time
    (-) Batch vs. real-time
    (-) Indexes (append-only FS)
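
    The "write fast, pay cost at query time" trade-off is often called schema-on-read: writes just append raw records, and every query has to parse them again. The Python sketch below contrasts it with parsing once at write time (schema-on-write); the JSON log format, field names and class names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical raw log lines (illustrative format, not from the talk).
RAW_LINES = [
    '{"user_id": 1, "event": "click"}',
    '{"user_id": 2, "event": "view"}',
    '{"user_id": 1, "event": "view"}',
]

class SchemaOnRead:
    """Append-only store: writes are cheap, every query re-parses the raw lines."""
    def __init__(self):
        self.lines = []
        self.parse_count = 0

    def write(self, line: str) -> None:
        self.lines.append(line)  # no parsing or validation on write

    def count_events(self, event: str) -> int:
        total = 0
        for line in self.lines:
            self.parse_count += 1  # structure is imposed at query time
            if json.loads(line)["event"] == event:
                total += 1
        return total

class SchemaOnWrite:
    """Parses (and type-checks) each record once on write; queries read structured rows."""
    def __init__(self):
        self.rows = []
        self.parse_count = 0

    def write(self, line: str) -> None:
        self.parse_count += 1
        record = json.loads(line)
        self.rows.append((int(record["user_id"]), str(record["event"])))

    def count_events(self, event: str) -> int:
        return sum(1 for _, e in self.rows if e == event)

read_store, write_store = SchemaOnRead(), SchemaOnWrite()
for line in RAW_LINES:
    read_store.write(line)
    write_store.write(line)

for _ in range(10):  # run the same query ten times against each store
    read_result = read_store.count_events("view")
    write_result = write_store.count_events("view")

print(read_result, write_result, read_store.parse_count, write_store.parse_count)  # 2 2 30 3
```

    Both stores return the same answer, but the schema-on-read store parses every record on every query, so its cost grows with the number of queries rather than the number of writes.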

  10. Worry about only one access pattern
    Semi-structured (JSON)
    (+) Simple: Put & Get
    (+) Scalable
    (-) No expressiveness for analytics
    (-) No JOINS, data duplication
    (-) Enforce structure at app layer

  11. Extend the database for JSON data
    1. TO STRUCTURE OR NOT TO STRUCTURE?
    Table "public.events"
       Column   |   Type    | Sample Data
    ------------+-----------+-----------------------------
     user_id    | bigint    | 09288
     created_at | timestamp | 2018-03-08 00:57:12.6936+00
     payload    | jsonb     |
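
    Storing `payload` as `jsonb` lets queries filter on structure inside the document, for example with PostgreSQL’s containment operator `@>` (as in `payload @> '{"type": "click"}'`, where the `type` key is a hypothetical example). The Python function below is a simplified sketch of containment semantics on decoded JSON values: an object must contain every key on the right with a contained value, and each element of a right-hand array must be contained in some left-hand element. It deliberately omits PostgreSQL’s special case where a top-level array contains a bare scalar, and the sample events are made up.

```python
from typing import Any

def jsonb_contains(left: Any, right: Any) -> bool:
    """Simplified model of jsonb `left @> right` on decoded JSON values."""
    if isinstance(right, dict):
        # Every key on the right must exist on the left with a contained value.
        return isinstance(left, dict) and all(
            key in left and jsonb_contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Each right-hand element must be contained in some left-hand element
        # (order and duplicates are ignored).
        return isinstance(left, list) and all(
            any(jsonb_contains(candidate, item) for candidate in left)
            for item in right
        )
    # Scalars: compare by value, but keep JSON booleans distinct from numbers
    # (in Python, True == 1).
    if isinstance(left, (dict, list)):
        return False
    return isinstance(left, bool) == isinstance(right, bool) and left == right

# Hypothetical rows shaped like the events table above.
events = [
    {"user_id": 9288, "payload": {"type": "click", "tags": ["promo", "mobile"]}},
    {"user_id": 42, "payload": {"type": "view", "tags": ["desktop"]}},
]
matches = [e["user_id"] for e in events
           if jsonb_contains(e["payload"], {"type": "click", "tags": ["mobile"]})]
print(matches)  # [9288]
```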

  12. Leverage indexing (and other fundamentals)
    2. SCALING COMPUTE & PERFORMANCE
    • B-tree indexes
    • GIN & GiST indexes
    • Secondary indexes
    • Full text search
    • Index-only scans
    • Fitting indexes into memory
    + Not to forget: Parallel queries, MVCC, and many more.
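
    GIN (Generalized Inverted Index) maps each key extracted from a value, such as a word, an array element or a JSON key/value pair, to the set of rows that contain it, so a query can intersect a few short posting lists instead of scanning every row. The Python sketch below builds a toy inverted index over the top-level fields of flat JSON payloads; the class name and sample rows are hypothetical, and this illustrates the idea only, not how PostgreSQL’s GIN or its jsonb operator classes are implemented.

```python
import json
from collections import defaultdict

class InvertedIndex:
    """Toy GIN-style index: maps each top-level (key, value) pair to a set of row ids."""

    def __init__(self):
        self.rows = {}
        self.postings = defaultdict(set)

    @staticmethod
    def _keys(payload: dict) -> set:
        # Encode values as canonical JSON text so that, e.g., true and 1 stay distinct.
        return {(k, json.dumps(v, sort_keys=True)) for k, v in payload.items()}

    def insert(self, row_id: int, payload: dict) -> None:
        self.rows[row_id] = payload
        for key in self._keys(payload):
            self.postings[key].add(row_id)

    def search(self, query: dict) -> list:
        """Row ids whose top-level fields equal every field in `query`."""
        keys = self._keys(query)
        if not keys:
            return sorted(self.rows)  # an empty query matches every row
        # Intersect the posting lists, starting from the shortest one.
        lists = sorted((self.postings.get(k, set()) for k in keys), key=len)
        return sorted(set.intersection(*lists))

idx = InvertedIndex()
idx.insert(1, {"type": "click", "device": "mobile"})
idx.insert(2, {"type": "view", "device": "mobile"})
idx.insert(3, {"type": "click", "device": "desktop"})
print(idx.search({"type": "click", "device": "mobile"}))  # [1]
print(idx.search({"device": "mobile"}))                    # [1, 2]
```

    Each lookup touches only the rows listed under the queried keys, which is why keeping such indexes in memory matters for performance.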

  13. Push computations (and joins) down to many PostgreSQL instances
    2. SCALING COMPUTE & PERFORMANCE
    Diagram: the query SELECT FROM events a JOIN users b is rewritten into per-shard fragments, SELECT FROM (a JOIN b), that run on Data Node 1, Data Node 2, ..., Data Node N. The events table is split into shards Events_101 to Events_104 and the users table into Users_101 to Users_104; Data Node 1 holds Events_101, Events_103, Users_101 and Users_103, and the remaining shards sit on the other nodes.
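
    The diagram relies on co-location: if `events` and `users` are both hash-distributed on the same column (assumed here to be `user_id`) into the same number of shards, shard i of one table only ever joins with shard i of the other, so each node can run its join fragment locally and the results only need to be concatenated. The Python sketch below models that with in-memory lists; the shard count, the use of `zlib.crc32` as the hash and the sample rows are assumptions for illustration, not Citus’s actual hashing or shard placement.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 4  # illustrative; compare shards 101 to 104 on the slide

def shard_for(user_id: int) -> int:
    """Hash-distribute by user_id (crc32 is a stand-in for the real hash function)."""
    return zlib.crc32(str(user_id).encode()) % SHARD_COUNT

def distribute(rows, key):
    """Split rows into shards by hashing the distribution column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key])].append(row)
    return shards

def local_join(events, users):
    """Per-shard fragment: events JOIN users ON events.user_id = users.user_id."""
    names = {u["user_id"]: u["name"] for u in users}
    return [(e["user_id"], names[e["user_id"]], e["type"])
            for e in events if e["user_id"] in names]

def distributed_join(events, users):
    event_shards = distribute(events, "user_id")
    user_shards = distribute(users, "user_id")
    results = []
    for shard_id in range(SHARD_COUNT):
        # Co-location: equal user_ids always hash to the same shard id,
        # so each fragment is answered without moving rows between nodes.
        results.extend(local_join(event_shards[shard_id], user_shards[shard_id]))
    return sorted(results)

# Hypothetical sample data.
users = [{"user_id": i, "name": f"user{i}"} for i in range(1, 6)]
events = [{"user_id": i % 5 + 1, "type": t} for i, t in enumerate(["click", "view"] * 5)]
single_node = sorted(local_join(events, users))
print(distributed_join(events, users) == single_node)  # True
```

    If the two tables were distributed on different columns, matching rows could sit on different nodes and the join would need data to be shuffled between them first.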

  14. Extending Postgres for horizontal scale: Citus

  15. PostgreSQL: Vibrant, global ecosystem
    Sample PostgreSQL extensions and integrations:
    citus, pgcrypto, pg_cron, pg_partman, postgresql-HLL, cstore_fdw, unaccent, cube, jdbc_fdw, pg_trgm, PostGIS, pg_buffercache, pg_prewarm, btree_gin, btree_gist, postgis_topology, pg_stat_statements, postgresql-unit, plpgsql, plv8, pg_telemetry, foreign data wrappers

  16. PostgreSQL on fire
    Chart: database adoption among developers¹, comparing PostgreSQL, MySQL, MongoDB, and SQL Server + Oracle
    ¹ Source: % of database job postings that mention each specific technology, across 20K+ job posts on Hacker News, https://news.ycombinator.com

  17. Winning Startups & Enterprises
    Growing from an already vast user base
    PostgreSQL popularity = Hadoop + Mongo combined
    Chart: relative popularity of PG, Mongo and Hadoop (scale 0 to 100)
    Source: Google Trends for the past 2 years

  18. So there’s an elephant in the room
    How does it all fit in with your stack?

  19. Modern workloads are evolving
    Diagram: the workloads from slide 3 (Operations (OLTP) and Analytics (OLAP), proprietary and open source, RDBMS), now extended with:
    • Application workloads: transactions, short requests, in-app analytics
    • Improvement workloads

  20. Modern databases serve 3 types of apps
    Positioned by time to action and data volume (application data):
    Systems of record
      • Core workloads, transactions
      • Real-time data
      • Millisecond latencies
    Systems of engagement
      • Drive engagement & revenue
      • Real-time data, multiple sources
      • Sub-second latencies
    Systems of improvement
      • Identify business process improvements
      • Offline data, multiple sources
      • Sub-minute / hour latencies, data analysts

  21. PostgreSQL in your infrastructure stack
    • Application: PostgreSQL as a standalone database (storage and compute)
    • Data: PostgreSQL as a persistence layer for Spark (alongside HDFS / S3) and for Kafka
    • NoSQL: PostgreSQL adjacent to NoSQL
    Note: Standard PostgreSQL connectors (e.g., ODBC / JDBC, PostgreSQL language bindings) are available for integrating with all of these tools.

  22. Scaling the tables

  23. Parting thoughts: PostgreSQL becoming the Linux of Databases
    • Extensibility
    • Versatility
    • Ecosystem

  24. Thank you
    We’re Hiring! citusdata.com/jobs
    @umurc | @citusdata
    github.com/citusdata/citus
