The State of Postgres | Strata Data Conference San Jose 2018 | Umur Cubukcu

PostgreSQL is often regarded as the world’s most advanced open source database—and it’s on fire. Umur Cubukcu, the CEO of Citus Data, moves beyond the typical list of features in the next release to explore why so many new projects “just use Postgres” as their system of record (or system of engagement) at scale. Along the way, you’ll learn how PostgreSQL’s extension APIs are fueling innovations in relational databases.

Topics include: a framework for thinking about modern workloads, the evolution of database infrastructure, extensibility for the database and PostgreSQL as an ecosystem.

Citus Data

March 08, 2018

Transcript

  1. Slide 1: The State of Postgres for Modern, Scalable Applications

    Umur Cubukcu | Citus Data | Strata Data Conference 2018
    @umurc | @citusdata | citusdata.com
  2. Slide 2: About me & Citus Data

    Photo: Citus Data co-founders, left to right: Ozgun Erdogan, Sumedh Pathak, Umur Cubukcu (photo credit: Willy Johnson, 2017)

    • Umur Cubukcu, Co-Founder & CEO of Citus Data
    • Citus: Distributed PostgreSQL
    • Founded 2011, HQ in SOMA

    @umurc | @citusdata | github.com/citusdata/citus
  3. Slide 3: Databases used to be simple (2008)

    Diagram: the RDBMS landscape, split by workload (Operations (OLTP) vs. Analytics (OLAP)) and by licensing (Proprietary vs. Open Source).

    Umur Cubukcu | Strata Data Conference | March 2018 | The State of Postgres
  4. Slide 4: Two challenges for the relational database changed the landscape

    1. Data growth >> silicon growth: data doubles every 15 months, while Moore’s Law doubles every 24 months
    2. Data with less structure (e.g. logs)
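The gap between the two doubling periods compounds. A back-of-the-envelope sketch, using only the slide's numbers and treating Moore's Law as a proxy for single-machine capacity (as the slide does): after t months data grows by 2^(t/15) and silicon by 2^(t/24), so their ratio is 2^(t/15 - t/24) = 2^(t/40), which itself doubles every 40 months.

```python
# Back-of-the-envelope sketch using the slide's doubling periods (assumed constant):
# data doubles every 15 months, Moore's Law every 24 months.
DATA_DOUBLING_MONTHS = 15
MOORE_DOUBLING_MONTHS = 24


def growth(months: float, doubling_period: float) -> float:
    """Growth factor after `months` for a quantity that doubles every `doubling_period` months."""
    return 2 ** (months / doubling_period)


def gap(months: float) -> float:
    """How many times faster data has grown than single-machine capacity."""
    return growth(months, DATA_DOUBLING_MONTHS) / growth(months, MOORE_DOUBLING_MONTHS)


for years in (1, 5, 10):
    m = 12 * years
    print(f"{years:>2} yr: data x{growth(m, DATA_DOUBLING_MONTHS):8.1f}, "
          f"silicon x{growth(m, MOORE_DOUBLING_MONTHS):6.1f}, gap x{gap(m):5.2f}")
```

Over a 10-year horizon this gives 256x more data against 32x more silicon, an 8x gap that scaling up a single server cannot close by itself.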
  5. Slide 6: Meanwhile, a short history of Postgres: not the first time seeing similar challenges

    1. SQL or not? (1995)
       • Post-Ingres
       • Started life as an object store
       • Added a SQL API in 1995
    2. Scaling out to handle data growth (2005)
       • For analytics only: MPPs
       • So many forks! AsterData, Netezza, ParAccel (Redshift), Greenplum
  6. Slide 7: Introducing PostgreSQL Extension APIs (2011): amplifying vs. breaking the ecosystem

    Diagram: an extension (.so) is loaded into PostgreSQL with CREATE EXTENSION ... and plugs into APIs for the planner, executor, custom scan, commit / abort, access methods, foreign tables, functions, and more.
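In the C source, many of these APIs are global function-pointer hooks (for example `planner_hook` and `ExecutorStart_hook`). An extension's `_PG_init` typically saves the previous hook value, installs its own function, and calls the saved hook (or the standard implementation) from inside it, so several extensions can stack without replacing each other. The Python sketch below is an analogy of that chaining pattern only; the functions are hypothetical stand-ins whose names merely mirror the C source, and a "plan" is modeled as a list of strings.

```python
from typing import Callable, List, Optional

# Toy model: a "plan" is a list of strings describing what each layer did.
Plan = List[str]
PlannerFn = Callable[[str], Plan]


def standard_planner(query: str) -> Plan:
    """Hypothetical stand-in for PostgreSQL's built-in planner."""
    return [f"standard plan for: {query}"]


# Global hook slot, analogous to the C global `planner_hook` (None = no extension loaded).
planner_hook: Optional[PlannerFn] = None


def run_planner(query: str) -> Plan:
    """Core entry point: call the hook if one is installed, else the standard planner."""
    return planner_hook(query) if planner_hook else standard_planner(query)


def load_extension(name: str) -> None:
    """Simulates an extension's _PG_init(): save the previous hook, then install our own."""
    global planner_hook
    prev_hook = planner_hook  # whatever was installed before this extension

    def my_planner(query: str) -> Plan:
        # Delegate to the previous hook (or the standard planner), then add our own step.
        base = prev_hook(query) if prev_hook else standard_planner(query)
        return base + [f"{name} adjusted the plan"]

    planner_hook = my_planner


load_extension("ext_a")  # loaded first
load_extension("ext_b")  # loaded second, wraps ext_a
print(run_planner("SELECT 1"))
```

Because each extension wraps the previously installed hook, the last one loaded runs outermost, and the standard planner still does the core work underneath.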
  7. Slide 8: Addressing challenges to the RDBMS

    1. To structure, or not to structure?
    2. Scaling out: compute & performance
  8. Slide 9: Start from the file system (Hadoop)

    (+) Any data, any structure
    (+) ‘Infinitely’ scalable storage
    (+) Write fast
    (-) Pay the cost at query time
    (-) Batch vs. real-time
    (-) Indexes (append-only file system)
  9. Slide 10: Worry about only one access pattern: semi-structured (JSON)

    (+) Simple: Put & Get
    (+) Scalable
    (-) No expressiveness for analytics
    (-) No JOINs, data duplication
    (-) Enforce structure at the app layer
  10. Slide 11: To structure or not to structure? (1) Extend the database for JSON data

                     Table "public.events"
       Column    |   Type    |         Sample Data
    -------------+-----------+-----------------------------
     user_id     | bigint    | 09288
     created_at  | timestamp | 2018-03-08 00:57:12.6936+00
     payload     | jsonb     |
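With a `jsonb` column, queries can filter on fields inside `payload` without a fixed schema, most commonly with the containment operator `@>` (for example `WHERE payload @> '{"type": "click"}'`). The Python sketch below is a simplified model of containment over parsed JSON, for illustration only: an object contains another when every right-hand key is present with a contained value, an array contains another when each right-hand element is contained in some left-hand element (order and duplicates ignored), and scalars must be equal. It deliberately skips some PostgreSQL edge cases, such as a top-level array containing a bare scalar, and the sample payloads are made up.

```python
import json
from typing import Any


def contains(left: Any, right: Any) -> bool:
    """Simplified model of jsonb `left @> right` on parsed JSON values."""
    if isinstance(right, dict):
        # Every key on the right must exist on the left with a contained value.
        return isinstance(left, dict) and all(
            k in left and contains(left[k], v) for k, v in right.items()
        )
    if isinstance(right, list):
        # Order and duplicates don't matter: each right element must match some left element.
        return isinstance(left, list) and all(
            any(contains(item, r) for item in left) for r in right
        )
    if isinstance(left, (dict, list)):
        return False  # in this model a container never matches a scalar
    if isinstance(left, bool) or isinstance(right, bool):
        # Keep JSON true/false distinct from 1/0 (Python treats True == 1).
        return isinstance(left, bool) and isinstance(right, bool) and left == right
    return left == right


# Hypothetical values, as they might be stored in events.payload.
payloads = [
    json.loads('{"type": "click", "page": "/pricing", "tags": ["ab-test", "mobile"]}'),
    json.loads('{"type": "view", "page": "/docs"}'),
]

query = json.loads('{"type": "click", "tags": ["mobile"]}')
matches = [p for p in payloads if contains(p, query)]
print(matches)  # only the first payload matches
```

The point of the design is that the application can add new keys to `payload` at any time while the fixed columns (`user_id`, `created_at`) keep their relational guarantees.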
  11. Slide 12: Scaling compute & performance (2) Leverage indexing (and other fundamentals)

    • B-tree indexes
    • GIN & GiST indexes
    • Secondary indexes
    • Full text search
    • Index-only scans
    • Fitting indexes into memory
    + Not to forget: parallel queries, MVCC, and many more.
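GIN (generalized inverted index) indexes, which can back `jsonb` containment queries, work roughly like a search engine's inverted index: each item extracted from a value points to the set of rows containing it, so a query intersects a few small posting lists instead of scanning every row. The Python sketch below illustrates that idea using top-level `(key, value)` pairs as the index keys. It is a conceptual model only, not PostgreSQL's on-disk GIN structure or the key extraction done by its `jsonb_ops` / `jsonb_path_ops` operator classes, and the rows are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

IndexKey = Tuple[str, Any]

# Hypothetical rows: row id -> flat jsonb-like payload with scalar values.
rows: Dict[int, Dict[str, Any]] = {
    1: {"type": "click", "page": "/pricing"},
    2: {"type": "view", "page": "/pricing"},
    3: {"type": "click", "page": "/docs"},
}


def build_index(data: Dict[int, Dict[str, Any]]) -> Dict[IndexKey, Set[int]]:
    """Map each (key, value) pair to its posting list: the set of row ids containing it."""
    index: Dict[IndexKey, Set[int]] = defaultdict(set)
    for row_id, payload in data.items():
        for key, value in payload.items():
            index[(key, value)].add(row_id)
    return index


def lookup(index: Dict[IndexKey, Set[int]], query: Dict[str, Any]) -> List[int]:
    """Rows whose payload contains every pair in `query`, found by intersecting posting lists."""
    if not query:
        # An empty query matches every row that appears in the index.
        return sorted({rid for ids in index.values() for rid in ids})
    postings = [index.get((k, v), set()) for k, v in query.items()]
    return sorted(set.intersection(*postings))


idx = build_index(rows)
print(lookup(idx, {"type": "click"}))                      # [1, 3]
print(lookup(idx, {"type": "click", "page": "/pricing"}))  # [1]
```

The same principle explains the slide's "fitting indexes into memory" point: posting lists are far smaller than the rows they describe, so a hot index can stay cached while the table itself does not.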
  12. Slide 13: Scaling compute & performance (2) Push computations (and joins) down to many PostgreSQL instances

    Diagram: the query SELECT FROM events a JOIN users b is turned into many SELECT FROM (a JOIN b) queries, one per shard pair. The events table is split into shards Events_101 to Events_104 and the users table into Users_101 to Users_104, spread across Data Node 1, Data Node 2, … Data Node N, with matching shards side by side (e.g. Events_101 and Events_103 next to Users_101 and Users_103), so each join runs locally on its node.
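The diagram relies on co-location: if `events` and `users` are both distributed by the same function of `user_id` into the same number of shards, every pair of matching rows lands in the same shard pair, so each node can run its `(a JOIN b)` locally and the results only need to be concatenated. The Python sketch below checks that property on toy data. The shard count, the modulo distribution function, and the table contents are illustrative assumptions, not Citus's actual hashing scheme.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SHARD_COUNT = 4  # illustrative; shard ids 0..3 play the role of _101.._104

Row = Tuple[int, str]
Joined = Tuple[int, str, str]

# Toy tables: events(user_id, action) and users(user_id, name).
events: List[Row] = [(1, "login"), (2, "click"), (1, "click"), (7, "view"), (4, "login")]
users: List[Row] = [(1, "ada"), (2, "bob"), (4, "cy"), (7, "dee")]


def shard_of(user_id: int) -> int:
    """Assumed distribution function; both tables use it, so their shards are co-located."""
    return user_id % SHARD_COUNT


def split(rows: List[Row]) -> Dict[int, List[Row]]:
    """Partition a table into shards by user_id (the first column)."""
    shards: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        shards[shard_of(row[0])].append(row)
    return shards


def local_join(ev: List[Row], us: List[Row]) -> List[Joined]:
    """What one node would run: a hash join of an events shard with its matching users shard."""
    names = {uid: name for uid, name in us}
    return [(uid, action, names[uid]) for uid, action in ev if uid in names]


def distributed_join() -> List[Joined]:
    ev_shards, us_shards = split(events), split(users)
    results: List[Joined] = []
    for shard in range(SHARD_COUNT):  # on a cluster, these joins would run in parallel
        results.extend(local_join(ev_shards.get(shard, []), us_shards.get(shard, [])))
    return results


# Co-location guarantees the same answer as a single-node join over the full tables.
assert sorted(distributed_join()) == sorted(local_join(events, users))
print(sorted(distributed_join()))
```

If the two tables were distributed by different columns, matching rows could sit on different nodes and the join would need to move data across the network first, which is why choosing a shared distribution column matters.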
  13. Slide 15: PostgreSQL: vibrant, global ecosystem

    Sample PostgreSQL extensions and integrations: citus, pgcrypto, pg_cron, pg_partman, postgresql-HLL, cstore_fdw, unaccent, cube, jdbc_fdw, pg_trgm, PostGIS, pg_buffercache, pg_prewarm, btree_gin, btree_gist, postgis_topology, pg_stat_statements, postgresql-unit, plpgsql, plv8, pg_telemetry, foreign data wrappers, …
  14. Slide 16: PostgreSQL on fire: database adoption among developers (1)

    Chart: PostgreSQL, MySQL, MongoDB, SQL Server + Oracle.
    (1) Source: % of database job postings that mention each specific technology, across 20K+ job posts on Hacker News, https://news.ycombinator.com
  15. Slide 17: Winning startups & enterprises

    Chart: Google Trends interest (scale 0 to 100) for PG, Mongo, and Hadoop over the past 2 years.
    • PostgreSQL popularity = Hadoop + Mongo combined
    • Growing from an already vast user base
    Source: Google Trends for the past 2 years
  16. Slide 18: So there’s an elephant in the room

    How does it all fit in with your stack?
  17. Slide 19: Modern workloads are evolving

    Diagram: the RDBMS landscape (Operations (OLTP) vs. Analytics (OLAP); Proprietary vs. Open Source), now divided into:
    • Application workloads: transactions, short requests, in-app analytics
    • Improvement workloads
  18. Slide 20: Modern databases serve 3 types of apps

    Chart axes: time to action vs. data volume (application data).
    1. Systems of record
       • Core workloads, transactions
       • Real-time data
       • Millisecond latencies
    2. Systems of engagement
       • Drive engagement & revenue
       • Real-time data, multiple sources
       • Sub-second latencies
    3. Systems of improvement
       • Identify business process improvements
       • Offline data, multiple sources
       • Sub-minute / hour latencies, data analysts
  19. Slide 21: PostgreSQL in your infrastructure stack

    • Application: standalone database (storage and compute)
    • Data (Spark, HDFS / S3): persistence layer for Spark
    • Kafka: persistence layer for Kafka
    • NoSQL: adjacent to NoSQL
    Note: standard PostgreSQL connectors for all tools (e.g. ODBC / JDBC, PostgreSQL language bindings) are available for integrations.
  20. Slide 22
  21. Slide 23: Parting thoughts: PostgreSQL becoming the Linux of databases

    • Extensibility
    • Versatility
    • Ecosystem
  22. Slide 24: Thank you

    We’re hiring! citusdata.com/jobs
    Umur Cubukcu | Citus Data | Strata Data Conference 2018
    @umurc | @citusdata | github.com/citusdata/citus