Slide 1

Slide 1 text

PostgreSQL Extension APIs are Changing the Face of Relational Databases Ozgun Erdogan Citus Data PGCon | May 2018

Slide 2

Slide 2 text

Disclaimer • I compiled these slides after going through a technical due diligence step for Citus Data. • So, this talk assumes that you don’t know much about PostgreSQL extension APIs. • The talk goes over five example extensions. If any of these extensions is too familiar, I’m happy to skip over them.

Slide 3

Slide 3 text

I love Postgres 3 Ozgun Erdogan CTO of Citus Data Distributed Systems Distributed Databases Formerly of Amazon Love drinking margaritas

Slide 4

Slide 4 text

4

Slide 5

Slide 5 text

Our mission at Citus Data 5 Make it so that your business never has to worry about scaling their database again

Slide 6

Slide 6 text

Punch Line 1. What is unique about PostgreSQL? • The extension APIs 2. PostgreSQL extensions can be a game changer for databases

Slide 7

Slide 7 text

Talk Outline 1. What is an extension? 2. Why can extensions change databases? 3. Postgres can’t do “this” • Semi-structured or unstructured data • Approximation algorithms for fast results • Geospatial database • S3 or columnar storage for storage • Scale out 4. Conclusion 5. Demo

Slide 8

Slide 8 text

What is an Extension • An extension is a piece of software that adds functionality to Postgres. Each extension bundles related objects together. • Postgres 9.1 started providing official APIs to override or extend any database module’s behavior. • “CREATE EXTENSION citus;” dynamically loads these objects into Postgres’ address space.

Slide 9

Slide 9 text

What can you Extend in Postgres? • You can override, cooperate with, or extend any combination of the following database modules: • Type system and operators • User defined functions and aggregates • Storage system and indexes • Write ahead logging and replication • Transaction engine • Background worker processes • Query planner and query executor • Configuration and database metadata

Slide 10

Slide 10 text

Why are Extensions so important • Every decade brings new workloads for databases. • The last decade was about capturing more data, in more shapes and form. • Postgres has been forked by dozens of commercial databases for new workloads. When you fork, your database diverges from the community. • What if you could leverage the database ecosystem and grow with it?

Slide 11

Slide 11 text

Extending a relational database: Really? Extending a relational database is a relatively new idea. Over the years, we received questions on this new idea. 1. Forking vs extensions: Can you really extend any database module? 2. Building from scratch vs extensions: Postgres is a relational database from an old era. It can’t do “this”.

Slide 12

Slide 12 text

Relational databases can’t do “this” Postgres isn’t designed for “this”: 1. Process semi-structured 2. Approximate and fast query results 3. Run geospatial workloads 4. Non-relational data storage 5. Scale out for large datasets

Slide 13

Slide 13 text

Postgres can’t do semi-structured data • NoSQL popularized the use of semi-structured data as an alternative to data models used in relational databases. In practice, each model has benefits. • Postgres has an extensible type system. It already supports semi-structured data types: 1. XML 2. Full-text search 3. Hstore: precursor to JSONB 4. JSON / JSONB

Slide 14

Slide 14 text

JSONB data type – store and query from compose.com

Slide 15

Slide 15 text

JSONB data type – aggregate and index

Slide 16

Slide 16 text

Postgres can do semi-structured data • PostgreSQL stores and processes semi-structured data just as efficiently as NoSQL databases. You also get rich features that come with a relational database. • http://goo.gl/NuoLgP (Mongo vs Postgres jsonb benchmarks) • If your semi-structured or unstructured data can’t be served by existing data types, you can always create your own type. You can even add operators, aggregate functions, or indexes.

Slide 17

Slide 17 text

Postgres can’t do query approximation • Real-time analytics is an emerging workload for databases. • You use Postgres to power a customer facing dashboard. Your analytical queries require sub-second response times.

Slide 18

Slide 18 text

HLL – count(distinct) storage

Slide 19

Slide 19 text

HLL – count(distinct) query

Slide 20

Slide 20 text

Postgres can do fast / approximate queries • Real-time analytics databases (such as Spark or Elastic Search) can provide fast answers to analytics queries using approximation algorithms. • PostgreSQL offers the same functionality through its extensions. 1. HLL provides count(distinct) approximation. 2. TopN stores and merges top rows in a database according to some criteria. 3. TDigest or HDR provide percentile approximation across large datasets.

Slide 21

Slide 21 text

Postgres can’t be a spatial database • A spatial database stores and queries data that represents objects defined in a geometric space. • Spatial databases represent geometric objects such as lines and polygons. Some databases handle complex structures such as 3D objects and topological coverages. from boundlessgeo.com

Slide 22

Slide 22 text

PostGIS – Geographic objects

Slide 23

Slide 23 text

PostGIS – Geospatial joins

Slide 24

Slide 24 text

Postgres can become a spatial database • The PostGIS extension turns PostgreSQL into one of most popular geospatial databases in the world. • Thousands of companies use PostGIS for spatial workloads – from projects such as OpenStreetMap to start-ups like Hotel Tonight. • If you need more from your spatial database, you can easily extend Postgres. In fact, PostGIS comes with six other extensions for specific use cases.

Slide 25

Slide 25 text

Postgres can only do row storage • Postgres 9.1+ comes with foreign data wrapper APIs. With these APIs, you can add read from or write to any data source. • Postgres already has 106 wrappers. With these, you can run SQL commands on diverse data sources: 1. S3 (read-only) 2. MongoDB 3. Oracle 4. Cstore_fdw

Slide 26

Slide 26 text

CStore – Columnar storage • CStore is under development. For example, cstore doesn’t yet support Update / Delete commands. • Cstore’s primary benefit today is compression. People use it to reduce in- memory and storage footprint.

Slide 27

Slide 27 text

Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 Block 7 150K rows (configurable) 150K rows (configurable) 10K column values (configurable) per block ORC file format

Slide 28

Slide 28 text

CStore – Data Load and Query

Slide 29

Slide 29 text

Postgres can do more than row stores • Default storage engine for relational databases is row- oriented. But, Postgres can do way more than row stores. • You can extend Postgres to store data in a columnar format or interact with other databases – such as DynamoDB or Oracle. • Postgres provides extension apis to (1) scan foreign tables, (2) scan foreign joins, (3) update foreign tables, (4) lock rows, (5) sample data, (6) override planner and executor, and more.

Slide 30

Slide 30 text

Postgres doesn’t scale • “SQL doesn’t scale” answers a complex problem by making a simple statement. • SQL means different things to different people. Depending on the context, it could mean multi-tenant (B2B) databases, short read/writes, real-time analytics, or data warehousing. • Scaling each one of these workloads require extending the relational database in a different way.

Slide 31

Slide 31 text

Citus – Distributed database 1. Citus scales out PostgreSQL • Uses sharding and replication • Query engine parallelizes SQL queries across machines 2. Citus extends PostgreSQL • Uses Postgres extension APIs to cooperate with or extend all database modules 3. Available in 3 ways • Open source, enterprise software, and managed database as a service on AWS

Slide 32

Slide 32 text

Citus – Scaling out PostgreSQL

Slide 33

Slide 33 text

Citus – Architecture diagram (simplified) Coordinator SELECT sum(…), count(…) FROM teams_1001 SELECT sum … FROM teams_1003 Worker node 1 Table metadata Table_1001 Table_1003 SELECT sum … FROM teams_1002 SELECT sum … FROM teams_1004 Worker node 2 Table_1002 Table_1004 Worker node N . . . . . . Each node Postgres with Citus installed 1 shard = 1 Postgres table SELECT avg(..) FROM teams;

Slide 34

Slide 34 text

Postgres can scale • “SQL doesn’t scale” is a simple statement to a complex problem. It’s easy to dismiss a complex problem by making a statement - that trivializes the problem. • SQL is hard, not impossible, to scale.

Slide 35

Slide 35 text

Summary • Postgres Extension APIs provide a unique way to build new databases. • Postgres can be extended to many different workloads 1. jsonb: Semi-structured data 2. HyperLogLog: Fast and approximate count(distinct) 3. PostGIS: Geospatial database 4. cstore_fdw: columnar storage (in works) 5. Citus: Scale out your database

Slide 36

Slide 36 text

Conclusion • Postgres 10 enables you to extend any database module’s behavior. This way, you can use functionality built into Postgres over decades. You can also grow with the rich ecosystem of tools and libraries. • Extensions are a game changer for databases. • The monolithic relational database could be dying. If so, long live Postgres!

Slide 37

Slide 37 text

© 2017 Citus Data. All right reserved. [email protected] @citusdata Ozgun Erdogan www.citusdata.com citusdata.com/ newsletter

Slide 38

Slide 38 text

Demo • Demo that shows how different Postgres extensions can work together!