PostgreSQL is becoming the relational database of choice. An important factor in the rising popularity of Postgres is the extension APIs that allow developers to improve any database module’s behavior. As a result, Postgres users have access to hundreds of extensions today.
In this talk, we're going to first describe extension APIs. Then, we’re going to present four popular Postgres extensions, and demo their use.
* PostGIS turns Postgres into a spatial database through adding support for geographic objects.
* HLL & TopN add approximation algorithms to Postgres. These algorithms are used when real-time responses matter more than exact results.
* pg_partman makes managing partitions in Postgres easy. Through partitions, Postgres provide 5-10x higher performance for time-series data.
* Citus transforms Postgres into a distributed database. To do this, Citus shards data, performs distributed deadlock detection, and parallelizes queries.
Finally, we’ll conclude with why we think Postgres sets the way forward for relational databases.
PostgreSQL is becoming the relational database of choice. One important factor in the rising popularity of Postgres are the extension APIs. These APIs allow developers to extend any database sub-module’s behavior for higher performance, security, or new functionality. As a result, Postgres users have access to over a hundred extensions today, and more to come in the future.
In this talk, I’m going to first describe PostgreSQL’s extension APIs. These APIs are unique to Postgres, and have the potential to change the database landscape. Then, we’re going to present the four most popular Postgres extensions, show the use cases where they are applicable, and demo their usage.
PostGIS turns Postgres into a spatial database. It adds support for geographic objects, allowing location queries to be run in SQL.
HyperLogLog (HLL) & TopN add approximation algorithms to Postgres. These sketch algorithms are used in distributed systems when real-time responses to queries matter more than exact results.
pgpartman makes creating and managing partitions in Postgres easy. Through careful partition management with pgpartman, Postgres offers 5-10x higher write and query performance for time-series data.
Citus transforms Postgres into a distributed database. Citus transparently shards and replicates data, performs distributed deadlock detection, and parallelizes queries.
After demoing these popular extensions, we’ll conclude with why we think the monolithic relational database is dying and how Postgres sets a path for the future. We’ll end the talk with a Q&A.
PostgreSQL Extension APIs
are Changing the Face of
PGCon | May 2018
• I compiled these slides after going through a technical
due diligence step for Citus Data.
• So, this talk assumes that you don’t know much about
PostgreSQL extension APIs.
• The talk goes over five example extensions. If any of
these extensions is too familiar, I’m happy to skip over
I love Postgres
CTO of Citus Data
Formerly of Amazon
Love drinking margaritas
Our mission at Citus Data
Make it so that your business
never has to worry about
scaling their database again
1. What is unique about PostgreSQL?
• The extension APIs
2. PostgreSQL extensions can be a game
changer for databases
1. What is an extension?
2. Why can extensions change databases?
3. Postgres can’t do “this”
• Semi-structured or unstructured data
• Approximation algorithms for fast results
• Geospatial database
• S3 or columnar storage for storage
• Scale out
What is an Extension
• An extension is a piece of software that adds
functionality to Postgres. Each extension bundles
related objects together.
• Postgres 9.1 started providing official APIs to override
or extend any database module’s behavior.
• “CREATE EXTENSION citus;” dynamically loads these
objects into Postgres’ address space.
What can you Extend in Postgres?
• You can override, cooperate with, or extend any
combination of the following database modules:
• Type system and operators
• User defined functions and aggregates
• Storage system and indexes
• Write ahead logging and replication
• Transaction engine
• Background worker processes
• Query planner and query executor
• Configuration and database metadata
Why are Extensions so important
• Every decade brings new workloads for databases.
• The last decade was about capturing more data, in
more shapes and form.
• Postgres has been forked by dozens of commercial
databases for new workloads. When you fork, your
database diverges from the community.
• What if you could leverage the database ecosystem
and grow with it?
Extending a relational database: Really?
Extending a relational database is a relatively new idea.
Over the years, we received questions on this new idea.
1. Forking vs extensions: Can you really extend any
2. Building from scratch vs extensions: Postgres is a
relational database from an old era. It can’t do “this”.
Relational databases can’t do “this”
Postgres isn’t designed for “this”:
1. Process semi-structured
2. Approximate and fast query results
3. Run geospatial workloads
4. Non-relational data storage
5. Scale out for large datasets
Postgres can’t do semi-structured data
• NoSQL popularized the use of semi-structured data as
an alternative to data models used in relational
databases. In practice, each model has benefits.
• Postgres has an extensible type system. It already
supports semi-structured data types:
2. Full-text search
3. Hstore: precursor to JSONB
4. JSON / JSONB
JSONB data type – store and query
JSONB data type – aggregate and index
Postgres can do semi-structured data
• PostgreSQL stores and processes semi-structured data
just as efficiently as NoSQL databases. You also get
rich features that come with a relational database.
• http://goo.gl/NuoLgP (Mongo vs Postgres jsonb benchmarks)
• If your semi-structured or unstructured data can’t be
served by existing data types, you can always create
your own type. You can even add operators, aggregate
functions, or indexes.
Postgres can’t do query approximation
• Real-time analytics is an emerging workload for databases.
• You use Postgres to power a customer facing dashboard.
Your analytical queries require sub-second response times.
HLL – count(distinct) storage
HLL – count(distinct) query
Postgres can do fast / approximate queries
• Real-time analytics databases (such as Spark or
Elastic Search) can provide fast answers to analytics
queries using approximation algorithms.
• PostgreSQL offers the same functionality through its
1. HLL provides count(distinct) approximation.
2. TopN stores and merges top rows in a database according to
3. TDigest or HDR provide percentile approximation across large
Postgres can’t be a spatial database
• A spatial database stores and
queries data that represents
objects defined in a geometric
• Spatial databases represent
geometric objects such as
lines and polygons. Some
databases handle complex
structures such as 3D objects
and topological coverages.
PostGIS – Geographic objects
PostGIS – Geospatial joins
Postgres can become a spatial database
• The PostGIS extension turns PostgreSQL into one of
most popular geospatial databases in the world.
• Thousands of companies use PostGIS for spatial
workloads – from projects such as OpenStreetMap to
start-ups like Hotel Tonight.
• If you need more from your spatial database, you can
easily extend Postgres. In fact, PostGIS comes with six
other extensions for specific use cases.
Postgres can only do row storage
• Postgres 9.1+ comes with foreign data wrapper APIs.
With these APIs, you can add read from or write to any
• Postgres already has 106 wrappers. With these, you
can run SQL commands on diverse data sources:
1. S3 (read-only)
CStore – Columnar storage
• CStore is under
doesn’t yet support
Update / Delete
• Cstore’s primary
benefit today is
use it to reduce in-
memory and storage
(configurable) 10K column values
ORC file format
CStore – Data Load and Query
Postgres can do more than row stores
• Default storage engine for relational databases is row-
oriented. But, Postgres can do way more than row stores.
• You can extend Postgres to store data in a columnar
format or interact with other databases – such as
DynamoDB or Oracle.
• Postgres provides extension apis to (1) scan foreign
tables, (2) scan foreign joins, (3) update foreign tables, (4)
lock rows, (5) sample data, (6) override planner and
executor, and more.
Postgres doesn’t scale
• “SQL doesn’t scale” answers a complex problem by
making a simple statement.
• SQL means different things to different people.
Depending on the context, it could mean multi-tenant
(B2B) databases, short read/writes, real-time analytics,
or data warehousing.
• Scaling each one of these workloads require extending
the relational database in a different way.
Citus – Distributed database
1. Citus scales out PostgreSQL
• Uses sharding and replication
• Query engine parallelizes SQL queries across machines
2. Citus extends PostgreSQL
• Uses Postgres extension APIs to cooperate with or extend all
3. Available in 3 ways
• Open source, enterprise software, and managed database as a
service on AWS
Citus – Scaling out PostgreSQL
Citus – Architecture diagram (simplified)
SELECT sum(…), count(…) FROM
SELECT sum … FROM teams_1003
Worker node 1
SELECT sum … FROM teams_1002
SELECT sum … FROM teams_1004
Worker node 2
Worker node N
Each node Postgres with Citus installed
1 shard = 1 Postgres table
SELECT avg(..) FROM teams;
Postgres can scale
• “SQL doesn’t scale” is a
simple statement to a
complex problem. It’s easy
to dismiss a complex
problem by making a
statement - that trivializes
• SQL is hard, not
impossible, to scale.
• Postgres Extension APIs provide a unique way to build
• Postgres can be extended to many different workloads
1. jsonb: Semi-structured data
2. HyperLogLog: Fast and approximate count(distinct)
3. PostGIS: Geospatial database
4. cstore_fdw: columnar storage (in works)
5. Citus: Scale out your database
• Postgres 10 enables you to extend any database
module’s behavior. This way, you can use functionality
built into Postgres over decades. You can also grow
with the rich ecosystem of tools and libraries.
• Extensions are a game changer for databases.
• The monolithic relational database could be dying. If
so, long live Postgres!
© 2017 Citus Data. All right reserved.
• Demo that shows how different Postgres extensions
can work together!