Is PostgreSQL becoming THE
real-time analytics database?
PGCon | May 2018
What is real-time analytics?
• Real-time analytics is an emerging workload for
databases. This workload is also known as in-memory
databases or Hybrid Transactional Analytical
• The real-time analytics workload includes use-cases
such as powering customer facing dashboards,
behavior analytics (segmentation and funnels), IoT /
device analytics, time-series databases, and
New databases for real-time analytics
• In past years, proprietary databases have been built to
handle real-time analytics workloads. An example is
• Postgres’ new features and extensions make it
appealing and competitive for real-time analytics –
while also providing the benefits of open source.
• This lightning talk showcases five exciting
developments for real-time analytics workloads.
(1) JIT compilation for faster queries
• Andres has been working on JIT compilation using
LLVM since Postgres 9.6. These changes improve
analytical query performance for in-memory datasets.
Diagram from Andres’
(2) Approximation algos for fast replies
• When your database powers customer facing
dashboards, your analytical queries need to go over
hundreds of millions of rows in less than a second.
• The HyperLogLog (HLL) extension provides an
approximation algorithm to count the number of
distinct items with tunable precision.
• The TopN extension allows you to store, merge, and
serve approximations for order by-group by-limit
(3) Citus – Scale out Postgres
(Auto node failover)
Shards (S1, S5, S7, S8)
Transparent to application
(4) Native partitioning & partition by time
• Postgres 10 introduced native partitioning.
• Real-time analytics data has a time dimension to it.
This makes it suitable to partition by time – so that
your indexes remain local to your partitions and data
expiration becomes simple.
• Common pattern: Shard by granular key, such as
user_id or device_id. Then partition by time.
(5) You can use these features together
• A common pattern in real-time analytics is hot
updateable data and cold compressed data.
• You update data for the most recent partition for that
day. You then “close out” the partition and rotate this
data into a cold partition.
• With large data sets, you want to compress your cold
partitions. CStore foreign data wrapper can sort and
compress cold data.
(5) Shard on granular key, partition by time,
and cstore_fdw to compress cold data
(SQL Server 2016)
Conclusion: Postgres is awesome
• Proprietary databases built in features tailored to real-
time analytics workloads.
• Three years ago at PGCon:
pgshard-demo/ (Dynamically changing row and
columnar store in Postgres)
• The future is today!