Is PostgreSQL becoming THE real-time analytics database? | PGCon 2018 | Ozgun Erdogan

Is PostgreSQL becoming THE real-time analytics database? Ozgun Erdogan Citus
Data PGCon | May 2018

What is real-time analytics? •  Real-time analytics is an emerging
workload for databases. This workload is also known as in-memory databases or Hybrid Transactional Analytical Processing (HTAP). •  The real-time analytics workload includes use-cases such as powering customer facing dashboards, behavior analytics (segmentation and funnels), IoT / device analytics, time-series databases, and geospatial analysis.

New databases for real-time analytics •  In past years, proprietary
databases have been built to handle real-time analytics workloads. An example is SAP HANA. •  Postgres’ new features and extensions make it appealing and competitive for real-time analytics – while also providing the benefits of open source. •  This lightning talk showcases five exciting developments for real-time analytics workloads.

(1) JIT compilation for faster queries •  Andres has been
working on JIT compilation using LLVM since Postgres 9.6. These changes improve analytical query performance for in-memory datasets. Diagram from Andres’ FOSDEM presentation: http://anarazel.de/talks/ fosdem-2018-02-03/jit.pdf

(2) Approximation algos for fast replies •  When your database
powers customer facing dashboards, your analytical queries need to go over hundreds of millions of rows in less than a second. •  The HyperLogLog (HLL) extension provides an approximation algorithm to count the number of distinct items with tunable precision. •  The TopN extension allows you to store, merge, and serve approximations for order by-group by-limit queries.

(3) Citus – Scale out Postgres S1 Citus Node #1
PostgreSQL Primary WAL logs Monitoring Agents (Auto node failover) 10.0.0.15 (Elastic IP) Application (Ruby, Pyhton, Java, …) EBS volume S5 S7 S8 Shards (S1, S5, S7, S8) Transparent to application S2 Citus Node #8 PostgreSQL Primary WAL logs 10.0.0.25 (Elastic IP) EBS volume S3 S4 S9 ……. PgBou ncer PgBou ncer

(4) Native partitioning & partition by time •  Postgres 10
introduced native partitioning. •  Real-time analytics data has a time dimension to it. This makes it suitable to partition by time – so that your indexes remain local to your partitions and data expiration becomes simple. •  Common pattern: Shard by granular key, such as user_id or device_id. Then partition by time.

(5) You can use these features together •  A common
pattern in real-time analytics is hot updateable data and cold compressed data. •  You update data for the most recent partition for that day. You then “close out” the partition and rotate this data into a cold partition. •  With large data sets, you want to compress your cold partitions. CStore foreign data wrapper can sort and compress cold data.

(5) Shard on granular key, partition by time, and cstore_fdw
to compress cold data Image from Microsoft.com (SQL Server 2016)

Conclusion: Postgres is awesome •  Proprietary databases built in features
tailored to real- time analytics workloads. •  Three years ago at PGCon: •  https://www.citusdata.com/blog/2015/08/19/futuristic- pgshard-demo/ (Dynamically changing row and columnar store in Postgres) •  The future is today!

Is PostgreSQL becoming THE real-time analytics ...

Is PostgreSQL becoming THE real-time analytics database? | PGCon 2018 | Ozgun Erdogan

Citus Data

More Decks by Citus Data

Other Decks in Technology

Featured

Transcript

Is PostgreSQL becoming THE real-time analytics database? Ozgun Erdogan Citus

What is real-time analytics? •  Real-time analytics is an emerging

New databases for real-time analytics •  In past years, proprietary

(1) JIT compilation for faster queries •  Andres has been

(2) Approximation algos for fast replies •  When your database

(3) Citus – Scale out Postgres S1 Citus Node #1

(4) Native partitioning & partition by time •  Postgres 10

(5) You can use these features together •  A common

(5) Shard on granular key, partition by time, and cstore_fdw

Conclusion: Postgres is awesome •  Proprietary databases built in features