Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Is PostgreSQL becoming THE real-time analytics database? | PGCon 2018 | Ozgun Erdogan

Is PostgreSQL becoming THE real-time analytics database? | PGCon 2018 | Ozgun Erdogan

Citus Data

May 31, 2018
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. Is PostgreSQL becoming THE
    real-time analytics database?
    Ozgun Erdogan
    Citus Data
    PGCon | May 2018

    View Slide

  2. What is real-time analytics?
    •  Real-time analytics is an emerging workload for
    databases. This workload is also known as in-memory
    databases or Hybrid Transactional Analytical
    Processing (HTAP).
    •  The real-time analytics workload includes use-cases
    such as powering customer facing dashboards,
    behavior analytics (segmentation and funnels), IoT /
    device analytics, time-series databases, and
    geospatial analysis.

    View Slide

  3. New databases for real-time analytics
    •  In past years, proprietary databases have been built to
    handle real-time analytics workloads. An example is
    SAP HANA.
    •  Postgres’ new features and extensions make it
    appealing and competitive for real-time analytics –
    while also providing the benefits of open source.
    •  This lightning talk showcases five exciting
    developments for real-time analytics workloads.

    View Slide

  4. (1) JIT compilation for faster queries
    •  Andres has been working on JIT compilation using
    LLVM since Postgres 9.6. These changes improve
    analytical query performance for in-memory datasets.
    Diagram from Andres’
    FOSDEM presentation:
    http://anarazel.de/talks/
    fosdem-2018-02-03/jit.pdf

    View Slide

  5. (2) Approximation algos for fast replies
    •  When your database powers customer facing
    dashboards, your analytical queries need to go over
    hundreds of millions of rows in less than a second.
    •  The HyperLogLog (HLL) extension provides an
    approximation algorithm to count the number of
    distinct items with tunable precision.
    •  The TopN extension allows you to store, merge, and
    serve approximations for order by-group by-limit
    queries.

    View Slide

  6. (3) Citus – Scale out Postgres
    S1
    Citus
    Node #1
    PostgreSQL
    Primary
    WAL logs
    Monitoring Agents
    (Auto node failover)
    10.0.0.15
    (Elastic IP)
    Application
    (Ruby, Pyhton,
    Java, …)
    EBS volume
    S5
    S7 S8
    Shards (S1, S5, S7, S8)
    Transparent to application
    S2
    Citus
    Node #8
    PostgreSQL
    Primary
    WAL logs
    10.0.0.25
    (Elastic IP)
    EBS volume
    S3
    S4 S9
    …….
    PgBou
    ncer
    PgBou
    ncer

    View Slide

  7. (4) Native partitioning & partition by time
    •  Postgres 10 introduced native partitioning.
    •  Real-time analytics data has a time dimension to it.
    This makes it suitable to partition by time – so that
    your indexes remain local to your partitions and data
    expiration becomes simple.
    •  Common pattern: Shard by granular key, such as
    user_id or device_id. Then partition by time.

    View Slide

  8. (5) You can use these features together
    •  A common pattern in real-time analytics is hot
    updateable data and cold compressed data.
    •  You update data for the most recent partition for that
    day. You then “close out” the partition and rotate this
    data into a cold partition.
    •  With large data sets, you want to compress your cold
    partitions. CStore foreign data wrapper can sort and
    compress cold data.

    View Slide

  9. (5) Shard on granular key, partition by time,
    and cstore_fdw to compress cold data
    Image from
    Microsoft.com
    (SQL Server 2016)

    View Slide

  10. Conclusion: Postgres is awesome
    •  Proprietary databases built in features tailored to real-
    time analytics workloads.
    •  Three years ago at PGCon:
    •  https://www.citusdata.com/blog/2015/08/19/futuristic-
    pgshard-demo/ (Dynamically changing row and
    columnar store in Postgres)
    •  The future is today!

    View Slide