Architecting operational real-time analytics apps with PostgreSQL & Hyperscale (Citus) | PGConf India 2020 | Parikshit Savjani

Architecting operational real-time analytics apps with PostgreSQL & Hyperscale (Citus) | PGConf India 2020 | Parikshit Savjani

Data is exploding with billions of device connected and emitting petabytes scale of data every minute. Postgres is the world’s most advanced open source relational database with NoSQL capabilities built in—and is growing in usage globally, including for business-critical, real-time streaming and operational analytics applications. With Hyperscale (Citus) now available as a built-in deployment option in Azure Database for PostgreSQL, you can architect Hybrid transactional analytical applications on Postgres at any scale on Azure. In this hands-on lab, you will build an operational real-time analytics & visualization app using PowerBI and Azure Database for PostgreSQL. Come and attend this workshop to architect a database for operational real-time analytics with Hyperscale (Citus)—from schema to data model to building real time dashboards in PowerBI.

143117954187136b825331f24da0e201?s=128

Azure Postgres

February 26, 2020
Tweet

Transcript

  1. Architecting operational real- time analytics apps with PostgreSQL & Hyperscale

    (Citus) Deepthi Anantharam Shashikant Shakya Parikshit Savjani Postgres Team at Microsoft @AzureDBPostgres @citusdata
  2. Session learning objective Architect a database for operational real-time analytics

    with Hyperscale (Citus) from schema <> data model <> rollups <> PowerBI
  3. How many of you are familiar with Citus extension on

    Postgres?
  4. How many of them have operational real-time analytics workloads?

  5. What we mean by operational real-time analytics End-users (lots of

    them) can run reports & complex analytics queries on-the-fly, on “fresh” (real-time) data
  6. Examples of operational real-time analytics Network telemetry Clickstream analysis IOT

  7. 5 requirements that real-time analytics applications all have • Large

    amounts of data • Sub-second response times • Large number of concurrent users • Ingests new data within seconds / minutes • Supports advanced analytics
  8. Why Postgres for real-time analytics? 6 reasons • Both real-time

    ingest & complex analytics in same database • Data types, operators, functions • Pre-aggregations: INSERT … SELECT, unlogged tables, temp tables • Extensions: pg_partman, pg_cron, hll, topn • Connectivity with rest of data pipeline • Scale to very large data volumes (100s of TBs) with Hyperscale (Citus)
  9. Why Postgres for real-time analytics? 6 reasons • Both real-time

    ingest & complex analytics in same database • Data types, operators, functions • Pre-aggregations: INSERT … SELECT, unlogged tables, temp tables • Extensions: pg_partman, pg_cron, hll, topn • Connectivity with rest of data pipeline • Scale to very large data volumes (100s of TBs) with Hyperscale (Citus)
  10. Why Postgres for real-time analytics? 6 reasons • Both real-time

    ingest & complex analytics in same database • Data types, operators, functions • Pre-aggregations: INSERT … SELECT, unlogged tables, temp tables • Extensions: pg_partman, pg_cron, hll, topn • Connectivity with rest of data pipeline • Scale to very large data volumes (100s of TBs) with Hyperscale (Citus)
  11. Why Postgres for real-time analytics? 6 reasons • Both real-time

    ingest & complex analytics in same database • Data types, operators, functions • Pre-aggregations: INSERT … SELECT, unlogged tables, temp tables • Extensions: pg_partman, pg_cron, hll, topn • Connectivity with rest of data pipeline • Scale to very large data volumes (100s of TBs) with Hyperscale (Citus)
  12. Why Postgres for real-time analytics? 6 reasons • Both real-time

    ingest & complex analytics in same database • Data types, operators, functions • Pre-aggregations: INSERT … SELECT, unlogged tables, temp tables • Extensions: pg_partman, pg_cron, hll, topn • Connectivity with rest of data pipeline • Scale to very large data volumes (100s of TBs) with Hyperscale (Citus)
  13. Why Postgres for real-time analytics? 6 reasons • Both real-time

    ingest & complex analytics in same database • Data types, operators, functions • Pre-aggregations: INSERT … SELECT, unlogged tables, temp tables • Extensions: pg_partman, pg_cron, hll, topn • Connectivity with rest of data pipeline • Scale to very large data volumes (100s of TBs) with Hyperscale (Citus)
  14. 2 deployment options for Azure Database for PostgreSQL: Single server

    & Hyperscale (Citus)
  15. Scaling out your Postgres database horizontally Each node has PostgreSQL

    with Citus installed. Each shard [ ] is 1 PostgreSQL table. Gives your app more compute, memory, disk. Easy to add nodes for growth or performance. Coordinator node
  16. Distributed PostgreSQL scales better APPLICATION SELECT FROM GROUP BY company_id,

    avg(spend) AS avg_campaign_spend compaigns company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … DISTRIBUTED AGGREGATE
  17. Microsoft Windows team relies on Citus and Postgres (on Azure)

    for mission-critical shiproom decisions
  18. “We can support 100s of concurrent users & more than

    6M queries every day. With Citus, response times for 75% of queries are less than 200 ms. And response times for 95% of queries are less than 3 seconds.”
  19. 100s of concurrent users 6M queries every day 75% of

    queries < 200 ms. 95% of queries < 3 sec
  20. “Distributed PostgreSQL with Citus is a game changer” https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/architecting- petabyte-scale-analytics-by-scaling-out-postgres-on/ba-p/969685

  21. ~10x faster vs. Elasticsearch Powerful indexing: semi-structured data Focus on

    app by offloading management w/PaaS 300 GB+ data Majority of queries < 2 sec
  22. PostgreSQL tips for real-time analytics applications • Azure Database for

    PostgreSQL as your fully- managed database • COPY to load large amounts of raw data • Indexes to find new events during queries & pre- aggregations • Rollup tables built from raw event data • Incremental aggregation if you can have late data • Partitioning with pg_partman to expire old data • Automate pipeline with pg_cron • HLL, TopN to incrementally approximate complex queries • Hyperscale (Citus) to scale out
  23. It’s time for the Technical, Hands-on Lab

  24. Scenario: Cloud services provider Helping businesses monitor HTTP traffic Every

    time client receives HTTP request, your service receives a log record Ingest records & create HTTP analytics dashboard to give your clients insights Fast queries for results in real-time Analyze multiple sites at once OR a analyze single site at a time
  25. Technical Lab—the steps involved UNDERSTAND THE SCENARIO SETUP HYPERSCALE (CITUS)

    SERVER GROUP & FIREWALL DESIGN DATA MODEL AND SHARDING CREATE ROLLUP TABLES EXPIRE OLDER DATA USE APPROXIMATIONS (HLL, TOPN) FOR FAST QUERIES CONNECT WITH POWERBI FOR POWERFUL VISUALIZATIONS
  26. Let’s get started! Architecting operational real-time analytics apps with PostgreSQL

    & Hyperscale (Citus) aka.ms/AA7dv4k Activation Code: ACTIVATE5349
  27. Session takeaways 1. Use Azure Database for PostgreSQL with Hyperscale

    (Citus) to scale out the database horizontally—to deliver fast performance to real- time analytics applications 2. Use COPY to load large amounts of raw data 3. Use Indexes to find new events during queries & pre-aggregations 4. Use Rollup tables built from raw event data 5. Use Incremental aggregation if you can have late data 6. Use Partitioning with pg_partman to expire old data 7. Automate the data pipeline with pg_cron 8. Use HLL, TopN to incrementally approximate complex queries
  28. Session resources ž Azure Postgres service page: aka.ms/azure-postgres ž Azure

    Postgres / TechCommunity blog: aka.ms/azure-postgres-blog ž Citus technical newsletter: aka.ms/citus-newsletter ž Citus open source repo: github.com/citusdata/citus ž Follow us @AzureDBPostgres: twitter.com/AzureDBPostgres ž Follow @citusdata: twitter.com/citusdata
  29. CITUSDATA.COM/NEWSLETTER Do you get the Citus Newsletter?

  30. A good newsletter is like a good GIN index. Sign

    up for the Citus Newsletter
  31. @talktosavjani • @citusdata • @AzureDBPostgres AskAzureDBforPostgreSQL@service.microsoft.com ध"यवाद | Thank U