Transforming Your APIs into Business Gold—Architecting a Real-Time API Usage Analytics Platform

In today's hyper-connected digital landscape, real-time API usage analysis and billing have become essential. With APIs at the heart of modern applications and services, real-time insight into their use is often mission-critical, and its value depends on factors such as latency, throughput, freshness, and the correctness of the generated insights.

This talk discusses the architecture of a real-time API usage analytics system composed of Redpanda, Apache Flink, and Apache Pinot. Redpanda, a scalable streaming data platform, ingests high-volume API usage data from API gateways in real time and at low latency. The ingested data is streamed through Flink for streaming ETL (joins, aggregations, and transformations), and the final output of the pipeline is fed to Apache Pinot, which serves analytics at scale. We will also use Flink to rate-limit incoming API requests.
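
The rate-limiting idea can be illustrated without Flink. The following is a minimal pure-Python sketch, not Flink code: it keeps a counter per consumer and time window, which is roughly the kind of keyed state a stateful stream processor would manage. The class name, the consumer IDs, and the limit of 3 requests per 60 seconds are assumptions made for illustration.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Per-consumer fixed-window request counter (illustrative, not the Flink API)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (consumer_id, window_start) -> number of requests admitted in that window
        self._counts = defaultdict(int)

    def allow(self, consumer_id, event_time):
        """Return True if the request still fits in the consumer's quota."""
        window_start = int(event_time // self.window_seconds) * self.window_seconds
        key = (consumer_id, window_start)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
# Five requests from one consumer at event times 0, 10, 20, 30 and 61 seconds.
decisions = [limiter.allow("consumer-a", t) for t in (0, 10, 20, 30, 61)]
print(decisions)
```

The fourth request is rejected because the first window is full, and the fifth is admitted because it falls into the next window. This sketch never evicts old windows; a real stream processor would expire that state.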

Having such a system enables businesses to make informed, agile decisions and to promptly address performance bottlenecks, security threats, and resource allocation issues in the API infrastructure. It also supports a seamless user experience by surfacing usage patterns, issues, and opportunities as they occur, so product quality can be improved proactively.

Dunith Dhanushka

June 08, 2024

Transcript

  1. © 2024 REDPANDA DATA
    Transforming Your APIs into Business Gold
    Architecting a Real-Time API Usage Analytics Platform
    RTA Summit 2024
  2. About the presenter
    Dunith Dhanushka, Senior Developer Advocate, Redpanda Data
    • Event streaming, real-time analytics, and stream processing enthusiast
    • Frequent blogger, speaker, and educator
  3. This talk
    • Is a blueprint for solutions that can be built by combining Kafka/Redpanda and Apache Pinot.
    • Not coupled to a specific APIM vendor.
    • No demo. Sorry! :(
    • Take the blueprint as a takeaway.
  4. An API is a business capability delivered to consumers over a network.
  5. In a digital business, APIs allow for programmatic access to business operations.
  6. Potential API consumers
    • Real estate companies - Provide an accurate valuation figure for home buyers.
    • Banks - House valuation before approving a mortgage.
    • Insurance providers - Provide more accurate quotes for home and contents insurance.
    • Government - Easily calculate property taxes.
  7. The need?
    Find an efficient and reliable way to measure API usage for each consumer.
  8. What metrics to derive?
    1. API usage - API invocations over time per consumer
    2. API latency - End-to-end latency
    3. Unique users
    4. Geographical usage distribution
    5. Error count
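
The five metrics on this slide can be sketched over a handful of gateway log events. This is an illustrative pure-Python version, assuming a hypothetical event shape (`consumer`, `user_id`, `country`, `status`, `latency_ms`, `ts` in seconds); in the architecture described in the talk, the same aggregations would more likely be expressed as queries over a Pinot table.

```python
from collections import Counter

# Hypothetical gateway log events; every field name here is an assumption.
events = [
    {"consumer": "bank-1", "user_id": "u1", "country": "US", "status": 200, "latency_ms": 120, "ts": 0},
    {"consumer": "bank-1", "user_id": "u2", "country": "US", "status": 500, "latency_ms": 300, "ts": 30},
    {"consumer": "insurer-1", "user_id": "u3", "country": "DE", "status": 200, "latency_ms": 90, "ts": 65},
    {"consumer": "bank-1", "user_id": "u1", "country": "GB", "status": 200, "latency_ms": 150, "ts": 70},
]


def derive_metrics(events, bucket_seconds=60):
    """Compute usage, latency, unique users, geo distribution and error count."""
    usage = Counter()       # (consumer, bucket start) -> invocations
    by_country = Counter()  # country -> invocations
    users = set()
    errors = 0
    total_latency = 0
    for e in events:
        bucket = (e["ts"] // bucket_seconds) * bucket_seconds
        usage[(e["consumer"], bucket)] += 1
        by_country[e["country"]] += 1
        users.add(e["user_id"])
        total_latency += e["latency_ms"]
        if e["status"] >= 400:  # treat 4xx and 5xx responses as errors
            errors += 1
    return {
        "usage": dict(usage),
        "avg_latency_ms": total_latency / len(events) if events else 0.0,
        "unique_users": len(users),
        "by_country": dict(by_country),
        "error_count": errors,
    }


metrics = derive_metrics(events)
print(metrics)
```

Average latency is used here only as the simplest latency summary; percentiles are often more informative for API latency.
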
  9. What metrics to derive?
    Photo credits: https://www.atatus.com/blog/api-analytics-tools/
  10. Batch or real-time insights?
    We prefer real-time actionable insights:
    - Real-time usage alerts for consumers
    - Real-time API traffic metrics for the product team
    - Real-time API health information for operations teams
    Complemented with batch processing:
    - Monthly usage-based billing reports
    - Weekly API health reports
    - Daily API traffic reports
  11. Write path: API Gateway to datastore
    Directly writing to the analytics datastore is not recommended:
    - It couples the APIM system to the analytics infrastructure
    - Synchronous writes should be avoided
    - Ingestion needs to scale
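
One way to picture the decoupled write path: the gateway side only serialises an event and hands it to a producer, and never talks to the analytics datastore. The sketch below assumes a producer object exposing `send(topic, key=..., value=...)`, which is the shape of kafka-python's `KafkaProducer.send`; the topic name `api-usage`, the event fields, and the in-memory stand-in are hypothetical.

```python
import json


def publish_api_event(producer, topic, event):
    """Serialise one API usage event and hand it to the producer."""
    # Keying by consumer keeps one consumer's events in the same partition.
    key = event["consumer"].encode("utf-8")
    value = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return producer.send(topic, key=key, value=value)


class InMemoryProducer:
    """Stand-in for a Kafka-compatible producer, for local experimentation only."""

    def __init__(self):
        self.records = []

    def send(self, topic, key=None, value=None):
        self.records.append((topic, key, value))


producer = InMemoryProducer()
publish_api_event(producer, "api-usage", {"consumer": "bank-1", "status": 200})
print(producer.records)
```

With a real Kafka-compatible client pointed at the Redpanda brokers, the gateway would only depend on the topic, and the datastore could be changed or scaled independently.
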
  12. Redpanda is a Kafka API-compatible streaming data platform
    • Written in C++
    • Thread-per-core architecture
    • Designed for modern hardware
  13. Simple to deploy, use and manage
    • Single binary
    • Kafka-compatible APIs
    • Easy Day 2 Ops
    • Dev-friendly interface
  14. Expectations
    • Streaming data ingestion - Higher data freshness
    • Low-latency queries - User-facing queries
    • High query throughput - Efficiently handle query spikes
  15. Apache Pinot is a real-time distributed OLAP database, designed to serve OLAP workloads on streaming data with extremely low latency and high concurrency.
  16. Need a stream processor?
    • Not mandatory, but nice to have, depending on the need.
    • Data cleansing - Redpanda’s WASM transforms can do this.
    • A stateful stream processor like Apache Flink adds more value to the pipeline when:
      ◦ Real-time joins and enrichment are needed.
      ◦ Alerts need to be triggered and downstream workflows started.
  17. Visualization and BI
    • Integrates with Python clients through the Pinot Python driver (PinotDB).
    • Integrates with popular BI products via JDBC and ODBC interfaces.
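
As a rough illustration of the Python integration, the sketch below builds a top-consumers query and runs it through the `pinotdb` driver's DB-API interface. The table name `api_usage`, the columns `consumer` and `ts`, and the broker address `localhost:8099` are assumptions; the driver is imported lazily so the query builder can be tried without a Pinot cluster.

```python
def build_usage_query(table, since_ms, limit=10):
    """Build a SQL query for invocations per consumer since a timestamp."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        f"SELECT consumer, COUNT(*) AS invocations "
        f"FROM {table} "
        f"WHERE ts >= {int(since_ms)} "
        f"GROUP BY consumer "
        f"ORDER BY COUNT(*) DESC "
        f"LIMIT {int(limit)}"
    )


def fetch_top_consumers(table, since_ms, host="localhost", port=8099):
    """Run the query against a Pinot broker (requires the pinotdb package)."""
    # Imported here so the rest of the sketch works without the driver installed.
    from pinotdb import connect

    conn = connect(host=host, port=port, path="/query/sql", scheme="http")
    cursor = conn.cursor()
    cursor.execute(build_usage_query(table, since_ms))
    return list(cursor)


query = build_usage_query("api_usage", since_ms=1000, limit=5)
print(query)
```

The same query could be pasted into a notebook cell or wired into a dashboard; only `fetch_top_consumers` needs a reachable broker.
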
  18. Ad-hoc querying - Jupyter notebook
    Hubert Dulay’s meetup talk: https://www.youtube.com/watch?v=viZc_1nPNnI
  19. Batch workloads
    • Pinot can work with query federators like Trino and Presto for running batch reports.
  20. Implementation plan
    1. Provision a Redpanda cluster, create topics, set ACLs.
    2. Configure the Kafka sink in APISIX.
    3. Create Pinot schemas and tables.
    4. Massage data as needed.
    5. Create/plug dashboards.
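
For step 3, a Pinot schema is a JSON document that groups columns into dimension, metric, and date-time fields. The sketch below builds one as a Python dictionary and serialises it. The top-level keys follow Pinot's schema format, while the schema name and every column are assumptions chosen to match a plausible gateway log event; the real-time table config that points at the Redpanda topic is not shown.

```python
import json

# Hypothetical schema for API usage events; column names are illustrative.
api_usage_schema = {
    "schemaName": "api_usage",
    "dimensionFieldSpecs": [
        {"name": "consumer", "dataType": "STRING"},
        {"name": "user_id", "dataType": "STRING"},
        {"name": "country", "dataType": "STRING"},
        {"name": "status", "dataType": "INT"},
    ],
    "metricFieldSpecs": [
        {"name": "latency_ms", "dataType": "LONG"},
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "ts",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        },
    ],
}

# Serialise to the JSON document that would be registered with Pinot.
schema_json = json.dumps(api_usage_schema, indent=2)
print(schema_json)
```

Keeping `consumer`, `country`, and `status` as dimensions is what allows the per-consumer, geographical, and error-count breakdowns to be expressed as simple GROUP BY queries.
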
  21. Redpanda University
    Free, self-paced online learning: https://university.redpanda.com
    • Learn the fundamentals of data streaming and Redpanda
    • Install Redpanda and use the rpk CLI to configure it
    • Create producers and consumers in Java, Python and NodeJS
    • Sign up today for free!
  22. Thanks for joining! Let’s keep in touch
    @redpandadata redpanda-data redpanda-data [email protected]