
When NOT to use Apache Kafka?

Kai Waehner
October 05, 2022

Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get asked a very valid question every week: When should you NOT use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you rule Kafka out when it is not the right tool for the job?

This session explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.

Whether you are considering open source Apache Kafka, a cloud service like Confluent Cloud, or another technology that uses the Kafka protocol like Redpanda or Pulsar, check out this slide deck.

A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/

Transcript

  1. Data Streaming with Apache Kafka

    Components: DWH, apps, connectors, stream processing (ksqlDB, Kafka Streams), and Oracle DB via the Oracle CDC Source premium connector.
    Capabilities: Streaming ETL; Data Processing; Real-time Analytics; Stateless and Stateful Business Applications; Fully-managed Pipelines; Connectivity to Data Infrastructure, SaaS, AI/ML; Data Governance; Connectivity; Filtering and Routing; Change Data Capture; Built-in Scale and Fault Tolerance; Real-time Data Sharing across Hybrid and Multi-Cloud.
    Storage: Backpressure Handling, Slow Consumers, Replayability.
    kai-waehner.de | @KaiWaehner | When NOT to use Apache Kafka?
  2. Use Cases for Data Streaming by Business Value

    Business value: Increase Revenue (make money), Decrease Costs (save money), Mitigate Risk (protect money).
    Strategic drivers: Customer Experience, Core Business Platform, Operational Efficiency (Agility), Migrate to Cloud, Fraud Detection, Regulatory.
    Use cases: Markets DaaS; Digital replatforming / Legacy Modernization; Customer 360; Faster transactional processing / analysis incl. Machine Learning / AI; Microservices Architecture; Online Fraud Detection; Online Security (syslog, log aggregation, Splunk replacement); Middleware replacement; Website / Core Operations / Payments (Central Nervous System); Real-time app updates.
    Legend: 1° business use case, 2° business use case, Data Eng. / Infrastructure use case.
  3. Kafka is a Database BUT NOT for Complex Analytics

    Durable, fault-tolerant storage: Tiered Storage, Compacted Topics, Exactly-once Semantics, RocksDB on the client side (ksqlDB, Interactive Queries).
    Complex analytics: “You Name It”, integrated via Kafka Connect.
  4. Kafka is NOT a Proxy for Millions of Clients

    “Last mile” integration usually goes through a proxy (such as HTTP or MQTT).
  5. Kafka is NOT an API Management Platform

    Orders, Customers, Payments, Stock.
    API (HTTP/REST): API Gateway, API Lifecycle, Monetization, REST Proxy.
    Data Streaming: Data Integration, Real-Time Apps, Data Sharing, Stream Exchange.
  6. Kafka is NOT the right tool for processing large messages *

    Claim Check Enterprise Integration Pattern:
    - Store big files in a data lake (e.g. AWS S3)
    - Send metadata, including a link to the video in the object store (Kafka Producer)
    - Consume and correlate metadata (Kafka Streams)
    - Pre-processing and data correlation, e.g. enrich with other metadata (ksqlDB)
    - Automated orchestration (Kafka Clients)
    - Real-time analytics and other business applications (Kafka Clients + other tools)
    - Download big files from the data lake
    * BUT Kafka works well for some use cases, e.g.:
    - Splitting large legacy CSV files
    - Externalizing large payloads on-the-fly
    - Image processing at the edge
    - Uploading large files into the DWH
  7. Kafka is NOT an IoT Platform *

    Factory and cloud: Siemens S7, SCADA, DCS, MES, and ERP connected via Kafka Connect and REST Proxy (HTTP(S)); stateless and stateful processing with Kafka Streams / ksqlDB; storage; Analytics, Database, Data Lake, and CRM connected via Kafka Connect; Cluster Linking between sites.
    * BUT Kafka is a fundamental part of most IoT projects, e.g.:
    - Scalable real-time data hub for IoT data AND IT data
    - Edge and hybrid cloud
    - Direct integration with IoT protocols
    - Integration with IoT protocols via 3rd-party tools
  8. Kafka is NOT for hard real-time requirements

    - OT - Connected Vehicle (Car, Train, Drone): vehicle data; C, C++, Rust
    - OT - Manufacturing (Field Bus, PLC, Machine, Robot): robot data; C, C++, Rust
    - IT - Enterprise Software (Data Center, Cloud, Car IT) and Central Data Center / Public Cloud: all data; Java, Python, Go
    - Cluster Linking between the edge sites and the central data center / public cloud
    Legend: Hard Real Time = deterministic network with zero spikes + zero latency; Soft Real Time + Near Real Time + Batch.
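Slide 1 lists backpressure handling, slow consumers, and replayability under storage. The following Python sketch illustrates why an append-only log provides these properties: producers only append, and every consumer keeps its own offset, so a slow consumer falls behind without blocking anyone, and any consumer can seek back to replay retained records. It is a hypothetical in-memory model of a single partition (`Log` and `Consumer` are illustrative names, not the Kafka client API) and ignores retention, partitioning, and consumer groups.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Hypothetical in-memory stand-in for a single Kafka topic partition."""
    records: list = field(default_factory=list)

    def append(self, value):
        # Producers only append; they never wait for consumers to catch up.
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


@dataclass
class Consumer:
    """Each consumer tracks its own offset, so a slow reader never blocks a fast one."""
    log: Log
    offset: int = 0

    def poll(self, max_records=10):
        # Read up to max_records starting at this consumer's own position.
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Rewind (or skip ahead) to replay records that are still in the log.
        self.offset = offset
```

With five appended events, one `Consumer` polling with the default `max_records=10` receives all five, a second one polling with `max_records=2` receives only the first two and stays at offset 2, and calling `seek(0)` on the first consumer lets it read `event-0` again. The real Kafka consumer exposes comparable `poll` and `seek` operations on topic partitions.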
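Slide 3 names compacted topics as one reason Kafka can act as a database. The sketch below models what compaction retains: the latest record per key survives, and a record whose value is `None` (a tombstone) deletes its key. This is a simplification under stated assumptions: the real log cleaner works per partition, never compacts the active segment, and keeps tombstones for `delete.retention.ms` before removing them. `compact` and `materialize` are hypothetical helper names.

```python
def compact(records):
    """Keep only the latest (key, value) per key, dropping keys whose latest value is a tombstone.

    Surviving records keep their original relative order (by offset).
    """
    latest = {}  # key -> offset of the most recent record for that key
    for offset, (key, _) in enumerate(records):
        latest[key] = offset
    return [
        (key, value)
        for offset, (key, value) in enumerate(records)
        if latest[key] == offset and value is not None
    ]


def materialize(records):
    """Replay a log into a key-value table, similar to rebuilding a state store."""
    table = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)  # tombstone: delete the key
        else:
            table[key] = value
    return table
```

For example, replaying either the full log `[("user-1", "alice"), ("user-2", "bob"), ("user-1", "alice2"), ("user-2", None), ("user-3", "carol")]` or its compacted form produces the same table, `{"user-1": "alice2", "user-3": "carol"}`. This equivalence is what lets a compacted topic rebuild client-side state such as the RocksDB stores used by ksqlDB and Kafka Streams.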
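Slide 6 names the Claim Check Enterprise Integration Pattern: the large payload goes to an object store, and only a small message with metadata and a reference travels through Kafka. The following is a minimal Python sketch that assumes a dict as a stand-in for the object store (e.g. AWS S3), a list as a stand-in for the topic, a hypothetical `store://` reference scheme, and an arbitrary 1 KB threshold below which payloads are sent inline.

```python
import base64
import hashlib
import json

# Hypothetical in-memory stand-ins: a dict for the object store (e.g. AWS S3)
# and a list for the Kafka topic. Real code would use an object-store SDK and
# a Kafka producer/consumer instead.
object_store = {}
topic = []

CLAIM_PREFIX = "store://"   # hypothetical reference scheme
MAX_INLINE_BYTES = 1024     # assumed threshold, well below the broker's message size limit


def produce(payload: bytes, metadata: dict) -> None:
    """Send small payloads inline; park large ones in the object store and send a claim check."""
    if len(payload) <= MAX_INLINE_BYTES:
        message = {"metadata": metadata, "inline": base64.b64encode(payload).decode("ascii")}
    else:
        key = hashlib.sha256(payload).hexdigest()
        object_store[key] = payload  # store the big file first, then publish the reference
        message = {"metadata": metadata, "claim_check": CLAIM_PREFIX + key}
    topic.append(json.dumps(message).encode("utf-8"))


def consume(raw: bytes) -> bytes:
    """Return the original payload, resolving the claim check against the object store if needed."""
    message = json.loads(raw)
    if "inline" in message:
        return base64.b64decode(message["inline"])
    key = message["claim_check"][len(CLAIM_PREFIX):]
    return object_store[key]
```

The write order is a deliberate choice: the object is stored before the reference is published, so a consumer never receives a claim check that points at a missing object.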
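Slide 8 defines hard real time as a deterministic network with zero spikes, in contrast to soft real time, near real time, and batch. One way to make the distinction concrete is to compare the latency statistic each one is judged on: hard real time fails on the single worst sample, while soft real time tolerates rare spikes as long as a high percentile stays within budget. The Python sketch below is illustrative only; the function names, the nearest-rank percentile, and the 99th-percentile default are assumptions, not measurements or Kafka settings.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile of latency samples, for 0 < p <= 100."""
    if not samples_ms or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_hard_real_time(samples_ms, deadline_ms):
    # Hard real time: one missed deadline is a failure, so only the worst case counts.
    return max(samples_ms) <= deadline_ms


def meets_soft_real_time(samples_ms, target_ms, p=99.0):
    # Soft real time: occasional spikes are tolerated if the chosen percentile stays in budget.
    return percentile(samples_ms, p) <= target_ms
```

For 99 samples of 5 ms plus a single 250 ms spike, `meets_soft_real_time(samples, 10)` returns `True` while `meets_hard_real_time(samples, 10)` returns `False`: a single spike is enough to violate a hard real-time guarantee.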