Posedio
November 17, 2025

How Many Topics Is Too Many? Kafka Lessons from Order Management at Scale

When building a modern order management system in the cloud, event-driven architectures and asynchronous processing quickly become unavoidable. With microservices, the advice often sounds deceptively simple: “Create a topic for every event,” “Keep them small,” “Make it elegant.” But what happens when you follow those rules?
In this talk, we’ll share the hard-earned lessons from designing and operating an event-driven order management system at scale, where we discovered that the “elegant” approach can sometimes lead to surprising complexity, hidden coupling, and operational pain. You’ll hear about the real-world consequences of topic design decisions, how message schema composition affects resilience and clarity, and what we learned from an experiment we never quite meant to run.
Whether you’re just starting your Kafka journey or are already deep in production, this session will give you practical insights into structuring topics, managing event evolution, and keeping your architecture understandable as it grows.

Transcript

  1. How Many Topics Is Too Many? Kafka Lessons from Order Management at Scale

    Hannes Bösch, CEO of Posedio. Apache Kafka Meetup, 17.11.2025
  2. Do it RIGHT. The Devil’s Cocktail

    • Democratically Evolving Distributed Software Architecture
    • Added Complexity with Cloud Native Infrastructure, Tools and Processes
    • Agile Waterfall
    • Large Scale Project
  3. Do it RIGHT. Conway’s Law & Brooks’s Law in Action

    • New product boundaries were aligned with existing organizational structures (i.e., current domains).
    • Approximately 100 external developers with varying skill levels were added to the teams.
  4. Do it RIGHT. Communication Complexity

    $\text{channels} = \frac{n(n-1)}{2}$, where $n$ is the number of people who need to communicate.
    Source: https://www.pmi.org/learning/library/overcoming-communications-complexity-ambiguity-projects-6631
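The formula n(n − 1)/2 grows quadratically, which is easy to underestimate. This small helper evaluates it for a few team sizes. Taking the roughly 100 added developers from the previous slide as n, and ignoring everyone already on the teams, already gives 4,950 potential pairwise channels.

```python
def communication_channels(n: int) -> int:
    """Pairwise communication channels among n participants: n(n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # n * (n - 1) is always even, so integer division is exact.
    return n * (n - 1) // 2


for team_size in (5, 12, 100):
    print(f"{team_size:>3} people -> {communication_channels(team_size):>4} channels")
```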
  5. Do it RIGHT. Missing Observer and Orchestrator

    • Difficult to diagnose and debug product errors.
    • Challenging to detect, alert on, and respond to issues in a timely manner.
    • Fixing faulty states or recovering broken processes is cumbersome and error-prone.
  6. Do it RIGHT. Too Many Communication Channels

    • Creating new topics for every new use case instead of reusing existing ones.
    • Designing topics tailored to specific consumer needs rather than domain semantics.
    • Loss of event ordering guarantees.
    • Race conditions in downstream services due to uncoordinated event processing.
    • Multiple external representations of the same domain event, leading to inconsistencies.
  7. Do it RIGHT. Incomplete Data

    Producer events do not contain the payload of the affected object, which leads to:
    • Tight coupling between services.
    • Race conditions introduced through improper coordination or concurrency handling.
    • Risk of an internal DDoS due to excessive or uncontrolled callback calls.
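The failure mode can be reproduced in a few lines. The following is a single-process simulation, and every name in it is hypothetical. A "thin" event carries only the order ID, so a consumer that lags behind the producer fetches the *current* state through a callback and never sees the intermediate state. An event that embeds the payload preserves every state change and needs no callback at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OrderService:
    """Hypothetical producer service that owns the current order state."""
    orders: Dict[str, dict] = field(default_factory=dict)
    callback_count: int = 0

    def get_order(self, order_id: str) -> dict:
        # Every thin event makes every consumer call back here: one request per event per consumer.
        self.callback_count += 1
        return dict(self.orders[order_id])


service = OrderService()
thin_events: List[dict] = []  # notification only: "order X changed"
fat_events: List[dict] = []   # event-carried state: snapshot of the order after the change


def update_order(order_id: str, status: str) -> None:
    service.orders[order_id] = {"id": order_id, "status": status}
    thin_events.append({"id": order_id})
    fat_events.append({"id": order_id, "payload": dict(service.orders[order_id])})


update_order("o-1", "CREATED")
update_order("o-1", "PAID")

# The consumer only gets to the first event after the second update has already happened.
seen_via_callback = [service.get_order(e["id"])["status"] for e in thin_events]
seen_via_payload = [e["payload"]["status"] for e in fat_events]
print(seen_via_callback)  # ['PAID', 'PAID']: the CREATED state is never observed
print(seen_via_payload)   # ['CREATED', 'PAID']
```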
  8. Do it RIGHT. Split Brain

    • Many-to-many (M:N) relationships between domain objects.
    • Multiple subscribers consuming the same domain event.
    • Essential validations of domain invariants are skipped or incomplete.
  9. Do it RIGHT. The Unhappy Path

    • Missing or incomplete error-handling strategy.
    • No mechanism in place to notify upstream components about flaky or invalid data.
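What slide 9 finds missing is a defined path for records that cannot be processed, plus a way to tell the upstream owner why. A common way to provide this is a dead letter queue, which slide 21 also lists as a reason to create a topic. The following is a minimal sketch under a few assumptions. Plain Python lists stand in for the source and DLQ topics. The retry policy, the `TransientError` class and the `dlq-reason` header name are hypothetical choices, not a Kafka API.

```python
import json
from typing import Callable, Dict, List, Tuple

# Simplified stand-in for a Kafka record: (value, headers).
Message = Tuple[bytes, Dict[str, str]]


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a downstream timeout)."""


def consume_with_dlq(messages: List[Message], handle: Callable[[bytes], None],
                     dlq: List[Message], max_attempts: int = 3) -> int:
    """Process messages; park records that cannot be processed in a dead letter queue."""
    processed = 0
    for value, headers in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handle(value)
                processed += 1
                break
            except TransientError as err:
                if attempt < max_attempts:
                    continue  # transient failure: try again
                reason = f"gave up after {attempt} attempts: {err}"
            except Exception as err:  # invalid data: retrying will not help
                reason = f"invalid record: {err}"
            # The reason header lets the owning (upstream) team see why the record was rejected.
            dlq.append((value, {**headers, "dlq-reason": reason}))
            break
    return processed


def handle_order(value: bytes) -> None:
    order = json.loads(value)  # raises a ValueError subclass on malformed JSON
    if "orderId" not in order:
        raise ValueError("orderId missing")


dead_letters: List[Message] = []
ok = consume_with_dlq(
    [(b'{"orderId": "o-1"}', {}), (b"not json", {}), (b"{}", {})],
    handle_order,
    dead_letters,
)
print(ok, [headers["dlq-reason"] for _, headers in dead_letters])
```

The sketch separates retriable from non-retriable failures. Retrying a malformed record only delays the inevitable, while a timeout may well succeed on the next attempt.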
  10. Do it RIGHT. Distinguish Events (or Documents) vs. Commands

    Event messages (or documents):
    • Reflect a change in the system
    • Schema owned by the producer
    • Producer upgrades first (evolves the schema)
    • Forward-compatible changes only
    Command messages:
    • RPC over Kafka
    • Schema owned by the consumer
    • Consumer upgrades first
    • Backward-compatible changes only
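The two compatibility directions can be spelled out with a toy checker. Its rule set is simplified and hypothetical, inspired by Avro schema resolution: readers skip unknown fields, and fields missing from the data need a default. Real Avro and Schema Registry checks cover more cases, such as type promotion and unions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FieldSpec:
    type_name: str
    has_default: bool = False


Schema = Dict[str, FieldSpec]


def can_read(reader: Schema, writer: Schema) -> bool:
    """Simplified Avro-like resolution: the reader ignores fields it does not know;
    every reader field must be present in the writer with the same type, or have a default."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name].type_name != spec.type_name:
                return False
        elif not spec.has_default:
            return False
    return True


def is_forward_compatible(new: Schema, old: Schema) -> bool:
    # Events: the producer upgrades first, so consumers still on `old` read data written with `new`.
    return can_read(reader=old, writer=new)


def is_backward_compatible(new: Schema, old: Schema) -> bool:
    # Commands: the consumer upgrades first, so it reads with `new` while data is still written with `old`.
    return can_read(reader=new, writer=old)


v1: Schema = {"orderId": FieldSpec("string"), "status": FieldSpec("string")}
v2_required: Schema = {**v1, "currency": FieldSpec("string")}  # new field, no default
v2_optional: Schema = {**v1, "currency": FieldSpec("string", has_default=True)}

print(is_forward_compatible(v2_required, v1))   # True: old consumers ignore the new field
print(is_backward_compatible(v2_required, v1))  # False: the new reader finds neither value nor default
print(is_backward_compatible(v2_optional, v1))  # True: the default fills the gap
```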
  11. Do it RIGHT. Enforce Schema on Write

    • A schema always exists: either explicitly defined on write or implicitly interpreted on read.
    • A Schema Registry is typically used when data is shared beyond the team level.
    • The Avro format is commonly supported by Schema Registry implementations.
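With a Schema Registry, the serializer typically enforces schema-on-write. A record that does not match the schema cannot be serialized, so it is never produced. The sketch below shows the same principle without Kafka or Avro. It uses a hypothetical dict-based schema and an in-memory list as the topic.

```python
from typing import Any, Dict, List

# Hypothetical schema for an order event: field name -> required Python type.
ORDER_SCHEMA: Dict[str, type] = {"orderId": str, "status": str, "amountCents": int}


class SchemaViolation(ValueError):
    pass


def validate(record: Dict[str, Any], schema: Dict[str, type]) -> None:
    missing = [name for name in schema if name not in record]
    if missing:
        raise SchemaViolation(f"missing fields: {missing}")
    for name, expected in schema.items():
        # Exact type match, so e.g. True is not accepted where an int is expected.
        if type(record[name]) is not expected:
            raise SchemaViolation(
                f"{name}: expected {expected.__name__}, got {type(record[name]).__name__}")


class ValidatingProducer:
    """In-memory stand-in for a producer that enforces the schema before writing."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = schema
        self.topic: List[Dict[str, Any]] = []

    def send(self, record: Dict[str, Any]) -> None:
        validate(record, self.schema)  # invalid records never reach the topic
        self.topic.append(record)


producer = ValidatingProducer(ORDER_SCHEMA)
producer.send({"orderId": "o-1", "status": "CREATED", "amountCents": 1999})
try:
    producer.send({"orderId": "o-2", "status": "CREATED", "amountCents": "19.99"})
except SchemaViolation as err:
    print("rejected:", err)  # rejected: amountCents: expected int, got str
print(len(producer.topic))  # 1
```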
  12. Do it RIGHT. One Topic per Aggregate

    • An aggregate defines the external representation of a domain object.
    • It serves as the smallest consistency boundary within the business domain.
    • Oversized aggregates often indicate a design issue.
    • Such design issues should not be addressed at the topic level.
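One topic per aggregate pairs naturally with using the aggregate ID as the record key. Kafka only guarantees ordering within a partition, and records with the same key land in the same partition as long as the partition count does not change. The sketch assumes a hypothetical six-partition orders topic. `zlib.crc32` stands in for Kafka's default partitioner, which hashes the serialized key with murmur2, so the concrete partition numbers would differ on a real cluster.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple

NUM_PARTITIONS = 6  # hypothetical partition count of an "orders" topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for Kafka's default partitioner (murmur2 on the serialized key):
    # any stable hash shows the idea that equal keys map to equal partitions.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events: List[Tuple[str, str]] = [
    ("o-1", "OrderCreated"), ("o-2", "OrderCreated"), ("o-1", "OrderPaid"),
    ("o-2", "OrderCancelled"), ("o-1", "OrderShipped"),
]

# Append each event to its partition's log, in production order.
partitions: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)
for order_id, event_type in events:
    partitions[partition_for(order_id)].append((order_id, event_type))


def history(order_id: str) -> List[str]:
    """Events of one order as a consumer of its partition would see them."""
    return [etype for key, etype in partitions[partition_for(order_id)] if key == order_id]


print(history("o-1"))  # ['OrderCreated', 'OrderPaid', 'OrderShipped']
print(history("o-2"))  # ['OrderCreated', 'OrderCancelled']
```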
  13. Do it RIGHT. Distributed State-Machine Theorem (Manage Distributed State)

    • The state of a domain object is managed exclusively by a single Domain Service.
    • Each Domain Service is responsible for controlling state transitions in a transactional manner.
    • Every Domain Service must ensure the agreed-upon delivery guarantees.
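A single owning service can be sketched as a guarded state machine. The service holds the state, rejects transitions that the transition table does not allow, and records the outgoing event together with the state change. The states and transitions are hypothetical. The "outbox" list imitates the transactional outbox pattern, which is one common way to tie a state change to its event. The talk does not prescribe it.

```python
from enum import Enum
from typing import Dict, List, Set


class OrderState(Enum):
    CREATED = "CREATED"
    PAID = "PAID"
    SHIPPED = "SHIPPED"
    CANCELLED = "CANCELLED"


# Hypothetical transition table; the real one comes from the domain.
ALLOWED: Dict[OrderState, Set[OrderState]] = {
    OrderState.CREATED: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),
    OrderState.CANCELLED: set(),
}


class InvalidTransition(Exception):
    pass


class OrderDomainService:
    """The only component allowed to change an order's state."""

    def __init__(self) -> None:
        self._states: Dict[str, OrderState] = {}
        # Events waiting to be published; in a real service this would be an outbox table
        # written in the same database transaction as the state change.
        self.outbox: List[dict] = []

    def create(self, order_id: str) -> None:
        if order_id in self._states:
            raise InvalidTransition(f"{order_id} already exists")
        self._record(order_id, OrderState.CREATED)

    def transition(self, order_id: str, target: OrderState) -> None:
        current = self._states[order_id]
        if target not in ALLOWED[current]:
            raise InvalidTransition(f"{order_id}: {current.value} -> {target.value} not allowed")
        self._record(order_id, target)

    def _record(self, order_id: str, state: OrderState) -> None:
        # State change and outgoing event are recorded together (atomically in a real system).
        self._states[order_id] = state
        self.outbox.append({"orderId": order_id, "state": state.value})


svc = OrderDomainService()
svc.create("o-1")
svc.transition("o-1", OrderState.PAID)
try:
    svc.transition("o-1", OrderState.CREATED)  # rejected by the owning service
except InvalidTransition as err:
    print(err)  # o-1: PAID -> CREATED not allowed
print([event["state"] for event in svc.outbox])  # ['CREATED', 'PAID']
```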
  14. Do it RIGHT. Enveloping

    • Multiple schemas per topic
    • EventType encoded in the RecordType
    • Object embedded as payload in the event
    • Guaranteed ordering (with the same partition key)
    • Efficient filtering for consumers
    • Easy to introduce new states and customize the content of state changes
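An envelope can be sketched as a small wrapper that carries the event type, the aggregate ID used as the key, and the full object as payload. In this hypothetical sketch, JSON stands in for Avro. On the slide the event type is encoded in the Avro record type, whereas here it is an explicit `event_type` field.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class Envelope:
    event_type: str          # which state change happened, e.g. "OrderPaid"
    aggregate_id: str        # also used as the record key, so one order's events stay ordered
    payload: Dict[str, Any]  # full snapshot of the order after the change


def serialize(envelope: Envelope) -> bytes:
    return json.dumps(asdict(envelope)).encode("utf-8")


def deserialize(raw: bytes) -> Envelope:
    return Envelope(**json.loads(raw))


# One "orders" topic carrying several event types for the same aggregate.
topic: List[bytes] = [
    serialize(Envelope("OrderCreated", "o-1", {"orderId": "o-1", "status": "CREATED"})),
    serialize(Envelope("OrderPaid", "o-1", {"orderId": "o-1", "status": "PAID"})),
    serialize(Envelope("OrderShipped", "o-1", {"orderId": "o-1", "status": "SHIPPED"})),
]

# A payment-focused consumer keeps only the event types it cares about.
paid = [env for env in map(deserialize, topic) if env.event_type == "OrderPaid"]
print(paid[0].payload)  # {'orderId': 'o-1', 'status': 'PAID'}
```

Adding a new state means adding a new event type, with no new topic. Consumers that do not know the type can simply skip it.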
  15. Do it RIGHT. Mandatory Technical Identifiers

    • Introduce header fields aligned with cloud-native standards
    • For efficient filtering, routing and observability
    • Make decisions before deserializing the payload
    • Producers MUST ensure that source + id is unique
    CloudEvents Standard with Kafka Protocol Bindings: https://github.com/cloudevents/spec/blob/main/cloudevents/bindings/kafka-protocol-binding.md
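The Kafka protocol binding has a binary content mode. In that mode, CloudEvents attributes travel as Kafka headers prefixed with `ce_` (for example `ce_id`, `ce_source`, `ce_type`, `ce_specversion`), and `datacontenttype` maps to the `content-type` header. The sketch below builds such headers as `(name, bytes)` pairs, the shape Kafka clients commonly accept. It also shows a consumer that routes and deduplicates on headers alone, without touching the payload. The event type names, the source and the IDs are hypothetical.

```python
import uuid
from typing import List, Optional, Set, Tuple

Headers = List[Tuple[str, bytes]]


def cloudevent_headers(event_type: str, source: str, content_type: str,
                       event_id: Optional[str] = None) -> Headers:
    """CloudEvents binary content mode: attributes as 'ce_'-prefixed Kafka headers."""
    attributes = {
        "ce_specversion": "1.0",
        "ce_id": event_id or str(uuid.uuid4()),  # source + id must be unique per event
        "ce_source": source,
        "ce_type": event_type,
        "content-type": content_type,  # datacontenttype maps to the plain content-type header
    }
    return [(name, value.encode("utf-8")) for name, value in attributes.items()]


def header(headers: Headers, name: str) -> Optional[str]:
    for key, value in headers:
        if key == name:
            return value.decode("utf-8")
    return None


class Deduplicator:
    """Relies on the producer guarantee that (source, id) identifies exactly one event."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[Optional[str], Optional[str]]] = set()

    def is_duplicate(self, headers: Headers) -> bool:
        key = (header(headers, "ce_source"), header(headers, "ce_id"))
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


paid = cloudevent_headers("com.example.order.paid", "/order-service", "application/json", "42")
shipped = cloudevent_headers("com.example.order.shipped", "/order-service", "application/json", "43")
records = [(paid, b'{"orderId": "o-1"}'), (shipped, b'{"orderId": "o-1"}'),
           (paid, b'{"orderId": "o-1"}')]  # the last record is a redelivery

dedup = Deduplicator()
# Decide on headers alone; payloads of irrelevant or duplicate records are never deserialized.
relevant = [payload for hdrs, payload in records
            if header(hdrs, "ce_type") == "com.example.order.paid" and not dedup.is_duplicate(hdrs)]
print(len(relevant))  # 1
```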
  16. Do it RIGHT. Consistency of External Representation of Domain Objects

    The external representation of an object should be consistent across interfaces: a request through the HTTP/REST API should yield the same result as a request made via the message bus.
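One straightforward way to keep the HTTP/REST and message-bus representations from drifting apart is to route both through a single mapping function. The following is a minimal sketch with a hypothetical `Order` model. The field names and the `GET /orders/{id}` endpoint mentioned in a comment are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Order:
    order_id: str
    status: str
    amount_cents: int
    internal_notes: str  # internal only: must not appear in any external representation


def to_external(order: Order) -> Dict[str, Any]:
    """The single mapping from the internal model to the external representation."""
    return {"orderId": order.order_id, "status": order.status, "amountCents": order.amount_cents}


def rest_response_body(order: Order) -> str:
    # What a hypothetical GET /orders/{id} handler would return.
    return json.dumps(to_external(order))


def event_payload(order: Order) -> bytes:
    # What the order event carries as its payload.
    return json.dumps(to_external(order)).encode("utf-8")


order = Order("o-1", "PAID", 1999, "call customer before shipping")
assert json.loads(rest_response_body(order)) == json.loads(event_payload(order))
print(rest_response_body(order))
```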
  17. Do it RIGHT. Track signals and alert on deviations.

    • Message validation errors on clients
    • Consumer lag (thresholds on consumer offsets)
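For each partition, consumer lag is the log-end offset minus the consumer group's committed offset. This is the same figure that the `kafka-consumer-groups.sh --describe` tool reports as LAG. The sketch below alerts on a lag threshold. Its offsets are simply hard-coded. The threshold value and the treatment of partitions without a committed offset are assumptions.

```python
from typing import Dict, List


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: log-end offset minus the consumer group's committed offset."""
    # Assumption: a partition without a committed offset counts as lagging from offset 0.
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def lag_alerts(lag: Dict[int, int], threshold: int) -> List[str]:
    """One alert message per partition whose lag exceeds the threshold."""
    return [f"partition {p}: lag {value} > {threshold}"
            for p, value in sorted(lag.items()) if value > threshold]


end_offsets = {0: 1500, 1: 980, 2: 4200}  # hypothetical log-end offsets
committed = {0: 1490, 1: 980, 2: 1200}    # hypothetical committed offsets
lag = consumer_lag(end_offsets, committed)
print(lag)                              # {0: 10, 1: 0, 2: 3000}
print(lag_alerts(lag, threshold=1000))  # ['partition 2: lag 3000 > 1000']
```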
  18. Do it RIGHT. Awareness: Thinnest Viable Communication Plan (TVCP)

    Wiki: • Owner • Topic • Schemas • Consumers • Mailing List • Release History
  19. Do it RIGHT. And more basic rules…

    • Explicitly indicate when cross-product communication is intended (e.g., through topic naming conventions).
    • Define a retention policy for GDPR-relevant data (preferably using non-compacted topics).
    • Specify the versioning strategy for the topic itself (and thereby, implicitly, for the schema).
    • Configure idempotency settings on the producer.
    • Ensure the number of partitions does not exceed the maximum number of consumers in the largest consumer group.
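Two of these rules lend themselves to automated checks, for example in CI for client and topic configuration. `enable.idempotence`, `acks` and `max.in.flight.requests.per.connection` are standard Kafka producer settings. Idempotence requires `acks=all` and at most five in-flight requests per connection. The check deliberately demands that idempotence be set explicitly, even though newer Java clients enable it by default. The partition check encodes the slide's rule as stated. The consumer group names are hypothetical.

```python
from typing import Dict, List


def check_producer_config(config: Dict[str, str]) -> List[str]:
    """Require explicit idempotence settings instead of relying on client defaults."""
    problems = []
    if config.get("enable.idempotence") != "true":
        problems.append("set enable.idempotence=true explicitly")
    if config.get("acks") not in ("all", "-1"):  # "-1" is equivalent to "all"
        problems.append("idempotence requires acks=all")
    if int(config.get("max.in.flight.requests.per.connection", "5")) > 5:
        problems.append("idempotence requires max.in.flight.requests.per.connection <= 5")
    return problems


def check_partition_count(partitions: int, group_sizes: Dict[str, int]) -> List[str]:
    """Slide rule: partitions should not exceed the consumers in the largest consumer group."""
    largest = max(group_sizes.values(), default=0)
    if partitions > largest:
        return [f"{partitions} partitions exceed the largest consumer group ({largest} consumers)"]
    return []


print(check_producer_config({"enable.idempotence": "true", "acks": "all"}))  # []
print(check_producer_config({"acks": "1"}))
print(check_partition_count(12, {"billing": 4, "shipping": 6}))
```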
  20. Do it RIGHT.

    Rome can be reached by many paths, but the bold and well-paved ones call to us the most…
  21. Do it RIGHT. When to create topics

    • One topic per domain aggregate
    • When handling confidential data (e.g., communication with external products)
    • When GDPR concerns apply (e.g., compacted topics)
    • To separate tenants, but use the same schema for all tenants
    • For simple, intra-team data transformations (e.g., using Kafka Streams)
    • For Dead Letter Queues