Eventing with Apache Kafka
Having data is better than needing data
Ansgar Brauner / @a_brauner
Sebastian Gauder / @rattakresch

Our history

Details: REWE GROUP
● Turnover: >57 bn
● History: >90 years
● Employees: >330.000
● Industries: Food Retail, Tourism, DIY
● Shops: >15.000


Our history
Year    # Services    # Dev Teams
2014    1             2
2015    40            15
2016    100           28
2017    200           46
2018    270           48

Scale at service level
Our 48 teams are developing and running more than 200 services. Imagine all of them talking to each other.

Problems in HTTP/REST-only architectures
(Diagram: Gateway, µService 1 to µService 5)
Things that help:
● Timeouts
● Fallbacks
● Circuit Breakers
● Eventing

How Eventing helps us to reduce synchronous dependencies in distributed systems

What is Eventing?

What is the goal of Eventing?
● Enable services to provide themselves with data asynchronously before it is needed in a request
● A kind of database replication
● More performance & stability
(Diagram: Service A with data a, Service B with data b, Service C with data c, and replicated copies a’ and c’.)

What is a (domain-) event?
Representation of something that happened in the domain (Eric Evans)
● An event concerns:
  ○ one domain entity (e.g. “customer”, “shopping cart”, “order”, “delivery area”)
  ○ and one state change that happened to that entity (e.g. “customer registered”, “item added to shopping cart”, “order fulfilled”, “delivery area expanded”)
● Event = functional object to describe domain changes
● Event = vehicle for database replication

Technical event
● ID: Unique identifier
● Key: Which entity is affected?
● Version: Which version of this entity is this?
● Time: When did the event occur?
● Type: What kind of action happened?
● Payload: What are the details?
  ○ Entire entity - not deltas!

    {
      "id" : "4ea55fbb7c887",
      "key" : "7ebc8eeb1f2f45",
      "version" : 1,
      "time" : "2018-02-22T17:05:55Z",
      "type" : "customer-registered",
      "payload" : {
        "id" : "7ebc8eeb1f2f45",
        "first_name" : "Sebastian",
        "last_name" : "Gauder",
        "e-mail" : "gaudi(at)"
      }
    }
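The event structure above could be modelled in Java roughly as follows. This is a minimal sketch, not the authors' implementation: the names `EventEnvelope` and `EventEnvelopeDemo` are assumptions, and the payload is simplified to a map. The truncated e-mail value from the slide is left out.

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical envelope type: the field names follow the slide, the Java names are assumptions.
record EventEnvelope(String id, String key, long version, Instant time,
                     String type, Map<String, Object> payload) {
    EventEnvelope {
        payload = Map.copyOf(payload); // defensive copy: an event is an immutable fact
    }
}

class EventEnvelopeDemo {
    // Builds the "customer-registered" sample event from the slide.
    static EventEnvelope customerRegistered() {
        return new EventEnvelope(
                "4ea55fbb7c887",
                "7ebc8eeb1f2f45",
                1,
                Instant.parse("2018-02-22T17:05:55Z"),
                "customer-registered",
                Map.of("id", "7ebc8eeb1f2f45",
                       "first_name", "Sebastian",
                       "last_name", "Gauder"));
    }

    public static void main(String[] args) {
        EventEnvelope event = customerRegistered();
        System.out.println(event.type() + " for entity " + event.key()
                + ", version " + event.version());
    }
}
```

Note that the event key equals the id of the entity in the payload, which is what later allows one entity to be tracked under one key.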

Sample: Customer data
(Diagram: the Customer Data Service publishes customer data to the customer topic; the Invoice Service and the Loyalty Service each keep their own copy, customer data’ and customer data’’.)

Event payload on the customer topic:

    "payload": {
      "customer_uuid" : "876ef6e5",
      "version" : 3,
      "name" : "Peter Smith",
      "loyalty_id" : "477183877",
      "invoice_address" : "752 High Street",
      "delivery_address" : "67 Liverpool Street"
    }

Copy with invoice data:

    "customer_uuid" : "876ef6e5",
    "version" : 3,
    "name" : "Peter Smith",
    "invoice_address" : "752 High Street"

Copy with loyalty data:

    "customer_uuid" : "876ef6e5",
    "version" : 3,
    "name" : "Peter Smith",
    "loyalty_id" : "477183877"
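How a subscriber reduces the full payload to its own copy might be sketched like this. The class `CustomerProjection` and its `project` helper are hypothetical names for illustration; the field lists mirror the two reduced copies on the slide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CustomerProjection {
    // Field lists mirror the two reduced copies shown on the slide.
    static final List<String> INVOICE_FIELDS =
            List.of("customer_uuid", "version", "name", "invoice_address");
    static final List<String> LOYALTY_FIELDS =
            List.of("customer_uuid", "version", "name", "loyalty_id");

    // Hypothetical helper: copies only the fields a subscribing service needs.
    static Map<String, Object> project(Map<String, Object> payload, List<String> wanted) {
        Map<String, Object> view = new LinkedHashMap<>();
        for (String field : wanted) {
            if (payload.containsKey(field)) {
                view.put(field, payload.get(field));
            }
        }
        return view;
    }

    // The full payload as published on the customer topic.
    static Map<String, Object> samplePayload() {
        return Map.of(
                "customer_uuid", "876ef6e5",
                "version", 3,
                "name", "Peter Smith",
                "loyalty_id", "477183877",
                "invoice_address", "752 High Street",
                "delivery_address", "67 Liverpool Street");
    }

    public static void main(String[] args) {
        System.out.println("invoice view: " + project(samplePayload(), INVOICE_FIELDS));
        System.out.println("loyalty view: " + project(samplePayload(), LOYALTY_FIELDS));
    }
}
```

Each service decides on its own which fields to keep, so the publisher does not need to know its consumers.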

Where are the Pitfalls?

Requirement: Events must be self contained
● The event must contain all data about the state change.
  ○ No further synchronous call must be necessary to receive additional data
  ○ No further event must be processed to reconstruct entity state
  ○ Data must be in a consistent state after every event consumption -> transactional completeness
● But as small and focused as possible

Counter-example: the payload only links to the entity details, so a consumer would need a synchronous call to get them.

    ...
    "type" : "entity-updated",
    "payload" : {
      "version" : 3,
      "entity-id" : "ab56ea712",
      "entity-details" : "https://entity-service/entities/ab56ea712"
    }

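Sample: Transactional completeness (events keyed by store)
Before: Store 1 serves zip codes 50676 and 51063

    "key": "store_1", "payload": { "zips": ["50676", "51063"] }

After: Store 1 serves zip code 50676, Store 2 serves zip code 51063

    "key": "store_1", "payload": { "zips": ["50676"] }
    "key": "store_2", "payload": { "zips": ["51063"] }

Moving one zip code takes two events, so a consumer is in an inconsistent state between them.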

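Sample: Transactional completeness (events keyed by zip code)
Before: Store 1 serves zip codes 50676 and 51063

    "key": "50676", "payload": { "store": "store_1" }
    "key": "51063", "payload": { "store": "store_1" }

After: Store 1 serves zip code 50676, Store 2 serves zip code 51063

    "key": "51063", "payload": { "store": "store_2" }

Moving one zip code takes a single event, so every consumed event leaves a consistent state.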
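A consumer for zip-keyed events could look like the following sketch. `DeliveryAreaView` is a hypothetical class for illustration; the point is that each event replaces exactly one map entry, so no intermediate state exists in which a zip code has no store or two stores.

```java
import java.util.Map;
import java.util.TreeMap;

class DeliveryAreaView {
    // zip code -> store. One entry per zip code, so a zip can never map to two stores.
    private final Map<String, String> storeByZip = new TreeMap<>();

    // Applies one zip-keyed event: key = zip code, payload = the store that now serves it.
    void apply(String zip, String store) {
        storeByZip.put(zip, store);
    }

    String storeFor(String zip) {
        return storeByZip.get(zip);
    }

    public static void main(String[] args) {
        DeliveryAreaView view = new DeliveryAreaView();
        view.apply("50676", "store_1");
        view.apply("51063", "store_1");
        System.out.println("before: " + view.storeByZip);
        view.apply("51063", "store_2"); // one event, one consistent transition
        System.out.println("after:  " + view.storeByZip);
    }
}
```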

Requirement: Only true facts must be published/committed
Publishing Service:
[1] receive
[2] store (DB)
[3] publish (topic)
Subscribing Service:
[4] consume (topic)
[5] store (DB)
[6] commit

Requirement: Events are associated with a root entity
● An event must contain the whole aggregate
  ○ e.g. a shopping cart event contains all line items and their amounts
  ○ if the amount of a line item changes, a new shopping cart event must be published
  ○ Necessary for Log Compaction!
(Diagram: Shopping cart with * Line items)

    "type" : "shopping_cart-updated"
    …
    "payload" : {
      "shopping_cart_id" : "1749040",
      "customer_uuid" : "6fe700ab8",
      "version" : 5,
      "line_items" : [
        {"id" : "193688", "name" : "milk", "amount" : 3},
        {"id" : "982367", "name" : "banana", "amount" : 5},
        {"id" : "729993", "name" : "pizza spinaci", "amount" : 1},
        ...
      ]
    }

Reasons to not use eventing
● Write operations - Eventing only substitutes GET operations
● Communication with clients
● Time critical data flow -> Eventual consistency!

Sample: Eventual consistency

How to implement

We chose Apache Kafka

Apache Kafka
● Open-source stream processing platform written in Scala and Java
● High-throughput, low-latency platform for real-time data streams
● Originally developed at LinkedIn, open sourced in 2011
● Offers 4 APIs: Producer, Consumer, Streams, Connect
● We use Apache Kafka in a pub-sub manner: most of our services use the Producer and Consumer APIs
“Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.”

What to send as Messages
● In general every resource a service owns should be published
● Every state change in the domain is published as an event
● Those events are sent as messages to topics
● Topics can be created when the first event arrives

    {
      "id" : "4ea55fbb7c887",
      "key" : "7ebc8eeb1f2f45",
      "version" : 7,
      "time" : "2018-02-22T17:05:55Z",
      "type" : "customer-registered",
      "payload" : {
        "id" : "7ebc8eeb1f2f45",
        "version" : 7,
        "first_name" : "Sebastian",
        "last_name" : "Gauder",
        "e-mail" : "gaudi(at)"
      }
    }

Topics and their organization
● Domain events are sent to topics
● Topics can have any number of subscribers
● Topics are split into partitions; the order is only ensured inside a partition
● Each record has a sequential ID assigned (its offset within the partition)
● Partitions are distributed over the brokers in the cluster
● Partitions can have a configurable number of replicas

That data could grow: log compaction
● An endless stream of events is stored in the cluster
● With log compaction, only the most recent version of an entity is kept
● No point-in-time access to data
● Choose the key for your entity wisely and always publish updates of a single entity under this key
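The effect of log compaction on a topic can be illustrated with a small in-memory simulation. This is only a sketch of the outcome, not of how Kafka implements compaction (which runs in the background on log segments); `CompactionSketch`, `Message` and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompactionSketch {
    // Simplified message: key identifies the entity, value is its full state.
    record Message(String key, int version, String value) {}

    // Keeps only the last message per key; survivors stay in log order.
    static List<Message> compact(List<Message> log) {
        Map<String, Message> latest = new LinkedHashMap<>();
        for (Message m : log) {
            latest.remove(m.key()); // re-insert so the entry moves to its latest position
            latest.put(m.key(), m);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Message> log = List.of(
                new Message("customer-1", 1, "Peter"),
                new Message("customer-2", 1, "Anna"),
                new Message("customer-1", 2, "Peter Smith"));
        compact(log).forEach(System.out::println);
    }
}
```

The sketch also shows why events must carry the entire entity: after compaction, version 1 of `customer-1` is gone, so a consumer starting from the beginning can only rebuild state from the latest message.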

Producers
● Every service which owns a resource should publish those resource entities to a topic
● Use only one producer, or make sure there are no issues with the order of events
● To enable log compaction, use a partitioner that ensures events with the same key are always sent to the same partition
● All producers should be able to republish all entities on request
(Diagram: producers A, B, C and D publishing to a topic)
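The idea of a key-based partitioner can be sketched as below. This is an illustration only: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this sketch uses `String.hashCode` as a stand-in, and `KeyPartitionerSketch` is a hypothetical name.

```java
class KeyPartitionerSketch {
    // Maps a key to a partition. The same key always yields the same partition
    // as long as the number of partitions does not change.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod avoids negative results for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String key = "7ebc8eeb1f2f45";
        System.out.println("first send:  partition " + partitionFor(key, 6));
        System.out.println("second send: partition " + partitionFor(key, 6));
    }
}
```

Because all events of one entity land in one partition, their order is preserved and compaction can discard older versions of that entity.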

Producers - Best Practice
(Diagram: Producer, Entity Repo, Event Repo, Published Version Repo and Topic, with two transactions TX_1 and TX_2)
● The producer has to make sure that the message is delivered and the write is committed
● Therefore we store the raw event in a database to enable retries until it’s committed to the cluster
● Scheduled jobs can take care of retries and cleanup
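The store-then-publish approach might be sketched in memory as follows. All names (`OutboxSketch`, `Broker`, `save`, `publishNext`) are assumptions for illustration, the maps stand in for database tables, and the split into two steps is one reading of TX_1 and TX_2 in the diagram.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class OutboxSketch {
    record Event(String key, long version, String payload) {}

    // Stand-in for the Kafka producer; send returns only after the broker acknowledged.
    interface Broker {
        void send(Event event) throws Exception;
    }

    final Map<String, String> entityRepo = new HashMap<>();
    final Deque<Event> eventRepo = new ArrayDeque<>();
    final Map<String, Long> publishedVersionRepo = new HashMap<>();

    // Step 1 (TX_1 in this sketch): entity and raw event are stored together.
    void save(String key, long version, String payload) {
        entityRepo.put(key, payload);
        eventRepo.addLast(new Event(key, version, payload));
    }

    // Step 2 (TX_2 in this sketch), e.g. called by a scheduled job.
    // Returns true if the oldest stored event was handled, false if it must be retried.
    boolean publishNext(Broker broker) {
        Event event = eventRepo.peekFirst();
        if (event == null) {
            return false;
        }
        long lastPublished = publishedVersionRepo.getOrDefault(event.key(), 0L);
        try {
            if (lastPublished < event.version()) {
                broker.send(event);
                publishedVersionRepo.put(event.key(), event.version());
            }
            eventRepo.removeFirst(); // delete only after a confirmed send (or if already sent)
            return true;
        } catch (Exception ex) {
            return false; // event stays stored, the next run retries
        }
    }
}
```

If the broker is unavailable, the event simply remains in the event repo, so nothing is lost and nothing is published that was not stored first.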

Consumers
● Every service can consume all available data and should consume all data it needs to fulfill a request - having data at request time is better than trying to get it from another service
● The consumer has to process events idempotently. An event could be consumed more than once, since the infrastructure ensures at-least-once delivery
● Consumers have to take care of deployment specifics such as blue/green
● Consumers should be able to re-consume from the beginning, for instance when more data is needed
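Idempotent processing under at-least-once delivery can be sketched with a per-key version check. `IdempotentConsumerSketch` is a hypothetical class; skipping events whose version is not newer than the last processed one is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentConsumerSketch {
    record Event(String key, long version, String name) {}

    final Map<String, String> nameByKey = new HashMap<>();      // the service's own copy
    final Map<String, Long> processedVersion = new HashMap<>(); // last version seen per key
    int applied = 0;

    // Returns true if the event changed local state, false if it was a duplicate or stale.
    boolean consume(Event event) {
        long last = processedVersion.getOrDefault(event.key(), 0L);
        if (last >= event.version()) {
            return false; // already processed: redelivery has no effect
        }
        nameByKey.put(event.key(), event.name()); // upsert of the full entity state
        processedVersion.put(event.key(), event.version());
        applied++;
        return true;
    }
}
```

Delivering version 1 twice, then version 2, then version 1 again leaves the local copy at version 2 with exactly two state changes applied.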

Consumers - Best Practice
● The consumer is responsible for a manual commit only after successful processing of the event. Successful can mean:
  ○ Needed data from the event is saved in the service’s data store
  ○ The event can’t be processed and is stored in a private error queue / table
(Diagram: Consumer, Topic, Entity Repo, Error Repo and Processed Version Repo)
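The commit rule above can be sketched without any Kafka dependency. `ManualCommitSketch` and its `poll` method are hypothetical; the list stands in for one partition, `committedOffset` for the committed consumer offset, and the three states are borrowed from the `EventProcessingState` values used in the source code slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ManualCommitSketch {
    enum State { SUCCESS, UNEXPECTED_ERROR, TEMPORARY_ERROR }

    final List<String> errorTable = new ArrayList<>(); // private store for unprocessable events
    int committedOffset = 0;                           // next offset to read after a restart

    // Processes records from the committed offset on and stops at the first temporary error.
    void poll(List<String> partition, Function<String, State> processor) {
        while (committedOffset < partition.size()) {
            String record = partition.get(committedOffset);
            State state = processor.apply(record);
            if (state == State.TEMPORARY_ERROR) {
                return; // no commit: the record is delivered again on the next poll
            }
            if (state == State.UNEXPECTED_ERROR) {
                errorTable.add(record); // park it so it does not block the partition
            }
            committedOffset++; // commit only after success or after parking the record
        }
    }
}
```

A temporary failure (for example an unavailable database) therefore blocks the partition until it recovers, while a record that can never be processed is parked and skipped.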

Source code

    @MappedSuperclass
    public abstract class DomainEvent<P extends EventPayload> {
        @Id
        private String id;
        private String key;
        @Convert(converter = ZonedDateTimeConverter.class)
        private ZonedDateTime time;
        private String type;
        @Embedded
        private P payload;
    }

    @Entity
    public class ProductEvent extends DomainEvent<ProductPayload> { }

    @Embeddable
    public class EventPayload {
        @Version
        private Long version;
    }

    public class ProductPayload extends EventPayload {
        @NotNull
        private String productId;
        @NotNull
        private String name;
        private String vendor;
        @NotNull
        private String price;
        @Column(length = 2000)
        private String description;
        @NotNull
        private String productNumber;
        private String image;
    }

Publish
- already sent?
- pass event to publisher
- update version repo
- delete event from repo

    @Component
    public class DomainEventPublisher<P extends EventPayload, E extends DomainEvent<P>> {
        private final LastPublishedVersionRepository lastPublishedVersionRepository;
        private final DomainEventRepository<P, E> eventRepository;
        private final KafkaPublisher<P, E> eventPublisher;

        @Inject
        public DomainEventPublisher(...) { ... }

        @Transactional
        public void process(final String eventId) {
            eventRepository.findById(eventId).ifPresent(e -> sendEvent(e));
        }

        @Transactional
        public void processNext() {
            sendEvent(eventRepository.findFirstByTimeInSmallestVersion());
        }

        private void sendEvent(final E event) {
            if (event == null) {
                return;
            }
            final String lastPublishedVersionId = buildLastPublishedVersionId(event);
            obtainLastPublishedVersion(lastPublishedVersionId).ifPresent(v -> {
                try {
                    if (v.getVersion() < event.getVersion()) {
                        // need to block here so that following statements are executed inside transaction
                        SendResult<String, String> sendResult = eventPublisher.publish(event).get(1, TimeUnit.SECONDS);
                        LOG.info("published event to {}:{} at {}", ...);
                        v.setVersion(event.getVersion());
                        lastPublishedVersionRepository.save(v);
                    }
                    eventRepository.delete(event);
                } catch (final Exception ex) {
                    LOG.error("error publishing event with id [{}] due to {}", event.getId(), ex.getMessage(), ex);
                }
            });
        }
    }

Publish
- publish event to Kafka

    @Component
    public class KafkaPublisher<P extends EventPayload, E extends DomainEvent<P>> {
        private static final Logger LOGGER = LoggerFactory.getLogger(KafkaPublisher.class);

        private final KafkaTemplate<String, String> kafkaTemplate;
        private final ObjectMapper objectMapper;
        private final String topic;

        @Inject
        public KafkaPublisher(final KafkaTemplate<String, String> kafkaTemplate,
                              final ObjectMapper objectMapper,
                              @Value("${eventing.topic.product}") final String topic) {
            this.kafkaTemplate = kafkaTemplate;
            this.objectMapper = objectMapper;
            this.topic = topic;
        }

        public ListenableFuture<SendResult<String, String>> publish(final E event) {
            LOGGER.info("publishing event {} to topic {}", event.getId(), topic);
            // the event key becomes the message key, so all versions of an entity share a partition
            return kafkaTemplate.send(topic, event.getKey(), toEventMessage(event));
        }

        private String toEventMessage(final E event) {
            try {
                return objectMapper.writeValueAsString(event);
            } catch (final JsonProcessingException e) {
                LOGGER.error("Could not serialize event with id {}", event.getId(), e);
                return "";
            }
        }
    }

Consume
- retrieve event from topic
- pass to specific handler

    @Component
    public class ProductEventConsumer extends AbstractKafkaConsumer {

        @Inject
        protected ProductEventConsumer(ProductEventProcessor messageProcessor,
                                       UnprocessableEventService unprocessableEventService) {
            // the listed exception types are treated as temporary errors
            super(messageProcessor, unprocessableEventService,
                    ImmutableSet.of(UncategorizedDataAccessException.class,
                            TransientDataAccessException.class,
                            CannotCreateTransactionException.class));
        }

        @Retryable(
                maxAttempts = Integer.MAX_VALUE,
                backoff = @Backoff(delay = 60000, multiplier = 2),
                value = {Exception.class})
        @KafkaListener(topics = "${productqueue.topic_name}")
        public void listen(final ConsumerRecord<String, String> consumerRecord, final Acknowledgment ack) {
            super.handleConsumerRecord(consumerRecord, ack);
        }
    }

Consume
- pass event to processor
- handle errors
- acknowledge (if appropriate)

    public abstract class AbstractKafkaConsumer {
        private static final Logger LOG = LoggerFactory.getLogger(AbstractKafkaConsumer.class);

        private final boolean payloadSensitive;
        private final DomainEventProcessor domainEventProcessor;
        private final UnprocessableEventService unprocessableEventService;
        private final Set<Class<? extends RuntimeException>> temporaryExceptions;

        protected AbstractKafkaConsumer( … ) { … }

        protected void handleConsumerRecord(final ConsumerRecord<String, String> consumerRecord, final Acknowledgment ack) {
            LOG.info("Received {}", … );
            final EventProcessingState state = processAndMapExceptionsToState(consumerRecord);
            if (EventProcessingState.UNEXPECTED_ERROR == state) {
                // park the event so it does not block the topic
                unprocessableEventService.save(new UnprocessedEventEntity(consumerRecord));
            } else if (EventProcessingState.TEMPORARY_ERROR == state) {
                // no acknowledge: the record is retried
                throw new TemporaryKafkaProcessingError("Message processing failed temporarily");
            }
            ack.acknowledge();
        }

        private EventProcessingState processAndMapExceptionsToState(final ConsumerRecord<String, String> consumerRecord) {
            try {
                return domainEventProcessor.processConsumerRecord(consumerRecord);
            } catch (final RuntimeException e) {
                if (temporaryExceptions.stream().anyMatch(temporaryException -> temporaryException.isInstance(e))) {
                    LOG.error("Message processing failed temporarily for {}", … , e);
                    return EventProcessingState.TEMPORARY_ERROR;
                }
                LOG.error("Message processing failed unexpectedly for {}", … , e);
                return EventProcessingState.UNEXPECTED_ERROR;
            }
        }
    }

Consume
- parse event message
- check if version is already known
- pass event object to product processor
- update version repo

    public abstract class AbstractDomainEventProcessor<P extends EventPayload, E extends DomainEvent<P>> implements DomainEventProcessor {
        protected final EventParser eventParser;
        private final Class<E> eventType;
        private final ConsumerTopicConfig topicConfig;
        private final ProcessedEventService processedEventService;

        public AbstractDomainEventProcessor( … ) { … }

        @Transactional
        @Override
        public EventProcessingState processConsumerRecord(final ConsumerRecord<String, String> consumerRecord) {
            try {
                final E eventMessage = eventParser.parseMessage(consumerRecord.value(), eventType);
                final long version = eventMessage.getVersion();
                final String key = eventMessage.getKey();
                final String topic = consumerRecord.topic();
                if (isSkippable(topic, key, version)) {
                    LOG.info("Skipping old {} message with key {} and version {}", topic, key, version);
                    return EventProcessingState.SUCCESS;
                }
                final EventProcessingState state = processEvent(eventMessage);
                if (state.isFinalState()) {
                    processedEventService.updateLastProcessedVersion(topic, key, version);
                }
                return state;
            } catch (final MessageProcessingException e) {
                LOG.warn("Failed to create valid {} object from {}", … , e);
                return e.getState();
            }
        }

        protected abstract EventProcessingState processEvent(E domainEvent);

        // an event is skipped if a newer version of the same key was already processed
        private boolean isSkippable(final String topic, final String key, final long version) {
            return processedEventService.getLastProcessedVersion(topic, key) > version;
        }
    }

Consume
- extract entity from event
- store entity

    @Component
    public class ProductEventProcessor extends AbstractDomainEventProcessor<ProductPayload, ProductEvent> {
        private static final Logger LOG = LoggerFactory.getLogger(ProductEventProcessor.class);

        private final JpaProductRepository repository;

        @Inject
        public ProductEventProcessor( … ) {
            super(ProductEvent.class, productTopicConfig, eventParser, processedEventService);
            this.repository = repository;
        }

        @Override
        protected EventProcessingState processEvent(final ProductEvent productEvent) {
            switch (productEvent.getType()) {
                case "product-created":
                case "product-updated":
                    // created and updated are handled alike: the event carries the whole entity
                    repository.save( … );
                    break;
                default:
                    LOG.warn("Unexpected type: '{}' of message with key '{}'", productEvent.getType(), productEvent.getKey());
                    return EventProcessingState.UNEXPECTED_ERROR;
            }
            return EventProcessingState.SUCCESS;
        }
    }

Git Project: rewe-digital/integration-patterns

Thank you
Questions?

