Slide 1

Eventing with Apache Kafka
Having data is better than needing data
Ansgar Brauner / @a_brauner
Sebastian Gauder / @rattakresch

Slide 2

Our history

Slide 3

Details: REWE GROUP
Turnover: >57 bn
History: >90 years
Employees: >330,000
Industries: Food Retail, Tourism, DIY
Shops: >15,000

Slide 4

Our history

Slide 5

Our history

Year          2014   2015   2016   2017   2018
# Services       1     40    100    200    270
# Dev Teams      2     15     28     46     48

Slide 6

Scale at service level
Our 48 teams are developing and running more than 200 services. Imagine if all of them talk to each other:

Slide 7

Scale at service level
Our 48 teams are developing and running more than 200 services. Imagine if all of them talk to each other:

Slide 8

Scale at service level
Our 48 teams are developing and running more than 200 services. Imagine if all of them talk to each other:

Slide 9

Problems in HTTP/REST-only architectures

(Diagram: a gateway and µServices 1-5 calling each other synchronously)

Things that help:
● Timeouts
● Fallbacks
● Circuit breakers
● Eventing

Slide 10

How Eventing helps us to reduce synchronous dependencies in distributed systems

Slide 11

What is Eventing?

Slide 12

What is the goal of Eventing?
● Enable services to provide themselves with data asynchronously, before it is needed in a request
● A kind of database replication
● More performance & stability

(Diagram: Services A, B, C with their entities a, b, c and replicated copies a', c')

Slide 13

What is a (domain) event?
"Representation of something that happened in the domain" (Eric Evans)
● An event concerns:
○ one domain entity (e.g. "customer", "shopping cart", "order", "delivery area")
○ and one state change that happened to that entity (e.g. "customer registered", "item added to shopping cart", "order fulfilled", "delivery area expanded")
● Event = functional object to describe domain changes
● Event = vehicle for database replication

Slide 14

Technical event
● ID: Unique identifier
● Key: Which entity is affected?
● Version: Which version of this entity is this?
● Time: When did the event occur?
● Type: What kind of action happened?
● Payload: What are the details?
○ Entire entity - not deltas!

{
  "id": "4ea55fbb7c887",
  "key": "7ebc8eeb1f2f45",
  "version": 1,
  "time": "2018-02-22T17:05:55Z",
  "type": "customer-registered",
  "payload": {
    "id": "7ebc8eeb1f2f45",
    "first_name": "Sebastian",
    "last_name": "Gauder",
    "e-mail": "gaudi(at)rewe-digital.com"
  }
}
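As a sketch, such an envelope maps onto a small value class; the class and field names here are hypothetical, and Jackson is just one possible way to produce the JSON shown above:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class CustomerEvent {
    public String id;                    // unique identifier of the event
    public String key;                   // which entity is affected
    public long version;                 // which version of the entity this is
    public String time;                  // when the event occurred (ISO-8601)
    public String type;                  // what kind of action happened
    public Map<String, Object> payload;  // entire entity - not deltas

    public static void main(String[] args) throws Exception {
        CustomerEvent event = new CustomerEvent();
        event.id = "4ea55fbb7c887";
        event.key = "7ebc8eeb1f2f45";
        event.version = 1;
        event.time = "2018-02-22T17:05:55Z";
        event.type = "customer-registered";
        event.payload = new LinkedHashMap<>();
        event.payload.put("id", "7ebc8eeb1f2f45");
        event.payload.put("first_name", "Sebastian");
        event.payload.put("last_name", "Gauder");
        // serialize to the JSON shape shown on this slide
        System.out.println(new ObjectMapper().writeValueAsString(event));
    }
}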

Slide 15

Sample: Customer data

(Diagram: the Customer Data Service publishes customer data to the customer topic; each subscriber keeps its own projection)

Customer Data Service publishes:
"payload": {
  "customer_uuid": "876ef6e5",
  "version": 3,
  "name": "Peter Smith",
  "loyalty_id": "477183877",
  "invoice_address": "752 High Street",
  "delivery_address": "67 Liverpool Street"
}

Invoice Service stores customer data':
"customer_uuid": "876ef6e5",
"version": 3,
"name": "Peter Smith",
"invoice_address": "752 High Street"

Loyalty Service stores customer data'':
"customer_uuid": "876ef6e5",
"version": 3,
"name": "Peter Smith",
"loyalty_id": "477183877"

Slide 16

Where are the Pitfalls?

Slide 17

Events must be self-contained
Requirement
● The event must contain all data about the state change
○ No further synchronous call must be necessary to receive additional data
○ No further event must be processed to reconstruct the entity state
○ Data must be in a consistent state after every event consumption -> transactional completeness
● But as small and focused as possible

Counter-example - this event forces a synchronous lookup:
...
"type": "entity-updated",
"payload": {
  "version": 3,
  "entity-id": "ab56ea712",
  "entity-details": "https://entity-service/entities/ab56ea712"
}

Slide 18

Sample: Transactional completeness (events keyed by store)

Before - Store 1 covers both zip codes:
"key": "store_1",
"payload": { "zips": ["50676", "51063"] }

After - zip 51063 moves to the new Store 2, which requires two events:
"key": "store_1", "payload": { "zips": ["50676"] }
"key": "store_2", "payload": { "zips": ["51063"] }

A consumer that has processed only one of the two events sees an inconsistent state.

Slide 19

Sample: Transactional completeness (events keyed by zip code)

Before - both zip codes point to Store 1:
"key": "50676", "payload": { "store": "store_1" }
"key": "51063", "payload": { "store": "store_1" }

After - a single event reassigns zip 51063 to Store 2:
"key": "51063", "payload": { "store": "store_2" }

Every single event leaves the data in a consistent state.

Slide 20

Only true facts must be published/committed
Requirement

(Diagram)
Publishing service: [1] receive -> [2] store in DB -> [3] publish to topic
Subscribing service: [4] consume from topic -> [5] store in DB -> [6] commit

Slide 21

Events are associated with a root entity
Requirement
● An event must contain the whole aggregate (shopping cart 1 -- * line item)
○ e.g. a shopping cart event contains all line items and their amounts
○ if the amount of a line item changes, a new shopping cart event must be published
○ Necessary for log compaction!

"type": "shopping_cart-updated",
...
"payload": {
  "shopping_cart_id": "1749040",
  "customer_uuid": "6fe700ab8",
  "version": 5,
  "line_items": [
    {"id": "193688", "name": "milk", "amount": 3},
    {"id": "982367", "name": "banana", "amount": 5},
    {"id": "729993", "name": "pizza spinaci", "amount": 1},
    ...
  ]
}

Slide 22

Reasons not to use eventing
● Write operations - eventing only substitutes GET operations
● Communication with clients
● Time-critical data flows -> eventual consistency!

(Diagram: Services A, B, C with their entities a, b, c and replicated copies a', c', as on slide 12)

Slide 23

Sample: Eventual consistency

Slide 24

How to implement

Slide 25

We chose Apache Kafka

Slide 26

Apache Kafka
● Open-source stream-processing platform written in Scala and Java
● High-throughput, low-latency platform for real-time data streams
● Originally developed at LinkedIn, open-sourced in 2011
● Offers 4 APIs: Producer, Consumer, Streams, Connect
● We use Apache Kafka in a pub-sub manner; most of our services use only the Producer and Consumer APIs

"Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies." (https://kafka.apache.org/)
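A minimal sketch of this pub-sub usage with the plain kafka-clients Producer and Consumer APIs; the topic name, group id, and broker address are assumptions:

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class PubSubSketch {
    public static void main(String[] args) {
        // Producer API: publish one event, keyed by the entity id
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("customer", "7ebc8eeb1f2f45", "{ ...event json... }"));
        }

        // Consumer API: subscribe to the topic and poll for records
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "loyalty-service");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (Consumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("customer"));
            consumer.poll(Duration.ofSeconds(1))
                    .forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}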

Slide 27

What to send as messages
● In general, every resource a service owns should be published
● Every state change in the domain is published as an event
● Those events are sent as messages to topics
● Topics can be created when the first event arrives

{
  "id": "4ea55fbb7c887",
  "key": "7ebc8eeb1f2f45",
  "version": 7,
  "time": "2018-02-22T17:05:55Z",
  "type": "customer-registered",
  "payload": {
    "id": "7ebc8eeb1f2f45",
    "version": 7,
    "first_name": "Sebastian",
    "last_name": "Gauder",
    "e-mail": "gaudi(at)rewe-digital.com"
  }
}

Slide 28

Topics and their organization
● Domain events are sent to topics
● Topics can have any number of subscribers
● Topics are split into partitions; ordering is only guaranteed within a partition
● Each record gets a sequential ID (offset) assigned within its partition
● Partitions are distributed over the brokers in the cluster
● Partitions can have a configurable number of replicas

http://kafka.apache.org/documentation.html
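For illustration, the partition and replica counts are fixed when a topic is created; a minimal sketch with the Kafka AdminClient, where the topic name, the counts, and the broker address are assumptions:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions distributed over the brokers, 3 replicas per partition
            admin.createTopics(List.of(new NewTopic("customer", 6, (short) 3)))
                 .all().get();  // block until the topic exists
        }
    }
}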

Slide 29

That data could grow: log compaction
● An endless stream of events is stored in the cluster
● With log compaction, only the most recent version of an entity is kept
● No point-in-time access to data
● Choose a sensible key for your entity and always publish updates of one entity under that same key

http://kafka.apache.org/documentation.html#compaction
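Log compaction is ordinary topic configuration; a minimal sketch that creates a compacted topic (cleanup.policy=compact) via the AdminClient, with names and counts as assumptions:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CompactedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("customer", 6, (short) 3)
                    // keep only the latest event per key instead of deleting by age
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}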

Slide 30

Producers
● Every service that owns a resource should publish those resource entities to a topic
● Use only one producer per topic, or make sure the order of events is not an issue
● To enable log compaction, use a partitioner that ensures events with the same key are always sent to the same partition (see the sketch below)
● All producers should be able to republish all entities on request

(Diagram: producers A, B, C, D publishing to a topic)
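Kafka's default partitioner already gives the same-key-same-partition guarantee for keyed records by hashing the key; a minimal sketch, where the topic and key values are assumptions:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Both records carry the same key, so the default partitioner
            // (hash of the key) routes them to the same partition - which is
            // what log compaction and per-entity ordering rely on.
            producer.send(new ProducerRecord<>("customer", "7ebc8eeb1f2f45", "{ ...version 1... }"));
            producer.send(new ProducerRecord<>("customer", "7ebc8eeb1f2f45", "{ ...version 2... }"));
        }
    }
}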

Slide 31

Producers - Best Practice
● The producer has to make sure that the message is delivered and the write is committed
● Therefore we store the raw event in a database to enable retries until it is committed to the cluster
● Scheduled jobs can take care of retries and cleanup (a sketch follows below)

(Diagram: TX_1 writes the entity repo and the event repo; TX_2 publishes to the topic and updates the published-version repo)
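A hypothetical sketch of the scheduled retry job mentioned above, wired around the DomainEventPublisher shown on the source-code slides later in this deck; the class name, event type, and delay are assumptions:

import javax.inject.Inject;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DomainEventRelay {

    private final DomainEventPublisher<ProductEvent> publisher;

    @Inject
    public DomainEventRelay(final DomainEventPublisher<ProductEvent> publisher) {
        this.publisher = publisher;
    }

    // Retry job: repeatedly drain the event table. Events are deleted there
    // only after Kafka has acknowledged them (see DomainEventPublisher).
    @Scheduled(fixedDelay = 1000)
    public void relayNext() {
        publisher.processNext();
    }
}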

Slide 32

Consumers
● Every service can consume all available data and should consume all data it needs to fulfill a request - having data at request time is better than trying to get it from another service
● The consumer has to process events idempotently: an event can be consumed more than once, because the infrastructure only ensures at-least-once delivery
● Consumers have to take care of deployment specialties like blue/green
● Consumers should be able to re-consume from the beginning, for instance when more data is needed (see the sketch below)
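A minimal sketch of re-consuming from the beginning with the plain consumer API: once partitions are assigned, the consumer rewinds to the earliest offsets. Topic, group id, and broker address are assumptions:

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ReconsumeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "loyalty-service");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer"));
            // simplified: a real consumer would wait for the assignment via a ConsumerRebalanceListener
            consumer.poll(Duration.ofSeconds(1));             // join the group, get partitions assigned
            consumer.seekToBeginning(consumer.assignment());  // rewind to offset 0 on every partition
            while (true) {
                // processing must be idempotent: every event is seen again here
                consumer.poll(Duration.ofSeconds(1))
                        .forEach(r -> System.out.printf("reprocessing %s%n", r.key()));
            }
        }
    }
}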

Slide 33

Consumers - Best Practice
● The consumer commits manually, and only after successful processing of the event. "Successful" can mean:
○ The data needed from the event is saved in the service's data store
○ The event can't be processed and is stored in a private error queue / table
● A plain-consumer sketch of this commit discipline follows below

(Diagram: Consumer reads from the topic and writes to the entity repo, error repo, and processed-version repo)
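A minimal sketch of the manual-commit rule with the plain consumer API (the implementation slides later use Spring Kafka's Acknowledgment instead); storeEntity and storeInErrorTable are hypothetical service methods:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "invoice-service");
        props.put("enable.auto.commit", "false");  // the service decides when an offset is done
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    try {
                        storeEntity(record.value());   // success: data is in the service's data store
                    } catch (RuntimeException e) {
                        storeInErrorTable(record, e);  // also "successful" for commit purposes
                    }
                }
                consumer.commitSync();  // commit only after every polled event was handled
            }
        }
    }

    private static void storeEntity(String eventJson) { /* hypothetical */ }

    private static void storeInErrorTable(ConsumerRecord<String, String> record, Exception cause) { /* hypothetical */ }
}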

Slide 34

Source code

Slide 35

@MappedSuperclass
public abstract class DomainEvent<P extends EventPayload> {
    @Id
    private String id;
    private String key;
    @Convert(converter = ZonedDateTimeConverter.class)
    private ZonedDateTime time;
    private String type;
    @Embedded
    private P payload;
}

@Entity
public class ProductEvent extends DomainEvent<ProductPayload> {
}

@Embeddable
public class EventPayload {
    @Version
    private Long version;
}

public class ProductPayload extends EventPayload {
    @NotNull
    private String productId;
    @NotNull
    private String name;
    private String vendor;
    @NotNull
    private String price;
    @Column(length = 2000)
    private String description;
    @NotNull
    private String productNumber;
    private String image;
}

Slide 36

@Component
public class DomainEventPublisher<E extends DomainEvent<?>> {

    private final LastPublishedVersionRepository lastPublishedVersionRepository;
    private final DomainEventRepository<E> eventRepository;
    private final KafkaPublisher<E> eventPublisher;

    @Inject
    public DomainEventPublisher(...) { ... }

    @Transactional
    public void process(final String eventId) {
        eventRepository.findById(eventId).ifPresent(e -> sendEvent(e));
    }

    @Transactional
    public void processNext() {
        sendEvent(eventRepository.findFirstByTimeInSmallestVersion());
    }

    private void sendEvent(final E event) {
        if (event == null) {
            return;
        }
        final String lastPublishedVersionId = buildLastPublishedVersionId(event);
        obtainLastPublishedVersion(lastPublishedVersionId).ifPresent(v -> {
            try {
                if (v.getVersion() < event.getVersion()) {
                    // need to block here so that following statements are executed inside transaction
                    SendResult sendResult = eventPublisher.publish(event).get(1, TimeUnit.SECONDS);
                    LOG.info("published event to {}:{} at {}", ...);
                    v.setVersion(event.getVersion());
                    lastPublishedVersionRepository.save(v);
                }
                eventRepository.delete(event);
            } catch (final Exception ex) {
                LOG.error("error publishing event with id [{}] due to {}", event.getId(), ex.getMessage(), ex);
            }
        });
    }
}

Publish:
- already sent?
- pass event to publisher
- update version repo
- delete event from repo

Slide 37

@Component
public class KafkaPublisher<E extends DomainEvent<?>> {

    private static final Logger LOGGER = LoggerFactory.getLogger(KafkaPublisher.class);

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final ObjectMapper objectMapper;
    private final String topic;

    @Inject
    public KafkaPublisher(final KafkaTemplate<String, String> kafkaTemplate,
                          final ObjectMapper objectMapper,
                          @Value("${eventing.topic.product}") final String topic) {
        this.kafkaTemplate = kafkaTemplate;
        this.objectMapper = objectMapper;
        this.topic = topic;
    }

    public ListenableFuture<SendResult<String, String>> publish(final E event) {
        LOGGER.info("publishing event {} to topic {}", event.getId(), topic);
        return kafkaTemplate.send(topic, event.getKey(), toEventMessage(event));
    }

    private String toEventMessage(final E event) {
        try {
            return objectMapper.writeValueAsString(event);
        } catch (final JsonProcessingException e) {
            LOGGER.error("Could not serialize event with id {}", event.getId(), e);
            return "";
        }
    }
}

Publish:
- publish event to Kafka

Slide 38

@Component
public class ProductEventConsumer extends AbstractKafkaConsumer {

    @Inject
    protected ProductEventConsumer(ProductEventProcessor messageProcessor,
                                   UnprocessableEventService unprocessableEventService) {
        super(messageProcessor, unprocessableEventService,
                ImmutableSet.of(UncategorizedDataAccessException.class,
                        TransientDataAccessException.class,
                        CannotCreateTransactionException.class));
    }

    @Retryable(
            maxAttempts = Integer.MAX_VALUE,
            backoff = @Backoff(delay = 60000, multiplier = 2),
            value = {Exception.class})
    @KafkaListener(topics = "${productqueue.topic_name}")
    public void listen(final ConsumerRecord<String, String> consumerRecord, final Acknowledgment ack) {
        super.handleConsumerRecord(consumerRecord, ack);
    }
}

Consume:
- retrieve event from topic
- pass to specific handler

Slide 39

public abstract class AbstractKafkaConsumer {

    private static final Logger LOG = LoggerFactory.getLogger(AbstractKafkaConsumer.class);

    private final boolean payloadSensitive;
    private final DomainEventProcessor domainEventProcessor;
    private final UnprocessableEventService unprocessableEventService;
    private final Set<Class<? extends RuntimeException>> temporaryExceptions;

    protected AbstractKafkaConsumer( … ) { … }

    protected void handleConsumerRecord(final ConsumerRecord<String, String> consumerRecord, final Acknowledgment ack) {
        LOG.info("Received {}", … );
        final EventProcessingState state = processAndMapExceptionsToState(consumerRecord);
        if (EventProcessingState.UNEXPECTED_ERROR == state) {
            unprocessableEventService.save(new UnprocessedEventEntity(consumerRecord));
        } else if (EventProcessingState.TEMPORARY_ERROR == state) {
            throw new TemporaryKafkaProcessingError("Message processing failed temporarily");
        }
        ack.acknowledge();
    }

    private EventProcessingState processAndMapExceptionsToState(final ConsumerRecord<String, String> consumerRecord) {
        try {
            return domainEventProcessor.processConsumerRecord(consumerRecord);
        } catch (final RuntimeException e) {
            if (temporaryExceptions.stream().anyMatch(temporaryException -> temporaryException.isInstance(e))) {
                LOG.error("Message processing failed temporarily for {}", … , e);
                return EventProcessingState.TEMPORARY_ERROR;
            }
            LOG.error("Message processing failed unexpectedly for {}", … , e);
            return EventProcessingState.UNEXPECTED_ERROR;
        }
    }
}

Consume:
- pass event to processor
- handle errors
- acknowledge (if appropriate)

Slide 40

public abstract class AbstractDomainEventProcessor<E extends DomainEvent<?>> implements DomainEventProcessor {

    protected final EventParser eventParser;

    private final Class<E> eventType;
    private final ConsumerTopicConfig topicConfig;
    private final ProcessedEventService processedEventService;

    public AbstractDomainEventProcessor( … ) { … }

    @Transactional
    @Override
    public EventProcessingState processConsumerRecord(final ConsumerRecord<String, String> consumerRecord) {
        try {
            final E eventMessage = eventParser.parseMessage(consumerRecord.value(), eventType);
            final long version = eventMessage.getVersion();
            final String key = eventMessage.getKey();
            final String topic = consumerRecord.topic();
            if (isSkippable(topic, key, version)) {
                LOG.info("Skipping old {} message with key {} and version {}", topic, key, version);
                return EventProcessingState.SUCCESS;
            }
            final EventProcessingState state = processEvent(eventMessage);
            if (state.isFinalState()) {
                processedEventService.updateLastProcessedVersion(topic, key, version);
            }
            return state;
        } catch (final MessageProcessingException e) {
            LOG.warn("Failed to create valid {} object from {}", … , e);
            return e.getState();
        }
    }

    protected abstract EventProcessingState processEvent(E domainEvent);

    private boolean isSkippable(final String topic, final String key, final long version) {
        return processedEventService.getLastProcessedVersion(topic, key) > version;
    }
}

Consume:
- parse event message
- check if version is already known
- pass event object to product processor
- update version repo

Slide 41

@Component
public class ProductEventProcessor extends AbstractDomainEventProcessor<ProductEvent> {

    private static final Logger LOG = LoggerFactory.getLogger(AbstractDomainEventProcessor.class);

    private final JpaProductRepository repository;

    @Inject
    public ProductEventProcessor( … ) {
        super(ProductEvent.class, productTopicConfig, eventParser, processedEventService);
        this.repository = repository;
    }

    @Override
    protected EventProcessingState processEvent(final ProductEvent productEvent) {
        switch (productEvent.getType()) {
            case "product-created":
            case "product-updated":
                repository.save(toProduct(productEvent));
                break;
            default:
                LOG.warn("Unexpected type: '{}' of message with key '{}'", productEvent.getType(), productEvent.getKey());
                return EventProcessingState.UNEXPECTED_ERROR;
        }
        return EventProcessingState.SUCCESS;
    }
}

Consume:
- extract entity from event
- store entity

Slide 42

Git Project: rewe-digital/integration-patterns https://git.io/vA2MY

Slide 43

Thank you! Questions?

Slide 44

Eventing with Apache Kafka
Having data is better than needing data
Ansgar Brauner / @a_brauner
Sebastian Gauder / @rattakresch