Having data is better than needing data - Eventing with Apache Kafka

Sebastian
September 07, 2018

When building our microservice platform, we wondered how services should consume data without opening the gates of hell through endless chains of API calls. We realized one fact: having data is better than needing data. We therefore decided to allow services to keep redundant copies of data. To let 40 teams work as autonomously as possible, we introduced “eventing” with Apache Kafka to reduce dependencies when answering requests. This talk covers the principles and basics of asynchronous communication with Apache Kafka, along with code samples. We’ll show how simple producers and consumers can be written, how domain events are passed between services, and the most common pitfalls developers should avoid.

Transcript

  1. Eventing with Apache Kafka: Having data is better than needing data
    Ansgar Brauner / @a_brauner, Sebastian Gauder / @rattakresch
  2. Our history

  3. Details REWE GROUP
    • Turnover: >57 bn
    • History: >90 years
    • Employees: >330.000
    • Industries: Food Retail, Tourism, DIY
    • Shops: >15.000
  4. Our history

  5. Our history
    [Chart: number of services and dev teams per year]
    Year   # Services   # Dev Teams
    2014   1            2
    2015   40           15
    2016   100          28
    2017   200          46
    2018   270          48
  6. Scale at service level
    Our 48 teams are developing and running more than 200 services.
    Imagine if all of them talk to each other:
  9. Problems in HTTP/REST-only architectures
    [Diagram: Gateway, µService 1, µService 2, µService 3, µService 4, µService 5]
    Things that help:
    • Timeouts
    • Fallbacks
    • Circuit Breakers
    • Eventing
  10. How Eventing helps us to reduce synchronous dependencies in distributed systems
  11. What is Eventing?

  12. What is the goal of Eventing?
    • Enable services to provide themselves with data asynchronously before it is needed in a request
    • Kind of database replication
    • More performance & stability
    [Diagram: Service A (a), Service B (b), Service C (c); replicated copies a’ and c’]
  13. What is a (domain-) event?
    Representation of something that happened in the domain (Eric Evans)
    • An event concerns:
      ◦ one domain entity (e.g. “customer”, “shopping cart”, “order”, “delivery area”)
      ◦ and one state change that happened to that entity (e.g. “customer registered”, “item added to shopping cart”, “order fulfilled”, “delivery area expanded”)
    • Event = functional object to describe domain changes
    • Event = vehicle for database replication
  14. Technical event
    • ID: Unique identifier
    • Key: Which entity is affected?
    • Version: Which version of this entity is this?
    • Time: When did the event occur?
    • Type: What kind of action happened?
    • Payload: What are the details?
      ◦ Entire entity - not deltas!

        {
          "id": "4ea55fbb7c887",
          "key": "7ebc8eeb1f2f45",
          "version": 1,
          "time": "2018-02-22T17:05:55Z",
          "type": "customer-registered",
          "payload": {
            "id": "7ebc8eeb1f2f45",
            "first_name": "Sebastian",
            "last_name": "Gauder",
            "e-mail": "gaudi(at)rewe-digital.com"
          }
        }
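The envelope fields listed on slide 14 could be modeled roughly as follows. This is a minimal sketch, not the speakers' implementation (their `DomainEvent` class appears on slide 35): the class name `EventEnvelope`, the `Map`-based payload, and the helper `supersedes` are assumptions made for illustration.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical envelope type; field names follow the slide, everything else is illustrative.
final class EventEnvelope {
    final String id;                   // unique identifier of this event
    final String key;                  // identifies the affected entity
    final long version;                // version of that entity
    final Instant time;                // when the event occurred
    final String type;                 // what kind of action happened
    final Map<String, String> payload; // the entire entity, not a delta

    EventEnvelope(String id, String key, long version, Instant time,
                  String type, Map<String, String> payload) {
        this.id = id;
        this.key = key;
        this.version = version;
        this.time = time;
        this.type = type;
        // Defensive copy so the event stays immutable after creation.
        this.payload = Collections.unmodifiableMap(new LinkedHashMap<>(payload));
    }

    // True if this event describes a newer state of the same entity.
    boolean supersedes(EventEnvelope other) {
        return key.equals(other.key) && version > other.version;
    }
}
```

Because the payload always carries the whole entity, a consumer that sees two events for the same key only needs the one with the higher version.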
  15. Sample: Customer data
    [Diagram: the Customer Data Service (customer data) publishes events to the customer topic; the Invoice Service (customer data’) and the Loyalty Service (customer data’’) subscribe to it]
    Published event:
        "payload": {
          "customer_uuid": "876ef6e5",
          "version": 3,
          "name": "Peter Smith",
          "loyalty_id": "477183877",
          "invoice_address": "752 High Street",
          "delivery_address": "67 Liverpool Street"
        }
    customer data’ (Invoice Service):
        "customer_uuid": "876ef6e5",
        "version": 3,
        "name": "Peter Smith",
        "invoice_address": "752 High Street"
    customer data’’ (Loyalty Service):
        "customer_uuid": "876ef6e5",
        "version": 3,
        "name": "Peter Smith",
        "loyalty_id": "477183877"
  16. Where are the Pitfalls?

  17. Events must be self contained (Requirement)
    • The event must contain all data about the state change.
      ◦ No further synchronous call should be needed to fetch additional data
      ◦ No further event should have to be processed to reconstruct the entity state
      ◦ Data must be in a consistent state after every event consumption -> transactional completeness
    • But as small and focused as possible

        ...
        "type": "entity-updated",
        "payload": {
          "version": 3,
          "entity-id": "ab56ea712",
          "entity-details": "https://entity-service/entities/ab56ea712"
        }
  18. Sample: Transactional completeness
    [Diagram: first, Store 1 serves zip codes 50676 and 51063; then Store 1 serves 50676 and Store 2 serves 51063]
    Events keyed by store:
        "key": "store_1", "payload": { "zips": ["50676", "51063"] }
        "key": "store_1", "payload": { "zips": ["50676"] }
        "key": "store_2", "payload": { "zips": ["51063"] }
  19. Sample: Transactional completeness
    [Diagram: first, zip codes 50676 and 51063 are both assigned to Store 1; then zip code 51063 is assigned to Store 2]
    Events keyed by zip code:
        "key": "50676", "payload": { "store": "store_1" }
        "key": "51063", "payload": { "store": "store_1" }
        "key": "51063", "payload": { "store": "store_2" }
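As we read slides 18 and 19, the choice of key decides whether one state change fits into one event. Keyed by store, moving zip code 51063 takes two events, and a consumer that has applied only one of them sees the zip code assigned to both stores or to none. Keyed by zip code, a single event carries the complete new state. The sketch below shows the zip-keyed variant; the class `DeliveryAreaView` and its methods are hypothetical names for a consumer-side view.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side view, keyed by zip code as on slide 19.
final class DeliveryAreaView {
    private final Map<String, String> storeByZip = new HashMap<>();

    // One event carries the complete new state for one zip code,
    // so the view is consistent after every single event.
    void apply(String zipCode, String store) {
        storeByZip.put(zipCode, store);
    }

    String storeFor(String zipCode) {
        return storeByZip.get(zipCode);
    }
}
```

Applying the three events from slide 19 in order leaves 50676 with `store_1` and 51063 with `store_2`, with no intermediate state in which 51063 belongs to two stores.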
  20. Only true facts must be published/committed (Requirement)
    Publishing Service: [1] receive, [2] store (DB), [3] publish (topic)
    Subscribing Service: [4] consume, [5] store (DB), [6] commit
  21. Events are associated with a root entity (Requirement)
    • An event must contain the whole aggregate
      ◦ e.g. a shopping cart event contains all line items and their amounts
      ◦ if the amount of a line item changes, a new shopping cart event must be published
      ◦ Necessary for log compaction!
    [Diagram: a Shopping cart has many (*) Line items]

        "type": "shopping_cart-updated"
        …
        "payload": {
          "shopping_cart_id": "1749040",
          "customer_uuid": "6fe700ab8",
          "version": 5,
          "line_items": [
            {"id": "193688", "name": "milk", "amount": 3},
            {"id": "982367", "name": "banana", "amount": 5},
            {"id": "729993", "name": "pizza spinaci", "amount": 1},
            ...
          ]
        }
  22. Reasons to not use eventing
    • Write operations - Eventing only substitutes GET operations
    • Communication with clients
    • Time critical data flow -> Eventual consistency!
    [Diagram: Service A (a), Service B (b), Service C (c); replicated copies a’ and c’]
  23. Sample: Eventual consistency

  24. How to implement

  25. We chose Apache Kafka

  26. Apache Kafka
    • Open-source stream processing platform written in Scala and Java
    • High-throughput, low-latency platform for real-time data streams
    • Originally developed at LinkedIn, open sourced in 2011
    • Offers 4 APIs: Producer, Consumer, Streams, Connect
    • We use Apache Kafka in a pub-sub manner. This means most of our services use the Producer and Consumer APIs
    “Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.” (https://kafka.apache.org/)
  27. What to send as Messages
    • In general every resource a service owns should be published
    • Every state change in the domain is published as an event
    • Those events are sent as messages to topics
    • Topics can be created when the first event arrives

        {
          "id": "4ea55fbb7c887",
          "key": "7ebc8eeb1f2f45",
          "version": 7,
          "time": "2018-02-22T17:05:55Z",
          "type": "customer-registered",
          "payload": {
            "id": "7ebc8eeb1f2f45",
            "version": 7,
            "first_name": "Sebastian",
            "last_name": "Gauder",
            "e-mail": "gaudi(at)rewe-digital.com"
          }
        }
  28. Topics and their organization
    • Domain events are sent to topics
    • Topics can have any number of subscribers
    • Topics are split into partitions; ordering is only guaranteed within a partition
    • Each record in a partition is assigned a sequential ID (the offset)
    • Partitions are distributed over the brokers in the cluster
    • Partitions can have a configurable number of replicas
    http://kafka.apache.org/documentation.html
  29. That data could grow: log compaction
    • An endless stream of events is stored in the cluster
    • Only the most recent version of an entity is kept
    • No point-in-time access to data
    • Choose the key for your entity wisely and always publish updates of an entity under that same key
    http://kafka.apache.org/documentation.html#compaction
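The effect of log compaction described on slide 29 can be illustrated with a small in-memory model. This is only a sketch of the outcome (the last record per key survives), not of how the Kafka broker works: real compaction runs in the background on older log segments, so the most recent part of a topic can still hold several versions of a key. The class `LogCompactionSketch` and the `{key, value}` array representation are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LogCompactionSketch {
    // Each record is a {key, value} pair; the list order is the log order.
    static List<String[]> compact(List<String[]> log) {
        Map<String, String[]> latestByKey = new LinkedHashMap<>();
        for (String[] record : log) {
            // Remove the older version first so the survivor keeps its later log position.
            latestByKey.remove(record[0]);
            latestByKey.put(record[0], record);
        }
        return new ArrayList<>(latestByKey.values());
    }
}
```

This is why every event has to carry the whole entity under a stable key: once older records are gone, a consumer that starts from the beginning can only rebuild its state from the latest record of each key.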
  30. Producers
    • Every service which owns a resource should publish those resource entities to a topic
    • Use only one producer, or make sure the order of events cannot be mixed up
    • To enable log compaction, use a partitioner that ensures events with the same key are always sent to the same partition
    • All producers should be able to republish all entities on request
    [Diagram: A <<publish>> to topic; B, C and D <<subscribe>>]
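The partitioner requirement on slide 30 boils down to a deterministic mapping from key to partition. The following is a minimal sketch of that idea with hypothetical names (`KeyPartitioning`, `partitionFor`); it uses `String.hashCode` as a stand-in, whereas Kafka's default partitioner hashes the serialized key with murmur2.

```java
final class KeyPartitioning {
    // Same key and same partition count always yield the same partition.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Note that the mapping depends on the number of partitions, so changing the partition count of an existing topic moves keys to different partitions and breaks per-key ordering across the change.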
  31. Producers - Best Practice
    [Diagram: Producer, Entity Repo, Event Repo, Published Version Repo, Topic; operations <<upsert>>, <<delete>>, <<publish>> grouped into transactions TX_1 and TX_2]
    • The producer has to make sure that the message is delivered and the write is committed
    • Therefore we store the raw event in a database to enable retries until it’s committed to the cluster
    • Scheduled jobs can take care of retries and cleanup
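The retry behaviour from slide 31 can be sketched without any Kafka dependency. In this simplified in-memory model, all names (`OutboxPublisher`, `PendingEvent`, the `Predicate` used as sender) are hypothetical; the real implementation on slide 36 uses repositories and database transactions instead of the collections shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Predicate;

final class OutboxPublisher {
    static final class PendingEvent {
        final String key;
        final long version;
        final String json;
        PendingEvent(String key, long version, String json) {
            this.key = key;
            this.version = version;
            this.json = json;
        }
    }

    private final Deque<PendingEvent> outbox = new ArrayDeque<>();        // stand-in for the event repo
    private final Map<String, Long> lastPublishedVersion = new HashMap<>(); // stand-in for the version repo
    private final Predicate<PendingEvent> sender; // returns true once the broker acknowledged the write

    OutboxPublisher(Predicate<PendingEvent> sender) {
        this.sender = sender;
    }

    void enqueue(String key, long version, String json) {
        outbox.addLast(new PendingEvent(key, version, json));
    }

    // One pass of the scheduled job; returns how many events are still pending.
    int publishPending() {
        Iterator<PendingEvent> it = outbox.iterator();
        while (it.hasNext()) {
            PendingEvent event = it.next();
            long published = lastPublishedVersion.getOrDefault(event.key, 0L);
            if (published >= event.version) {
                it.remove();          // already sent earlier: just clean up
                continue;
            }
            if (!sender.test(event)) {
                break;                // keep this and all later events for the next pass
            }
            lastPublishedVersion.put(event.key, event.version);
            it.remove();
        }
        return outbox.size();
    }
}
```

Stopping at the first failure is a design choice of this sketch: it keeps the publishing order intact at the cost of delaying later events until the retry succeeds.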
  32. Consumers
    • Every service can consume all available data and should consume all data it needs to fulfill a request - having data at request time is better than trying to get it from another service
    • The consumer has to process events idempotently. An event could be consumed more than once. The infrastructure ensures at-least-once delivery
    • Consumers have to take care of deployment specialties like blue/green
    • Consumers should be able to re-consume from the beginning, for instance when more data is needed
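Idempotent processing under at-least-once delivery can be approximated with an upsert plus a version check. The sketch below uses hypothetical names (`IdempotentConsumer`, `consume`) and in-memory maps in place of a data store. It skips only strictly older versions, mirroring the `isSkippable` check on slide 40, so a redelivered event is applied again without changing the result.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentConsumer {
    private final Map<String, Long> lastProcessedVersion = new HashMap<>();
    private final Map<String, String> entities = new HashMap<>();

    // Returns true if the event was applied, false if it was skipped as outdated.
    boolean consume(String key, long version, String payload) {
        Long last = lastProcessedVersion.get(key);
        if (last != null && last > version) {
            return false; // an older version arrived late: ignore it
        }
        entities.put(key, payload);             // upsert: applying twice gives the same state
        lastProcessedVersion.put(key, version);
        return true;
    }

    String entity(String key) {
        return entities.get(key);
    }
}
```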
  33. Consumers - Best Practice
    • The consumer is responsible for a manual commit, and only after the event has been processed successfully. Successful can mean:
      ◦ The data needed from the event is saved in the service’s data store
      ◦ The event can’t be processed and is stored in a private error queue / table
    [Diagram: Consumer <<subscribe>> to Topic; Entity Repo, Error Repo, Processed Version Repo; operations <<upsert>>, <<upsert>>, <<insert>>]
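The commit rule on slide 33 can be sketched as a small handler: acknowledge after success, acknowledge after parking an unprocessable message, and do not acknowledge when the failure is temporary. Everything here is a simplified, hypothetical model (`CommitAfterProcessing`, `TemporaryFailure`, the offset field standing in for the manual commit); the speakers' Spring-based version is on slides 38 and 39.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class CommitAfterProcessing {
    // Hypothetical marker for failures worth retrying (e.g. database briefly unavailable).
    static final class TemporaryFailure extends RuntimeException {
        TemporaryFailure(String message) {
            super(message);
        }
    }

    final List<String> errorTable = new ArrayList<>(); // stand-in for a private error table
    long lastAcknowledgedOffset = -1;                   // stand-in for the manual commit

    void handle(long offset, String message, Consumer<String> processor) {
        try {
            processor.accept(message);
        } catch (TemporaryFailure e) {
            throw e;                 // no acknowledgment: the message will be delivered again
        } catch (RuntimeException e) {
            errorTable.add(message); // park the unprocessable message, then move on
        }
        lastAcknowledgedOffset = offset;
    }
}
```

Parking broken messages keeps one bad event from blocking the partition, while rethrowing temporary failures leaves the retry to the surrounding infrastructure.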
  34. Source code

  35. @MappedSuperclass
    public abstract class DomainEvent<P extends EventPayload> {
        @Id
        private String id;
        private String key;
        @Convert(converter = ZonedDateTimeConverter.class)
        private ZonedDateTime time;
        private String type;
        @Embedded
        private P payload;
    }

    @Entity
    public class ProductEvent extends DomainEvent<ProductPayload> {
    }

    @Embeddable
    public class EventPayload {
        @Version
        private Long version;
    }

    public class ProductPayload extends EventPayload {
        @NotNull
        private String productId;
        @NotNull
        private String name;
        private String vendor;
        @NotNull
        private String price;
        @Column(length = 2000)
        private String description;
        @NotNull
        private String productNumber;
        private String image;
    }
  36. @Component
    public class DomainEventPublisher<P extends EventPayload, E extends DomainEvent<P>> {

        private final LastPublishedVersionRepository lastPublishedVersionRepository;
        private final DomainEventRepository<P, E> eventRepository;
        private final KafkaPublisher<P, E> eventPublisher;

        @Inject
        public DomainEventPublisher(...) { ... }

        @Transactional
        public void process(final String eventId) {
            eventRepository.findById(eventId).ifPresent(e -> sendEvent(e));
        }

        @Transactional
        public void processNext() {
            sendEvent(eventRepository.findFirstByTimeInSmallestVersion());
        }

        private void sendEvent(final E event) {
            if (event == null) {
                return;
            }
            final String lastPublishedVersionId = buildLastPublishedVersionId(event);
            obtainLastPublishedVersion(lastPublishedVersionId).ifPresent(v -> {
                try {
                    if (v.getVersion() < event.getVersion()) {
                        // need to block here so that following statements are executed inside transaction
                        SendResult<String, String> sendResult = eventPublisher.publish(event).get(1, TimeUnit.SECONDS);
                        LOG.info("published event to {}:{} at {}",...);
                        v.setVersion(event.getVersion());
                        lastPublishedVersionRepository.save(v);
                    }
                    eventRepository.delete(event);
                } catch (final Exception ex) {
                    LOG.error("error publishing event with id [{}] due to {}", event.getId(), ex.getMessage(), ex);
                }
            });
        }
    }

    Publish
    - already sent?
    - pass event to publisher
    - update version repo
    - delete event from repo
  37. @Component
    public class KafkaPublisher<P extends EventPayload, E extends DomainEvent<P>> {

        private static final Logger LOGGER = LoggerFactory.getLogger(KafkaPublisher.class);

        private final KafkaTemplate<String, String> kafkaTemplate;
        private final ObjectMapper objectMapper;
        private final String topic;

        @Inject
        public KafkaPublisher(final KafkaTemplate<String, String> kafkaTemplate,
                              final ObjectMapper objectMapper,
                              @Value("${eventing.topic.product}") final String topic) {
            this.kafkaTemplate = kafkaTemplate;
            this.objectMapper = objectMapper;
            this.topic = topic;
        }

        public ListenableFuture<SendResult<String, String>> publish(final E event) {
            LOGGER.info("publishing event {} to topic {}", event.getId(), topic);
            return kafkaTemplate.send(topic, event.getKey(), toEventMessage(event));
        }

        private String toEventMessage(final E event) {
            try {
                return objectMapper.writeValueAsString(event);
            } catch (final JsonProcessingException e) {
                LOGGER.error("Could not serialize event with id {}", event.getId(), e);
                return "";
            }
        }
    }

    Publish
    - publish event to Kafka
  38. @Component
    public class ProductEventConsumer extends AbstractKafkaConsumer {

        @Inject
        protected ProductEventConsumer(ProductEventProcessor messageProcessor,
                                       UnprocessableEventService unprocessableEventService) {
            super(messageProcessor, unprocessableEventService,
                  ImmutableSet.of(UncategorizedDataAccessException.class,
                                  TransientDataAccessException.class,
                                  CannotCreateTransactionException.class));
        }

        @Retryable(
            maxAttempts = Integer.MAX_VALUE,
            backoff = @Backoff(delay = 60000, multiplier = 2),
            value = {Exception.class})
        @KafkaListener(topics = "${productqueue.topic_name}")
        public void listen(final ConsumerRecord<String, String> consumerRecord, final Acknowledgment ack) {
            super.handleConsumerRecord(consumerRecord, ack);
        }
    }

    Consume
    - retrieve event from topic
    - pass to specific handler
  39. public abstract class AbstractKafkaConsumer {

        private static final Logger LOG = LoggerFactory.getLogger(AbstractKafkaConsumer.class);

        private final boolean payloadSensitive;
        private final DomainEventProcessor domainEventProcessor;
        private final UnprocessableEventService unprocessableEventService;
        private final Set<Class<? extends RuntimeException>> temporaryExceptions;

        protected AbstractKafkaConsumer( … ) { … }

        protected void handleConsumerRecord(final ConsumerRecord<String, String> consumerRecord,
                                            final Acknowledgment ack) {
            LOG.info("Received {}", … );
            final EventProcessingState state = processAndMapExceptionsToState(consumerRecord);
            if (EventProcessingState.UNEXPECTED_ERROR == state) {
                unprocessableEventService.save(new UnprocessedEventEntity(consumerRecord));
            } else if (EventProcessingState.TEMPORARY_ERROR == state) {
                throw new TemporaryKafkaProcessingError("Message processing failed temporarily");
            }
            ack.acknowledge();
        }

        private EventProcessingState processAndMapExceptionsToState(final ConsumerRecord<String, String> consumerRecord) {
            try {
                return domainEventProcessor.processConsumerRecord(consumerRecord);
            } catch (final RuntimeException e) {
                if (temporaryExceptions.stream().anyMatch(temporaryException -> temporaryException.isInstance(e))) {
                    LOG.error("Message processing failed temporarily for {}", … , e);
                    return EventProcessingState.TEMPORARY_ERROR;
                }
                LOG.error("Message processing failed unexpectedly for {}", … , e);
                return EventProcessingState.UNEXPECTED_ERROR;
            }
        }
    }

    Consume
    - pass event to processor
    - handle errors
    - acknowledge (if appropriate)
  40. public abstract class AbstractDomainEventProcessor<P extends EventPayload, E extends DomainEvent<P>>
            implements DomainEventProcessor {

        protected final EventParser eventParser;
        private final Class<E> eventType;
        private final ConsumerTopicConfig topicConfig;
        private final ProcessedEventService processedEventService;

        public AbstractDomainEventProcessor( … ) { … }

        @Transactional
        @Override
        public EventProcessingState processConsumerRecord(final ConsumerRecord<String, String> consumerRecord) {
            try {
                final E eventMessage = eventParser.parseMessage(consumerRecord.value(), eventType);
                final long version = eventMessage.getVersion();
                final String key = eventMessage.getKey();
                final String topic = consumerRecord.topic();
                if (isSkippable(topic, key, version)) {
                    LOG.info("Skipping old {} message with key {} and version {}", topic, key, version);
                    return EventProcessingState.SUCCESS;
                }
                final EventProcessingState state = processEvent(eventMessage);
                if (state.isFinalState()) {
                    processedEventService.updateLastProcessedVersion(topic, key, version);
                }
                return state;
            } catch (final MessageProcessingException e) {
                LOG.warn("Failed to create valid {} object from {}", … , e);
                return e.getState();
            }
        }

        protected abstract EventProcessingState processEvent(E domainEvent);

        private boolean isSkippable(final String topic, final String key, final long version) {
            return processedEventService.getLastProcessedVersion(topic, key) > version;
        }
    }

    Consume
    - parse event message
    - check if version is already known
    - pass event object to product processor
    - update version repo
  41. @Component
    public class ProductEventProcessor extends AbstractDomainEventProcessor<ProductPayload, ProductEvent> {

        // Logger named after this class so log lines are attributed to the product processor
        private static final Logger LOG = LoggerFactory.getLogger(ProductEventProcessor.class);

        private final JpaProductRepository repository;

        @Inject
        public ProductEventProcessor( … ) {
            super(ProductEvent.class, productTopicConfig, eventParser, processedEventService);
            this.repository = repository;
        }

        @Override
        protected EventProcessingState processEvent(final ProductEvent productEvent) {
            switch (productEvent.getType()) {
                case "product-created":
                case "product-updated":
                    repository.save(toProduct(productEvent));
                    break;
                default:
                    LOG.warn("Unexpected type: '{}' of message with key '{}'",
                             productEvent.getType(), productEvent.getKey());
                    return EventProcessingState.UNEXPECTED_ERROR;
            }
            return EventProcessingState.SUCCESS;
        }
    }

    Consume
    - extract entity from event
    - store entity
  42. Git Project: rewe-digital/integration-patterns https://git.io/vA2MY

  43. Thank you Questions ?

  44. Eventing with Apache Kafka: Having data is better than needing data
    Ansgar Brauner / @a_brauner, Sebastian Gauder / @rattakresch