Streaming Apps and Poison Pills: Handle the Unexpected with Kafka Streams

Loïc DIVAD
October 01, 2019

Transcript

  1. @loicmdivad @XebiaFr Processor API: The dark side of Kafka Streams

    XKE* - March 2018 (*XKE = Xebia Knowledge Exchange)
  2. > println(sommaire)

    Incoming records may be corrupted, or cannot be handled by the serializer / deserializer. These records are referred to as “poison pills”.

    1. Log and Crash
    2. Skip the Corrupted
    3. Sentinel Value Pattern
    4. Dead Letter Queue Pattern
  3. Streaming App Poison Pills

    1. Log and Crash - Breakfast
    2. Skip the Corrupted - Lunch
    3. Sentinel Value Pattern - Drink
    4. Dead Letter Queue Pattern - Dinner
  6. Exercise #1 - breakfast

    Really old systems receive raw bytes directly from message queues (10100110111010101). With Kafka (Connect and Streams) we’d like to continuously transform these messages. But we need a deserializer with a special decoder to understand each event. What happens if we get a buggy implementation of the deserializer?

    Diagram labels: Kafka Connect, Kafka Brokers, Kafka Streams
  10. // Exercise #1: Breakfast

    sealed trait FoodOrder

    case class Breakfast(lang: Lang,
                         fruit: Fruit,
                         liquid: Liquid,
                         pastries: Vector[Pastry] = Vector.empty) extends FoodOrder

    implicit lazy val BreakfastCodec: Codec[Breakfast] = ???

    // Serializer and Deserializer come from org.apache.kafka.common.serialization
    class FoodOrderSerializer extends Serializer[FoodOrder] { /* ... */ }
    class FoodOrderDeserializer extends Deserializer[FoodOrder] { /* ... */ }
  11. Log and Crash

    2019-04-17 03:43:12 macbook-de-lolo [ERROR] (LogAndFailExceptionHandler.java:39) - Exception caught during Deserialization, taskId: 0_0, topic: input-food-order, partition: 0, offset: 109
    Exception in thread "answer-one-breakfast-0d808ce7-0ef1-44c6-808a-f594bc7fceae-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
        at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:80)
        at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:101)
        at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:124)
        ...
        at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:711)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:747)
    Caused by: java.lang.IllegalArgumentException: dishes: Insufficient number of elements: decoded 0 but should have decoded 268435712
        at scodec.Attempt$Failure.require(Attempt.scala:108)
        at fr.xebia.ldi.ratatouille.serde.BreakfastDeserializer.deserialize(BreakfastDeserializer.scala:22)
        at fr.xebia.ldi.ratatouille.serde.BreakfastDeserializer.deserialize(BreakfastDeserializer.scala:15)
        at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:58)
        at fr.xebia.ldi.ratatouille.serde.BreakfastDeserializer.deserialize(BreakfastDeserializer.scala:15)
        at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:60)
        at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:66)
  13. // Literals above 0x7f do not fit in a Byte, so convert explicitly
    val frame1: Array[Byte] = Array(0x33, 0xd4, 0xfc, 0x00, 0x00, 0x00, 0x01, 0xa5).map(_.toByte)

    val frame2: Array[Byte] = Array(0x44, 0xd2, 0xfe, 0x10, 0x02, 0x03, 0x01).map(_.toByte)
  15. // First byte of each frame restored from the previous slide
    val frame1: Array[Byte] = Array(0x33, 0xd4, 0xfc, 0x00, 0x00, 0x00, 0x01, 0xa5).map(_.toByte)

    val frame2: Array[Byte] = Array(0x44, 0xd2, 0xfe, 0x10, 0x02, 0x03, 0x01).map(_.toByte)

    case class Meat(sausages: Int, bacons: Int, /* ... */)
  16. Don’t:

    ▼ Change consumer group
    ▼ Manually update my offsets
    ▼ Reset my streaming app and set my auto reset to LATEST
      ▽ $ kafka-streams-application-reset ...
    ▼ Destroy the topic, no message = no poison pill
      ▽ $ kafka-topics --delete --topic ...
    ▼ My favourite <3
      ▽ $ confluent destroy && confluent start

    Do:

    ▼ File an issue and suggest a fix to the tooling team
  17. Log and Crash

    Like all consumers, Kafka Streams applications deserialize messages from the broker. The deserialization process can fail. It raises an exception that cannot be caught by our code. Buggy deserializers have to be fixed before the application restarts, by default ...
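
To make the failure mode above concrete, here is a minimal sketch of how a strict decoder turns a corrupted record into an exception. The wire format (a 4-byte big-endian element count followed by one byte per element) and all names are hypothetical and are not the format used in the talk; the sketch only uses the Scala and Java standard libraries.

```scala
import java.nio.ByteBuffer

object PoisonPillSketch {

  // Hypothetical wire format: a 4-byte big-endian element count,
  // followed by exactly one byte per element.
  def decode(data: Array[Byte]): Vector[Byte] = {
    val buffer = ByteBuffer.wrap(data)
    require(buffer.remaining() >= 4, "missing length prefix")
    val count = buffer.getInt()
    // A corrupted prefix announces far more elements than the frame contains.
    require(count >= 0 && count <= buffer.remaining(),
      s"Insufficient number of elements: ${buffer.remaining()} available but should have decoded $count")
    Vector.fill(count)(buffer.get())
  }

  // A well-formed frame: count = 2, then two elements.
  val valid: Array[Byte] = Array(0x00, 0x00, 0x00, 0x02, 0x0a, 0x0b).map(_.toByte)

  // A poison pill: the prefix 0x10000100 announces 268435712 elements, with no data behind it.
  val poisonPill: Array[Byte] = Array(0x10, 0x00, 0x01, 0x00).map(_.toByte)
}
```

Calling `PoisonPillSketch.decode(PoisonPillSketch.poisonPill)` throws an `IllegalArgumentException`. Inside a Kafka Streams deserializer, an exception of this kind is what reaches the deserialization exception handler.
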
  19. // Exercise #2: Lunch

    sealed trait FoodOrder

    case class Lunch(name: String, price: Double, `type`: LunchType) extends FoodOrder

    • starter
    • main
    • dessert
  20. Skip the Corrupted

    2019-04-17 03:43:12 macbook-de-lolo [ERROR] (LogAndFailExceptionHandler.java:39) - Exception caught during Deserialization, taskId: 0_0, topic: exercise-breakfast, partition: 0, offset: 109
    Exception in thread "answer-one-breakfast-0d808ce7-0ef1-44c6-808a-f594bc7fceae-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
        at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:80)
        at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:101)
        at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:124)
        ...
        at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:711)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:747)
    Caused by: java.lang.IllegalArgumentException: ... decoded 0 but should have decoded 268435712
        at scodec.Attempt$Failure.require(Attempt.scala:108)
        at fr.xebia.ldi.ratatouille.serde.BreakfastDeserializer.deserialize(BreakfastDeserializer.scala:22)
        at fr.xebia.ldi.ratatouille.serde.BreakfastDeserializer.deserialize(BreakfastDeserializer.scala:15)
        at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:58)
        at fr.xebia.ldi.ratatouille.serde.BreakfastDeserializer.deserialize(BreakfastDeserializer.scala:15)
        at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:60)
        at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:66)
  23. public class LogAndFailExceptionHandler implements DeserializationExceptionHandler { /* ... */ }

    public class LogAndContinueExceptionHandler implements DeserializationExceptionHandler { /* ... */ }

    public interface DeserializationExceptionHandler extends Configurable {

        DeserializationHandlerResponse handle(final ProcessorContext context,
                                              final ConsumerRecord<byte[], byte[]> record,
                                              final Exception exception);

        enum DeserializationHandlerResponse {
            CONTINUE(0, "CONTINUE"),
            FAIL(1, "FAIL");
            /* ... */
        }
    }
  24. The Exception Handler in the call stack

    Powered by the Flow IntelliJ plugin ➞ findtheflow.io
  28. Skip the Corrupted

    All exceptions thrown by deserializers are caught by a DeserializationExceptionHandler. A handler returns Fail or Continue. You can implement your own handler. But the two handlers provided by the library are really basic… let’s explore other methods.
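
As a configuration sketch of the "skip the corrupted" option, the snippet below builds the properties that select the library's log-and-continue handler. The property key is the one quoted in the error message earlier in the deck; the application id and bootstrap address are placeholder values, and newer Kafka releases may name this property differently. Only `java.util.Properties` is used, so no Kafka dependency is needed to run it.

```scala
import java.util.Properties

object SkipCorruptedConfigSketch {

  def streamsProperties(): Properties = {
    val props = new Properties()
    // Placeholder values, not taken from the talk.
    props.put("application.id", "skip-the-corrupted-sketch")
    props.put("bootstrap.servers", "localhost:9092")
    // Log the failing record and keep processing instead of crashing.
    props.put("default.deserialization.exception.handler",
      "org.apache.kafka.streams.errors.LogAndContinueExceptionHandler")
    props
  }
}
```

These properties would then be passed to the `KafkaStreams` constructor together with the topology. A custom handler is registered the same way, by giving its fully qualified class name.
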
  31. // Exercise #3: Drink

    sealed trait FoodOrder

    case class Drink(name: String, quantity: Int, `type`: DrinkType, alcohol: Option[Double]) extends FoodOrder

    • wine
    • rum
    • beer
    • champagne
    • ...
  34. Sentinel Value Pattern

    We need to turn the deserialization process into a pure transformation that cannot crash. To do so, we will replace each corrupted message with a sentinel value. It’s a special-purpose record (e.g. null, None, Json.Null, etc.). This allows downstream processors to recognize and handle such sentinel values. With Kafka Streams this can be achieved by implementing a Deserializer.

    Diagram: f: G → H, where a corrupted input maps to null
  35. import scala.util.Try

    case object FoodOrderError extends FoodOrder

    class FoodOrderDeserializer extends Deserializer[FoodOrder] {
      override def deserialize(topic: String, data: Array[Byte]): FoodOrder = ???
    }

    class SentinelValueDeserializer extends FoodOrderDeserializer {
      // Any decoding failure is replaced by the sentinel value
      override def deserialize(topic: String, data: Array[Byte]): FoodOrder =
        Try(super.deserialize(topic, data)).getOrElse(FoodOrderError)
    }
  36. class FoodOrderSentinelValueProcessor extends ValueTransformer[Json, Unit] {

      var sensor: Sensor = _
      var context: ProcessorContext = _

      def metricName(stat: String): MetricName = ???

      override def init(context: ProcessorContext): Unit = {
        this.context = context
        this.sensor = this.context.metrics.addSensor("sentinel-value", INFO)
        sensor.add(metricName("total"), new Total())
        sensor.add(metricName("rate"), new Rate(TimeUnit.SECONDS, new Count()))
      }

      // Record one occurrence each time a sentinel value reaches this processor
      override def transform(value: Json): Unit = sensor.record()

      // Required by ValueTransformer; nothing to release here
      override def close(): Unit = ()
    }
  37. Sentinel Value Pattern

    By implementing a custom serde we can create a safe Deserializer. Downstream processors now receive a sentinel value indicating a deserialization error. Errors can then be treated correctly, for example by monitoring the number of deserialization errors with a custom metric. But we lost a lot of information about the error… let’s see a last method.
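
The pattern summarised above can be sketched without any Kafka dependency. In the sketch below, `Drink` is a simplified stand-in for the deck's case class, and the `name:quantity` text encoding, `unsafeDecode`, `safeDecode` and `partition` are hypothetical names chosen for illustration only.

```scala
import java.nio.charset.StandardCharsets
import scala.util.Try

object SentinelValueSketch {

  sealed trait FoodOrder
  final case class Drink(name: String, quantity: Int) extends FoodOrder
  case object FoodOrderError extends FoodOrder // the sentinel value

  // Hypothetical decoder: expects "name:quantity" in UTF-8 and throws on anything else.
  def unsafeDecode(data: Array[Byte]): FoodOrder =
    new String(data, StandardCharsets.UTF_8).split(":") match {
      case Array(name, quantity) => Drink(name, quantity.toInt)
      case other => throw new IllegalArgumentException(s"unexpected field count: ${other.length}")
    }

  // Pure transformation: every failure becomes the sentinel value.
  def safeDecode(data: Array[Byte]): FoodOrder =
    Try(unsafeDecode(data)).getOrElse(FoodOrderError)

  // Downstream step: keep the valid orders and count the sentinel values.
  def partition(records: Seq[Array[Byte]]): (Seq[FoodOrder], Int) = {
    val (errors, orders) = records.map(safeDecode).partition(_ == FoodOrderError)
    (orders, errors.size)
  }
}
```

The error count returned by `partition` plays the role of the custom metric. As the slide notes, the sentinel carries no detail about why decoding failed.
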
  39. // Exercise #4: Dinner

    sealed trait FoodOrder

    case class Dinner(dish: Command, zone: String, moment: Moment, maybeClient: Option[Client]) extends FoodOrder
  40. Dead Letter Queue Pattern

    In this method we will let the deserializer fail. For each failure we will send a message to a topic containing corrupted messages. Each message will have the original content of the input message (for reprocessing) and additional metadata about the failure. With Kafka Streams this can be achieved by implementing a DeserializationExceptionHandler.

    Diagram labels: input topic, Streaming APP, output topic, dead letter queue
  44. class DeadLetterQueueFoodExceptionHandler() extends DeserializationExceptionHandler {

      var topic: String = _
      var producer: KafkaProducer[Array[Byte], Array[Byte]] = _

      override def configure(configs: util.Map[String, _]): Unit = ???

      override def handle(context: ProcessorContext,
                          record: ConsumerRecord[Array[Byte], Array[Byte]],
                          exception: Exception): DeserializationHandlerResponse = {

        // Keep the original headers and append metadata about the failure
        val headers = record.headers().toArray ++ Array[Header](
          new RecordHeader("processing-time", ???),
          new RecordHeader("hexa-datetime", ???),
          new RecordHeader("error-message", ???),
          /* ... */
        )

        val producerRecord = new ProducerRecord(topic, /*same key, value and ts,*/ headers.asJava)
        producer.send(producerRecord, /* Producer Callback */ )

        DeserializationHandlerResponse.CONTINUE
      }
    }
  45. Fill the headers with some metadata

    • Value message to hexa: 01061696e0016536f6d6500000005736f6d65206f
    • Restaurant description
    • Event date and time
    • Food order category
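
The header metadata listed above could be prepared along these lines. This is a sketch under assumptions: `toHex` and `errorHeaders` are hypothetical helpers, the `hexa-value` header name is invented for illustration, and the result is a plain `Map` rather than Kafka header objects, so the snippet runs without the Kafka client library.

```scala
import java.nio.charset.StandardCharsets

object DeadLetterHeadersSketch {

  // Render raw bytes as a lowercase hexadecimal string, two digits per byte.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b & 0xff)).mkString

  // Build header name -> header value (UTF-8 bytes) for a failed record.
  def errorHeaders(value: Array[Byte], error: Throwable, nowMillis: Long): Map[String, Array[Byte]] =
    Map(
      "processing-time" -> nowMillis.toString,
      "hexa-value"      -> toHex(value), // hypothetical header name
      "error-message"   -> String.valueOf(error.getMessage)
    ).map { case (name, text) => name -> text.getBytes(StandardCharsets.UTF_8) }
}
```

In a real handler, each entry would be wrapped in a `RecordHeader(name, bytes)` and appended to the original record's headers before producing to the dead letter topic.
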
  48. Dead Letter Queue Pattern

    You can provide your own implementation of DeserializationExceptionHandler. This lets you use the Producer API to write a corrupted record directly to a quarantine topic. Then you can manually analyse your corrupted records.

    ⚠ Warning: This approach has side effects that are invisible to the Kafka Streams runtime.
  50. Related Posts

    - Kafka Connect Deep Dive – Error Handling and Dead Letter Queues - by Robin Moffatt
    - Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka - by Ning Xia
    - Handling bad messages using Kafka's Streams API - answer by Matthias J. Sax
  51. Conclusion

    When using Kafka, deserialization is the responsibility of the clients. These internal errors are not easy to catch.

    When it’s possible, use Avro + Schema Registry.

    When it’s not possible, Kafka Streams offers techniques to deal with serde errors:

    - DLQ: by extending an ExceptionHandler
    - Sentinel Value: by extending a Deserializer
  52. Images

    - Photo by rawpixel on Unsplash
    - Photo by João Marcelo Martins on Unsplash
    - Photo by Jordane Mathieu on Unsplash
    - Photo by Brooke Lark on Unsplash
    - Photo by Jakub Kapusnak on Unsplash
    - Photo by Melissa Walker Horn on Unsplash
    - Photo by Aneta Pawlik on Unsplash
  53. Diagram labels: Me, clicking everywhere; Pure HTML; Akka Http Server; Akka Actor System; Akka Stream Kafka; Kafka Topic; Exercise1; Exercise2