
Processor API: The dark side of Kafka-Streams

Presentation of the state stores in Kafka-Streams and how to use them through the Processor API

Loïc DIVAD

March 12, 2018

Transcript

  1. Processor API The dark side of Kafka-Streams 1

  2. Loïc DIVAD Data Engineer @XebiaFr @loicmdivad 2

  3. APACHE KAFKA Producers Consumers Kafka Broker-1 Broker-1 Broker-1 topic 3

  4. KAFKA STREAMS 4 Kafka Streams

  5. > println(sommaire) 1. Use Case: Stream Fighter 2. Streams DSL:

    the principal API 3. Processor API: the dark side ... 4. The Token Provider 5
  6. USE CASE: STREAM FIGHTER 6

  7. USE CASE: STREAM FIGHTER 7 Round event: { , , }
  8. USE CASE: STREAM FIGHTER PRO:

    Character     Victories
    Ryu           32
    Chun-Li       28
    Skullomania   14
    8
  9. STREAMS DSL: THE PRINCIPAL API 9

  10. KAFKA STREAM, STREAMS DSL 1. Simple 2. Expressive 3. Declarative

    The Kafka Streams DSL is built on top of the Streams Processor API. It is recommended for most users… Most data processing operations can be expressed in just a few lines of DSL code. 10
  11. KAFKA STREAM, STREAMS DSL

    StreamsBuilder builder = new StreamsBuilder();
    GlobalKTable<String, Arena> arenaTable = builder.globalTable("ARENAS", /* */);
    KStream<String, Round> rounds = builder.stream("ROUNDS", /* */);
    rounds
      .filter((String arenaId, Round round) -> round.getGame() == StreetFighter)
      .mapValues((Round round) -> round.getWinner())
      .join(arenaTable, (arena, player) -> arena, Victory::new)
      .selectKey(Parsing::extractConceptAndCharacter) //{arena: …, character: …}
      .groupByKey().windowedBy(window).count(/* */); 11
  12. STATE-LESS OPERATIONS Or element-wise computation: the processing of

    a single element in isolation .filter() .map() 12
  13. STATE-FULL OPERATIONS Aggregations require state to combine multiple elements

    together .groupBy() 13
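The difference can be sketched in plain Java, outside Kafka entirely: a stateless filter looks at each element in isolation, while a count needs a store that survives across elements. The class and method names below are illustrative, not Kafka Streams API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatefulSketch {

    // Stateless: each element is processed in isolation, like .filter()
    static boolean isStreetFighter(String game) {
        return game.equals("StreetFighter");
    }

    // Stateful: counting winners needs a store that lives across elements,
    // conceptually what .groupByKey().count() keeps in a state store
    static Map<String, Long> countByWinner(List<String> winners) {
        Map<String, Long> store = new HashMap<>(); // the "state"
        for (String winner : winners) {
            store.merge(winner, 1L, Long::sum);
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countByWinner(List.of("Ryu", "Chun-Li", "Ryu"));
        System.out.println(counts.get("Ryu")); // 2
    }
}
```

In the DSL that per-key state lives in a changelog-backed store; the HashMap here only stands in for it.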
  14. PROCESSOR API: THE OTHER API 14

  15. KAFKA STREAM, PROCESSOR API With the Processor API, you can

    define arbitrary stream processors that process one received record at a time, and connect these processors with their associated state stores to compose a topology that represents a customized processing logic. 1. KStream & KTable 2. .map() / .filter() 3. .count() / .reduce() / .leftJoin() 4. Direct access to the state stores 15
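As a rough illustration of that idea — not the real org.apache.kafka.streams.processor interfaces, only a plain-Java toy model — each node handles one record at a time and forwards it downstream, and connecting nodes composes a topology:

```java
import java.util.ArrayList;
import java.util.List;

public class TopologySketch {

    // Simplified stand-in for a stream processor: one record at a time
    interface MiniProcessor {
        void process(String key, String value);
    }

    // A sink node that just collects whatever reaches it
    static class CollectSink implements MiniProcessor {
        final List<String> received = new ArrayList<>();
        public void process(String key, String value) {
            received.add(key + "=" + value);
        }
    }

    // A processing node: filters, then hands the record to its child,
    // conceptually what context.forward() does in the real API
    static class FilterNode implements MiniProcessor {
        final MiniProcessor downstream;
        FilterNode(MiniProcessor downstream) { this.downstream = downstream; }
        public void process(String key, String value) {
            if (value.equals("StreetFighter")) {
                downstream.process(key, value);
            }
        }
    }

    static List<String> run(List<String[]> records) {
        CollectSink sink = new CollectSink();
        MiniProcessor source = new FilterNode(sink); // compose the topology
        for (String[] r : records) source.process(r[0], r[1]);
        return sink.received;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(
            new String[]{"arena1", "StreetFighter"},
            new String[]{"arena2", "Tekken"})));
    }
}
```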
  16. PROCESSOR TOPOLOGY 16

  17. PROCESSOR CONTEXT A processor context provides: • the metadata

    of the currently processed record • the source Kafka topic and partition • access to the state stores • the ability to forward records downstream A stream processor is a node in the processor topology that represents a single processing step. Processor State Store Input / Output Stream 17
  18. PROCESSOR TOPOLOGY .addSource("ARENAS" ... .addSource("ROUNDS" ... 18

  19. ROCKSDB RocksDB is a C++ library providing an embedded key-value

    store, where keys and values are arbitrary byte streams. RocksJava is a project to build a high-performance but easy-to-use Java driver for RocksDB. /tmp/kafka-streams/<application-id> ├── 1_0 ├── 2_0 │ └── VICTORIES-STORE │ └── VICTORIES-STORE:1518220800000 └── global └── rocksdb └── ARENASTATE-STORE-0000000000 <application-id>-ARENAS-STORE-changelog 19 KIP-67: Queryable state for Kafka Streams
  20. PROCESS-ARENA HAS NO ACCESS TO STATESTORE ARENA-STORE Exception in thread

    "XKE-KSTREAM-PROC-0170b98b-55b6-42b7-93d3-5c3dd734afb4-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: failed to initialize processor PROCESS-ARENA at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:106) at org.apache.kafka.streams.processor.internals.StreamTask.initTopology(StreamTask.java:378) at org.apache.kafka.streams.processor.internals.StreamTask.initializeTopology(StreamTask.java:169) at org.apache.kafka.streams.processor.internals.AssignedTasks.transitionToRunning(AssignedTasks.java:292) at org.apache.kafka.streams.processor.internals.AssignedTasks.initializeNewTasks(AssignedTasks.java:126) at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:260) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:813) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744) Caused by: org.apache.kafka.streams.errors.TopologyBuilderException: Invalid topology building: Processor PROCESS-ARENA has no access to StateStore ARENA-STORE at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:72) at fr.xebia.ldi.fighter.stream.processor.ProcessArena.init(ProcessArena.java:23) at org.apache.kafka.streams.processor.internals.ProcessorNode$2.run(ProcessorNode.java:54) at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:104) ... 8 more 20
  21. PROCESSOR TOPOLOGY .addSource("ARENAS" ... .addSource("ROUNDS" ... 21

  22. PUNCTUATION TYPE With stream time, triggering of .punctuate() is

    driven by the arrival of messages. Stream time, 10sec 0s 30s 40s Wall clock time, 10sec 0s 40s When wall-clock time is used, .punctuate() is triggered purely by the wall-clock time. KIP-138: Change punctuate semantics this.context.schedule( 10, PunctuationType.STREAM_TIME, /**/); this.context.schedule( 10, PunctuationType.WALL_CLOCK_TIME, /**/); 22
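The stream-time semantics can be simulated in a few lines of plain Java (an illustration of the KIP-138 behavior, not the real scheduler): punctuations fire only when record timestamps push stream time past the next deadline, so with no traffic nothing fires.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuateSketch {

    // Simulates PunctuationType.STREAM_TIME: time only advances
    // with the timestamps of incoming records
    static List<Long> streamTimePunctuations(long intervalMs, List<Long> recordTimestamps) {
        List<Long> fired = new ArrayList<>();
        long nextDeadline = intervalMs;
        for (long ts : recordTimestamps) {
            // catch up: fire once per elapsed interval, like .punctuate()
            while (ts >= nextDeadline) {
                fired.add(nextDeadline);
                nextDeadline += intervalMs;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // records at 5s, 12s and 31s with a 10s interval:
        // punctuate fires for the 10s, 20s and 30s deadlines,
        // but only once records carrying those timestamps arrive
        System.out.println(streamTimePunctuations(10_000L,
            List.of(5_000L, 12_000L, 31_000L)));
    }
}
```

A wall-clock punctuator, by contrast, would fire every 10 seconds of real time regardless of traffic.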
  23. PROCESSOR TOPOLOGY .addSource("ARENAS" ... .addSource("ROUNDS" ... context.forward(... 23

  24. PROCESSOR TOPOLOGY .addSource("ARENAS" ... .addSource("ROUNDS" ... context.forward(... 24

  25. THINK DISTRIBUTED 25 Ken PRO Ken PRO +1 +1

  26. THINK DISTRIBUTED 26 +2 Ken PRO Ken PRO

  27. PROCESSOR TOPOLOGY .addSource("ARENAS" ... .addSource("ROUNDS" ... context.forward(... 27

  28. PROCESSOR TOPOLOGY .addSource("ARENAS" ... .addSource("ROUNDS" ... context.forward(... this.store.put( ... .addSink(

    "... 28
  29. THINK DISTRIBUTED 29

  30. THINK DISTRIBUTED 30

    builder
      .addSource("ARENAS-SRC", /*serde*/ , "ARENAS")
      .addProcessor("PROCESS", ProcessArena::new, "ARENA")

    builder
      .addGlobalStore(
        arenaStoreBuilder, /*src name*/,
        Serdes.String().deserializer(), arenaSerde.deserializer(),
        /*topic*/, /*processor*/, ProcessArena::new);

    KIP-233: SIMPLIFY .addGlobalStateStore()
  31. PROCESSOR TOPOLOGY .addSource("ARENAS" ... .addSource("ROUNDS" ... context.forward(... this.store.put( ... .addSink(

    "... 31
  32. THE TOKEN PROVIDER 32

  33. TOKEN CORRELATOR 33 {"token": "....

  34. TOKEN CORRELATOR 34 {"token": "001", ....

  35. TOKEN CORRELATOR Kafka Streams Rounds Input Tokens Input Tokens State

    Changelog Output gift 35
  36. TOKEN CORRELATOR

    KStream<String, Round> rounds = builder.stream("ROUNDS",
        Consumed.with(Serdes.String(), roundSerde, new EventTimeExtractor(), LATEST));
    rounds
      .filter((arenaId, round) -> round.getGame() == StreetFighter)
      .filter((arenaId, round) -> round.getWinner().getCombo() >= 5)
      .filter((arenaId, round) -> round.getWinner().getLife() >= 75)
      .through("ONE-PARTITION-WINNERS-TOPIC")
      .transform(ProcessToken::new, "TOKEN-STORE")
      .to("PROVIDED-TOKENS"); 36
  37. TOKEN CORRELATOR

    KStream<String, Round> rounds = builder.stream("ROUNDS",
        Consumed.with(Serdes.String(), roundSerde, new EventTimeExtractor(), LATEST));
    rounds
      .filter((arenaId, round) -> round.getGame() == StreetFighter)
      .filter((arenaId, round) -> round.getWinner().getCombo() >= 5)
      .filter((arenaId, round) -> round.getWinner().getLife() >= 75)
      .through("ONE-PARTITION-WINNERS-TOPIC")
      .transform( , "TOKEN-STORE")
      .to("PROVIDED-TOKENS"); 37
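Conceptually, the .transform() step pairs each qualified winner with a token held in the state store. A plain-Java sketch of that correlation (the class below is a hypothetical stand-in for ProcessToken backed by TOKEN-STORE; the real implementation is in the linked repo):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TokenCorrelatorSketch {

    // Stands in for the TOKEN-STORE: tokens waiting to be handed out
    private final Deque<String> tokenStore = new ArrayDeque<>();

    // Tokens arrive on their own input and are buffered in the store
    void addToken(String token) {
        tokenStore.addLast(token);
    }

    // Stands in for the transform step: a winner consumes one token,
    // or gets nothing when the store is empty
    String transform(String winner) {
        String token = tokenStore.pollFirst();
        return token == null ? null : winner + ":" + token;
    }

    public static void main(String[] args) {
        TokenCorrelatorSketch correlator = new TokenCorrelatorSketch();
        correlator.addToken("001");
        System.out.println(correlator.transform("Ryu"));     // Ryu:001
        System.out.println(correlator.transform("Chun-Li")); // null: no token left
    }
}
```

Routing winners through "ONE-PARTITION-WINNERS-TOPIC" first is what makes this safe: with a single partition, one task owns the whole token store.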
  38. CONCLUSION 38

  39. THANK YOU 39

  40. ANNEXES 40

  41. MEMORY MANAGEMENT 1x Kafka Producer : buffer.memory 2x Kafka Consumer

    : consumer.memory.bytes TCP send / receive buffers (100K) : • send.buffer.bytes • receive.buffer.bytes Deserialized Objects Buffering : • buffered.records.per.partition • buffered.bytes.per.partition Persistent State Store Buffering & Triggering based Caches : • cache.max.bytes.buffering • = write_buffer_size + max_write_buffer_number ? ( ) coming soon 0_0 0_1 41 Discussion: Memory Management
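The buffering caches above are set through the usual Streams configuration properties; a minimal sketch, assuming example sizes (the keys are real Streams config names, the values are arbitrary):

```java
import java.util.Properties;

public class CacheConfigSketch {

    // Builds the properties a KafkaStreams instance would be given;
    // values here are illustrative, not recommendations
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "xke-stream-fighter");   // example app id
        props.put("cache.max.bytes.buffering", "10485760");  // 10 MB record cache
        props.put("buffered.records.per.partition", "1000"); // deserialized buffering
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("cache.max.bytes.buffering"));
    }
}
```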
  42. Resources - code / docs DivLoic/xke-stream-fighter 42 Processor API Documentation —

    Confluent Docs
  43. Resources - images 43 ➔ A few Unsplash pictures: ◆

    Photo by Caleb George on Unsplash ◆ Photo by Barn Images on Unsplash ◆ Photo by Rebecca Oliver on Unsplash ◆ Photo by Ian Schneider on Unsplash ➔ The 15 min GIPHY search: ◆ Brent Rambo ◆ The sad boy from Neverland ➔ The 5 min Google image search: ◆ All Character Select Themes from TaciturnArtist ◆ Ryan Hemsworth, the Street Fighter II remix ➔ A bunch of iOS emojis