
Processor API: The dark side of Kafka-Streams

Presentation of the state stores in Kafka Streams and how to use them through the Processor API

Loïc DIVAD

March 12, 2018

Transcript

  1. > println(sommaire)

    1. Use Case: Stream Fighter
    2. Streams DSL: the principal API
    3. Processor API: the dark side ...
    4. The Token Provider
  2. KAFKA STREAMS, STREAMS DSL

    1. Simple
    2. Expressive
    3. Declarative

    The Kafka Streams DSL is built on top of the Streams Processor API. It is recommended for most users… Most data processing operations can be expressed in just a few lines of DSL code.
  3. KAFKA STREAMS, STREAMS DSL

        StreamsBuilder builder = new StreamsBuilder();
        GlobalKTable<String, Arena> arenaTable = builder.globalTable("ARENAS", /* */);
        KStream<String, Round> rounds = builder.stream("ROUNDS", /* */);

        rounds
            .filter((String arenaId, Round round) -> round.getGame() == StreetFighter)
            .mapValues((Round round) -> round.getWinner())
            .join(arenaTable, (arenaId, player) -> arenaId, Victory::new)
            .selectKey(Parsing::extractConceptAndCharacter) // {arena: …, character: …}
            .groupByKey()
            .windowedBy(window)
            .count(/* */);
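    To make the semantics of this pipeline concrete without a broker, here is a plain-Java sketch that replays a few in-memory rounds through the same steps: filter on the game, keep the winner, key by arena and character, and count per one-minute tumbling window. It is a model, not Kafka Streams code: `Round`, `Game`, the `arena|character` key format and the window size are hypothetical, and the join against the `ARENAS` GlobalKTable is left out.

    ```java
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // In-memory model of filter -> mapValues -> selectKey -> groupByKey -> windowedBy -> count.
    // Round, Game and the "arena|character" key are hypothetical stand-ins for the deck's classes.
    class WindowedVictoryCount {

        enum Game { STREET_FIGHTER, OTHER }

        record Round(String arenaId, Game game, String winner, long timestampMs) {}

        static Map<String, Long> count(List<Round> rounds, long windowSizeMs) {
            Map<String, Long> counts = new TreeMap<>();
            for (Round round : rounds) {
                if (round.game() != Game.STREET_FIGHTER) {
                    continue;                                          // .filter(...)
                }
                String winner = round.winner();                        // .mapValues(...): keep the winner
                String key = round.arenaId() + "|" + winner;           // .selectKey(...)
                long windowStart = round.timestampMs() - round.timestampMs() % windowSizeMs; // tumbling window
                counts.merge(key + "@" + windowStart, 1L, Long::sum);  // .groupByKey().windowedBy(...).count()
            }
            return counts;
        }

        static List<Round> sampleRounds() {
            return List.of(
                new Round("arena-1", Game.STREET_FIGHTER, "Ryu", 1_000L),
                new Round("arena-1", Game.STREET_FIGHTER, "Ryu", 30_000L),
                new Round("arena-1", Game.OTHER, "Ryu", 40_000L),
                new Round("arena-2", Game.STREET_FIGHTER, "Ken", 61_000L));
        }

        public static void main(String[] args) {
            // One-minute tumbling windows.
            System.out.println(count(sampleRounds(), 60_000L)); // {arena-1|Ryu@0=2, arena-2|Ken@60000=1}
        }
    }
    ```

    In the real topology these counts are kept in a windowed state store rather than a `TreeMap`.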
  4. STATELESS OPERATIONS

    Or element-wise computation: the processing of a single element in isolation, e.g. .filter() and .map().
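    A stateless operator can be modeled as a pure function applied to each record on its own. The plain-Java sketch below (the `filterMap` helper is hypothetical) chains a filter and a map; nothing survives from one record to the next, which is why such operations need no state store.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Function;
    import java.util.function.Predicate;

    // Stateless (element-wise) operators: each output depends only on the current record.
    class StatelessOps {

        // Applies a filter, then a map, to each record independently; no state is kept between records.
        static <V, R> List<R> filterMap(List<V> records, Predicate<V> filter, Function<V, R> mapper) {
            List<R> out = new ArrayList<>();
            for (V record : records) {
                if (filter.test(record)) {
                    out.add(mapper.apply(record));
                }
            }
            return out;
        }

        public static void main(String[] args) {
            List<Integer> combos = List.of(3, 7, 5, 1);
            // The decision for 7 never looks at 3 or 5.
            System.out.println(filterMap(combos, c -> c >= 5, c -> "combo x" + c)); // [combo x7, combo x5]
        }
    }
    ```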
  5. KAFKA STREAMS, PROCESSOR API

    With the Processor API, you can define arbitrary stream processors that process one received record at a time, and connect these processors with their associated state stores to compose a topology that represents customized processing logic.

    1. KStream & KTable
    2. .map() / .filter()
    3. .count() / .reduce() / .leftJoin()
    4. Direct access to the state stores
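    The "one record at a time, with direct store access" idea can be shown with a toy processor. `ToyProcessor` and `VictoryCounter` are hypothetical and only mimic the `init`/`process` shape of Kafka's `Processor` interface; the state store is a `HashMap` and the downstream is a list. Within a single `process` call the processor reads the current count, writes the new one back and forwards a result.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy, single-threaded model of a processor node with direct access to a key-value store.
    // ToyProcessor is hypothetical; it is not Kafka's Processor interface.
    class ToyProcessorDemo {

        interface ToyProcessor<K, V> {
            void init(Map<K, Long> store, List<String> downstream);
            void process(K key, V value);
        }

        // Counts victories per character and forwards "character=count" downstream.
        static class VictoryCounter implements ToyProcessor<String, String> {
            private Map<String, Long> store;
            private List<String> downstream;

            @Override
            public void init(Map<String, Long> store, List<String> downstream) {
                this.store = store;
                this.downstream = downstream;
            }

            @Override
            public void process(String character, String arena) {
                long updated = store.getOrDefault(character, 0L) + 1; // read from the store
                store.put(character, updated);                         // write back to the store
                downstream.add(character + "=" + updated);             // forward a result
            }
        }

        static List<String> run(List<String[]> records) {
            Map<String, Long> store = new HashMap<>();
            List<String> out = new ArrayList<>();
            VictoryCounter counter = new VictoryCounter();
            counter.init(store, out);
            for (String[] r : records) {
                counter.process(r[0], r[1]); // one record at a time
            }
            return out;
        }

        static List<String[]> sampleRecords() {
            return List.of(
                new String[] {"Ryu", "arena-1"},
                new String[] {"Ken", "arena-2"},
                new String[] {"Ryu", "arena-2"});
        }

        public static void main(String[] args) {
            System.out.println(run(sampleRecords())); // [Ryu=1, Ken=1, Ryu=2]
        }
    }
    ```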
  6. PROCESSOR CONTEXT

    A processor context provides:
    • the metadata of the currently processed record
    • the source Kafka topic and partition
    • access to the state stores
    • the ability to forward records downstream

    A stream processor is a node in the processor topology that represents a single processing step.

    [Diagram: Processor, State Store, Input / Output Stream]
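    A plain-Java model of the services listed above. `ToyContext` is hypothetical; in Kafka Streams the corresponding methods on `ProcessorContext` include `topic()`, `partition()`, `offset()`, `getStateStore(...)` and `forward(...)`. Store lookup is left out to keep the sketch short.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiConsumer;

    // Toy model of a processor context: metadata of the current record
    // plus a way to forward results to child nodes. Hypothetical, not Kafka's class.
    class ToyContext {
        private String topic;
        private int partition;
        private long offset;
        private final List<BiConsumer<String, String>> children = new ArrayList<>();

        void addChild(BiConsumer<String, String> child) { children.add(child); }

        // Set by the (hypothetical) runtime before each call to process().
        void setRecordMetadata(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }

        String topic() { return topic; }
        int partition() { return partition; }
        long offset() { return offset; }

        // Sends a key/value pair to every downstream node.
        void forward(String key, String value) {
            for (BiConsumer<String, String> child : children) {
                child.accept(key, value);
            }
        }

        // Tags a value with the metadata of the record being processed, then forwards it.
        static List<String> demo() {
            ToyContext ctx = new ToyContext();
            List<String> sink = new ArrayList<>();
            ctx.addChild((k, v) -> sink.add(k + ":" + v));
            ctx.setRecordMetadata("ROUNDS", 1, 42L);
            ctx.forward("arena-1", "Ryu@" + ctx.topic() + "-" + ctx.partition() + "#" + ctx.offset());
            return sink;
        }

        public static void main(String[] args) {
            System.out.println(demo()); // [arena-1:Ryu@ROUNDS-1#42]
        }
    }
    ```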
  7. ROCKSDB

    RocksDB is a C++ library providing an embedded key-value store, where keys and values are arbitrary byte streams. RocksJava is a project to build a high-performance but easy-to-use Java driver for RocksDB.

        /tmp/kafka-streams/<application-id>
        ├── 1_0
        ├── 2_0
        │   └── VICTORIES-STORE
        │       └── VICTORIES-STORE:1518220800000
        └── global
            └── rocksdb
                └── ARENASTATE-STORE-0000000000

        <application-id>-ARENAS-STORE-changelog

    KIP-67: Queryable state for Kafka Streams
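    The names on this slide are assembled from a few parts. The helper below (plain Java; the `StateLayout` class and the `my-app` application id are hypothetical) builds them as the slide shows: a task directory named `<subtopology>_<partition>` under `<state dir>/<application-id>`, and a changelog topic named `<application-id>-<store>-changelog`. Treat it as a reading aid for the layout shown here, not a stable contract; it does not model the `VICTORIES-STORE:<timestamp>` segment directories of windowed stores.

    ```java
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Rebuilds the directory and topic names shown on the slide from their parts.
    class StateLayout {

        // A task id such as "2_0": "<subtopologyId>_<partition>".
        static String taskId(int subtopology, int partition) {
            return subtopology + "_" + partition;
        }

        // <state dir>/<application-id>/<task id>/<store name>
        static Path storeDir(String stateDir, String applicationId, String taskId, String storeName) {
            return Paths.get(stateDir, applicationId, taskId, storeName);
        }

        // <application-id>-<store name>-changelog
        static String changelogTopic(String applicationId, String storeName) {
            return applicationId + "-" + storeName + "-changelog";
        }

        public static void main(String[] args) {
            String app = "my-app"; // hypothetical application.id
            System.out.println(storeDir("/tmp/kafka-streams", app, taskId(2, 0), "VICTORIES-STORE"));
            System.out.println(changelogTopic(app, "ARENAS-STORE"));
        }
    }
    ```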
  8. PROCESS-ARENA HAS NO ACCESS TO STATESTORE ARENA-STORE

        Exception in thread "XKE-KSTREAM-PROC-0170b98b-55b6-42b7-93d3-5c3dd734afb4-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: failed to initialize processor PROCESS-ARENA
            at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:106)
            at org.apache.kafka.streams.processor.internals.StreamTask.initTopology(StreamTask.java:378)
            at org.apache.kafka.streams.processor.internals.StreamTask.initializeTopology(StreamTask.java:169)
            at org.apache.kafka.streams.processor.internals.AssignedTasks.transitionToRunning(AssignedTasks.java:292)
            at org.apache.kafka.streams.processor.internals.AssignedTasks.initializeNewTasks(AssignedTasks.java:126)
            at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:260)
            at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:813)
            at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
            at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
        Caused by: org.apache.kafka.streams.errors.TopologyBuilderException: Invalid topology building: Processor PROCESS-ARENA has no access to StateStore ARENA-STORE
            at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:72)
            at fr.xebia.ldi.fighter.stream.processor.ProcessArena.init(ProcessArena.java:23)
            at org.apache.kafka.streams.processor.internals.ProcessorNode$2.run(ProcessorNode.java:54)
            at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
            at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:104)
            ... 8 more
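    This exception is raised when a processor asks its context for a store that was never connected to it. Below is a toy model of that check in plain Java; the `ToyTopology` class is hypothetical and is not Kafka's `Topology`. In the real API the connection is declared while building the topology, for example with `Topology.addStateStore(storeBuilder, "PROCESS-ARENA")` or `Topology.connectProcessorAndStateStores(...)`.

    ```java
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Toy model of the "Processor X has no access to StateStore Y" check:
    // a processor may only look up stores that were explicitly connected to it.
    class ToyTopology {
        private final Map<String, Set<String>> connections = new HashMap<>();
        private final Map<String, Map<String, String>> stores = new HashMap<>();

        void addProcessor(String processor) {
            connections.putIfAbsent(processor, new HashSet<>());
        }

        void addStore(String store) {
            stores.putIfAbsent(store, new HashMap<>());
        }

        // Declares that a processor is allowed to use a store.
        void connect(String processor, String store) {
            connections.get(processor).add(store);
        }

        Map<String, String> getStateStore(String processor, String store) {
            if (!connections.getOrDefault(processor, Set.of()).contains(store)) {
                throw new IllegalStateException(
                    "Processor " + processor + " has no access to StateStore " + store);
            }
            return stores.get(store);
        }

        // Returns "ok" if the lookup succeeds, otherwise the error message.
        static String demo(boolean connectFirst) {
            ToyTopology t = new ToyTopology();
            t.addProcessor("PROCESS-ARENA");
            t.addStore("ARENA-STORE");
            if (connectFirst) {
                t.connect("PROCESS-ARENA", "ARENA-STORE");
            }
            try {
                t.getStateStore("PROCESS-ARENA", "ARENA-STORE");
                return "ok";
            } catch (IllegalStateException e) {
                return e.getMessage();
            }
        }

        public static void main(String[] args) {
            System.out.println(demo(false)); // Processor PROCESS-ARENA has no access to StateStore ARENA-STORE
            System.out.println(demo(true));  // ok
        }
    }
    ```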
  9. PUNCTUATION TYPE

    With stream time, the triggering of .punctuate() is driven by the arrival of messages. When wall-clock time is used, .punctuate() is triggered purely by the wall-clock time.

    [Timeline diagrams: stream time with a 10 sec interval (0s, 30s, 40s); wall-clock time with a 10 sec interval (0s to 40s)]

        this.context.schedule(10, PunctuationType.STREAM_TIME, /**/);
        this.context.schedule(10, PunctuationType.WALL_CLOCK_TIME, /**/);

    KIP-138: Change punctuate semantics
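    A rough simulation of the two punctuation types, in plain Java with times in seconds. For illustration it assumes that stream-time punctuation is only checked when a record arrives, that missed intervals collapse into a single call, and that the next punctuation is aligned on the interval grid; the real scheduler tracks stream time per task across partitions. Note also that in the 1.x API the first argument of `schedule` is an interval in milliseconds, so the `10` above means 10 ms.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Simplified model of STREAM_TIME vs WALL_CLOCK_TIME punctuation (times in seconds).
    class PunctuationModel {

        // STREAM_TIME: only advances when records arrive; missed intervals collapse into one call.
        static List<Long> streamTime(List<Long> recordTimestamps, long interval) {
            List<Long> fired = new ArrayList<>();
            long streamTime = Long.MIN_VALUE;
            long next = 0;
            for (long ts : recordTimestamps) {
                streamTime = Math.max(streamTime, ts); // stream time never goes backwards
                if (streamTime >= next) {
                    fired.add(streamTime);
                    next = (streamTime / interval + 1) * interval; // next point on the interval grid
                }
            }
            return fired;
        }

        // WALL_CLOCK_TIME: fires every interval, whether or not records arrive.
        static List<Long> wallClock(long start, long end, long interval) {
            List<Long> fired = new ArrayList<>();
            for (long t = start; t <= end; t += interval) {
                fired.add(t);
            }
            return fired;
        }

        public static void main(String[] args) {
            // Records arrive at 0 s, 30 s and 40 s; nothing in between.
            System.out.println(streamTime(List.of(0L, 30L, 40L), 10)); // [0, 30, 40]
            System.out.println(wallClock(0, 40, 10));                  // [0, 10, 20, 30, 40]
        }
    }
    ```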
  10. THINK DISTRIBUTED

        builder
            .addSource("ARENAS-SRC", /*serde*/, "ARENAS")
            .addProcessor("PROCESS", ProcessArena::new, "ARENAS-SRC");

        builder
            .addGlobalStore(arenaStoreBuilder, /*src name*/,
                Serdes.String().deserializer(), arenaSerde.deserializer(),
                /*topic*/, /*processor*/, ProcessArena::new);

    KIP-233: Simplify StreamsBuilder#addGlobalStore
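    One way to read this slide: a store fed by `addSource`/`addProcessor` only holds the keys of the partitions its task owns, while a global store is fully replicated to every instance. The plain-Java model below (the `LocalVsGlobal` class and the arena names are hypothetical) makes that visible. Its hash-modulo partitioner is a simplification; Kafka's default partitioner hashes keys with murmur2.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Local (partitioned) stores vs a global (replicated) store.
    class LocalVsGlobal {

        // Simplified partitioner: not Kafka's murmur2-based default.
        static int partitionFor(String key, int partitions) {
            return Math.floorMod(key.hashCode(), partitions);
        }

        // One local store per partition, each holding only the keys routed to it.
        static List<Map<String, String>> localStores(Map<String, String> arenas, int partitions) {
            List<Map<String, String>> stores = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                stores.add(new HashMap<>());
            }
            arenas.forEach((id, name) -> stores.get(partitionFor(id, partitions)).put(id, name));
            return stores;
        }

        public static void main(String[] args) {
            Map<String, String> arenas = Map.of("arena-1", "Suzaku Castle", "arena-2", "Sagat Temple");
            List<Map<String, String>> local = localStores(arenas, 2);
            Map<String, String> global = new HashMap<>(arenas); // every instance holds every arena
            System.out.println("local stores: " + local);
            System.out.println("global store: " + global);
        }
    }
    ```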
  11. TOKEN CORRELATOR

        KStream<String, Round> rounds = builder.stream("ROUNDS",
                Consumed.with(Serdes.String(), roundSerde, new EventTimeExtractor(), LATEST));

        rounds
            .filter((arenaId, round) -> round.getGame() == StreetFighter)
            .filter((arenaId, round) -> round.getWinner().getCombo() >= 5)
            .filter((arenaId, round) -> round.getWinner().getLife() >= 75)
            .through("ONE-PARTITION-WINNERS-TOPIC")
            .transform(ProcessToken::new, "TOKEN-STORE")
            .to("PROVIDED-TOKENS");
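    A hypothetical model of why the winners might go through `ONE-PARTITION-WINNERS-TOPIC` before the token transformer: each task owns its own copy of `TOKEN-STORE`, so a counter kept in that store only hands out unique tokens if a single task sees every winner. `TokenModel`, the round-robin partition assignment and the sequential tokens are assumptions for illustration; the deck does not show what `ProcessToken` actually does.

    ```java
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;

    // Hypothetical model: one token counter per task (per partition).
    class TokenModel {

        record Winner(String name, int combo, int life) {}

        // Same thresholds as the combo and life filters on the slide.
        static boolean qualifies(Winner w) {
            return w.combo() >= 5 && w.life() >= 75;
        }

        // Issues sequential tokens from one counter per partition.
        static List<Integer> issueTokens(List<Winner> winners, int partitions) {
            int[] counters = new int[partitions]; // one "TOKEN-STORE" per task
            List<Integer> tokens = new ArrayList<>();
            for (int i = 0; i < winners.size(); i++) {
                if (qualifies(winners.get(i))) {
                    int p = i % partitions; // round-robin stand-in for key-based partitioning
                    tokens.add(++counters[p]);
                }
            }
            return tokens;
        }

        static boolean allUnique(List<Integer> tokens) {
            return new HashSet<>(tokens).size() == tokens.size();
        }

        static List<Winner> sampleWinners() {
            return List.of(
                new Winner("Ryu", 6, 80), new Winner("Ken", 5, 90),
                new Winner("Chun-Li", 2, 100), new Winner("Guile", 7, 75));
        }

        public static void main(String[] args) {
            System.out.println(issueTokens(sampleWinners(), 1)); // [1, 2, 3]
            System.out.println(issueTokens(sampleWinners(), 2)); // [1, 1, 2]
        }
    }
    ```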
  12. TOKEN CORRELATOR

        KStream<String, Round> rounds = builder.stream("ROUNDS",
                Consumed.with(Serdes.String(), roundSerde, new EventTimeExtractor(), LATEST));

        rounds
            .filter((arenaId, round) -> round.getGame() == StreetFighter)
            .filter((arenaId, round) -> round.getWinner().getCombo() >= 5)
            .filter((arenaId, round) -> round.getWinner().getLife() >= 75)
            .through("ONE-PARTITION-WINNERS-TOPIC")
            .transform( , "TOKEN-STORE")
            .to("PROVIDED-TOKENS");
  13. MEMORY MANAGEMENT

    • 1x Kafka producer: buffer.memory
    • 2x Kafka consumers: consumer.memory.bytes
    • TCP send / receive buffers (100K): send.buffer.bytes, receive.buffer.bytes
    • Deserialized objects buffering: buffered.records.per.partition, buffered.bytes.per.partition
    • Persistent state store buffering: = write_buffer_size + max_write_buffer_number ?
    • Triggering-based caches: cache.max.bytes.buffering

    ( ) coming soon

    [Diagram: tasks 0_0 and 0_1]

    Discussion: Memory Management
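    Several of these knobs are ordinary configuration keys. Below is a sketch with `java.util.Properties` and illustrative values; the `producer.` and `consumer.` prefixes route a setting to the clients embedded in Kafka Streams. `consumer.memory.bytes` and `buffered.bytes.per.partition` from the slide are left out, and the RocksDB write buffers are not plain properties: they are set through a `RocksDBConfigSetter` class referenced by `rocksdb.config.setter`.

    ```java
    import java.util.Properties;

    // Memory-related Kafka Streams settings; values are illustrative, not recommendations.
    class MemoryConfig {

        static Properties build() {
            Properties props = new Properties();
            // Record caches that buffer and compact updates before they are forwarded or flushed.
            props.setProperty("cache.max.bytes.buffering", String.valueOf(10L * 1024 * 1024));
            // Maximum number of records buffered per input partition.
            props.setProperty("buffered.records.per.partition", "1000");
            // Memory of the embedded producer for records waiting to be sent.
            props.setProperty("producer.buffer.memory", String.valueOf(32L * 1024 * 1024));
            // TCP socket buffers of the embedded consumers.
            props.setProperty("consumer.send.buffer.bytes", String.valueOf(128 * 1024));
            props.setProperty("consumer.receive.buffer.bytes", String.valueOf(64 * 1024));
            // RocksDB write_buffer_size / max_write_buffer_number would be set in a
            // RocksDBConfigSetter implementation named by "rocksdb.config.setter" (not done here).
            return props;
        }

        public static void main(String[] args) {
            build().forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
    ```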
  14. Resources - images

    ➔ A few Unsplash pictures:
        ◆ Photo by Caleb George on Unsplash
        ◆ Photo by Barn Images on Unsplash
        ◆ Photo by Rebecca Oliver on Unsplash
        ◆ Photo by Ian Schneider on Unsplash
    ➔ The 15 min GIPHY search:
        ◆ Brent Rambo
        ◆ The sad boy from Neverland
    ➔ The 5 min Google image search:
        ◆ All Character Select Themes from TaciturnArtist
        ◆ Ryan Hemsworth, the Street Fighter II remix
    ➔ A bunch of iOS emojis