Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, Streams vs. Databases - Zurich Apache Kafka Meetup 19 September 2017

1 Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, Streams vs. Databases
An introduction to Kafka’s Streams API
Target audience: technical staff, developers, architects
Expected duration: 40 minutes

2 Apache Kafka: birthed as a messaging system, now a streaming platform
Timeline, 2012 to 2017:
0.7 Cluster mirroring
0.8 Intra-cluster replication
0.9 Data integration (Connect API)
0.10 Data processing (Streams API)
0.11 Exactly-once semantics

13 (Does NOT run inside the Kafka brokers!)

18 http://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html

20 Before

21 Before With Kafka’s Streams API

22
KStream<Integer, Integer> input = builder.stream("numbers-topic");
// Stateless computation
KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);
// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)
    .groupByKey()
    .reduce((v1, v2) -> v1 + v2, "sum-of-odds");
// Custom processor; interface methods must be declared public
class PrintToConsoleProcessor<K, V> implements Processor<K, V> {
    @Override
    public void init(ProcessorContext context) {}
    @Override
    public void process(K key, V value) {
        System.out.println("Got value " + value);
    }
    @Override
    public void punctuate(long timestamp) {}
    @Override
    public void close() {}
}
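To make the difference between the stateless and the stateful computation on this slide concrete, here is a minimal plain-Java sketch of the same two calculations on a small, finite list. It deliberately does not use the Kafka Streams API: the class name `SumOfOddsSketch` and the input values are assumptions made for illustration, and a real Streams application would process an unbounded topic and keep updating the sum as new records arrive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration only: this does not use the Kafka Streams API.
class SumOfOddsSketch {

    // Stateless step: each value is transformed on its own (like mapValues).
    static List<Integer> doubled(List<Integer> values) {
        return values.stream().map(v -> v * 2).collect(Collectors.toList());
    }

    // Stateful step: keep the odd values and fold them into a single sum
    // (like filter + reduce under one shared key).
    static int sumOfOdds(List<Integer> values) {
        return values.stream().filter(v -> v % 2 != 0).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5); // hypothetical input
        System.out.println(doubled(numbers));   // [2, 4, 6, 8, 10]
        System.out.println(sumOfOdds(numbers)); // 9
    }
}
```

The stateless step needs no memory of earlier records, whereas the sum has to remember everything seen so far, which is why the slide's version names a state store ("sum-of-odds").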

24 Linux Windows

30 http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
https://kafka.apache.org/documentation/streams#streams_duality
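The second link refers to the duality of streams and tables. As a rough illustration of that idea, the plain-Java sketch below replays a changelog into a map (stream to table) and turns the map back into change records (table to stream). It does not use Kafka; the class `DualitySketch`, the `Change` type, and the sample keys are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only: this does not use the Kafka Streams API.
class DualitySketch {

    // A change record: "the value for this key is now ..."
    static final class Change {
        final String key;
        final int value;
        Change(String key, int value) { this.key = key; this.value = value; }
    }

    // Stream -> table: replay the changelog; the latest value per key wins.
    static Map<String, Integer> toTable(List<Change> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Change c : changelog) {
            table.put(c.key, c.value);
        }
        return table;
    }

    // Table -> stream: emit one change record per row of the table.
    static List<Change> toStream(Map<String, Integer> table) {
        List<Change> stream = new ArrayList<>();
        for (Map.Entry<String, Integer> e : table.entrySet()) {
            stream.add(new Change(e.getKey(), e.getValue()));
        }
        return stream;
    }

    public static void main(String[] args) {
        List<Change> changelog = new ArrayList<>();
        changelog.add(new Change("alice", 1)); // hypothetical records
        changelog.add(new Change("bob", 1));
        changelog.add(new Change("alice", 2));
        Map<String, Integer> table = toTable(changelog);
        System.out.println(table); // {alice=2, bob=1}
        System.out.println(toTable(toStream(table)).equals(table)); // true
    }
}
```

Replaying the stream produced from the table yields the same table again, which is the round trip the duality describes.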

43 …and many more…

47 Kafka Streams API in the wild
2016: First release of Kafka’s Streams API (0.10.0.0)
2017 (today): Kafka 0.10.2.1 in production at LINE Corp., Japan
220+ million active users, processing millions of msg/s
“Applying Kafka Streams for internal message delivery pipeline”
https://engineering.linecorp.com/en/blog/detail/80

49 Supported since Apache Kafka 0.11 (June 2017)
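As a hedged sketch of how an application might opt in to this feature: in the 0.11 release line, a Streams application selects its processing guarantee through the `processing.guarantee` configuration setting. The example below only builds the configuration with `java.util.Properties`; the application id and broker address are placeholders, and the accepted values may differ in other Kafka versions, so check the documentation for the release you run.

```java
import java.util.Properties;

// Builds a Streams configuration only; it does not start an application.
class ExactlyOnceConfigSketch {

    static Properties streamsConfig() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");    // hypothetical name
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical address
        // Opt in to exactly-once processing; the default is "at_least_once".
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsConfig().getProperty("processing.guarantee"));
    }
}
```

These properties would then be passed to the Streams application at start-up; the processing logic itself does not need to change.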

58 …and more…

60 $ curl -sXGET http://localhost:7070/kafka-music/charts/top-five
[
  {
    "artist": "Subhumans",
    "album": "Live In A Dive",
    "name": "All Gone Dead",
    "plays": 126
  },
  {
    "artist": "Wheres The Pope?",
    "album": "PSI",
    "name": "Fear Of God",
    "plays": 115
  },
  ...
]

64 https://kafka.apache.org/documentation/streams
http://docs.confluent.io/current/streams/
https://www.confluent.io/downloads/

65 KSQL: a Streaming SQL Engine for Apache Kafka™ from Confluent
✓ No coding required, all you need is SQL
✓ No separate processing cluster required
✓ Powered by Kafka: elastic, scalable, distributed, battle-tested
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.userid
  WHERE u.level = 'Platinum';
KSQL is the simplest way to process streams of data in real-time
✓ Perfect for streaming ETL, anomaly detection, event monitoring, and more
✓ Part of Confluent Open Source
https://github.com/confluentinc/ksql
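To show what the tumbling-window query for possible fraud computes, the following plain-Java sketch counts authorization attempts per card number in fixed, non-overlapping 5-second windows and flags any card with more than 3 attempts in a single window. It is not how KSQL is implemented; the class `TumblingWindowSketch`, the input arrays, and the sample card numbers are assumptions for illustration, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java illustration only: this does not use KSQL or Kafka.
class TumblingWindowSketch {

    static final long WINDOW_MS = 5_000L; // SIZE 5 SECONDS

    // Returns the card numbers with more than 3 attempts inside any
    // single 5-second tumbling window.
    static Set<String> possibleFraud(String[] cards, long[] timestampsMs) {
        Map<String, Integer> counts = new HashMap<>(); // key: card + "@" + window start
        Set<String> flagged = new TreeSet<>();
        for (int i = 0; i < cards.length; i++) {
            // Tumbling windows are aligned to multiples of the window size.
            long windowStart = (timestampsMs[i] / WINDOW_MS) * WINDOW_MS;
            int count = counts.merge(cards[i] + "@" + windowStart, 1, Integer::sum);
            if (count > 3) { // HAVING count(*) > 3
                flagged.add(cards[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical attempts: card "1111" appears 4 times within [0 s, 5 s).
        String[] cards = {"1111", "1111", "1111", "1111", "2222", "1111"};
        long[] times = {0L, 1000L, 2000L, 4999L, 3000L, 5000L};
        System.out.println(possibleFraud(cards, times)); // [1111]
    }
}
```

Because the windows do not overlap, four attempts that straddle a window boundary (for example at 3 s, 4 s, 5 s, and 6 s) are counted as two per window and are not flagged. The KSQL table is keyed by card number and window, whereas this sketch collapses the result to a set of card numbers.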

More Decks by Michael G. Noll
