Building a Streaming Platform with Kafka

Pere Urbón
November 21, 2018


The need to integrate a swarm of systems has always been present in the history of IT; with the advent of microservices, big data, and IoT, that need has simply exploded. Through the exploration of a few use cases, this presentation introduces stream processing, a powerful and scalable way to transform and connect the applications around your business.

In this talk we explain how Apache Kafka and Confluent can be used to connect the diverse collection of applications a real business runs: components such as KSQL, which lets non-developers process streams of events at scale, and Kafka Streams, an API for building scalable applications that process event data.


Transcript

  1. 1 Building a Streaming Platform with Kafka Pere Urbón-Bayes Technical

    Architect (TAM) pere@confluent.io
  2. 2 Topics • Set the stage. • Introducing the key

    concepts (Kafka Broker, Connect, and KStreams) • Using events for notifications and state transfer • Conclusion
  3. 3 Kafka & Confluent

  4. 4 Is Kafka a Streaming Platform? The Log Connectors Connectors

    Producer Consumer Streaming Engine
  5. 5 authorization_attempts possible_fraud What exactly is Stream Processing?

  6. 6 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts

    WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; What exactly is Stream Processing? authorization_attempts possible_fraud
  7. 7 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts

    WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
  8. 8 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts

    WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
  9. 9 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts

    WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
  10. 10 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts

    WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
  11. 11 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts

    WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
  12. 12 Streaming is the toolset for dealing with events as

    they move!
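The KSQL query on the slides above can be mimicked in a few lines of plain Python to make the idea concrete. This is a conceptual sketch only: the event shape, timestamps, and threshold are invented here, and real KSQL evaluates the query continuously over Kafka topics rather than over a finished list.

```python
from collections import defaultdict

WINDOW_SIZE = 5 * 60  # a tumbling window of 5 minutes, in seconds

def possible_fraud(authorization_attempts, threshold=3):
    """Count attempts per (card, window) and flag cards over the threshold."""
    counts = defaultdict(int)
    for event in authorization_attempts:
        # Tumbling windows: each timestamp falls into exactly one bucket.
        window_start = event["ts"] - event["ts"] % WINDOW_SIZE
        counts[(event["card_number"], window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}

# Four attempts on the same card inside one window are flagged.
attempts = [{"card_number": "4242", "ts": t} for t in (10, 20, 30, 40)]
print(possible_fraud(attempts))  # {('4242', 0): 4}
```

The `GROUP BY card_number` and `HAVING count(*) > 3` clauses map directly onto the dictionary key and the final filter.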
  13. 13 Looking more closely: What is a Streaming Platform? The

    Log Connectors Connectors Producer Consumer Streaming Engine
  14. 14 Looking more closely: Kafka’s Distributed Log The Log Connectors

    Connectors Producer Consumer Streaming Engine
  15. 15 Kafka’s Distributed Log: A durable messaging system Kafka is

    similar to a traditional messaging system (ActiveMQ, RabbitMQ, ...) but with: • Better scalability • Fault tolerance • High availability • Better storage
  16. 16 The log is a simple idea Messages are always

    appended at the end Old New
  17. 17 Consumers have a position all of their own Sally

    is here George is here Fred is here Old New Scan Scan Scan
  18. 18 Only Sequential Access Old New Read to offset &

    scan
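The three log slides above can be condensed into a toy sketch (illustrative Python, not Kafka's implementation): messages are only ever appended at the end, each consumer keeps a position all of its own, and reads are sequential scans from that offset.

```python
class Log:
    """Toy append-only log with per-consumer offsets."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # consumer name -> next offset to read

    def append(self, msg):
        self.messages.append(msg)  # always appended at the end

    def poll(self, consumer, max_records=10):
        """Sequential access only: read from the offset, then advance it."""
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

log = Log()
for m in ("a", "b", "c"):
    log.append(m)
print(log.poll("sally", max_records=2))  # ['a', 'b']
print(log.poll("fred"))                  # ['a', 'b', 'c']  (own position)
print(log.poll("sally"))                 # ['c']
```

Sally and Fred each hold their own position, so consuming is cheap and consumers never interfere with each other.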
  19. 19 Scaling Out

  20. 20 Shard data to get scalability Messages are sent to

    different partitions Producer (1) Producer (2) Producer (3) Cluster of machines Partitions live on different machines
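A rough sketch of the sharding idea (hypothetical keys and partition count; Kafka's default partitioner actually hashes the serialized key with murmur2): messages with the same key always map to the same partition, which is what preserves per-key ordering as you scale out.

```python
NUM_PARTITIONS = 3  # invented for the sketch

def partition_for(key: str) -> int:
    # Stable within one process run; Kafka uses murmur2 on the key bytes.
    return hash(key) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("order-1", "created"),
                   ("order-2", "created"),
                   ("order-1", "paid")]:
    partitions[partition_for(key)].append((key, value))

# Both "order-1" events land on the same partition, in send order.
```

Producers can write in parallel because each partition can live on a different machine, yet any single key still sees a totally ordered log.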
  21. 21 Replicate to get fault tolerance replicate msg msg leader

    Machine A Machine B
  22. 22 Replication provides resiliency A ‘replica’ takes over on machine

    failure
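The replication slides can be illustrated with a toy leader/follower pair (names and mechanics invented for the sketch; real Kafka followers pull from the leader asynchronously and failover is coordinated via the in-sync replica set).

```python
class Broker:
    def __init__(self, name):
        self.name = name
        self.log = []

leader, replica = Broker("machine-A"), Broker("machine-B")

def produce(msg):
    leader.log.append(msg)
    replica.log.append(msg)  # replication, synchronous for the sketch

for m in ("msg-1", "msg-2"):
    produce(m)

# machine-A fails; the replica takes over and already has every message.
leader = replica
print(leader.log)  # ['msg-1', 'msg-2']
```

Because the follower held a full copy of the log, promoting it loses nothing: that is the resiliency the slide describes.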
  23. 23 Linearly Scalable Architecture Single topic: - Many producer machines

    - Many consumer machines - Many broker machines No bottleneck! Consumers Producers KAFKA
  24. 24 Clusters can be connected to provide worldwide, localized views

    NY London Tokyo Replicator Replicator Replicator
  25. 25 The Connect API The Log Connectors Connectors Producer Consumer

    Streaming Engine
  26. 26 Ingest / Egest into practically any data source Kafka

    Connect Kafka Connect Kafka
  27. 27 List of Kafka Connect sources and sinks (and more…)

    Amazon S3 Elasticsearch HDFS JDBC Couchbase Cassandra Oracle SAP Vertica Blockchain JMX Kinesis MongoDB MQTT NATS Postgres Rabbit Redis Twitter DynamoDB FTP Github BigQuery Google Pub Sub RethinkDB Salesforce Solr Splunk
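For example, a source connector is configured with a small piece of JSON rather than code. The snippet below sketches a hypothetical JDBC source streaming a Postgres `orders` table into a Kafka topic; the database name and connection details are made up, while the property names follow Confluent's JDBC connector.

```json
{
  "name": "jdbc-source-orders",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "pg-"
  }
}
```

Posting this to the Connect REST API would ingest new `orders` rows into the `pg-orders` topic with no application code at all.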
  28. 28 The Kafka Streams API / KSQL The Log Connectors

    Connectors Producer Consumer Streaming Engine
  29. 29 SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE)

    GROUP BY card_number HAVING count(*) > 3; Engine for Continuous Computation
  30. 30 But it’s just an API public static void main(String[]

    args) { StreamsBuilder builder = new StreamsBuilder(); builder.stream("caterpillars") .map((k, v) -> coolTransformation(k, v)) .to("butterflies"); new KafkaStreams(builder.build(), props()).start(); }
  31. 31 Compacted Topic Join Stream Table Kafka Kafka Streams /

    KSQL Topic Join Streams and Tables
  32. 32 Windows / Retention – Handle Late Events The asynchronous

    dilemma: Who was first? The order or the payment? KAFKA Payments Orders Buffer 5 mins Emailer Join by Key
  33. 33 KAFKA Payments Orders Buffer 5 mins Emailer Join by

    Key KStream orders = builder.stream("Orders"); KStream payments = builder.stream("Payments"); orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN)) .peek((key, pair) -> emailer.sendMail(pair)); Windows / Retention – Handle Late Events
  34. 34 A KTable is just a stream with infinite retention

    KAFKA Emailer Orders, Payments Customers Join
  35. 35 KStream orders = builder.stream("Orders"); KStream payments = builder.stream("Payments"); KTable

    customers = builder.table("Customers"); orders.join(payments, EmailTuple::new, JoinWindows.of(1*MIN)) .join(customers, (tuple, cust) -> tuple.setCust(cust)) .peek((key, tuple) -> emailer.sendMail(tuple)); KAFKA Emailer Orders, Payments Customers Join Materialize a table in two lines of code! A KTable is just a stream with infinite retention
  36. 36 The Log Connectors Connectors Producer Consumer Streaming Engine Kafka

    is a complete Streaming Platform
  37. 37 What happens when we apply this to Microservices? Microservices

  38. 38 Microservices App Increasingly we build ecosystems: Microservices

  39. 39 We break them into services that have specific roles

    Customer Service Shipping Service
  40. 40 The Problem is now your DATA

  41. 41 Most services share the same core facts. Orders Customers

    Catalog Most services live in here
  42. 42 Kafka works as a Backbone for Services to exchange

    Events Kafka Notification Data is replicated
  43. 43 Services on a Streaming Platform

  44. 44

  45. 45 Thank You! Questions? Pere Urbón-Bayes Technical Architect (TAM) pere@confluent.io

    http://www.twitter.com/purbon