
Building a Streaming Platform with Kafka

Pere Urbón
November 21, 2018


The need to integrate a swarm of systems has always been present in the history of IT; with the advent of microservices, big data, and IoT, however, it has simply exploded. Through the exploration of a few use cases, this presentation will introduce stream processing, a powerful and scalable way to transform and connect the applications around your business.

In this talk we will explain how Apache Kafka and Confluent can be used to connect the diverse collection of applications a real business faces, through components such as KSQL, which lets non-developers process event streams at scale, and Kafka Streams, a library for building scalable applications that process event data.


Transcript

  1. 2 Topics • Set the stage • Introducing the key concepts (Kafka Broker, Connect and KStreams) • Using events for notifications and state transfer • Conclusion
  2. 6 What exactly is Stream Processing?

      CREATE STREAM possible_fraud AS
        SELECT card_number, count(*)
        FROM authorization_attempts
        WINDOW TUMBLING (SIZE 5 MINUTE)
        GROUP BY card_number
        HAVING count(*) > 3;

      (authorization_attempts → possible_fraud)
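Under the hood, the KSQL query above buckets events into 5-minute tumbling windows keyed by card number and flags any card exceeding three attempts per window. A minimal plain-Java sketch of that counting logic (the class and method names here are illustrative, not KSQL internals):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the tumbling-window count behind the query: bucket
// authorization attempts into 5-minute windows per card number and
// flag any card with more than 3 attempts inside one window.
public class FraudWindow {
    static final long WINDOW_MS = 5 * 60 * 1000L;

    // (cardNumber + windowStart) -> attempt count in that window
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> flagged = new HashSet<>();

    public void onAttempt(String cardNumber, long timestampMs) {
        // Tumbling windows: each event belongs to exactly one bucket.
        long windowStart = timestampMs - (timestampMs % WINDOW_MS);
        String key = cardNumber + "@" + windowStart;
        int c = counts.merge(key, 1, Integer::sum);
        if (c > 3) {
            flagged.add(cardNumber); // would be emitted to possible_fraud
        }
    }

    public boolean isFlagged(String cardNumber) {
        return flagged.contains(cardNumber);
    }
}
```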
  8. 13 Looking more closely: What is a Streaming Platform? (Diagram: The Log; Connectors; Producer; Consumer; Streaming Engine)
  9. 14 Looking more closely: Kafka’s Distributed Log (Diagram: The Log; Connectors; Producer; Consumer; Streaming Engine)
  10. 15 Kafka’s Distributed Log: a durable messaging system. Kafka is similar to a traditional messaging system (ActiveMQ, RabbitMQ, …) but with: • Better scalability • Fault tolerance • High availability • Better storage
  11. 17 Consumers have a position all of their own (Diagram: Sally is here; George is here; Fred is here; Old → New; each consumer scans independently)
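The slide's idea can be sketched in a few lines of Java: one shared, append-only log, with each consumer tracking its own offset, so Sally, George and Fred read the same data without interfering (names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a shared log with per-consumer positions: appends go to the
// end, and each consumer advances only its own offset when it reads.
public class SharedLog {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    public void append(String message) {
        entries.add(message);
    }

    // Returns the next message for this consumer, or null at the end of
    // the log; advances only this consumer's offset.
    public String poll(String consumer) {
        int pos = offsets.getOrDefault(consumer, 0);
        if (pos >= entries.size()) return null;
        offsets.put(consumer, pos + 1);
        return entries.get(pos);
    }
}
```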
  12. 20 Shard data to get scalability: messages are sent to different partitions (Diagram: Producer (1), Producer (2), Producer (3) write to a cluster of machines; partitions live on different machines)
  13. 23 Linearly Scalable Architecture. A single topic: many producer machines, many consumer machines, many broker machines. No bottleneck! (Producers → KAFKA → Consumers)
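How a producer picks a partition can be sketched as follows. Kafka's real default partitioner hashes the record key with murmur2; this illustrative version uses Java's `hashCode`, but the property is the same: identical keys always land on the same partition, so load spreads across brokers with no coordination:

```java
// Sketch of key-based sharding: hash the key modulo the partition count.
// (Kafka's client uses murmur2 rather than hashCode; the idea is the same.)
public class KeyPartitioner {
    public static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```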
  14. 24 Clusters can be connected to provide worldwide, localized views (Diagram: NY, London and Tokyo clusters linked by Replicator)
  15. 27 List of Kafka Connect sources and sinks (and more…): Amazon S3, Elasticsearch, HDFS, JDBC, Couchbase, Cassandra, Oracle, SAP, Vertica, Blockchain, JMX, Kinesis, MongoDB, MQTT, NATS, Postgres, RabbitMQ, Redis, Twitter, DynamoDB, FTP, GitHub, BigQuery, Google Pub/Sub, RethinkDB, Salesforce, Solr, Splunk
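Connectors like these are driven by configuration rather than code. As a sketch, a JDBC source that streams new rows from an orders table into a Kafka topic might be configured like this (the connection URL, table and topic prefix are illustrative):

```json
{
  "name": "orders-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "jdbc-",
    "tasks.max": "1"
  }
}
```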
  16. 28 The Kafka Streams API / KSQL (Diagram: The Log; Connectors; Producer; Consumer; Streaming Engine)
  17. 29 Engine for Continuous Computation

      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 MINUTE)
      GROUP BY card_number
      HAVING count(*) > 3;
  18. 30 But it’s just an API

      public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("caterpillars")
               .map((k, v) -> coolTransformation(k, v))
               .to("butterflies");
        new KafkaStreams(builder.build(), props()).start();
      }
  19. 32 Windows / Retention: handle late events. The asynchronous dilemma: who was first, the order or the payment? (Diagram: Payments and Orders into KAFKA; 5-minute buffer; join by key; Emailer)
  20. 33 Windows / Retention: handle late events

      KStream orders = builder.stream("Orders");
      KStream payments = builder.stream("Payments");
      orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
            .peek((key, pair) -> emailer.sendMail(pair));

      (Diagram: Payments and Orders into KAFKA; 5-minute buffer; join by key; Emailer)
  21. 34 A KTable is just a stream with infinite retention (Diagram: KAFKA; Orders, Payments; Customers; Join; Emailer)
  22. 35 Materialize a table in two lines of code! A KTable is just a stream with infinite retention.

      KStream orders = builder.stream("Orders");
      KStream payments = builder.stream("Payments");
      KTable customers = builder.table("Customers");
      orders.join(payments, EmailTuple::new, JoinWindows.of(1 * MIN))
            .join(customers, (tuple, cust) -> tuple.setCust(cust))
            .peek((key, tuple) -> emailer.sendMail(tuple));

      (Diagram: KAFKA; Orders, Payments; Customers; Join; Emailer)
  23. 42 Kafka works as a backbone for services to exchange events (Diagram: services exchange notifications via Kafka; data is replicated)