
Building a Streaming Platform with Kafka

Pere Urbón
November 21, 2018


The need to integrate a swarm of systems has always been present in IT; with the advent of microservices, big data, and IoT it has simply exploded. Through the exploration of a few use cases, this presentation introduces stream processing, a powerful and scalable way to transform and connect the applications around your business.

This talk explains how Apache Kafka and the Confluent Platform can connect the diverse collection of applications a real business runs, using components such as KSQL, which lets non-developers process event streams at scale, and Kafka Streams, a library for building scalable applications that process event data.


Transcript

  1. 1
    Building a Streaming
    Platform with Kafka
    Pere Urbón-Bayes
    Technical Architect (TAM)
    [email protected]


  2. 2
    Topics
    • Set the stage.
    • Introducing the key concepts ( Kafka Broker, Connect and KStreams)
    • Using events for notifications and state transfer
    • Conclusion


  3. 3
    Kafka & Confluent


  4. 4
    Is Kafka a Streaming Platform?
    (diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)


  5. 5
    What exactly is Stream Processing?
    authorization_attempts → possible_fraud


  6. 6
    What exactly is Stream Processing?
    authorization_attempts → possible_fraud
    CREATE STREAM possible_fraud AS
    SELECT card_number, count(*)
    FROM authorization_attempts
    WINDOW TUMBLING (SIZE 5 MINUTE)
    GROUP BY card_number
    HAVING count(*) > 3;



  12. 12
    Streaming is the toolset for dealing with events as they move!


  13. 13
    Looking more closely: What is a Streaming Platform?
    (diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)


  14. 14
    Looking more closely: Kafka’s Distributed Log
    (diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)


  15. 15
    Kafka’s Distributed Log: A durable messaging system
    Kafka is similar to a traditional messaging system (ActiveMQ,
    RabbitMQ, …) but with:
    • Better scalability
    • Fault tolerance
    • High availability
    • Better storage


  16. 16
    The log is a simple idea
    Messages are always appended at the end (Old → New)


  17. 17
    Consumers have a position all of their own
    (diagram: Sally, George, and Fred each scan from their own position between Old and New)


  18. 18
    Only Sequential Access
    Read to offset & scan (Old → New)
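The three slides above can be condensed into a few lines of Python. This is a toy model for illustration only, not the real Kafka client API: an append-only list of records, with each consumer keeping a read position of its own.

```python
# Toy model of Kafka's log abstraction (illustrative only, not the Kafka API):
# an append-only record list plus an independent read offset per consumer.
class Log:
    def __init__(self):
        self.records = []   # messages are only ever appended at the end
        self.offsets = {}   # consumer name -> next offset to read

    def append(self, msg):
        self.records.append(msg)

    def poll(self, consumer):
        """Sequentially scan from this consumer's own offset; reading never deletes."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:]
        self.offsets[consumer] = len(self.records)
        return batch

log = Log()
log.append("msg-1")
log.append("msg-2")
print(log.poll("sally"))   # ['msg-1', 'msg-2']
log.append("msg-3")
print(log.poll("sally"))   # ['msg-3'] -- Sally resumes from her offset
print(log.poll("fred"))    # ['msg-1', 'msg-2', 'msg-3'] -- Fred has his own position
```

Because reads only move a per-consumer offset and never remove data, Sally, George, and Fred can each consume the same log at their own pace.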


  19. 19
    Scaling Out


  20. 20
    Shard data to get scalability
    Messages are sent to different partitions
    (diagram: Producer (1), Producer (2), Producer (3) send to partitions across a cluster of machines)
    Partitions live on different machines
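The routing idea behind sharding can be sketched in one function. This is an illustration of the principle, not Kafka's actual default partitioner (which hashes the key with murmur2); here CRC32 stands in as the hash.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition: same key -> same partition,
    which is what preserves per-key ordering within a topic."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All authorization attempts for one card land on the same partition,
# so they can be scaled across machines without losing per-card order:
p = partition_for("card-42", 6)
assert partition_for("card-42", 6) == p
assert 0 <= p < 6
```

Adding machines adds partitions to spread keys over, which is why a topic scales horizontally.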


  21. 21
    Replicate to get fault tolerance
    (diagram: the leader on Machine A replicates each msg to Machine B)


  22. 22
    Replication provides resiliency
    A ‘replica’ takes over on machine failure
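The failover rule can be sketched as a tiny election function. This is a simplification for illustration; real Kafka elects the new leader from the partition's in-sync replica set via the controller, which this toy ignores.

```python
def elect_leader(replicas, alive):
    """Pick the first surviving broker from the replica list
    (replicas are listed in preference order)."""
    for broker in replicas:
        if broker in alive:
            return broker
    raise RuntimeError("no replica available - partition is offline")

replicas = ["machine-A", "machine-B"]

# Normal operation: machine-A leads.
print(elect_leader(replicas, {"machine-A", "machine-B"}))   # machine-A

# machine-A fails: the replica on machine-B takes over,
# and the partition stays available.
print(elect_leader(replicas, {"machine-B"}))                # machine-B
```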


  23. 23
    Linearly Scalable Architecture
    Single topic:
    - Many producer machines
    - Many consumer machines
    - Many broker machines
    No Bottleneck!!
    Consumers
    Producers
    KAFKA


  24. 24
    Clusters can be connected to provide worldwide, localized views
    (diagram: NY, London, and Tokyo clusters linked by Replicator)


  25. 25
    The Connect API
    (diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)


  26. 26
    Ingest from / egest to practically any data system
    (diagram: source systems → Kafka Connect → Kafka → Kafka Connect → sink systems)


  27. 27
    List of Kafka Connect sources and sinks (and more…)
    Amazon S3
    Elasticsearch
    HDFS
    JDBC
    Couchbase
    Cassandra
    Oracle
    SAP
    Vertica
    Blockchain
    JMX
    Kinesis
    MongoDB
    MQTT
    NATS
    Postgres
    RabbitMQ
    Redis
    Twitter
    DynamoDB
    FTP
    GitHub
    BigQuery
    Google Pub Sub
    RethinkDB
    Salesforce
    Solr
    Splunk
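A connector from this list is deployed as configuration rather than code. As an illustration, a JDBC source pulling a Postgres table into Kafka might look roughly like this; the property names follow Confluent's JDBC source connector, but the connector name, connection details, and column are made up for the example:

```json
{
  "name": "orders-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "shop-",
    "tasks.max": "1"
  }
}
```

Posted to the Connect REST API, this would continuously copy new rows into a `shop-`-prefixed topic with no application code written.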


  28. 28
    The Kafka Streams API / KSQL
    (diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)


  29. 29
    Engine for Continuous Computation
    SELECT card_number, count(*)
    FROM authorization_attempts
    WINDOW TUMBLING (SIZE 5 MINUTE)
    GROUP BY card_number
    HAVING count(*) > 3;
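What that query computes can be sketched as a plain function. This is a hypothetical toy, not the KSQL engine: it buckets authorization attempts into 5-minute tumbling windows, counts per card, and keeps only cards seen more than 3 times in a window.

```python
from collections import Counter

WINDOW = 5 * 60  # tumbling window size in seconds (SIZE 5 MINUTE)

def possible_fraud(attempts):
    """attempts: iterable of (timestamp_seconds, card_number) events."""
    counts = Counter()
    for ts, card in attempts:
        window_start = ts - ts % WINDOW        # tumbling: fixed, non-overlapping buckets
        counts[(window_start, card)] += 1      # GROUP BY card_number per window
    return {k: n for k, n in counts.items() if n > 3}   # HAVING count(*) > 3

events = [(1, "4242"), (30, "4242"), (60, "4242"), (90, "4242"), (400, "1111")]
print(possible_fraud(events))   # {(0, '4242'): 4}
```

The difference from a batch query is that the real engine keeps these window counts as continuously updated state and emits results as events arrive, rather than scanning a finished table.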


  30. 30
    But it’s just an API
    public static void main(String[] args) {
      StreamsBuilder builder = new StreamsBuilder();
      builder.stream("caterpillars")
             .map((k, v) -> coolTransformation(k, v))
             .to("butterflies");
      new KafkaStreams(builder.build(), props()).start();
    }


  31. 31
    Join Streams and Tables
    (diagram: a Topic feeds a Stream, a Compacted Topic feeds a Table, and Kafka Streams / KSQL joins them)


  32. 32
    Windows / Retention – Handle Late Events
    The asynchronous dilemma: Who was first? The order or the payment?
    (diagram: Orders and Payments topics in Kafka, buffered 5 mins and joined by key, feed the Emailer)


  33. 33
    (diagram: Orders and Payments topics in Kafka, buffered 5 mins and joined by key, feed the Emailer)
    KStream orders = builder.stream("Orders");
    KStream payments = builder.stream("Payments");
    orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
          .peek((key, pair) -> emailer.sendMail(pair));
    Windows / Retention – Handle Late Events


  34. 34
    A KTable is just a stream with infinite retention
    (diagram: Orders, Payments, and Customers topics in Kafka, joined and sent to the Emailer)


  35. 35
    KStream orders = builder.stream("Orders");
    KStream payments = builder.stream("Payments");
    KTable customers = builder.table("Customers");
    orders.join(payments, EmailTuple::new, JoinWindows.of(1 * MIN))
          .join(customers, (tuple, cust) -> tuple.setCust(cust))
          .peek((key, tuple) -> emailer.sendMail(tuple));
    (diagram: Orders, Payments, and Customers topics in Kafka, joined and sent to the Emailer)
    Materialize a table in two lines of code!
    A KTable is just a stream with infinite retention
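The "table = stream with infinite retention" idea has a very short sketch: replaying a changelog of key/value updates, keeping the latest value per key, materializes the table. This is a toy illustration of the concept, not the Kafka Streams API.

```python
def materialize(changelog):
    """Replay a changelog stream of (key, value) records into a table:
    later records overwrite earlier ones for the same key."""
    table = {}
    for key, value in changelog:
        if value is None:          # a null value is a delete ("tombstone")
            table.pop(key, None)
        else:
            table[key] = value
    return table

customers = [
    ("cust-1", "Sally"),
    ("cust-2", "Fred"),
    ("cust-1", "Sally B."),   # update: latest value per key wins
    ("cust-2", None),         # tombstone: cust-2 is deleted
]
print(materialize(customers))   # {'cust-1': 'Sally B.'}
```

A compacted topic keeps exactly the records this replay needs (the latest per key), which is why a KTable can be rebuilt from Kafka at any time.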


  36. 36
    (diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)
    Kafka is a complete Streaming Platform


  37. 37
    What happens when we apply this to Microservices?


  38. 38
    Increasingly we build ecosystems: Microservices


  39. 39
    We break them into services that have specific roles
    (diagram: Customer Service, Shipping Service)


  40. 40
    The Problem is now your DATA


  41. 41
    Most services share the same core facts.
    Orders
    Customers
    Catalog
    Most services live in here


  42. 42
    Kafka works as a Backbone for Services to exchange Events
    (diagram: Kafka as the backbone; Notification events flow between services and data is replicated)


  43. 43
    Services on a Streaming Platform



  45. 45
    Thank you! Questions?
    Pere Urbón-Bayes
    Technical Architect (TAM)
    [email protected]
    http://www.twitter.com/purbon
