Budapest Data Forum: What is Apache Kafka, and What is a Streaming Platform?

Robin Moffatt

June 14, 2018

Transcript

  1.
    What is Apache Kafka,
    and What is a Streaming
    Platform?
    Budapest Data Forum, 14 Jun 2018
    Robin Moffatt
    @rmoff
    https://speakerdeck.com/rmoff/

  2. @rmoff / What is Apache Kafka, and What is a Streaming Platform? / Budapest Data Forum, June 2018
    $ whoami
    • Developer Advocate @ Confluent
    • Working in data & analytics since 2001
    • Oracle ACE Director & Dev Champion
    • Blogging: http://rmoff.net & http://cnfl.io/rmoff
    • Twitter: @rmoff
    • Geek stuff
    • Beer & Fried Breakfasts
    https://speakerdeck.com/rmoff/

  3.
    Apache Kafka is a
    Streaming Platform

  4.

  5.
    Three Lenses

  6.
    What is Apache Kafka?
    01 Messaging Done Right
    02 Scalable Streaming Data Pipelines
    03 Foundation for Stream Processing

  7.
    Lens 1: Messaging Done Right
    • Scalability
    • True Storage
    • Real-Time Processing

  8.
    Lens 2: Scalable Streaming Data Pipelines

  9.
    Lens 2: Scalable Streaming Data Pipelines

  10.
    Lens 3: Foundation for Stream Processing
    KSQL is the Streaming SQL Engine for Apache Kafka

  11.
    The Streaming Platform

  12.
    The Streaming Platform
    Event-Driven
    Scalable
    Decoupled

  13.
    Bold claim: all your data
    is event streams

  14.
    A Customer
    Experience

  15.
    A Sale

  16.
    A Sensor
    Reading

  17.
    An Application
    Log Entry

  18.
    Databases

  19.
    Do you think that’s a table
    you are querying?

  20.
    Account ID Balance
    12345 €50
    The Stream-Table Duality

  21.
    Account ID Balance
    12345 €50
    Account ID Amount
    12345 + €50
    Time
    The Stream-Table Duality

  22.
    Account ID Balance
    12345 €50
    Account ID Amount
    12345 + €50
    12345 + €25
    Account ID Balance
    12345 €75
    Time
    The Stream-Table Duality

  23.
    Account ID Balance
    12345 €50
    Account ID Amount
    12345 + €50
    12345 + €25
    12345 -€60
    Account ID Balance
    12345 €75
    Account ID Balance
    12345 €15
    Time
    The Stream-Table Duality

  24.
    Stream                          Table
    Account ID   Amount             Account ID   Balance
    12345        + €50              12345        €50
    12345        + €25              12345        €75
    12345        - €60              12345        €15
    Time: rows are ordered from earliest to latest
    The Stream-Table Duality
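
The duality on these slides can be sketched in a few lines of Python: replaying the stream of changes rebuilds the table, one snapshot per change. This is only an illustration with plain in-memory data, not Kafka or KSQL code; the function name `replay` and the use of integers for euro amounts are assumptions made for the example.

```python
from collections import defaultdict


def replay(changes):
    """Fold a stream of (account_id, amount) changes into a balance table.

    Yields a copy of the table after each change has been applied.
    """
    table = defaultdict(int)
    for account_id, amount in changes:
        table[account_id] += amount
        yield dict(table)


# The stream from the slides: +50, +25, -60 for account 12345.
stream = [("12345", 50), ("12345", 25), ("12345", -60)]
snapshots = list(replay(stream))

for snapshot in snapshots:
    print(snapshot)
```

Each printed snapshot corresponds to one of the table states on the slides, and the last one is the current balance.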

  25.
    The truth is the log.
    The database is a cache
    of a subset of the log.
    —Pat Helland
    Immutability Changes Everything
    http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
    Photo by Bobby Burch on Unsplash

  26.
    Event-Driven architectures in action…

  27.
    Event-Driven architectures in action…

  28.
    Event-Driven architectures in action…

  29.
    Event-Driven architectures in action…

  30.
    Event-Driven architectures in action…

  31.
    Event-Driven architectures in action…

  32.
    Event-Driven architectures in action…

  33.
    A Brief Look at
    Kafka's Technology

  34.
    Apache Kafka
    Reads are a single seek & scan
    Writes are append only
    Kafka
    A Distributed Commit Log. Publish and subscribe to streams of records.
    Highly scalable, high throughput. Supports transactions. Persisted data.
    Stream processing.
    Producer & Consumer APIs
    Open-source client libraries for numerous
    languages, to directly integrate with your
    applications.
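
To make "writes are append only" and "reads are a single seek & scan" concrete, here is a toy in-memory log in Python. It is a sketch of the idea only: the class `CommitLog` and its methods are hypothetical names for this example and are not part of any Kafka client library, and a real broker persists and replicates the data rather than holding it in a list.

```python
class CommitLog:
    """Toy in-memory stand-in for one append-only log (not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write: add the record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Read: seek to an offset, then scan forward from there."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

print(log.read(offset=1))
```

Existing records are never modified in place; a reader only needs to remember the offset it wants to start from.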

  35.
    Apache Kafka
    Orders
    Table
    Customers
    Kafka Streams API
    Kafka Connect API
    Reliable and scalable integration of Kafka
    with other systems – no coding required.
    Kafka Streams API
    Write standard Java applications & microservices
    to process your data in real-time

  36.
    KSQL is the Streaming SQL Engine for Apache Kafka

  37.
    KSQL for Real-Time Monitoring
    • Log data monitoring, tracking and alerting
    • syslog data
    • Sensor / IoT data
    CREATE STREAM SYSLOG_INVALID_USERS AS
    SELECT HOST, MESSAGE
    FROM SYSLOG
    WHERE MESSAGE LIKE '%Invalid user%';
    http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
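
As a rough mental model of what the query above does, the same filter can be written as a Python generator over an iterable of records. This is a sketch only: the record shape (dicts with `host` and `message` keys) and the sample data are assumptions for the example, and KSQL runs the query continuously against a Kafka topic rather than over an in-memory list.

```python
def invalid_user_events(syslog):
    """Yield (host, message) for each record whose message mentions an invalid user."""
    for record in syslog:
        if "Invalid user" in record["message"]:
            yield record["host"], record["message"]


# Hypothetical sample records; the keys mirror the HOST and MESSAGE columns.
sample = [
    {"host": "web01", "message": "Invalid user admin from 10.0.0.5"},
    {"host": "web01", "message": "Accepted publickey for deploy"},
    {"host": "db01", "message": "Invalid user test from 10.0.0.9"},
]

matches = list(invalid_user_events(sample))
```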

  38.
    KSQL for Streaming ETL
    CREATE STREAM platinum_customer_ratings AS
    SELECT r.message, r.rating,
           c.customer_name, c.level
    FROM ratings r
    LEFT JOIN customers c
      ON r.userid = c.id
    WHERE c.level = 'Platinum';
    Joining, filtering, and aggregating streams of event data
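
The join above enriches each rating event with a lookup against the customers table. A minimal Python sketch of that stream-table join follows; the dict-based `customers` table, the sample rows, and the function name `enrich` are all assumptions for illustration, not KSQL internals.

```python
def enrich(ratings, customers):
    """Join each rating event to the customers table; keep Platinum customers only."""
    for rating in ratings:
        customer = customers.get(rating["userid"])  # None when there is no match
        if customer is not None and customer["level"] == "Platinum":
            yield {
                "message": rating["message"],
                "rating": rating["rating"],
                "customer_name": customer["customer_name"],
                "level": customer["level"],
            }


# Hypothetical table contents, keyed by customer id.
customers = {
    1: {"customer_name": "Alice", "level": "Platinum"},
    2: {"customer_name": "Bob", "level": "Gold"},
}
# Hypothetical stream of rating events.
ratings = [
    {"userid": 1, "rating": 5, "message": "Great"},
    {"userid": 2, "rating": 2, "message": "Slow"},
    {"userid": 3, "rating": 4, "message": "Fine"},
]

enriched = list(enrich(ratings, customers))
```

Note that although the query uses a LEFT JOIN, the filter on `c.level` discards ratings with no matching customer, because a missing level can never equal 'Platinum'. The sketch mirrors that behaviour.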

  39.
    MySQL Debezium
    Kafka Connect
    Producer API
    Elasticsearch
    Kafka Connect
    Streaming ETL with Apache Kafka and KSQL

  40.
    What Problems does
    Kafka Solve?

  41.
    Event-Centric Thinking
    Streaming
    Platform
    “A product was viewed”
    Hadoop
    Web
    app

  42.
    Event-Centric Thinking
    Streaming
    Platform
    “A product was viewed”
    Hadoop
    Web
    app
    mobile
    app
    APIs

  43.
    mobile
    app
    web
    app
    APIs
    Streaming
    Platform
    Hadoop
    Security
    Monitoring
    Rec
    engine
    “A product was viewed”
    Event-Centric Thinking
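
The point of these slides is that the producers publish an event once and any number of systems can subscribe to it independently. A toy Python sketch of that fan-out is below. The `Topic` class and the subscriber names are hypothetical and in-memory only; with Kafka itself, each independent reader would typically be a separate consumer group.

```python
class Topic:
    """Toy in-memory topic: one shared list of events, one read position per subscriber."""

    def __init__(self):
        self.events = []
        self.positions = {}

    def publish(self, event):
        self.events.append(event)

    def poll(self, subscriber):
        """Return the events this subscriber has not seen yet and advance its position."""
        start = self.positions.get(subscriber, 0)
        self.positions[subscriber] = len(self.events)
        return self.events[start:]


topic = Topic()
topic.publish("product 42 viewed")
hadoop_first = topic.poll("hadoop")

topic.publish("product 7 viewed")
rec_engine_first = topic.poll("rec-engine")  # a later subscriber still sees both events
hadoop_second = topic.poll("hadoop")         # only the event it has not read yet
```

Adding the recommendation engine required no change to the publishing side, which is the decoupling the slides describe.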

  44.
    Producer Consumer
    System Availability and Event Buffering

  45.
    Producer Consumer
    System Availability and Event Buffering
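
Event buffering can be sketched as follows: the producer keeps appending while the consumer is unavailable, and the consumer resumes from the offset it had reached. This is a simplified in-memory illustration; the `Consumer` class is a hypothetical name for the example and is not the Kafka consumer API, which tracks committed offsets on the broker side.

```python
class Consumer:
    """Toy consumer that remembers how far it has read in a shared log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Return everything after the current offset, then move the offset forward."""
        batch = self.log[self.offset:]
        self.offset += len(batch)
        return batch


log = []                      # events stay here whether or not anyone is reading
consumer = Consumer(log)

log.append("e1")
while_online = consumer.poll()     # consumer is up: reads e1

log.append("e2")                   # consumer is down, producer carries on
log.append("e3")
after_recovery = consumer.poll()   # consumer is back: picks up where it left off
```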

  46.
    Consumer A
    Producer
    24hr batch
    extract
    Varying Latency Requirements / Batch vs Stream

  47.
    Producer
    24hr batch
    extract
    Consumer A
    Consumer B
    Varying Latency Requirements / Batch vs Stream

  48.
    Producer
    24hr batch
    extract
    Consumer A
    Consumer B
    Varying Latency Requirements / Batch vs Stream

  49.
    Producer
    24hr batch extract
    Realtime
    Consumer A
    Consumer B
    Varying Latency Requirements / Batch vs Stream

  50.
    Producer
    24hr batch extract
    Realtime
    Consumer A
    Consumer B
    Varying Latency Requirements / Batch vs Stream

  51.
    Producer Consumer A
    24hr batch extract
    Realtime
    Realtime
    Consumer B
    Varying Latency Requirements / Batch vs Stream

  52.
    Technology & Code/Algo Version Changes
    Producer
    Consumer
    (v1)

  53.
    Technology & Code/Algo Version Changes
    Producer
    Consumer
    (v1)
    Consumer
    (V2)

  54.
    Technology & Code/Algo Version Changes
    Producer
    Consumer
    (V2)
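
Because the events are retained in the log, a new version of a consumer can start from the beginning and reprocess them with different logic while the old version keeps running. The Python sketch below illustrates that idea; the event values and the two algorithms are invented for the example and do not come from the talk.

```python
# Hypothetical order totals that are already stored in the log.
events = [12, 7, 30, 5]


def consumer_v1(log):
    """Original algorithm: count the orders."""
    return len(log)


def consumer_v2(log):
    """Revised algorithm: total order value, computed from the same stored events."""
    return sum(log)


v1_result = consumer_v1(events)
v2_result = consumer_v2(events)  # v2 reads from offset 0 and reprocesses everything
```

Once the new version has caught up and its output has been checked, the old one can be switched off without the producer being involved.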

  55.
    Architectural Patterns
    with Apache Kafka

  56.
    Photo by Christopher Burns on Unsplash
    Building for the
    Future

  57.
    Tightly-coupled =
    Inflexible

  58.
    Database offload → Hadoop / Object Storage / Cloud DW for Analytics
    HDFS / S3 /
    BigQuery etc
    RDBMS
    CDC

  59.
    Streaming ETL with Apache Kafka and KSQL
    order items
    customer
    customer orders
    Stream
    Processing
    RDBMS CDC

  60.
    Real-time Event Stream Enrichment with Apache Kafka and KSQL
    order events
    customer
    Stream
    Processing
    customer orders
    RDBMS

    CDC

  61.
    Transform Once, Use Many
    order events
    customer
    Stream
    Processing
    customer orders
    RDBMS

    New App

    CDC

  62.
    Transform Once, Use Many
    order events
    customer
    Stream
    Processing
    customer orders
    RDBMS

    HDFS / S3 / etc
    New App

    CDC

  63.
    Drive new realtime applications using data from existing systems
    Existing
    App
    New
    App
    New
    App
    New
    App
    New
    App

  64.
    Evolve processing from old systems to new
    RDBMS
    Existing
    App
    CDC

  65.
    Evolve processing from old systems to new
    Stream
    Processing
    RDBMS
    Existing
    App
    CDC
    New App

  66.
    Evolve processing from old systems to new
    Stream
    Processing
    RDBMS
    Existing
    App
    New App

    New App

    CDC

  67.
    Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
    Database Changes | Log Events | IoT Data | Web Events | …
    CRM | Data Warehouse | Database | Hadoop | Data Integration
    Monitoring | Analytics | Custom Apps | Transformations | Real-time Applications
    Confluent Platform
    Apache Open Source | Confluent Open Source | Confluent Enterprise
    Apache Kafka®: Core | Connect API | Streams API
    Data Compatibility: Schema Registry
    Monitoring & Administration: Confluent Control Center | Security
    Operations: Replicator | Auto Data Balancing
    Development and Connectivity: Clients | Connectors | REST Proxy | CLI
    SQL Stream Processing: KSQL

  68.
    Free Books!
    https://www.confluent.io/apache-kafka-stream-processing-book-bundle

  69.
    @rmoff
    https://slackpass.io/confluentcommunity
    https://www.confluent.io/download/

  70.
    #EOF
