Budapest Data Forum: What is Apache Kafka, and What is a Streaming Platform?

Robin Moffatt

June 14, 2018

Transcript

  1. What is Apache Kafka, and What is a Streaming Platform? Budapest Data Forum, 14 Jun 2018. Robin Moffatt, @rmoff. https://speakerdeck.com/rmoff/
  2. @rmoff / What is Apache Kafka, and What is a Streaming Platform? / Budapest Data Forum, June 2018
    $ whoami
    • Developer Advocate @ Confluent
    • Working in data & analytics since 2001
    • Oracle ACE Director & Dev Champion
    • Blogging: http://rmoff.net & http://cnfl.io/rmoff
    • Twitter: @rmoff
    • Geek stuff
    • Beer & Fried Breakfasts
    https://speakerdeck.com/rmoff/
  3. Apache Kafka is a Streaming Platform
  5. Three Lenses
  6. What is Apache Kafka?
    01 Messaging Done Right
    02 Scalable Streaming Data Pipelines
    03 Foundation for Stream Processing
  7. Lens 1: Messaging Done Right. Scalability, True Storage, Real-Time Processing
  8. Lens 2: Scalable Streaming Data Pipelines
  10. Lens 3: Foundation for Stream Processing. KSQL is the Streaming SQL Engine for Apache Kafka
  11. The Streaming Platform
  12. The Streaming Platform: Event-Driven, Scalable, Decoupled
  13. Bold claim: all your data is event streams
  14. A Customer Experience
  15. A Sale
  16. A Sensor Reading
  17. An Application Log Entry
  18. Databases
  19. Do you think that’s a table you are querying?
  20. The Stream-Table Duality
    Account ID | Balance
    12345      | €50
  21. The Stream-Table Duality (rows ordered by Time)
    Account ID | Amount || Account ID | Balance
    12345      | +€50   || 12345      | €50
  22. The Stream-Table Duality (rows ordered by Time)
    Account ID | Amount || Account ID | Balance
    12345      | +€50   || 12345      | €50
    12345      | +€25   || 12345      | €75
  23. The Stream-Table Duality (rows ordered by Time)
    Account ID | Amount || Account ID | Balance
    12345      | +€50   || 12345      | €50
    12345      | +€25   || 12345      | €75
    12345      | -€60   || 12345      | €15
  24. The Stream-Table Duality (rows ordered by Time)
    Stream                 || Table
    Account ID | Amount    || Account ID | Balance
    12345      | +€50      || 12345      | €50
    12345      | +€25      || 12345      | €75
    12345      | -€60      || 12345      | €15
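The duality on slides 20 to 24 can be sketched in a few lines of Python: the table is what you get by folding the stream of changes, one event at a time. This is only an illustration under assumptions. The function name `materialize` and the tuple layout are hypothetical, and the amounts are the ones shown on the slides.

```python
from collections import defaultdict

# Stream: the account events from the slides, in arrival order.
# Each event is (account_id, amount_in_euros); this tuple layout is an assumption.
events = [("12345", 50), ("12345", 25), ("12345", -60)]


def materialize(stream):
    """Fold a stream of (account_id, amount) events into a table of balances.

    Yields a snapshot of the table after each event is applied.
    """
    table = defaultdict(int)
    for account_id, amount in stream:
        table[account_id] += amount
        yield dict(table)


snapshots = list(materialize(events))
for event, snapshot in zip(events, snapshots):
    print(event, "->", snapshot)
```

Replaying the same events always rebuilds the same table, which is the sense in which the stream is the more fundamental of the two.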
  25. “The truth is the log. The database is a cache of a subset of the log.” Pat Helland, Immutability Changes Everything, http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf (Photo by Bobby Burch on Unsplash)
  26. Event-Driven architectures in action…
  33. A Brief Look at Kafka's Technology
  34. Apache Kafka
    Kafka: A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.
    Reads are a single seek & scan. Writes are append only.
    Producer & Consumer APIs: Open-source client libraries for numerous languages, to directly integrate with your applications.
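As a rough mental model of "writes are append only, reads are a single seek & scan", here is a toy in-memory log in Python. It is not how Kafka is implemented and it does not use any Kafka client API. The `CommitLog` and `Consumer` classes are hypothetical names made up for this sketch.

```python
class CommitLog:
    """Toy append-only log held in memory (illustrative only)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write: add the record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=100):
        """Read: seek to an offset, then scan forward."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer keeps its own offset; reading never removes records."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = CommitLog()
for record in ["a", "b", "c"]:
    log.append(record)

fast, slow = Consumer(log), Consumer(log)
first = fast.poll()    # the fast consumer reads everything written so far
log.append("d")        # the producer keeps writing regardless of consumers
second = fast.poll()   # only the new record
late = slow.poll()     # a consumer that starts late still sees the full log
print(first, second, late)
```

Because the log is persisted and each reader tracks its own position, a slow or offline consumer can catch up later without the producer waiting for it.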
  35. Apache Kafka
    Kafka Connect API: Reliable and scalable integration of Kafka with other systems, no coding required.
    Kafka Streams API: Write standard Java applications & microservices to process your data in real-time.
    Diagram labels: Orders, Table, Customers, Kafka Streams API
  36. KSQL is the Streaming SQL Engine for Apache Kafka
  37. KSQL for Real-Time Monitoring
    • Log data monitoring, tracking and alerting
    • syslog data
    • Sensor / IoT data
    CREATE STREAM SYSLOG_INVALID_USERS AS
      SELECT HOST, MESSAGE
      FROM SYSLOG
      WHERE MESSAGE LIKE '%Invalid user%';
    http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
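To make the semantics of the KSQL statement on slide 37 concrete, the same filter can be sketched as a plain Python generator over an in-memory list. This is an analogy, not KSQL itself: the sample events are invented for illustration, and only the `HOST` and `MESSAGE` field names come from the slide.

```python
def syslog_invalid_users(syslog):
    """Continuously filter a stream of syslog events.

    Mirrors WHERE MESSAGE LIKE '%Invalid user%' (a substring match)
    and SELECT HOST, MESSAGE (a projection onto two fields).
    """
    for event in syslog:
        if "Invalid user" in event["MESSAGE"]:
            yield {"HOST": event["HOST"], "MESSAGE": event["MESSAGE"]}


# Hypothetical input events; the PID field is dropped by the projection.
syslog = [
    {"HOST": "web01", "PID": 101, "MESSAGE": "Invalid user admin from 10.0.0.5"},
    {"HOST": "web01", "PID": 102, "MESSAGE": "Accepted publickey for deploy"},
    {"HOST": "db01", "PID": 103, "MESSAGE": "Invalid user test from 10.0.0.9"},
]

alerts = list(syslog_invalid_users(syslog))
for alert in alerts:
    print(alert)
```

The difference in KSQL is that the query never finishes: each new event arriving on the source stream is evaluated and, if it matches, written to the derived stream.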
  38. KSQL for Streaming ETL
    Joining, filtering, and aggregating streams of event data
    CREATE STREAM platinum_customer_ratings AS
      SELECT r.message, r.rating, c.customer_name, c.level
      FROM ratings r
        LEFT JOIN customers c ON r.userid = c.id
      WHERE c.level = 'Platinum';
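The join on slide 38 enriches each rating event with the current row for that customer. A minimal Python sketch of that stream-to-table lookup follows. It assumes the customers table fits in a dict keyed by `id`, and the sample rows are invented. Only the column names are taken from the slide.

```python
# Table: latest known state per customer id (hypothetical sample rows).
customers = {
    1: {"customer_name": "Alice", "level": "Platinum"},
    2: {"customer_name": "Bob", "level": "Gold"},
}

# Stream: rating events in arrival order (hypothetical sample events).
ratings = [
    {"userid": 1, "rating": 5, "message": "great"},
    {"userid": 2, "rating": 2, "message": "slow"},
    {"userid": 3, "rating": 4, "message": "ok"},
]


def platinum_customer_ratings(ratings, customers):
    """Enrich each rating with customer data, keeping only Platinum customers."""
    for r in ratings:
        c = customers.get(r["userid"])  # LEFT JOIN: None when there is no match
        # WHERE c.level = 'Platinum' also drops ratings with no matching customer.
        if c is not None and c["level"] == "Platinum":
            yield {
                "message": r["message"],
                "rating": r["rating"],
                "customer_name": c["customer_name"],
                "level": c["level"],
            }


enriched = list(platinum_customer_ratings(ratings, customers))
print(enriched)
```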
  39. Streaming ETL with Apache Kafka and KSQL
    Diagram labels: MySQL, Debezium, Kafka Connect, Producer API, Elasticsearch, Kafka Connect
  40. What Problems does Kafka Solve?
  41. Event-Centric Thinking: “A product was viewed”. Diagram labels: Streaming Platform, Hadoop, Web app
  42. Event-Centric Thinking: “A product was viewed”. Diagram labels: Streaming Platform, Hadoop, Web app, mobile app, APIs
  43. Event-Centric Thinking: “A product was viewed”. Diagram labels: mobile app, web app, APIs, Streaming Platform, Hadoop, Security, Monitoring, Rec engine
  44. System Availability and Event Buffering. Diagram labels: Producer, Consumer
  46. Varying Latency Requirements / Batch vs Stream. Diagram labels: Consumer A, Producer, 24hr batch extract
  47. Varying Latency Requirements / Batch vs Stream. Diagram labels: Producer, 24hr batch extract, Consumer A, Consumer B
  49. Varying Latency Requirements / Batch vs Stream. Diagram labels: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B
  51. Varying Latency Requirements / Batch vs Stream. Diagram labels: Producer, Consumer A, 24hr batch extract, Realtime, Realtime, Consumer B
  52. Technology & Code/Algo Version Changes. Diagram labels: Producer, Consumer (v1)
  53. Technology & Code/Algo Version Changes. Diagram labels: Producer, Consumer (v1), Consumer (v2)
  54. Technology & Code/Algo Version Changes. Diagram labels: Producer, Consumer (v2)
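One way to read slides 52 to 54: because events are retained, a new consumer version can process the same data alongside the old one before the old one is retired, and the producer never changes. The sketch below assumes hypothetical "a product was viewed" events with `product` and `device` fields. The two consumer functions are made-up examples of old and new logic.

```python
from collections import Counter

# Retained events written once by the producer (hypothetical fields).
retained_log = [
    {"product": "p1", "device": "web"},
    {"product": "p1", "device": "mobile"},
    {"product": "p2", "device": "web"},
]


def consumer_v1(records):
    """Original logic: count views per product."""
    return Counter(r["product"] for r in records)


def consumer_v2(records):
    """New logic: count views per (product, device), reading the same events."""
    return Counter((r["product"], r["device"]) for r in records)


# Both versions read from the start of the same log, so their results can be
# compared before v1 is switched off.
v1_result = consumer_v1(retained_log)
v2_result = consumer_v2(retained_log)
print(v1_result)
print(v2_result)
```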
  55. Architectural Patterns with Apache Kafka
  56. Building for the Future (Photo by Christopher Burns on Unsplash)
  57. Tightly-coupled = Inflexible
  58. Database offload → Hadoop/Object Storage/Cloud DW for Analytics. Diagram labels: HDFS / S3 / BigQuery etc, RDBMS, CDC
  59. Streaming ETL with Apache Kafka and KSQL. Diagram labels: order items, customer, customer orders, Stream Processing, RDBMS, CDC
  60. Real-time Event Stream Enrichment with Apache Kafka and KSQL. Diagram labels: order events, customer, Stream Processing, customer orders, RDBMS, <y>, CDC
  61. Transform Once, Use Many. Diagram labels: order events, customer, Stream Processing, customer orders, RDBMS, <y>, New App <x>, CDC
  62. Transform Once, Use Many. Diagram labels: order events, customer, Stream Processing, customer orders, RDBMS, <y>, HDFS / S3 / etc, New App <x>, CDC
  63. Drive new realtime applications using data from existing systems. Diagram labels: Existing App, New App, New App, New App, New App
  64. Evolve processing from old systems to new. Diagram labels: RDBMS, Existing App, CDC
  65. Evolve processing from old systems to new. Diagram labels: Stream Processing, RDBMS, Existing App, CDC, New App <x>
  66. Evolve processing from old systems to new. Diagram labels: Stream Processing, RDBMS, Existing App, New App <x>, New App <y>, CDC
  67. Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
    Database Changes, Log Events, IoT Data, Web Events, …
    CRM, Data Warehouse, Database, Hadoop, Data Integration, …
    Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
    Confluent Platform:
      Apache Kafka®: Core | Connect API | Streams API
      Data Compatibility: Schema Registry
      Monitoring & Administration: Confluent Control Center | Security
      Operations: Replicator | Auto Data Balancing
      Development and Connectivity: Clients | Connectors | REST Proxy | CLI
      SQL Stream Processing: KSQL
    Legend: Apache Open Source, Confluent Open Source, Confluent Enterprise
  68. Free Books! https://www.confluent.io/apache-kafka-stream-processing-book-bundle
  69. @rmoff https://slackpass.io/confluentcommunity https://www.confluent.io/download/
  70. #EOF