Embrace the Anarchy : Apache Kafka's Role in Modern Data Architectures

Building a flexible, scalable, real-time data architecture for the enterprise is no simple matter. Rarely does a single technology suit all requirements, and frequently many different teams are involved, which drives solutions with varying levels of [dis-]integration.
Apache Kafka is a streaming platform that acts as the 'data backbone' for the enterprise. Events streamed into Kafka as they occur can be used by any dependent system, in real time or in batch. Search replicas, NoSQL stores, caches, graph databases - these all have their place in solving specific requirements, and all need to be fed with data! Kafka is the enabling platform that supports the real-time, high-performance, scalable integration of data throughout the enterprise, whilst also providing the messaging capabilities to drive applications directly.
This talk will discuss the role and benefits of Kafka in an architecture, the Kafka ecosystem, and several design patterns used to address specific challenges that organisations face in managing the flow and availability of their data.

Robin Moffatt

August 01, 2018

Transcript

  1. 1.

    @rmoff / Embrace the Anarchy—Apache Kafka's Role in Modern Data

    Architectures

    Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures
    Robin Moffatt / Confluent
    Photo by Jaak Horn on Unsplash
  2. 2.

    $ whoami
    • Developer Advocate @ Confluent
    • Working in data & analytics since 2001
    • Oracle Developer Champion
    • Blogging: http://rmoff.net & http://cnfl.io/rmoff
    • Twitter: @rmoff
    • Geek stuff
    • Beer & Fried Breakfasts
    https://speakerdeck.com/rmoff/
  3. 3.

    Apache Kafka is a Streaming Platform
  4. 4.

    Why do we need a streaming platform?
  5. 5.

    One of the reasons: Decoupling
  6. 6.

    A case in point… Analytics
  7. 7.

    Analytics: In the beginning…
    [Diagram labels: Sales, DWH]
  8. 8.

    And then there were more data sources…
    [Diagram labels: Sales, DWH, Inventory]
  9. 9.

    Batch Transformations … (ETL / ELT)
    [Diagram labels: Sales, DWH, Inventory]
  10. 10.

    Add a Data Lake…
    [Diagram labels: Sales, DWH, Inventory, Data Lake]
  11. 11.

    …or Replace the Data Warehouse
    [Diagram labels: Sales, Inventory, Data Lake]
  12. 12.

    Still need to do Batch transformations…
    [Diagram labels: Sales, Inventory, Data Lake]
  13. 13.

    Want your data anytime? Batch is Latency built in by Design
  14. 14.

    The World has Changed
    Microservices, Mobile, Machine Learning, Internet of Things
    Photo by Denys Nevozhai on Unsplash
  15. 15.

    Lots of new technologies (whether you like it or not)
    Photo by Rosie Fraser on Unsplash
  16. 16.

    [Diagram labels: App (×4), search, Hadoop, DWH, monitoring, security, MQ (×2), cache (×2)]
  17. 17.

    [Diagram labels: Kafka, DWH, Hadoop, App (×8), request-response, messaging OR stream processing, streaming data pipelines, changelogs]
  18. 18.

    Apache Kafka is a Streaming Platform
  19. 20.

    What is Apache Kafka?
    01 Messaging Done Right
    02 Scalable Streaming Data Pipelines
    03 Foundation for Stream Processing
  20. 21.

    Lens 1: Messaging Done Right
    Scalability, True Storage, Real-Time Processing
  21. 22.

    Lens 2: Scalable Streaming Data Pipelines
  22. 23.

    Lens 3: Foundation for Stream Processing
    KSQL is the Streaming SQL Engine for Apache Kafka
  23. 24.
  24. 25.

    The Streaming Platform
    Event-Driven, Scalable, Decoupled
  25. 26.

    Bold claim: all your data is event streams
  26. 27.
  27. 30.

    An Application Log Entry
  28. 32.

    Do you think that’s a table you are querying?
  29. 33.

    The Table Stream Duality

    Stream (Account ID, Amount)    Table (Account ID, Balance)
    12345, +€50                    12345, €50
    12345, +€25                    12345, €75
    12345, -€60                    12345, €15

    (Rows are ordered by time.)
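
The duality on this slide can be illustrated with a minimal sketch in plain Python. This is not Kafka or KSQL code; the `replay` function and the in-memory list standing in for a topic are assumptions made for illustration. Replaying the stream of balance changes in order rebuilds the table, and each intermediate state matches one of the table snapshots on the slide.

```python
from collections import defaultdict

# The stream: an ordered, immutable sequence of (account_id, amount) events,
# using the figures from the slide.
stream = [("12345", 50), ("12345", 25), ("12345", -60)]


def replay(events):
    """Fold a stream of balance changes into a table, yielding each snapshot."""
    table = defaultdict(int)
    for account_id, amount in events:
        table[account_id] += amount
        yield dict(table)  # copy, so earlier snapshots are not mutated later


snapshots = list(replay(stream))
for snapshot in snapshots:
    print(snapshot)
```

The table is only ever a derived view: discard it, replay the stream from the start, and the same balances come back.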
  30. 34.

    "The truth is the log. The database is a cache of a subset of the log."
    Pat Helland, Immutability Changes Everything
    http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
    Photo by Bobby Burch on Unsplash
  31. 35.

    A Brief Look at Kafka's Technology
  32. 36.

    Apache Kafka

    Kafka: A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.

    Reads are a single seek & scan. Writes are append only.

    Producer & Consumer APIs: Open-source client libraries for numerous languages, to directly integrate with your applications.
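
To make "writes are append only" and "reads are a single seek & scan" concrete, here is a toy in-memory log in plain Python. The `CommitLog` class and its method names are hypothetical and are not part of any Kafka client library; a real broker persists records to disk and partitions and replicates them, which this sketch ignores.

```python
class CommitLog:
    """Toy in-memory append-only log (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write: add to the end of the log and return the record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Read: seek to an offset, then scan forward sequentially."""
        return self._records[offset:offset + max_records]


log = CommitLog()
offsets = [log.append(event) for event in ("order-1", "order-2", "order-3")]
batch = log.read(offset=1)
print(offsets, batch)
```

Existing records are never updated in place, so a reader only needs to remember the offset it has reached.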
  33. 37.

    Apache Kafka

    Kafka Connect API: Reliable and scalable integration of Kafka with other systems (no coding required).

    Kafka Streams API: Write standard Java applications & microservices to process your data in real-time.

    [Diagram labels: Orders, Table, Customers, Kafka Streams API]
  34. 38.

    KSQL is a Declarative Stream Processing Language
  35. 39.

    KSQL is the Streaming SQL Engine for Apache Kafka
  36. 40.

    KSQL in Development and Production

    Interactive KSQL for development and testing: “Hmm, let me try out this idea...”

    Headless KSQL for Production: Desired KSQL queries have been identified

    [Diagram label: REST]
  37. 41.

    KSQL for Real-Time Monitoring
    • Log data monitoring, tracking and alerting
    • syslog data
    • Sensor / IoT data

    CREATE STREAM SYSLOG_INVALID_USERS AS
      SELECT HOST, MESSAGE
      FROM SYSLOG
      WHERE MESSAGE LIKE '%Invalid user%';

    http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
  38. 42.

    KSQL for Anomaly Detection
    Identifying patterns or anomalies in real-time data, surfaced in milliseconds

    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
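
As a rough illustration of what the tumbling-window query computes, the sketch below does the same grouping in plain Python over a finite list. It is not how KSQL executes the query (KSQL evaluates continuously over an unbounded stream); the function name, the integer epoch-second timestamps, and the sample card numbers are all assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 5   # WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3        # HAVING count(*) > 3


def possible_fraud(attempts):
    """Count attempts per (window, card) and keep counts above the threshold.

    attempts: iterable of (epoch_seconds, card_number) pairs.
    """
    counts = Counter()
    for ts, card_number in attempts:
        # Tumbling windows are fixed-size and do not overlap, so every
        # timestamp belongs to exactly one window.
        window_start = ts - (ts % WINDOW_SECONDS)
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


attempts = [(0, "4111"), (1, "4111"), (2, "4111"), (4, "4111"),
            (3, "5500"), (6, "4111"), (7, "5500")]
flagged = possible_fraud(attempts)
print(flagged)
```

Card 4111 is flagged for the window starting at 0 because it has four attempts there; its attempt at second 6 falls into the next window and is counted separately.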
  39. 43.

    KSQL for Streaming ETL
    Joining, filtering, and aggregating streams of event data

    CREATE STREAM vip_actions AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'Platinum';
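
The stream-table join in this query can be sketched in plain Python as a lookup against the latest known state of each user. The sample data and the `vip_actions` generator are hypothetical and only mirror the shape of the query; in KSQL the `users` table would itself be kept up to date from a Kafka topic.

```python
# Table: the latest known state per user_id (sample data, assumed).
users = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}

# Stream: click events in arrival order (sample data, assumed).
clickstream = [
    {"userid": "u1", "page": "/cart", "action": "view"},
    {"userid": "u2", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/home", "action": "click"},
    {"userid": "u1", "page": "/checkout", "action": "click"},
]


def vip_actions(events, user_table):
    """Enrich each event with its user row, then keep Platinum users only."""
    for event in events:
        user = user_table.get(event["userid"])  # LEFT JOIN: None when no match
        if user is not None and user["level"] == "Platinum":
            yield {key: event[key] for key in ("userid", "page", "action")}


result = list(vip_actions(clickstream, users))
print(result)
```

Events from unknown users survive the left join with no user row, but the filter on `level` then drops them, just as the `WHERE` clause does.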
  40. 44.

    What Problems does Kafka Solve?
  41. 45.

    Event-Centric Thinking
    [Diagram labels: Web app, Streaming Platform, Hadoop; event: “A product was viewed”]
  42. 46.

    Event-Centric Thinking
    [Diagram labels: Web app, mobile app, APIs, Streaming Platform, Hadoop; event: “A product was viewed”]
  43. 47.

    Event-Centric Thinking
    [Diagram labels: mobile app, web app, APIs, Streaming Platform, Hadoop, Security, Monitoring, Rec engine; event: “A product was viewed”]
  44. 48.

    System Availability and Event Buffering
    [Diagram labels: Producer, Consumer]
  45. 49.

    System Availability and Event Buffering
    [Diagram labels: Producer, Consumer]
  46. 50.

    Varying Latency Requirements / Batch vs Stream
    [Diagram labels: Producer, Consumer A, 24hr batch extract]
  47. 51.

    Varying Latency Requirements / Batch vs Stream
    [Diagram labels: Producer, 24hr batch extract, Consumer A, Consumer B]
  48. 52.

    Varying Latency Requirements / Batch vs Stream
    [Diagram labels: Producer, 24hr batch extract, Consumer A, Consumer B]
  49. 53.

    Varying Latency Requirements / Batch vs Stream
    [Diagram labels: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B]
  50. 54.

    Varying Latency Requirements / Batch vs Stream
    [Diagram labels: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B]
  51. 55.

    Varying Latency Requirements / Batch vs Stream
    [Diagram labels: Producer, Consumer A, 24hr batch extract, Realtime (×2), Consumer B]
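
One way to picture these slides: each consumer keeps its own position in the same log, so a batch reader and a real-time reader can share one source without affecting each other. The sketch below is plain Python with a hypothetical `Consumer` class and a list standing in for a topic; which consumer is batch and which is real-time is an assumption made for the example.

```python
class Consumer:
    """Hypothetical consumer that tracks its own position in a shared log."""

    def __init__(self, name):
        self.name = name
        self.offset = 0

    def poll(self, log):
        """Return every record not yet seen and advance this consumer's offset."""
        records = log[self.offset:]
        self.offset = len(log)
        return records


log = []                    # the shared topic, modelled as a plain list
batch = Consumer("A")       # polls once at the end, like a 24hr extract
realtime = Consumer("B")    # polls after every event

realtime_seen = []
for event in ["e1", "e2", "e3"]:
    log.append(event)                        # the producer writes once
    realtime_seen.extend(realtime.poll(log))

batch_seen = batch.poll(log)
print(realtime_seen, batch_seen)
```

Both consumers end up with the same events; only the latency differs, and the producer never needs to know either of them exists.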
  52. 56.

    Technology & Code/Algo Version Changes
    [Diagram labels: Producer, Consumer (v1)]
  53. 57.

    Technology & Code/Algo Version Changes
    [Diagram labels: Producer, Consumer (v1), Consumer (v2)]
  54. 58.

    Technology & Code/Algo Version Changes
    [Diagram labels: Producer, Consumer (v2)]
  55. 59.

    Architectural Patterns with Apache Kafka
  56. 60.

    Building for the Future
    Photo by Christopher Burns on Unsplash
  57. 61.

    Tightly-coupled = Inflexible
  58. 62.

    Analytics - Database Offload
    [Diagram labels: RDBMS, CDC, HDFS / S3 / BigQuery etc]
  59. 63.

    Stream Processing with Apache Kafka and KSQL
    [Diagram labels: RDBMS, CDC, order events, customer, customer orders, Stream Processing]
  60. 64.

    Real-time Event Stream Enrichment
    [Diagram labels: RDBMS, CDC, order events, customer, Stream Processing, customer orders, <y>]
  61. 65.

    Transform Once, Use Many
    [Diagram labels: RDBMS, CDC, order events, customer, Stream Processing, customer orders, <y>, New App <x>]
  62. 66.

    Transform Once, Use Many
    [Diagram labels: RDBMS, CDC, order events, customer, Stream Processing, customer orders, <y>, HDFS / S3 / etc, New App <x>]
  63. 67.

    Evolve processing from old systems to new
    [Diagram labels: RDBMS, CDC, Existing App, Stream Processing, New App <x>]
  64. 68.

    Evolve processing from old systems to new
    [Diagram labels: RDBMS, CDC, Existing App, Stream Processing, New App <x>, New App <y>]
  65. 69.

    Want your data anytime? Batch is Latency built in by Design
    You say that like "latency" is a synonym for "evil"
  66. 70.

    It's all about the Events!
  67. 71.

    So… Analytics and Kafka
  68. 72.

    The Vision!
    "One version of the truth"
  69. 74.

    Pragmatism is…
    "One version of the truth"
  70. 75.

    "One version of the truth"
    [Diagram labels: Streaming Platform, Stream Processing]
  71. 76.

    "One version of the truth"
    [Diagram labels: Streaming Platform, Stream Processing, ML, App <y>, NoSQL, Search, Graph]
  72. 77.

    Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!

    Confluent Platform
    • Apache Kafka®: Core | Connect API | Streams API
    • Data Compatibility: Schema Registry
    • Monitoring & Administration: Confluent Control Center | Security
    • Operations: Replicator | Auto Data Balancing
    • Development and Connectivity: Clients | Connectors | REST Proxy | CLI
    • SQL Stream Processing: KSQL

    Legend: Apache Open Source, Confluent Open Source, Confluent Enterprise

    [Diagram labels: Database Changes, Log Events, IoT Data, Web Events, …]
    [Diagram labels: CRM, Data Warehouse, Database, Hadoop, Data Integration, …]
    [Diagram labels: Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …]
  73. 78.

    Free Books!
    https://www.confluent.io/apache-kafka-stream-processing-book-bundle
  74. 79.

    Confluent Streaming Event, Munich
    http://cnfl.io/streaming-event-munich
  75. 81.

    Resources
    • CDC Spreadsheet
    • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
    • #partner-engineering on Slack for questions
    • BD team (#partners / partners@confluent.io) can help with introductions on a given sales op

    #EOF