Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures

Building a flexible, scalable, real-time data architecture for the enterprise is no simple matter. Rarely does a single technology suit all requirements, and frequently many different teams are involved, which drives solutions with varying levels of [dis-]integration.
Apache Kafka is a streaming platform that acts as the 'data backbone' for the enterprise. Events streamed into Kafka as they occur can be used by any dependent system, in real time or in batch. Search replicas, NoSQL stores, caches, graph databases: these all have their place in solving specific requirements, and all need to be fed with data! Kafka is the enabling platform that supports the real-time, high-performance, scalable integration of data throughout the enterprise, whilst also providing the messaging capabilities to drive applications directly.
This talk will discuss the role and benefits of Kafka in an architecture, the Kafka ecosystem, and several design patterns used to address specific challenges that organisations face in managing the flow and availability of their data.

Robin Moffatt

August 01, 2018

Transcript

  1. @rmoff / Embrace the Anarchy—Apache Kafka's Role in Modern Data Architectures 1
    Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures
    Robin Moffatt / Confluent
    Photo by Jaak Horn on Unsplash

  2. $ whoami
    • Developer Advocate @ Confluent
    • Working in data & analytics since 2001
    • Oracle Developer Champion
    • Blogging: http://rmoff.net & http://cnfl.io/rmoff
    • Twitter: @rmoff
    • Geek stuff
    • Beer & Fried Breakfasts
    https://speakerdeck.com/rmoff/


  3. Apache Kafka is a Streaming Platform


  4. Why do we need a streaming platform?


  5. One of the reasons: Decoupling


  6. A case in point… Analytics

  7. Analytics: In the beginning…
    Diagram labels: Sales, DWH

  8. And then there were more data sources…
    Diagram labels: Sales, DWH, Inventory

  9. Batch Transformations… (ETL / ELT)
    Diagram labels: Sales, DWH, Inventory

  10. Add a Data Lake…
    Diagram labels: Sales, DWH, Inventory, Data Lake

  11. …or Replace the Data Warehouse
    Diagram labels: Sales, Inventory, Data Lake

  12. Still need to do Batch transformations…
    Diagram labels: Sales, Inventory, Data Lake

  13. Want your data anytime?
    Batch is Latency built in by Design

  14. The World has Changed
    Microservices, Mobile, Machine Learning, Internet of Things
    Photo by Denys Nevozhai on Unsplash

  15. Lots of new technologies (whether you like it or not)
    Photo by Rosie Fraser on Unsplash

  16. Diagram labels: App (×4), search, Hadoop, DWH, monitoring, security, MQ (×2), cache (×2)

  17. Diagram labels: KAFKA, DWH, Hadoop, App (×8); request-response; messaging OR stream processing; streaming data pipelines; changelogs


  18. Apache Kafka is a Streaming Platform

  19. Three Lenses

  20. What is Apache Kafka?
    01 Messaging Done Right
    02 Scalable Streaming Data Pipelines
    03 Foundation for Stream Processing

  21. Lens 1: Messaging Done Right
    Scalability, True Storage, Real-Time Processing

  22. Lens 2: Scalable Streaming Data Pipelines

  23. Lens 3: Foundation for Stream Processing
    KSQL is the Streaming SQL Engine for Apache Kafka

  24. The Streaming Platform

  25. The Streaming Platform
    Event-Driven
    Scalable
    Decoupled


  26. Bold claim: all your data is event streams

  27. A Customer Experience

  28. A Sale

  29. A Sensor Reading

  30. An Application Log Entry

  31. Databases

  32. Do you think that’s a table you are querying?

  33. The Table Stream Duality
    Time runs from top to bottom.
    Stream                       Table
    Account ID   Amount          Account ID   Balance
    12345        + €50           12345        €50
    12345        + €25           12345        €75
    12345        - €60           12345        €15
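The duality on this slide can be sketched in a few lines of Python (an illustration, not part of the deck): folding the stream of account changes in order reproduces each successive table state. The function name `materialise` and the in-memory dict are assumptions made for the example; only the account ID and amounts come from the slide.

```python
def materialise(stream):
    """Fold a stream of (account_id, amount) changes into table states.

    Returns the final table plus a snapshot of the table after each event.
    """
    table = {}
    snapshots = []
    for account_id, amount in stream:
        # Each event updates the current balance for its key.
        table[account_id] = table.get(account_id, 0) + amount
        snapshots.append(dict(table))
    return table, snapshots


# The three events shown on the slide (amounts in euros).
stream = [("12345", 50), ("12345", 25), ("12345", -60)]
table, snapshots = materialise(stream)
balances_over_time = [snapshot["12345"] for snapshot in snapshots]
print(balances_over_time, table)
```

Replaying the same stream always rebuilds the same table, which is the sense in which the table is a view over the log.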

  34. “The truth is the log. The database is a cache of a subset of the log.”
    - Pat Helland, Immutability Changes Everything
    http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
    Photo by Bobby Burch on Unsplash


  35. A Brief Look at Kafka's Technology

  36. Apache Kafka
    Kafka: A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.
    Writes are append only. Reads are a single seek & scan.
    Producer & Consumer APIs: Open-source client libraries for numerous languages, to directly integrate with your applications.
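A minimal Python sketch (an illustration, not Kafka's implementation) of the two properties named on this slide: writes only ever append, and a read is a seek to an offset followed by a sequential scan. The `CommitLog` class and its method names are hypothetical.

```python
class CommitLog:
    """Toy in-memory log: records are immutable once appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Writes are append-only; the returned offset identifies the record.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=None):
        # A read seeks to an offset, then scans forward sequentially.
        end = len(self._records) if max_records is None else offset + max_records
        return self._records[offset:end]


log = CommitLog()
offsets = [log.append(event) for event in ["a", "b", "c"]]
print(offsets, log.read(1))
```

Because records are never updated in place, any number of readers can scan the same log from whichever offset they choose.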

  37. Apache Kafka
    Kafka Connect API: Reliable and scalable integration of Kafka with other systems – no coding required.
    Kafka Streams API: Write standard Java applications & microservices to process your data in real-time.
    Diagram labels: Orders, Table, Customers, Kafka Streams API

  38. KSQL is a Declarative Stream Processing Language

  39. KSQL is the Streaming SQL Engine for Apache Kafka

  40. KSQL in Development and Production
    Interactive KSQL for development and testing: “Hmm, let me try out this idea...”
    Headless KSQL for Production: Desired KSQL queries have been identified
    Diagram label: REST

  41. KSQL for Real-Time Monitoring
    • Log data monitoring, tracking and alerting
    • syslog data
    • Sensor / IoT data
    CREATE STREAM SYSLOG_INVALID_USERS AS
      SELECT HOST, MESSAGE
      FROM SYSLOG
      WHERE MESSAGE LIKE '%Invalid user%';
    http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
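For readers less familiar with SQL, the filter in this query behaves roughly like the Python generator below. This is a sketch under assumptions: the sample syslog events are invented, and KSQL applies the filter continuously to an unbounded topic rather than to a list.

```python
def invalid_users(events):
    """Yield host and message for events whose message mentions 'Invalid user'."""
    for event in events:
        # Equivalent in spirit to: WHERE MESSAGE LIKE '%Invalid user%'
        if "Invalid user" in event["message"]:
            yield {"host": event["host"], "message": event["message"]}


# Hypothetical syslog events for illustration only.
syslog = [
    {"host": "web01", "message": "Invalid user admin from 10.0.0.5"},
    {"host": "web01", "message": "Accepted publickey for deploy"},
    {"host": "db01", "message": "Invalid user test from 10.0.0.9"},
]
matches = list(invalid_users(syslog))
print(matches)
```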

  42. KSQL for Anomaly Detection
    Identifying patterns or anomalies in real-time data, surfaced in milliseconds
    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
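To show what the tumbling window does, here is a rough Python equivalent (a sketch, not how KSQL executes the query). Each event is assigned to a fixed, non-overlapping 5-second bucket by its timestamp, and any card with more than 3 attempts in one bucket is flagged. The timestamps and card numbers are made up, and unlike KSQL this processes a finished list rather than an unbounded stream.

```python
from collections import Counter

WINDOW_SECONDS = 5  # matches WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3       # matches HAVING count(*) > 3


def possible_fraud(attempts):
    """attempts: iterable of (timestamp_in_seconds, card_number)."""
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows are aligned, fixed-size, non-overlapping buckets.
        window_start = (timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


# Hypothetical authorization attempts for illustration only.
attempts = [
    (0, "4111"), (1, "4111"), (2, "4111"), (4, "4111"),
    (3, "5500"), (6, "4111"), (7, "5500"),
]
flagged = possible_fraud(attempts)
print(flagged)
```

Card 4111 has four attempts in the window starting at second 0, so it is the only key flagged; its fifth attempt falls into the next window and is counted separately.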

  43. KSQL for Streaming ETL
    Joining, filtering, and aggregating streams of event data
    CREATE STREAM vip_actions AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u
      ON c.userid = u.user_id
      WHERE u.level = 'Platinum';
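The stream-table join in this query can be approximated in Python as a lookup against a keyed table for each incoming event. This is a sketch under assumptions: the sample users and click events are invented, and the `users` table is modelled as a plain dict rather than a continuously updated KSQL table.

```python
def vip_actions(events, users_table):
    """Enrich each click event with its user and keep Platinum users only."""
    for event in events:
        # LEFT JOIN: the lookup may find no matching user.
        user = users_table.get(event["userid"])
        # The WHERE clause then discards unmatched and non-Platinum rows.
        if user is not None and user["level"] == "Platinum":
            yield {
                "userid": event["userid"],
                "page": event["page"],
                "action": event["action"],
            }


# Hypothetical reference data and events for illustration only.
users = {"u1": {"level": "Platinum"}, "u2": {"level": "Gold"}}
clickstream = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u3", "page": "/home", "action": "view"},
    {"userid": "u1", "page": "/checkout", "action": "buy"},
]
result = list(vip_actions(clickstream, users))
print(result)
```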


  44. What Problems does Kafka Solve?

  45. Event-Centric Thinking
    “A product was viewed”
    Diagram labels: Web app, Streaming Platform, Hadoop

  46. Event-Centric Thinking
    “A product was viewed”
    Diagram labels: Web app, mobile app, APIs, Streaming Platform, Hadoop

  47. Event-Centric Thinking
    “A product was viewed”
    Diagram labels: mobile app, web app, APIs, Streaming Platform, Hadoop, Security, Monitoring, Rec engine

  48. System Availability and Event Buffering
    Diagram labels: Producer, Consumer

  49. System Availability and Event Buffering
    Diagram labels: Producer, Consumer
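One way to picture the buffering idea is the toy model below (an illustrative sketch; `Topic` and `Consumer` are hypothetical classes, not Kafka client APIs). The producer keeps writing while the consumer is unavailable, and the consumer later resumes from the offset it had reached.

```python
class Topic:
    """Toy topic that retains every record it is given."""

    def __init__(self):
        self.records = []

    def produce(self, record):
        self.records.append(record)


class Consumer:
    """Toy consumer that remembers how far it has read."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        # Return everything written since the last poll, then advance.
        batch = self.topic.records[self.offset:]
        self.offset += len(batch)
        return batch


topic = Topic()
consumer = Consumer(topic)

topic.produce("e1")
first = consumer.poll()

# The consumer is "down" here; the producer carries on regardless.
topic.produce("e2")
topic.produce("e3")

caught_up = consumer.poll()
print(first, caught_up)
```

Neither side needs the other to be available at the same moment, because the log in between holds the events until they are read.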

  50. Varying Latency Requirements / Batch vs Stream
    Diagram labels: Producer, 24hr batch extract, Consumer A

  51. Varying Latency Requirements / Batch vs Stream
    Diagram labels: Producer, 24hr batch extract, Consumer A, Consumer B

  52. Varying Latency Requirements / Batch vs Stream
    Diagram labels: Producer, 24hr batch extract, Consumer A, Consumer B

  53. Varying Latency Requirements / Batch vs Stream
    Diagram labels: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B

  54. Varying Latency Requirements / Batch vs Stream
    Diagram labels: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B

  55. Varying Latency Requirements / Batch vs Stream
    Diagram labels: Producer, 24hr batch extract, Realtime (×2), Consumer A, Consumer B
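The batch-versus-stream point can be sketched with two readers that track their own positions in one shared log (a simplified model; the consumer names and events are invented for illustration). One reader polls after every event, the other polls once at the end, and neither affects the other.

```python
# One shared log, and an independent read position per consumer.
log = []
offsets = {"realtime": 0, "batch": 0}


def poll(consumer):
    """Return records this consumer has not yet seen and advance its offset."""
    records = log[offsets[consumer]:]
    offsets[consumer] = len(log)
    return records


realtime_seen = []
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)
    # The realtime consumer reads each event as soon as it is written.
    realtime_seen.append(poll("realtime"))

# The batch consumer reads everything in one go, on its own schedule.
batch_seen = poll("batch")
print(realtime_seen, batch_seen)
```

The producer writes once; how quickly each consumer reads is that consumer's own decision.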

  56. Technology & Code/Algo Version Changes
    Diagram labels: Producer, Consumer (v1)

  57. Technology & Code/Algo Version Changes
    Diagram labels: Producer, Consumer (v1), Consumer (v2)

  58. Technology & Code/Algo Version Changes
    Diagram labels: Producer, Consumer (v2)
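A small sketch of why a retained log helps with version changes (the event data and both consumer functions are hypothetical): a new consumer version can reprocess the same events from the beginning and run alongside the old one until its results are trusted, at which point the old one is retired.

```python
# Hypothetical retained events: (kind, amount).
events = [("sale", 10), ("sale", 25), ("refund", 5)]


def consumer_v1(log):
    """Original logic: total of sales, ignoring refunds."""
    return sum(amount for kind, amount in log if kind == "sale")


def consumer_v2(log):
    """Revised logic: sales minus refunds, computed from the same events."""
    return sum(amount if kind == "sale" else -amount for kind, amount in log)


# Both versions read the full log independently; the producer is unchanged.
v1_total = consumer_v1(events)
v2_total = consumer_v2(events)
print(v1_total, v2_total)
```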


  59. Architectural Patterns with Apache Kafka

  60. Building for the Future
    Photo by Christopher Burns on Unsplash

  61. Tightly-coupled = Inflexible

  62. Analytics - Database Offload
    Diagram labels: RDBMS, CDC, HDFS / S3 / BigQuery etc
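Change data capture (CDC) turns row-level changes in the source database into a stream of events that downstream stores can apply. The sketch below assumes a simple `(operation, key, row)` event shape invented for illustration; real CDC tools each define their own formats.

```python
def apply_change(replica, change):
    """Apply one change event to a downstream copy of the table."""
    operation, key, row = change
    if operation in ("insert", "update"):
        replica[key] = row
    elif operation == "delete":
        replica.pop(key, None)


# Hypothetical change events captured from the source database, in order.
changes = [
    ("insert", 1, {"name": "Ann"}),
    ("update", 1, {"name": "Anne"}),
    ("insert", 2, {"name": "Bob"}),
    ("delete", 2, None),
]

replica = {}
for change in changes:
    apply_change(replica, change)
print(replica)
```

Applying the events in order leaves the replica matching the source table, without the analytics store ever querying the source database directly.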

  63. Stream Processing with Apache Kafka and KSQL
    Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders

  64. Real-time Event Stream Enrichment
    Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders

  65. Transform Once, Use Many
    Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, New App

  66. Transform Once, Use Many
    Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, HDFS / S3 / etc, New App

  67. Evolve processing from old systems to new
    Diagram labels: Existing App, RDBMS, CDC, Stream Processing, New App

  68. Evolve processing from old systems to new
    Diagram labels: Existing App, RDBMS, CDC, Stream Processing, New App (×2)

  69. Want your data anytime?
    Batch is Latency built in by Design
    You say that like "latency" is a synonym for "evil"

  70. It's all about the Events!


  71. So… Analytics and Kafka

  72. The Vision!
    "One version of the truth"

  73. The Reality…

  74. Pragmatism is…
    "One version of the truth"

  75. Diagram labels: Streaming Platform, Stream Processing, "One version of the truth"

  76. Diagram labels: Streaming Platform, Stream Processing, ML, App, NoSQL, Search, Graph, "One version of the truth"

  77. Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
    Database Changes, Log Events, IoT Data, Web Events, …
    CRM, Data Warehouse, Database, Hadoop
    Data Integration, Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications
    Confluent Platform
    Apache Kafka®: Core | Connect API | Streams API
    Data Compatibility: Schema Registry
    Monitoring & Administration: Confluent Control Center | Security
    Operations: Replicator | Auto Data Balancing
    Development and Connectivity: Clients | Connectors | REST Proxy | CLI
    SQL Stream Processing: KSQL
    Apache Open Source | Confluent Open Source | Confluent Enterprise

  78. Free Books!
    https://www.confluent.io/apache-kafka-stream-processing-book-bundle

  79. Confluent Streaming Event, Munich
    http://cnfl.io/streaming-event-munich

  80. @rmoff
    [email protected]
    https://www.confluent.io/download/
    http://cnfl.io/slack


  81. Resources
    • CDC Spreadsheet
    • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
    • #partner-engineering on Slack for questions
    • BD team (#partners / partne[email protected]) can help with introductions on a given sales op
    #EOF