The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Robin Moffatt
February 27, 2019

Data integration in architectures built on static, update-in-place datastores inevitably ends up with pathologically high degrees of coupling and poor scalability. This has been the standard practice for decades, as we attempt to build data pipelines on top of databases that do a poor job of modeling the fundamental objects that drive our businesses and systems: events.

Events carry both notification and state, and form a powerful primitive on which to build systems for developers and data engineers alike. Developers benefit from the asynchronous communication that events enable between services, and data engineers benefit from the integration capabilities. Everyone gains from using a standards-based, scalable and resilient streaming platform.

In this talk, we’ll discuss the concepts of events, their relevance to both software engineers and data engineers and their ability to unify architectures in a powerful way. We’ll see how stream processing makes sense in both a microservices and ETL environment, and why analytics, data integration and ETL fit naturally into a streaming world.

Transcript

  1. The Changing Face of ETL: Event-Driven Architectures for Data Engineers
    @rmoff
    Photo by rmoff

  2. Photo by Samuel Sianipar on Unsplash

  3. Photo by Khai Sze Ong on Unsplash

  4. Photo by Rainier Ridao on Unsplash

  5. Photo by Rohit Tandon on Unsplash

  6. Photo by Theodore Moore on Unsplash

  7. Photo by Cristian Grecu on Unsplash

  8. It used to be so simple
    Photo by Patrick Fore on Unsplash

  9. More Sources
    Photo by Eugenio Mazzone on Unsplash

  10. More Targets
    Photo by Tom Barrett on Unsplash

  11. More Data
    Photo by Kirill on Unsplash

  12. Batches and Buckets

  13. Applications Respond → an order was placed!
    Analytics Tell Us What Happened → how many orders were placed
    Photo by Deva Darshan from Pexels

  15. Photo by NASA on Unsplash

  16. Events
    Photo by Mark Kamalov on Unsplash

  17. An event is both:
    * Notification
    * State transfer
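The "notification plus state transfer" idea can be shown with a minimal plain-Java sketch. It assumes a hypothetical `OrderPlaced` event type that does not come from the talk. The arrival of the object is the notification. Its fields carry the state, so the consumer never needs to query the system that produced it.

```java
import java.time.Instant;

// Hypothetical event type (not from the talk).
// Receiving it is the notification; its fields are the state being transferred.
record OrderPlaced(String orderId, String customerId, double amount, Instant placedAt) {}

class OrderNotifier {
    // Reacts using only the state carried inside the event,
    // with no call back to the system that produced it.
    static String describe(OrderPlaced event) {
        return "Order " + event.orderId() + " placed by " + event.customerId()
                + " for " + event.amount();
    }

    public static void main(String[] args) {
        OrderPlaced event = new OrderPlaced("o-1", "c-42", 19.99,
                Instant.parse("2019-02-27T10:00:00Z"));
        System.out.println(describe(event)); // Order o-1 placed by c-42 for 19.99
    }
}
```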

  18. A Customer Experience

  19. A Sensor Reading

  20. Events
    Basket: Bread, Tinned Spaghetti

  21. Events: ItemAdd (Bread)
    Basket: Bread

  22. Events: ItemAdd (Bread), ItemAdd (Baked Beans)
    Basket: Bread, Baked Beans

  23. Events: ItemAdd (Bread), ItemAdd (Baked Beans), ItemRemove (Baked Beans)
    Basket: Bread

  24. Events: ItemAdd (Bread), ItemAdd (Baked Beans), ItemRemove (Baked Beans), ItemAdd (Tinned Spaghetti)
    Basket: Bread, Tinned Spaghetti
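The basket slides derive the current basket by replaying the events in order. Here is a small plain-Java sketch of that replay. It assumes the events arrive in order and that ItemRemove removes one matching item. The type names are illustrative and do not come from any library.

```java
import java.util.ArrayList;
import java.util.List;

// The two event types shown on the slides.
enum Action { ITEM_ADD, ITEM_REMOVE }

record BasketEvent(Action action, String item) {}

class BasketReplay {
    // Rebuild the basket by replaying every event in order.
    // The events are never modified; the basket is derived state.
    static List<String> replay(List<BasketEvent> events) {
        List<String> basket = new ArrayList<>();
        for (BasketEvent e : events) {
            switch (e.action()) {
                case ITEM_ADD -> basket.add(e.item());
                case ITEM_REMOVE -> basket.remove(e.item()); // removes the first matching item
            }
        }
        return basket;
    }

    public static void main(String[] args) {
        List<BasketEvent> events = List.of(
                new BasketEvent(Action.ITEM_ADD, "Bread"),
                new BasketEvent(Action.ITEM_ADD, "Baked Beans"),
                new BasketEvent(Action.ITEM_REMOVE, "Baked Beans"),
                new BasketEvent(Action.ITEM_ADD, "Tinned Spaghetti"));
        System.out.println(replay(events)); // [Bread, Tinned Spaghetti]
    }
}
```

Replaying only the first three events yields [Bread], the basket on slide 23. Any earlier basket state can therefore be recovered from the same events.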

  28. What is an Event Streaming Platform?
    Diagram: The Log, Connectors, Producer, Consumer, Streaming Engine

  29. Immutable Event Log
    Diagram: log ordered from Old to New
    Messages are added at the end of the log
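The sketch below is a toy in-memory model of the append-only log, to make "messages are added at the end" concrete. It is illustrative only: Kafka persists partitioned logs on disk, and `EventLog` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy in-memory event log: entries can only be appended, never updated or deleted.
class EventLog<T> {
    private final List<T> entries = new ArrayList<>();

    // Add a message at the end of the log and return its offset.
    long append(T message) {
        entries.add(message);
        return entries.size() - 1;
    }

    // Read the message at a given offset (0 is the oldest).
    T read(long offset) {
        return entries.get(Math.toIntExact(offset));
    }

    // The offset the next appended message will get.
    long endOffset() {
        return entries.size();
    }

    // A read-only copy, so callers cannot rewrite history.
    List<T> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(entries));
    }

    public static void main(String[] args) {
        EventLog<String> log = new EventLog<>();
        log.append("ItemAdd Bread");       // offset 0: oldest
        log.append("ItemAdd Baked Beans"); // offset 1: newest
        System.out.println(log.read(0) + ", next offset " + log.endOffset());
    }
}
```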

  30. Consumers have a position all of their own
    Diagram: log ordered from Old to New; Sally is here (scanning)

  31. Consumers have a position all of their own
    Diagram: log ordered from Old to New; Sally is here, Fred is here (each scanning)

  32. Consumers have a position all of their own
    Diagram: log ordered from Old to New; Sally is here, George is here, Fred is here (each scanning)
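Reading does not remove anything from the log, so each consumer only needs to remember its own offset. The sketch uses a plain in-memory list as the log and a hypothetical `LogReader` class. In Kafka, the equivalent is the offset a consumer group commits for each partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader: each instance tracks its own offset into a shared log.
// Reading copies messages out; nothing is removed, so readers never affect each other.
class LogReader {
    private final String name;
    private int position = 0; // offset of the next message this reader will see

    LogReader(String name) { this.name = name; }

    // Return up to maxMessages, starting from this reader's own position.
    List<String> poll(List<String> log, int maxMessages) {
        int end = Math.min(log.size(), position + maxMessages);
        List<String> batch = new ArrayList<>(log.subList(position, end));
        position = end;
        return batch;
    }

    int position() { return position; }
    String name() { return name; }

    public static void main(String[] args) {
        List<String> log = List.of("e0", "e1", "e2", "e3", "e4"); // oldest first
        LogReader sally = new LogReader("Sally");
        LogReader george = new LogReader("George");
        LogReader fred = new LogReader("Fred");
        sally.poll(log, 4);
        george.poll(log, 2);
        fred.poll(log, 1);
        for (LogReader r : List.of(sally, george, fred)) {
            System.out.println(r.name() + " is here: offset " + r.position());
        }
    }
}
```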

  33. The Connect API
    Diagram: The Log, Connectors, Producer, Consumer, Streaming Engine

  34. Streaming Integration with Kafka Connect
    Diagram: Sources (syslog, flat file, CSV, JSON, MQTT) → Kafka Connect (Workers, Tasks) → Kafka Brokers

  35. Streaming Integration with Kafka Connect
    Diagram: Kafka Brokers → Kafka Connect (Workers, Tasks) → Sinks (Amazon S3, MQTT)

  36. Streaming Integration with Kafka Connect
    Diagram: Sources (syslog, flat file, CSV, JSON, MQTT) and Sinks (Amazon S3, MQTT), both connected to Kafka Brokers through Kafka Connect (Workers, Tasks)
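In Kafka Connect, source tasks pull data from an external system into Kafka, and sink tasks push data from Kafka out to another system. The dependency-free model below only illustrates that shape. The `Source` and `Sink` interfaces are invented here and loosely echo the `poll()`/`put()` methods of Connect's `SourceTask` and `SinkTask`. Real pipelines normally configure existing connectors rather than writing code like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces, loosely echoing the poll()/put() shape of Connect's SourceTask and SinkTask.
interface Source { List<String> poll(); }
interface Sink { void put(List<String> records); }

// Stand-in for a flat-file/CSV source: emits its lines once, then nothing.
class LinesSource implements Source {
    private final List<String> lines;
    private boolean done = false;

    LinesSource(List<String> lines) { this.lines = List.copyOf(lines); }

    public List<String> poll() {
        if (done) return List.of();
        done = true;
        return lines;
    }
}

// Stand-in for a sink such as Amazon S3: just collects what it is given.
class CollectingSink implements Sink {
    final List<String> written = new ArrayList<>();

    public void put(List<String> records) { written.addAll(records); }
}

class ConnectPipeline {
    // One round: source -> topic -> sink. Source and sink only share the topic.
    static void runOnce(Source source, List<String> topic, Sink sink) {
        int start = topic.size();
        topic.addAll(source.poll());
        sink.put(new ArrayList<>(topic.subList(start, topic.size())));
    }

    public static void main(String[] args) {
        List<String> topic = new ArrayList<>();
        CollectingSink s3 = new CollectingSink();
        runOnce(new LinesSource(List.of("id,rating", "1,5")), topic, s3);
        System.out.println(s3.written);
    }
}
```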

  37. Stream Processing in Kafka
    Diagram: The Log, Connectors, Producer, Consumer, Streaming Engine

  38. Kafka Streams API
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    // Keep only COMPLETE orders; stringSerde and ordersSerde are defined elsewhere
    final StreamsBuilder builder = new StreamsBuilder();
    builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
           .filter((key, order) -> "COMPLETE".equals(order.getStatus()))
           .to("complete_orders", Produced.with(stringSerde, ordersSerde));
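The Kafka Streams topology on slide 38 runs continuously against the `orders` topic. The sketch below isolates the predicate it applies and runs it over an in-memory list, with no broker needed. The `Order` record is a hypothetical stand-in for the slide's order type, and a bounded list is not an unbounded stream. To test the real topology, the `kafka-streams-test-utils` artifact provides `TopologyTestDriver`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the slide's order type (the real class is not shown).
record Order(String id, String status) {}

class CompleteOrdersFilter {
    // Same predicate as the topology: keep only orders whose status is COMPLETE.
    // "COMPLETE".equals(...) avoids a NullPointerException when status is missing.
    static List<Order> completeOnly(List<Order> orders) {
        return orders.stream()
                .filter(order -> "COMPLETE".equals(order.status()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("o1", "COMPLETE"),
                new Order("o2", "PENDING"),
                new Order("o3", null),
                new Order("o4", "COMPLETE"));
        System.out.println(completeOnly(orders).size()); // 2
    }
}
```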

  39. Stream Processing with KSQL
    CREATE STREAM completedOrders AS
    SELECT *
    FROM orders
    WHERE status='COMPLETE';

  40. This is Something New
    Photo by Ash from Modern Afflatus on Unsplash

  41. Events in Action
    Diagram: Review events, reviews

  42. Events in Action
    Diagram: Review events, reviews, Operational dashboard

  43. Events in Action
    Diagram: Review events, reviews, Operational dashboard, Data lake

  44. Events in Action
    Diagram: Review events, reviews, Filter out bad data, reviews_clean, Operational dashboard, Data lake
    CREATE STREAM reviews_clean AS
    SELECT * FROM reviews
    WHERE id IS NOT NULL;

  45. Events in Action
    Diagram: Existing apps, User data, RDBMS txn log → Kafka Connect → Kafka (users)
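Streaming the RDBMS transaction log (change data capture) turns every insert, update and delete into an event. Applying those events in order rebuilds the current table, which is how `users` can be treated as a table when reviews are joined to it. The sketch assumes a hypothetical, simplified change-event shape. Real CDC tools emit richer records, typically including before and after images of the row.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified change event captured from the RDBMS transaction log.
enum Op { UPSERT, DELETE }

// For DELETE, status is unused (null).
record UserChange(Op op, String userId, String status) {}

class UserTable {
    // Materialize the latest state of each user by applying change events in log order.
    static Map<String, String> materialize(List<UserChange> changelog) {
        Map<String, String> statusByUser = new LinkedHashMap<>();
        for (UserChange change : changelog) {
            if (change.op() == Op.DELETE) {
                statusByUser.remove(change.userId());
            } else {
                statusByUser.put(change.userId(), change.status()); // later changes overwrite earlier ones
            }
        }
        return statusByUser;
    }

    public static void main(String[] args) {
        List<UserChange> changelog = List.of(
                new UserChange(Op.UPSERT, "u1", "Silver"),
                new UserChange(Op.UPSERT, "u2", "Gold"),
                new UserChange(Op.UPSERT, "u1", "Platinum"),
                new UserChange(Op.DELETE, "u2", null));
        System.out.println(materialize(changelog)); // {u1=Platinum}
    }
}
```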

  46. Events in Action
    Diagram: Review events, reviews, reviews_clean, User data, users, Operational dashboard, Data lake; step: "Join events to users, and filter"

  47. Events in Action
    Diagram: Review events, reviews, reviews_clean, User data, users, enriched_reviews, Operational dashboard, Data lake; step: "Join events to users, and filter"
    CREATE STREAM reviews_clean AS
    SELECT * FROM reviews
    WHERE id IS NOT NULL;
    CREATE STREAM enriched_reviews AS
    SELECT * FROM reviews_clean r
    INNER JOIN users u
    ON r.userid=u.userid;
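Slide 47 chains two steps: drop reviews with no `id`, then inner-join each review to its user. Here is a plain-Java sketch of the same two steps. The record shapes are hypothetical; the slides only show `id`, `userid`, `rating` and `status`. KSQL performs the join continuously, looking up the user's state as each review arrives. The sketch uses a fixed snapshot of users.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical record shapes.
record Review(String id, String userId, int rating) {}
record User(String userId, String status) {}
record EnrichedReview(String id, String userId, int rating, String status) {}

class ReviewEnricher {
    // reviews_clean: WHERE id IS NOT NULL
    static List<Review> clean(List<Review> reviews) {
        return reviews.stream().filter(r -> r.id() != null).collect(Collectors.toList());
    }

    // enriched_reviews: INNER JOIN users ON r.userid = u.userid.
    // Reviews without a matching user are dropped, as with an inner join.
    static List<EnrichedReview> enrich(List<Review> reviews, Map<String, User> usersById) {
        return reviews.stream()
                .filter(r -> r.userId() != null && usersById.containsKey(r.userId()))
                .map(r -> new EnrichedReview(r.id(), r.userId(), r.rating(),
                        usersById.get(r.userId()).status()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Review> reviews = List.of(
                new Review("r1", "u1", 2),
                new Review(null, "u1", 5),
                new Review("r2", "u9", 4));
        Map<String, User> users = Map.of("u1", new User("u1", "Platinum"));
        System.out.println(enrich(clean(reviews), users));
    }
}
```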

  48. Events in Action
    Diagram: Review events, User data, Operational dashboard, Data lake, Notification service; step: "Join events to users, and filter"

  49. Events in Action
    Diagram: Review events, reviews, reviews_clean, User data, users, enriched_reviews, unhappy_vips, Operational dashboard, Data lake, Notification service; step: "Join events to users, and filter"
    CREATE STREAM unhappy_vips AS
    SELECT * FROM enriched_reviews
    WHERE rating < 3
    AND status = 'Platinum';
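The `unhappy_vips` stream is another filter, this time over the enriched events. Because it writes to its own stream, the notification service only ever receives matching events. The sketch below assumes a hypothetical `EnrichedReview` shape and a notification message format invented for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical shape of an enriched_reviews event (review fields plus the joined user status).
record EnrichedReview(String id, String userId, int rating, String status) {}

class UnhappyVips {
    // unhappy_vips: WHERE rating < 3 AND status = 'Platinum'
    static boolean isUnhappyVip(EnrichedReview r) {
        return r.rating() < 3 && "Platinum".equals(r.status());
    }

    // Stand-in for the notification service: it only sees events that passed the filter.
    static List<String> notifications(List<EnrichedReview> enriched) {
        return enriched.stream()
                .filter(UnhappyVips::isUnhappyVip)
                .map(r -> "Contact user " + r.userId() + " about review " + r.id())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<EnrichedReview> enriched = List.of(
                new EnrichedReview("r1", "u1", 2, "Platinum"),
                new EnrichedReview("r2", "u2", 1, "Gold"),
                new EnrichedReview("r3", "u3", 5, "Platinum"));
        notifications(enriched).forEach(System.out::println);
    }
}
```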

  50. The Power of an Event-Driven Architecture
    Photo by rmoff

  51. Not Everything is a Nail
    Diagram: Events, RDBMS

  52. Not Everything is a Nail
    Diagram: Events, Elasticsearch, RDBMS

  53. Not Everything is a Nail
    Diagram: Events, Elasticsearch, RDBMS, Graph

  54. Side-by-Side Tech Evaluation
    Diagram: Events, HDFS

  55. Side-by-Side Tech Evaluation
    Diagram: Events, BigQuery, HDFS

  56. Side-by-Side Tech Evaluation
    Diagram: Events, BigQuery, HDFS, Snowflake

  57. Evolve Data Sources
    Diagram: Producer (On-premises); Consuming App A; Consuming App B

  58. Evolve Data Sources
    Diagram: Producer (On-premises), Producer (Cloud); Consuming App A; Consuming App B

  59. Evolve Data Sources
    Diagram: Producer (Cloud); Consuming App A; Consuming App B
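Slides 57 to 59 move the producer from on-premises to the cloud without touching the consuming apps. A sketch of why that works, under simplifying assumptions: the topic is an in-memory list, the `Producer` interface and `ConsumingApp` class are hypothetical, and both producers emit the same kind of event. The consumers only know the topic, so swapping producers requires no change on their side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical producer abstraction: on-premises and cloud versions emit the same kind of event.
@FunctionalInterface
interface Producer {
    String nextEvent();
}

// A consuming app tracks its own offset in the topic and never references a producer.
class ConsumingApp {
    private int offset = 0;
    final List<String> seen = new ArrayList<>();

    void catchUp(List<String> topic) {
        while (offset < topic.size()) {
            seen.add(topic.get(offset++));
        }
    }
}

class EvolveSources {
    public static void main(String[] args) {
        List<String> topic = new ArrayList<>();
        Producer onPrem = () -> "order (on-premises)";
        Producer cloud = () -> "order (cloud)";
        ConsumingApp appA = new ConsumingApp();
        ConsumingApp appB = new ConsumingApp();

        topic.add(onPrem.nextEvent()); // before: on-premises producer only
        topic.add(onPrem.nextEvent()); // during migration: both producers write
        topic.add(cloud.nextEvent());
        topic.add(cloud.nextEvent());  // after: cloud producer only

        appA.catchUp(topic);           // neither consuming app changed
        appB.catchUp(topic);
        System.out.println(appA.seen.size() + " " + appB.seen.size()); // 4 4
    }
}
```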

  60. Tight Coupling != Flexible
    Diagram: Orders, RDBMS

  61. Tight Coupling != Flexible
    Diagram: Orders, HDFS, RDBMS

  62. Tight Coupling != Flexible
    Diagram: Orders, App, HDFS, RDBMS

  63. Loose Coupling == Freedom to Evolve
    Diagram: Orders, RDBMS

  64. Loose Coupling == Freedom to Evolve
    Diagram: Orders, HDFS, RDBMS

  65. Loose Coupling == Freedom to Evolve
    Diagram: Orders, App, HDFS, RDBMS
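With loose coupling, the Orders producer writes each event once, and every target reads independently. New targets such as HDFS or an App can therefore be added later and still read from the start. A minimal sketch, assuming an in-memory `OrdersTopic` class (hypothetical) that retains all events.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the Orders producer only appends; targets read whenever they attach.
class OrdersTopic {
    private final List<String> orders = new ArrayList<>();

    void append(String order) { orders.add(order); }

    // Any target, whenever it subscribes, can read the history from a chosen offset.
    List<String> readFrom(int offset) {
        return List.copyOf(orders.subList(offset, orders.size()));
    }
}

class LooseCouplingDemo {
    public static void main(String[] args) {
        OrdersTopic topic = new OrdersTopic();
        topic.append("order-1");
        topic.append("order-2");
        List<String> rdbms = topic.readFrom(0);   // first target

        topic.append("order-3");
        // New targets attach later; the producer code above is untouched.
        List<String> hdfs = topic.readFrom(0);
        List<String> app = topic.readFrom(0);
        System.out.println(rdbms.size() + " " + hdfs.size() + " " + app.size()); // 2 3 3
    }
}
```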

  66. Transform Once, Use Many: Data Cleansing
    Diagram: IoT, temp_raw, App, RDBMS, App

  67. Transform Once, Use Many: Data Cleansing
    Diagram: IoT, temp_raw, App, RDBMS, App
    temp_raw
      sensor_id  time_epoch  reading
      42         1551136074  13.05
      42         1551136125  13.11
                 1551136125  13.11
      42         1551138129  13.04

  68. Transform Once, Use Many: Data Cleansing
    Diagram: IoT, temp_raw, App, RDBMS, App, with a separate Cleanse step for each consumer
    temp_raw
      sensor_id  time_epoch  reading
      42         1551136074  13.05
      42         1551136125  13.11
                 1551136125  13.11
      42         1551138129  13.04

  69. Transform Once, Use Many: Data Cleansing
    Diagram: IoT, temp_raw, filter SENSOR_ID IS NOT NULL, temp_clean, App, RDBMS, App
    temp_raw
      sensor_id  time_epoch  reading
      42         1551136074  13.05
      42         1551136125  13.11
                 1551136125  13.11
      42         1551138129  13.04
    temp_clean
      sensor_id  time_epoch  reading
      42         1551136074  13.05
      42         1551136125  13.11
      42         1551138129  13.04
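The cleansing rule is applied once to produce `temp_clean`, and every consumer reads the cleaned stream instead of repeating the rule. A plain-Java sketch using the rows from the slides. The `Reading` record and the `maxReading` consumer are hypothetical, and a null `sensorId` stands for the missing ID.

```java
import java.util.List;
import java.util.stream.Collectors;

// One temp_raw row; sensorId is null when the ID is missing.
record Reading(Integer sensorId, long timeEpoch, double reading) {}

class TransformOnce {
    // The single cleansing step (SENSOR_ID IS NOT NULL) that produces temp_clean.
    static List<Reading> cleanse(List<Reading> tempRaw) {
        return tempRaw.stream().filter(r -> r.sensorId() != null).collect(Collectors.toList());
    }

    // Hypothetical downstream consumer: reads temp_clean, no cleansing logic of its own.
    static double maxReading(List<Reading> tempClean) {
        return tempClean.stream().mapToDouble(Reading::reading).max().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Reading> tempRaw = List.of(
                new Reading(42, 1551136074L, 13.05),
                new Reading(42, 1551136125L, 13.11),
                new Reading(null, 1551136125L, 13.11),
                new Reading(42, 1551138129L, 13.04));
        List<Reading> tempClean = cleanse(tempRaw);
        System.out.println(tempClean.size() + " clean readings, max " + maxReading(tempClean));
        // 3 clean readings, max 13.11
    }
}
```

If the rule changes, only `cleanse` changes, and the App and RDBMS targets pick up the new behavior without their own code being touched.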

  70. Say NO to brittle pipelines
    Photo by rmoff

  71. Photo by Benjamin Lambert on Unsplash

  72. Latency requirements
    Users of the data
    Scale
    Data fidelity
    Photo by Benjamin Lambert on Unsplash

  73. Diagram: four Apps, search, Hadoop, DWH, monitoring, security, two MQs, two caches

  74. Diagram: KAFKA, DWH, Hadoop, eight Apps
    Labels: request-response, messaging OR stream processing, streaming data pipelines, changelogs

  75. Events model the real world
    Photo by rmoff

  76. Event streaming platform
    Flexibility & scalability
    Data when you need it
    Data persistence
    Native stream processing
    Photo by rmoff

  77. http://cnfl.io/book-bundle

  78. @rmoff
    confluent.io/download
    http://cnfl.io/slack
    http://cnfl.io/book-bundle
    Photo by rmoff

  79. Resources
    • CDC Spreadsheet
    • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
    • #partner-engineering on Slack for questions
    • BD team (#partners / [email protected]) can help with introductions on a given sales opportunity
    #EOF