The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Robin Moffatt
February 27, 2019

Data integration in architectures built on static, update-in-place datastores inevitably ends up with pathologically high degrees of coupling and poor scalability. This has been the standard practice for decades, as we attempt to build data pipelines on top of databases that do a poor job of modeling the fundamental objects that drive our businesses and systems: events.

Events carry both notification and state, and form a powerful primitive on which to build systems for developers and data engineers alike. Developers benefit from the asynchronous communication that events enable between services, and data engineers benefit from the integration capabilities. Everyone gains from using the standards-based, scalable and resilient streaming platform.

In this talk, we’ll discuss the concepts of events, their relevance to both software engineers and data engineers and their ability to unify architectures in a powerful way. We’ll see how stream processing makes sense in both a microservices and ETL environment, and why analytics, data integration and ETL fit naturally into a streaming world.

Transcript

  1. 8.

    @rmoff It used to be so simple Photo by Patrick Fore on Unsplash
  2. 9.

    @rmoff Photo by Eugenio Mazzone on Unsplash More Sources
  3. 10.

    @rmoff Photo by Tom Barrett on Unsplash More Targets
  4. 11.
  5. 13.

    @rmoff Photo by Deva Darshan from Pexels Applications Respond → an order was placed! Analytics Tell Us What Happened → how many orders were placed
  6. 17.

    @rmoff An event is both:
    * Notification
    * State transfer
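
As a rough illustration of an event being both a notification and a state transfer, here is a minimal Java sketch. The `OrderPlaced` and `OrderEvents` names, and the in-memory publisher itself, are assumptions made for this example and are not from the talk; a real system would publish to a streaming platform instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event type: its name is the notification ("an order was placed"),
// and its fields are the state that travels with that notification.
final class OrderPlaced {
    final String orderId;
    final String item;
    final int quantity;

    OrderPlaced(String orderId, String item, int quantity) {
        this.orderId = orderId;
        this.item = item;
        this.quantity = quantity;
    }
}

// Hypothetical in-memory publisher standing in for a streaming platform.
final class OrderEvents {
    private final List<Consumer<OrderPlaced>> subscribers = new ArrayList<>();

    // Register a subscriber that will see every event published afterwards.
    void subscribe(Consumer<OrderPlaced> subscriber) {
        subscribers.add(subscriber);
    }

    // Deliver the same event, with its full state, to every subscriber.
    void publish(OrderPlaced event) {
        for (Consumer<OrderPlaced> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }
}
```

One subscriber could react to each order as it arrives (the notification), while another sums `quantity` across orders for analytics (the state), without either knowing about the other.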
  7. 20.
  8. 22.

    @rmoff Events Bread Baked Beans Basket Bread Baked Beans ItemAdd ItemAdd
  9. 23.

    @rmoff Events Bread Basket Bread Baked Beans Baked Beans ItemAdd ItemAdd ItemRemove
  10. 24.

    @rmoff Events Bread Tinned Spaghetti Basket Bread Tinned Spaghetti Baked Beans Baked Beans ItemAdd ItemAdd ItemRemove ItemAdd
  11. 25.

    @rmoff Events Bread Tinned Spaghetti Basket Bread Tinned Spaghetti Baked Beans Baked Beans ItemAdd ItemAdd ItemRemove ItemAdd
  12. 26.

    @rmoff Events Bread Tinned Spaghetti Basket Bread Tinned Spaghetti Baked Beans Baked Beans ItemAdd ItemAdd ItemRemove ItemAdd
  13. 27.

    @rmoff Events Bread Tinned Spaghetti Basket Bread Tinned Spaghetti Baked Beans Baked Beans ItemAdd ItemAdd ItemRemove ItemAdd
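
The basket slides suggest that the current basket can be derived by replaying its events in order. A minimal Java sketch of that idea follows; the `BasketEvent` and `Basket` classes are hypothetical names chosen for this example, and only the `ItemAdd` / `ItemRemove` event kinds come from the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type mirroring the ItemAdd / ItemRemove labels on the slides.
final class BasketEvent {
    enum Type { ITEM_ADD, ITEM_REMOVE }

    final Type type;
    final String item;

    private BasketEvent(Type type, String item) {
        this.type = type;
        this.item = item;
    }

    static BasketEvent add(String item) {
        return new BasketEvent(Type.ITEM_ADD, item);
    }

    static BasketEvent remove(String item) {
        return new BasketEvent(Type.ITEM_REMOVE, item);
    }
}

final class Basket {
    // Rebuild the basket contents by applying every event in order.
    static List<String> replay(List<BasketEvent> events) {
        List<String> basket = new ArrayList<>();
        for (BasketEvent event : events) {
            if (event.type == BasketEvent.Type.ITEM_ADD) {
                basket.add(event.item);
            } else {
                basket.remove(event.item); // removes the first matching item, if any
            }
        }
        return basket;
    }
}
```

Replaying add Bread, add Baked Beans, remove Baked Beans, add Tinned Spaghetti yields a basket of Bread and Tinned Spaghetti, and replaying only a prefix of the events recovers the basket as it was at that earlier point.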
  14. 28.

    @rmoff What is an Event Streaming Platform? The Log Connectors Connectors Producer Consumer Streaming Engine
  15. 29.

    @rmoff Immutable Event Log Old New Messages are added at the end of the log
  16. 30.

    @rmoff Consumers have a position all of their own Sally is here Old New Scan
  17. 31.

    @rmoff Consumers have a position all of their own Sally is here Fred is here Old New Scan Scan
  18. 32.

    @rmoff Consumers have a position all of their own Sally is here George is here Fred is here Old New Scan Scan Scan
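
To make the "immutable log, each consumer has its own position" idea concrete, here is a small in-memory Java sketch. It is not how Kafka is implemented; `EventLog`, `poll`, and `position` are names invented for this illustration, and `max` is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical append-only log with one read position per consumer.
final class EventLog {
    private final List<String> messages = new ArrayList<>();
    private final Map<String, Integer> positions = new HashMap<>();

    // Messages are only ever added at the end of the log.
    void append(String message) {
        messages.add(message);
    }

    // Return up to max unread messages for this consumer and advance only its position.
    List<String> poll(String consumer, int max) {
        int from = positions.getOrDefault(consumer, 0);
        int count = Math.min(max, messages.size() - from);
        int to = from + count;
        List<String> batch = new ArrayList<>(messages.subList(from, to));
        positions.put(consumer, to);
        return batch;
    }

    // Where this consumer will read from next (0 if it has never read).
    int position(String consumer) {
        return positions.getOrDefault(consumer, 0);
    }
}
```

Because reading never removes anything, Sally, Fred, and George can each scan the same log at their own pace, and a consumer that starts later still sees every message from the beginning.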
  19. 33.

    @rmoff The Connect API The Log Connectors Connectors Producer Consumer Streaming Engine
  20. 34.

    @rmoff Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect Tasks Workers Sources syslog flat file CSV JSON MQTT
  21. 35.

    @rmoff Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect Tasks Workers Sinks Amazon S3 MQTT
  22. 36.

    @rmoff Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect Tasks Workers Sources Sinks syslog flat file CSV JSON MQTT Amazon S3 MQTT
  23. 37.

    @rmoff Stream Processing in Kafka The Log Connectors Connectors Producer Consumer Streaming Engine
  24. 38.

    @rmoff Kafka Streams API

        // Keep only completed orders and write them to a new topic
        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
               .filter((key, order) -> order.getStatus().equals("COMPLETE"))
               .to("complete_orders", Produced.with(stringSerde, ordersSerde));
  25. 39.

    @rmoff Stream Processing with KSQL

        CREATE STREAM completedOrders AS
          SELECT * FROM orders
          WHERE status='COMPLETE';
  26. 40.

    @rmoff This is Something New Photo by Ash from Modern Afflatus on Unsplash
  27. 41.

    @rmoff Events in Action Review events reviews
  28. 42.

    @rmoff Events in Action Review events Operational dashboard reviews
  29. 43.

    @rmoff Events in Action Review events Operational dashboard reviews Data lake
  30. 44.

    @rmoff Events in Action Review events Filter out bad data Operational dashboard Data lake reviews reviews_clean

        CREATE STREAM reviews_clean AS
          SELECT * FROM reviews
          WHERE id IS NOT NULL;
  31. 45.

    @rmoff Events in Action Existing apps User data RDBMS txn log Kafka Connect Kafka users
  32. 46.

    @rmoff Events in Action Review events Operational dashboard Data lake User data users reviews reviews_clean Join events to users, and filter
  33. 47.

    @rmoff Events in Action Review events Operational dashboard Data lake User data enriched_reviews reviews reviews_clean users Join events to users, and filter

        CREATE STREAM reviews_clean AS
          SELECT * FROM reviews
          WHERE id IS NOT NULL;

        CREATE STREAM enriched_reviews AS
          SELECT * FROM reviews_clean r
          INNER JOIN users u ON r.userid=u.userid;
  34. 48.

    @rmoff Events in Action Review events Operational dashboard Data lake User data Join events to users, and filter Notification service
  35. 49.

    @rmoff Events in Action Review events Notification service Operational dashboard Data lake User data unhappy_vips enriched_reviews reviews reviews_clean users Join events to users, and filter

        CREATE STREAM unhappy_vips AS
          SELECT * FROM enriched_reviews
          WHERE rating < 3 AND status = 'Platinum';
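
The three KSQL statements on these slides form a chain: filter, join, filter. The following plain-Java sketch walks the same three steps over in-memory lists so the data flow is easy to follow. It is only an analogy and not KSQL or Kafka Streams code; the `Review`, `EnrichedReview`, and `ReviewPipeline` classes and their fields are assumptions made for the example, with only `id`, `userid`, `rating`, and `status` taken from the slides.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical shape of a review event.
final class Review {
    final String id;
    final String userId;
    final int rating;

    Review(String id, String userId, int rating) {
        this.id = id;
        this.userId = userId;
        this.rating = rating;
    }
}

// Hypothetical shape of a review joined to its user's status.
final class EnrichedReview {
    final Review review;
    final String status;

    EnrichedReview(Review review, String status) {
        this.review = review;
        this.status = status;
    }
}

final class ReviewPipeline {
    // reviews -> reviews_clean: drop records with no id.
    static List<Review> clean(List<Review> reviews) {
        return reviews.stream()
                .filter(r -> r.id != null)
                .collect(Collectors.toList());
    }

    // reviews_clean + users -> enriched_reviews: inner join on user id.
    static List<EnrichedReview> enrich(List<Review> clean, Map<String, String> statusByUserId) {
        return clean.stream()
                .filter(r -> r.userId != null && statusByUserId.containsKey(r.userId))
                .map(r -> new EnrichedReview(r, statusByUserId.get(r.userId)))
                .collect(Collectors.toList());
    }

    // enriched_reviews -> unhappy_vips: low ratings from Platinum users.
    static List<EnrichedReview> unhappyVips(List<EnrichedReview> enriched) {
        return enriched.stream()
                .filter(e -> e.review.rating < 3 && "Platinum".equals(e.status))
                .collect(Collectors.toList());
    }
}
```

Each intermediate result corresponds to a stream that other consumers (the dashboard, the data lake, the notification service) could read independently, which is the point the slides build up to.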
  36. 51.
  37. 52.

    @rmoff Not Everything is a Nail Events Elasticsearch RDBMS
  38. 53.

    @rmoff Not Everything is a Nail Events Elasticsearch RDBMS Graph
  39. 54.

    @rmoff Side-by-Side Tech Evaluation Events HDFS
  40. 55.

    @rmoff Side-by-Side Tech Evaluation Events BigQuery HDFS
  41. 56.

    @rmoff Side-by-Side Tech Evaluation Events BigQuery HDFS Snowflake
  42. 57.

    @rmoff Evolve Data Sources Producer Consuming App A On-premises Consuming App B
  43. 58.

    @rmoff Evolve Data Sources Producer On-premises Producer Cloud Consuming App A Consuming App B
  44. 59.

    @rmoff Evolve Data Sources Producer Cloud Consuming App A Consuming App B
  45. 60.

    @rmoff Tight Coupling != Flexible Orders RDBMS
  46. 61.

    @rmoff Tight Coupling != Flexible Orders HDFS RDBMS
  47. 62.

    @rmoff Tight Coupling != Flexible Orders App HDFS RDBMS
  48. 63.

    @rmoff Loose Coupling == Freedom to Evolve Orders RDBMS
  49. 64.

    @rmoff Loose Coupling == Freedom to Evolve Orders HDFS RDBMS
  50. 65.

    @rmoff Loose Coupling == Freedom to Evolve Orders App HDFS RDBMS
  51. 66.

    @rmoff Transform Once, Use Many: Data Cleansing IoT App RDBMS App temp_raw
  52. 67.

    @rmoff Transform Once, Use Many: Data Cleansing IoT App RDBMS App temp_raw

        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
                   1551136125  13.11
        42         1551138129  13.04
  53. 68.

    @rmoff Transform Once, Use Many: Data Cleansing IoT App RDBMS App temp_raw Cleanse Cleanse Cleanse

        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
                   1551136125  13.11
        42         1551138129  13.04
  54. 69.

    @rmoff Transform Once, Use Many: Data Cleansing IoT App RDBMS App SENSOR_ID IS NOT NULL

        temp_raw
        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
                   1551136125  13.11
        42         1551138129  13.04

        temp_clean
        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
        42         1551138129  13.04
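
"Transform once, use many" can be sketched in plain Java as one cleansing step whose output is shared by several consumers, instead of each consumer repeating the null check. The `TempReading` and `SensorCleansing` names and the two example consumers (`averageReading`, `latestEpoch`) are hypothetical and chosen only for illustration; the rows are the ones shown on the slide.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical record shape matching the sensor_id / time_epoch / reading columns.
final class TempReading {
    final Integer sensorId; // null when the sensor id is missing
    final long timeEpoch;
    final double reading;

    TempReading(Integer sensorId, long timeEpoch, double reading) {
        this.sensorId = sensorId;
        this.timeEpoch = timeEpoch;
        this.reading = reading;
    }
}

final class SensorCleansing {
    // temp_raw -> temp_clean: the single place where SENSOR_ID IS NOT NULL is applied.
    static List<TempReading> tempClean(List<TempReading> tempRaw) {
        return tempRaw.stream()
                .filter(r -> r.sensorId != null)
                .collect(Collectors.toList());
    }

    // Example consumer 1: average of the already-cleansed readings.
    static double averageReading(List<TempReading> tempClean) {
        return tempClean.stream().mapToDouble(r -> r.reading).average().orElse(Double.NaN);
    }

    // Example consumer 2: most recent timestamp in the already-cleansed readings.
    static long latestEpoch(List<TempReading> tempClean) {
        return tempClean.stream().mapToLong(r -> r.timeEpoch).max().orElse(-1L);
    }
}
```

Both consumers take the cleansed list as input, so the cleansing rule lives in one place and can change without touching either of them.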
  55. 71.
  56. 72.

    Latency requirements, users of the data, scale, data fidelity. Photo by Benjamin Lambert on Unsplash
  57. 73.

    @rmoff App App App App search Hadoop DWH monitoring security MQ MQ cache cache
  58. 74.

    @rmoff KAFKA DWH Hadoop App App App App App App App App request-response messaging OR stream processing streaming data pipelines changelogs
  59. 76.

    Event streaming platform: flexibility & scalability, data when you need it, data persistence, native stream processing. Photo by rmoff
  60. 79.

    @rmoff Resources
    • CDC Spreadsheet
    • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
    • #partner-engineering on Slack for questions
    • BD team (#partners / partners@confluent.io) can help with introductions on a given sales op
    #EOF