The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Robin Moffatt
February 27, 2019

Data integration in architectures built on static, update-in-place datastores inevitably ends up with pathologically high degrees of coupling and poor scalability. This has been the standard practice for decades, as we attempt to build data pipelines on top of databases that do a poor job of modeling the fundamental objects that drive our businesses and systems: events.

Events carry both notification and state, and form a powerful primitive on which to build systems for developers and data engineers alike. Developers benefit from the asynchronous communication that events enable between services, and data engineers benefit from the integration capabilities. Everyone gains from using a standards-based, scalable and resilient streaming platform.

In this talk, we’ll discuss the concepts of events, their relevance to both software engineers and data engineers and their ability to unify architectures in a powerful way. We’ll see how stream processing makes sense in both a microservices and ETL environment, and why analytics, data integration and ETL fit naturally into a streaming world.

Transcript

  1. The Changing Face of ETL Event-Driven Architectures for Data Engineers

    @rmoff Photo by rmoff
  2. Photo by Samuel Sianipar on Unsplash

  3. Photo by Khai Sze Ong on Unsplash

  4. Photo by Rainier Ridao on Unsplash

  5. Photo by Rohit Tandon on Unsplash

  6. Photo by Theodore Moore on Unsplash

  7. Photo by Cristian Grecu on Unsplash

  8. It used to be so simple (Photo by Patrick Fore on Unsplash)

  9. More Sources (Photo by Eugenio Mazzone on Unsplash)

  10. More Targets (Photo by Tom Barrett on Unsplash)

  11. More Data (Photo by Kirill on Unsplash)

  12. Batches and Buckets

  13. Applications Respond → an order was placed! Analytics Tell Us What Happened → how many orders were placed (Photo by Deva Darshan from Pexels)

  14. The Changing Face of ETL: Event-Driven Architectures for Data Engineers

    @rmoff
  15. Photo by NASA on Unsplash

  16. Events (Photo by Mark Kamalov on Unsplash)

  17. An event is both:

    * Notification
    * State transfer

  18. A Customer Experience

  19. A Sensor Reading

  20. Events. Basket: Bread, Tinned Spaghetti

  21. Events: ItemAdd (Bread). Basket: Bread

  22. Events: ItemAdd (Bread), ItemAdd (Baked Beans). Basket: Bread, Baked Beans

  23. Events: ItemAdd (Bread), ItemAdd (Baked Beans), ItemRemove (Baked Beans). Basket: Bread

  24. Events: ItemAdd (Bread), ItemAdd (Baked Beans), ItemRemove (Baked Beans), ItemAdd (Tinned Spaghetti). Basket: Bread, Tinned Spaghetti

  25. Events: ItemAdd (Bread), ItemAdd (Baked Beans), ItemRemove (Baked Beans), ItemAdd (Tinned Spaghetti). Basket: Bread, Tinned Spaghetti

  26. Events: ItemAdd (Bread), ItemAdd (Baked Beans), ItemRemove (Baked Beans), ItemAdd (Tinned Spaghetti). Basket: Bread, Tinned Spaghetti

  27. Events: ItemAdd (Bread), ItemAdd (Baked Beans), ItemRemove (Baked Beans), ItemAdd (Tinned Spaghetti). Basket: Bread, Tinned Spaghetti

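
The basket slides suggest that current state can be derived by replaying the events in order. Below is a minimal Python sketch of that idea, not code from the talk. The event names ItemAdd and ItemRemove and the items come from the slides; the tuple layout and the function name `basket_from_events` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical event shape: (event_type, item). The event names come from
# the slides; the tuple layout is an assumption for this sketch.
EVENTS = [
    ("ItemAdd", "Bread"),
    ("ItemAdd", "Baked Beans"),
    ("ItemRemove", "Baked Beans"),
    ("ItemAdd", "Tinned Spaghetti"),
]


def basket_from_events(events):
    """Replay events in order and return the items currently in the basket."""
    counts = Counter()
    for event_type, item in events:
        if event_type == "ItemAdd":
            counts[item] += 1
        elif event_type == "ItemRemove" and counts[item] > 0:
            counts[item] -= 1
        # Unknown event types are ignored in this sketch.
    return sorted(item for item, n in counts.items() if n > 0)


# State now, state at an earlier point, and a question the final basket
# alone cannot answer: what was added and then removed?
current = basket_from_events(EVENTS)
earlier = basket_from_events(EVENTS[:2])
removed = [item for kind, item in EVENTS if kind == "ItemRemove"]
```

Keeping the events rather than only the final basket is what makes `earlier` and `removed` answerable at all; a store that updates the basket in place would have discarded the Baked Beans history.
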
  28. What is an Event Streaming Platform? The Log, Connectors, Producer, Consumer, Streaming Engine

  29. Immutable Event Log: messages are added at the end of the log (Old → New)

  30. Consumers have a position all of their own: Sally is here (Scan, Old → New)

  31. Consumers have a position all of their own: Sally is here, Fred is here (Scan, Scan, Old → New)

  32. Consumers have a position all of their own: Sally is here, George is here, Fred is here (Scan, Scan, Scan, Old → New)

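
The log slides describe two properties: messages are only ever appended at the end, and each consumer tracks its own position. The following is a toy in-memory sketch of those two properties, assuming nothing about how Kafka implements them. The classes `EventLog` and `Consumer` and their methods are hypothetical names for this illustration; Sally and Fred are the consumers named on the slides.

```python
class EventLog:
    """Append-only list of messages; existing entries are never modified."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Add a message at the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset):
        """Return a copy of all messages at or after the given offset."""
        return self._messages[offset:]


class Consumer:
    """Reads from a shared log while tracking its own position."""

    def __init__(self, name, log, start_offset=0):
        self.name = name
        self.log = log
        self.offset = start_offset

    def poll(self):
        """Return unread messages and advance this consumer's position only."""
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)
        return batch


log = EventLog()
for n in range(5):
    log.append(f"event-{n}")

sally = Consumer("Sally", log)                 # starts at the oldest message
fred = Consumer("Fred", log, start_offset=3)   # starts further along

sally_batch = sally.poll()
fred_batch = fred.poll()
log.append("event-5")
sally_next = sally.poll()   # Sally sees the new message; Fred has not polled yet
```

Because reading does not remove anything from the log, adding a third consumer (George on the slide) would not affect Sally or Fred.
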
  33. The Connect API (The Log, Connectors, Producer, Consumer, Streaming Engine)

  34. Streaming Integration with Kafka Connect. Kafka Brokers; Kafka Connect (Tasks, Workers); Sources: syslog, flat file, CSV, JSON, MQTT

  35. Streaming Integration with Kafka Connect. Kafka Brokers; Kafka Connect (Tasks, Workers); Sinks: Amazon S3, MQTT

  36. Streaming Integration with Kafka Connect. Kafka Brokers; Kafka Connect (Tasks, Workers); Sources: syslog, flat file, CSV, JSON, MQTT; Sinks: Amazon S3, MQTT

  37. Stream Processing in Kafka (The Log, Connectors, Producer, Consumer, Streaming Engine)

  38. Kafka Streams API

        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
               .filter((key, order) -> order.getStatus().equals("COMPLETE"))
               .to("complete_orders", Produced.with(stringSerde, ordersSerde));

  39. Stream Processing with KSQL

        CREATE STREAM completedOrders AS
          SELECT * FROM orders
          WHERE status='COMPLETE';

  40. This is Something New (Photo by Ash from Modern Afflatus on Unsplash)

  41. Events in Action: Review events, reviews

  42. Events in Action: Review events, reviews, Operational dashboard

  43. Events in Action: Review events, reviews, Operational dashboard, Data lake

  44. Events in Action: Review events, reviews, Filter out bad data, reviews_clean, Operational dashboard, Data lake

        CREATE STREAM reviews_clean AS
          SELECT * FROM reviews
          WHERE id IS NOT NULL;

  45. Events in Action: Existing apps, User data, RDBMS, txn log, Kafka Connect, Kafka, users

  46. Events in Action: Review events, reviews, reviews_clean, User data, users, Operational dashboard, Data lake. Join events to users, and filter

  47. Events in Action: Review events, reviews, reviews_clean, User data, users, enriched_reviews, Operational dashboard, Data lake. Join events to users, and filter

        CREATE STREAM reviews_clean AS
          SELECT * FROM reviews
          WHERE id IS NOT NULL;

        CREATE STREAM enriched_reviews AS
          SELECT * FROM reviews_clean r
          INNER JOIN users u ON r.userid=u.userid;

  48. Events in Action: Review events, Operational dashboard, Data lake, User data, Notification service. Join events to users, and filter

  49. Events in Action: Review events, reviews, reviews_clean, User data, users, enriched_reviews, unhappy_vips, Notification service, Operational dashboard, Data lake. Join events to users, and filter

        CREATE STREAM unhappy_vips AS
          SELECT * FROM enriched_reviews
          WHERE rating < 3 AND status = 'Platinum';

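
The three KSQL statements in this part of the deck chain together: filter out bad reviews, join each review to its user, then keep low ratings from Platinum users. The sketch below mimics that chain with plain Python generators to show the data flow only; it is not how KSQL executes it. The field names `id`, `userid`, `rating` and `status` follow the KSQL on the slides, while the sample records, the `name` field and the function names are invented for illustration.

```python
# Hypothetical sample data. The "users" dict stands in for the latest state
# per user (the table side of the join); "reviews" stands in for the stream.
users = {
    1: {"userid": 1, "name": "Alice", "status": "Platinum"},
    2: {"userid": 2, "name": "Bob", "status": "Silver"},
}

reviews = [
    {"id": 100, "userid": 1, "rating": 1},
    {"id": None, "userid": 2, "rating": 5},   # bad record: no id
    {"id": 101, "userid": 2, "rating": 2},
    {"id": 102, "userid": 1, "rating": 5},
    {"id": 103, "userid": 9, "rating": 1},    # no matching user
]


def clean(stream):
    """Like reviews_clean: drop records whose id is missing."""
    for review in stream:
        if review["id"] is not None:
            yield review


def enrich(stream, user_table):
    """Like enriched_reviews: inner join each review to its user by userid."""
    for review in stream:
        user = user_table.get(review["userid"])
        if user is not None:  # inner join: unmatched reviews are dropped
            yield {**review, **user}


def unhappy_vips(stream):
    """Like unhappy_vips: low ratings from Platinum users only."""
    for review in stream:
        if review["rating"] < 3 and review["status"] == "Platinum":
            yield review


result = list(unhappy_vips(enrich(clean(reviews), users)))
```

Each stage consumes the output of the previous one, so the dashboard, the data lake and the notification service can each read from whichever intermediate result suits them.
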
  50. The Power of an Event-Driven Architecture Photo by rmoff

  51. Not Everything is a Nail: Events, RDBMS

  52. Not Everything is a Nail: Events, Elasticsearch, RDBMS

  53. Not Everything is a Nail: Events, Elasticsearch, RDBMS, Graph

  54. Side-by-Side Tech Evaluation: Events, HDFS

  55. Side-by-Side Tech Evaluation: Events, BigQuery, HDFS

  56. Side-by-Side Tech Evaluation: Events, BigQuery, HDFS, Snowflake

  57. Evolve Data Sources: Producer (On-premises), Consuming App A, Consuming App B

  58. Evolve Data Sources: Producer (On-premises), Producer (Cloud), Consuming App A, Consuming App B

  59. Evolve Data Sources: Producer (Cloud), Consuming App A, Consuming App B

  60. Tight Coupling != Flexible: Orders, RDBMS

  61. Tight Coupling != Flexible: Orders, HDFS, RDBMS

  62. Tight Coupling != Flexible: Orders, App, HDFS, RDBMS

  63. Loose Coupling == Freedom to Evolve: Orders, RDBMS

  64. Loose Coupling == Freedom to Evolve: Orders, HDFS, RDBMS

  65. Loose Coupling == Freedom to Evolve: Orders, App, HDFS, RDBMS

  66. Transform Once, Use Many: Data Cleansing. IoT, App, RDBMS, App, temp_raw

  67. Transform Once, Use Many: Data Cleansing. IoT, App, RDBMS, App, temp_raw

        temp_raw
        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
                   1551136125  13.11
        42         1551138129  13.04

  68. Transform Once, Use Many: Data Cleansing. IoT, App, RDBMS, App, temp_raw; Cleanse, Cleanse, Cleanse

        temp_raw
        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
                   1551136125  13.11
        42         1551138129  13.04

  69. Transform Once, Use Many: Data Cleansing. IoT, App, RDBMS, App; SENSOR_ID IS NOT NULL

        temp_raw
        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
                   1551136125  13.11
        42         1551138129  13.04

        temp_clean
        sensor_id  time_epoch  reading
        42         1551136074  13.05
        42         1551136125  13.11
        42         1551138129  13.04

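
The cleansing slides contrast every consumer applying its own Cleanse step with a single SENSOR_ID IS NOT NULL filter that produces temp_clean for all of them. Below is a small Python sketch of the second arrangement, using the readings shown on the slides. The tuple layout and the two downstream functions `latest_reading` and `average_reading` are assumptions standing in for the consuming apps.

```python
# Readings from the slides as (sensor_id, time_epoch, reading); the row with
# no sensor_id is represented here as None.
temp_raw = [
    (42, 1551136074, 13.05),
    (42, 1551136125, 13.11),
    (None, 1551136125, 13.11),
    (42, 1551138129, 13.04),
]


def cleanse(readings):
    """Apply the SENSOR_ID IS NOT NULL rule once, for every consumer."""
    return [r for r in readings if r[0] is not None]


temp_clean = cleanse(temp_raw)


# Two hypothetical consumers. Neither repeats the cleansing logic; both
# simply read temp_clean.
def latest_reading(readings):
    """Return the reading with the highest time_epoch."""
    return max(readings, key=lambda r: r[1])[2]


def average_reading(readings):
    """Return the mean of the reading column."""
    return sum(r[2] for r in readings) / len(readings)


latest = latest_reading(temp_clean)
average = average_reading(temp_clean)
```

If the cleansing rule changes, only `cleanse` changes; the consumers are untouched, which is the point of transforming once and using many times.
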
  70. Say NO to brittle pipelines Photo by rmoff

  71. Photo by Benjamin Lambert on Unsplash

  72. Latency requirements, Users of the data, Scale, Data fidelity (Photo by Benjamin Lambert on Unsplash)

  73. App, App, App, App, search, Hadoop, DWH, monitoring, security, MQ, MQ, cache, cache

  74. KAFKA, DWH, Hadoop, App, App, App, App, App, App, App, App; request-response; messaging OR stream processing; streaming data pipelines; changelogs

  75. Events model the real world Photo by rmoff

  76. Event streaming platform: Flexibility & scalability, Data when you need it, Data persistence, Native stream processing (Photo by rmoff)

  77. http://cnfl.io/book-bundle

  78. @rmoff confluent.io/download http://cnfl.io/slack http://cnfl.io/book-bundle Photo by rmoff

  79. Resources

    • CDC Spreadsheet
    • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
    • #partner-engineering on Slack for questions
    • BD team (#partners / partners@confluent.io) can help with introductions on a given sales op

    #EOF