
Streaming ETL on the Shoulders of Giants @ MongoDB World 2019


* Abstract:

Life doesn’t happen in batch mode, which is why application engineers and data architects need to cooperate closely to get the best out of streaming platforms like Apache Kafka and NoSQL data stores such as MongoDB. This session explores ways to integrate both worlds in a streaming fashion.

* Description:

Without doubt, stream processing is a big deal these days, and we often find Apache Kafka acting as the central nervous system of company-wide data architectures. However, many real-world use cases simply need an operational data store that is flexible, robust and scalable enough to live up to diverse application-related requirements and challenges. This session discusses different options for building solid data integration pipelines between MongoDB and Apache Kafka. The focus lies on configuration-based data-in-motion scenarios that leverage the Kafka Connect framework to lay out streaming ETL pipeline examples without writing a single line of code.
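
To make the "configuration-based" idea concrete, here is a minimal sketch of what a MongoDB sink connector definition could look like. It only builds the JSON document that the Kafka Connect REST API accepts on `POST /connectors`; nothing is sent over the network. The connector name, topic, database, collection and connection URI are hypothetical placeholders, not values from the talk, and the property names should be checked against the documentation of the connector version in use.

```python
import json


def mongo_sink_connector(name, topics, uri, database, collection):
    """Build a connector definition for Kafka Connect's POST /connectors endpoint."""
    return {
        "name": name,
        "config": {
            # Sink connector class of the MongoDB Connector for Apache Kafka
            "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
            # Kafka topics to read from, as a comma-separated string
            "topics": ",".join(topics),
            # Where the records should end up
            "connection.uri": uri,
            "database": database,
            "collection": collection,
            # Converters decide how record keys and values are deserialized
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
        },
    }


# All argument values below are assumptions made for illustration only.
payload = mongo_sink_connector(
    name="mongo-sink-demo",
    topics=["iot.sensor.readings"],
    uri="mongodb://localhost:27017",
    database="demo",
    collection="readings",
)
print(json.dumps(payload, indent=2))
```

The printed document could then be submitted to a running Kafka Connect worker with any HTTP client; the pipeline itself is described by this configuration rather than by application code.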

* Video Recording: pending...


Hans-Peter Grahsl

June 17, 2019

Transcript

  1. Streaming ETL on the Shoulders of Giants Scott L’Hommedieu, MongoDB

    Hans-Peter Grahsl, NETCONOMY llamadew hpgrahsl
  2. Streaming ETL on the Shoulders of Giants Why ETL is

    important How we can “ETL better” Let’s see (some use cases) + a DEMO!
  3. Speed & Agility For businesses to stay relevant they must

    deliver value at a breakneck pace and be constantly seeking new sources of value. A Top 5 Tech Risk* *google “top tech risks”
  4. Managing, Processing and Analyzing Data We use Data To unlock

    insights And drive value
  5. But, historic ETL is painful An antipattern for Speed and

    Agility ETL = Batch( Error Prone , Brittle, Slow )
  6. Solving the pain of ETL through Streaming Data Speed and

    Agility ETL = DataStream ( Resilient, Loosely Coupled, Realtime)
  7. Streaming ETL on the Shoulders of Giants Why ETL is

    important How we can “ETL” better Let’s see (some use cases) + a DEMO!
  8. Architecture of a Modern Data Platform

  9. Architecture of a Modern Data Platform Streaming Data Platform

  10. Connected Apps Architecture of a Modern Data Platform Streaming Data

    Platform Datastores
  11. Stream Processors Connected Apps Architecture of a Modern Data Platform

    Streaming Data Platform Datastores Connected Apps Datastores
  12. On the shoulders of Giants Kafka MongoDB

  13. Modern Data Platform

  14. Modern Data Platform Doc Model Run Anywhere Distributed and Scalable

    Resilient and Performant
  15. Apache Kafka 101

  16. Streaming Platform

  17. Streaming Platform • distributed • horizontally scalable • highly fault-tolerant

  18. What is Streaming? “a type of data processing that is

    designed with infinite data sets in mind” –Tyler Akidau
  19. "…everything that happens in a company – every customer interaction,

    every API request, every database change – can be represented as real-time stream that anything else can tap into, process or react to."
  20. "…Kafka and the whole category of stream processing represents a

    fundamental paradigm shift in how the digital part of a company is built, how data is used, and how applications are built. This is actually a pretty rare thing…" – Jay Kreps
  21. KStreams App Data Sources Data Sinks KSQL App Streams API

    KSQL Consumer API Connect API App Apps App Apps Connect API Producer API
  22. Kafka APIs in a Nutshell… • Producer & Consumer API

    → publish-subscribe scenarios • Connect API → streaming data integration scenarios • Streams API & KSQL → code or SQL-based streaming scenarios
  23. A bit more about Kafka Connect …

  24. Kafka Connect Basics ANY Sink Connect Connect ANY Source ANY

    → e.g. file systems, data stores, REST endpoints, …
  25. Kafka Connect Basics often about data stores Connect Connect SOURCE

    SINK
  26. Kafka Connect Basics or more concretely Source Connectors Sink Connectors

    https://hub.confluent.io → many many more
  27. Kafka Connect Basics or more concretely MongoDB Source MongoDB Sink

    https://hub.confluent.io → many many more
  28. How do connectors operate?

  29. Kafka Source Connectors Source Connector Converter Serialize SMT

    1 … N Single Message Transforms for basic in-flight manipulations … SMT
  30. Kafka Sink Connectors Converter Deserialize Sink Connector SMT

    1 … N Single Message Transforms for basic in-flight manipulations … SMT
  31. Announcing …

  32. Available on the Confluent Hub: https://www.confluent.io/hub/mongodb/kafka-connect-mongodb MongoDB Connector for Apache

    Kafka Supported by MongoDB Verified Gold by
  33. MongoDB Connector for Apache Kafka Available on the Confluent Hub:

    https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
  34. Streaming ETL on the Shoulders of Giants Why ETL is

    important How we can “ETL better” Let’s see (some use cases) + a DEMO!
  35. Streaming ETL Use Cases

  36. Single Customer View for eCommerce MongoDB Sinks Single Source of

    Truth Source Connectors
  37. Data Synchronization between Microservices Service 1 Service N MongoDB Sinks

    . . .
  38. Recommendation Engine for Opinion Mining Surveys & Polls Data MongoDB

    Source Change Streams Change Streams User Recommendation Engine
  39. IoT Demo Scenario in Action

  40. Producer API data generation Stream Processor KSQL data serving REST

    Change Streams device management SSE IoT Demo Scenario !
  41. Producer API data generation Stream Processor KSQL MongoDB Sink Connector

    MongoDB Source Connector data serving REST Change Streams device management SSE IoT Demo Scenario
  42. That’s all folks! THANK YOU

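
Slides 29 and 30 describe how connectors operate: on the source side, each record passes through a chain of 1 … N Single Message Transforms and is then serialized by a converter. The following is a toy model of that flow in plain Python, not the Kafka Connect API (which is Java-based). All function names and the sample record are hypothetical and exist only to illustrate the order of operations.

```python
import json


def insert_static_field(field, value):
    """Return a transform that adds a constant field to every record."""
    def transform(record):
        return {**record, field: value}
    return transform


def mask_field(field):
    """Return a transform that blanks out a field if the record has it."""
    def transform(record):
        return {**record, field: None} if field in record else record
    return transform


def json_converter(record):
    """Stand-in for a converter: serialize the record to JSON bytes."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def run_source_pipeline(records, transforms, serialize):
    """Apply every transform in order, then serialize each record."""
    for record in records:
        for transform in transforms:
            record = transform(record)
        yield serialize(record)


# Hypothetical sample record standing in for data read from a source system.
records = [{"device": "d1", "temp": 21.5, "owner": "alice"}]
transforms = [insert_static_field("source", "mongodb"), mask_field("owner")]

out = list(run_source_pipeline(records, transforms, json_converter))
print(out[0])
```

A sink connector runs the mirror image of this flow: the converter deserializes the bytes read from Kafka first, and the transforms are applied before the record is handed to the sink. Because each transform only sees one record at a time, this style suits basic in-flight manipulations; anything that needs joins or aggregation belongs in a stream processor instead.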