Streaming ETL on the Shoulders of Giants @ VoxxedDays Ticino 2019

* Abstract:

Life doesn’t happen in batch mode, which is why application engineers and data architects need to cooperate closely to get the best out of streaming platforms like Apache Kafka and operational NoSQL data stores such as MongoDB. This session explores ways and means to integrate both worlds in a streaming fashion.

* Description:

Without doubt, stream processing is a big deal these days, and oftentimes we find Apache Kafka as the central nervous system of company-wide data architectures. However, many real-world use cases simply need an operational data store which is flexible, robust and scalable enough to live up to diverse application-related requirements and challenges. This session discusses different options for building solid data integration pipelines between MongoDB and Apache Kafka. The focus lies on configuration-based data-in-motion scenarios that leverage the Kafka Connect framework to lay out streaming ETL pipeline examples without writing a single line of code.
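
To make the configuration-based idea concrete, here is a minimal sketch of what a Kafka Connect sink definition (Kafka topic to MongoDB collection) might look like. It is built as a Python dictionary only so that it can be printed as the JSON document Kafka Connect accepts. The connector class and property names are those of the official MongoDB Kafka connector; the connector name, topic, connection URI, database and collection are hypothetical placeholders and not values taken from the talk.

```python
import json


def mongo_sink_config(name, topics, uri, database, collection):
    """Build the JSON payload for a MongoDB sink connector (Kafka -> MongoDB)."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
            "tasks.max": "1",
            # Kafka Connect expects a comma-separated list of topic names.
            "topics": ",".join(topics),
            "connection.uri": uri,
            "database": database,
            "collection": collection,
        },
    }


# All values below are hypothetical placeholders for illustration.
payload = mongo_sink_config(
    name="orders-sink",
    topics=["orders"],
    uri="mongodb://localhost:27017",
    database="shop",
    collection="orders",
)
print(json.dumps(payload, indent=2))
```

The whole pipeline step is described by this one document; no consumer loop or database client code is involved, which is the point of the configuration-only approach.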

* Video Recording: https://www.youtube.com/watch?v=Uyu4TpVNzp8

Hans-Peter Grahsl

October 05, 2019

Transcript

  1. Hans-Peter Grahsl • working & living in Graz • technical trainer at • independent consultant & engineer • associate lecturer • occasional conference speaker @hpgrahsl | #VDT19 #VoxxedDays Ticino, 05th October 2019, Lugano - Switzerland
  2. For businesses to stay relevant they must deliver value at a breakneck pace and be constantly seeking new sources of value ...
  3. Historic ETL causes Pain • batch-driven • brittle / error prone • slow & late answers
  4. Streaming ETL alleviates Pain • event-centric • stream-oriented • fast & timely answers
  5. Enabler for Speed & Agility
  6. On the Shoulders of G I A N T S
  7. MongoDB • rich document model • powerful queries & indexing • ACID transactions • transparent sharding & replication
  8. Apache Kafka • pub / sub to event streams • (permanently) store event streams • event streaming in near real-time
  9. "... data processing that is designed with infinite data sets in mind." — Tyler Akidau
  10. Kafka APIs for "everything" • simple pub / sub scenario → Producer & Consumer API • streaming data integration → Connect API • powerful stream processing → KStreams API + KSQL
  11. Kafka Connect • often about data stores
  12. MongoDB Connector • officially supported by MongoDB • developed open-source on GitHub • verified Gold by Confluent
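
As a rough illustration of how the Connect API and the MongoDB connector from the last slides fit together, the sketch below prepares a request that would register a MongoDB source connector (MongoDB collection to Kafka topic) through the Kafka Connect REST interface. The `POST /connectors` endpoint, the connector class and the property names are real; the worker URL, connector name, database, collection and topic prefix are assumptions chosen for illustration, and `connector_request` is a hypothetical helper, not part of any library. The request is only built here, not sent.

```python
import json
import urllib.request


def connector_request(connect_url, name, config):
    """Prepare (but do not send) a POST /connectors request for Kafka Connect."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical source connector settings (MongoDB -> Kafka).
source_config = {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://localhost:27017",
    "database": "shop",
    "collection": "orders",
    "topic.prefix": "mongo",
}

# Assumes a Kafka Connect worker listening on its default REST port 8083.
request = connector_request("http://localhost:8083", "orders-source", source_config)
print(request.get_method(), request.full_url)
```

Against a running worker, `urllib.request.urlopen(request)` would submit the definition; everything else about the pipeline stays in configuration.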