Putting AI into Real-Time ETL with Apache Flink, Debezium, and LangChain4j @ Devoxx Belgium 2024

Abstract:

As the saying goes: nothing is older than yesterday’s news, uhm, data. Join us for an immersive hands-on lab to explore real-time ETL using the triumphant trio Apache Flink, Debezium, and LangChain4j.

Participants will gain practical experience in setting up different end-to-end real-time data pipelines that stream data from an operational database to an analytics data store continuously, efficiently, and with very low latency. These pipelines enable use cases such as full-text search and live dashboarding, enriched with LLM-derived metadata.

In the lab, you will learn how to:

* Build a real-time data pipeline from Postgres to OpenSearch, based on Apache Flink and Debezium for change data capture (CDC)
* Use Flink's connector capabilities to set up seamless real-time ETL pipelines between various data sources and sinks
* Implement data transformations, filtering, and aggregations on top of CDC streams in real time with the help of streaming SQL
* Integrate a large language model (LLM) for sentiment analysis based on LangChain4j, enabling deeper insights into the processed data
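
To make the idea of aggregating on top of a CDC stream more concrete, here is a minimal sketch in plain Java, not the lab's code. In the lab this logic is expressed in Flink SQL over Debezium change streams; the sketch only imitates the principle of retracting the old row state and adding the new one. The `Review`, `ChangeEvent`, and `RatingCounts` types and the "high"/"low" buckets are hypothetical, and the sketch assumes that `before` is populated for updates and deletes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CdcSketch {

    // Hypothetical row type, not taken from the lab repository.
    record Review(long id, String text, int rating) {}

    // Simplified stand-in for a Debezium change event envelope:
    // op is "c" (create), "u" (update), "d" (delete) or "r" (snapshot read).
    // Assumption: `before` is populated for updates and deletes.
    record ChangeEvent(String op, Review before, Review after) {}

    // Continuously maintained aggregate: number of reviews per rating bucket.
    static final class RatingCounts {
        private final Map<String, Long> counts = new HashMap<>();

        void apply(ChangeEvent event) {
            if (event.before() != null) {
                // Retract the old row state from its bucket.
                counts.merge(bucket(event.before()), -1L, RatingCounts::sumOrRemove);
            }
            if (event.after() != null) {
                // Add the new row state to its bucket.
                counts.merge(bucket(event.after()), 1L, RatingCounts::sumOrRemove);
            }
        }

        Map<String, Long> snapshot() {
            return Map.copyOf(counts);
        }

        // Hypothetical bucketing rule used only for this illustration.
        private static String bucket(Review review) {
            return review.rating() >= 4 ? "high" : "low";
        }

        // Returning null makes Map.merge drop a bucket once its count reaches zero.
        private static Long sumOrRemove(Long current, Long delta) {
            long sum = current + delta;
            return sum == 0 ? null : sum;
        }
    }

    // Replays a small, made-up change stream and returns the resulting counts.
    public static Map<String, Long> demo() {
        List<ChangeEvent> events = List.of(
            new ChangeEvent("c", null, new Review(1, "Great", 5)),
            new ChangeEvent("c", null, new Review(2, "Bad", 1)),
            new ChangeEvent("u", new Review(2, "Bad", 1), new Review(2, "Actually fine", 4)),
            new ChangeEvent("d", new Review(1, "Great", 5), null));

        RatingCounts counts = new RatingCounts();
        events.forEach(counts::apply);
        return counts.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

After the four events, only review 2 remains and it sits in the "high" bucket, so the aggregate always reflects the current table state rather than the history of changes.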

Join this lab to advance your skills in working with real-time data and learn how robust and leading open-source technologies support your business-critical stream processing workloads.

Repository: https://github.com/decodableco/oss-streaming-lab

Recording: hands-on labs haven't been recorded

Hans-Peter Grahsl

October 08, 2024

Transcript

  1. Image © NASA Goddard Space Flight Center https://flic.kr/p/YdWQqe (CC BY 2.0)

    Gunnar Morling Hans-Peter Grahsl @gunnarmorling @hpgrahsl Putting AI Into Real-time ETL with Apache Flink, Debezium, and LangChain4j
  2. Real-Time ETL with Apache Flink and Debezium | @hpgrahsl @gunnarmorling

    • Software engineer at Decodable • Former project lead of Debezium • kcctl 🧸, JfrUnit, ModiTect, MapStruct • Java Champion • 1⃣ 🐝 🏎 Gunnar Morling
  3. • Developer Advocate at Decodable • Open-source community enthusiast • Stream processing addict • Regular conference speaker Hans-Peter Grahsl
  4. Apache Flink: Stateful Computations over Data Streams https://flink.apache.org/
  5. Apache Flink APIs for Application Development Image source: “Change Data Capture with Flink SQL and Debezium” by Marta Paes at DataEngBytes (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)
  6. Change Data Capture: Liberation for Your Data https://www.decodable.co/blog/seven-ways-to-put-cdc-to-work
  7. langchain4j: Simplify integrating LLMs into Java Apps. Integration with Unified APIs: • 15 LLM Providers • 15 Embedding Models • 20 Vector Stores. Comprehensive AI Toolbox: • Prompt Templating • Chat Memory • Function Calling
  8. langchain4j Nicely Bridges to ONNX. ONNX runtime + format: • Use customized models in ONNX format • Supports many existing HuggingFace models & tokenizers • In-process model inference → nicely fits stream processing