Building Streaming ETL Pipelines with Redpanda Ecosystem

Serverless Streaming ETL Pipelines with Redpanda
Dunith Dhanushka, Senior Developer Advocate @ Redpanda (2024)

© 2023 REDPANDA DATA

About the presenter

Dunith Dhanushka
Senior Developer Advocate, Redpanda Data
• Event streaming, real-time analytics, and stream processing enthusiast
• Frequent blogger, speaker, and educator

Agenda
1. Two ways of building data pipelines.
2. Streaming ETL use case - Payment processing.
3. Making it serverless.
4. Demo.
5. Wrap up.

Data pipelines

A data pipeline is a series of processes and tools that automate the flow, transformation, and storage of data from various sources to a final destination for analysis or use.

Two ways of building data pipelines

Batch Pipelines
• Comparatively easy to build, debug, and learn.
• Increased latency is a concern for data freshness.
• Most common in the space and easy to get started with.

Streaming Pipelines
• Extract, transform, and load data as it is generated.
• Ideal for latency-sensitive use cases, like fraud detection and recommendations.
• Challenging to implement, debug, and scale.
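
To make the contrast between the two styles concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the `transform` step and the `amount` field are hypothetical stand-ins. The batch version needs the whole data set before it can produce anything, while the streaming version emits each result as soon as the corresponding event arrives.

```python
from typing import Iterable, Iterator

Event = dict


def transform(event: Event) -> Event:
    """Hypothetical transformation: add the amount expressed in integer cents."""
    return {**event, "amount_cents": round(event["amount"] * 100)}


def run_batch(events: list[Event]) -> list[Event]:
    """Batch: the full data set is collected first, then processed in one pass."""
    return [transform(e) for e in events]


def run_streaming(source: Iterable[Event]) -> Iterator[Event]:
    """Streaming: each event is transformed and emitted as soon as it is read."""
    for event in source:
        yield transform(event)


if __name__ == "__main__":
    events = [{"amount": 12.5}, {"amount": 3.0}]
    print(run_batch(events))
    for result in run_streaming(iter(events)):
        print(result)
```

In a real streaming pipeline the `source` would be an unbounded topic rather than a finite list, which is where the implementation and scaling challenges mentioned above come from.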

Let’s take a practical use case

Pipeline goals
• PI redaction - Scrub sensitive fields for compliance.
• Data transformation - Normalize and optimize data for downstream systems.
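
The two goals above can be sketched as a pair of small functions. This is an illustration only: the talk does not show the payment event schema, so the field names (`card_number`, `cvv`, `email`, `amount`, `currency`) and the mask value are assumptions.

```python
from decimal import Decimal

# Hypothetical set of sensitive fields; the real schema is not given in the talk.
SENSITIVE_FIELDS = {"card_number", "cvv", "email"}
MASK = "***REDACTED***"


def redact(event: dict) -> dict:
    """PI redaction: replace the value of every sensitive field with a fixed mask."""
    return {k: (MASK if k in SENSITIVE_FIELDS else v) for k, v in event.items()}


def normalize(event: dict) -> dict:
    """Data transformation: upper-case the currency and store the amount as integer cents."""
    out = dict(event)  # copy so the caller's event is not mutated
    out["currency"] = str(out.get("currency", "")).strip().upper()
    amount = Decimal(str(out.pop("amount")))
    out["amount_cents"] = int((amount * 100).to_integral_value())
    return out


def process(event: dict) -> dict:
    """Apply both pipeline goals to a single payment event."""
    return normalize(redact(event))


if __name__ == "__main__":
    payment = {
        "payment_id": "p-1",
        "card_number": "4111111111111111",
        "cvv": "123",
        "amount": "19.99",
        "currency": " usd ",
    }
    print(process(payment))
```

Using `Decimal` for the amount avoids the rounding surprises that binary floats can introduce when converting monetary values to cents.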

Streaming ETL pipeline with Apache Kafka and Flink

Turning this into a serverless solution

Redpanda

Redpanda Serverless
• A Kafka API-compatible streaming data platform.
• Written in C++, offering more performance and resource efficiency than Kafka.
• Simpler to work with and developers love it!

Redpanda’s role in the solution
• Support high-throughput, low-latency payment data ingestion.
• Offer scalable and cost-efficient long-term data retention.
• Store transformed data and allow scalable downstream consumption.
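
Because Redpanda is Kafka API-compatible, an ordinary Kafka client can handle the ingestion side. The sketch below uses the third-party `kafka-python` package and is not the code from the demo. The SASL/SCRAM-over-TLS settings are an assumption about how a managed cluster is typically secured, and the broker address, credentials, and topic name are placeholders you would take from your own cluster.

```python
import json


def build_producer_config(bootstrap_servers: str, username: str, password: str) -> dict:
    """Build keyword arguments for kafka-python's KafkaProducer.

    Assumes the cluster requires SASL/SCRAM-SHA-256 over TLS; adjust to match
    the connection details shown for your own cluster.
    """
    return {
        "bootstrap_servers": bootstrap_servers,
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "SCRAM-SHA-256",
        "sasl_plain_username": username,
        "sasl_plain_password": password,
        # Serialize each event dict as UTF-8 encoded JSON.
        "value_serializer": lambda v: json.dumps(v).encode("utf-8"),
    }


def publish_payment(config: dict, topic: str, event: dict) -> None:
    """Send one payment event to the given topic and wait until it is flushed."""
    # Imported lazily so the config helper works without the package installed.
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(**config)
    producer.send(topic, value=event)
    producer.flush()
    producer.close()
```

A call would look like `publish_payment(build_producer_config("<broker>:9092", "<user>", "<password>"), "payments", {"payment_id": "p-1"})`, where the topic name `payments` is hypothetical.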

Decodable

Decodable
• A serverless platform for building real-time ETL pipelines.
• Managed Apache Flink and Debezium as a service.

Decodable’s role in the solution
• Redact PIs and transform payment events with Flink SQL.
• Manage the Flink job.
• Scale the processing as needed.

Demo

Visit cloud.redpanda.com

Benefits of making it serverless
• Management overhead is taken care of by vendors.
• Usage-based pricing, pay-as-you-grow!
• Gentler learning curve for developers, reduced onboarding time.
• On-demand scaling, storage, and compute.

Concerns
• Data sovereignty.
• Security - data at rest as well as data in transit.
• Interoperability.

Questions?

Keep Learning
• University - Self-paced, online courses. https://university.redpanda.com
• Docs - Get a peek under the hood. https://docs.redpanda.com/
• Slack - Engage with our community. https://redpanda.com/slack
• Blogs - Keep up to date with Redpanda. https://redpanda.com/blog
• Code - Check out the source. https://github.com/redpanda-data
• Serverless - Or just get started in seconds! https://redpanda.com/try-redpanda

Thank You! Contact me at: [email protected] @dunithd linkedin.com/in/dunithd
