Streaming systems such as Apache Kafka, Amazon Kinesis, and Google Pub/Sub have become the de facto standard for capturing real-time events and change data capture (CDC) events, which are then ingested into an OLAP system such as Apache Pinot to derive real-time insights.
However, in most cases the data in the stream must be transformed before it enters an OLAP system in order to be useful for a user-facing application. This pre-processing is typically done by stream processing pipelines running on systems such as Apache Flink, Apache Samza, and Kafka Streams (KStreams). While this approach works, it adds operational overhead: these pipelines are expensive and tedious to maintain.
In this talk, we will explore several powerful real-time ingestion features in Apache Pinot that almost eliminate the need for stream processing pipelines. From ingestion-time operations such as filtering and column transformations to handling CDC data from Debezium-supported sources, Apache Pinot reduces the effort needed to build a user-facing analytical application.