Bootstrapping or backfilling an upsert table with long retention in Pinot (e.g. for corrections) has been a challenge: an upsert table must be a real-time table, yet in most organizations streams (e.g. Kafka) have a limited retention period, so the historical data is no longer available to replay.
To address this challenge, we developed a Flink/Pinot connector that generates upsert segments directly from batch data sources (e.g. Hive). This makes it possible to backfill historical data without depending on Kafka.