Streaming Ingestion & Processing at Flipkart

Presented at the Bangalore Hadoop Meetup held on 15th May 2015.


Siddhartha Reddy

May 15, 2015


  2. Flipkart Data Platform (an oversimplified view)

  3. Streaming Ingestion

  4. Choices
    • push, not pull
    • schemas & validations
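
A minimal sketch of what validating a pushed event against a declared schema could look like. The schema layout, the field names, and the `validate` helper are hypothetical illustrations; the slides do not show the actual schema format.

```python
# Hypothetical schema: field name -> (expected type, required?)
ORDER_SCHEMA = {
    "order_id": (str, True),
    "amount": (float, True),
    "coupon": (str, False),
}


def validate(event, schema):
    """Return a list of violations; an empty list means the event is valid."""
    errors = []
    for field, (ftype, required) in schema.items():
        if field not in event:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(event[field], ftype):
            errors.append(
                f"{field}: expected {ftype.__name__}, "
                f"got {type(event[field]).__name__}"
            )
    # Fields the schema does not know about are also reported.
    for field in event:
        if field not in schema:
            errors.append(f"unknown field: {field}")
    return errors
```

With a check like this, consumers can rely on every validated event having the declared fields with the declared types.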

  5. Streaming Ingestion v1.0

  7. • Push ⇒ accountability (with source teams)
       • good call!
     • Schemas ⇒ contracts for consumers
       • can make assumptions that are assured to be true
     • Insufficient tooling ⇒ too many “ingestion frameworks”
       • adopt some frameworks & offer as tools!
     • Synchronous error handling ⇒ complexity
       • accept all data
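
One way to read "accept all data" is that the ingestion endpoint never rejects a push, and validation runs afterwards, with invalid events set aside rather than bounced back to the caller. The sketch below assumes that reading; the queue names, the `check` rule, and the `sidelined` store are hypothetical.

```python
from collections import deque

accepted = deque()  # everything pushed by sources lands here first
valid = []          # events that later passed validation
sidelined = []      # invalid events, kept with their errors for follow-up


def ingest(event):
    """Accept unconditionally; the caller never sees validation errors."""
    accepted.append(event)
    return "accepted"


def check(event):
    # Placeholder rule (an assumption): an event must carry an order_id.
    return [] if "order_id" in event else ["missing order_id"]


def process_pending():
    """Validate accepted events after the fact and route them."""
    while accepted:
        event = accepted.popleft()
        errors = check(event)
        if errors:
            sidelined.append((event, errors))
        else:
            valid.append(event)
```

The source-facing call stays trivial because error handling moves out of the request path.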
  8. Streaming Ingestion v2.0

  9. Stream Processing

  10. An Example

  11. Streaming Joins: Example
    It works! But… how do we deal with lookup failures?
  12. Streaming Joins: Handling Failures
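
A rough sketch of a lookup-table join that parks failed lookups on a retry queue. This is an illustration under assumptions, not the design shown on the slide: the in-memory `lookup` dict, the `product_id` key, `MAX_ATTEMPTS`, and the `dead` list (standing in for whatever a batch re-driver would later pick up) are all hypothetical.

```python
from collections import deque

lookup = {}            # hypothetical lookup table: product_id -> record
retry_queue = deque()  # (event, attempts so far) for failed lookups
dead = []              # events that exhausted retries
MAX_ATTEMPTS = 3       # assumed limit on lookup attempts per event


def join(event):
    """Enrich the event from the lookup table, or return None on a miss."""
    record = lookup.get(event["product_id"])
    if record is None:
        return None
    return {**event, "product": record}


def handle(event, attempts=0):
    """Try the join; on a miss, queue a retry or give up after the limit."""
    joined = join(event)
    if joined is not None:
        return joined
    if attempts + 1 < MAX_ATTEMPTS:
        retry_queue.append((event, attempts + 1))
    else:
        dead.append(event)
    return None


def drain_retries():
    """Re-attempt every event currently queued, once each."""
    out = []
    for _ in range(len(retry_queue)):
        event, attempts = retry_queue.popleft()
        joined = handle(event, attempts)
        if joined is not None:
            out.append(joined)
    return out
```

An event that arrives before its lookup row is retried once the row shows up; events that never find a match end up in `dead` after `MAX_ATTEMPTS` lookups.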

  15. Streaming Joins: Bootstrapping
    With a little help from MR friends

  16. Streaming Joins: But…
    The example that doesn’t really work correctly

  17. Streaming Joins

  18. In summary
    • Streaming Ingestion: push, schemas & validation, HTTP service, local daemon, change data capture
    • Streaming Joins: indexing, lookup tables, map-joins, retry queue, batch re-driver