Streaming Ingestion & Processing at Flipkart

Presented at the Bangalore Hadoop Meetup held on 15th May 2015.

Siddhartha Reddy

May 15, 2015

Transcript

  1. Streaming Ingestion & Processing at Flipkart Siddhartha Reddy @sids

  2. Flipkart Data Platform (an oversimplified view)

  3. Streaming Ingestion

  4. Choices • push, not pull • schemas & validations

  5. Streaming Ingestion v1.0

  6. None
  7. • Push ⇒ accountability (with source teams)
       • good call!
     • Schemas ⇒ contracts for consumers
       • can make assumptions that are assured to be true
     • Insufficient tooling ⇒ too many “ingestion frameworks”
       • adopt some frameworks & offer as tools!
     • Synchronous error handling ⇒ complexity
       • accept all data
  8. Streaming Ingestion v2.0

  9. Stream Processing

  10. An Example

  11. Streaming Joins: Example • It works! But… how do we deal with lookup failures?

  12. Streaming Joins: Handling Failures

  13. None
  14. None
  15. Streaming Joins: Bootstrapping • With a little help from MR friends

  16. Streaming Joins: But… • The example that doesn’t really work correctly

  17. Streaming Joins

  18. In summary
      • Streaming Ingestion: push, schemas & validation, HTTP service, local daemon, change data capture
      • Streaming Joins: indexing, lookup tables, map-joins, retry queue, batch re-driver
      sid@flipkart.com
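
The slides name lookup tables and a retry queue as the way streaming joins cope with lookup failures, but the transcript does not show how they fit together. The following is a minimal sketch of that idea, not the implementation used at Flipkart. It assumes an in-memory dictionary as the lookup table and a `deque` as the retry queue. The function names (`join_stream`, `drain_retries`), the field names, and the sample records are all hypothetical.

```python
from collections import deque


def join_stream(events, lookup, retry_queue, key="product_id"):
    """Enrich each event from the lookup table; park misses for a later retry.

    All names here are illustrative assumptions, not taken from the talk.
    """
    joined = []
    for event in events:
        row = lookup.get(event[key])
        if row is None:
            # Lookup failure: keep the event instead of dropping it.
            retry_queue.append(event)
        else:
            joined.append({**event, **row})
    return joined


def drain_retries(lookup, retry_queue, key="product_id"):
    """Re-attempt every parked event once; remaining misses are re-queued."""
    pending = list(retry_queue)
    retry_queue.clear()
    return join_stream(pending, lookup, retry_queue, key)


# Hypothetical sample data: the record for "p2" arrives after the order does.
lookup = {"p1": {"title": "Phone"}}
retry_queue = deque()
orders = [
    {"order_id": 1, "product_id": "p1"},
    {"order_id": 2, "product_id": "p2"},
]

first_pass = join_stream(orders, lookup, retry_queue)
lookup["p2"] = {"title": "Laptop"}  # late-arriving lookup record
second_pass = drain_retries(lookup, retry_queue)
```

In this sketch the first pass joins only order 1 and parks order 2. Once the missing record shows up in the lookup table, draining the retry queue completes the join. A batch re-driver, as named in the summary, would presumably play a similar role for events that stay unmatched for longer, but that part is not modelled here.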