
SF Data Engineering Feb 2017

yairwein
February 26, 2017


Transcript

  1. Agenda → whoami → Why stateful processing? → Common Challenges → Real-life examples → Alooma in a nutshell → Q&A
  2. whoami → Yair (pronounced Ya- ) → Obsessed with data for the past 15 years → Alooma Founder & CTO → Convert Media (acq. by Taboola) → Military
  3. Stateful Processing -- Incentives → Complex pipelines require state → User Sessions → Fraud Detection → Machine Learning models → Real-time stateful processing is powering an increasing number of business use cases
  4. Stateful Processing -- Challenges → Out of order / late events → Processing errors → Joining multiple sources → Changing computation logic
  5. Use Case - Building User Sessions → Understanding user operations in “one sitting end-to-end” → Which events are causing a desired operation → Which events are causing the users to get stuck → Average session time → User flows → etc.
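
The session-building use case on slide 5 can be illustrated with a minimal sketch. It is not taken from the talk: it assumes events arrive as hypothetical `(user_id, timestamp, name)` tuples and that a session ends after 30 minutes of inactivity (`SESSION_GAP_SECONDS` is an assumed value). It works on a batch of events; a streaming version would keep the current session per user in state instead.

```python
from collections import defaultdict
from dataclasses import dataclass, field

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the talk


@dataclass
class Session:
    user_id: str
    start: float
    end: float
    events: list = field(default_factory=list)

    @property
    def duration(self):
        return self.end - self.start


def build_sessions(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, name) tuples into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, ts, name in events:
        by_user[user_id].append((ts, name))

    sessions = []
    for user_id, user_events in by_user.items():
        user_events.sort()  # order by event timestamp, not arrival order
        current = None
        for ts, name in user_events:
            # Start a new session on the first event or after a long gap.
            if current is None or ts - current.end > gap:
                current = Session(user_id, ts, ts)
                sessions.append(current)
            current.end = ts
            current.events.append(name)
    return sessions


def average_session_time(sessions):
    """Mean session duration in seconds (0.0 for an empty list)."""
    if not sessions:
        return 0.0
    return sum(s.duration for s in sessions) / len(sessions)
```

Metrics such as average session time or user flows (the ordered `events` list of each session) can then be derived from the returned `Session` objects.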
  6. Out of Order events → Google Data Flow Model (@francesjperry, 2014) → https://www.youtube.com/watch?v=3UfZN59Nsk8
  7. Out of Order events → Intrinsic timestamp in the event (if possible) provides the ability to calculate the watermark → The state can hold data for older events as well (does not solve all cases) → When to send a result downstream / “Best effort” approach
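
The watermark idea on slide 7 can be sketched as follows. This is an illustrative simplification, not the Dataflow model or Alooma's implementation: the class name `WindowedCounter`, the fixed (tumbling) windows, and the heuristic "watermark = highest event timestamp seen minus an allowed lateness" are all assumptions. State holds counts for windows that are still open, a window's result is sent downstream once the watermark passes its end, and events arriving after that are set aside as late, which is the "best effort" trade-off.

```python
from collections import defaultdict


class WindowedCounter:
    """Counts events per fixed window, tolerating out-of-order arrival."""

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.open_windows = defaultdict(int)  # window start -> event count
        self.late_events = []                 # events whose window already closed

    @property
    def watermark(self):
        # Heuristic: assume nothing older than this will still arrive.
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time):
        """Ingest one event timestamp; return newly finalized (start, count) pairs."""
        window_start = event_time - event_time % self.window_size
        if window_start + self.window_size <= self.watermark:
            # The result for this window was already sent downstream.
            self.late_events.append(event_time)
        else:
            self.open_windows[window_start] += 1

        self.max_event_time = max(self.max_event_time, event_time)

        # Emit every open window whose end is at or behind the watermark.
        closed = sorted(
            start for start in self.open_windows
            if start + self.window_size <= self.watermark
        )
        return [(start, self.open_windows.pop(start)) for start in closed]
```

A larger `allowed_lateness` keeps more state and delays results but drops fewer events; a smaller one does the opposite.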
  8. Processing Errors / Logic Change → Idempotency as a way to auto-recover from failures (https://www.alooma.com/blog/trident-exactly-once) → Ability to keep the erroneous events isolated while letting the rest of the pipeline flow → Ability to reprocess the raw data for a specific period of time
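
The three points on slide 8 can be illustrated together in a small sketch. It is not Alooma's or Trident's mechanism: `IdempotentSink`, `run_pipeline`, and the event fields `id` and `amount` are hypothetical names chosen for the example. Writes are upserts keyed by event id, so replaying the same raw events never creates duplicates; failing events are put aside in a dead-letter list while the rest keep flowing; and after the logic is fixed, the raw events can be reprocessed safely.

```python
class IdempotentSink:
    """Stores one row per event id; writing the same id again overwrites it."""

    def __init__(self):
        self.rows = {}

    def write(self, event_id, value):
        self.rows[event_id] = value  # upsert: replays overwrite, never duplicate


def run_pipeline(events, transform, sink, dead_letters):
    """Apply transform to each event; isolate failures instead of stopping."""
    for event in events:
        try:
            sink.write(event["id"], transform(event))
        except Exception as exc:
            # Keep the raw event and the error so it can be replayed later.
            dead_letters.append({"event": event, "error": repr(exc)})


def parse_amount_v1(event):
    # Original logic: fails on a decimal comma such as "1,5".
    return float(event["amount"])


def parse_amount_v2(event):
    # Changed logic: also accepts a decimal comma.
    return float(event["amount"].replace(",", "."))
```

For example, running `run_pipeline` with `parse_amount_v1` over events with amounts `"2.0"`, `"1,5"`, and `"3"` would store two rows and isolate one event; rerunning the full raw batch with `parse_amount_v2` fills in the missing row and leaves the other two unchanged.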
  9. Alooma Platform - The Real-Time Data Hub (architecture diagram) → Analytics, Security, AI and more… → Built-in services: Monitoring & Alerts, Filtering, Enrichments, State, ReStream, Dynamic Data Modelling, MDM → Alooma Code Engine → Stream Processor, Scheduler → Distributed MQ → Connectors Management Layer → External APIs