SF Data Engineering Feb 2017

yairwein
February 26, 2017

Transcript

  1. Agenda → whoami → Why stateful processing? → Common Challenges → Real-life examples → Alooma in a nutshell → Q&A
  2. whoami → Yair (pronounced Ya- ) → Obsessed with data for the past 15 years → Alooma Founder & CTO → Convert Media (acq. by Taboola) → Military
  3. Stateful Processing -- Incentives → Complex pipelines require state → User Sessions → Fraud Detection → Machine Learning models → Real-time stateful processing is powering an increasing number of business use cases
  4. Stateful Processing -- Challenges → Out of order / late events → Processing errors → Joining multiple sources → Changing computation logic
  5. Use Case - Building User Sessions → Understanding user operations in “one sitting end-to-end” → Which events lead to a desired operation → Which events cause users to get stuck → Average session time → User flows → etc.
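A minimal sketch of how sessions like these could be built, assuming a gap-based definition of "one sitting": a user's session ends after 30 minutes of inactivity. The threshold, the `(user_id, timestamp, name)` event shape, and the function names are illustrative assumptions, not something stated in the talk.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold, not from the talk


def build_sessions(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, name) events into per-user sessions.

    A new session starts whenever two consecutive events of the same
    user are more than `gap` seconds apart.
    """
    by_user = defaultdict(list)
    for user_id, ts, name in events:
        by_user[user_id].append((ts, name))

    sessions = []
    for user_id, user_events in by_user.items():
        user_events.sort()  # order by event time, not by arrival order
        current = [user_events[0]]
        for prev, event in zip(user_events, user_events[1:]):
            if event[0] - prev[0] > gap:
                sessions.append((user_id, current))
                current = []
            current.append(event)
        sessions.append((user_id, current))
    return sessions


def average_session_seconds(sessions):
    """Mean of (last event time - first event time) over all sessions."""
    durations = [evts[-1][0] - evts[0][0] for _, evts in sessions]
    return sum(durations) / len(durations) if durations else 0.0


events = [
    ("u1", 0, "login"),
    ("u1", 120, "search"),
    ("u1", 300, "checkout"),
    ("u1", 4000, "login"),   # more than 30 minutes later: new session
    ("u2", 50, "login"),
    ("u2", 20, "landing"),   # arrived out of order
]
sessions = build_sessions(events)
average = average_session_seconds(sessions)
```

This batch version sorts each user's events up front; a streaming version would have to keep the open session in state and decide when it is safe to close it, which is where the out-of-order challenges come in.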
  6. Out of Order events → Google Dataflow Model (@francesjperry, 2014) → https://www.youtube.com/watch?v=3UfZN59Nsk8
  7. Out of Order events → Intrinsic timestamp in the event (if possible) provides the ability to calculate the watermark → The state can hold data for older events as well (does not solve all cases) → When to send a result downstream / “Best effort” approach
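One way to picture the watermark idea, as a hedged sketch rather than how any particular engine implements it: the watermark is taken to be the largest event timestamp seen so far minus an allowed lateness, a tumbling window is sent downstream once the watermark passes its end, and anything arriving for an already-closed window is only counted ("best effort"). The class name, the 60-second window, and the 30-second lateness are assumptions made for the example.

```python
class WindowedCounter:
    """Counts events per tumbling window, using event time and a watermark."""

    def __init__(self, window_size=60, allowed_lateness=30):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = {}  # window start -> event count (the state)
        self.late_events = 0    # events that arrived after their window closed

    @property
    def watermark(self):
        # Heuristic: assume no event older than this will still arrive.
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time):
        """Ingest one event; return the (window_start, count) pairs that closed."""
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            self.late_events += 1  # best effort: too late to be counted
            return []
        self.open_windows[start] = self.open_windows.get(start, 0) + 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        closed = [
            (s, c)
            for s, c in sorted(self.open_windows.items())
            if s + self.window_size <= self.watermark
        ]
        for s, _ in closed:
            del self.open_windows[s]
        return closed


counter = WindowedCounter()
emitted = []
# 40 arrives out of order but within the lateness bound; 20 arrives too late.
for event_time in [10, 50, 70, 40, 95, 20, 130]:
    emitted.extend(counter.process(event_time))
```

Raising `allowed_lateness` catches more stragglers at the cost of holding state longer and emitting results later, which is the trade-off behind deciding when to send a result downstream.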
  8. Processing Errors / Logic Change → Idempotency as a way to auto-recover from failures (https://www.alooma.com/blog/trident-exactly-once) → Ability to keep the erroneous events isolated while letting the rest of the pipeline flow → Ability to reprocess the raw data for a specific period of time
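The three ideas on this slide can be sketched together in a toy in-memory pipeline. This is an illustration under stated assumptions, not Alooma's implementation: writes are keyed by an event `id` so replays overwrite instead of duplicating, failing events go to a quarantine list while the rest keep flowing, and a retained raw log allows a time range to be reprocessed with new logic. The `Pipeline` class, the `id`/`ts`/`amount` fields, and both transforms are hypothetical.

```python
class Pipeline:
    """Toy pipeline with idempotent writes, error isolation and reprocessing."""

    def __init__(self, transform):
        self.transform = transform
        self.raw_log = []     # append-only copy of every raw event
        self.output = {}      # event id -> result; keyed writes are idempotent
        self.quarantine = []  # (event, error message) pairs kept aside

    def ingest(self, event):
        self.raw_log.append(event)
        self._apply(event)

    def _apply(self, event):
        try:
            # Writing by id means a redelivered event cannot be counted twice.
            self.output[event["id"]] = self.transform(event)
        except Exception as exc:
            # Isolate the bad event; later events are still processed.
            self.quarantine.append((event, str(exc)))

    def reprocess(self, start_ts, end_ts, transform=None):
        """Replay raw events with start_ts <= ts < end_ts, optionally with new logic."""
        if transform is not None:
            self.transform = transform
        self.quarantine = [
            (e, err) for e, err in self.quarantine
            if not (start_ts <= e["ts"] < end_ts)
        ]
        for event in self.raw_log:
            if start_ts <= event["ts"] < end_ts:
                self._apply(event)


def strict(event):
    return float(event["amount"])


def lenient(event):
    # Changed computation logic: treat unparsable amounts as zero.
    try:
        return float(event["amount"])
    except ValueError:
        return 0.0


pipeline = Pipeline(strict)
for event in [
    {"id": "a", "ts": 1, "amount": "10.5"},
    {"id": "b", "ts": 2, "amount": "oops"},  # fails under the strict logic
    {"id": "a", "ts": 1, "amount": "10.5"},  # duplicate delivery
    {"id": "c", "ts": 3, "amount": "2"},
]:
    pipeline.ingest(event)

output_before = dict(pipeline.output)
quarantined_before = [e["id"] for e, _ in pipeline.quarantine]

pipeline.reprocess(0, 10, transform=lenient)
```

Because every write is keyed, running `reprocess` over a range that was already handled is safe: the same ids are simply overwritten with the recomputed values.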
  9. Alooma Platform - The Real-Time Data Hub → Analytics, Security, and more… → Built-in services: Monitoring & Alerts, Filtering, Enrichments, State, ReStream, Dynamic Data Modelling, MDM → Alooma Code Engine → Stream Processor, Scheduler → Distributed MQ → Connectors Management Layer → AI → External APIs