Mechanics of Data Pipelines

Sean Braithwaite
October 05, 2016

This talk focuses on how to model data pipelines as retroactive, immutable data structures. It covers how to build data pipelines for a growing organization in which different teams depend on each other's data and need to be able to re-process data when errors occur upstream.
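As a rough illustration of the idea, the sketch below treats each dataset as an append-only history of immutable snapshots and re-derives downstream datasets whenever an upstream dataset is corrected. It is not taken from the talk: the names `Snapshot`, `Pipeline`, `add_step`, and `publish`, the integer records, and the example datasets are all hypothetical, and the step graph is assumed to be acyclic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Records = Tuple[int, ...]


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of a dataset (hypothetical structure)."""
    dataset: str
    version: int
    records: Records
    # (dataset, version) pairs this snapshot was computed from
    derived_from: Tuple[Tuple[str, int], ...] = ()


class Pipeline:
    """Append-only dataset histories plus the steps that connect them."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Snapshot]] = {}
        # output dataset -> (source dataset, transformation)
        self.steps: Dict[str, Tuple[str, Callable[[Records], Records]]] = {}

    def add_step(self, output: str, source: str,
                 fn: Callable[[Records], Records]) -> None:
        self.steps[output] = (source, fn)

    def latest(self, dataset: str) -> Snapshot:
        return self.history[dataset][-1]

    def _append(self, dataset: str, records: Records,
                derived_from: Tuple[Tuple[str, int], ...] = ()) -> Snapshot:
        # Old versions are never modified, only superseded by a new one.
        versions = self.history.setdefault(dataset, [])
        snap = Snapshot(dataset, len(versions) + 1, tuple(records), derived_from)
        versions.append(snap)
        return snap

    def publish(self, dataset: str, records) -> Snapshot:
        """Publish new or corrected data and re-process everything downstream."""
        snap = self._append(dataset, tuple(records))
        self._reprocess(dataset)
        return snap

    def _reprocess(self, source: str) -> None:
        # Assumes the step graph has no cycles.
        for output, (src, fn) in self.steps.items():
            if src == source:
                upstream = self.latest(src)
                self._append(output, fn(upstream.records),
                             ((src, upstream.version),))
                self._reprocess(output)


pipeline = Pipeline()
pipeline.add_step("valid_plays", "raw_plays",
                  lambda rs: tuple(r for r in rs if r >= 0))
pipeline.add_step("play_count", "valid_plays", lambda rs: (len(rs),))

pipeline.publish("raw_plays", [3, -1, 5])   # contains a bad record
first = pipeline.latest("play_count")

pipeline.publish("raw_plays", [3, 4, 5])    # upstream correction
second = pipeline.latest("play_count")
```

Because snapshots are frozen and histories only grow, the result computed from the faulty input stays available for comparison, while `derived_from` records which upstream version each downstream result was built on.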

Transcript

  1. Who Am I: Data Science & Engineering
    * Machine Learning
    * Distributed Systems
    * Computation Art
  2. Conway's law: "Organisations which design systems ... are constrained to produce designs which are copies of the communication structures of these organisations" (M. Conway)
  3. Challenges
    * Counts are subject to spam
    * Deleting data is painful
    * Can’t do full lambda
  4. Emergent Data Pipelines (TODO It’s a mess)
    * Different runtimes
    * Untouchable Legacy
    * Different processing steps
    * Cross dependencies
  5. Failure
    * Transient Failure
    * Input set variance
    * Hard to test new code
    * Misunderstood dependency
  6. Consequence (TODO It’s a mess)
    * Blocking Failures
    * Risk Aversion
    * Manual Intervention
    * Low coherence
  7. Conclusion
    * Conway's law is real
    * Design as data structure
    * Abstract and apply
  8. Thank You: Emily Green, Omid Aladini, Sebastian Ohm, Fronx Wurmus, Matthias Georgi, David Whiting, Lorand Kasler, Gavin Bell, Jon Glover, Erik Bartels