Data Pipelines as Software Structures
Berlin Buzzwords 2017
@brapse
Slide 2
Slide 2 text
No content
Slide 3
Slide 3 text
Data Pipelines
The software structures which emerge to
process and disseminate information.
A connected set of MapReduce jobs for loading data into one or more
databases.
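To make the second definition concrete, here is a minimal sketch of two connected map/reduce jobs whose result is loaded into a store. It is an illustration under stated assumptions, not code from the talk: the `map_reduce` helper, the play-event records, and the dict standing in for a database are all hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Run one job: map each record to (key, value) pairs, then fold the values per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical input: raw play events as (user, track) tuples.
events = [("ann", "t1"), ("bob", "t1"), ("ann", "t2"), ("ann", "t1")]

# Job 1: count plays per (user, track).
plays_per_user_track = map_reduce(
    events,
    mapper=lambda event: [((event[0], event[1]), 1)],
    reducer=lambda a, b: a + b,
)

# Job 2: consumes the output of job 1 and totals plays per track.
plays_per_track = map_reduce(
    plays_per_user_track.items(),
    mapper=lambda item: [(item[0][1], item[1])],
    reducer=lambda a, b: a + b,
)

# Load step: a plain dict stands in for the database (an assumption).
database = {}
database.update(plays_per_track)
```

The two jobs are "connected" only through the shape of the intermediate data: job 2 depends on the keys and values that job 1 happens to emit.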
Slide 4
Slide 4 text
No content
Slide 5
Slide 5 text
Why Data Pipelines?
To integrate diverse perspectives.
Enable and empower collaboration between diverse sets of
domain experts.
Slide 6
Slide 6 text
How do we build them?
Often Badly.
Misunderstood domain.
Misunderstood integration.
Misunderstood coordination.
Slide 7
Slide 7 text
Thesis:
Data Pipelines emerge and grow to reflect
collaboration between domains and are
impeded by incidental coordination.
Slide 8
Slide 8 text
Abstract
Slide 9
Slide 9 text
Evolution
Slide 10
Slide 10 text
Bounded Context
Slide 11
Slide 11 text
Storage
Slide 12
Slide 12 text
Teams
Slide 13
Slide 13 text
Nested splits
Teams > Contexts > Storage
Slide 14
Slide 14 text
Mapping
Slide 15
Slide 15 text
Mapping Storage Boundaries
Slide 16
Slide 16 text
Mapping Context Boundaries
Slide 17
Slide 17 text
Mapping Team Boundaries
Slide 18
Slide 18 text
Coordination
Slide 19
Slide 19 text
Coordinating Definitions
Slide 20
Slide 20 text
Coordinating Correctness
Slide 21
Slide 21 text
Coordinating Failure
Slide 22
Slide 22 text
Concrete
Slide 23
Slide 23 text
No content
Slide 24
Slide 24 text
Correctness
Slide 25
Slide 25 text
Fingerprints
Slide 26
Slide 26 text
Fingerprints
Slide 27
Slide 27 text
Fingerprints
Slide 28
Slide 28 text
No content
Slide 29
Slide 29 text
No content
Slide 30
Slide 30 text
Coordinate Change
Slide 31
Slide 31 text
Coordinate Change
Slide 32
Slide 32 text
Coordinate Change
Slide 33
Slide 33 text
Convergence
Slide 34
Slide 34 text
Failure
Slide 35
Slide 35 text
Retroactivity
Slide 36
Slide 36 text
Mutation
Slide 37
Slide 37 text
Mutation
Slide 38
Slide 38 text
Immutability
Slide 39
Slide 39 text
No content
Slide 40
Slide 40 text
Conclusions
Data pipelines enable coordination but require
shared protocols to determine when and how
we read and write data.
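One possible shape for such a shared protocol is sketched below, purely as an assumption and not as the approach shown in the talk. Writers only ever create new versioned directories and publish a completion marker last; readers only read versions that carry the marker. The marker name `_COMPLETE`, the version labels, and the helper functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path

MARKER = "_COMPLETE"  # hypothetical marker file name agreed on by readers and writers


def write_snapshot(root, version, rows):
    """Writer side: write into a new versioned directory, then publish the marker last."""
    target = Path(root) / version
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing version
    (target / "data.json").write_text(json.dumps(rows))
    (target / MARKER).touch()
    return target


def latest_complete(root):
    """Reader side: only versions with a marker are visible."""
    complete = sorted(p.name for p in Path(root).iterdir() if (p / MARKER).exists())
    return complete[-1] if complete else None


def read_snapshot(root):
    """Read the newest complete version, or None if nothing has been published yet."""
    version = latest_complete(root)
    if version is None:
        return None
    return json.loads((Path(root) / version / "data.json").read_text())


root = tempfile.mkdtemp()
write_snapshot(root, "v001", [{"track": "t1", "plays": 3}])
(Path(root) / "v002").mkdir()  # simulate a write that is still in progress
rows = read_snapshot(root)  # readers still see v001
```

In this sketch the marker answers "when" a version may be read, and the never-overwrite rule answers "how" it may be written; neither side needs to know anything else about the other.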
Slide 41
Slide 41 text
Thank You
Emily Green, Omid Aladini, Sebastian Ohm, Fronx Wurmus, Matthias Georgi,
David Whiting, Lorand Kasler, Gavin Bell, Jon Glover, Erik Bartels