A simple way to describe computation in a cluster!
• Pipelines are useful because…
  – they're reproducible
  – you can test them locally before you run them on the cluster
  – you can collaborate on them as easily as on code
which tells us what data has changed
• Our pipelines tell us how those changes affect our results
• Combined, these give us efficient incremental processing
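The idea of combining "what data changed" with "how changes affect results" can be illustrated with a minimal sketch. It assumes a data version is modeled as a plain dict mapping file paths to contents, and that the pipeline's transform runs independently on each file. The names `changed_files` and `run_incremental` are hypothetical and are not the API of any particular system.

```python
from typing import Callable, Dict, Set, Tuple, TypeVar

R = TypeVar("R")

# Hypothetical model: one version ("commit") of the data is a snapshot
# mapping file paths to file contents.
Snapshot = Dict[str, bytes]


def changed_files(old: Snapshot, new: Snapshot) -> Set[str]:
    """Paths that were added or modified between two versions of the data."""
    return {path for path, data in new.items()
            if path not in old or old[path] != data}


def run_incremental(transform: Callable[[bytes], R], old: Snapshot,
                    new: Snapshot, cache: Dict[str, R]) -> Tuple[Dict[str, R], Set[str]]:
    """Re-run `transform` only on changed files and reuse cached results for
    unchanged ones. Files deleted in `new` are dropped from the output."""
    dirty = changed_files(old, new)
    results: Dict[str, R] = {}
    for path, data in new.items():
        if path in dirty or path not in cache:
            results[path] = transform(data)   # recompute: input changed
        else:
            results[path] = cache[path]       # reuse: input unchanged
    return results, dirty


if __name__ == "__main__":
    calls = []

    def word_count(data: bytes) -> int:
        calls.append(data)                    # record each real computation
        return len(data.split())

    v1 = {"a.txt": b"hello world", "b.txt": b"foo"}
    out1, _ = run_incremental(word_count, {}, v1, {})

    v2 = {"a.txt": b"hello world", "b.txt": b"foo bar baz", "c.txt": b"new"}
    calls.clear()
    out2, dirty = run_incremental(word_count, v1, v2, out1)
    print(sorted(dirty))  # ['b.txt', 'c.txt']
    print(out2)           # {'a.txt': 2, 'b.txt': 3, 'c.txt': 1}
    print(len(calls))     # 2: a.txt was not reprocessed
```

In the second run only `b.txt` and `c.txt` are processed. The result for the unchanged `a.txt` comes from the previous output, which is where the savings of incremental processing come from.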