Data pipelines:
• steps to extract, clean, augment, join data
• every non-trivial project has one

From prototype to production:
• as a Data Scientist, the focus is on R&D
• automation and replicability matter
Target: output of a task

    import luigi

    class MyTarget(luigi.Target):
        def exists(self):
            pass  # return a bool: True if the output already exists

Off-the-shelf support for local filesystem, S3, RDBMS, Elasticsearch, …
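A minimal sketch of what filling in `exists()` could look like. `MarkerFileTarget`, the `_SUCCESS` file name, and `demo()` are hypothetical names chosen for illustration, not part of Luigi. The fallback base class is only an assumption that lets the sketch run when Luigi is not installed; with Luigi available, the class subclasses `luigi.Target` as on the slide.

```python
import os
import tempfile

try:
    import luigi
    TargetBase = luigi.Target
except ImportError:
    # Stand-in so the sketch runs without Luigi installed (an assumption, not Luigi's own class).
    class TargetBase(object):
        def exists(self):
            raise NotImplementedError


class MarkerFileTarget(TargetBase):
    """Hypothetical target: the output counts as done once a marker file is on disk."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        # Must return a bool; a task whose outputs all exist is treated as complete.
        return os.path.isfile(self.path)


def demo():
    """Return the target's state before and after the marker file is written."""
    with tempfile.TemporaryDirectory() as tmp:
        target = MarkerFileTarget(os.path.join(tmp, "_SUCCESS"))
        before = target.exists()
        with open(target.path, "w") as fh:
            fh.write("done")
        after = target.exists()
    return before, after
```

Calling `demo()` checks the target once before and once after the marker file is created. For common storage backends, the off-the-shelf targets are likely a better starting point than a hand-written one.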