Let Luigi do the plumbing for you @ PyData London meetup

Lightning talk given at the PyData London November meetup about building data pipelines in Python using Luigi.

Marco Bonzanini

November 03, 2015

Transcript

  1. Let Luigi do the plumbing for you
    Building Data Pipelines in Python
    Marco Bonzanini @ PyData London
    3rd November 2015

  2. Data pipelines:
    • steps to extract, clean, augment, join data
    • every non-trivial project has one

    From prototype to production:
    • as a Data Scientist, the focus is on R&D
    • automation and replicability matter

  3. Luigi: GNU make + Unix pipelines + steroids
    • Workflow manager in Python
    • Dependency management
    • Error control, checkpoints, failure recovery
    • Minimal boilerplate code
    • Dependency graph visualisation

    $ pip install luigi
    https://github.com/spotify/luigi

  4. Task: unit of execution

    import luigi

    class MyTask(luigi.Task):

        def requires(self):
            pass  # list of dependencies

        def output(self):
            pass  # task output

        def run(self):
            pass  # task logic

  5. Target: output of a task

    import luigi

    class MyTarget(luigi.Target):

        def exists(self):
            pass  # return bool

    Off-the-shelf support for local filesystem, S3,
    RDBMS, Elasticsearch, …

  6. Suggestions to Ease Deployment
    • Don’t re-invent the wheel
    • Develop Python packages (setup.py)
    • Parameterise everything (env variables: good)
    • Use a decent logging mechanism
    • Docker: probably a good idea
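
The points about parameterising through environment variables and using proper logging can be sketched with the standard library alone. The variable names `PIPELINE_DATA_DIR` and `PIPELINE_LOG_LEVEL`, the defaults, and the logger name `pipeline` are assumptions made for this example, not something the talk prescribes.

```python
import logging
import os

# Accepted level names; anything else falls back to INFO.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def load_settings(env=None):
    """Read pipeline settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return {
        "data_dir": env.get("PIPELINE_DATA_DIR", "./data"),
        "log_level": env.get("PIPELINE_LOG_LEVEL", "INFO").upper(),
    }


def configure_logging(level_name):
    """Return a logger named 'pipeline' set to the requested level."""
    logger = logging.getLogger("pipeline")
    logger.setLevel(_LEVELS.get(level_name, logging.INFO))
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Keeping the settings in one function means the same package can run unchanged on a laptop, on a server, or in a container, with only the environment differing.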

  7. Thank You!
    http://marcobonzanini.com
    http://twitter.com/marcobonzanini
