Let Luigi do the plumbing for you @ PyData London meetup

Lightning talk given at the PyData London November meetup about building data pipelines in Python using Luigi.

Marco Bonzanini

November 03, 2015

Transcript

Let Luigi do the plumbing for you
Building Data Pipelines in Python
Marco Bonzanini @ PyData London
3rd November 2015

Data pipelines:
• steps to extract, clean, augment, join data
• every non-trivial project has one

From prototype to production:
• as a Data Scientist, the focus is on R&D
• automation and replicability matter

Luigi: GNU make + Unix pipelines + steroids
• Workflow manager in Python
• Dependency management
• Error control, checkpoints, failure recovery
• Minimal boilerplate code
• Dependency graph visualisation

$ pip install luigi
https://github.com/spotify/luigi

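
The "make-like" idea with checkpoints can be illustrated with a toy runner that uses only the standard library. This is a sketch of the concept and not Luigi's implementation: the `Step` class, `build` function and file names are all made up for illustration. Like Luigi's default completeness check, it treats a step as done when its output exists, so a second run (for example after a failure further down the pipeline) skips the work already finished.

```python
import os
import tempfile


class Step:
    """Toy stand-in for a pipeline step (hypothetical, not Luigi's Task)."""

    def __init__(self, name, out_path, deps=(), action=None):
        self.name = name
        self.out_path = out_path
        self.deps = list(deps)
        self.action = action

    def complete(self):
        # A step counts as done once its output file exists.
        return os.path.exists(self.out_path)


def write_atomically(path, text):
    # Write to a temporary name first so a crash never leaves a
    # half-written file that would look like a finished checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)


def build(step, log):
    """Run dependencies first, then the step; skip a step that is already complete."""
    if step.complete():
        log.append(f"skip {step.name}")
        return
    for dep in step.deps:
        build(dep, log)
    step.action(step)
    log.append(f"run {step.name}")


workdir = tempfile.mkdtemp()

extract = Step(
    "extract",
    os.path.join(workdir, "raw.txt"),
    action=lambda s: write_atomically(s.out_path, "3\n1\n2\n"),
)


def sort_lines(step):
    # Read the upstream output and write a sorted copy.
    with open(extract.out_path) as f:
        lines = sorted(f)
    write_atomically(step.out_path, "".join(lines))


clean = Step("clean", os.path.join(workdir, "clean.txt"),
             deps=[extract], action=sort_lines)

first_run, second_run = [], []
build(clean, first_run)   # nothing exists yet: extract runs, then clean
build(clean, second_run)  # the final output exists, so nothing is re-run
```

Unlike GNU make, this sketch only checks that the output exists and does not compare timestamps.
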
Task: unit of execution

    import luigi

    class MyTask(luigi.Task):

        def requires(self):
            pass  # list of dependencies

        def output(self):
            pass  # task output

        def run(self):
            pass  # task logic

Target: output of a task

    import luigi

    class MyTarget(luigi.Target):

        def exists(self):
            pass  # return bool

Off-the-shelf support for local filesystem, S3, RDBMS, Elasticsearch, …

Suggestions to Ease Deployment
• Don’t re-invent the wheel
• Develop Python packages (setup.py)
• Parameterise everything (env variables: good)
• Use decent logging mechanism
• Docker: probably good idea

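
The "parameterise everything" and logging suggestions can be sketched with the standard library alone. The environment variable names (`PIPELINE_DATA_DIR`, `PIPELINE_LOG_LEVEL`, `PIPELINE_WORKERS`) and the logger name `pipeline` are assumptions made for this example, not names used in the talk.

```python
import logging
import os


def load_settings(env=None):
    """Read pipeline settings from environment variables, with defaults."""
    env = os.environ if env is None else env
    return {
        "data_dir": env.get("PIPELINE_DATA_DIR", "./data"),
        "log_level": env.get("PIPELINE_LOG_LEVEL", "INFO").upper(),
        "workers": int(env.get("PIPELINE_WORKERS", "1")),
    }


def configure_logging(level_name):
    """Set up the root handler and return a logger for the pipeline."""
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unknown level names
    # basicConfig is a no-op if the root logger already has handlers.
    logging.basicConfig(
        format="%(asctime)s %(name)s %(levelname)s %(message)s"
    )
    logger = logging.getLogger("pipeline")
    logger.setLevel(level)
    return logger


settings = load_settings()
logger = configure_logging(settings["log_level"])
logger.info("reading data from %s", settings["data_dir"])
```

Passing a plain dictionary to `load_settings` instead of relying on `os.environ` keeps the function easy to test, and the same variables can be set with `docker run -e` when the pipeline is packaged in a container.
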
Thank You!
http://marcobonzanini.com
http://twitter.com/marcobonzanini