Parameters
Used to identify the task
From arguments or from configuration
Many types of Parameters (int, date, boolean, date range, time delta, dict, enum)
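The point that parameters identify a task can be sketched in plain Python, with no Luigi dependency. The `DailyReport` class, its fields, and the `task_id` format are hypothetical; in Luigi the fields would be declared with parameter classes such as `luigi.DateParameter` and `luigi.IntParameter`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyReport:
    """Hypothetical task: its parameter values are its identity."""

    day: date
    top_n: int = 10  # default value, as if it came from configuration

    def task_id(self) -> str:
        # Instances with equal parameters produce the same id.
        return f"DailyReport(day={self.day.isoformat()}, top_n={self.top_n})"


a = DailyReport(day=date(2017, 1, 1))
b = DailyReport(day=date(2017, 1, 1), top_n=10)
c = DailyReport(day=date(2017, 1, 2))

same_task = a == b        # equal parameters: treated as the same task
different_task = a != c   # different date: a different task
```

Because identity comes only from the parameter values, a scheduler can tell that `a` and `b` describe the same unit of work and run it once.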
Slide 31
Targets
Slide 32
Targets
Resources produced by a Task
Typically local files or files on a distributed file system (HDFS)
Must implement the method exists()
Many targets available
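The `exists()` contract can be illustrated with a minimal plain-Python sketch. `FileTarget`, `run_if_needed`, and `write_greeting` are hypothetical names used only for illustration; Luigi ships its own file-based target, `luigi.LocalTarget`, and handles the scheduling itself.

```python
import os
import tempfile


class FileTarget:
    """Hypothetical stand-in for a file-based target."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        # The one question a scheduler needs answered about a target.
        return os.path.exists(self.path)


def run_if_needed(target, produce):
    """Call produce(path) only if the target is missing; report whether it ran."""
    if target.exists():
        return False
    produce(target.path)
    return True


def write_greeting(path):
    with open(path, "w") as handle:
        handle.write("hello\n")


workdir = tempfile.mkdtemp()
target = FileTarget(os.path.join(workdir, "greeting.txt"))

first_run = run_if_needed(target, write_greeting)   # file missing, so the work runs
second_run = run_if_needed(target, write_greeting)  # file present, so the work is skipped
```

This is why rerunning a pipeline is cheap: work whose target already exists is not repeated.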
Batteries Included
Package contrib filled with goodies
Good support for Hadoop
Different Targets
Extensible
Slide 39
Task Types
Task - Local
Hadoop MR, Pig, Spark, etc.
Salesforce, Elasticsearch, etc.
ExternalProgram
Check luigi.contrib!
Slide 40
Target
LocalTarget
HDFS, S3, FTP, SSH, WebHDFS, etc.
ESTarget, MySQLTarget, MSSQL, Hive, SQLAlchemy, etc.
Slide 42
Tips & Tricks
Slide 43
Separate pipeline and logic
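One way to read this tip, sketched in plain Python under the assumption that "logic" means pure functions and "pipeline" means the I/O wrapper around them. `count_words` and `WordCountTask` are hypothetical names, not part of Luigi.

```python
def count_words(lines):
    """Pure logic: no file paths and no scheduler, so it is easy to unit-test."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


class WordCountTask:
    """Hypothetical pipeline wrapper: handles only I/O and delegates the logic."""

    def __init__(self, input_path, output_path):
        self.input_path = input_path
        self.output_path = output_path

    def run(self):
        with open(self.input_path) as source:
            counts = count_words(source)
        with open(self.output_path, "w") as sink:
            for word in sorted(counts):
                sink.write(f"{word}\t{counts[word]}\n")


# The logic can be exercised directly, without touching any files.
sample_counts = count_words(["a b a", "b c"])
```

Keeping `count_words` free of paths and targets means it can be tested and reused outside the pipeline.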
Slide 44
Extend to avoid boilerplate code
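A possible shape for this tip, sketched in plain Python: a base class owns the repeated output-path and completeness code, and each concrete task supplies only what differs. `FileTask`, `Hello`, and `Goodbye` are hypothetical; with Luigi the same idea would be applied by subclassing `luigi.Task`.

```python
import os
import tempfile


class FileTask:
    """Hypothetical base class that owns the shared boilerplate."""

    def __init__(self, output_dir):
        self.output_dir = output_dir

    def output_path(self):
        # Derived once from the class name, for every subclass.
        return os.path.join(self.output_dir, f"{type(self).__name__}.txt")

    def complete(self):
        return os.path.exists(self.output_path())

    def run(self):
        with open(self.output_path(), "w") as sink:
            sink.write(self.render())

    def render(self):
        raise NotImplementedError("subclasses provide the content")


class Hello(FileTask):
    def render(self):
        return "hello\n"


class Goodbye(FileTask):
    def render(self):
        return "goodbye\n"


workdir = tempfile.mkdtemp()
task = Hello(workdir)
done_before = task.complete()  # nothing written yet
task.run()
done_after = task.complete()   # output file now exists
```

Each new task is then a few lines long, and a change to the output convention happens in one place.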
Slide 45
DRY
Slide 46
Conclusion
Luigi is a mature, batteries-included alternative for building data pipelines
Lacks powerful visualization of the pipelines
Requires an external way of launching jobs (e.g. cron)
Hard to debug MR jobs
Slide 47
Learn More
https://github.com/spotify/luigi
http://luigi.readthedocs.io/en/stable/
Slide 48
Thanks!
Slide 49
Credits
• Pipe icon by Oliviu Stoian from the Noun Project
• Photo Credit: (CC) https://www.flickr.com/photos/47244853@N03/29988510886 from hb.s via Compfight
• Concrete Mixer: (CC) https://www.flickr.com/photos/145708285@N03/30138453986 by MasLabor via Compfight