Lessons from 6 Months of using Luigi
peteowlett
May 07, 2016
AKA Why it's better to be woken up by your cat than by the server alarm
Transcript
Lessons from 6 months of using Luigi in production @peterowlett @deliveroo
Hello! I’m Pete
I work for these folks
WE DO THIS
A BETTER TITLE: Why it’s better to be woken up by your cat than by the server alarm
This is Kitty
NO RESPECT FOR PERSONAL SPACE
This is PagerDuty
Even less respect for personal space
Let’s Compare!
PagerDuty:
- Goes off at any time, day or night
- Loud ring tone, text messages, answer phone messages and flashing
- Resolution can take hours
Kitty:
- Goes off only once at precisely 6am
- Cute batting motion to wake
- Resolved in time it takes to open cat food packet
I think we can all agree with my premise: Kitty >> PagerDuty
Let’s get to it
Chapter 1 “The Model”
Let’s build a model
I’m ready, where’s the data?
“Just pg_dump the prod db”
OH PLS PLS NO DON’T DO THAT
Let’s spin up a read slave and ETL the data to a warehouse …
… then train our models from that
How do we ensure tasks run in order?
I want them to run one after the other
[Diagram: tasks numbered 1 to 8]
Directed Acyclic Graph
Enter Stage Left …
Simple Task
Postgres Loader Task
We string these together to make DAGs
[Diagram labels: CHECK MAX ROW ID, LOAD DATA, MOD DATA, MAKE MODEL, CHECK MAX ROW ID, LOAD DATA, TABLE1, TABLE2]
DAGs solve the dependency problem
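To make the dependency point concrete, here is a minimal sketch using Python's standard-library graphlib rather than Luigi itself. The task names come from the earlier slide; the edges between them are an assumption, since the diagram is not preserved.

```python
import graphlib

# Map each task to the tasks it depends on. Task names follow the slide;
# the exact edges are an assumption for illustration.
DEPENDENCIES = {
    "LOAD DATA": {"CHECK MAX ROW ID"},
    "MOD DATA": {"LOAD DATA"},
    "MAKE MODEL": {"MOD DATA"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(graphlib.TopologicalSorter(dependencies).static_order())
```

Because the graph is acyclic, a valid order always exists. A cycle (A needs B, B needs A) is reported as an error instead of hanging.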
Bung it all on EC2
Define an entry point
Run the scheduler
Kick it all off with CRON
With Luigi we were up and running in a few hours
Chapter 2 “The Nuclear Option”
A few weeks later, something happened …
Schema can change anytime without warning
HAS THE SCHEMA CHANGED?
- NO! RELOAD JUST NEW ROWS
- YES! DROP AND CREATE WHOLE SCHEMA, RELOAD ALL TABLES
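The decision on this slide can be sketched roughly as follows. This uses sqlite3 as a stand-in for the read replica and the warehouse, and assumes every table has an integer `id` column with non-negative values. The function names are hypothetical and the talk does not show how the check was actually implemented.

```python
import sqlite3


def columns(conn, table):
    """Return (name, type) for each column, or [] if the table is missing."""
    return [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]


def sync(source, warehouse, tables):
    """Rebuild everything on any schema change, otherwise append only
    rows with an id above the current warehouse maximum."""
    changed = any(columns(source, t) != columns(warehouse, t) for t in tables)
    for t in tables:
        if changed:
            # The "nuclear option": drop and recreate every table.
            warehouse.execute(f"DROP TABLE IF EXISTS {t}")
            ddl = ", ".join(f"{name} {ctype}" for name, ctype in columns(source, t))
            warehouse.execute(f"CREATE TABLE {t} ({ddl})")
            max_id = -1
        else:
            max_id = warehouse.execute(
                f"SELECT COALESCE(MAX(id), -1) FROM {t}"
            ).fetchone()[0]
        rows = source.execute(f"SELECT * FROM {t} WHERE id > ?", (max_id,)).fetchall()
        if rows:
            marks = ", ".join("?" * len(rows[0]))
            warehouse.executemany(f"INSERT INTO {t} VALUES ({marks})", rows)
    warehouse.commit()
    return "rebuilt" if changed else "incremental"
```

Note that a change to one table rebuilds all of them, which is what makes this the nuclear option.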
Handle schema changes robustly
Let’s test our pipeline before we deploy it. But how?
Two new operating modes
- TEST MODE: Run the whole pipeline but only write to a test schema
- UNIT MODE: Run the current task, ignoring its dependencies
Configure these modes in the pipeline using luigi.Parameter
Now nothing will ever go wrong, ever again …
Make your testing comprehensive
Make your testing fast
Adding in external API services
Build Loaders for each API
Loading Schedules
Plumbing them in
Keep def rows as short as possible
Be consistent in loader design pattern
Expect external API services to misbehave
Expect external API services to misbehave X
Trust external API services as if they actively want to hurt you
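In practice, distrusting an external API usually means validating every record and retrying with backoff. The following is a minimal sketch under those assumptions: `fetch` stands for any callable that returns a list of records, and the required field names are hypothetical.

```python
import time


class BadResponse(Exception):
    """Raised when an API reply does not have the expected shape."""


def validate(record, required=("id", "updated_at")):
    """Reject anything that is not a dict with the fields we rely on."""
    if not isinstance(record, dict):
        raise BadResponse(f"expected dict, got {type(record).__name__}")
    missing = [k for k in required if record.get(k) is None]
    if missing:
        raise BadResponse(f"missing fields: {missing}")
    return record


def fetch_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns valid records or attempts run out.

    Waits base_delay, then 2x, then 4x between attempts (exponential backoff).
    The last failure is re-raised so the calling task fails visibly.
    """
    for attempt in range(attempts):
        try:
            return [validate(r) for r in fetch()]
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` argument is injectable so the retry logic can be tested without waiting.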
Hey cool, all our data is in one place, we might as well use it for BI Reporting
Chapter 3 “The Management Report”
This happened
And then your ad hoc database is now supporting global business critical apps
Stuff that was happening
• Irrelevant upstream failures
• Low priority upstream failures
• Flakey Data (but it worked!)
Our DAG looked like this: [Diagram: START, LOAD2, LOAD1, LOAD3, LOAD DONE, MAKE1, MAKE2, MAKE3, END]
And this was happening [Diagram: START, LOAD2, LOAD1, LOAD3, LOAD DONE, MAKE1, MAKE2, MAKE3, END]
This one doesn’t need LOAD3 [Diagram: START, LOAD2, LOAD1, LOAD3, LOAD DONE, MAKE1, MAKE2, MAKE3, END]
So we changed it to this [Diagram: START, LOAD2, LOAD1, LOAD3, LOAD ALL, MAKE1, MAKE2, MAKE3, END]
Now when 3 fails: [Diagram: START, LOAD2, LOAD1, LOAD3, LOAD ALL, MAKE1, MAKE2, MAKE3, END]
Decide what can be allowed to fail, and what can’t
Isolate the path to critical ops jobs
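The effect of the restructuring can be illustrated with a small sketch that works out which tasks a failure blocks. The transcript does not preserve the edges of either diagram, so the dependencies below are assumptions, and the LOAD ALL node from the revised diagram is left out for simplicity.

```python
def blocked_tasks(dependencies, failed):
    """Return every task that cannot run because something upstream failed."""
    blocked = set(failed)
    changed = True
    while changed:
        changed = False
        for task, upstream in dependencies.items():
            if task not in blocked and blocked & set(upstream):
                blocked.add(task)
                changed = True
    return blocked - set(failed)


# Before: every MAKE task waits on a single gate that needs all loads.
GATED = {
    "LOAD DONE": ["LOAD1", "LOAD2", "LOAD3"],
    "MAKE1": ["LOAD DONE"],
    "MAKE2": ["LOAD DONE"],
    "MAKE3": ["LOAD DONE"],
}

# After: each MAKE task names only the loads it reads (edges are assumed).
ISOLATED = {
    "MAKE1": ["LOAD1"],
    "MAKE2": ["LOAD1", "LOAD2"],
    "MAKE3": ["LOAD3"],
}
```

With the gated layout a LOAD3 failure blocks all three MAKE tasks. With the isolated layout it blocks only MAKE3.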
Loading tables more reliably: 5AM SATURDAY
The currency table failed to update
Loading tables more reliably
- Task 1: DROP TABLE
- Task 2: CREATE TABLE
- Task 3: LOAD DATA
Loading tables more reliably
- Task 1: DROP TABLE (THIS CAN GO WRONG)
- Task 2: CREATE TABLE (THIS CAN GO WRONG)
- Task 3: LOAD DATA (THIS CAN GO WRONG)
Expect Failure, Rollback Transaction
- Task 1 (There is no task 2): CREATE TEMP TABLE, LOAD DATA, RENAME OLD TABLE, RENAME NEW TABLE
- On failure: ROLLBACK
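A minimal sketch of the single-transaction swap, using sqlite3 as a stand-in for the warehouse (the talk used Postgres, which also supports transactional DDL). The table layout, helper names and the final DROP of the old table are assumptions for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    # isolation_level=None disables implicit transactions, so the
    # BEGIN / COMMIT / ROLLBACK statements below are fully in our control.
    return sqlite3.connect(path, isolation_level=None)


def reload_table(conn, table, rows):
    """Replace `table` with `rows` atomically; on any error keep the old data."""
    new, old = f"{table}_new", f"{table}_old"
    conn.execute("BEGIN")
    try:
        conn.execute(f"CREATE TABLE {new} (code TEXT PRIMARY KEY, rate REAL NOT NULL)")
        conn.executemany(f"INSERT INTO {new} VALUES (?, ?)", rows)
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
        conn.execute(f"ALTER TABLE {new} RENAME TO {table}")
        conn.execute(f"DROP TABLE {old}")
        conn.execute("COMMIT")
    except Exception:
        # Undo everything, including the CREATE and any renames.
        conn.execute("ROLLBACK")
        raise
```

Readers of the table see either the old data or the new data, never a dropped or half-loaded table.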
Encapsulate logic in bigger chunks
Anticipating problems early
Going beyond system monitoring
Defined Monitoring Tests
Measuring outcomes directly
And get gentler alerts in slack
Monitor (and alert on) outcomes as well as system metrics
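An outcome check looks at the data itself instead of CPU or disk. As one hedged example, the sketch below flags tables whose newest row is older than expected. It assumes each table has a `loaded_at` column holding ISO 8601 timestamps; that column and the function name are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def stale_tables(conn, tables, now, max_age=timedelta(hours=24)):
    """Return alert messages for tables whose newest row is too old."""
    alerts = []
    for table in tables:
        newest = conn.execute(f"SELECT MAX(loaded_at) FROM {table}").fetchone()[0]
        if newest is None:
            alerts.append(f"{table}: no rows loaded")
        elif now - datetime.fromisoformat(newest) > max_age:
            alerts.append(f"{table}: last load at {newest}")
    return alerts
```

A check like this would have caught the currency table that failed to update, even if every server metric looked healthy.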
Try / Except / Slack Alert low priority tasks
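The try / except / alert pattern might look like this. `notify` is a hypothetical callable that takes a message string, for example a function that posts to a Slack incoming webhook; the posting itself is not shown.

```python
import logging

logger = logging.getLogger("pipeline")


def run_low_priority(name, task, notify):
    """Run task(); on failure, alert and carry on instead of raising."""
    try:
        task()
        return True
    except Exception as exc:
        # Keep the full traceback in the logs, send a short note to chat.
        logger.exception("low priority task %s failed", name)
        notify(f"{name} failed: {exc}")
        return False
```

Because the exception is swallowed, downstream tasks still run. That is only appropriate for work that has been explicitly classed as allowed to fail.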
Chapter 4 “Hi, this is Australia calling …”
In the beginning there was the UK: YAY! DOWNTIME!! [Timeline: Midnight, Midday, Midnight; UK Ops]
Then Europe: Ok cool still loads of downtime [Timeline: Midnight, Midday, Midnight]
Then Some Other Places: No such thing as downtime anymore [Timeline: Midnight, Midday, Midnight]
Table Loading - Take 2
HASH THE TABLE SCHEMA, COMPARE TO LAST HASH
- SAME! JUST LOAD ROWS
- CHANGED! DROP AND REBUILD
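A per-table version of the check can be sketched as below. How the hash was computed and where the last hash was stored are not given in the transcript, so SHA-256 over the column list and a plain dict are stand-ins, and sqlite3 again substitutes for Postgres.

```python
import hashlib
import sqlite3


def schema_hash(conn, table):
    """Hash the column names and types of one table."""
    cols = [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]
    return hashlib.sha256(repr(cols).encode("utf-8")).hexdigest()


def load_plan(conn, table, last_hashes):
    """Compare with the stored hash and record the new one.

    Returns "drop_and_rebuild" when the hash differs or is unseen, and
    "load_rows" when it matches. Only this table is affected.
    """
    current = schema_hash(conn, table)
    plan = "load_rows" if last_hashes.get(table) == current else "drop_and_rebuild"
    last_hashes[table] = current
    return plan
```

Since the decision is made table by table, a change to one table no longer forces a reload of everything else.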
Get rid of the nuclear option
SORRY RIPLEY
Chapter 5 “Moving to Scale”
When it comes to BI, Old School Rules Still Apply
Configuration Management (Docker + ECS)
Distributed workers make some pain go away
Protobuf3 on Message Bus
Final Thoughts
I regret nothing!
Everything is defined in code
Two people, tiny budget
Time spent speeding up the build process is time well spent
Think carefully about what dependencies *mean*
To finish …
We’re hiring! Grab me after :) https://roo.it/peteo Also £5 off your first order!
Sleep Well! @peterowlett @deliveroo