Lessons from 6 Months of using Luigi

AKA Why it's better to be woken up by your cat than by the server alarm

peteowlett

May 07, 2016

Transcript

  1. A BETTER TITLE: Why it’s better to be woken up by your cat than by the server alarm
  2. Let’s Compare!

    Server alarm:
    - Goes off at any time, day or night
    - Loud ring tone, text messages, answerphone messages and flashing
    - Resolution can take hours

    Cat:
    - Goes off only once, at precisely 6am
    - Cute batting motion to wake
    - Resolved in the time it takes to open a cat food packet
  3. We string these together to make DAGs

    [Diagram labels: CHECK MAX ROW ID, LOAD DATA, MOD DATA, MAKE MODEL; CHECK MAX ROW ID, LOAD DATA; TABLE1, TABLE2]
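
Slide 3's idea of stringing tasks into a DAG can be sketched with the Python standard library alone. This is not Luigi's API. The task names come from the slide labels, while the dependency edges and the `run_order` helper are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the tasks in its value set.
# The edges are assumed from the order of the labels on slide 3.
PIPELINE = {
    "check_max_row_id": set(),
    "load_data": {"check_max_row_id"},
    "mod_data": {"load_data"},
    "make_model": {"mod_data"},
}


def run_order(dag):
    """Return one valid execution order: dependencies before dependents."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for task_name in run_order(PIPELINE):
        print(task_name)
```

A scheduler such as Luigi resolves this ordering for you; the sketch only shows why declaring dependencies is enough to derive the run order.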
  4. Schema can change anytime without warning

    HAS THE SCHEMA CHANGED?
    - NO! RELOAD JUST NEW ROWS
    - YES! DROP AND CREATE WHOLE SCHEMA, RELOAD ALL TABLES
  5. Two new operating modes

    - TEST MODE: Run the whole pipeline but only write to a test schema
    - UNIT MODE: Run the current task, ignoring its dependencies
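
The two modes on slide 5 might look roughly like this in plain Python. The deck does not show how they were implemented, so `Task`, `plan`, `target_schema`, and the schema names `test` and `prod` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A pipeline step with a name and the tasks it depends on."""
    name: str
    requires: list = field(default_factory=list)


def plan(task, unit_mode=False):
    """List the task names to run, dependencies first.

    In unit mode only the requested task runs and its dependencies
    are ignored.
    """
    if unit_mode:
        return [task.name]

    order = []
    seen = set()

    def visit(current):
        if current.name in seen:
            return
        seen.add(current.name)
        for dep in current.requires:
            visit(dep)
        order.append(current.name)

    visit(task)
    return order


def target_schema(test_mode=False):
    """Pick the schema to write to (schema names are placeholders)."""
    return "test" if test_mode else "prod"
```

For example, with `load = Task("load_data")` and `model = Task("make_model", [load])`, `plan(model)` walks both tasks, while `plan(model, unit_mode=True)` runs only `make_model`.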
  6. Hey cool, all our data is in one place, we might as well use it for BI Reporting
  7. Stuff that was happening

    • Irrelevant upstream failures
    • Low priority upstream failures
    • Flaky Data (but it worked!)
  8. So we changed it to this

    [Diagram labels: START, LOAD1, LOAD2, LOAD3, LOAD ALL, MAKE1, MAKE2, MAKE3, END]
  9. Loading tables more reliably

    - Task 1: DROP TABLE (THIS CAN GO WRONG)
    - Task 2: CREATE TABLE (THIS CAN GO WRONG)
    - Task 3: LOAD DATA (THIS CAN GO WRONG)
  10. Expect Failure, Rollback Transaction

    Task 1 (There is no task 2): CREATE TEMP TABLE, LOAD DATA, RENAME OLD TABLE, RENAME NEW TABLE; on failure, ROLLBACK
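
The single-transaction load on slide 10 can be sketched with the standard-library `sqlite3` module, chosen here only because it is self-contained and supports transactional DDL. The deck does not name the database in use. The `reload_table` helper, the two-column layout, and the final drop of the old table are assumptions, and a regular staging table stands in for the slide's temp table because SQLite cannot rename a temp table into the main schema.

```python
import sqlite3


def reload_table(conn, table, rows):
    """Load rows into a staging table, then swap it in for `table`.

    Every step runs inside one transaction, so any failure rolls the
    database back to its previous state. `conn` must be opened with
    isolation_level=None so BEGIN/COMMIT/ROLLBACK are under our control.
    """
    new, old = f"{table}_new", f"{table}_old"
    conn.execute("BEGIN")
    try:
        # Hypothetical two-column layout, for illustration only.
        conn.execute(f"CREATE TABLE {new} (id INTEGER PRIMARY KEY, value TEXT)")
        conn.executemany(f"INSERT INTO {new} (id, value) VALUES (?, ?)", rows)
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
        conn.execute(f"ALTER TABLE {new} RENAME TO {table}")
        # Assumption: the slide does not say what happens to the old table.
        conn.execute(f"DROP TABLE {old}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT)")
    reload_table(conn, "t", [(1, "a"), (2, "b")])
    print(conn.execute("SELECT * FROM t").fetchall())
```

Because the create, load, and rename steps share one transaction, readers see either the old table or the fully loaded new one, never a half-built state.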
  11. Table Loading - Take 2

    HASH THE TABLE SCHEMA, COMPARE TO LAST HASH
    - SAME! JUST LOAD ROWS
    - CHANGED! DROP AND REBUILD
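
The schema-hash check on slide 11 could be approximated as follows. This is a minimal sketch using `sqlite3` and `hashlib` from the standard library. The deck does not say which columns or metadata were hashed, so hashing column names and declared types is an assumption, and `schema_hash` and `load_strategy` are hypothetical helper names.

```python
import hashlib
import sqlite3


def schema_hash(conn, table):
    """Hash the column names and declared types of `table`."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    signature = "|".join(f"{name}:{ctype}" for _, name, ctype, *_ in cols)
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()


def load_strategy(current_hash, last_hash):
    """Same hash: just load rows. Different hash: drop and rebuild."""
    if current_hash == last_hash:
        return "just_load_rows"
    return "drop_and_rebuild"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
    before = schema_hash(conn, "src")
    conn.execute("ALTER TABLE src ADD COLUMN email TEXT")
    after = schema_hash(conn, "src")
    print(load_strategy(after, before))
```

Storing the last hash alongside the loaded table lets each run decide cheaply whether an incremental load is safe.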