Lessons from 6 Months of using Luigi

AKA Why it's better to be woken up by your cat than by the server alarm

peteowlett

May 07, 2016

Transcript

  1. 3.
  2. 6.

    Why it’s better to be woken up by your cat than by the server alarm
    A BETTER TITLE
  3. 11.

    Let’s Compare!
    Server alarm:
    - Goes off at any time, day or night
    - Loud ring tone, text messages, answer phone messages and flashing
    - Resolution can take hours
    Cat:
    - Goes off only once at precisely 6am
    - Cute batting motion to wake
    - Resolved in time it takes to open cat food packet
  4. 24.
  5. 28.

    We string these together to make DAGs
    TABLE1: CHECK MAX ROW ID, LOAD DATA, MOD DATA, MAKE MODEL
    TABLE2: CHECK MAX ROW ID, LOAD DATA
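
To make the DAG idea concrete, here is a minimal sketch in plain Python rather than Luigi itself. The task names follow the slide, but the edges between them are assumed for illustration, and `run_order` is a hypothetical helper. In Luigi each task would instead declare its upstream tasks in its `requires()` method.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on. Task names follow the
# slide; the edges between them are an assumption made for this sketch.
dag = {
    "check_max_row_id": set(),
    "load_data": {"check_max_row_id"},
    "mod_data": {"load_data"},
    "make_model": {"mod_data"},
}


def run_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)
```

Because the graph is a single chain, the only valid order runs from `check_max_row_id` through to `make_model`.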
  6. 37.
  7. 38.

    Schema can change anytime without warning
    HAS THE SCHEMA CHANGED?
    NO! RELOAD JUST NEW ROWS
    YES! DROP AND CREATE WHOLE SCHEMA, RELOAD ALL TABLES
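
The two branches on this slide can be sketched with SQLite standing in for the real database, which the slides do not name. Everything here is an assumption for illustration: the `sync_table` helper, the `events` table, its two columns, and the fact that the sketch rebuilds a single table rather than a whole schema. How the schema change is detected is left to the caller.

```python
import sqlite3


def sync_table(src, dst, table, schema_changed):
    """Copy rows from src to dst, incrementally unless the schema changed."""
    if schema_changed:
        # YES branch: drop and recreate, then reload every row.
        dst.execute(f"DROP TABLE IF EXISTS {table}")
        dst.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, value TEXT)")
        max_id = 0
    else:
        # NO branch: find the highest row id already loaded.
        max_id = dst.execute(
            f"SELECT COALESCE(MAX(id), 0) FROM {table}"
        ).fetchone()[0]
    rows = src.execute(
        f"SELECT id, value FROM {table} WHERE id > ?", (max_id,)
    ).fetchall()
    dst.executemany(f"INSERT INTO {table} (id, value) VALUES (?, ?)", rows)
    dst.commit()
    return len(rows)


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value TEXT)")
src.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])

first = sync_table(src, dst, "events", schema_changed=True)    # full reload
src.execute("INSERT INTO events VALUES (3, 'c')")
second = sync_table(src, dst, "events", schema_changed=False)  # new rows only
total = dst.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The first call copies both existing rows; the second copies only the row added afterwards.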
  8. 39.
  9. 42.

    Two new operating modes
    TEST MODE: Run the whole pipeline but only write to a test schema
    UNIT MODE: Run the current task, ignoring its dependencies
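
One way the two modes could be wired up, as a plain-Python sketch and not the talk's actual implementation. The mode strings, the `tasks_to_run` and `target_schema` helpers, and the schema names `analytics` and `analytics_test` are all hypothetical. The slide does not say where UNIT MODE writes, so leaving it on the normal schema is an assumption.

```python
def tasks_to_run(task, deps, mode):
    """Return the tasks to execute for `task` under the given mode."""
    if mode == "unit":
        # UNIT MODE: run only the current task, ignoring its dependencies.
        return [task]

    # Any other mode, including TEST MODE: upstream tasks first, then the task.
    ordered = []

    def visit(name):
        for upstream in deps.get(name, []):
            visit(upstream)
        if name not in ordered:
            ordered.append(name)

    visit(task)
    return ordered


def target_schema(mode):
    """TEST MODE writes to a separate schema (both names are hypothetical)."""
    return "analytics_test" if mode == "test" else "analytics"


deps = {"make_model": ["mod_data"], "mod_data": ["load_data"], "load_data": []}
unit_plan = tasks_to_run("make_model", deps, "unit")
test_plan = tasks_to_run("make_model", deps, "test")
```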
  10. 56.

    Hey cool, all our data is in one place, we might as well use it for BI Reporting
  11. 60.
  12. 61.

    Stuff that was happening
    • Irrelevant upstream failures
    • Low priority upstream failures
    • Flakey Data (but it worked!)
  13. 65.

    So we changed it to this
    START, then LOAD1, LOAD2, LOAD3, then LOAD ALL, then MAKE1, MAKE2, MAKE3, then END
  14. 72.

    Loading tables more reliably
    Task 1: DROP TABLE (THIS CAN GO WRONG)
    Task 2: CREATE TABLE (THIS CAN GO WRONG)
    Task 3: LOAD DATA (THIS CAN GO WRONG)
  15. 73.

    Expect Failure, Rollback Transaction
    Task 1 (There is no task 2): CREATE TEMP TABLE, LOAD DATA, RENAME OLD TABLE, RENAME NEW TABLE
    On failure: ROLLBACK
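
The single-task, single-transaction pattern can be sketched with SQLite, used here only as a stand-in because the slides do not name the database. The `reload_table` helper, the table and column names, and the final `DROP TABLE` of the old copy are assumptions. The sketch also uses an ordinary staging table instead of a `TEMP` table, since a SQLite temporary table cannot be renamed into the main schema.

```python
import sqlite3


def reload_table(conn, table, rows):
    """Rebuild `table` from `rows`; on any error the old table is untouched."""
    new, old = f"{table}_new", f"{table}_old"
    conn.execute("BEGIN")
    try:
        conn.execute(f"CREATE TABLE {new} (id INTEGER PRIMARY KEY, value TEXT)")
        conn.executemany(f"INSERT INTO {new} (id, value) VALUES (?, ?)", rows)
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
        conn.execute(f"ALTER TABLE {new} RENAME TO {table}")
        conn.execute(f"DROP TABLE {old}")  # assumed cleanup, not on the slide
        conn.execute("COMMIT")
    except Exception:
        # Any failure undoes the create, load and renames in one step.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the code issue BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'old')")

reload_table(conn, "events", [(1, "a"), (2, "b")])
try:
    reload_table(conn, "events", [(1, "x"), (1, "y")])  # duplicate key fails
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT id, value FROM events ORDER BY id").fetchall()
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master")]
```

After the failed second reload, `events` still holds the rows from the first reload and no staging table is left behind.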
  16. 74.
  17. 76.
  18. 88.

    Table Loading - Take 2
    HASH THE TABLE SCHEMA, COMPARE TO LAST HASH
    SAME! JUST LOAD ROWS
    CHANGED! DROP AND REBUILD
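
A minimal sketch of the hash-and-compare step, again using SQLite as a stand-in. The `schema_hash` and `load_plan` helpers, the choice of SHA-256, and the decision to hash only column names and declared types are assumptions; the talk does not say what goes into the hash or where the last hash is stored.

```python
import hashlib
import sqlite3


def schema_hash(conn, table):
    """Hash the column names and declared types of `table`."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    desc = ",".join(f"{name}:{ctype}" for cid, name, ctype, *rest in cols)
    return hashlib.sha256(desc.encode("utf-8")).hexdigest()


def load_plan(current_hash, last_hash):
    """Pick the load strategy from the slide's two branches."""
    return "just load rows" if current_hash == last_hash else "drop and rebuild"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value TEXT)")
before = schema_hash(conn, "events")

conn.execute("ALTER TABLE events ADD COLUMN source TEXT")
after = schema_hash(conn, "events")

unchanged_plan = load_plan(before, before)
changed_plan = load_plan(after, before)
```

Storing one short hash per table keeps the comparison cheap, and any added, removed or retyped column changes the hash.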
  19. 103.