Slide 6
The downfall of the data engineer
“Watching paint dry is exciting in comparison to writing and maintaining Extract Transform and Load (ETL) logic. Most ETL jobs take a long time to execute and errors or issues tend to happen at runtime or are post-runtime assertions. Since the development time to execution time ratio is typically low, being productive means juggling with multiple pipelines at once and inherently doing a lot of context switching. By the time one of your 5 running “big data jobs” has finished, you have to get back in the mind space you were in many hours ago and craft your next iteration. Depending on how caffeinated you are, how long it’s been since the last iteration, and how systematic you are, you may fail at restoring the full context in your short-term memory. This leads to systemic, stupid errors that waste hours.”
Maxime Beauchemin
Data engineer @ Lyft
Also, creator of Apache Airflow and Apache Superset. Ex-Facebook, Ex-Yahoo!, Ex-Airbnb
medium.com/@maximebeauchemin/the-downfall-of-the-data-engineer-5bfb701e5d6b