Some call them research engineers, some machine learning engineers, some BI engineers, and some data engineers. Still, cloud migrations, Hacker News front-page darlings, and evil APIs can break them. How do you become a good one, and what is a good investment to learn in the Dépêche Mode of data technology?
What Makes or Breaks a Data Engineer?
Pipeline Data Engineering Academy
» No such thing?
» Maybe still?
» What breaks them?
» What makes them?
» Hope, anyone?
» I co-founded the first data engineering bootcamp ...
» ... during Covid
» 12 years of building data teams
@ Shopify, Microsoft, Wunderlist, Zalando
» 23 years building startups
B2C, e-commerce, productivity, music, e-learning
» I built for the web before Java (1995)
Some call them:
» BI developers,
» research engineers,
» machine learning engineers,
» analytics engineers (aka pissed off data analysts),
» data engineers.
"Is there even such a role
like data engineer???"
— ex-DS, ex-PO at now-DAX, 2018
"Not one of today's data engineers
grew up as a kid imagining
(Peter Fabian, co-founder, MD)
A data engineer
in a maintainable way.
Also talks a lot.
The DE Pyramid
State of the Union at Company X
» (C*O/VP): "I have a Data Scientist. :)"
» me: "What does your Data Scientist do?"
» (C*O/VP): "Bad Data Engineering. At least 80% of the time."
» me: "You need a Data Engineer. You don't need a Data Scientist."
» (C*O/VP): "But it's impossible to hire a Data Engineer."
» me: "
» I dare you to change your title on LinkedIn to
for a week.
» Be nice to the headhunters. It's the hard part.
» Most people getting hired as a data engineer know exactly as
much about DE as you do.
What breaks them?
» Trying to hire other DEs
» Dépêche Mode
» (Cloud migrations)
When we were young (2016)
Funding, 2021 Q1-Q3, in $ million
» Platform: Databricks 2600, Dataiku 400, DataRobot 300
» BI: Grafana 220, Preset 36, Streamlit 35, Metabase 30
» DQ: Monte Carlo 85, Great Expectations 21
» DWH: Neo4j 325, Cockroach 160, Dremio 135, Firebolt 127,
Starburst 100, ClickHouse 50, Timescale 40
» ETL: dbt 150, Matillion 100, Prefect 43, Airbyte 31, Snowplow 10,
Hype = bullshit
» data lakehouse,
» reverse ETL,
» data mesh.
» Sound like sex positions to me.
Reflect on the Zeitgeist
dbt most Apache
Great Expectations especially Hadoop
Prefect Airflow (shit, but popular)
Presto Spark, Databricks (aged badly pretty fast)
Superset/Preset.io Chartio (shot dead by Atlassian)
a data engineer?
Why become one?
» Plumbers are essential workers
» There is a need for them
» DE open positions >> DS open positions
» DE salaries >> DS salaries
» The market justifies added value through salary
Who would become a DE?
» data scientists (Covid reality check),
» business analysts (more moat),
» any engineer.
What to learn?
"One is never over-dressed or under-
dressed with a Little Black Dress."
— Karl Lagerfeld
What to learn?
"If you know how to make a Little Black
Dress, then you can do fast fashion."
Where? Fast? Deep?
» self-driven, tool-oriented band-aids (cloud providers, OS-as-a-marketing-tool:
Cloudera, Databricks, Udemy),
» self-driven bite-size Sudoku (DataQuest, DataCamp),
» lonely places (Coursera, Udacity),
» bootcamps, few,
» few universities.
The past is the future,
» UNIX Shell (1971),
» SQL (1974),
» Python (1991),
» Kubernetes YAML hell (2014)?
"The longer a technology lives, the longer it can be expected to live."
— Nassim N. Taleb (by way of Mandelbrot, aka the Lindy effect)
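As a rough illustration of the Lindy reading of the list above, here is a minimal Python sketch that ranks the four technologies by age. The release years come from the slide; the reference year 2021, the `FIRST_RELEASE` table, and the `lindy_ranking` helper are assumptions made for this example. Treating age as a proxy for remaining life is a heuristic, not a forecast.

```python
# Release years as listed on the slide; everything else is illustrative.
FIRST_RELEASE = {
    "UNIX Shell": 1971,
    "SQL": 1974,
    "Python": 1991,
    "Kubernetes": 2014,
}


def lindy_ranking(first_release, as_of_year):
    """Return (name, age) pairs, oldest first.

    Under the naive Lindy reading used here, a technology's age so far
    is taken as a rough proxy for how much longer it may stay relevant.
    """
    ages = {name: as_of_year - year for name, year in first_release.items()}
    return sorted(ages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # 2021 is an assumed reference year, not a figure from the talk.
    for name, age in lindy_ranking(FIRST_RELEASE, as_of_year=2021):
        print(f"{name}: {age} years old")
```

By this ranking the shell and SQL look like the safest things to learn, and Kubernetes YAML earns its question mark.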
How to become
(a good) one?
Knowing this, ...
... this, ...
... and this.
Read Morozov: "The Meme Hustler". Thanks fly for the visuals to: @mrogati, @xkcd,
@DorsaAmir, @luismisanchez, Coco Chanel, Depeche Mode, Christopher Bolard, Tomasz Dudek,
James Mickens, @bfaludi, @FirstMarkCap.