
What Makes or Breaks a Data Engineer?

soobrosa
December 21, 2020

Some call them research engineers, some machine learning engineers, some BI engineers and some data engineers. Still, cloud migrations, Hacker News front-page darlings and evil APIs can break them. How do you become a good one, and what is a good investment to learn in the Dépêche Mode of data technology?




  1. What Makes or Breaks a Data Engineer? Daniel Molnar, Pipeline Data Engineering Academy. Data Natives Unlimited, 2020
  2. Quick Agenda » Who is this bloke? » What is a data engineer? » A good one, maybe? » What breaks them? » What makes them? » Hope, anyone?
  3. What's my perspective? » Startups for 20 years (B2C, e-commerce, productivity, music, e-learning). » 10 years data (Shopify, Zalando, Microsoft, Wunderlist). » Assembly, BASIC, Pascal, C, Java, Python, Ruby, R, SQL, ... since 1986. » First website in 1994 - tech agnostic. » Taught sound technology, e-business, presentation -- now data engineering.
  4. Data Engineer? Some call them » research engineers, » machine learning engineers, » BI developers, » dataops, » data engineers.
  5. "Is there a role like data engineer???" — ex-data scientist, now product owner at listed company, 2018
  6. Not one of today's data engineers grew up as a kid imagining becoming one. (Peter Fabian, co-founder, MD)
  7. A data engineer solves data-related engineering problems in a maintainable way. Also talks a lot.
  8. The Pyramid

  9. The DE Pyramid

  10. The DE Pyramid

  11. The Field

  12. How to become a good one?

  13. Attitude » Don't write code, solve the problem. » Keep It Simple Stupid. » Separation of concerns. » Delete, remove, retire. » Code is dependency. » Others' code is dependency squared.
  14. Skills » Data Acquisition (entities) » Telemetry (queues) » ELT/ETL (scheduling, dependencies) » Datastores and Storage (files, latencies) » Data Warehousing and Computation (SQL, MPP) » BI Tooling and Data Quality (metadata) » Clouds and Data Stacks (KISS)
  15. What breaks them? » Trying to hire junior data engineers » Dépêche Mode » Marketing » Cloud migrations
  16. Hiring junior DEs "How can I become a data engineer? Learn Spark and Kafka, read Kleppmann." — /r/dataengineering
  17. If you get asked to get data from a 3rd party API, and your plan is to fire up a Spark cluster, the call is ended.
  18. #1: A patchy Spark... ... is an academic demo app for Mesos, an abandoned orchestrator.
  19. #1: A patchy Spark... ... is an academic demo app for Mesos, an abandoned orchestrator.
  20. #1: A patchy Spark... ... is an academic demo app for Mesos, an abandoned orchestrator. tl;dr Maybe, for some things. Can we just put it where Hadoop is?
  21. #2: Kafka... ...is when you put facts into a queue and hope for the best.
  22. Marketing

  23. State of the clouds 2020

  24. Amazon Web Services, AWS (2006) » way too many different services. Azure (2008) » all enterprise will have to go there, » they have sales, » random horror stories (things vanish).
  25. Google Cloud Platform, GCP (2008) » "not having a customer-service bone in the body" (The Economist), » lots of outages, documented, » service portfolio is kind of random, » ex-Oracle leadership hiring from SAP, » for the cheaps,
  26. What makes them?

  27. Why would one want to become a DE? » There is a need for them. » Plumbers are system-relevant workers.
  28. Who would want to become a DE? » data scientists (Covid reality check), » business analysts (more moat), » (smart) front end engineers.
  29. What to learn? "One is never over-dressed or under-dressed with a Little Black Dress." — Karl Lagerfeld
  30. What to learn? "If you know how to make a Little Black Dress, then you can do fast fashion." — me.
  31. Where? Fast? Deep? » band aids aka branded lock-in trainings (cloud providers, OS-as-a-marketing-tool - Cloudera, Databricks), » bluff tools out of context (Udemy), » self-driven bytesize Sudoku (DataQuest, DataCamp), » pass the interview, Spark, Kafka (Coursera, Udacity), » bootcamps, few, » multi-year streaming sub
  32. The past is the future, choose boring » UNIX Shell (1971), » SQL (1974), » Python (1991), » Kubernetes YAML hell (2014). "The longer a technology lives, the longer it can be expected to live." — Nassim N. Taleb (way of Mandelbrot, aka Lindy effect)
  33. #HOPE

  34. Reflect on the Zeitgeist » Snowflake (MPP SQL, 2012), » Dagster by Elementl (Python, 2019), » DBT by Fishtown Analytics (Python, 2018).
  35. Snowflake (MPP SQL, 2012) » Hottest tech IPO of 2020 ($70b) » Solid company
  36. Snowflake (MPP SQL, 2012) » Revenue: $97m (2019), $264.7m (2020) » Losses: $178m (2019), $348.5m (2020)
  37. Dagster by Elementl (Python, 2019, $1.8m) » "graph of computations that consume and produce data assets" (BS), » "acknowledge complexity", "heterogenous" - freeride, » "abstract away running environment" - who cares, » tumbleweed 'architecture' or meta-guano-glue?
  38. Silicone

  39. DBT by Fishtown Analytics (Python, 2018, $42.4m) » "it just talks to my heart" (BI lead), » SQL-first (shame on me with Ruby, if all can stand Jinja :shrug:), » enforces good practice vs like notebooks, » a power tool for BI to take over DS, » who gets to engineering best practice first? BI or DS?
  40. None
  41. Thank you! @soobrosa Read Morozov "The Meme Hustler" and thanks fly for the visuals to: @mrogati, @xkcd, @rahulj51, @jaykreps, @DorsaAmir, @luismisanchez, Coco Chanel, Depeche Mode, Christopher Bolard, Tomasz Dudek.
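
The third-party API scenario from slide 17, combined with the "choose boring" list from slide 32, can be sketched with nothing but the Python standard library. This is a minimal illustration, not the speaker's own code: the page shape (`items`, `next`), the `raw_events` table, and the helper names `fetch_all`, `load` and `fake_fetch_page` are all assumptions made up for the example. In real use the stand-in fetcher would presumably wrap an HTTP call.

```python
import json
import sqlite3
from typing import Callable, Iterable, Iterator, Optional


def fetch_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Walk a cursor-paginated API until it reports no next page.

    Assumed (hypothetical) page shape: {"items": [...], "next": cursor or None}.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next")
        if cursor is None:
            return


def load(conn: sqlite3.Connection, records: Iterable[dict]) -> int:
    """Upsert records into a raw table; returns the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events ("
        "id TEXT PRIMARY KEY, amount INTEGER, payload TEXT)"
    )
    rows = [(r["id"], r["amount"], json.dumps(r)) for r in records]
    # INSERT OR REPLACE keeps reruns idempotent: same id, same row.
    conn.executemany("INSERT OR REPLACE INTO raw_events VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Stand-in for the real HTTP call; the data is invented for illustration.
_FAKE_PAGES = {
    None: {"items": [{"id": "a", "amount": 3}, {"id": "b", "amount": 4}], "next": "p2"},
    "p2": {"items": [{"id": "c", "amount": 5}], "next": None},
}


def fake_fetch_page(cursor: Optional[str]) -> dict:
    return _FAKE_PAGES[cursor]


conn = sqlite3.connect(":memory:")
loaded = load(conn, fetch_all(fake_fetch_page))

# The transformation step is plain SQL.
total = conn.execute("SELECT SUM(amount) FROM raw_events").fetchone()[0]
print(loaded, total)  # prints: 3 12
```

Passing the fetcher in as a function keeps the pagination loop testable without a network, and the upsert means a failed run can simply be started again.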