Data engineers and data scientists: friends or foes?

This talk focuses on the value that data engineers add to a data business and why they are paramount to the success of the data science team.


Tania Allard

April 09, 2019


  1. Data engineers and data scientists: friends or foes?

  2. Hello! I am Tania Allard (PhD), she/her. I am a developer advocate at Microsoft. Follow me on Twitter: @ixek
  3. Talking about data scientist and data engineering relationships https://pixabay.com/photos/scared-fear-person-stress-young-2840243/

  4. Where every story begins...

  5. Being the only data engineer…
    • Writing SQL queries
    • “Monitoring”
    • Explaining what is and what is not big data or data engineering
    • Debating SQL vs JVM
  6. Being the first data scientist...
    ﹡ Logging (???)
    ﹡ Collecting data
    ﹡ Experimenting
    ﹡ Fighting with the software engineers to get things done
    ﹡ Fighting for tools!!!!
  7. What they’d rather be doing?
    Data engineer:
    ﹡ Interesting engineering problems
    ﹡ Useful pipelines
    ﹡ Creating useful solutions
    Data scientist:
    ﹡ Developing new models
    ﹡ Interesting analytics
    ﹡ Build insights
  8. Infinite loop of sadness

  9. Basically our lives

  10. How can we make our lives better?

  11. Being empathic to each other
    We need to understand what blockers we are facing as a team and work together to:
    1. Devise plans
    2. Find the right tools
    3. Set realistic objectives
  12. Data engineers are data scientists’ best friends

  13. Back to the foundations https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007

  14. In smaller companies, where no data infrastructure team has yet been formalized, the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure. This includes tasks like setting up and operating platforms like Hadoop/Hive/HBase, Spark, and the like.
    Maxime Beauchemin - Airflow https://medium.freecodecamp.org/the-rise-of-the-data-engineer-91be18f1e603
  15. Let the truth be told: without data warehouses, any data science activity risks becoming either too expensive or not scalable.
  16. The data engineer as centre of excellence: definition of standards, best practices, and certification and validation processes for data objects. https://xkcd.com/927/
  17. We are also librarians: cataloguing and organising metadata, defining processes to extract data. https://michaeljswart.com/
  18. ETL: we all do ETL

  19. Extract, transform and load: a blueprint of how raw data is transformed into analysis-ready data. https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8
  20. Extract: wait for upstream data sources to land (e.g. machine- or user-generated logs, relational databases, external datasets…). Once available, the data is transported for further transformations.
  21. Transform: the heart of the ETL process. It requires a lot of business understanding and domain knowledge.
  23. Load: transport the data to its final destination.

  24. Services
    ﹡ Data ingestion
    ﹡ Metric computation
    ﹡ Anomaly detection -> alerting
    ﹡ Experimentation -> A/B testing
    ﹡ Instrumentation -> logging
  25. Data engineers facilitate data science, which facilitates data-driven solutions and data-driven business… We are paramount to the success of the data science / ML / deep learning team.
  26. We are the safeguardians: no more garbage in, garbage out
  27. Tania Allard @ixek tania.allard@microsoft.com
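
The extract, transform and load steps described in slides 19 to 23 can be illustrated with a small sketch. This is not code from the talk: the sample log data, the business rule (drop incomplete rows, then total the amount per user), the `user_totals` table, and every function name are hypothetical, chosen only to show the three stages and a simple guard against "garbage in, garbage out". It uses only the Python standard library.

```python
import csv
import io
import sqlite3

# Hypothetical upstream data: user-generated logs that have already "landed".
RAW_LOGS = """user_id,event,amount
1,purchase,10.50
2,purchase,
1,refund,-10.50
,purchase,3.00
3,purchase,7.25
"""


def extract(raw_text):
    """Extract: read the rows from the upstream source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: apply an (assumed) business rule.

    Rows missing a user id or an amount are rejected, and the remaining
    amounts are summed per user.
    """
    totals = {}
    for row in rows:
        if not row["user_id"] or not row["amount"]:
            continue  # incomplete row: keep garbage out of the warehouse
        user_id = int(row["user_id"])
        totals[user_id] = totals.get(user_id, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(records, connection):
    """Load: write the analysis-ready records to their final destination."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS user_totals "
        "(user_id INTEGER PRIMARY KEY, total REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO user_totals VALUES (?, ?)", records
    )
    connection.commit()


def run_pipeline(raw_text, connection):
    """Chain the three stages: extract, then transform, then load."""
    load(transform(extract(raw_text)), connection)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    run_pipeline(RAW_LOGS, conn)
    query = "SELECT user_id, total FROM user_totals ORDER BY user_id"
    print(conn.execute(query).fetchall())
```

In a real setting each stage would typically be a separate, scheduled task (for example in a workflow tool such as Airflow, which the slides reference), so that a failure in one stage can be retried without rerunning the others.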