Data engineers and data scientists: friends or foes?

This talk focuses on the value that data engineers add to a data business and on why they are paramount to the success of the data science team.

Tania Allard

April 09, 2019


Transcript

  1. Hello! I am Tania Allard (PhD), she/her. I am a developer advocate at Microsoft. Follow me on Twitter: @ixek
  2. Being the only data engineer…
    • Writing SQL queries
    • “Monitoring”
    • Explaining what is and what is not big data or data engineering
    • Debating SQL vs JVM
  3. Being the first data scientist...
    ﹡ Logging (???)
    ﹡ Collecting data
    ﹡ Experimenting
    ﹡ Fighting with the software engineers to get things done
    ﹡ Fighting for tools!!!!
  4. What would they rather be doing?
    Data engineer:
    ﹡ Interesting engineering problems
    ﹡ Useful pipelines
    ﹡ Creating useful solutions
    Data scientist:
    ﹡ Developing new models
    ﹡ Interesting analytics
    ﹡ Building insights
  5. Being empathic to each other
    We need to understand what blockers we are facing as a team and work together to:
    1. Devise plans
    2. Find the right tools
    3. Set realistic objectives
  6. In smaller companies, where no data infrastructure team has yet been formalized, the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure. This includes tasks like setting up and operating platforms like Hadoop/Hive/HBase, Spark, and the like.
    Maxime Beauchemin - Airflow
    https://medium.freecodecamp.org/the-rise-of-the-data-engineer-91be18f1e603
  7. Let the truth be told
    Without data warehouses, any data science activity risks becoming either too expensive or not scalable.
  8. The data engineer as centre of excellence
    Definition of standards, best practices, and certification and validation processes for data objects
    https://xkcd.com/927/
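As an illustration of what a "validation process for data objects" might look like in practice, here is a minimal sketch in plain Python. The schema, the field names and the `validate` helper are all hypothetical and are not taken from the talk; a real team would more likely agree on a shared schema format or validation library.

```python
# Hypothetical agreed standard for a "signup" record: field name -> expected type.
EXPECTED_SCHEMA = {"user_id": int, "country": str, "signup_ts": str}


def validate(record, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors


# Made-up example records.
good = {"user_id": 1, "country": "UK", "signup_ts": "2019-04-09T10:00:00"}
bad = {"user_id": "1", "country": "UK"}

good_errors = validate(good)
bad_errors = validate(bad)
```

Returning a list of problems rather than raising on the first one is a design choice for this sketch: it lets a pipeline report everything wrong with a data object in one pass.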
  9. Extract, transform and load
    Blueprint of how raw data is transformed to analysis-ready data
    https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8
  10. Extract
    Wait for upstream data sources to land (e.g. machine- or user-generated logs, relational databases, external datasets…). Once available, the data is transported for further transformations.
  11. Transform
    The heart of the ETL process. It requires a lot of business understanding and domain knowledge.
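The extract, transform and load steps described in these slides can be sketched as three small functions. This is only an illustrative toy using the Python standard library: the CSV schema, the "net purchase value per user" business rule and the dictionary standing in for a warehouse table are assumptions made for the example, not details from the talk.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input: user-generated logs as CSV text (assumed schema).
RAW_LOGS = """user_id,event,amount
1,purchase,10.50
2,purchase,4.00
1,refund,-10.50
2,purchase,6.00
3,page_view,
"""


def extract(raw_text):
    """Extract: read the raw rows once the upstream source has landed."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: apply an assumed business rule, net purchase value per user."""
    totals = defaultdict(float)
    for row in rows:
        # Only monetary events with an amount count towards the total.
        if row["event"] in {"purchase", "refund"} and row["amount"]:
            totals[int(row["user_id"])] += float(row["amount"])
    return dict(totals)


def load(totals, warehouse):
    """Load: write analysis-ready records to a target (a dict stands in for a table)."""
    warehouse.update(totals)
    return warehouse


warehouse = {}
load(transform(extract(RAW_LOGS)), warehouse)
```

Even in this toy, the transform step is where the domain knowledge lives: deciding that refunds offset purchases and that page views are ignored is a business decision, not an engineering one.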
  13. Services
    ﹡ Data ingestion
    ﹡ Metric computation
    ﹡ Anomaly detection -> alerting
    ﹡ Experimentation -> A/B testing
    ﹡ Instrumentation -> logging
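To make the "anomaly detection -> alerting" service a little more concrete, here is a minimal sketch that flags values by z-score and turns them into alert messages. The threshold, the sample data and the `alert` function (which only formats strings rather than calling a real alerting channel) are assumptions for illustration; the talk does not prescribe a particular method.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values identical: nothing can stand out.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def alert(anomalies):
    """Stand-in for a real alerting channel: just format the messages."""
    return [f"ALERT: value {v} at position {i} looks anomalous" for i, v in anomalies]


# Made-up daily metric with one obvious spike.
daily_signups = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]
messages = alert(find_anomalies(daily_signups))
```

A plain z-score is sensitive to the very outliers it is trying to find, so a production service would likely use something more robust; the point here is only the shape of the detect-then-alert flow.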
  14. Data engineers facilitate data science
    Which facilitates data-driven solutions and data-driven business… We are paramount to the success of the data science / ML / deep learning team.