Slide 1

Slide 1 text

Data engineers and data scientists: friends or foes?

Slide 2

Slide 2 text

Hello!
I am Tania Allard (PhD), she / her
I am a developer advocate at Microsoft
Follow me on Twitter: @ixek

Slide 3

Slide 3 text

Talking about data scientist and data engineer relationships
https://pixabay.com/photos/scared-fear-person-stress-young-2840243/

Slide 4

Slide 4 text

Where every story begins...

Slide 5

Slide 5 text

Being the only data engineer...
● Writing SQL queries
● “Monitoring”
● Explaining what is and what is not big data or data engineering
● Debating SQL vs JVM

Slide 6

Slide 6 text

Being the first data scientist...
﹡ Logging (???)
﹡ Collecting data
﹡ Experimenting
﹡ Fighting with the software engineers to get things done
﹡ Fighting for tools!!!!

Slide 7

Slide 7 text

What would they rather be doing?
Data engineer:
﹡ Interesting engineering problems
﹡ Useful pipelines
﹡ Creating useful solutions
Data scientist:
﹡ Developing new models
﹡ Interesting analytics
﹡ Building insights

Slide 8

Slide 8 text

Infinite loop of sadness

Slide 9

Slide 9 text

Basically our lives

Slide 10

Slide 10 text

How can we make our lives better?

Slide 11

Slide 11 text

Being empathic to each other
We need to understand what blockers we are facing as a team and work together to:
1. Devise plans
2. Find the right tools
3. Set realistic objectives

Slide 12

Slide 12 text

Data engineers are data scientists’ best friends

Slide 13

Slide 13 text

Back to the foundations
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007

Slide 14

Slide 14 text

“In smaller companies, where no data infrastructure team has yet been formalized, the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure. This includes tasks like setting up and operating platforms like Hadoop/Hive/HBase, Spark, and the like.”
Maxime Beauchemin (Airflow)
https://medium.freecodecamp.org/the-rise-of-the-data-engineer-91be18f1e603

Slide 15

Slide 15 text

Let the truth be told
Without data warehouses, any data science activity risks becoming either too expensive or not scalable.

Slide 16

Slide 16 text

The data engineer as centre of excellence
Definition of standards and best practices, plus certification and validation processes for data objects
https://xkcd.com/927/
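
A validation process for data objects could be sketched as a set of declared column rules that every row is checked against. This is a minimal illustration using only the Python standard library; `ColumnRule`, `validate`, `RULES` and the column names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ColumnRule:
    """One hypothetical standard that a column must meet."""
    name: str
    dtype: type
    check: Callable[[Any], bool] = lambda value: True


def validate(rows, rules):
    """Return a list of readable violations; an empty list means the data passes."""
    problems = []
    for i, row in enumerate(rows):
        for rule in rules:
            if rule.name not in row:
                problems.append(f"row {i}: missing column '{rule.name}'")
                continue
            value = row[rule.name]
            if not isinstance(value, rule.dtype):
                problems.append(f"row {i}: '{rule.name}' is not {rule.dtype.__name__}")
            elif not rule.check(value):
                problems.append(f"row {i}: '{rule.name}' failed its check")
    return problems


# Assumed example standards: positive integer ids, two-letter country codes.
RULES = [
    ColumnRule("user_id", int, lambda v: v > 0),
    ColumnRule("country", str, lambda v: len(v) == 2),
]
```

Keeping the rules as data rather than scattered `if` statements means the same standards can be reviewed, versioned and shared across teams.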

Slide 17

Slide 17 text

We are also librarians
Cataloguing and organising metadata, defining processes to extract data.
https://michaeljswart.com/

Slide 18

Slide 18 text

ETL
We all do ETL

Slide 19

Slide 19 text

Extract, transform and load
Blueprint of how raw data is transformed into analysis-ready data
https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8
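
One way to picture such a blueprint is as a dependency graph of tasks. The sketch below uses the standard library's `graphlib` (Python 3.9+) rather than any workflow tool's API; the task names and the `execution_order` helper are assumptions made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
BLUEPRINT = {
    "extract_logs": set(),
    "extract_orders": set(),
    "transform_sessions": {"extract_logs"},
    "transform_revenue": {"extract_orders", "transform_sessions"},
    "load_warehouse": {"transform_revenue"},
}


def execution_order(blueprint):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(blueprint).static_order())


if __name__ == "__main__":
    for task in execution_order(BLUEPRINT):
        print(task)
```

Schedulers such as Airflow build on the same idea, adding retries, scheduling and monitoring on top of the graph.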

Slide 20

Slide 20 text

Extract
Wait for upstream data sources to land (e.g. machine or user-generated logs, relational databases, external datasets...).
Once available, the data is transported for further transformations.
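
The "wait, then transport" pattern might look like the following sketch, assuming the upstream source is a file that appears on disk. `wait_for_file`, `extract`, the timeouts and the staging directory are all hypothetical names chosen for this example.

```python
import shutil
import time
from pathlib import Path


def wait_for_file(path, timeout_s=60.0, poll_s=1.0):
    """Block until `path` exists, or raise TimeoutError after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while not Path(path).exists():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not land within {timeout_s} s")
        time.sleep(poll_s)
    return Path(path)


def extract(path, staging_dir, timeout_s=60.0):
    """Once the source has landed, copy it to a staging area for transformation."""
    source = wait_for_file(path, timeout_s=timeout_s)
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, staging / source.name))
```

Failing loudly with a timeout, instead of silently processing nothing, makes a late upstream source visible to whoever monitors the pipeline.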

Slide 21

Slide 21 text

Transform
The heart of the ETL process. It requires a lot of business understanding and domain knowledge.
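
As a small illustration of why domain knowledge matters here, the sketch below turns raw events into daily active users per country. The event fields, the rule that test accounts are excluded, and the `daily_active_users` function are assumptions for this example only.

```python
from collections import defaultdict

# Assumed business rule: internal test accounts must not count as users.
TEST_ACCOUNT_IDS = {0}


def daily_active_users(events):
    """Turn raw event dicts into {(day, country): number of distinct users}."""
    users = defaultdict(set)
    for event in events:
        if event["user_id"] in TEST_ACCOUNT_IDS:
            continue
        day = event["timestamp"][:10]  # ISO 8601 string -> YYYY-MM-DD
        country = event["country"].upper()  # normalise inconsistent casing
        users[(day, country)].add(event["user_id"])
    return {key: len(ids) for key, ids in users.items()}
```

None of these decisions (what counts as a user, how to bucket time, how to normalise codes) is visible in the raw data; they come from knowing the business.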

Slide 22

Slide 22 text


Slide 23

Slide 23 text

Load
Transport the data to its final destination.
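
A load step could be sketched as below, assuming the final destination is a SQL table. SQLite from the standard library stands in for a real warehouse, and the table name, columns and `load` function are hypothetical.

```python
import sqlite3


def load(conn, rows):
    """Write (day, country, active_users) rows; reruns overwrite instead of duplicating."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_active_users ("
        "day TEXT, country TEXT, active_users INTEGER, "
        "PRIMARY KEY (day, country))"
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO daily_active_users VALUES (?, ?, ?)", rows
        )


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load(connection, [("2019-05-01", "GB", 2)])
    print(connection.execute("SELECT * FROM daily_active_users").fetchall())
```

Keying the table on `(day, country)` makes the load idempotent, so a failed run can simply be retried.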

Slide 24

Slide 24 text

Services
﹡ Data ingestion
﹡ Metric computation
﹡ Anomaly detection -> alerting
﹡ Experimentation -> A/B testing
﹡ Instrumentation -> logging
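
To make one of these services concrete, here is a minimal sketch of anomaly detection feeding an alert, assuming a simple z-score rule on a metric's recent history. The threshold, function names and alert format are illustrative choices, not something the talk prescribes.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


def check_metric(name, history, latest, alert=print):
    """Call `alert` with a message when the latest value looks anomalous."""
    if is_anomalous(history, latest):
        alert(f"ALERT: {name}={latest} deviates from recent values")
        return True
    return False
```

Passing `alert` as a parameter keeps the detection logic separate from the delivery channel (log line, email, pager), which is usually decided elsewhere.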

Slide 25

Slide 25 text

Data engineers facilitate data science
Which facilitates data-driven solutions and data-driven business...
We are paramount to the success of the data science / ML / deep learning team

Slide 26

Slide 26 text

We are the safeguardians: no more garbage in, garbage out

Slide 27

Slide 27 text

Tania Allard
@ixek