Slide 1

Slide 1 text

A beginner’s guide to Data Science
Dânia Meira
developHER Remote Edition
7 November 2020

Slide 2

Slide 2 text

A beginner’s guide to Data Science
● Introduction
● Data science, AI, Machine learning?
● Data Roles and Skills
● #datacareer Orientation

Slide 4

Slide 4 text

Introduction 1/2

DÂNIA MEIRA
● ML models for predictive analytics
● #datacareer since 2012
● Former bootcamp teacher
● Data scientist at myToys from 2018 to 2020
● Founding member, AI Guild
linkedin.com/in/daniameira/

Slide 5

Slide 5 text

Introduction 2/2

#datacommunity #datacareer #datalift
linkedin.com/company/ai-guild
twitter.com/ai_guild
medium.com/ai-guild
eventbrite.de/o/ai-guild-27115216103
bit.ly/youtube-ai-guild
theguild.ai

Slide 7

Slide 7 text

Data Science, AI, Machine Learning? 1/8

Artificial Intelligence (AI): A set of techniques that enable computers to perform specific tasks that mimic human intelligence, using logic, if-else rules, and machine learning.
Machine Learning (ML): Statistical techniques and algorithms that computer systems use to perform a specific task without explicit programming instructions, relying instead on processing data to detect patterns and make inferences.
Deep Learning (DL): A type of ML method based on artificial neural networks, algorithms inspired by the human brain that learn from processing vast amounts of data.
Data Science (DS): A multidisciplinary field that uses scientific, computational and statistical methods to draw insights and build predictive models from data.
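
The difference between "if-else rules" and "processing data to detect patterns" can be illustrated with a minimal Python sketch. It is not from the talk: the data, the fixed cut-off of 100, and the function names are all made up for illustration. The first function is a rule written by a person; the second derives its cut-off from labelled examples.

```python
# Hypothetical toy data (made up): (transaction amount, flagged as unusual).
examples = [(20, 0), (35, 0), (50, 0), (120, 1), (150, 1), (200, 1)]


def rule_based(amount):
    """Rule-based AI: a person chose the cut-off of 100 by hand."""
    return 1 if amount > 100 else 0


def learn_threshold(data):
    """Machine learning in miniature: try each observed value as a cut-off
    and keep the one that misclassifies the fewest labelled examples."""
    best_t, best_errors = None, len(data) + 1
    for t in sorted(x for x, _ in data):
        errors = sum((1 if x > t else 0) != y for x, y in data)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# The cut-off now comes from the data, not from a programmer.
threshold = learn_threshold(examples)


def learned(amount):
    return 1 if amount > threshold else 0
```

With this toy data the two approaches disagree on an amount of 60: the hand-written rule says "not unusual", while the learned cut-off, which sits at the largest unflagged example, says "unusual". Neither is right in general; the point is only where the decision boundary comes from.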

Slide 8

Slide 8 text

Data Science, AI, Machine Learning? 2/8

AI Use Cases
AI is good at focused tasks with a clear outcome:
● It works best when there is a very large amount of training data.
● It works well for specific cases where other methods fail: outlier detection, sparse matrix work.
https://pair.withgoogle.com/worksheet/user-needs.pdf

Slide 9

Slide 9 text

Data Science, AI, Machine Learning? 3/8

Data Science = Analytics + ML

Data Science
A multidisciplinary field that uses scientific, computational and statistical methods to draw insights and build predictive models from data.

Analytics
● Task: understanding the business and using data to make better decisions
● Result: slide deck

ML
● Task: learning an A to B mapping, where A is the input and B is the output
● Result: software
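
To make the "A to B mapping" concrete, here is a small Python sketch under stated assumptions: the numbers are invented, and ordinary least squares for a straight line stands in for "learning" (the talk does not name a specific method). The analytics part produces a number you would report; the ML part produces a function you would ship.

```python
from statistics import mean

# Hypothetical data (made up): A = input, B = output.
A = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [3.0, 5.0, 7.0, 9.0, 11.0]

# Analytics: summarise what happened, e.g. for a slide deck.
average_output = mean(B)


def fit_line(xs, ys):
    """ML: learn the mapping A -> B as a straight line (least squares)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(A, B)


def predict(a):
    """The learned mapping: software that returns B for a new A."""
    return slope * a + intercept
```

Calling `predict(6.0)` applies the learned mapping to an input that was not in the data.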

Slide 10

Slide 10 text

Data Science, AI, Machine Learning? 4/8

Examples of ML

Input | Output | Application
Picture | Are there human faces? | Photo tagging
Loan application | Will they repay the loan? | Loan approvals
Ad + user information | Will the user click on the ad? | Targeted online ads
English sentence | German sentence | Language translation
Recipe ingredients + customer reviews | Will the customer like the food? | Food recommendation

Understand what the pain points are → Automate tasks, not jobs

Slide 11

Slide 11 text

Data Science, AI, Machine Learning? 5/8

Where to start?
● Identify the opportunity: Data Science Knowledge + Domain Knowledge
● Define clear KPIs to establish what your model should predict and how.
● Make sure everyone is on the same page, from the very beginning, about how the results can (and cannot) be used to influence operations in your business.
Diagram labels: What DS can do; What is valuable for your business; How will we know if it helped or not?
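
One hedged way to answer "How will we know if it helped or not?" is to agree on a KPI up front and compare the model against a trivial baseline. The sketch below assumes accuracy as the KPI and uses made-up outcomes; in a real project the KPI would come from the business goal, and accuracy may well be the wrong choice.

```python
# Hypothetical outcomes (made up): 1 = loan repaid, 0 = not repaid.
actual = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
predicted = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0]


def accuracy(y_true, y_pred):
    """KPI assumed for this sketch: share of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Baseline: always predict the most common outcome (here: repaid).
baseline = [1] * len(actual)

kpi_model = accuracy(actual, predicted)
kpi_baseline = accuracy(actual, baseline)

# The model "helped" only if it beats what we could do without it.
improvement = kpi_model - kpi_baseline
```

Reporting the improvement over the baseline, rather than the raw KPI alone, keeps the conversation focused on the value the model adds.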

Slide 12

Slide 12 text

Data Science, AI, Machine Learning? 6/8

Data Science Project Workflow
● Cyclic workflow: iteration
● For Machine Learning: Prepare Data, Train + Evaluate Model, Deploy Model.
● For Analytics: Prepare Data, Analyze Data, Share Insights + Suggest Changes.
Diagram labels:
● Business & Data Understanding
● Evaluate Model
● Prepare Data
● Train Model
● Automate Model Serving
● Analyze Data
● Gather Results: Business Metrics and Process Performance
● Share Insights
● Data Pipeline exists?
● Build Data Pipeline
● Testing, Deploying and Maintaining
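
The Machine Learning path through the workflow (Prepare Data, Train + Evaluate Model, Deploy Model) can be sketched end to end in a few lines of Python. Everything here is an assumption for illustration: the records are invented, a nearest-centroid classifier stands in for "the model", and "deployment" is reduced to wrapping the model in a function.

```python
import random

# Hypothetical raw records (made up): (feature, label); None = missing value.
raw = [(1.0, 0), (1.2, 0), (None, 0), (0.8, 0), (1.1, 0), (0.9, 0),
       (3.0, 1), (3.2, 1), (2.9, 1), (None, 1), (3.1, 1), (2.8, 1)]


def prepare(records):
    """Prepare Data: drop records whose feature is missing."""
    return [(x, y) for x, y in records if x is not None]


def train(records):
    """Train Model: nearest-centroid, i.e. store the mean feature per class."""
    centroids = {}
    for label in {y for _, y in records}:
        values = [x for x, y in records if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def evaluate(model, records):
    """Evaluate Model: accuracy on data the model has not seen."""
    return sum(predict(model, x) == y for x, y in records) / len(records)


clean = prepare(raw)
random.Random(0).shuffle(clean)          # fixed seed for reproducibility
split = int(0.7 * len(clean))
train_set, test_set = clean[:split], clean[split:]

model = train(train_set)
test_accuracy = evaluate(model, test_set)


def serve(x):
    """Deploy Model (greatly simplified): expose the model as a function."""
    return predict(model, x)
```

In practice each step is revisited many times, which is the "cyclic workflow" on the slide: a poor evaluation result sends you back to data preparation or to a different model.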

Slide 13

Slide 13 text

Data Science, AI, Machine Learning? 7/8

Examples with code
● KevinLiao159/MyDataSciencePortfolio: Applying Data Science and Machine Learning to Solve Real World Business Problems
● 10 Data Science Projects | Data Science and Machine Learning

Slide 14

Slide 14 text

Data Science, AI, Machine Learning? 8/8

Learning and practicing - for free!
● Intro to Machine Learning Tutorial in Kaggle
● A free online introduction to artificial intelligence for non-experts
● Machine Learning for Everyone :: In simple words. With real-world examples. Yes, again

Slide 16

Slide 16 text

Data Roles and Skills 1/5

Data Science is a team sport!
Instead of relying on one person with a broad range of skills, combine experts and people with those skills at different levels to work together in a team.
Diagram labels (Data Science Project Workflow):
● Business & Data Understanding
● Evaluate Model
● Prepare Data
● Train Model
● Automate Model Serving
● Analyze Data
● Gather Results: Business Metrics and Process Performance
● Share Insights
● Build Data Pipeline

Slide 17

Slide 17 text

Data Roles and Skills 2/5

Data Roles and Skills

Data Scientist
● Tasks: Understand the business case and build features to train predictive models that address it
● Skills: Statistics, SQL, programming (e.g. Python, R), ML & DL techniques

Data Analyst
● Tasks: Business and data understanding to report on what happens
● Skills: Descriptive analytics, SQL, statistics, dashboarding and visualization tools

Data Engineer
● Tasks: Build and maintain infrastructure and pipelines to collect, clean and pre-process data
● Skills: Distributed systems, databases, software engineering

Machine Learning Engineer
● Tasks: Optimize, deploy and maintain machine learning models in production
● Skills: Software engineering, DevOps and systems architecture
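
SQL appears in the skill list of both the Data Analyst and the Data Scientist. As a rough illustration of "descriptive analytics" with SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ana", "toys", 20.0),
        ("ana", "books", 10.0),
        ("bea", "toys", 30.0),
        ("cleo", "books", 15.0),
        ("bea", "toys", 10.0),
    ],
)

# Descriptive analytics: report what happened, per product category.
rows = conn.execute(
    """
    SELECT category, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY category
    ORDER BY revenue DESC
    """
).fetchall()

for category, n_orders, revenue in rows:
    print(f"{category}: {n_orders} orders, revenue {revenue:.2f}")
```

A query like this typically feeds a dashboard or report, which is where the analyst's visualization tools come in.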

Slide 18

Slide 18 text

Data Roles and Skills 3/5

Data Roles: Tasks
Task areas: ML Models, Data Collection, Data Quality, Infrastructure, Process Management Tools, Machine Resource Management, Monitoring, Configuration, Feature Extraction, Analysis, Data Preprocessing, Parameter Configuration, Offline Validation, Business Logic, A/B Testing
Data roles: Data Engineer, Data Scientist, Data Analyst, ML Engineer
See also: “Hidden Technical Debt in Machine Learning Systems” by Sculley et al., Google Inc., 2015
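
A/B Testing is one of the task areas listed above. A common way to analyse a simple A/B test is a two-proportion z-test; the sketch below shows that technique with made-up conversion counts. It is an illustration chosen for this write-up, not a method described in the talk.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and visitors in variant A (control)
    conv_b / n_b: conversions and visitors in variant B (treatment)
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (made up): 10% vs 13% conversion, 1000 visitors each.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; whether the difference matters for the business is a separate question that ties back to the KPIs agreed at the start.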

Slide 19

Slide 19 text

Data Roles and Skills 4/5

‘Cooking’ data
Task areas: ML Models, Data Collection, Data Quality, Infrastructure, Process Management Tools, Machine Resource Management, Monitoring, Configuration, Feature Extraction, Analysis, Data Preprocessing, Parameter Configuration, Offline Validation, Business Logic, A/B Testing
Cooking analogy labels: Sowing, Harvesting, Choose recipe, Prepare ingredients, Customers tasting, Kitchen Tasting, Use utensils, Try combinations of appliances and recipes, Kitchen space and available appliances
See also: Understanding a Machine Learning Workflow Through Food by Daniel Godoy

Slide 20

Slide 20 text

Data Roles and Skills 5/5

Understanding data roles
Role descriptions:
● Create and use recipes to cook
● Check quality of ingredients and recipes
● Process ingredients at scale
● Turn a recipe into many dishes served efficiently
Roles: Data Engineer, Data Scientist, Data Analyst, ML Engineer

Slide 22

Slide 22 text

#datacareer Orientation 1/5

Skills gap in corporate Europe

Slide 23

Slide 23 text

#datacareer Orientation 2/5

Skills gap in German industry

Slide 24

Slide 24 text

#datacareer Orientation 3/5

Skills gap among AI players in Germany

Slide 25

Slide 25 text

#datacareer Orientation 4/5

Promising use cases among AI players in Germany

Slide 26

Slide 26 text

#datacareer Orientation 5/5

Prospective use cases in Germany

Slide 27

Slide 27 text

Your predictable path to senior level
Friday, 27 November at 12:00 CET