Slide 1

Slide 1 text

Data Science Best Practises with OAC
Francesco Tisiot, BI Tech Lead at Rittman Mead
[email protected] www.rittmanmead.com @rittmanmead

Slide 2

Slide 2 text

Francesco Tisiot
BI Tech Lead at Rittman Mead
Verona, Italy
Rittman Mead Blog
10 Years Experience in BI/Analytics
[email protected] @FTisiot
Oracle ACE

Slide 3

Slide 3 text

Specialised in data visualisation, predictive analytics, enterprise reporting and data engineering. Enabling the business, the consumers, the data providers and IT to work towards a common goal, delivering innovative and cost-effective solutions based on our core values of thought leadership, hard work and honesty. We work across multiple verticals on projects that range from mature, large-scale implementations to proofs of concept and can provide skills in development, architecture, delivery, training and support.

Slide 4

Slide 4 text

Before Starting… Define the Problem!

Slide 5

Slide 5 text

TEP: Task, Experience, Performance
- Task: Classify Spam/Not Spam
- Experience: Corpus of emails marked as Spam/Not Spam
- Performance: Accuracy
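
As a rough illustration of the Performance part of TEP, accuracy is simply the share of emails whose predicted label matches the known label. The sketch below is plain Python outside OAC; the labels and the `accuracy` helper are made up for the example.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labelled example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical data: known labels for five emails (the Experience)
# and the labels a classifier assigned to them (the Task).
actual = ["spam", "not spam", "spam", "not spam", "not spam"]
predicted = ["spam", "not spam", "not spam", "not spam", "not spam"]

score = accuracy(predicted, actual)  # 4 of the 5 predictions are correct
```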

Slide 6

Slide 6 text

Become a Data Scientist with OAC: Connect

Slide 7

Slide 7 text

Connection Options in OAC
- Pre-Defined Data Models
- Data Sources

Slide 8

Slide 8 text

Select Relevant Columns and Apply Filters

Slide 9

Slide 9 text

Become a Data Scientist with OAC: Connect, Clean

Slide 10

Slide 10 text

Cleaning What?
- Missing Values: N/A
- Wrong Values: Mark <> MArk
- Irrelevant Observations: City “Rome”
- Handling Outliers: Role: CIO, Salary: 500 K$
- Train/Test Set Split: Train: 80%, Test: 20%
- Labelling Columns: Col1 -> Name
- Feature Scaling: 0-200k -> 0-1
- Aggregation: # of Clicks

Slide 11

Slide 11 text

Cleaning How? Data Flows
- Filter
- Aggregate
- Join

Slide 12

Slide 12 text

Cleaning What? (and with which Data Flow operation)
- Missing Values (N/A): CASE … WHEN…
- Wrong Values (Mark <> MArk): UPPER
- Irrelevant Observations (City “Rome”): FILTER
- Handling Outliers (Role: CIO, Salary: 500 K$): FILTER?
- Train/Test Set Split (Train: 80%, Test: 20%): FILTER
- Labelling Columns (Col1 -> Name): COLUMN RENAME
- Feature Scaling (0-200k -> 0-1): KPI/(MAX-MIN)
- Aggregation (# of Clicks): COUNT
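
The cleaning steps above are done in OAC with Data Flow operations. Purely as an illustration of what those operations amount to, here is a minimal Python sketch outside OAC. The rows, the `city`/`salary` keys, the choice to keep only Rome, and the default of 0 for a missing salary are all assumptions made for the example. The scaling uses the usual min-max form, which reduces to KPI/(MAX-MIN) when the minimum is 0.

```python
import random


def clean(rows, keep_city="Rome", default_salary=0):
    """Filter rows, fill missing salaries, normalise case, rename Col1."""
    cleaned = []
    for row in rows:
        if row["city"] != keep_city:  # FILTER irrelevant observations
            continue
        # CASE ... WHEN: replace a missing value with an assumed default
        salary = default_salary if row["salary"] is None else row["salary"]
        # UPPER fixes "Mark" vs "MArk"; Col1 is relabelled as "name"
        cleaned.append({"name": row["Col1"].upper(), "salary": salary})
    return cleaned


def min_max_scale(values):
    """Scale values to the 0-1 range: (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid dividing by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, train_share=0.8, seed=42):
    """Shuffle with a fixed seed, then cut into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical input rows
raw = [
    {"Col1": "Mark", "city": "Rome", "salary": 100_000},
    {"Col1": "MArk", "city": "Rome", "salary": None},
    {"Col1": "Anna", "city": "Verona", "salary": 150_000},
    {"Col1": "Luca", "city": "Rome", "salary": 50_000},
    {"Col1": "Sara", "city": "Rome", "salary": 200_000},
]

cleaned = clean(raw)
scaled = min_max_scale([row["salary"] for row in cleaned])
train, test = train_test_split(cleaned)
```

Whether a missing salary should become 0, an average, or a dropped row depends on the problem; the default here only keeps the sketch short.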

Slide 13

Slide 13 text

How To Find Outliers? One Dimension
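
For a single dimension, one common rule of thumb (an assumption here, the slide does not name a method) is the interquartile range rule: flag anything more than 1.5 x IQR below the first quartile or above the third. A minimal Python sketch with made-up salaries in K$, including the 500 K$ CIO from the cleaning slide:

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return the values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical salaries in K$
salaries = [40, 42, 45, 48, 50, 52, 55, 58, 60, 500]
outliers = iqr_outliers(salaries)
```

Whether a flagged value is then filtered out or kept is a business decision, which is probably why the cleaning slide marks that FILTER with a question mark.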

Slide 14

Slide 14 text

Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich

Slide 15

Slide 15 text

Feature Engineering
- Location -> ZIP Code
- 2 Locations -> Distance
- Name -> Sex
- Day/Month/Year -> Date
Data Flow
Additional Data Sources?
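
Two of the derived features above can be sketched in a few lines of Python outside OAC. The haversine great-circle formula is one reasonable choice for "2 Locations -> Distance" (an assumption, the slide does not say how distance is computed), and the coordinates below are approximate values used only for illustration.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def to_date(day, month, year):
    """Combine separate Day/Month/Year columns into a single date."""
    return date(year, month, day)


# Approximate coordinates, for illustration only
verona = (45.4384, 10.9916)
rome = (41.9028, 12.4964)

distance = haversine_km(*verona, *rome)
order_date = to_date(1, 2, 2019)
```

Features such as Location -> ZIP Code or Name -> Sex need a lookup table, which is where the additional data sources come in.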

Slide 16

Slide 16 text

Data Preparation Recommendations

Slide 17

Slide 17 text

Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich, Analyse

Slide 18

Slide 18 text

Data Overview

Slide 19

Slide 19 text

Explain

Slide 20

Slide 20 text

Explain - Key Drivers

Slide 21

Slide 21 text

Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich, Analyse, Train & Evaluate

Slide 22

Slide 22 text

What Problem are we Trying to Solve?
- Supervised: “I want to predict the value of Y, here are some examples” (Classification, Regression)
- Unsupervised: “Here is a dataset, make sense out of it!” (Clustering)
https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

Slide 23

Slide 23 text

Easy Models

Slide 24

Slide 24 text

Data Flow: Train Model

Slide 25

Slide 25 text

Which Model - Parameters To Pick?

Slide 26

Slide 26 text

Select, Try, Save, Change, Try, Save…
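
The select, try, save, change loop can be pictured as a small parameter sweep: train with one setting, record the score, change the setting, repeat, then compare. The Python sketch below is not OAC's Train Model step. It uses a toy threshold "model" and made-up held-out examples, just to show the shape of the loop.

```python
def threshold_classifier(threshold):
    """Toy model: label an email as spam when its score reaches the threshold."""
    return lambda score: "spam" if score >= threshold else "not spam"


def accuracy(model, examples):
    """Share of (score, label) examples that the model labels correctly."""
    correct = sum(model(score) == label for score, label in examples)
    return correct / len(examples)


# Hypothetical held-out examples: (spam score, known label)
test_examples = [
    (0.1, "not spam"), (0.3, "not spam"), (0.4, "spam"),
    (0.6, "spam"), (0.7, "spam"), (0.9, "spam"),
]

results = {}
for threshold in (0.2, 0.35, 0.8):       # select / change the parameter
    model = threshold_classifier(threshold)                 # try
    results[threshold] = accuracy(model, test_examples)     # save the score

best = max(results, key=results.get)     # compare the saved results
```

Keeping every saved result, not just the winner, is what makes the later comparison possible.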

Slide 27

Slide 27 text

Compare

Slide 28

Slide 28 text

Compare

Slide 29

Slide 29 text

Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich, Analyse, Train & Evaluate, Predict

Slide 30

Slide 30 text

Use On the Fly

Slide 31

Slide 31 text

Step of a Data Flow

Slide 32

Slide 32 text

http://ritt.md/OAC-datascience
