Data Science with OAC Best Practises

FTisiot

May 09, 2019

Transcript

  1. Data Science Best Practises with OAC. Francesco Tisiot, BI Tech Lead at Rittman Mead. info@rittmanmead.com, www.rittmanmead.com, @rittmanmead
  2. Francesco Tisiot: BI Tech Lead at Rittman Mead; Verona, Italy; Rittman Mead Blog; 10 Years Experience in BI/Analytics; francesco.tisiot@rittmanmead.com; @FTisiot; Oracle ACE
  3. Specialised in data visualisation, predictive analytics, enterprise reporting and data engineering. Enabling the business, the consumers, the data providers and IT to work towards a common goal, delivering innovative and cost-effective solutions based on our core values of thought leadership, hard work and honesty. We work across multiple verticals on projects that range from mature, large scale implementations to proofs of concept and can provide skills in development, architecture, delivery, training and support.
  4. Before Starting…. Define the Problem!

  5. TEP. Task: Classify Spam/Not Spam. Experience: Corpus of Emails marked as Spam/Not Spam. Performance: Accuracy
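
As a rough illustration of the Task/Experience/Performance framing on slide 5, the sketch below scores a spam classifier by accuracy. The function name `accuracy` and the toy labels are assumptions made for this example; they do not come from the talk.

```python
def accuracy(y_true, y_pred):
    """Share of emails whose predicted label matches the marked label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("need at least one labelled email")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Experience: emails already marked as spam / not spam (toy labels).
marked = ["spam", "not spam", "spam", "not spam", "not spam"]
# Task output: what a hypothetical classifier predicted for the same emails.
predicted = ["spam", "not spam", "not spam", "not spam", "not spam"]

# Performance: 4 of the 5 predictions match the marked labels.
score = accuracy(marked, predicted)
print(score)  # 0.8
```
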
  6. Become a Data Scientist with OAC: Connect

  7. Connection Options in OAC Pre-Defined Data Models Data Sources

  8. Select Relevant Columns and Apply Filters

  9. Become a Data Scientist with OAC: Connect, Clean

  10. Cleaning What? Missing Values (N/A); Wrong Values (Mark <> MArk); Irrelevant Observations (City “Rome”); Handling Outliers (Role: CIO, Salary: 500 K$); Train/Test Set Split (Train: 80%, Test: 20%); Labelling Columns (Col1 -> Name); Feature Scaling (0-200k -> 0-1); Aggregation (# of Clicks)
  11. Cleaning How? Data Flows - Filter - Aggregate - Join

  12. Cleaning What? Feature Scaling (0-200k -> 0-1); Train/Test Set Split (Train: 80%, Test: 20%); Labelling Columns (Col1 -> Name); Irrelevant Observations (City “Rome”); Wrong Values (Mark <> MArk); Missing Values (N/A); Handling Outliers (Role: CIO, Salary: 500 K$); Aggregation (# of Clicks). Operations shown: CASE … WHEN…, UPPER, FILTER, COLUMN RENAME, FILTER, KPI/(MAX-MIN), FILTER?, COUNT
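
The cleaning steps on slide 12 are built with OAC Data Flows. The plain-Python sketch below only mirrors the same ideas outside OAC. The records, the column names, the positional 80/20 split, and the pairing of each issue with an operation in the comments are assumptions for illustration, one possible reading of the slide. Note that KPI/(MAX-MIN) agrees with the usual min-max formula (KPI-MIN)/(MAX-MIN) only when MIN is 0, as in the 0-200k example.

```python
from collections import Counter

# Toy records; every name and value here is made up for illustration.
rows = [
    {"name": "Mark", "city": "Rome", "salary": 50_000},
    {"name": "MArk", "city": "Rome", "salary": 200_000},
    {"name": "anna", "city": "Verona", "salary": None},
    {"name": "Luca", "city": "Rome", "salary": 0},
    {"name": "Mark", "city": "Rome", "salary": 100_000},
]

# Missing values (CASE ... WHEN style): replace a missing salary with a default.
for row in rows:
    if row["salary"] is None:
        row["salary"] = 0

# Wrong values (UPPER style): normalise case so "Mark" and "MArk" match.
for row in rows:
    row["name"] = row["name"].upper()

# Irrelevant observations (FILTER style): keep only the city of interest.
rows = [row for row in rows if row["city"] == "Rome"]

# Feature scaling: (x - MIN) / (MAX - MIN) maps salaries onto 0-1.
# Assumes MAX > MIN; a constant column would need special handling.
low = min(row["salary"] for row in rows)
high = max(row["salary"] for row in rows)
for row in rows:
    row["salary_scaled"] = (row["salary"] - low) / (high - low)

# Aggregation (COUNT style): number of rows per name.
counts = Counter(row["name"] for row in rows)

# Train/test split: first 80% of rows for training, the rest for testing.
# Real data would usually be shuffled first.
cut = int(len(rows) * 0.8)
train, test = rows[:cut], rows[cut:]
```
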
  13. How To Find Outliers? One Dimension
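
The transcript for slide 13 does not say which method is used to find outliers in one dimension. A common choice, assumed here, is the 1.5 x IQR rule behind box-plot whiskers. The helper name `iqr_outliers` and the toy salaries are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# Toy salaries in K$; the 500 K$ value echoes the CIO example on slide 10.
salaries = [40, 45, 50, 52, 55, 58, 60, 500]
outliers = iqr_outliers(salaries)
print(outliers)  # [500]
```
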

  14. Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich
  15. Feature Engineering: Location -> ZIP Code; 2 Locations -> Distance; Name -> Sex; Day/Month/Year -> Date. Data Flow. Additional Data Sources?
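
Two of the feature-engineering examples on slide 15 can be sketched in plain Python. The haversine formula is one common way to turn two locations into a distance; the talk does not say which formula it uses, and the coordinates below are approximate values chosen for illustration.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# 2 Locations -> Distance: approximate coordinates for Verona and Rome.
verona = (45.44, 10.99)
rome = (41.90, 12.50)
distance_km = haversine_km(*verona, *rome)

# Day/Month/Year -> Date: three separate columns become one date value.
day, month, year = 9, 5, 2019
full_date = date(year, month, day)
```
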
  16. Data Preparation Recommendations

  17. Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich, Analyse
  18. Data Overview

  19. Explain

  20. Explain - Key Drivers

  21. Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich, Analyse, Train & Evaluate
  22. What Problem are we Trying to Solve? Supervised: “I want to predict the value of Y, here are some examples” (Classification, Regression). Unsupervised: “Here is a dataset, make sense out of it!” (Clustering). https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
  23. Easy Models

  24. DataFlow Train Model

  25. Which Model - Parameters To Pick?

  26. Select, Try, Save, Change, Try, Save …..

  27. Compare

  28. Compare
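
The "Select, Try, Save, Change, Try, Save" loop and the comparison step on slides 26 to 28 can be pictured with a toy example. In OAC this is done by training models in Data Flows; the threshold rule, the data, and all names below are hypothetical stand-ins for a real model and its parameters.

```python
# Toy held-out data: (number of clicks, bought?) pairs, made up for illustration.
holdout = [(1, False), (2, False), (3, False), (6, True), (7, True), (4, True)]


def evaluate(threshold, data):
    """Accuracy of the rule 'predict True when clicks >= threshold'."""
    hits = sum(1 for clicks, label in data if (clicks >= threshold) == label)
    return hits / len(data)


# Select, try, save: score each candidate parameter value and keep the result.
results = {}
for threshold in (2, 4, 6):
    results[threshold] = evaluate(threshold, holdout)

# Compare: pick the parameter value with the best saved score.
best = max(results, key=results.get)
print(results, best)
```

Keeping every saved score, not only the best one, is what makes the side-by-side comparison possible afterwards.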

  29. Become a Data Scientist with OAC: Connect, Clean, Transform & Enrich, Analyse, Train & Evaluate, Predict
  30. Use On the Fly

  31. Step of a Data Flow

  32. http://ritt.md/OAC-datascience

  33. Data Science Best Practises with OAC. Francesco Tisiot, BI Tech Lead at Rittman Mead