Become a Data Scientist

FTisiot
September 16, 2019

How Oracle Analytics Cloud can speed your path to Data Science

Transcript

  1. Become a Data Scientist Francesco Tisiot Analytics Tech Lead

  2. Francesco Tisiot, Analytics Tech Lead • Verona, Italy • Over 10 Years in Analytics • Oracle ACE Director • ITOUG Board President • http://ritt.md/ftisiot • ft@rittmanmead.com • @FTisiot

  3. 450+ Technical Experts Helping Peers Globally • Nominate yourself or someone you know: acenomination.oracle.com • bit.ly/OracleACEProgram • Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

  4. Data Engineering Analytics Data Science www.rittmanmead.com info@rittmanmead.com @rittmanmead

  5. Agenda • OAC • Data Scientist • Become a Data Scientist

  6. Oracle Analytics Cloud • Platform Services (PaaS) • Delivered entirely in the cloud: No infrastructure footprint; Flexibility; Simplified, metered licensing • Several options to suit your needs: BYOL; Functionality bundled into 2 editions (Professional, Enterprise)

  7. Functions: OAC supports every type of analytics • Classic • Modern

  8. Classic Enterprise BI • Similar to OBIEE 12c: Centrally maintained & governed; Semantic model • Interactive Dashboards: KPI measurement & monitoring; Guided navigation paths • BI Publisher: Highly formatted, burst outputs • Action Framework: Navigation actions; Scheduled agents

  9. Modern Data Discovery • Data Preparation: Acquire data; Clean/Enrich; Transform; Repeatable Flows • Data Visualisation: Create visual insights rapidly; Construct narrated storyboards; Share findings

  10. Unique Source of Truth • Raw Data To Insights • Specific Access Control • Data Enrichment and Cleaning • Unified Analytics • Free Discovery • Centralised Reporting • https://speakerdeck.com/ftisiot/become-an-equilibrista-find-the-right-balance-in-the-analytics-tech-ecosystem

  11. Augmented Analytics • Data Enrichment Suggestions • Explain • One-Click Advanced Analytics • Advanced Machine Learning • Natural Language Processing

  12. Data Scientist

  13. Data Scientist: a person who has the knowledge and skills to conduct sophisticated and systematic analyses of data. A data scientist extracts insights from data sets, and evaluates and identifies strategic opportunities. Source: https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/

  14. Data Scientist: a Data Analyst who lives in California! Source: https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/

  15. Data Scientist Skills

  16. Brendan Tierney, Oracle ACE Director: https://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

  17. Data Scientist …Company Missing a Data Scientist

  18. Low Hanging Fruit Theory • Democratise Data Science

  19. Basic Operations: What are the Drivers for My Sales? • “Based on my Experience I can Guess…” • “Statistically Significant Drivers for Sales Are…” • Augmented Analytics

  20. Basic Operations: Is this Client going to accept the Offer? YES/NO • 50% • 70% • Basic ML Model

  21. Become a Data Scientist with OAC

  22. Before Starting… Define the Problem!

  23. Problem Definition: Predicting Wine Quality

  24. TEP • Task: Classify Good/Bad Wine • Experience: Corpus of Wine Descriptions with Rating • Performance: Accuracy

  25. Become a Data Scientist with OAC: Connect

  26. Connection Options in OAC Pre-Defined Data Models External Data Sources

  27. Select Relevant Columns and Apply Filters

  28. Become a Data Scientist with OAC: Connect • Clean

  29. What Everybody Thinks a Data Scientist Does • What He Really Does

  30. https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html

  31. Cleaning What? • Missing Values (N/A) • Wrong Values (Mark <> MArk) • Irrelevant Observations (City “Rome”) • Handling Outliers (Role: CIO, Salary: 500 K$) • Train/Test Set Split (Train: 80%, Test: 20%) • Labelling Columns (Col1 -> Name) • Feature Scaling (0-200k -> 0-1) • Aggregation (# Of Clicks)

  32. Cleaning How? Data Flows - Filter - Aggregate - Join

  33. Cleaning What? • Missing Values (N/A) • Wrong Values (Mark <> MArk) • Irrelevant Observations (City “Rome”) • Handling Outliers (Role: CIO, Salary: 500 K$) • Train/Test Set Split (Train: 80%, Test: 20%) • Labelling Columns (Col1 -> Name) • Feature Scaling (0-200k -> 0-1) • Aggregation (# of Clicks) • How: CASE … WHEN… • UPPER • FILTER • COLUMN RENAME • FILTER • KPI/(MAX-MIN) • FILTER? • COUNT • Automated (×3)

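
A few of the cleaning steps named on this slide can be sketched in plain Python. This is only an illustration of the ideas (UPPER for inconsistent spelling, FILTER for irrelevant rows, KPI/(MAX-MIN) scaling, an 80/20 split), not how OAC Data Flows implement them. The rows, the helper names and the choice to treat "Rome" as out of scope are all assumptions made for the example. The scaling uses the general min-max form, which reduces to KPI/(MAX-MIN) when the minimum is 0, as in the 0-200k range on the slide.

```python
import random

# Hypothetical rows standing in for a customer dataset; values are illustrative only.
rows = [
    {"name": "Mark", "city": "Rome", "salary": 30_000},
    {"name": "MArk", "city": "Milan", "salary": 200_000},
    {"name": "Anna", "city": "Verona", "salary": 0},
    {"name": "Luca", "city": "Milan", "salary": 50_000},
    {"name": "Sara", "city": "Verona", "salary": 100_000},
]

def normalise_names(rows):
    """Wrong values: make 'Mark' and 'MArk' compare equal (the UPPER step)."""
    return [{**r, "name": r["name"].upper()} for r in rows]

def drop_city(rows, city):
    """Irrelevant observations: a plain FILTER (assumes the city is out of scope)."""
    return [r for r in rows if r["city"] != city]

def min_max_scale(rows, key):
    """Feature scaling: map the observed [min, max] range of `key` onto [0, 1]."""
    values = [r[key] for r in rows]
    lo, hi = min(values), max(values)
    return [{**r, key: (r[key] - lo) / (hi - lo)} for r in rows]

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle reproducibly, then cut into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

cleaned = min_max_scale(drop_city(normalise_names(rows), "Rome"), "salary")
train, test = train_test_split(cleaned)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the random seed keeps the split repeatable, which matters when the same flow is re-run to compare models.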
  34. Why Remove an Outlier? Years Experience -> Salary: 1 -> 30.000 • 2 -> 32.000 • 3 -> 35.000 • 4 -> 35.500 • 5 -> 36.000 • 6 -> 40.000 • 7 -> 50.000 • 8 -> 70.000 • 9 -> 90.000 • 10 -> 500.000

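
To see why the last row matters, the sketch below compares summary statistics and a least-squares slope with and without it, using the figures from this slide (reading the dots as thousands separators). The `ols_slope` helper is written for this example and is not something the deck refers to.

```python
from statistics import mean, median

# Salary table from the slide: index 0 is 1 year of experience, index 9 is 10 years.
salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]
years = list(range(1, len(salaries) + 1))

def ols_slope(xs, ys):
    """Least-squares slope of ys against xs (salary gained per extra year)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

slope_all = ols_slope(years, salaries)
slope_trimmed = ols_slope(years[:-1], salaries[:-1])  # drop the 500.000 row

print(f"mean:   {mean(salaries):>10.0f} -> {mean(salaries[:-1]):>10.0f}")
print(f"median: {median(salaries):>10.0f} -> {median(salaries[:-1]):>10.0f}")
print(f"slope:  {slope_all:>10.0f} -> {slope_trimmed:>10.0f}")
```

The mean and the fitted slope move a lot when the single extreme row is dropped, while the median barely changes, which is the usual argument for handling outliers before training a model.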
  35. How To Find Outliers? One Dimension
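
The slide itself is a chart, so the exact method shown is not recoverable from the transcript. One common one-dimensional approach is Tukey's interquartile-range rule, sketched here on the salary figures from the previous slide. The function name and the multiplier `k=1.5` are conventional choices, not taken from the deck.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; input need not be sorted
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]
print(iqr_outliers(salaries))  # prints [500000]
```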

  36. How To Find Outliers? Two Dimensions

  37. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich

  38. Feature Engineering • Location -> ZIP Code • 2 Locations -> Distance • Name -> Sex • Day/Month/Year -> Date • Data Flow • Additional Data Sources?

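
Two of the derived features on this slide are easy to illustrate outside OAC. The sketch below turns two locations into a great-circle distance with the haversine formula and three separate day/month/year columns into a single date. The function names, the Earth radius constant and the approximate city coordinates are assumptions for the example.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def to_date(day, month, year):
    """Combine separate day, month and year columns into one date value."""
    return date(year, month, day)

# Approximate coordinates, for illustration only.
verona = (45.44, 10.99)
rome = (41.90, 12.50)
print(round(haversine_km(*verona, *rome)), "km between the two locations")
print(to_date(16, 9, 2019).isoformat())
```

A single date column also makes further features available, such as weekday or month, that three separate integer columns hide from a model.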
  39. Data Preparation Recommendations

  40. Spatial Enrichment Oracle Spatial Studio http://ritt.md/spatial-studio

  41. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich • Analyse

  42. Data Overview

  43. Explain

  44. Explain - Key Drivers

  45. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich • Analyse • Train & Evaluate

  46. What Problem are we Trying to Solve? • Supervised: “I want to predict the value of Y, here are some examples” (Classification, Regression) • Unsupervised: “Here is a dataset, make sense out of it!” (Clustering) • https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

  47. Easy Models

  48. NLP

  49. DataFlow Train Model

  50. Which Model - Parameters?

  51. Select, Try, Save, Change, Try, Save…

  52. Compare - Classification: Real Value (Good/Bad) vs Predicted Value (Good/Bad)

  53. There is No Single Truth… Precision: 896/(896+502) = 64.09% • 866/(866+471) = 64.77%
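
The two percentages on this slide correspond to 896/(896+502) and 866/(866+471). A minimal sketch of the calculation follows, assuming 896 and 866 are the true positives and 502 and 471 the false positives. The transcript does not say which model each pair belongs to, so `model_a` and `model_b` are hypothetical labels.

```python
def precision(true_positives, false_positives):
    """Share of positive predictions that were actually positive."""
    return true_positives / (true_positives + false_positives)

# (true positives, false positives): counts from the slide, labels assumed.
models = {"model_a": (896, 502), "model_b": (866, 471)}

for name, (tp, fp) in models.items():
    print(f"{name}: precision = {precision(tp, fp):.2%}")
# model_a: precision = 64.09%
# model_b: precision = 64.77%
```

Under that reading, the second model finds fewer true positives yet scores slightly higher on precision, so the "best" model depends on which metric is chosen.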

  54. Compare - Regression

  55. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich • Analyse • Train & Evaluate • Predict

  56. Use On the Fly

  57. Step of a Data Flow

  58. Congratulations! …You are now a Data Scientist!

  59. Nearly There

  60. Required Knowledge: 97% • 95% • 90% • 80% • 60% • 50%

  61. …But 80% > 50% • Data Cleaning • Model Creation & Evaluation • Feature Engineering • Feature Selection

  62. ML Production Deployment • Data Scientist • ML -> Data • Oracle Machine Learning

  63. Become a Data Scientist with OAC http://ritt.md/OAC-datascience

  64. ML in Action with OAC http://ritt.md/OAC-ML-Video

  65. https://www.rittmanmead.com/insight-lab/ Insights Lab

  66. Data Science OAC

  67. Become a Data Scientist Francesco Tisiot Analytics Tech Lead