
Become a Data Scientist with OAC

FTisiot
March 21, 2019

Transcript

  1. Become a Data Scientist with OAC • Francesco Tisiot, BI Tech Lead at Rittman Mead • info@rittmanmead.com www.rittmanmead.com @rittmanmead
  2. Francesco Tisiot • BI Tech Lead at Rittman Mead • Verona, Italy • Rittman Mead Blog • 10 Years Experience in BI/Analytics • francesco.tisiot@rittmanmead.com • @FTisiot • Oracle ACE
  3. About Rittman Mead • Rittman Mead is a data and analytics company that specialises in data visualisation, predictive analytics, enterprise reporting and data engineering. We use our skill, experience and know-how to work with organisations across the world to interpret their data. We enable the business, the consumers, the data providers and IT to work towards a common goal, delivering innovative and cost-effective solutions based on our core values of thought leadership, hard work and honesty. We work across multiple verticals on projects that range from mature, large-scale implementations to proofs of concept, and we can provide skills in development, architecture, delivery, training and support.
  4. Let Me Know My Audience

  5. Agenda • OAC • Data Scientist • Steps to Become a Data Scientist with OAC
  6. Become a Data Scientist with OAC

  7. Oracle Analytics Cloud • Oracle’s complete suite of Platform Services (PaaS) for unified analytics in the cloud • Delivered entirely in the cloud: ‣ No infrastructure footprint ‣ Flexibility to scale up or down based on your immediate needs ‣ Simplified, metered licensing • Several options to suit your needs: ‣ Oracle or customer/partner managed services ‣ Functionality bundled into 3 editions
  8. Functions • OAC supports every type of analytics workload across your organisation • Classic enterprise BI: ‣ Analysis & dashboarding ‣ Published reporting ‣ Enterprise Performance Management • Modern departmental/personal discovery: ‣ Extended data mashup & modelling ‣ Data preparation, exploration & visualisation ‣ Data science & machine learning
  9. Classic Enterprise BI • Similar User Experience to OBIEE 12c ‣ Centrally maintained & governed ‣ Semantic model remains key • Interactive Dashboards ‣ Ideal for KPI measurement & monitoring ‣ Guided navigation paths • BI Publisher ‣ Highly formatted, burst outputs • Action Framework ‣ Navigation actions ‣ Scheduled agents
  10. Modern Data Discovery • Data Preparation ‣ Acquire data from multiple connections ‣ Apply data enrichments prior to analysis ‣ Define repeatable preparation flows • Data Visualisation ‣ Create visual insights rapidly ‣ Construct narrated storyboards ‣ Share findings • Machine Learning ‣ Build & train ML models ‣ Apply models to new data sets
  11. Three Edition Options • Standard Edition: ‣ Data Discovery ‣ Data Preparation ‣ What-If Planning • Data Lake Edition: ‣ Big Data Storage ‣ Data Transformation via Apache Spark ‣ Data Lake Connectivity • Enterprise Edition: ‣ Enterprise Analysis & Dashboarding ‣ Published Reporting ‣ Day by Day
  12. Two Purchasing Options • Pay As You Go: ‣ Based on Universal Credits model ‣ No minimum tenure ‣ Payments made in arrears ‣ Based on consumption ‣ Suitable for: Rapid Prototyping, Testing & Sampling, Elastic, Scalable • Monthly Flex: ‣ Based on Universal Credits model ‣ 12 month minimum tenure ‣ Payments made in advance ‣ Unused credits are forfeited ‣ Suitable for: Predictable production workloads, Long running platforms
  13. Become a Data Scientist with OAC

  14. A Data Scientist is a person who has the knowledge and skills to conduct sophisticated and systematic analyses of data. A data scientist extracts insights from data sets, and evaluates and identifies strategic opportunities. (Source: https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/)
  15. A Data Scientist is a Data Analyst who lives in California! (Source: https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/)
  16. Data Scientist Skills

  17. Brendan Tierney, Oracle ACE Director: https://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

  18. From Data Analyst to Data Scientist • Tools • Knowledge • Experience (Icons made by Icon Pond from www.flaticon.com)
  19. From Data Analyst to Data Scientist • Tools: OPEN SOURCE ($) • Knowledge: $$$$ • Experience: $$$$ (Icons made by Icon Pond from www.flaticon.com)
  20. Data Scientist …Company Missing a Data Scientist

  21. Low Hanging Fruit Theory Democratise Data Science

  22. Basic Operations • What are the Drivers for My Sales? • Based on my Experience I can Guess… • Statistically Significant Drivers for Sales Are … • Augmented Analytics
  23. Basic Operations • Is this Client going to accept the Offer? YES/NO • 50% • 70% Basic ML Model
  24. Become a Data Scientist with OAC

  25. What He Really Does • What Everybody Thinks a Data Scientist Does
  26. https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html

  27. Before Starting…. Define the Problem!

  28. TEP • Task: Classify Spam/Not Spam • Experience: Corpus of Emails marked as Spam/Not Spam • Performance: Accuracy
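
The Task/Experience/Performance framing can be made concrete with a few lines of Python. This is only a sketch: the six emails, their labels and the keyword rule standing in for a trained model are all hypothetical and are not taken from the talk.

```python
# Experience (E): a hypothetical corpus of emails marked 1 = spam, 0 = not spam.
emails = [
    ("Win a FREE prize now", 1),
    ("Meeting moved to 3pm", 0),
    ("FREE offer, click here", 1),
    ("Lunch tomorrow?", 0),
    ("Your invoice is attached", 0),
    ("Feel free to call me", 0),
]


def classify(text):
    """Task (T): a toy rule standing in for a trained spam classifier."""
    return 1 if "free" in text.lower() else 0


def accuracy(examples, predict):
    """Performance (P): share of emails whose prediction matches the label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


acc = accuracy(emails, classify)
print(f"Accuracy: {acc:.0%}")
```

The last email is misclassified by the keyword rule, so the measured accuracy is five out of six. That is the kind of number a better model would be expected to improve on.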
  29. Become a Data Scientist with OAC: Connect

  30. Connection Options in OAC Pre-Defined Data Models Data Sources

  31. Select Relevant Columns and Apply Filters

  32. Become a Data Scientist with OAC: Connect • Clean

  33. Cleaning What? • Missing Values: N/A • Wrong Values: Mark <> MArk • Irrelevant Observations: City “Rome” • Handling Outliers: Role: CIO, Salary: 500 K$ • Train/Test Set Split: Train: 80%, Test: 20% • Labelling Columns: Col1 -> Name • Feature Scaling: 0-200k -> 0-1 • Aggregation: # of Clicks
  34. Cleaning How? Data Flows - Filter - Aggregate - Join

  35. Cleaning What? • Missing Values (N/A): CASE … WHEN… • Wrong Values (Mark <> MArk): UPPER • Irrelevant Observations (City “Rome”): FILTER • Labelling Columns (Col1 -> Name): COLUMN RENAME • Train/Test Set Split (Train: 80%, Test: 20%): FILTER • Feature Scaling (0-200k -> 0-1): KPI/(MAX-MIN) • Handling Outliers (Role: CIO, Salary: 500 K$): FILTER? • Aggregation (# of Clicks): COUNT
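
For readers who think in code rather than Data Flow steps, several of these cleaning operations can be sketched in plain Python. The rows, the column names and the choice of 0 as the fill value for a missing revenue are hypothetical. OAC itself does this through Data Flow steps, not through this code.

```python
import random
from collections import Counter

# Hypothetical raw rows; "Col1" is an unlabelled column holding a client name.
rows = [
    {"Col1": "Mark", "revenue": 50_000},
    {"Col1": "MArk", "revenue": None},  # inconsistent casing, missing value
    {"Col1": "Anna", "revenue": 200_000},
    {"Col1": "Luca", "revenue": 0},
    {"Col1": "Sara", "revenue": 120_000},
]

# Labelling columns (rename), wrong values (UPPER), missing values (CASE ... WHEN).
cleaned = [
    {
        "name": r["Col1"].upper(),
        "revenue": 0 if r["revenue"] is None else r["revenue"],  # assumed default
    }
    for r in rows
]

# Aggregation (COUNT): rows per client once the casing is consistent.
rows_per_name = Counter(r["name"] for r in cleaned)

# Feature scaling: map revenue from its 0-200k range onto 0-1 (min-max scaling).
lo = min(r["revenue"] for r in cleaned)
hi = max(r["revenue"] for r in cleaned)
for r in cleaned:
    r["revenue_scaled"] = (r["revenue"] - lo) / (hi - lo)

# Train/test split: shuffle with a fixed seed, then cut at 80%.
shuffled = cleaned[:]
random.Random(42).shuffle(shuffled)
cut = int(len(shuffled) * 0.8)
train, test = shuffled[:cut], shuffled[cut:]

print(rows_per_name)
print(len(train), len(test))
```

Because the minimum here is 0, min-max scaling gives the same result as the slide's KPI/(MAX-MIN) shorthand. The fixed seed only makes the 80/20 split repeatable.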
  36. Why Remove an Outlier? • Years Experience -> Salary: 1 -> 30.000 • 2 -> 32.000 • 3 -> 35.000 • 4 -> 35.500 • 5 -> 36.000 • 6 -> 40.000 • 7 -> 50.000 • 8 -> 70.000 • 9 -> 90.000 • 10 -> 500.000
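
One way to see why the last row matters is to fit a straight line to salary against years of experience, with and without it. The sketch below uses the ten data points from the slide, reading "30.000" as thirty thousand. The ordinary least-squares fit is an illustrative choice, not something the talk prescribes.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


years = list(range(1, 11))
salary = [30_000, 32_000, 35_000, 35_500, 36_000,
          40_000, 50_000, 70_000, 90_000, 500_000]

# Fit once on all ten rows, once without the 500.000 row.
slope_all = least_squares_slope(years, salary)
slope_trimmed = least_squares_slope(years[:-1], salary[:-1])

print(f"Salary gain per year, all rows:        {slope_all:,.0f}")
print(f"Salary gain per year, without outlier: {slope_trimmed:,.0f}")
```

With the single 500.000 row included, the estimated salary gain per year of experience is more than four times larger than without it. A model trained on the full data would therefore overstate salaries for most people.
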
  37. How To Find Outliers? One Dimension
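
In one dimension, a common rule of thumb flags values that lie more than 1.5 interquartile ranges outside the quartiles, which is the rule behind box-plot whiskers. The sketch below applies it to the salary column from the previous slide. The 1.5 multiplier and the use of `statistics.quantiles` are assumptions made for illustration, not something the talk specifies.

```python
import statistics

salary = [30_000, 32_000, 35_000, 35_500, 36_000,
          40_000, 50_000, 70_000, 90_000, 500_000]

# Quartiles of the salary column (default "exclusive" method).
q1, _median, q3 = statistics.quantiles(salary, n=4)
iqr = q3 - q1

# Fences: anything beyond 1.5 * IQR from the quartiles is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [s for s in salary if s < lower_fence or s > upper_fence]
print(outliers)
```

Only the 500.000 salary falls outside the fences, so it is the one candidate for removal or closer inspection.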

  38. How To Find Outliers? Two Dimensions

  39. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich
  40. Feature Engineering • Location -> ZIP Code • 2 Locations -> Distance • Name -> Sex • Day/Month/Year -> Date • Data Flow • Additional Data Sources?
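
Two of these derived features are easy to sketch in Python: a distance from two locations and a date from separate day/month/year columns. The haversine formula, the approximate Verona and Rome coordinates and the `event` record are illustrative assumptions. In OAC the equivalent would be a calculated column in a Data Flow.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# 2 Locations -> Distance (approximate city-centre coordinates).
verona = (45.4384, 10.9916)
rome = (41.9028, 12.4964)
distance_km = haversine_km(*verona, *rome)

# Day/Month/Year -> Date, which in turn yields features such as the weekday.
event = {"day": 21, "month": 3, "year": 2019}  # hypothetical record
event_date = date(event["year"], event["month"], event["day"])
weekday = event_date.strftime("%A")

print(f"{distance_km:.0f} km", event_date.isoformat(), weekday)
```

A single date column is more useful to a model than three separate integers because weekday, week number or days since an earlier event can be derived from it.
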
  41. Data Preparation Recommendations

  42. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich • Analyse
  43. Data Overview

  44. Explain

  45. Explain - Key Drivers

  46. Explain on Attribute

  47. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich • Analyse • Train & Evaluate
  48. What Problem are we Trying to Solve? • Supervised: “I want to predict the value of Y, here are some examples” ‣ Classification ‣ Regression • Unsupervised: “Here is a dataset, make sense out of it!” ‣ Clustering • https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
  49. Easy Models

  50. DataFlow Train Model

  51. Which Model - Parameters To Pick?

  52. Select, Try, Save, Change, Try, Save …..

  53. Compare

  54. Compare

  55. There is No Single Truth… Precision: 896/(502+896) = 64.09% vs 866/(471+866) = 64.77%
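
Precision is the share of predicted positives that really are positive, TP / (TP + FP). The quoted percentages are reproduced when 896 and 866 are taken as the true positives and 502 and 471 as the false positives. That assignment is an assumption inferred from the arithmetic, not something stated on the slide. The names `model_a` and `model_b` are placeholders for the two models being compared.

```python
def precision(true_positives, false_positives):
    """Share of positive predictions that are actually positive."""
    return true_positives / (true_positives + false_positives)


# Counts from the slide, under the TP/FP assignment described above.
model_a = precision(true_positives=896, false_positives=502)
model_b = precision(true_positives=866, false_positives=471)

print(f"Model A precision: {model_a:.2%}")
print(f"Model B precision: {model_b:.2%}")
```

The second model is slightly more precise but finds fewer true positives (866 against 896), so which one is better depends on what a false positive costs compared with a missed positive.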

  56. Custom ML Models • Model_train.xml ‣ Parameter Definition ‣ Python: Parameter Parsing, Data Cleaning, Model Definition & Training, Model Storage, Statistics Calculation ‣ svr = SVR(kernel=kernel, gamma=0.01, C=5); SVR_Model = svr.fit(train_X, train_y) • Model_test.xml • https://www.oracle.com/solutions/business-analytics/data-visualization/library-overview.html
  57. Become a Data Scientist with OAC: Connect • Clean • Transform & Enrich • Analyse • Train & Evaluate • Predict
  58. Use On the Fly

  59. Step of a Data Flow

  60. Congratulations! …You are now a Data Scientist!

  61. … Not Really

  62. Required Knowledge: 97% 95% 90% 80% 60% 50%

  63. …But 80% > 50% • Data Cleaning & Transformation • Model Creation & Evaluation
  64. Deployment • Involvement of a Data Scientist • Move ML Close to the Data • Oracle Advanced Analytics
  65. Become a Data Scientist with OAC • Francesco Tisiot, BI Tech Lead at Rittman Mead