
Become a Data Scientist with OAC

FTisiot
March 21, 2019



Transcript

  1. Become a Data Scientist with OAC
    Francesco Tisiot
    BI Tech Lead at Rittman Mead
    www.rittmanmead.com @rittmanmead

  2. Francesco Tisiot
    BI Tech Lead at Rittman Mead
    Verona, Italy
    Rittman Mead Blog
    10 Years Experience in BI/Analytics
    @FTisiot
    Oracle ACE

  3. About Rittman Mead
    Rittman Mead is a data and analytics company that specialises in data visualisation, predictive
    analytics, enterprise reporting and data engineering.

    We use our skill, experience and know-how to work with organisations across the world to
    interpret their data. We enable the business, the consumers, the data providers and IT to work
    towards a common goal, delivering innovative and cost-effective solutions based on our
    core values of thought leadership, hard work and honesty.

    We work across multiple verticals on projects that range from mature, large scale
    implementations to proofs of concept and can provide skills in development, architecture,
    delivery, training and support.


  4. Let Me Know My Audience

  5. Agenda
    • OAC
    • Data Scientist
    • Steps to Become a Data Scientist with OAC

  6. Become a Data Scientist with OAC

  7. Oracle Analytics Cloud
    • Oracle’s complete suite of Platform Services (PaaS) for unified analytics in the cloud
    • Delivered entirely in the cloud:
      ‣ No infrastructure footprint
      ‣ Flexibility to scale up or down based on your immediate needs
      ‣ Simplified, metered licensing
    • Several options to suit your needs:
      ‣ Oracle or customer/partner managed services
      ‣ Functionality bundled into 3 editions

  8. Functions
    • OAC supports every type of analytics workload across your organisation
    • Classic enterprise BI:
      ‣ Analysis & dashboarding
      ‣ Published reporting
      ‣ Enterprise Performance Management
    • Modern departmental/personal discovery:
      ‣ Extended data mashup & modelling
      ‣ Data preparation, exploration & visualisation
      ‣ Data science & machine learning

  9. Classic Enterprise BI
    • Similar User Experience to OBIEE 12c
      ‣ Centrally maintained & governed
      ‣ Semantic model remains key
    • Interactive Dashboards
      ‣ Ideal for KPI measurement & monitoring
      ‣ Guided navigation paths
    • BI Publisher
      ‣ Highly formatted, burst outputs
    • Action Framework
      ‣ Navigation actions
      ‣ Scheduled agents

  10. Modern Data Discovery
    • Data Preparation
      ‣ Acquire data from multiple connections
      ‣ Apply data enrichments prior to analysis
      ‣ Define repeatable preparation flows
    • Data Visualisation
      ‣ Create visual insights rapidly
      ‣ Construct narrated storyboards
      ‣ Share findings
    • Machine Learning
      ‣ Build & train ML models
      ‣ Apply model to new data sets

  11. Three Edition Options
    Editions: Standard Edition, Data Lake Edition, Enterprise Edition
    Features shown: Data Discovery; Data Preparation; What-If Planning; Big Data Storage;
    Data Transformation via Apache Spark; Data Lake Connectivity; Enterprise Analysis &
    Dashboarding; Published Reporting; Day by Day

  12. Two Purchasing Options
    Pay As You Go
    • Based on Universal Credits model
    • No minimum tenure
    • Payments made in arrears
    • Based on consumption
    • Suitable for:
      ‣ Rapid Prototyping
      ‣ Testing & Sampling
      ‣ Elastic Scalable
    Monthly Flex
    • Based on Universal Credits model
    • 12 month minimum tenure
    • Payments made in advance
    • Unused credits are forfeited
    • Suitable for:
      ‣ Predictable, production workloads
      ‣ Long running platforms

  13. Become a Data Scientist with OAC

  14. Data Scientist
    is a person who has the knowledge and skills to conduct sophisticated and systematic
    analyses of data. A data scientist extracts insights from data sets, and evaluates and
    identifies strategic opportunities.
    Source: https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/

  15. Data Scientist
    Is a Data Analyst who lives in California!
    Source: https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/

  16. Data Scientist Skills


  17. Brendan Tierney, Oracle ACE Director
    https://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

  18. From Data Analyst to Data Scientist
    Tools Knowledge Experience
    Icons made by Icon Pond from www.flaticon.com


  19. From Data Analyst to Data Scientist
    Tools: OPEN SOURCE ($)
    Knowledge: $$$$
    Experience: $$$$
    Icons made by Icon Pond from www.flaticon.com

  20. Data Scientist …Company Missing a Data Scientist


  21. Low Hanging Fruit Theory
    Democratise Data Science

  22. Basic Operations
    What are the Drivers for My Sales?
    Based on my Experience I can Guess…
    Statistically Significant Drivers for Sales Are …
    Augmented Analytics

  23. Basic Operations
    Is this Client going to accept the Offer?
    YES/NO
    50%
    70%
    Basic ML Model

  24. Become a Data Scientist with OAC


  25. What He Really Does
    What Everybody Thinks a Data Scientist Does

  26. https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html


  27. Before Starting… Define the Problem!

  28. TEP: Task, Experience, Performance
    Task: Classify Spam/Not Spam
    Experience: Corpus of Emails marked as Spam/Not Spam
    Performance: Accuracy
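A minimal sketch of the Performance measure from this slide, assuming a tiny hand-made set of labels. The lists `actual` and `predicted` and the helper `accuracy` are hypothetical names used only for illustration; they are not part of OAC.

```python
# Hypothetical labelled corpus (the "Experience"): 1 = spam, 0 = not spam.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical classifier output for the same emails (the "Task").
predicted = [1, 0, 0, 1, 0, 1, 1, 0]


def accuracy(actual, predicted):
    """Share of emails whose predicted label matches the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


score = accuracy(actual, predicted)  # the "Performance" measure
print(f"Accuracy: {score:.2%}")
```

Accuracy is only one possible Performance measure; it can mislead when one class is much rarer than the other.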

  29. Become a Data Scientist with OAC
    Connect


  30. Connection Options in OAC
    Pre-Defined Data Models
    Data Sources

  31. Select Relevant Columns and Apply Filters


  32. Become a Data Scientist with OAC
    Connect → Clean

  33. Cleaning What?
    • Missing Values: N/A
    • Wrong Values: Mark <> MArk
    • Irrelevant Observations: City “Rome”
    • Handling Outliers: Role: CIO, Salary: 500 K$
    • Train/Test Set Split: Train: 80%, Test: 20%
    • Labelling Columns: Col1 -> Name
    • Feature Scaling: 0-200k -> 0-1
    • Aggregation: # of Clicks
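A minimal sketch of the 80%/20% Train/Test Set Split listed on this slide, written in plain Python rather than as an OAC data flow. The helper name `split_train_test`, the seed and the stand-in data are assumptions made for illustration.

```python
import random


def split_train_test(rows, test_share=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 observations
train, test = split_train_test(rows)
print(len(train), len(test))
```

Shuffling before cutting avoids a test set that only contains the last rows of an ordered source.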

  34. Cleaning How?
    Data Flows
    - Filter
    - Aggregate
    - Join


  35. Cleaning What?
    • Missing Values: N/A
    • Wrong Values: Mark <> MArk
    • Irrelevant Observations: City “Rome”
    • Handling Outliers: Role: CIO, Salary: 500 K$
    • Train/Test Set Split: Train: 80%, Test: 20%
    • Labelling Columns: Col1 -> Name
    • Feature Scaling: 0-200k -> 0-1
    • Aggregation: # of Clicks
    Operations shown: CASE … WHEN…, UPPER, FILTER, COLUMN RENAME, FILTER, KPI/(MAX-MIN), FILTER?, COUNT
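A minimal sketch of a few of these cleaning operations in plain Python, assuming a small made-up data set. Which operation belongs to which issue is this sketch's reading of the slide, treating "Rome" as the irrelevant city is an assumption, and missing values are simply dropped here where a CASE … WHEN replacement would be the alternative.

```python
# Hypothetical raw rows; column names and values are made up for illustration.
raw = [
    {"Col1": "Mark", "city": "Verona", "sales": 0},
    {"Col1": "MArk", "city": "Verona", "sales": 200_000},
    {"Col1": "Anna", "city": "N/A", "sales": 50_000},
    {"Col1": "Luca", "city": "Rome", "sales": 100_000},
]

# COLUMN RENAME (Col1 -> Name) and UPPER so that "Mark" and "MArk" match.
rows = [
    {"Name": r["Col1"].upper(), "city": r["city"], "sales": r["sales"]}
    for r in raw
]

# FILTER: drop rows with a missing city ("N/A") or the irrelevant city.
rows = [r for r in rows if r["city"] not in ("N/A", "Rome")]

# Feature scaling: the slide shows KPI/(MAX-MIN); this sketch uses the
# min-max variant (KPI-MIN)/(MAX-MIN), which lands exactly on the 0-1 range.
# It assumes MAX differs from MIN, otherwise the division fails.
lo = min(r["sales"] for r in rows)
hi = max(r["sales"] for r in rows)
for r in rows:
    r["sales_scaled"] = (r["sales"] - lo) / (hi - lo)

# COUNT aggregation: number of rows per Name.
counts = {}
for r in rows:
    counts[r["Name"]] = counts.get(r["Name"], 0) + 1

print(rows)
print(counts)
```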

  36. Why Remove an Outlier?
    Years Experience   Salary
    1                  30.000
    2                  32.000
    3                  35.000
    4                  35.500
    5                  36.000
    6                  40.000
    7                  50.000
    8                  70.000
    9                  90.000
    10                 500.000
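A minimal sketch of why the last row matters, assuming the salaries use a dot as thousands separator (so 30.000 is thirty thousand) and that a straight-line fit is the model of interest. The helper `slope` is a hypothetical name; it computes an ordinary least-squares slope with and without the 500.000 row.

```python
years = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
salary = [30_000, 32_000, 35_000, 35_500, 36_000,
          40_000, 50_000, 70_000, 90_000, 500_000]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


slope_all = slope(years, salary)                # includes the 500.000 row
slope_trimmed = slope(years[:-1], salary[:-1])  # last row dropped
print(f"with outlier:    {slope_all:,.0f} per year of experience")
print(f"without outlier: {slope_trimmed:,.0f} per year of experience")
```

A single extreme row multiplies the estimated salary gain per year several times over, which is the distortion that removing or capping the outlier avoids.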

  37. How To Find Outliers? One Dimension

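The slide does not name a method, so the following is only one common option for a single dimension: the interquartile-range (IQR) rule, sketched in plain Python on the salary column from the previous slide. The helper `iqr_outliers` and the factor `k=1.5` are assumptions of this sketch.

```python
from statistics import quantiles

salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


outliers = iqr_outliers(salaries)
print(outliers)
```

This is the same rule a box plot uses to draw points beyond its whiskers.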

  38. How To Find Outliers? Two Dimensions


  39. Become a Data Scientist with OAC
    Connect → Clean → Transform & Enrich

  40. Feature Engineering
    Location -> ZIP Code
    2 Locations -> Distance
    Name -> Sex
    Day/Month/Year -> Date
    Data Flow
    Additional Data Sources?
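A minimal sketch of two of these derived features in plain Python: Day/Month/Year -> Date and 2 Locations -> Distance. The helper names, the great-circle (haversine) formula and the approximate coordinates for Verona and Rome are assumptions chosen for illustration, not what OAC does internally.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def to_date(day, month, year):
    """Combine separate day, month and year columns into one date."""
    return date(year, month, day)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


talk_date = to_date(21, 3, 2019)
# Approximate coordinates, used only as sample input.
verona_rome_km = haversine_km(45.4384, 10.9916, 41.9028, 12.4964)
print(talk_date.isoformat(), round(verona_rome_km))
```

Location -> ZIP Code and Name -> Sex need a lookup table, which is where an additional data source comes in.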

  41. Data Preparation Recommendations


  42. Become a Data Scientist with OAC
    Connect → Clean → Transform & Enrich → Analyse

  43. Data Overview


  44. Explain


  45. Explain - Key Drivers


  46. Explain on Attribute


  47. Become a Data Scientist with OAC
    Connect → Clean → Transform & Enrich → Analyse → Train & Evaluate

  48. What Problem are we Trying to Solve?
    Supervised: “I want to predict the value of Y, here are some examples”
      ‣ Classification
      ‣ Regression
    Unsupervised: “Here is a dataset, make sense out of it!”
      ‣ Clustering
    https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

  49. Easy Models


  50. DataFlow Train Model


  51. Which Model - Parameters To Pick?


  52. Select, Try, Save, Change, Try, Save …..


  53. Compare


  54. Compare


  55. There is No Single Truth…
    Precision
    896/(502+896) = 64.09%
    866/(471+866) = 64.77%
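A minimal sketch of the precision calculation. The percentages on this slide match 896/(896+502) and 866/(866+471), so this sketch assumes 896 and 866 are the true positives and 502 and 471 the false positives of two compared models; that reading, and the names `model_a` and `model_b`, are assumptions.

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP): how many predicted positives were right."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("precision is undefined without positive predictions")
    return true_positives / predicted_positives


model_a = precision(true_positives=896, false_positives=502)
model_b = precision(true_positives=866, false_positives=471)
print(f"Model A: {model_a:.2%}")
print(f"Model B: {model_b:.2%}")
```

The second model finds fewer true positives yet scores slightly higher on precision, so the preferred model depends on which measure matters for the problem.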

  56. Custom ML Models
    Model_train.xml
    Parameter Definition
    Python
    Parameter Parsing
    Data Cleaning
    Model Storage
    Statistics Calculation
    Model Definition & Training:
    from sklearn.svm import SVR  # kernel, train_X and train_y are defined earlier in the script
    svr = SVR(kernel=kernel, gamma=0.01, C=5)
    SVR_Model = svr.fit(train_X, train_y)
    Model_test.xml
    https://www.oracle.com/solutions/business-analytics/data-visualization/library-overview.html

  57. Become a Data Scientist with OAC
    Connect → Clean → Transform & Enrich → Analyse → Train & Evaluate → Predict

  58. Use On the Fly


  59. Step of a Data Flow


  60. Congratulations!
    …You are now a Data Scientist!


  61. … Not Really


  62. Required Knowledge
    97%, 95%, 90%, 80%, 60%, 50%

  63. …But
    80% > 50%
    Data Cleaning & Transformation
    Model Creation & Evaluation


  64. Deployment
    Involvement of a Data Scientist
    Move ML Close to the Data
    Oracle Advanced Analytics


  65. Become a Data Scientist with OAC
    Francesco Tisiot
    BI Tech Lead at Rittman Mead